Big data, big bandwidth: this week on Google Cloud Platform
GCP Blog Editor, Google Cloud Platform Blog
If you’re into big data or big bandwidth, or both, Google Cloud Platform has you covered.
For instance, there’s a new 3+ terabyte GitHub Archive hosted on BigQuery, our cloud data warehouse and analytics service. According to the documentation, the archive “contains a full snapshot of the content of more than 2.8 million open source GitHub repositories, including more than 145 million unique commits, over 2 billion different file paths and the contents of the latest revision for 163 million files, all of which are searchable with regular expressions.”
What can you learn from this dataset, and how? Google Developer Advocate Francesc Campoy, for one, asked how many Go files are in the GitHub Archive. Six seconds and two billion rows later, BigQuery had an answer: 12,624,178. In the same vein, GitHub asked who has the most commits among .edu contributors. As of a couple of days ago, berkeley.edu led the pack with 816. (Perhaps we can interest Berkeley computer science students in some free GCP credits?)
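For a sense of what a question like the Go file count looks like in practice, here is a minimal Python sketch. It is not the query Francesc ran. The table name `bigquery-public-data.github_repos.files` and its `path` column are assumptions about how the public dataset is laid out, and the helper names are made up for illustration. Running it needs the `google-cloud-bigquery` package and GCP credentials.

```python
# Assumed table name for the public GitHub dataset on BigQuery;
# check the dataset documentation before relying on it.
FILES_TABLE = "bigquery-public-data.github_repos.files"


def build_go_file_count_query(table: str = FILES_TABLE) -> str:
    """Return standard SQL that counts file paths ending in '.go'."""
    return (
        "SELECT COUNT(*) AS go_files\n"
        f"FROM `{table}`\n"
        "WHERE ENDS_WITH(path, '.go')"
    )


def count_go_files() -> int:
    """Run the query and return the count (hypothetical helper)."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(build_go_file_count_query()).result()
    return next(iter(rows)).go_files


if __name__ == "__main__":
    # Print the SQL only; call count_go_files() to actually run it.
    print(build_go_file_count_query())
```

The count you get back will likely differ from the figure quoted above, since the dataset changes over time.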
But the really big news was the undersea FASTER Cable System, which Google and its consortium partners turned on this week. Google invested $300 million to lay this 9,000km fiber-optic cable, which, when fully lit, will carry 10 terabits of data per second between Oregon and Japan. To put that into context, that’s about 10 million times faster than the average cable modem. Further, FASTER brings the total number of operational, Google-owned undersea cables to four, which is four more than any other technology company can lay claim to.
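The cable modem comparison is easy to check on the back of an envelope. The sketch below assumes the 10 terabit figure is a per-second rate and simply divides it by the quoted factor of 10 million; the constant and function names are illustrative only.

```python
CABLE_BITS_PER_SECOND = 10 * 10**12  # 10 terabits per second (assumed unit)
SPEEDUP = 10 * 10**6                 # "10 million times faster"


def implied_modem_bps(cable_bps: int = CABLE_BITS_PER_SECOND,
                      speedup: int = SPEEDUP) -> float:
    """Return the modem speed implied by the cable rate and the speedup."""
    return cable_bps / speedup


# Convert bits per second to megabits per second for readability.
print(implied_modem_bps() / 10**6, "Mbit/s")  # prints "1.0 Mbit/s"
```

In other words, the comparison works out to a modem running at roughly one megabit per second.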