This dataset contains all stories and comments from Hacker News from its launch in 2006. Each story contains a story ID, the author that made the post, when it was written, and the number of points the story received.
You can start exploring this data in the BigQuery console:
Exploring the data using Looker
Looker provides tools to quickly analyze and find insights in your BigQuery datasets. Below are some example visualizations of Hacker News Data in BigQuery, built using Looker.
How are Hacker News story points distributed?
If you use the score as a dimension (group by score, in SQL) and count the number of posts with each score, you can get an idea about how likely a story is to get a given score.
#legacySQL SELECT stories.score AS stories_score, COUNT(stories.id) AS stories_count FROM [bigquery-public-data:hacker_news.stories] AS stories WHERE NOT (stories.score IS NULL) GROUP EACH BY 1 ORDER BY 1 LIMIT 500
Where do the stories live?
By parsing out the host from the URL you can see where Hacker News stories originate.
#legacySQL SELECT REGEXP_EXTRACT((REGEXP_EXTRACT(stories.url,'http://([^/]+)/')),'([^\\.]+\\.[^\\.]+(?:\\.[a-zA-Z].)?)$') AS stories_url_domain, COUNT(stories.id) AS stories_count, COUNT(CASE WHEN stories.score >= 7 THEN 1 ELSE NULL END) AS storiescountscore_7_plus, 100.0 * (COUNT(CASE WHEN stories.score >= 7 THEN 1 ELSE NULL END)) / (COUNT(stories.id)) AS stories_percent_7_plus FROM [bigquery-public-data:hacker_news.stories] AS stories GROUP EACH BY 1 HAVING (COUNT(CASE WHEN stories.score >= 7 THEN 1 ELSE NULL END) >= 6) ORDER BY 3 DESC LIMIT 500
Who is the star of Hacker News?
Making the front page of Hacker News is not easy. Let's see if we can figure out which authors consistently get on the front page. Assuming a score of 7 is a good threshold for popularity, we’ll sort on count of stories that scored over 7 to find the person with the most posts that scored well.
#legacySQL SELECT * FROM ( SELECT *, MIN(z___rank) OVER (PARTITION BY stories_author) AS z___min_rank FROM ( SELECT *, RANK() OVER (PARTITION BY stories_score_7_plus ORDER BY stories_count DESC) AS z___rank FROM ( SELECT stories.author AS stories_author, CASE WHEN stories.score >= 7 THEN 'Yes' ELSE 'No' END AS stories_score_7_plus, COUNT(stories.id) AS stories_count FROM [bigquery-public-data:hacker_news.stories] AS stories GROUP EACH BY 1, 2) ww ) aa ) xx WHERE z___min_rank <= 500 LIMIT 30000
About the data
Dataset Source: Hacker News
Category: Media, Social
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — https://github.com/HackerNews/API — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Update Frequency: Daily
View in BigQuery: Go to the Hacker News dataset