Leveling up your data analysis skills as a student
Kelci Mensah
Cloud Architect, Google
If you're a college student like me and are gearing up to enter the “big kids” job market, as I like to call it, then you've probably been wondering (or worrying) about how to get ahead of the curve and stand out amongst your peers.
When I think about which high-value fields to target for improving my technical skills and qualifications, the one area I keep coming back to is data analysis.
Why learn data analysis?
Data in all forms is becoming increasingly valuable in our technology-driven society, and that is because of the insights that it brings! The amount of data being generated around us is growing exponentially in all fields. This is great news for students, because now you can benefit from learning data analysis to complement your existing skills, whether you’re majoring in computer science, marketing, or even music! Having the skills necessary to manipulate, process, analyze, and display data in a meaningful way will push you ahead regardless of your background.
What I consider when learning a new skill
Learning new tech skills and tools on top of coursework, jobs, and internships can seem daunting. Trust me, I know your pain. That’s why it's so important for us as students to be strategic and efficient in how we discern the best resources to use for learning.
Whenever I want to learn a new tool or skill, there are a few factors I take into account that I’m sure are important to you too:
- How much is this going to cost me?
- How much time is this going to take?
- How applicable is this to my job prospects?
Cost
I’m not even going to pretend this isn't one of the first things I think about. Knowing how to allocate your financial resources is a vital skill – especially when it comes to career-oriented self-improvement.
Time
And time? Well, that is a cost too. Time is valuable for us students, sometimes even more valuable than money itself. We’re juggling coursework and studying, commute time, extracurriculars, career development, and sometimes even a job to help pay for it all. We’re looking for skills that are relatively straightforward to learn and that we can pick up on a flexible, self-paced schedule.
Applicability
Lastly, I want to be able to learn a skill or tool that is applicable to my job search, something that I can list directly on my resume that will be attractive to the type of companies that I will be applying to. After all, furthering your career is one of the main drivers for this kind of self-study. That is why I always look for opportunities to learn directly using industry-standard software and services.
Learning data analysis with Google Cloud
During my internship here at Google, I've been given ample opportunities to build my data analysis skills using Google Cloud services. In this blog post, I focus specifically on two of those services: BigQuery and Data Studio.
What is BigQuery?
BigQuery is a cloud data warehouse that companies use for running analytics on large datasets. It also happens to be a great place for learning and practicing SQL (a widely used language for querying and analyzing data). The “getting-started” experience with BigQuery is smooth and saves students tons of time. Instead of downloading and installing database software, sourcing data, and loading it into tables, you can log in to the BigQuery sandbox and immediately start writing SQL queries (or copying sample ones) to analyze data provided as part of the Google Cloud public datasets program (which you will see for yourself soon!).
What is Data Studio?
Data Studio is an online business intelligence tool (integrated with BigQuery) for visualizing data in customizable and informative tables, dashboards, and reports! You can use it to visualize the results of your SQL queries; it’s also great for analyzing data without SQL, and for sharing insights with non-technical users.
Because Data Studio is already part of Google Cloud, there’s no need to export queried, processed data to an external tool. Data visualization can be completed through direct connections to the BigQuery environment, which saves you lots of time and headaches from having to worry about things like data file compatibility, size, and so on.
You can go from the BigQuery Console to visualizing your query results in Data Studio in one click.
Both BigQuery and Data Studio can be used for little to no cost, within the free tier of Google Cloud. This tier gives users a starting amount of data storage (if you want to upload your own data) and a monthly allowance of bytes processed by your queries. You can even create a BigQuery “sandbox” environment that stays within this free tier and doesn't require a credit card to set up (I’ll give you instructions on how to set one up later).
So you can get started quickly with BigQuery and Data Studio, and for free. Now let’s talk about applicability. Both BigQuery and Data Studio are used across many industries in production workloads today. Just search BigQuery or Data Studio on LinkedIn, and you'll see what I mean!
Getting started with BigQuery and Data Studio
Now let's get to the action. I want to show you just how simple it is to get started with both of these tools, so here's a quick tutorial on using BigQuery and Data Studio with a real public dataset!
Let’s dive into an example scenario that BigQuery can help solve:
Congrats! You’re a new intern who recently got hired by Pistach.io. Pistach.io is adamant that for the first couple of weeks, new hires come into the office for training programs. So, you must make sure that you show up on time. Pistach.io is in New York City, and the office does not have accessible parking nearby. You know that New York City has reimplemented its public bike program so you’ve decided to use bike sharing to get to work.
Because you must be at work on time, you need answers to a few key questions:
- Which nearby stations have bikes you can use in the morning?
- Where is the drop-off location that is closest to the office?
- What are the busiest stations that you should avoid?
It would be great to answer these questions using a public dataset! Luckily for you, BigQuery has tons of datasets available for you to use for no cost. The data that you’ll be analyzing for this example is in the New York Citi Bike public dataset.
Getting set up
First, create a BigQuery sandbox, which is essentially an environment for you to work in. Follow these steps to set one up: https://cloud.google.com/bigquery/docs/sandbox.
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, click +Add Data > Pin a project > Enter project name.
Type “bigquery-public-data” and click Pin. This project contains all the datasets available in the public datasets program.
To see underlying datasets, expand the bigquery-public-data project in the Explorer pane and scroll to find “new_york_citibike”.
Click to highlight the dataset or expand to see the citibike_stations and citibike_trips tables. You can then highlight the tables themselves to see more details like the schema and a preview of the data.
Now time to query
Ok, on to the analysis! Let’s figure out which stations are closest to home. For this tutorial, you will be using the Port Authority Bus Terminal in NYC as your "home".
The query for this step calculates the distance between each Citi Bike station and your "home", and then returns the results with the closest station listed first. The ST_DISTANCE function returns the shortest distance, in meters, between two points. That's a straight-line distance, so it's more like a bird flying than a bike ride to work, but it'll work for this use case!
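To make the "bird flying" idea concrete, here is a minimal Python sketch of the same logic: compute a straight-line distance from "home" to each station, then sort with the closest first. This is not the BigQuery query itself, and it is not how ST_DISTANCE is implemented; it uses the haversine formula on a spherical Earth as an approximation. The function names, station names, and coordinates are all made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def straight_line_distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stations_by_distance(origin, stations):
    """Return (name, distance_m) pairs, closest station first."""
    lat0, lon0 = origin
    ranked = [
        (name, straight_line_distance_m(lat0, lon0, lat, lon))
        for name, lat, lon in stations
    ]
    return sorted(ranked, key=lambda pair: pair[1])


# Approximate, illustrative coordinates; NOT values from the public dataset.
HOME = (40.757, -73.990)  # roughly the Port Authority Bus Terminal
SAMPLE_STATIONS = [
    ("Station A", 40.760, -73.991),
    ("Station B", 40.750, -73.994),
    ("Station C", 40.757, -73.989),
]

for name, dist in stations_by_distance(HOME, SAMPLE_STATIONS):
    print(f"{name}: {dist:.0f} m")
```

Because the origin is just a parameter, the same function works for any other reference point: pass a different (latitude, longitude) pair as `origin`.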
Next, let’s find the stations closest to the office. Let’s use the coordinates for the Google NYC Chelsea Market office since that is where I worked this summer. You can use essentially the same query as before, swapping in the office coordinates for the home coordinates.
Finally, let’s identify the most popular Citi Bike stations around the office so that we can avoid them!
This query uses a subquery to calculate the number of overall trips for each station, and then joins it with a list of each station and its distance to the office to list the closest first. Looks like I'll be avoiding the 8th and Greenwich station!
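If the subquery-plus-join pattern is new to you, here is a small sketch you can run locally with Python's built-in sqlite3 module. It is only an illustration of the pattern, not the query from this post: the two tables are a simplified, hypothetical stand-in for the Citi Bike tables, the rows are invented, the office coordinates are approximate, and `distance_m` is a helper I register myself to play the role of ST_DISTANCE (SQLite has no such function).

```python
import math
import sqlite3


def distance_m(lat1, lon1, lat2, lon2):
    """Haversine distance in meters; a rough stand-in for ST_DISTANCE."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


OFFICE_LAT, OFFICE_LON = 40.742, -74.005  # approximate, for illustration only

conn = sqlite3.connect(":memory:")
conn.create_function("distance_m", 4, distance_m)  # expose the helper to SQL

# Hypothetical, simplified schema with made-up rows.
conn.executescript("""
    CREATE TABLE stations (name TEXT, latitude REAL, longitude REAL);
    CREATE TABLE trips (start_station_name TEXT);
    INSERT INTO stations VALUES
        ('Near and busy', 40.742, -74.004),
        ('Near and quiet', 40.743, -74.006),
        ('Far and busy', 40.760, -73.980);
    INSERT INTO trips VALUES
        ('Near and busy'), ('Near and busy'), ('Near and busy'),
        ('Near and quiet'),
        ('Far and busy'), ('Far and busy');
""")

# The subquery counts trips per station; the outer query joins those counts
# to the station list and orders by distance to the office.
QUERY = """
    SELECT s.name,
           distance_m(s.latitude, s.longitude, ?, ?) AS meters_to_office,
           t.num_trips
    FROM stations AS s
    JOIN (
        SELECT start_station_name, COUNT(*) AS num_trips
        FROM trips
        GROUP BY start_station_name
    ) AS t ON t.start_station_name = s.name
    ORDER BY meters_to_office
"""

rows = conn.execute(QUERY, (OFFICE_LAT, OFFICE_LON)).fetchall()
for name, meters, num_trips in rows:
    print(f"{name}: {meters:.0f} m away, {num_trips} trips")
```

Reading the output top to bottom gives you the closest stations first, with the trip count next to each one so you can spot the crowded ones.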
Visualize the results
One of the great things about BigQuery is that you can visualize your results easily with Data Studio (just press the Explore Data button in the query results page!). This will give you a better idea of what exactly you queried.
If you want to try out Data Studio for yourself, I recommend following this tutorial. (It’s also about bike share trips, but this time in Austin, Texas!)
Next steps
It's really that simple! Google Cloud is easy to learn and use, so you spend less time “getting started” and more time analyzing data and designing visualizations. You can see the potential in using something like this in your personal and professional tech development, and there are so many ways to boost your skills and early career in data science with Google Cloud tools such as BigQuery.
You can also supplement what you’ve learned in this post by completing the From Data to Insights with Google Cloud specialization on Coursera.
That's all I have to share for now. If you found this blog post helpful, be sure to share! You can find more helpful content on the Google Cloud Platform blog.