
Data considerations for early-stage startups

January 21, 2022
Lak Lakshmanan

Director, Analytics & AI Solutions

I lead analytics and AI solutions at Google Cloud, and my team works with startups building on the platform. This puts us in the fortunate position to learn from founders and engineers how early-stage startups’ investments can either constrain them or position them for success, even at the seed stage. In this post, I want to share a few best practices to keep in mind as you're building.

Understand your value proposition before diving into a technology stack

If you're launching a startup in the cloud, you're no doubt thinking about a technology stack, but it’s important to step back a bit and think carefully about the major value proposition that your startup offers to your customers. That value proposition is going to fundamentally drive the kind of technology that you should pick.

For example, does your system need processing in real time, or can it be done in a batch mode? Can you rely on once-a-day insights or do the insights have to come in as events happen?

Additionally, what kind of latency will your customers face? That latency makes your value proposition either usable or unusable. Early in Google's development, leaders realized that no one was going to wait more than a few hundred milliseconds for a web page to show them their results, and that realization drove the technology decisions that have allowed Google to scale from a startup in a garage to a trillion-dollar company. Your startup needs to define its value to customers with this level of specificity before it can build a technology stack suited to its needs.

Focus on customer interactions

A few companies have gracefully pulled off big IT pivots that reshaped their value proposition. Netflix, for example, moved from mostly sending DVDs through the mail to becoming a streaming service and major content producer. That’s a huge shift in the user experience and the technology stack necessary to support it, even if the underlying value proposition (i.e., get content to customers) was broadly the same. But it’s also an outlier. If you’re planning for potential changes of this magnitude, rather than focusing on getting your value proposition to users, you probably need to sharpen what that value proposition is.

Specifically, you need a clear vision of how customers will access and interact with your business. Typically, they'll do so over a website or a mobile app, but there are still so many variables. 

Are customers going to transmit documents? If so, in what format? Is handwriting supported or is input limited to typing? Can they use images for optical character recognition? Will it mostly be forms? Will the data be structured or unstructured? If all that sounds a little overwhelming, don’t worry: it’ll seem simpler by the end of this article. But be aware that we’re just getting warmed up.

Imagine that most of your customers will access your business via voice, so you know you’ll want to prioritize conversational workflows. That’s a start, but dig deeper. Even if we suppose you’re using Dialogflow, a Google Cloud conversational AI platform that lets you build and deploy virtual agents, we’re still not really seeing the value proposition. How will all this work, from the beginning of a typical customer interaction to its resolution? How many interactions will have to be facilitated over low-bandwidth connections, for example? When it comes to user interactions, make sure you can see an end-to-end use case.

Another example: you're building a retail website, and one of your end-to-end use cases involves the customer asking if a certain amount of a given product is in stock, whether it’s one unit of the product, ten or hundreds. If the product is not sufficiently stocked, you want your app to offer similar items that are. Will your technology stack support this end-to-end use case?
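
To make that use case concrete, here is a minimal sketch of the logic in Python. The `Product` fields, the `check_availability` function, and the sample catalog are all hypothetical, and "similar" is assumed to mean "same category"; a real store would likely back this with a database and a smarter notion of similarity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    # Hypothetical catalog record; field names are illustrative only.
    sku: str
    category: str
    units_in_stock: int


def check_availability(catalog, sku, quantity):
    """Return (available, alternatives) for a requested SKU and quantity."""
    requested = catalog[sku]
    if requested.units_in_stock >= quantity:
        return True, []
    # Not enough stock: offer same-category items that can fill the order.
    alternatives = [
        p.sku
        for p in catalog.values()
        if p.sku != sku
        and p.category == requested.category
        and p.units_in_stock >= quantity
    ]
    return False, alternatives


catalog = {
    "mug-blue": Product("mug-blue", "mugs", 3),
    "mug-red": Product("mug-red", "mugs", 250),
    "tee-black": Product("tee-black", "shirts", 500),
}

one_unit = check_availability(catalog, "mug-blue", 1)
bulk_order = check_availability(catalog, "mug-blue", 100)
```

Even a toy version like this surfaces the questions your stack has to answer: how fresh the stock counts are, and how quickly the lookup for alternatives comes back.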

These considerations are not an argument for premature optimization. There’s value in moving fast, getting minimum viable products to users, and then iterating. But in the early stages, you only get one chance to start on the right foot—and how you navigate that chance will influence a lot of dollars and effort down the road. You need to make sure you have business use cases, not just an idea, before you can start designing a technology stack.  

Here’s how to get in the right frame of mind. Pick three use cases: two that are “bread and butter” and one that is technologically complex.  Make sure your proposed technology stack can support all three, end to end. 

Default toward higher levels of abstraction

Now that we’re in the right frame of mind, we’re ready to think about the technology stack more directly. 

As a startup, you’ll need to conserve resources, and to do that, you’ll want to build at the highest level of abstraction possible for your value proposition. For example, you probably don't want your people setting up clusters. You don't want them configuring things if they can use a fully managed service. You want them focused on building your prototype, not managing infrastructure.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Canonical_Data_Stack_on_Google_Cloud.max-1000x1000.jpg
Canonical Data Stack on Google Cloud

This focus has definitely informed how we create products at Google Cloud, as our canonical data stack—Pub/Sub, Dataflow, BigQuery, and Vertex AI—consists of auto-scaling and serverless products.

But management of infrastructure is not the only place where you should err toward a less-is-more philosophy. 

When it comes to architecture, choose no-code over low-code and low-code over writing custom code. For example, rather than writing ETL pipelines to transform the data you need before you land it into BigQuery, you could use pre-built connectors to directly land the raw data into BigQuery. That’s no code right there. Then, transform the data into the form you need using SQL views directly in the data warehouse. This is called ELT, and it is low code. You will be a lot more agile if you choose an ELT approach over an ETL approach. 
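
Here is a small sketch of the ELT pattern. It assumes Python's built-in `sqlite3` module as a local stand-in for the data warehouse, and the table, view, and column names are made up for illustration. The shape is what matters: land the raw data unchanged, then express the transformation as a SQL view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse

# Extract + Load: land the raw records as-is, with every field kept as text.
conn.execute(
    "CREATE TABLE raw_orders (order_id TEXT, amount_cents TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "1250", "us"), ("2", "800", "DE")],
)

# Transform: a view casts and cleans the raw data inside the warehouse.
conn.execute(
    """
    CREATE VIEW orders AS
    SELECT CAST(order_id AS INTEGER)             AS order_id,
           CAST(amount_cents AS INTEGER) / 100.0 AS amount,
           UPPER(country)                        AS country
    FROM raw_orders
    """
)

rows = conn.execute(
    "SELECT order_id, amount, country FROM orders ORDER BY order_id"
).fetchall()
```

Because the raw table is never modified, changing the transformation later means redefining the view rather than rebuilding and rerunning a pipeline.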

Another place is when you choose your ML modeling framework. Don’t start with custom TensorFlow models. Start with AutoML. That’s no-code. You can invoke AutoML directly from BigQuery, avoiding the need to build complex data and ML pipelines. If necessary, move on to pre-built models from TensorFlow Hub, HuggingFace, etc. That’s low-code. Build your own custom ML models only as a last resort.
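
As a rough sketch of what invoking AutoML from BigQuery can look like, the helper below builds a BigQuery ML `CREATE MODEL` statement. The helper name, the model and table names, and the label column are all hypothetical, and the option names should be checked against the current BigQuery ML documentation before you rely on them.

```python
def automl_create_model_sql(model, source_table, label_column, budget_hours=1.0):
    """Build a BigQuery ML statement that trains an AutoML classifier.

    All identifiers passed in are placeholders; this only assembles SQL text
    and does not talk to BigQuery.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'AUTOML_CLASSIFIER',\n"
        f"  input_label_cols = ['{label_column}'],\n"
        f"  budget_hours = {budget_hours}\n"
        ")\n"
        f"AS SELECT * FROM `{source_table}`"
    )


# Hypothetical dataset, model, and column names.
sql = automl_create_model_sql(
    model="my_dataset.churn_model",
    source_table="my_dataset.customer_features",
    label_column="churned",
)
```

You would run the resulting statement in the BigQuery console or through a client library. The training data never leaves the warehouse, which is what spares you the separate data and ML pipelines.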

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_No-code_low-code_Data_Stack_on_Google_Cl.max-1300x1300.jpg
No-code, low-code Data Stack on Google Cloud

Focus on getting your vision to market, not chasing technology hype  

The goal is to pick the right technology stack for bringing your vision to market, generating value for customers, conserving resources, and maintaining flexibility for growth. Early IT investments should usually gravitate toward things that preserve flexibility, such as managed services built on standard protocols or open APIs, but you needn’t rush to the flashiest technologies. The answer isn’t always ML, for example. The answer might be heuristics to start, with a path to ML once you have collected enough data. You want to make sure that your intelligence layer has enough abstraction that you can mock it up with simple rules at first, then replace it with a more robust system as you go along.
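
One way to get that kind of abstraction is sketched below, using a hypothetical churn-risk example. The `ChurnScorer` interface, the rule thresholds, and the customer data are all invented for illustration. The calling code depends only on the interface, so a trained model can later replace the rules without touching it.

```python
from typing import Protocol


class ChurnScorer(Protocol):
    """Anything that turns customer signals into a risk score in [0, 1]."""

    def score(self, days_since_last_login: int, open_tickets: int) -> float: ...


class RuleBasedScorer:
    """Day-one implementation: hand-written heuristics, no ML required."""

    def score(self, days_since_last_login: int, open_tickets: int) -> float:
        risk = 0.0
        if days_since_last_login > 30:
            risk += 0.6
        if open_tickets >= 3:
            risk += 0.3
        return min(risk, 1.0)


def customers_to_contact(scorer: ChurnScorer, customers, threshold=0.5):
    """Return the names of customers whose risk meets the threshold."""
    return [
        name
        for name, days, tickets in customers
        if scorer.score(days, tickets) >= threshold
    ]


# (name, days since last login, open support tickets); made-up data.
customers = [("ana", 45, 0), ("bo", 2, 5), ("cy", 1, 0)]
at_risk = customers_to_contact(RuleBasedScorer(), customers)
```

Once you have collected enough data, a class that wraps a trained model and exposes the same `score` method can be passed to `customers_to_contact` in place of `RuleBasedScorer`.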

Launch and iterate fast with these principles 

The preceding discussion is a reminder that your most expensive resource is your people, and that you really want them focused on building your prototype, minimum viable product, or production app. You want to launch fast and iterate fast, and the only way you can do that is by focusing on the things that differentiate you.

But regardless of the technologies you use, the bottom line is the same: follow these four principles. 

  • Figure out your major value proposition and design your tech stack around it. 

  • Be very careful about user interactions. User experience is super important; you need to make sure you deliver the kind of experience that your customers have grown to expect.

  • When you’re building, pick the highest level of abstraction possible: the most fully managed tools and the no-code/low-code frameworks that give you the functionality you need.

  • Instead of choosing new or flashy technologies, consider if you can build a “good enough” minimum viable product quickly and come back to a better implementation later. 

