How Stairwell uses Bigtable for cybersecurity
Mike Wiacek
CEO & Founder, Stairwell
Bill Pentney
Senior Staff Engineer - Machine Learning, Stairwell
In 2006, Mike Wiacek stepped into Google’s Mountain View offices, beginning an almost 14-year journey. He founded Google's Threat Analysis Group and co-founded Chronicle, an Alphabet cybersecurity venture now a part of Google Cloud. Today, Mike summarizes how his experiences as a Googler shaped the way he built Stairwell, his new company.
While cybersecurity may seem to be about erecting digital barriers, at its core it is a data analysis challenge. At Stairwell, we recognize the complexity of this challenge and have designed an adaptive, resilient system to counter adaptive threats.
We collect and store all executable files across an organization’s systems, forever. These files are continuously rescanned and analyzed in light of existing and emerging threat intelligence. This strategy enables real-time monitoring, retrospective threat assessments, bespoke detection engineering workflows, and novel security hunting capabilities.
A year after our general availability launch, Stairwell is tracking more than 8.2 billion file sightings across our customer environments.
The database dilemma
In Stairwell’s early days, we chose open-source PostgreSQL for its reliability and ease of use. It was an effective choice for the short term, allowing us to focus on achieving product-market fit by learning from customer feedback.
However, as we scaled up to meet our customers’ needs, we needed a more robust solution. Our database’s limitations began to surface, and they shaped our product development and vision in ways we couldn't ignore.
By last year, these limitations were clear, and we made the strategic decision to migrate our key-value data storage to Bigtable, Google’s scalable, high-performance NoSQL database known for its versatility in both batch and real-time workloads. The shift marked a transformative moment for Stairwell's scaling capabilities.
Bigtable: A scaling revolution
Scaling challenges with traditional databases can hold back an ambitious engineering team, which is where a platform like Bigtable comes in. Bigtable acts as a catalyst, inspiring teams to aim higher. Within months, we were handling tables so large they would have been unimaginable with our previous database. Engineering ambitions rise when the infrastructure can keep up.
While it is technically possible to use any database as a key-value store, Bigtable’s data model and performance are what made it the clear winner for us. Bigtable's lexicographically sorted row keys are pivotal for our specific use cases. We've optimized our row key schemas, column families, and column qualifiers to reduce both the number of requests and the data volume per read/write operation.
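Because Bigtable orders rows by the raw bytes of their keys, the key format decides which rows end up next to each other. The Go sketch below illustrates one common technique, a composite key with a reversed, zero-padded timestamp, using a hypothetical `<fileDigest>#<reversedTimestamp>` schema; Stairwell's actual key schemas aren't described, so treat the format and the `sightingKey` helper as assumptions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sightingKey builds a hypothetical row key of the form
//
//	<fileDigest>#<reversedTimestamp>
//
// Bigtable orders rows by the raw bytes of their keys, so all sightings of
// one file share a prefix and sit next to each other. Subtracting the
// timestamp from math.MaxInt64 and zero-padding it to a fixed 19 digits makes
// newer sightings sort first; without fixed-width numbers, "9..." would sort
// after "10...". In practice the digest would also be fixed-length.
func sightingKey(fileDigest string, unixMillis int64) string {
	reversed := int64(math.MaxInt64) - unixMillis
	return fmt.Sprintf("%s#%019d", fileDigest, reversed)
}

func main() {
	keys := []string{
		sightingKey("9f2c", 1_700_000_000_000),
		sightingKey("1ab0", 1_650_000_000_000),
		sightingKey("9f2c", 1_710_000_000_000),
	}
	sort.Strings(keys) // byte-wise order, the same order Bigtable uses for row keys
	for _, k := range keys {
		fmt.Println(k)
	}
}
```

The design choice here is to encode the desired read order directly into the key, so "latest sightings of file X" becomes a short read from the start of a contiguous key range rather than a sort at query time.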
Paired with reader expressions, Bigtable serves as both an indexed data store and a distributed hash table for us. By collocating related data, we gain the efficiency of precise key lookups along with the speed of scanning adjacent rows in a single query. Additionally, we use Bigtable's garbage collection to auto-prune outdated records, maintaining a lean storage footprint. In sum, a well-designed data model with Bigtable is not just beneficial — it's empowering.
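The point lookups and adjacent-row scans described above both follow from keeping keys sorted. The in-memory Go sketch below models a table as a sorted slice of row keys to show why a prefix scan is cheap: binary-search to the first matching key, then read forward until the prefix stops matching. It illustrates the access pattern only, not Bigtable's internals, and the row keys are hypothetical. In the real Go client, the same scan would be expressed by passing `bigtable.PrefixRange(prefix)` to `Table.ReadRows`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortedTable is a toy model of a Bigtable table: row keys kept in
// lexicographic (byte-wise) order, each mapped to a value.
type sortedTable struct {
	keys []string
	rows map[string]string
}

func newSortedTable(rows map[string]string) *sortedTable {
	keys := make([]string, 0, len(rows))
	for k := range rows {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return &sortedTable{keys: keys, rows: rows}
}

// Lookup is a point read of one exact row key.
func (t *sortedTable) Lookup(key string) (string, bool) {
	v, ok := t.rows[key]
	return v, ok
}

// ScanPrefix returns every row key that starts with prefix. Because keys are
// sorted, matching rows are contiguous: binary-search to the first one, then
// read forward until the prefix no longer matches.
func (t *sortedTable) ScanPrefix(prefix string) []string {
	var out []string
	for i := sort.SearchStrings(t.keys, prefix); i < len(t.keys); i++ {
		if !strings.HasPrefix(t.keys[i], prefix) {
			break
		}
		out = append(out, t.keys[i])
	}
	return out
}

func main() {
	tbl := newSortedTable(map[string]string{
		"file#9f2c#meta":           "size=48K",
		"file#9f2c#sighting#hostA": "2024-01-03",
		"file#9f2c#sighting#hostB": "2024-02-11",
		"file#1ab0#meta":           "size=2M",
	})
	v, _ := tbl.Lookup("file#9f2c#meta")
	fmt.Println(v)
	fmt.Println(tbl.ScanPrefix("file#9f2c#sighting#"))
}
```

Colocating a file's metadata and sightings under one prefix means a single range read returns everything about that file, which is the "indexed store plus hash table" combination the paragraph describes.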
Our largest table is awe-inspiring: more than 328 million rows, each with anywhere from a single column to 10,000 columns. This single table houses hundreds of billions of data points while maintaining an average read latency of just 1.9 milliseconds and a maximum of 4 milliseconds. This is borderline magical!
Bigtable isn't just a gigantic storage vault for us; it's a high-performance analytics engine capable of executing both batch and streaming queries at massive scale. We frequently run large Dataflow jobs that populate hundreds of millions of rows while simultaneously serving live user-facing queries that return results in single-digit milliseconds. In one standout instance, Bigtable effortlessly served more than 22 million rows per second during an intense Dataflow job. This isn't merely fast; it's a game-changing level of data processing capability.
Load spikes, malware detection, and Go
Many NoSQL databases boast about their scalability, but there’s often a hidden limitation: how quickly they can actually scale. Because Bigtable keeps its data in Google's Colossus file system rather than on the serving nodes themselves, it can add nodes in real time and quickly bring them online to handle incoming requests, without copying data between nodes to rebalance or reshard. Unlike many other NoSQL systems, Bigtable doesn’t suffer from throughput dilution when scaling down, either.
Bigtable's autoscaler adjusts the node count as load changes, based on CPU and storage utilization targets, which keeps performance consistent. This efficiency translates directly into cost-effectiveness: we pay only for what we use, and we experience zero downtime during any of these operations.
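Bigtable autoscaling is configured with minimum and maximum node counts plus CPU and storage utilization targets. Its exact decision logic isn't covered here, so the Go sketch below assumes a simple proportional rule to show how such targets translate into a node count: scale by observed/target CPU, never drop below what storage requires, and clamp to the configured bounds. The type, field names, and numbers are all hypothetical, and integer arithmetic is used so the ceiling division is exact.

```go
package main

import "fmt"

// AutoscaleConfig holds the kinds of knobs Bigtable autoscaling exposes:
// node bounds, a CPU utilization target, and a storage utilization target.
// Field names and units here are hypothetical.
type AutoscaleConfig struct {
	MinNodes, MaxNodes      int
	TargetCPUPercent        int // e.g. 60 means aim for 60% average CPU
	TargetStorageGiBPerNode int // data each node should serve at most
}

// ceilDiv returns ceil(a/b) for non-negative a and positive b, without floats.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

// desiredNodes applies an assumed proportional rule, not Bigtable's
// documented algorithm: scale the node count by observed/target CPU, never
// drop below what storage requires, then clamp to [MinNodes, MaxNodes].
func desiredNodes(cfg AutoscaleConfig, currentNodes, observedCPUPercent, storedGiB int) int {
	n := ceilDiv(currentNodes*observedCPUPercent, cfg.TargetCPUPercent)
	if s := ceilDiv(storedGiB, cfg.TargetStorageGiBPerNode); s > n {
		n = s
	}
	if n < cfg.MinNodes {
		n = cfg.MinNodes
	}
	if n > cfg.MaxNodes {
		n = cfg.MaxNodes
	}
	return n
}

func main() {
	cfg := AutoscaleConfig{MinNodes: 3, MaxNodes: 30, TargetCPUPercent: 60, TargetStorageGiBPerNode: 2500}
	// A batch job pushes 6 nodes to 90% CPU: ceil(6*90/60) = 9 nodes.
	fmt.Println(desiredNodes(cfg, 6, 90, 10000))
	// Load falls to 20% CPU on 9 nodes: CPU alone allows 3, but
	// 10,000 GiB at 2,500 GiB per node still needs 4.
	fmt.Println(desiredNodes(cfg, 9, 20, 10000))
}
```

The storage floor matters for the scale-down case the paragraph highlights: node count can follow query load down, but only as far as the data volume allows.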
Bigtable serves as the backbone for MalEval, our machine learning-based malware detection system. MalEval uses a neural network trained on extensive file metadata covering thousands of file features, which our in-house scanning tools extract.
Bigtable supports both our training pipeline, which sifts through gigabytes of data across millions of rows to gather training samples, and our real-time malware scanner. The scanner ingests fresh metadata and feeds it into the trained model to infer the likelihood that a file is malicious. Thanks to Bigtable's efficiency, we can meet stringent latency requirements while processing millions of customer files daily.
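The real-time path described above (fetch fresh metadata for a file, turn it into features, score it with the trained model) can be sketched in Go. Everything here is hypothetical: `MetadataStore` stands in for a Bigtable-backed lookup keyed by file digest, the feature names are invented, and a single logistic layer stands in for MalEval's neural network, whose architecture isn't described.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// MetadataStore abstracts the per-file feature lookup. In production this
// would presumably be a Bigtable read keyed by file digest; here it is a
// hypothetical interface.
type MetadataStore interface {
	Features(fileDigest string) (map[string]float64, error)
}

// memStore is an in-memory MetadataStore used for the example.
type memStore map[string]map[string]float64

func (m memStore) Features(d string) (map[string]float64, error) {
	f, ok := m[d]
	if !ok {
		return nil, errors.New("no metadata for " + d)
	}
	return f, nil
}

// Model is a single logistic layer standing in for the trained network:
// score = sigmoid(bias + sum(weight_i * feature_i)).
type Model struct {
	Bias    float64
	Weights map[string]float64
}

// Score returns a value in (0, 1); features absent from the map count as 0.
func (m Model) Score(features map[string]float64) float64 {
	z := m.Bias
	for name, w := range m.Weights {
		z += w * features[name]
	}
	return 1 / (1 + math.Exp(-z))
}

// Scan fetches a file's metadata and returns the model's estimated
// probability that the file is malicious.
func Scan(store MetadataStore, model Model, fileDigest string) (float64, error) {
	f, err := store.Features(fileDigest)
	if err != nil {
		return 0, err
	}
	return model.Score(f), nil
}

func main() {
	store := memStore{
		"9f2c": {"entropy": 7.9, "imports_virtualalloc": 1, "is_signed": 0},
		"1ab0": {"entropy": 5.1, "imports_virtualalloc": 0, "is_signed": 1},
	}
	model := Model{Bias: -6, Weights: map[string]float64{
		"entropy": 0.8, "imports_virtualalloc": 1.5, "is_signed": -2,
	}}
	for _, d := range []string{"9f2c", "1ab0"} {
		p, err := Scan(store, model, d)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: %.2f\n", d, p)
	}
}
```

Keeping the store behind an interface lets the same scoring code run against a real table in production and an in-memory map in tests.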
Our entire backend at Stairwell is written in Go, a language we have found exceptionally well suited to the job. One Bigtable feature that has made our lives easier is the `bttest` package in the Bigtable client library for Go (`cloud.google.com/go/bigtable/bttest`). It provides an in-memory Bigtable server, ideal both for local development and for our unit tests. This is in stark contrast to other databases, where setting up comparable test environments can be resource-intensive.
The future: Outsmarting tomorrow's challenges
Migrating to Bigtable helped us envision a future and a set of capabilities for Stairwell that would otherwise have been inconceivable. But Bigtable isn't just solving our current data needs; it's a forward-looking solution that equips us for the future of cybersecurity and lets us focus on what really matters: customer security. Its scalability ensures we're not just reacting to today's security landscape but preparing for the unpredictable challenges of tomorrow.
Learn more
Stairwell is one of many organizations using Bigtable to reach their goals. Check out the following posts to learn more about other organizations taking a similar approach in retail, FinTech, data quality, and social media.
- How Discord & Tamr built ML-driven applications with Bigtable
- Credit Karma serves 63 billion personalized recommendations a day with Bigtable
- Home Depot delivers personalized experiences with Bigtable
Get started with a Bigtable free trial today.