
Building hybrid blockchain/cloud applications with Ethereum and Google Cloud

June 13, 2019
Allen Day

Developer Advocate, Digital Assets at Google Cloud

Adoption of blockchain protocols and technologies can be accelerated by integrating them with modern internet resources and public cloud services. In this blog post, we describe a few applications of making internet-hosted data available inside an immutable public blockchain: specifically, making BigQuery data available on-chain using a Chainlink oracle smart contract. Possible applications are innumerable, but we've focused this post on a few that we think are of high and immediate utility: prediction marketplaces, futures contracts, and transaction privacy.

Hybrid cloud-blockchain applications
Blockchains focus on mathematical effort to create a shared consensus. Ideas quickly sprang up to extend this model to allow party-to-party agreements, i.e. contracts. This concept of smart contracts was first described in a 1997 article by computer scientist Nick Szabo. An early example of inscribing agreements into blocks was popularized by efforts such as Colored Coins on the Bitcoin blockchain.

Smart contracts are embedded into the source of truth of the blockchain, and are therefore effectively immutable after they’re a few blocks deep. This provides a mechanism to allow participants to commit crypto-economic resources to an agreement with a counterparty, and to trust that contract terms will be enforced automatically and without requiring third party execution or arbitration, if desired.

But none of this addresses a fundamental issue: where to get the variables with which the contract is evaluated. If the data are not derived from recently added on-chain data, a trusted source of external data is required. Such a source is called an oracle.

In previous work, we made public blockchain data freely available in BigQuery through the Google Cloud Public Datasets Program for eight different cryptocurrencies. In this article, we'll refer to that work as Google's crypto public datasets. You can find more details and samples of these datasets in the GCP Marketplace. This dataset resource has resulted in a number of GCP customers developing business processes based on automated analysis of the indexed blockchain data, such as SaaS profit sharing, mitigating service abuse by characterizing network participants, and using static analysis techniques to detect software vulnerabilities and malware. However, these applications share a common attribute: they're all using the crypto public datasets as an input to an off-chain business process.

In contrast, a business process implemented as a smart contract runs on-chain, and that is of limited utility without access to off-chain inputs. To close the loop and allow bidirectional interoperation, we need not only to make blockchain data programmatically available to cloud services, but also to make cloud services programmatically available on-chain to smart contracts.

Below, we'll demonstrate how a specific smart contract platform (Ethereum) can interoperate with our enterprise cloud data warehouse (BigQuery) via oracle middleware (Chainlink). This assembly of components allows a smart contract to take action based on data retrieved from an on-chain query to the internet-hosted data warehouse. Our examples generalize to a pattern of hybrid cloud-blockchain applications in which smart contracts can efficiently delegate to cloud resources to perform complex operations. We will explore other examples of this pattern in future blog posts.

How we built it
At a high level, Ethereum Dapps (i.e. smart contract applications) request data from Chainlink, which in turn retrieves data from a web service built with Google App Engine and BigQuery.

To retrieve data from BigQuery, a Dapp invokes the Chainlink oracle contract and includes payment for the parameterized request to be serviced (e.g. gas price at a specified point in time). One or more Chainlink nodes listen for these calls, and upon observing one, a node executes the requested job. External adapters are service-oriented modules that extend the capability of the Chainlink node to authenticated APIs, payment gateways, and external blockchains. In this case, the Chainlink node interacts with a purpose-built App Engine web service.
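For illustration, the off-chain side of kicking off such a request might look like the following sketch, which uses web3.py to send a transaction to a hypothetical Dapp consumer contract. The contract address, ABI, and the requestGasPriceAtBlock function are assumptions for this example only, not part of the reference integration.

# Minimal sketch (assumptions: a deployed consumer contract exposing
# requestGasPriceAtBlock, and a node with an unlocked account to send from).
from web3 import Web3

# Hypothetical ABI fragment for the consumer contract's request function.
CONSUMER_ABI = [{
    "name": "requestGasPriceAtBlock",
    "type": "function",
    "inputs": [{"name": "blockNumber", "type": "uint256"}],
    "outputs": [],
    "stateMutability": "nonpayable",
}]

w3 = Web3(Web3.HTTPProvider("https://<your-ethereum-node>"))
consumer = w3.eth.contract(
    address="0xYourConsumerContract",  # placeholder; use your deployed address
    abi=CONSUMER_ABI,
)

# Ask the contract to request the average gas price at a given block; the
# LINK payment to the oracle is handled inside the contract itself.
tx_hash = consumer.functions.requestGasPriceAtBlock(6000000).transact(
    {"from": w3.eth.accounts[0]}
)
receipt = w3.eth.wait_for_transaction_receipt(tx_hash)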

On GCP, we implemented a web service using the App Engine Standard Environment. We chose App Engine for its low cost, high scalability, and serverless deployment model. App Engine retrieves data from BigQuery, which hosts the public cryptocurrency datasets. The data we've made available are from canned queries, i.e. we aren't allowing arbitrary data to be requested from BigQuery, but only the results of parameterized queries. Specifically, an application can request the average gas price for either (A) a particular Ethereum block number, or (B) a particular calendar date.
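As a rough illustration of such a canned, parameterized query, the sketch below shows what a minimal Flask handler on App Engine could look like, reading from the public bigquery-public-data.crypto_ethereum.transactions table. The endpoint path and parameter names are assumptions for this example and may differ from the actual service.

# Simplified App Engine handler sketch: average gas price for one block.
from flask import Flask, jsonify, request
from google.cloud import bigquery

app = Flask(__name__)
client = bigquery.Client()

@app.route("/gas-price-by-block")
def gas_price_by_block():
    # Parameterized query only: callers choose the block number, nothing else.
    query = """
        SELECT AVG(gas_price) AS avg_gas_price
        FROM `bigquery-public-data.crypto_ethereum.transactions`
        WHERE block_number = @block_number
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(
                "block_number", "INT64", int(request.args["block"])
            )
        ]
    )
    row = list(client.query(query, job_config=job_config).result())[0]
    return jsonify({"avg_gas_price": row.avg_gas_price})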

After a successful response from the web service, the Chainlink node invokes the Chainlink oracle contract with the returned data, which in turn invokes the Dapp contract and thus triggers execution of downstream Dapp-specific business logic. This is depicted in the figure below.

[Figure: request and response flow between the Dapp contract, the Chainlink oracle contract and node, the App Engine web service, and BigQuery]
For details on integrating your Dapp, please see our documentation for requesting data from BigQuery via Chainlink. Illustrative queries to BigQuery can be seen for gas price by date and by block number.

How to use the BigQuery Chainlink oracle
In this section we'll describe how useful applications can be built using Google Cloud and Chainlink.

Use case 1: Prediction marketplaces
Participants in prediction marketplaces allocate capital to speculate on future events. One area of intense interest is which smart contract platform will predominate because, being network ecosystems, their value will follow a power-law (i.e. winner-take-all) distribution. There are many differing opinions about which platform will succeed, as well as how success can be quantified.

By using the crypto public datasets, it’s possible for even complex predictions like the recent $500,000 bet about Ethereum’s future state to be settled successfully on-chain. We've also documented how the variety, volume, recency, and frequency of Dapp utilization can be measured by retrieving 1-, 7-, and 30-day activity for a specific Dapp.

These metrics are known as daily-, weekly-, and monthly-active users (DAU, WAU, and MAU) and are frequently used by web and mobile app analytics professionals to assess website and app success.
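For example, a daily-active-users figure for a Dapp can be approximated directly against the crypto public datasets by counting distinct senders of transactions to the Dapp's contract address. The sketch below uses the BigQuery Python client with a placeholder contract address; swap the interval for 7 or 30 days to get weekly or monthly figures.

# Illustrative DAU query against the public Ethereum dataset.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT COUNT(DISTINCT from_address) AS daily_active_users
    FROM `bigquery-public-data.crypto_ethereum.transactions`
    WHERE to_address = @dapp_address
      AND block_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter(
            "dapp_address", "STRING", "0x<dapp-contract-address>"  # placeholder
        )
    ]
)
row = list(client.query(query, job_config=job_config).result())[0]
print(row.daily_active_users)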

Use case 2: Hedging against blockchain platform risk
The decentralized finance movement is rapidly gaining adoption due to its successful reinvention of the existing financial system in blockchain environments which, on a technical basis, are more trustworthy and transparent than current systems.

Financial contracts like futures and options were originally developed to enable enterprises to reduce (hedge) their risk related to resources critical to their operation. Similarly, data about on-chain activity, such as average gas prices, can be used to create simple financial instruments that provide payouts to their holders in cases where gas prices rise too high. Other qualities of a blockchain network, e.g. block times and/or miner centralization, create risks that Dapp developers want to protect themselves against. By bringing high quality data from the crypto public datasets to financial smart contracts, Dapp developers' risk exposure can be reduced. The net result is more innovation and accelerated blockchain adoption.

We've documented how an Ethereum smart contract can interact with the BigQuery oracle to retrieve gas price data at a particular point in time. We've also implemented a stub of a smart contract option showing how the oracle can be used to implement a collateralized contract on future gas prices, a cost that is critical to a Dapp's operation.
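To make the mechanics concrete, here is an illustrative off-chain Python sketch of the settlement logic such a collateralized gas-price option might encode. The function name, strike, and collateral amounts are assumptions for illustration and are not taken from the stub contract itself.

# Illustrative settlement logic for a collateralized gas-price option.
def settle_gas_price_option(reported_gas_price_wei: int,
                            strike_wei: int,
                            collateral_wei: int) -> dict:
    """Split the locked collateral between buyer and seller at expiry."""
    if reported_gas_price_wei > strike_wei:
        # Gas is expensive: the buyer's hedge pays off, collateral goes to them.
        return {"buyer": collateral_wei, "seller": 0}
    # Gas stayed cheap: the seller keeps the collateral.
    return {"buyer": 0, "seller": collateral_wei}

# Example: strike at 20 gwei, oracle reports 42 gwei for the settlement block.
print(settle_gas_price_option(42 * 10**9, 20 * 10**9, 10**18))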

Use Case 3: Enabling commit/reveals across Ethereum using submarine sends
A commonly mentioned limitation of Ethereum is its lack of transaction privacy, which lets adversaries exploit on-chain data leakage against users of commonly used smart contracts. This can take the form of front-running transactions that involve decentralized exchange (DEx) addresses. As described in To Sink Frontrunners, Send in the Submarines, the problem of front-running plagues all current DExs and slows the decentralized finance movement's progress, since exchanges are a key component of many DeFi products and applications.

By using the submarine sends approach, smart contract users can increase the privacy of their transactions and avoid adversaries that want to front-run them, making DExs more immediately useful. Though this approach is uniquely useful in stopping malicious behavior like front-running, it has its own limitations if implemented without an oracle.

Implementing submarine sends without an oracle produces blockchain bloat. Specifically, the Ethereum Virtual Machine only allows a contract to see the most recent 256 blocks of the chain, or approximately one hour of history. This limited scope reduces the practical usefulness of submarine sends because data older than that window must be rebroadcast and stored again on-chain. In contrast, implementing submarine sends with an oracle eliminates the bloat because the operating scope expands to include all historical chain data.
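A small sketch may help make the constraint concrete: the EVM's BLOCKHASH opcode only reaches the 256 most recent blocks, so a reveal arriving later must either rely on rebroadcast data or fall back to an oracle backed by historical chain data. The helper below is illustrative only; its names are not from any real implementation.

# Illustrative decision helper for the 256-block visibility window.
EVM_BLOCKHASH_WINDOW = 256  # blocks visible to a contract via BLOCKHASH

def reveal_strategy(commit_block: int, current_block: int) -> str:
    """Decide how a reveal can reference its earlier commit block."""
    age = current_block - commit_block
    if age <= EVM_BLOCKHASH_WINDOW:
        return "verify directly on-chain with BLOCKHASH"
    return "query the BigQuery oracle for the historical block data"

print(reveal_strategy(commit_block=8_000_000, current_block=8_000_100))  # direct
print(reveal_strategy(commit_block=8_000_000, current_block=8_005_000))  # oracle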

Conclusion
We've demonstrated how to use Chainlink services to provide data from the BigQuery crypto public datasets on-chain. This technique can be used to reduce inefficiencies (submarine sends use case) and, in some cases, add entirely new capabilities (hedging use case) to Ethereum smart contracts, enabling new on-chain business models to emerge (prediction markets use case).

The essence of our approach is to trade a small amount of latency and transaction overhead for a potentially large amount of economic utility. As a concrete example, ordinary submarine sends require on-chain storage that scales O(n) with blocks added to the blockchain, but that storage can be reduced to O(1) if the calling contract waits an extra two blocks to call the BigQuery oracle.

We anticipate that this interoperability technique will lead developers to create hybrid applications that take the best of what smart contract platforms and cloud platforms have to offer. We're particularly interested in bringing Google Cloud Platform's ML services (e.g. AutoML and Inference APIs) to smart contracts in the same way.

By allowing reference to on-chain data that is outside a contract's native scope, we improve the operational efficiency of the smart contract platform. In the case of submarine sends, storage consumption that scales O(n) with block height is reduced to O(1), at the cost of additional transaction latency to interact with an oracle contract.
