Announcing PSP's cryptographic hardware offload at scale is now open source
VP/GM Systems and Services Infrastructure
Soheil Hassas Yeganeh
Senior Staff Software Engineer, Google Cloud
Almost a decade ago, we started encrypting traffic between our data centers to help protect user privacy. Since then, we gradually rolled out changes to encrypt almost all data in transit. Our approach is described in our Encryption in Transit whitepaper. While this effort provided invaluable privacy and security benefits, software encryption came at significant cost: it took ~0.7% of Google's processing power to encrypt and decrypt RPCs, along with a corresponding amount of memory. Such costs spurred us to offload encryption to our network interface cards (NICs) using PSP (a recursive acronym for PSP Security Protocol), which we are open sourcing today.
Google’s production machines are shared among multiple tenants that have strict isolation requirements. Hence, we require per-connection encryption and authentication, similar to Transport Layer Security (TLS). At Google’s scale, the implication is that the cryptographic offload must support millions of live Transmission Control Protocol (TCP) connections and sustain 100,000 new connections per second at peak.
Before inventing a new offload-friendly protocol, we investigated existing industry standards: Transport Layer Security (TLS) and Internet Protocol Security (IPsec). While TLS meets our security requirements, it is not an offload-friendly solution because of the tight coupling between the connection state in the kernel and the offload state in hardware. TLS also does not support non-TCP transport protocols, such as UDP.
The IPsec protocol, on the other hand, is transport-independent and can be offloaded to hardware. However, a limitation of IPsec offload solutions is that they cannot economically support our scale, partly because they store the full encryption state in an associative hardware table with modest update rates. Assuming the size of an entry is 256B in either direction, transmit or receive, the total memory requirement for 10M connections is roughly 5GB (256B x 2 x 10M), which is well beyond the affordable capacity of commodity offload engines. Existing IPsec offload engines are designed to support encryption for a small number of site-to-site tunnels. Ultimately, we decided that IPsec does not meet our security requirements as it lacks support for keys per layer-4 connection.
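The sizing argument above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the figures quoted in the text; the function name `table_bytes` is made up for illustration.

```python
# Back-of-the-envelope sizing for a stateful offload table.
# All figures come from the paragraph above; the function name is hypothetical.
ENTRY_BYTES = 256          # assumed size of one entry, per direction
DIRECTIONS = 2             # transmit and receive
CONNECTIONS = 10_000_000   # 10M live connections


def table_bytes(connections: int,
                entry_bytes: int = ENTRY_BYTES,
                directions: int = DIRECTIONS) -> int:
    """Total on-device memory needed to hold full state for every connection."""
    return connections * entry_bytes * directions


total = table_bytes(CONNECTIONS)
print(f"{total / 1e9:.2f} GB")  # prints "5.12 GB"
```

The cost grows linearly with the number of connections, which is why a design that keeps per-connection state on the device does not fit this scale.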
To address these challenges, we developed PSP (a recursive acronym for PSP Security Protocol), a TLS-like protocol that is transport-independent, enables per-connection security, and is offload-friendly.
At Google, we employ all of these protocols depending on the use case. For example, we use TLS for our user-facing connections, we use IPsec for site-to-site encryption where we need interoperability with third-party appliances, and we use PSP for intra- and inter-data-center traffic.
PSP is intentionally designed to meet the requirements of large-scale data-center traffic. It does not mandate a specific key exchange protocol and offers only a few choices for the packet format and the cryptographic algorithms. It enables per-connection security by allowing an encryption key per layer-4 connection (such as a TCP connection). It supports stateless operation because the encryption state can be passed to the device in the packet descriptor when transmitting packets and can be derived when receiving packets using a Security Parameter Index (SPI) and an on-device master key. This enables us to maintain minimal state in the hardware, avoiding the hardware state explosion of typical stateful encryption technologies, which maintain large on-device tables.
PSP supports both stateful and stateless modes of operation: In the stateless mode, encryption keys are stored in the transmit packet descriptors and derived for received packets, using a master key stored on the device. In contrast, stateful technologies typically maintain the actual encryption keys in a table per connection.
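To make the stateless mode concrete, here is a minimal sketch of the idea: the receiving device holds a single master key and re-derives the per-connection key from the SPI carried in each packet, so it needs no per-connection table. The class and function names are hypothetical, the 32-bit SPI width is an assumption, and HMAC-SHA256 is used purely as a stand-in key-derivation function; the PSP Architecture Specification defines the actual derivation.

```python
import hashlib
import hmac
import os


def derive_key(master_key: bytes, spi: int) -> bytes:
    """Derive a per-connection key from the master key and an SPI.

    HMAC-SHA256 is a stand-in KDF for illustration only, and the
    4-byte SPI encoding is an assumption.
    """
    return hmac.new(master_key, spi.to_bytes(4, "big"), hashlib.sha256).digest()


class StatelessReceiver:
    """Hypothetical receive side: one master key, no per-connection table."""

    def __init__(self) -> None:
        self.master_key = os.urandom(32)
        self.next_spi = 1

    def allocate(self) -> tuple[int, bytes]:
        """Hand out an (SPI, key) pair to give to the peer during key exchange."""
        spi = self.next_spi
        self.next_spi += 1
        return spi, derive_key(self.master_key, spi)

    def key_for_packet(self, spi: int) -> bytes:
        """Re-derive the key on receipt, using only the SPI from the packet."""
        return derive_key(self.master_key, spi)


receiver = StatelessReceiver()
spi, key = receiver.allocate()

# The sender would carry `key` in its transmit packet descriptor.
# The receiver recovers the same key from the SPI alone.
assert receiver.key_for_packet(spi) == key
```

The only state the receiver keeps is the master key and a counter, regardless of how many connections are live.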
PSP uses User Datagram Protocol (UDP) encapsulation with a custom header and trailer. A PSP packet starts with the original IP header, followed by a UDP header on a prespecified destination port, followed by a PSP header containing the PSP information, followed by the original TCP/UDP packet (including header and payload), and ends with a PSP trailer that contains an Integrity Checksum Value (ICV). The layer-4 packet (header and payload) can be encrypted or authenticated, based on a user-provided offset called Crypt Offset. This field can be used to, for example, leave part of the TCP header authenticated yet unencrypted in transit while keeping the rest of the packet encrypted to support packet sampling and inspection in the network if necessary.
This is a critical visibility feature for us enabling proper attribution of traffic to applications, and is not feasible to achieve with IPsec. Of note, the UDP header is protected by the UDP checksum and the PSP header is always authenticated.
PSP packet format for encrypting a simple TCP/IP packet in the Linux TCP/IP stack.
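The layout described above, and the effect of Crypt Offset, can be sketched as follows. This is a simplified illustration, not the wire format: the header contents are placeholders, the type and function names are hypothetical, and the offset is expressed in bytes here as an assumption. The PSP Architecture Specification defines the real field encodings.

```python
import struct
from dataclasses import dataclass


@dataclass
class PspPacketLayout:
    """Simplified view of the regions of a PSP packet, in wire order."""
    ip_header: bytes
    udp_header: bytes
    psp_header: bytes      # always authenticated
    cleartext_l4: bytes    # authenticated but not encrypted
    encrypted_l4: bytes    # authenticated and encrypted
    icv: bytes             # trailer


def lay_out(ip_header: bytes, udp_header: bytes, psp_header: bytes,
            l4_packet: bytes, crypt_offset: int, icv: bytes) -> PspPacketLayout:
    """Split the original layer-4 packet at crypt_offset (bytes, an assumption)."""
    if not 0 <= crypt_offset <= len(l4_packet):
        raise ValueError("crypt_offset must fall within the layer-4 packet")
    return PspPacketLayout(
        ip_header=ip_header,
        udp_header=udp_header,
        psp_header=psp_header,
        cleartext_l4=l4_packet[:crypt_offset],
        encrypted_l4=l4_packet[crypt_offset:],
        icv=icv,
    )


# A TCP header begins with the 16-bit source and destination ports.
# The rest of the header is zero-filled here for brevity.
tcp_packet = struct.pack("!HH", 443, 51000) + bytes(16) + b"payload"

layout = lay_out(
    ip_header=bytes(20),    # placeholder
    udp_header=bytes(8),    # placeholder
    psp_header=bytes(16),   # placeholder
    l4_packet=tcp_packet,
    crypt_offset=4,         # leave only the TCP ports visible
    icv=bytes(16),          # placeholder
)

src_port, dst_port = struct.unpack("!HH", layout.cleartext_l4)
print(src_port, dst_port)  # prints "443 51000"
```

With an offset of 4, a device in the network could still read the port numbers for sampling and attribution, while the remainder of the TCP header and the payload stay encrypted.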
We support PSP in our production Linux kernel, Andromeda (our network virtualization stack), and Snap (our host networking system), enabling us to use PSP for both internal communication and for Cloud customers. As of 2022, PSP cryptographic offload saves 0.5% of Google's processing power.
Similar to any other cryptographic protocol, we need both ends of a connection to support PSP. This can be prohibitive in brownfield deployments with a mix of old and new (PSP-capable) NICs. We built a software implementation of PSP (SoftPSP) to allow PSP-capable NICs to communicate with older machines, dramatically increasing coverage among pairwise server connections.
PSP delivers multiplicative benefits when combined with zero-copy techniques. For example, the impact of TCP zero-copy for both sending and receiving was limited by extra reads and writes of the payloads for software encryption. Since PSP eliminates these extra loads and stores, RPC processing no longer requires touching the payload in the network stack. For large 1MB RPCs, for example, we see a 3x speed up from combining PSP and zero-copy.
PSP and zero-copy have a multiplicative impact, enabling us to send and receive RPCs without touching the payload. For large 1MB RPCs, using PSP alongside zero-copy increases the throughput of TCP channels by 3x.
We believe that PSP can provide a number of significant security benefits for the industry. Given its proven track record in our production environment, we hope that it can become a standard for scalable, secure communication across a wide range of settings and applications. To support this, we are making PSP open source to encourage broader adoption by the community and hardware implementation by additional NIC vendors. For further information, please refer to http://github.com/google/psp which includes:
The PSP Architecture Specification.
A reference software implementation.
A suite of test cases.
Acknowledgements: We are thankful to a large number of colleagues from Technical Infrastructure and Cloud who contributed to PSP since its inception, including but not limited to Platforms, Security, Kernel Networking, RPCs, Andromeda, and other Network Infrastructure teams.