Sidecar container SQL proxy connection refused

Problem

The Cloud SQL proxy runs as a sidecar container that mediates access to a Cloud SQL database from a Pod in a Google Kubernetes Engine (GKE) cluster. For some time after the Pod is created, connections to the SQL proxy fail with the error "connection refused" (errno 111, ECONNREFUSED).

After a while (how long depends on the deployment), a connection attempt succeeds.

The details of the error message vary with the programming language and connection library, but will likely include the words "connection refused", "ECONNREFUSED", or the error code 111. For example, in Python with the psycopg2 PostgreSQL DB adapter, the error message is:

OperationalError: could not connect to server: Connection refused
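Because the wording differs between libraries, client code that wants to retry only on this particular failure may need to recognize it in more than one form. The following is a minimal sketch, assuming the error surfaces either as a Python `OSError` carrying an errno (as with raw sockets) or as a driver exception whose message contains "connection refused" (as in the psycopg2 message above). The helper name `is_connection_refused` is hypothetical.

```python
import errno


def is_connection_refused(exc: BaseException) -> bool:
    """Return True if exc looks like a 'connection refused' failure.

    Hypothetical helper: checks the exception type and errno first
    (raw socket errors), then falls back to matching the message text
    (e.g. psycopg2's OperationalError, which carries no errno).
    """
    # Python raises ConnectionRefusedError (an OSError subclass) for ECONNREFUSED.
    if isinstance(exc, ConnectionRefusedError):
        return True
    # Custom OSError subclasses may carry the errno without being
    # mapped to ConnectionRefusedError.
    if isinstance(exc, OSError) and exc.errno == errno.ECONNREFUSED:
        return True
    # Fallback: case-insensitive match on the message text.
    return "connection refused" in str(exc).lower()
```

In workaround 2 below, a check like this could narrow the `except` clause so that unrelated errors, such as authentication failures, are not retried.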

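Environment

A Pod in a Google Kubernetes Engine cluster, with the Cloud SQL proxy running as a sidecar container next to the client container.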

Solution

Just keep trying until the connection succeeds. Once the first connection is established, the SQL proxy operates normally until the container shuts down.

This requires adapting the client code. The examples below use Python with the psycopg2 DB adapter.

Example workaround implementation 1

  1. Add a delay at the beginning of operations of client containers, for example:

         import time

         time.sleep(30)  # wait 30 seconds before the first connection attempt

  2. Determine the amount of time to wait from the maximum startup time of the SQL proxy, which can easily be measured with a few attempts or deduced from historical logs.
  3. This solution is simple to implement, but can still fail when the SQL proxy starts up more slowly than expected.
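As noted above, the delay can be sized from the slowest observed startup of the SQL proxy. The following is a minimal sketch of that arithmetic, assuming you have a list of measured startup durations in seconds (for example, gathered from historical logs). The function name `choose_startup_delay`, the 1.5× safety margin, and the 5-second floor are hypothetical choices, not values from any documentation.

```python
import math


def choose_startup_delay(startup_times, margin=1.5, minimum=5.0):
    """Pick a fixed startup delay (in seconds) for the client container.

    Hypothetical helper: takes the slowest observed SQL proxy startup,
    multiplies it by a safety margin, rounds up to a whole second, and
    never returns less than `minimum`.
    """
    if not startup_times:
        # No measurements yet: fall back to the minimum delay.
        return float(minimum)
    worst = max(startup_times)
    return max(float(minimum), float(math.ceil(worst * margin)))
```

For example, `time.sleep(choose_startup_delay([4.2, 11.8, 7.0]))` would wait 18 seconds. A margin only reduces the chance of failure: a startup slower than the estimate still causes a connection refused error.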

Example workaround implementation 2

  1. Test for a successful connection before proceeding with other operations in client containers, for example:

         from time import sleep, time

         import psycopg2

         MAX_WAIT = 600  # abort if no connection in 10 minutes
         waited = 0
         while waited < MAX_WAIT:
             tic = time()
             try:
                 conn = psycopg2.connect(dbname="test", user="postgres", password="secret")
                 break  # out of `while` loop
             except psycopg2.OperationalError:
                 sleep(1)  # 1 sec interval between probes
                 waited += (time() - tic)
         else:
             # The loop ran out of time without hitting `break`.
             raise RuntimeError("Timed out connecting to SQL proxy")
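The same loop can be factored into a reusable helper that is not tied to psycopg2. The following is a minimal sketch, assuming the caller passes a zero-argument `connect` function and the exception type(s) that mean "not ready yet". The name `connect_with_retry` and its parameters are hypothetical. It uses `time.monotonic()` for the deadline, so wall-clock adjustments do not distort the wait, and it counts time spent inside failed attempts as well as the sleeps.

```python
import time


def connect_with_retry(connect, retry_on, max_wait=600.0, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it succeeds or max_wait seconds have elapsed.

    Hypothetical helper. `retry_on` is an exception class (or tuple of
    classes) that signals "not ready yet"; any other exception propagates
    immediately. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + max_wait
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect()
        except retry_on as exc:
            if clock() >= deadline:
                raise RuntimeError(
                    f"Timed out connecting after {attempts} attempts") from exc
            sleep(interval)  # pause between probes
```

With psycopg2 it could be called as `connect_with_retry(lambda: psycopg2.connect(dbname="test", user="postgres", password="secret"), psycopg2.OperationalError)`. Note that `psycopg2.OperationalError` also covers failures such as bad credentials, which retrying will not fix, so a narrower check on the error may be preferable.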

Cause

The SQL proxy container takes some time to start up and establish the connection to the backend DB, and during this time it refuses connections from clients.
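Since the failure is that nothing is accepting connections on the proxy's address yet, readiness can also be checked at the TCP level before any DB driver is involved. The following is a minimal sketch, assuming the proxy listens on a TCP address inside the Pod such as 127.0.0.1:5432 (a common setup, but the address and port are assumptions; a proxy configured to use a Unix socket would need a different check). The name `wait_for_port` is hypothetical, and an accepted TCP connection only shows that the proxy is listening, not that every later query will succeed.

```python
import socket
import time


def wait_for_port(host, port, timeout=600.0, interval=1.0):
    """Block until a TCP connection to (host, port) succeeds.

    Hypothetical helper. Returns the number of seconds waited; raises
    TimeoutError if the port does not accept a connection in time.
    """
    start = time.monotonic()
    while True:
        try:
            # create_connection raises ConnectionRefusedError (errno 111 on
            # Linux) while nothing is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            if time.monotonic() - start >= timeout:
                raise TimeoutError(f"{host}:{port} not ready after {timeout}s")
            time.sleep(interval)  # pause between probes
```

For example, calling `wait_for_port("127.0.0.1", 5432)` at the start of the client container would block until the proxy accepts TCP connections, after which the normal DB connection can be opened.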

Because the SQL proxy runs as a sidecar container, solving this at the Google Kubernetes Engine level would require a notion of dependency among containers in the same Pod. This feature has been under discussion for some time, but there is no decision yet on whether or how to implement it.

The general philosophy in Kubernetes is that the system will eventually reach the specified state, so code should keep retrying until it succeeds. In other words: keep connecting until the SQL proxy is ready and accepts the connection; there is only a problem if the SQL proxy never becomes ready.