Fixing the “depends_on” crisis in .NET Core by implementing the Circuit Breaker Pattern for Docker-Compose and Postgres

With version 2 of the Docker Compose file format (2.1 and later), you could combine depends_on with a healthcheck so that a container would not start until the containers it depended on were healthy. This was incredibly useful for waiting until a database was ready to accept connections before trying to connect to it. It looked like the following:

version: '2.3'
services:
  stats-processor:
    build: ./
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:10.3-alpine
    restart: always
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 10s
      timeout: 5s
      retries: 5

In version 3 of the Compose file format, the condition form of depends_on was removed. The reasoning was that waiting on dependencies is the job of the application or of the orchestration software, not of docker-compose. I respectfully disagree (see note 1 below), but that’s neither here nor there.

To “fix” this for anyone using .NET Core to connect to Postgres, I’ve come up with the following, based on the Circuit Breaker Pattern. The pattern is described as:


The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all. 

Martin Fowler, “CIRCUITBREAKER”
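Fowler’s description is easier to see in code. Below is a minimal sketch of the pattern in C#, for illustration only: SimpleCircuitBreaker is a hypothetical class, not part of Npgsql or any other library, and it leaves out the timeout and “half-open” trial call that a full circuit breaker uses to reset itself. It counts consecutive failures and, once the threshold is reached, refuses further calls without running the protected function. The retryConnect method below borrows the threshold idea (it gives up after a fixed number of failures) rather than implementing this exact class.

```csharp
using System;

// Hypothetical minimal circuit breaker (illustration only, no reset/half-open state).
public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private int _consecutiveFailures;

    public SimpleCircuitBreaker(int failureThreshold)
    {
        if (failureThreshold < 1) throw new ArgumentOutOfRangeException(nameof(failureThreshold));
        _failureThreshold = failureThreshold;
    }

    // The breaker has "tripped" once the failure threshold is reached.
    public bool IsOpen => _consecutiveFailures >= _failureThreshold;

    public T Call<T>(Func<T> protectedCall)
    {
        if (IsOpen)
        {
            // Fail fast: return an error without making the protected call at all.
            throw new InvalidOperationException("Circuit is open; call not attempted.");
        }
        try
        {
            T result = protectedCall();
            _consecutiveFailures = 0; // a success resets the failure count
            return result;
        }
        catch
        {
            _consecutiveFailures++;   // count the failure, then let the caller see it
            throw;
        }
    }
}
```

For example, with a threshold of 2, a protected call that always throws runs twice; a third and fourth call throw InvalidOperationException without running it.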

It’s a very useful pattern and it is precisely the ‘right’ answer for the problem we’re facing: How do we ensure Postgres is ready to accept connections before we connect to it? Here’s the way I chose:

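using System;
using System.Data;          // ConnectionState
using System.Net.Sockets;   // SocketException
using System.Threading;     // Thread.Sleep
using Npgsql;               // NpgsqlConnection, NpgsqlException

// Lives in the console application's Program class and is called from Main.
private static bool retryConnect(int tryTimes, string connectionString)
{
    int times = tryTimes;
    using (NpgsqlConnection conn = new NpgsqlConnection(connectionString))
    {
        // Keep trying while we have tries left and the connection isn't open.
        while (times > 0 && conn.FullState != ConnectionState.Open)
        {
            try
            {
                if (conn.FullState == ConnectionState.Connecting)
                {
                    // An attempt is already in progress: wait, count it as a try, and re-check.
                    Console.WriteLine("Connecting...");
                    Thread.Sleep(5000);
                    times--;
                    continue;
                }
                if (conn.FullState != ConnectionState.Open)
                {
                    Console.WriteLine("Opening Connection...");
                    conn.Open(); // synchronous: returns once connected, or throws
                    Thread.Sleep(5000);
                }
                if (conn.FullState == ConnectionState.Open)
                {
                    Console.WriteLine("We have connected!");
                }
            }
            catch (SocketException ex)
            {
                // e.g. "Connection refused" while Postgres is still starting up
                Console.WriteLine("SocketException Exception: {0} ", ex);
                Thread.Sleep(5000);
                times--;
            }
            catch (NpgsqlException nex)
            {
                Console.WriteLine("NpgsqlException Exception: {0} ", nex);
                Thread.Sleep(5000);
                times--;
            }
        }
        if (conn.FullState == ConnectionState.Open)
        {
            Console.WriteLine("Connected!");
            conn.Close();
            return true;
        }
        return false;
    }
}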

NpgsqlConnection tracks the status of the connection as a small state machine, exposed through its FullState property, which uses the ConnectionState enumeration to report the state it’s in. We use that property to decide whether we need to keep trying.
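To make the state handling concrete, here is a sketch of the decision retryConnect makes on each pass of its loop, written against ConnectionState alone so it runs without a database. NextStep and Decide are hypothetical names of my own. One detail worth flagging: ConnectionState is a [Flags] enumeration, so the sketch tests individual bits instead of comparing with ==. retryConnect compares with ==, which should be fine because it never executes a command on the connection.

```csharp
using System.Data; // ConnectionState is part of .NET itself; no Npgsql needed

// Hypothetical labels for what the retry loop should do next.
public enum NextStep { Done, WaitForConnect, TryOpen }

public static class ConnectionStateDecisions
{
    public static NextStep Decide(ConnectionState state)
    {
        // ConnectionState is a [Flags] enum (e.g. Open | Executing), so test bits.
        if ((state & ConnectionState.Open) != 0)
        {
            return NextStep.Done;           // connected: stop looping
        }
        if ((state & ConnectionState.Connecting) != 0)
        {
            return NextStep.WaitForConnect; // an attempt is in flight: sleep, then re-check
        }
        return NextStep.TryOpen;            // Closed or Broken: this sketch treats both as "try again"
    }
}
```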
private static bool retryConnect(int tryTimes, string connectionString)

The method is static because it is called directly from the static Main method of a .NET Core console application.

int times = tryTimes;
using (NpgsqlConnection conn = new NpgsqlConnection(connectionString))

Assigning tryTimes to times (how many more times we should try to connect) isn’t strictly required: in C#, an int parameter is passed by value, so decrementing it inside this method wouldn’t affect the caller’s variable. I do it anyway, since nothing here depends on it either way.
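A quick way to convince yourself of the by-value behavior: the hypothetical CountDown method below decrements its int parameter to zero, yet the caller’s variable is untouched. (This holds for value types like int unless the parameter is declared ref or out.)

```csharp
public static class ByValueDemo
{
    // Decrementing a by-value int parameter changes only this method's copy.
    public static int CountDown(int tries)
    {
        int attempts = 0;
        while (tries > 0)
        {
            tries--;
            attempts++;
        }
        return attempts;
    }
}
```

Calling CountDown(tryTimes) with tryTimes = 3 returns 3, and tryTimes is still 3 afterwards.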
I wrap the connection in a using block to ensure it is cleaned up when I’m done with it. NpgsqlConnection implements IDisposable, and I don’t really know the internals of the class (and if it’s a good abstraction, I shouldn’t have to), so a using block is the safe choice.
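What the using block buys us is that Dispose() runs however the block is exited, including via an exception, because the compiler turns it into a try/finally. The sketch below shows this with a hypothetical FakeConnection class standing in for NpgsqlConnection, since the real class needs a database to open.

```csharp
using System;

// Hypothetical stand-in for NpgsqlConnection that only records disposal.
public sealed class FakeConnection : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

public static class UsingDemo
{
    // Leaves a using block by throwing, then returns the connection for inspection.
    public static FakeConnection LeaveViaException()
    {
        var conn = new FakeConnection();
        try
        {
            using (conn)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed for the demo; Dispose() already ran when the using block exited.
        }
        return conn;
    }
}
```

UsingDemo.LeaveViaException().Disposed is true, even though the block was left by a throw.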

The next part is a `while` loop that controls how many times we try:

while (times > 0 && conn.FullState != ConnectionState.Open)

Keep trying as long as we have tries left (3 in my example) and the connection hasn’t been opened. If it opens before we’ve used all three tries, stop; if three tries go by and it still hasn’t opened, give up. I used a `while` loop because the logic reads naturally in code: “While we haven’t connected and haven’t run out of tries, keep trying to connect.”
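Stripped of the Npgsql specifics, the control flow of that loop looks roughly like this. Retry.Until is a hypothetical helper, not from any library, and the caller-supplied tryConnect function stands in for “open the connection and report whether it worked”. As in retryConnect, each failed try costs one of the tryTimes and is followed by a pause; unlike retryConnect, this sketch skips the pause after the last failure, since nothing comes after it.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Hypothetical bounded retry: call tryConnect until it returns true
    // or tryTimes attempts have been used, sleeping between failures.
    public static bool Until(Func<bool> tryConnect, int tryTimes, TimeSpan delay)
    {
        int times = tryTimes;
        while (times > 0)
        {
            if (tryConnect())
            {
                return true;          // connected before running out of tries
            }
            times--;
            if (times > 0)
            {
                Thread.Sleep(delay);  // give the database time before the next attempt
            }
        }
        return false;                 // ran out of tries
    }
}
```

For example, Retry.Until(() => TryOpenOnce(connectionString), 3, TimeSpan.FromSeconds(5)), where TryOpenOnce is a hypothetical method that opens an NpgsqlConnection, returns true on success, and returns false when it catches a SocketException or NpgsqlException.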

The next three conditions handle the possible values of the FullState property.

if (conn.FullState == ConnectionState.Connecting) {
    Thread.Sleep(5000); times--; continue;
}
if (conn.FullState != ConnectionState.Open) {
    conn.Open(); Thread.Sleep(5000);
}
if (conn.FullState == ConnectionState.Open)
{
    break;
}

If a connection attempt is already in progress, give it 5 seconds to finish. (The right amount depends on how much your database has going on; raise or lower it to suit your particulars.) Without the sleep, the loop could use up its chances before the connection had time to open.
If the connection isn’t open, try to open it. (We’re betting that calling Open() on a connection that is already being opened is a no-op; this is an assumption and could be wrong.) Then sleep for 5 seconds. Because Open() is synchronous, it either returns with the connection open or throws, so this sleep is probably unnecessary.
Finally, if the connection is open, break out of the loop so the check after the loop can evaluate it. This check isn’t strictly needed, since the loop condition already stops once the connection is open; it was included in the original code to give me a place for a Console.WriteLine() and to debug through the console.

The next two blocks handle errors that could happen while trying to connect:

catch (SocketException ex)
{
    Console.WriteLine("SocketException Exception: {0} ", ex);
    Thread.Sleep(5000);
    times--;
}
catch (NpgsqlException nex)
{
    Console.WriteLine("NpgsqlException Exception: {0} ", nex);
    Thread.Sleep(5000);
    times--;
}

If you’re using NpgsqlConnection, you’re ultimately using an abstraction over a socket, which means you have to worry not only about things going wrong in Npgsql but also about underlying network exceptions. In my case, when the database wasn’t ready, the connection was refused, and, perhaps paradoxically, that surfaced as a SocketException rather than an NpgsqlException.
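Whether the refused connection arrives as a bare SocketException (as I saw) or wrapped inside another exception may depend on the Npgsql version; I haven’t checked which versions do which, so treat that as an assumption. If you want the retry logic to cope with both, one option is a small helper like the hypothetical IsSocketFailure below, which walks the InnerException chain looking for a SocketException.

```csharp
using System;
using System.Net.Sockets;

public static class TransientErrors
{
    // Hypothetical helper: true if ex, or any exception it wraps, is a SocketException.
    public static bool IsSocketFailure(Exception ex)
    {
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            if (current is SocketException)
            {
                return true;
            }
        }
        return false;
    }
}
```

Inside the loop you could then write catch (Exception ex) when (TransientErrors.IsSocketFailure(ex)) { ... } so that unrelated errors still propagate.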
In our case, if we receive an exception (and we expect to, at least on the first try), we reduce the number of remaining tries and do nothing for 5 seconds, hopefully giving the database time to become available. This is another setting you’d tweak for your environment: in some instances I’ve seen databases take a minute to become available when started from Docker, typically because of the work they do at startup or because the database’s volumes were removed before starting it up (see note 2).
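It’s worth doing the arithmetic before picking these numbers. A single retryConnect(3, connectionString) call sleeps through at most three failures of 5 seconds each, about 15 seconds, which is well short of the minute I’ve sometimes seen; covering a minute at 5 seconds per try takes about 60 / 5 = 12 tries. The hypothetical helper below does that calculation. It ignores the time each failed connection attempt itself takes, so it errs on the generous side.

```csharp
using System;

public static class RetryBudget
{
    // Hypothetical helper: tries needed so that tries * delay covers the expected startup time.
    public static int AttemptsFor(TimeSpan expectedStartup, TimeSpan delayBetweenTries)
    {
        if (delayBetweenTries <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(delayBetweenTries));
        }
        double ratio = expectedStartup.TotalMilliseconds / delayBetweenTries.TotalMilliseconds;
        return Math.Max(1, (int)Math.Ceiling(ratio)); // always allow at least one try
    }
}
```

RetryBudget.AttemptsFor(TimeSpan.FromMinutes(1), TimeSpan.FromSeconds(5)) returns 12.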

Finally, we handle the state we really care about; is this connection open?

if (conn.FullState == ConnectionState.Open)
{
    Console.WriteLine("Connected!");
    conn.Close();
    return true;
}
return false;

If the connection is open, our work is done: close the connection and return true (retryConnect returns a bool). Otherwise, return false, since we’ve exhausted our tries and could not connect. The connection would be closed anyway when the NpgsqlConnection is disposed at the end of the using block, but we’re being good citizens and closing it explicitly.

So that’s the code, explained in full. It’s very rough (I wrote it about an hour ago), but it works for my use case; I wouldn’t recommend copying it blindly into a production codebase without some testing. Using it is easy, as the while loop below shows. That loop exists to make sure we wait for the database before trying to do anything else. Incidentally (it’s turtles all the way down), this means the application will sit there until the database comes up, which is what I want here, since as an application it can’t do anything without the database. Your scenario (a web application, for example) may need a more robust answer.

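 bool succeeded = false;
 while (!succeeded)
 {
     // keep calling retryConnect (3 tries per call) until the database accepts a connection
     succeeded = retryConnect(3, connectionString);
 }
 //do something that requires database to be available here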
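If sitting there forever isn’t acceptable, one variation is to keep polling but give up after an overall deadline. The sketch below is my own illustration, not part of the original code: WaitFor.Condition is a hypothetical helper that uses Stopwatch to track elapsed time and returns false once the timeout has passed.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class WaitFor
{
    // Hypothetical helper: poll check() until it succeeds or the overall timeout elapses.
    public static bool Condition(Func<bool> check, TimeSpan timeout, TimeSpan pollInterval)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            if (check())
            {
                return true;               // the database is reachable
            }
            if (clock.Elapsed >= timeout)
            {
                return false;              // deadline passed: let the caller decide what to do
            }
            Thread.Sleep(pollInterval);
        }
    }
}
```

Used with the method above, WaitFor.Condition(() => retryConnect(3, connectionString), TimeSpan.FromMinutes(5), TimeSpan.Zero) would wait up to roughly five minutes (plus however long the last retryConnect call takes), letting the application log the failure and exit instead of hanging.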

Overall, I wish this were still an option in Docker Compose version 3 YAML files, but since it isn’t, we’re forced to solve it ourselves. Please sound off with any corrections you’d make to this code.

1: Not everyone’s use case is web-scale orchestration. Some people have simpler needs, and depends_on fulfills those needs, even if it doesn’t cover every way a container can fail. Put simply, removing it is akin to removing airbags from cars because people can get hurt by airbags, instead of recognizing that airbags have their uses even if they don’t cover everything. Maybe an airbag switch would have been a better feature than removing the airbags.

2: Yes, I know you shouldn’t really run a database in Docker; however, for development it’s frightfully useful. It keeps you from those niggling issues where each developer needs the right settings and the right software installed on their host to be productive (not to mention every developer having to repeat any system setting change on their own machine). By putting the database in Docker, you reduce what you need installed to be productive to just Docker, and that’s a net positive. If it’s not good for your use case, don’t do it.
