High Availability with CARP and HAProxy

Let me start by setting the scene. You have five mail servers providing a relatively large customer base with SMTP (relay to the world), POP3(S), IMAP(S) and webmail (HTTP(S)). Other servers, which are not part of the scene, receive mail from the world to the customers and do spam filtering. Mail from the customers to the world is also piped through this filtering – you can stop your facepalm – but it is also not part of the scene. At the moment, the five mail servers are loadbalanced using DNS round robin, which provides load sharing but no redundancy. The protocols are left in charge of the redundancy.

Since only SMTP provides redundancy (delivery is retried a number of times), the customers will have a bad experience with POP3, IMAP and webmail in the event of a server crash. If their MUA does not have an outgoing queue from which it retries delivery, they will also notice problems with SMTP and risk losing outgoing mail. Additionally, many customers do not hesitate to call the service desk if their newly written mail does not disappear from their MUA’s queue immediately after they have clicked send. This increases the service desk load, which we also want to avoid.

It is clear that the situation is not ideal. There is no redundancy. Both uncontrolled and controlled downtime (scheduled maintenance) will disturb the customers’ use of mail services. The servers run FreeBSD, so we will utilize CARP and HAProxy to add redundancy without adding extra mail servers next to the existing ones or adding loadbalancer servers in front of the mail servers. Originally, my mind was set on keepalived, as I have some experience with it on Linux, but it seems that CARP is the weapon of choice in the *BSD world. As a curiosity, the FreeBSD port of keepalived expired in November 2011. The idea behind the improved setup is that CARP lets us have a number of virtual IP addresses that float between the servers. HAProxy provides service monitoring on each server and directs connections to the other servers if the local services are down.

It is important for me to stress that the setup presented in this post is only one out of many setups that can achieve load sharing and redundancy. Its pros and cons must be compared to those of the other ways that could be chosen in a given situation. Before mentioning a few cons, I want to start with the big, obvious pro. In a typical CARP/keepalived and HAProxy setup, in my experience, you have two dedicated loadbalancer servers in active/passive mode, i.e. at any particular point in time, one server sits idle while all traffic flows through the other server. Failover to the idle server provides redundancy, and loadbalancing over a number of application servers provides load sharing. The single active server, however, is still a bottleneck. If your infrastructure provides 1 or even 10 Gbps connectivity, and your setup only handles e.g. 200 Mbps of traffic, this might not be a problem. Nonetheless, the dedicated loadbalancer servers are still at least two extra servers that can be avoided if the application servers themselves, in cooperation with the underlying network infrastructure, are able to perform load sharing and redundancy. The setup presented here enables the application servers to do so.

The cons are that the solution is a bit complex, the procedures for maintenance and scaling are a bit complex as well, and the chosen loadbalancer, HAProxy, lacks some useful features. I have only looked at version 1.4 of HAProxy, so the features might not be lacking in newer versions. In addition, I might have missed something. I operate a Riverbed Stingray Traffic Manager on a daily basis, which means that I am used to many possibilities for session persistence, extensive traffic scripting, built-in checks for many types of services, SSL offloading, etc. It would be nice to offload the SSL to HAProxy and to have built-in, deep checks for POP3 and IMAP. We have to do without these things.

The setup consists of six servers:

Hostname                Address         Virtual address 1  Virtual address 2
mail01.test.netic.dk    192.168.42.11   192.168.42.21      192.168.42.26
mail02.test.netic.dk    192.168.42.12   192.168.42.22      192.168.42.27
mail03.test.netic.dk    192.168.42.13   192.168.42.23      192.168.42.28
mail04.test.netic.dk    192.168.42.14   192.168.42.24      192.168.42.29
mail05.test.netic.dk    192.168.42.15   192.168.42.25      192.168.42.30
client01.test.netic.dk  192.168.42.100  -                  -

We have DNS round robin for the ten virtual addresses for the records {relay, mail, smtp, pop3, pop3s, imap, imaps, webmail}.test.netic.dk. Example:

[root@client01 ~]# dig mail.test.netic.dk a | grep ^mail | sort
mail.test.netic.dk.	900	IN	A	192.168.42.21
mail.test.netic.dk.	900	IN	A	192.168.42.22
mail.test.netic.dk.	900	IN	A	192.168.42.23
mail.test.netic.dk.	900	IN	A	192.168.42.24
mail.test.netic.dk.	900	IN	A	192.168.42.25
mail.test.netic.dk.	900	IN	A	192.168.42.26
mail.test.netic.dk.	900	IN	A	192.168.42.27
mail.test.netic.dk.	900	IN	A	192.168.42.28
mail.test.netic.dk.	900	IN	A	192.168.42.29
mail.test.netic.dk.	900	IN	A	192.168.42.30

It might be okay to use a greater TTL. Caching resolvers should cache the entire record set and answer clients in a round robin fashion, and we are not interested in changing the records frequently anyway.

The six servers are FreeBSD 9.1 amd64 guests in VirtualBox on my laptop (click on the images to view them unscaled):

20130615-high-availability-with-carp-and-haproxy-01

The terminal multiplexer tmux makes it easy to get an overview:

20130615-high-availability-with-carp-and-haproxy-02

In FreeBSD 9.1, CARP is available as a kernel module. I just added the line if_carp_load="YES" to /boot/loader.conf and rebooted. The device configuration takes place in /etc/rc.conf. The handbook has an example – I will post my configuration details at the end of the post.
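To give an idea of the rc.conf part, here is a minimal sketch for mail01. The vhid numbers, advskew values and password are illustrative assumptions; the real files are linked at the end of the post:

# /etc/rc.conf on mail01 (sketch): create the CARP devices and assign the
# virtual addresses. A low advskew makes mail01 the preferred master for its
# own two addresses; the devices that back up the other servers' addresses
# are configured the same way with a higher advskew.
cloned_interfaces="carp0 carp5"
ifconfig_carp0="vhid 1 advskew 0 pass secretpass 192.168.42.21/24"
ifconfig_carp5="vhid 6 advskew 0 pass secretpass 192.168.42.26/24"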

I started out with one virtual address per server rather than two. This is shown in the following two images:

20130615-high-availability-with-carp-and-haproxy-03   20130615-high-availability-with-carp-and-haproxy-04

The advskew values are chosen such that the server “to the right” takes over in case of a crash, i.e. mail02 takes over for mail01, mail03 for mail02, …, mail01 for mail05. Think of the five servers being placed in a ring. One virtual address per server, however, has the disadvantage that a lot of traffic forwarding work might be put on one server, which might cause a domino effect. The second image shows the uneven distribution after mail01, mail02 and mail03 have crashed. Their virtual addresses have moved to mail04, while mail05 still only has one address.

Rather than building some sort of service which continuously distributes the virtual addresses evenly between operational servers, I decided to upgrade to two virtual addresses per server. Both solutions are complex, but the one that I have chosen does not require an extra server or any scripting. In return it probably does not scale as well as the other solution.

The following six images show the resulting setup and test the most obvious failover scenarios. The two servers “on each side” of a server take over in the event of a crash. In this way, the traffic forwarding work is divided somewhat more evenly.

20130615-high-availability-with-carp-and-haproxy-05   20130615-high-availability-with-carp-and-haproxy-06   20130615-high-availability-with-carp-and-haproxy-07

20130615-high-availability-with-carp-and-haproxy-08   20130615-high-availability-with-carp-and-haproxy-09   20130615-high-availability-with-carp-and-haproxy-10
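To make the takeover order concrete, here is a sketch of the advskew ranking for one of the virtual addresses. The exact values are assumptions (the real files are at the end of the post); the point is that the owner has the lowest advskew, the two neighbours come next, and the owner's second address ranks the neighbours in the opposite order, so a crash splits the work between them:

# Sketch: advskew ranking for 192.168.42.21 (vhid 1), normally mastered by mail01.
#   mail01:  vhid 1 advskew 0     (owner)
#   mail02:  vhid 1 advskew 50    (first backup)
#   mail05:  vhid 1 advskew 100   (second backup)
#   mail03:  vhid 1 advskew 150
#   mail04:  vhid 1 advskew 200
# The other address owned by mail01, 192.168.42.26, would rank mail05 before
# mail02, so that a crash of mail01 sends one address to each neighbour.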

Note that a CARP interface starts out in the backup state when it is brought up. We want the virtual addresses to float back to their original server when that server becomes operational again. My solution, which is not thoroughly tested at this point, is a cronjob that runs at boot and every ten minutes and sets the state, e.g. on mail01:

[root@mail01 ~]# tail -n 2 /etc/crontab 
@reboot root /bin/sleep 30; /sbin/ifconfig carp0 state master; /sbin/ifconfig carp5 state master
*/10 * * * * root /sbin/ifconfig carp0 state master; /sbin/ifconfig carp5 state master

If you decide to create the aforementioned virtual address distribution service, these cronjobs become unnecessary (and even disruptive).

The following image shows that HAProxy has entered the scene and that Postfix answers on the virtual addresses:

20130615-high-availability-with-carp-and-haproxy-11

The haproxy.conf shown in the image has since been changed a bit. It turned out that checks every ten seconds, coming from five HAProxy instances, generate a lot of useless log lines. For now, I have chosen to check the backup servers/services only once every minute. (Correction: The uploaded haproxy.conf files at the bottom of this post still check all servers every ten seconds. It is left as an exercise for the reader to adjust this for his/her particular setup.)

HAProxy on a given server is configured such that it only forwards to services on the other servers if its own services are down. If one of its own services is down, it loadbalances in a round robin fashion over the corresponding service on the other servers. The idea is that we do not want to generate extra traffic between the servers if we do not have to. It should be the exception rather than the rule that a local service is down.
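A minimal sketch of what such a section could look like in HAProxy 1.4 syntax. The server names, ports and check intervals are illustrative and combine the local-first layout with the once-a-minute backup checks mentioned above; the uploaded haproxy.conf files are the authoritative version:

# SMTP section on mail01 (sketch). The local Postfix instance is preferred;
# the other servers are only used when the local check fails, and
# "allbackups" makes HAProxy round robin over all of them in that case.
listen smtp
    bind :25
    mode tcp
    balance roundrobin
    option allbackups
    server local  127.0.0.1:8025     check inter 10000
    server mail02 192.168.42.12:8025 check inter 60000 backup
    server mail03 192.168.42.13:8025 check inter 60000 backup
    server mail04 192.168.42.14:8025 check inter 60000 backup
    server mail05 192.168.42.15:8025 check inter 60000 backup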

The following two images show tests of CARP failover, from a client’s point of view:

20130615-high-availability-with-carp-and-haproxy-12   20130615-high-availability-with-carp-and-haproxy-13

The following two images show a HAProxy failover test and the HATop ncurses client, respectively:

20130615-high-availability-with-carp-and-haproxy-14   20130615-high-availability-with-carp-and-haproxy-15

Two things about the image to the right: a) HATop is useful for gathering statistics and for toggling service maintenance. b) The glimpse of the Postfix lines in /var/log/maillog reveals that the services no longer see the real source address of a client – they only see the source addresses of the different HAProxy instances.

The remark about source addresses is especially important. Many connections/requests from one source address might trigger an abuse detection/prevention mechanism in a service. Lookups in e.g. RBLs will no longer make sense. Most protection mechanisms must be migrated from the services to HAProxy. Finally, the HAProxy access log is needed to link internal and external addresses. Intelligent log collection tools like Splunk can be configured to do this, so it might not be a problem.
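For that correlation to work, HAProxy must log every connection. A minimal sketch of the relevant haproxy.conf parts; the syslog facility is an assumption, and the local syslogd must be willing to accept UDP messages on the loopback interface:

# Log each connection, including the client's source address and the chosen
# backend server, via the local syslog daemon.
global
    log 127.0.0.1 local0

defaults
    log     global
    option  tcplog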

The services only listen on 127.0.0.1 and 192.168.42.1[1-5], while HAProxy listens on all interfaces, i.e. also on the floating virtual addresses:

[root@mail01 ~]# netstat -an | grep LISTEN | sort
tcp4       0      0 *.110                  *.*                    LISTEN
tcp4       0      0 *.143                  *.*                    LISTEN
tcp4       0      0 *.25                   *.*                    LISTEN
tcp4       0      0 *.587                  *.*                    LISTEN
tcp4       0      0 10.0.3.15.22           *.*                    LISTEN
tcp4       0      0 127.0.0.1.22           *.*                    LISTEN
tcp4       0      0 127.0.0.1.8025         *.*                    LISTEN
tcp4       0      0 127.0.0.1.8110         *.*                    LISTEN
tcp4       0      0 127.0.0.1.8143         *.*                    LISTEN
tcp4       0      0 127.0.0.1.8587         *.*                    LISTEN
tcp4       0      0 192.168.42.11.22       *.*                    LISTEN
tcp4       0      0 192.168.42.11.8025     *.*                    LISTEN
tcp4       0      0 192.168.42.11.8110     *.*                    LISTEN
tcp4       0      0 192.168.42.11.8143     *.*                    LISTEN
tcp4       0      0 192.168.42.11.8587     *.*                    LISTEN
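Binding the real services like this is an ordinary per-service exercise. As a sketch, Postfix on mail01 could be told to listen on the alternative ports in master.cf roughly as follows (an assumption; the actual Postfix configuration is not shown here, and the POP3/IMAP daemon is bound analogously):

# master.cf (sketch): smtpd on alternative ports, bound to loopback and the
# fixed address only, leaving ports 25 and 587 on all addresses to HAProxy.
127.0.0.1:8025      inet  n  -  n  -  -  smtpd
192.168.42.11:8025  inet  n  -  n  -  -  smtpd
127.0.0.1:8587      inet  n  -  n  -  -  smtpd
192.168.42.11:8587  inet  n  -  n  -  -  smtpd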

I know I also mentioned the protocols POP3S, IMAPS and HTTP(S) (webmail) in the beginning, but these services are not yet installed, and the corresponding frontends and backends in HAProxy are disabled. In haproxy.conf, “balance source” is added to the HTTP(S) backends, as session persistence is needed when round robin loadbalancing over backup servers/services. As the current, non-redundant setup shows, persistence is not needed with pure DNS round robin, as a browser gets a single DNS reply from the operating system and remembers it for at least some time (15-30 minutes have been observed). This is yet another reason for keeping addresses alive rather than editing DNS records when performing scheduled maintenance or when working around a crashed server.
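A sketch of such a (currently disabled) webmail section, with hypothetical backend ports since the service is not installed yet:

# Webmail section (sketch). "balance source" pins a client to the same backup
# server while the local service is down, giving simple session persistence.
listen webmail
    bind :80
    mode http
    balance source
    option allbackups
    option httplog
    server local  127.0.0.1:8080     check inter 10000
    server mail02 192.168.42.12:8080 check inter 60000 backup
    server mail03 192.168.42.13:8080 check inter 60000 backup
    server mail04 192.168.42.14:8080 check inter 60000 backup
    server mail05 192.168.42.15:8080 check inter 60000 backup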

Let me end this saga with links to scripts and configuration files:

Easter egg on the occasion of Copenhell:

As I Lay Dying – An Ocean Between Us