
The wonder of HAProxy

If you have not heard of HAProxy, you're in for a treat.

HAProxy is a software load balancer: it accepts HTTP or TCP streams on frontends and proxies them to backend groups of servers. If you've worked with load balancers before, everything here will seem pretty intuitive. If you're new to load balancers, here's a brief overview:

load-balancer

In HAProxy, connections enter a frontend, flow through a set of rules, and are then matched by a use_backend rule. They arrive at the backend, flow through a new set of optional rules, and a server is selected. HAProxy then opens a connection to that backend server, and data is streamed between the client connection and the server connection until they are closed.

HAProxy is used extensively where I work. We stream thousands of connections per second through HAProxy every day. It's reliable, stable, fast, and highly scalable.

Sample Config

Let's look at a simple (and complete) HAProxy config (the one that runs this site, in fact!):

global
    # Logging config
    log /dev/log local0
    log /dev/log local1 notice

    # Basic process isolation
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000
    timeout client  5000
    timeout server  5000

    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend Allinterfaces
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/sni/
    
    redirect scheme https code 301 if !{ ssl_fc }
    
    acl xens_host hdr(Host) -i www.xens.net

    use_backend ghost if xens_host

backend ghost
    balance roundrobin
    option forwardfor

    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }

    server ghost 127.0.0.1:2368

(I should note that the above config is all that's required to run this site. I actually have about two dozen other backends and routing rules configured on this server.)

This configuration is very simple. The global and defaults blocks rarely need changes; the only tuning typically required there is the timeout settings, since some use cases need longer or shorter timeouts. The values above have no unit, so HAProxy reads them as milliseconds: each timeout is 5 seconds.
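
As a rough sketch of what that tuning might look like, here is a defaults block with explicit time units. The values are assumptions for illustration only, not what this site runs, and the timeout tunnel line only matters if you proxy long-lived upgraded connections such as WebSockets.

```haproxy
defaults
    # Assumed values for illustration; tune them to your own traffic.
    timeout connect 5s     # time allowed to establish the TCP connection to a server
    timeout client  60s    # maximum inactivity on the client side
    timeout server  60s    # maximum inactivity on the server side
    timeout tunnel  1h     # inactivity limit once a connection is upgraded (e.g. WebSocket)
```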

Frontends

The frontend block contains two bind lines:

  • one binds to HTTP port 80 on "all interfaces"
  • the other binds to HTTPS port 443, enables SSL, and uses Server Name Indication (SNI) to pick the correct certificate from a folder of certificates in chained-PEM format (which I'll cover making in a future article)

The frontend inherits all config from the defaults block. Any of these values can be overridden in the frontend, which makes it possible to customize pretty well anything on a per-frontend basis.
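
To make the override idea concrete, here is a minimal sketch in which the frontend sets its own client timeout. The 30s value is an assumption chosen purely for illustration.

```haproxy
frontend Allinterfaces
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/sni/

    # Overrides the 5000 ms "timeout client" inherited from defaults,
    # for this frontend only. 30s is an assumed value.
    timeout client 30s
```

Everything not restated here (mode, logging, error files, the other timeouts) still comes from defaults.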

The next line sets up a redirect. This blog is only available with SSL enabled, so HAProxy encounters the redirect and performs it without ever making a backend decision. (In fact, the redirect is handled by the frontend it is declared in. This will help identify these redirects in a future article about logstash, where we'll also go over tuning your configuration.) In effect, if the connection did not arrive over SSL, HAProxy returns a response with the supplied response code and a Location header made of "https://" followed by the value of the Host header and the original request URI.

The ACL checks that the HTTP Host: header matches the expected value. If you try to browse to https://xens.net/ you'll get a 503 error (no backend server is available)[1]. If we wanted to correct this, we could do a number of different things:
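
For comparison, the same redirect can also be written as an http-request rule, which some people find easier to read next to other http-request lines. This is only an alternative spelling, not what the config above uses. With either form, a plain-HTTP request for /about with Host: www.xens.net gets a 301 with Location: https://www.xens.net/about.

```haproxy
# Same behaviour as "redirect scheme https code 301 if !{ ssl_fc }",
# written as an http-request rule.
http-request redirect scheme https code 301 unless { ssl_fc }
```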

  • We could add the base domain name to the ACL. The ACL would become:
    • acl xens_host hdr(Host) -i www.xens.net xens.net
    • This would match either domain name and route the request to the backend. It would be up to the backend to handle the additional name.
  • We could perform a redirect at HAProxy:
    • Add an ACL to define the condition:
      • acl redirect_hosts hdr(host) -i xens.net
    • Add a redirect to match the condition:
      • http-request redirect code 301 location http://www.%[hdr(host)]%[capture.req.uri] if redirect_hosts
  • We could also modify the 'errorfile 503' to contain a redirect using a Location header (though this is not recommended: any time the backend is down for maintenance, a nice error page should be displayed without the browser acting strange!)
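
A variation on the redirect option, sketched here as an assumption rather than as what this site actually runs, is to hard-code the target with redirect prefix instead of rebuilding it from the Host header. HAProxy appends the original path and query string to the prefix, so https://xens.net/about would be sent to https://www.xens.net/about.

```haproxy
frontend Allinterfaces
    # Match the bare domain only
    acl redirect_hosts hdr(host) -i xens.net

    # "prefix" appends the original path and query string to the given prefix
    http-request redirect prefix https://www.xens.net code 301 if redirect_hosts
```

The trade-off is one rule per domain, in exchange for never reflecting a client-supplied header into the Location.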

It should be noted that the ACL checking the Host: header depends on some very 'black-magic' handling of the Host header, as described in RFC 2616, Section 3.2.2. If you send requests to the server on a non-standard port (i.e. not 80/443), the ACL condition will actually fail, because the client then includes the port in the header, and the value (when using the hdr() function) must match exactly. One could blindly accept any Host header beginning with the prefix (using acl xens_host hdr_beg(Host) -i www.xens.net), but that has a whole separate set of security risks associated with it (www.xens.net.example.com would match too). Better would be to explicitly list all of the host:port combinations that may be required in a single ACL: acl xens_host hdr(Host) -i www.xens.net www.xens.net:80 www.xens.net:443 www.xens.net:8080 (etc.)
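
Another way to sidestep the port problem, assuming your HAProxy version provides the field converter, is to strip the port before comparing. Treat this as a sketch: it would mangle a bracketed IPv6 literal in the Host header, which does not matter for a name-based site like this one.

```haproxy
# field(1,:) keeps only the part of the Host header before the first ":",
# so "www.xens.net" and "www.xens.net:8080" both compare as "www.xens.net".
acl xens_host hdr(host),field(1,:) -i www.xens.net
```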

The last line uses the ACL to perform a match, and selects the backend if the condition matches. If no use_backend condition matches, an optional default_backend [backend_name] is used; without one, HAProxy responds with a 503 error, "no server is available to handle the request".
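
A minimal sketch of that fallback is shown below. Pointing default_backend at ghost is an assumption for illustration; on a server with many sites you might prefer a dedicated catch-all backend.

```haproxy
frontend Allinterfaces
    acl xens_host hdr(Host) -i www.xens.net

    use_backend ghost if xens_host

    # Used only when no use_backend rule above matched.
    # Pointing it at "ghost" is an assumption for illustration.
    default_backend ghost
```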

It should be noted that in this simple case the ACL could be anonymous. You've actually already seen an anonymous ACL in the HTTPS redirect. We could remove the ACL definition and use the following use_backend:

use_backend ghost if { hdr(Host) -i www.xens.net }

Named ACLs can be reused elsewhere in the same block, but they are not available in the backend. (You can redeclare them in the backend, but they are not shared in any way!) Anonymous ACLs, on the other hand, have no name, so they cannot be referenced again after their one use. Note that the spaces just inside the braces are required.

Backends

Our backend, like the frontend preceding it, inherits all config from the defaults block. Again, these values can be overridden as required.

balance roundrobin makes no functional difference in this case. If we had multiple backend servers, it would ensure that traffic was distributed across them in turn (in proportion to the weight of each). As we only have the one backend server, it does nothing.

option forwardfor adds the "X-Forwarded-For:" HTTP header to the request that is about to be sent to the backend server, and populates it with the IP address of the incoming connection. This is useful if your backend application makes use of client IP addresses: all connections will appear to come directly from the load balancer's IP, and the X-Forwarded-For: header will contain the actual IP address of the client.

The next two http-request lines add headers to the HTTP request, not the HTTP response. These headers are sent to the backend with values from the incoming connection. They can be extremely useful if (for example) you had multiple frontends that all routed to the same backend, or you wanted to pass some other static value down to the application as a header. Here we're just telling the application what port and protocol the load balancer received. The call to the backend is always made over plain HTTP, and without these headers the application will think it's running in HTTP mode (and will perform its own redirects to the https:// address of the site).

The last line defines the friendly name of our server, followed by the ipaddress:port to make the backend connection to. If we wanted to, we could also add a health check to ensure the site is working, updated, etc. before sending connections through.
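
To show where roundrobin starts to matter, here is a sketch with a hypothetical second Ghost instance. The second server, its port, and the weights are all assumptions.

```haproxy
backend ghost
    balance roundrobin

    # Hypothetical second instance on port 2369.
    # With weights 2 and 1, ghost1 receives roughly two of every three requests.
    server ghost1 127.0.0.1:2368 weight 2
    server ghost2 127.0.0.1:2369 weight 1
```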
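
A basic health check could look something like the sketch below. The check path and the intervals are assumptions; a real check would probably target a page that proves the application is actually healthy.

```haproxy
backend ghost
    # Send "GET /" as the health check; by default 2xx and 3xx responses count as healthy
    option httpchk GET /

    # "check" enables checking: probe every 2s, mark down after 3 failures, up after 2 successes
    server ghost 127.0.0.1:2368 check inter 2s fall 3 rise 2
```

With only one server in the backend, a failed check means HAProxy answers with the 503 error file instead of passing requests to a broken application.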

What this all means

HAProxy (in the above case) provides a layer of abstraction between the application and how traffic connects to it. Want to run multiple blogs on the same IP address all using SSL?

  • Add a new backend for each, updating the port to match
  • Add a new ACL with the new domain name
  • Get a certificate for the new domain and copy it to the SNI path (in PEM format)
  • Add a use_backend matching the new ACL + new backend
  • Reload HAProxy, and you're done.
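
Put together, those steps might look like the following sketch. The second domain (blog.example.org), the backend name ghost2, and port 2369 are all hypothetical.

```haproxy
frontend Allinterfaces
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/sni/

    redirect scheme https code 301 if !{ ssl_fc }

    acl xens_host  hdr(Host) -i www.xens.net
    acl blog2_host hdr(Host) -i blog.example.org    # hypothetical second domain

    use_backend ghost  if xens_host
    use_backend ghost2 if blog2_host

backend ghost2
    option forwardfor

    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }

    server ghost2 127.0.0.1:2369    # hypothetical port for the second blog
```

Before reloading, it is worth validating the file with haproxy -c -f followed by the path to your config (commonly /etc/haproxy/haproxy.cfg, though that path is an assumption here).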

Want to move your site to a new server with zero downtime, regardless of DNS propagation?

  • Set the new server up (deploy your application / etc)
  • Set up an SSH tunnel between the two servers (to ensure requests are encrypted!)
  • Alter your HAProxy config to use the new server as the backend
  • Reload HAProxy and watch traffic flow.
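
As a sketch of the HAProxy side of that move, assume the tunnel was opened with something like ssh -N -L 12368:127.0.0.1:2368 user@new-server, so the new server's Ghost instance appears locally on port 12368. The port, the user, and the host name are all assumptions, and keeping the old instance as a backup is my own addition rather than part of the steps above.

```haproxy
backend ghost
    balance roundrobin
    option forwardfor

    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }

    # Local end of the SSH tunnel to the new server (port 12368 is an assumption)
    server ghost_new 127.0.0.1:12368 check

    # Old local instance, used only while the tunnel endpoint is failing its check
    server ghost 127.0.0.1:2368 check backup
```

Once DNS points at the new server and traffic through the old one dries up, the old HAProxy and the tunnel can be retired.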

Want to route a small percentage of traffic to a new application once users have received a cookie confirming they want to try a new experience?

  • Set up stick-tables (I'll cover this in the future)
  • Add an ACL to check for the presence + value of the cookie
  • Add a new backend pointing to your new application
  • use_backend new_backend if new_acl_matches
  • Reload HAProxy
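
Leaving stick-tables aside, the cookie check and routing could be sketched as below. The cookie name beta, its value 1, the backend ghost_beta, and port 2370 are all hypothetical, and how the cookie gets handed out to a small percentage of users is not covered here.

```haproxy
frontend Allinterfaces
    acl xens_host hdr(Host) -i www.xens.net

    # True when the request carries a cookie named "beta" with the value "1"
    acl wants_beta req.cook(beta) -m str 1

    # Order matters: the first matching use_backend wins
    use_backend ghost_beta if xens_host wants_beta
    use_backend ghost if xens_host

backend ghost_beta
    server ghost_beta 127.0.0.1:2370
```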

There are literally endless possibilities with HAProxy. It's become one of the most valuable tools in my arsenal. I hope you give it a try!

Future content

I love HAProxy. I'll be covering some of the advanced usage of balancing traffic at scale in the future.

If you have HAProxy questions, first and foremost, read the manual. If that doesn't answer your questions, please leave them below or email me. I'd be happy to write future articles about the challenges people face and explore the solutions!


  1. I should point out that I got tired of this not working for this and other sites, and have since 'fixed' it using the second method.