Jack Moore

Email: jack(at)jmoore53.com

A bit of Nginx Load Balancing

23 Jan 2022 » web, html, nginx, configuration, docker, load balancing

Similar to the last post, I wanted to look further into NGINX features, including load balancing and serving requests from servers in the same geographic region as the client. In this post I look at routing requests by location and IP with GeoIP2, and at nginx upstream servers for load balancing. I also added the echo module to nginx and explored other nginx logging formats.

Containers and Docker Networking

After committing the image I created in the last post (nginx:geocity), I used that same image in this post to create multiple containers to serve requests. I moved the image to a different server so I wouldn't accidentally delete any production containers. To copy the image, I used the following:

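# On the original host: export the image to a tar archive and copy it over
docker save -o /opt/nginx/nginx-geocity nginx:geocity
scp /opt/nginx/nginx-geocity jack@remoteserver:/opt/nginx/nginx-geocity

# On remoteserver: import the archive as the nginx:geocity image
docker load -i /opt/nginx/nginx-geocity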

On the new host, it all starts with a network to connect the Docker containers. For this simple POC, I created all the containers on the same host and used Docker networking to link them together. Below is the creation of the containers and network:

# Create the geocity container that proxies all requests (detached, host port 8081)
docker run --name nginx_geocity_1 -p 8081:80 -d nginx:geocity

# Create a user-defined bridge network (containers on it can reach each other by name)
docker network create --driver bridge nginx-net

# Connect the already running container to the network
docker network connect nginx-net nginx_geocity_1

# Create the regional containers directly on the network
docker run --name nginx_region_1 --network nginx-net -d nginx:geocity
docker run --name nginx_region_2 --network nginx-net -d nginx:geocity
docker run --name nginx_region_3 --network nginx-net -d nginx:geocity

At this point there are four containers running: one main nginx server that processes incoming requests (nginx_geocity_1) and three “regional” servers that serve them (nginx_region_[1,2,3]).

Because the default page simply displays “Welcome to Nginx”, I needed a way to tell which regional server I was connecting to. Again, because this is a POC, I edited each regional container's html/index.html to say “Welcome to Region $regional_server_id”, so regional server 1 displayed “Welcome to Region 1”, regional server 2 displayed “Welcome to Region 2”, and so on. Simple, but necessary.

Nginx Configuration

With this set up, I modified the nginx.conf file in the nginx_geocity_1 container to look like the following, so that requests are served based on region:

http {
    geoip2 /opt/GeoLite2/GeoLite2-City.mmdb {
        auto_reload 60m;
        $geoip2_metadata_city_build metadata build_epoch;
        $geoip2_data_city_name city names en;
    }
    geoip2 /opt/GeoLite2/GeoLite2-Country.mmdb {
        auto_reload 60m;
        $geoip2_metadata_country_build metadata build_epoch;
        $geoip2_data_country_code country iso_code;
        $geoip2_data_country_name country names en;
        $geoip2_data_continent_code continent code;
    }

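    map $geoip2_data_continent_code $nearest_server {
        default all;
        EU eu;
        NA na;
        AS as;   # needs an "upstream as" block; none is defined in default.conf yet
        AF af;   # needs an "upstream af" block; none is defined in default.conf yet
    }

    # geo $geo {
    #     default all;
    #     172.17.0.0/24 eu;
    #     127.0.0.1/32 all;
    #     10.0.4.0/24 na;
    # }

    # ... rest of the http block (e.g. include /etc/nginx/conf.d/*.conf;)
}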

Note this is very similar to the last post. Not much has changed, except that I added a geo $geo block (commented out for now; it is used later for testing). The map block turns the continent code from GeoIP2 into a $nearest_server value, and the geo block maps client IP ranges to the same region names. Either variable tells nginx which upstream group to forward a request to: eu goes to the European servers, na goes to the North American servers, and all is the default.
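To make the map-to-upstream step concrete, here is a rough Python model of what `proxy_pass http://$nearest_server;` ends up doing with the map block above and the upstream blocks in default.conf. It is only a sketch for reasoning about the routing, not how nginx is implemented; the dictionaries just copy the values from this post. One thing it makes visible: AS and AF map to `as` and `af`, but no upstream blocks have those names. When a variable in proxy_pass does not name an upstream group, nginx tries to resolve it as a hostname instead, which fails unless a `resolver` is configured.

```python
from typing import Dict, List, Optional

# Values copied from the map block in nginx.conf.
NEAREST_SERVER_MAP: Dict[str, str] = {"EU": "eu", "NA": "na", "AS": "as", "AF": "af"}
MAP_DEFAULT = "all"

# Values copied from the upstream blocks in default.conf.
UPSTREAMS: Dict[str, List[str]] = {
    "all": ["nginx_region_1"],
    "eu": ["nginx_region_2"],
    "na": ["nginx_region_3"],
}


def nearest_server(continent_code: str) -> str:
    """Like the map block: look the code up, fall back to the default."""
    return NEAREST_SERVER_MAP.get(continent_code, MAP_DEFAULT)


def proxy_target(continent_code: str) -> Optional[List[str]]:
    """Servers behind proxy_pass http://$nearest_server, or None when the
    mapped name has no upstream block (nginx would fall back to DNS)."""
    return UPSTREAMS.get(nearest_server(continent_code))


if __name__ == "__main__":
    for code in ["EU", "NA", "", "OC", "AS"]:
        print(repr(code), "->", nearest_server(code), proxy_target(code))
```

An empty code (no GeoIP data) and an unlisted continent such as OC both land on `all`, while AS produces a name with nothing behind it.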

Now the default.conf site has been modified to look like the following:

server {

    listen       80;
    listen  [::]:80;
    server_name  localhost;

    location /echo {
        echo $geo;
    }

    location / {
        proxy_pass http://$nearest_server;
    }
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }
}

upstream all {
        server nginx_region_1;
}

upstream eu {
        server nginx_region_2;
}

upstream na {
        server nginx_region_3;
}

Now every incoming request is mapped to a continent code by the GeoIP2 module and then proxied to the geographically closest server. (I could, and really should, have redirected the client using nginx's rewrite/return directives, the nginx counterpart to Apache's mod_rewrite, for better performance instead of proxying every request through this one server, but I will look into this later.)

Where I ran into problems

The above looks great. Everything appears to work, except that it all runs on one server. It is hard to test geo codes from localhost or from an internal network: loopback and private addresses have no entry in the GeoLite2 databases, so every such request gets an empty continent code, falls through to the map default, and ends up at the same proxy_pass destination.
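The reason local testing always lands on the same upstream can be sketched in a few lines of Python. `ipaddress`'s `is_global` is used here as a rough stand-in for "has a GeoIP record" (an assumption: a public address can still be missing from GeoLite2), and `geoip_lookup` is a hypothetical callable standing in for the real database lookup. Loopback, Docker bridge, and LAN addresses all come back with an empty continent code and fall through to the map's default.

```python
import ipaddress
from typing import Callable

# Simplified copy of the map block: unknown or empty codes fall back to "all".
CONTINENT_TO_UPSTREAM = {"EU": "eu", "NA": "na", "AS": "as", "AF": "af"}


def has_geoip_location(addr: str) -> bool:
    """Rough approximation: only globally routable addresses can have a
    GeoIP record; loopback and private ranges never do."""
    return ipaddress.ip_address(addr).is_global


def nearest_server_for(addr: str, geoip_lookup: Callable[[str], str]) -> str:
    """Return the upstream name the map block would pick for this client.
    geoip_lookup is a hypothetical stand-in for the GeoLite2 lookup."""
    code = geoip_lookup(addr) if has_geoip_location(addr) else ""
    return CONTINENT_TO_UPSTREAM.get(code, "all")


if __name__ == "__main__":
    def fake_lookup(addr: str) -> str:
        return "NA"  # pretend every public IP is in North America

    for client in ["127.0.0.1", "172.18.0.5", "10.0.4.17", "8.8.8.8"]:
        print(client, "->", nearest_server_for(client, fake_lookup))
```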

I tried a few different things, including adding the echo module and setting the real client IP for the proxy, but these didn't work. I even considered IP spoofing, but stayed away from it to avoid going off on a tangent into networking issues. My first thought was to add more logging, so I started by logging the upstream requests sent to the containers that were serving them.

I added the following to my nginx.conf configuration within the http block to keep the regular logging, as well as log the requests sent to the upstream servers:

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    log_format upstreamlog '$server_name to: $upstream_addr {$request} '
                           'upstream_response_time $upstream_response_time'
                           ' request_time $request_time';

    access_log  /var/log/nginx/access.log  main;
    access_log  /var/log/nginx/access.log  upstreamlog;
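Because both formats write into the same access.log, the upstreamlog lines are mixed in with the regular ones. A small Python sketch like the one below can pull out just the upstreamlog lines and count requests per upstream address, which is also a quick way to check how requests are being spread across servers. The regular expression is written against the upstreamlog format above and is an assumption about how those lines render; anything that does not match, including the `main` format lines, is skipped.

```python
import re
from collections import Counter
from typing import Iterable

# '$server_name to: $upstream_addr {$request} '
# 'upstream_response_time $upstream_response_time request_time $request_time'
UPSTREAM_LINE = re.compile(
    r"^(?P<server_name>\S+) to: (?P<upstream_addr>.+?) "
    r"\{(?P<request>.*)\} "
    r"upstream_response_time (?P<upstream_time>\S+) "
    r"request_time (?P<request_time>\S+)$"
)


def count_upstreams(lines: Iterable[str]) -> Counter:
    """Count requests per $upstream_addr, ignoring non-upstreamlog lines.
    If nginx retried another server, $upstream_addr holds a comma-separated
    list and is counted as one key."""
    counts: Counter = Counter()
    for line in lines:
        match = UPSTREAM_LINE.match(line.rstrip("\n"))
        if match:
            counts[match.group("upstream_addr")] += 1
    return counts


if __name__ == "__main__":
    with open("/var/log/nginx/access.log") as log:
        for addr, hits in count_upstreams(log).most_common():
            print(addr, hits)
```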

Although the extra logging was helpful, it did not solve my issue; it only showed that requests were being processed quickly.

Next I added the echo module to nginx, which is why there is a /echo location: it simply returns the value of $geo for the request. (Note: I had to recompile nginx with the echo module because it is a third-party module and is not built into NGINX by default.)

This is where NGINX's built-in geo module also came into play. With a few quick modifications I was able to test connections to the geocity server from the network I'm currently on, from the local machine, and from the Docker network, and see where requests were being sent. I uncommented the geo block shown above and changed proxy_pass to use $geo instead of $nearest_server.

Now when I curled the server from different places, I got different responses back, showing which value was being used:

# Curl request from my local machine to the server:
curl remote.ip.addr.here:8081/echo
# response: na

# Curl request from within the container
curl localhost/echo
# response: all

# Curl request from a different docker container on the same subnet
curl nginx_geocity_1/echo
# response: eu
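The geo block behaves like a longest-prefix lookup on the client address, which is why each curl above returned a different value. Here is a Python sketch of that lookup using the same three entries as the commented-out geo block; it models the behaviour for reasoning and testing, not nginx's actual implementation, and the client addresses in the usage lines are illustrative. An address outside every listed network gets the default.

```python
import ipaddress
from typing import Dict

# Same entries as the commented-out geo block in nginx.conf.
GEO_TABLE: Dict[str, str] = {
    "172.17.0.0/24": "eu",
    "127.0.0.1/32": "all",
    "10.0.4.0/24": "na",
}


def geo_lookup(addr: str, table: Dict[str, str] = GEO_TABLE, default: str = "all") -> str:
    """Return the value for the most specific network containing addr
    (longest prefix wins), or the default if no network matches."""
    ip = ipaddress.ip_address(addr)
    best_len, best_value = -1, default
    for cidr, value in table.items():
        net = ipaddress.ip_network(cidr)
        # "in" is simply False when IPv4/IPv6 versions differ.
        if ip in net and net.prefixlen > best_len:
            best_len, best_value = net.prefixlen, value
    return best_value


if __name__ == "__main__":
    for client in ["127.0.0.1", "10.0.4.25", "172.17.0.2", "8.8.8.8"]:
        print(client, "->", geo_lookup(client))
```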

After this, I checked that the /index.html page returned “Welcome to Region $regional_server_id”, confirming that I was in fact getting a response from the correct upstream server. With that verified, everything looked good. I considered this a good POC and figured the continent codes would match, but it would likely need testing on multiple servers before being moved into production.

Adding more servers to the upstream

NGINX upstream blocks support load balancing across multiple servers: you simply add more server entries to the block. Several load balancing methods are available, including round robin (the default when no method is specified), least_conn (fewest active connections), ip_hash, and generic hash.
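As a rough illustration of the difference between the default round robin and ip_hash, here is a Python sketch. nginx's ip_hash keys on the first three octets of an IPv4 client address (or the whole IPv6 address), so all clients in one /24 stick to the same server. The `zlib.crc32` hash below is a stand-in chosen for the sketch, not nginx's own hash function, and server weights are ignored.

```python
import ipaddress
import itertools
import zlib
from typing import Iterator, List

SERVERS = ["nginx_region_1", "nginx_region_2", "nginx_region_3"]


def round_robin(servers: List[str]) -> Iterator[str]:
    """Unweighted round robin: hand out servers in a fixed rotation."""
    return itertools.cycle(servers)


def ip_hash_pick(client_ip: str, servers: List[str]) -> str:
    """ip_hash-style choice: hash the first three octets of an IPv4 address
    (or the whole IPv6 address) so a client's /24 always maps to one server.
    zlib.crc32 is an assumption standing in for nginx's real hash."""
    ip = ipaddress.ip_address(client_ip)
    key = ip.packed[:3] if ip.version == 4 else ip.packed
    return servers[zlib.crc32(key) % len(servers)]


if __name__ == "__main__":
    rr = round_robin(SERVERS)
    print([next(rr) for _ in range(4)])
    print(ip_hash_pick("203.0.113.10", SERVERS), ip_hash_pick("203.0.113.200", SERVERS))
```

Round robin spreads consecutive requests evenly, while ip_hash trades that evenness for stickiness, which matters if a regional server keeps per-client state.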

I tested this out by adding all the servers to one upstream and using least_conn. The upstream block looked like the following:

upstream na {
        least_conn;
        server nginx_region_1;
        server nginx_region_2;
        server nginx_region_3;
}

After refreshing my browser a few times, I could see that all three regional servers were serving requests.
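Seeing all three servers with least_conn is expected when the servers are idle. According to the nginx documentation, when several servers share the lowest number of active connections, least_conn tries them in turn using weighted round robin, and a handful of quick browser refreshes leaves every server at zero connections. The Python class below is a simplified, unweighted model of that behaviour; `Peer` and `LeastConn` are hypothetical names, and the caller is responsible for updating `active`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    name: str
    active: int = 0  # currently open connections to this server


class LeastConn:
    """Simplified least_conn: pick the server with the fewest active
    connections and rotate through servers that are tied."""

    def __init__(self, names: List[str]):
        self.peers = [Peer(n) for n in names]
        self._next = 0  # rotation offset used only for tie-breaking

    def pick(self) -> Peer:
        fewest = min(p.active for p in self.peers)
        n = len(self.peers)
        # Scan from the rotation offset so tied servers are used in turn.
        for i in range(n):
            peer = self.peers[(self._next + i) % n]
            if peer.active == fewest:
                self._next = (self._next + i + 1) % n
                return peer
        raise RuntimeError("unreachable: min() guarantees a match")


if __name__ == "__main__":
    lb = LeastConn(["nginx_region_1", "nginx_region_2", "nginx_region_3"])
    print([lb.pick().name for _ in range(3)])  # idle servers: each is used in turn
```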

From Here

From here I want to look into IP spoofing for testing, including the scapy package and iptables. I also want to look into Docker overlay networks for networking across Docker hosts (although I'd like to do this without Docker Swarm or k8s for now).

Also, as noted above, this is only an OK solution for georouting, and it is far from perfect. Realistically, the best georouting would start at DNS. From there, if someone in Europe still managed to hit the US version of the site at “example.com”, it may be better to redirect them (with rewrite/return in nginx, or mod_rewrite in Apache) to the “.eu” version of the site on their first request to serve them better. I may look into these types of solutions in the future.

© Jack Moore