High Availability on FreeBSD: The service IP address, part 2

In this second part of the article, we configure the service IP address.

Configuring the service IP address

As mentioned in the first part, we will use the “carp” kernel module, which implements the protocol of the same name, to assign one IP address to two nodes, so that if one of them fails, the other keeps answering requests.

The configuration consists of an extra line in the /etc/rc.conf file of each node.

For the first node, we will run the following:

For the second, we will run:

Let’s examine this in a generic way:

The first line indicates the use of an IP alias. This means that the “em0” NIC will have an additional IP address besides the one already configured as its primary IP (192.0.2.5/24 in our case for node 1, and 192.0.2.6/24 for node 2). Everything between the quotes is a parameter of this alias.

The second line selects the “inet” address family (IPv4), which is also the default when none is specified. The other option is “inet6”, for the IPv6 protocol.

The third line sets the virtual host identifier (“vhid”). A given service IP address must use the same value on every node where it is configured, and each additional service address needs its own value. The allowed values are from 1 to 255.

The fourth line sets the password that the nodes participating in a “vhid” use to authenticate to each other. This password does not enable any encryption.

In the fifth line, the IP address that we want as the service IP address is established.

Finally, and only on the secondary node, the value of “advskew” is set to 100. This value delays the node’s CARP announcements, which lowers its precedence. It is useful when we want a particular node to be preferred as primary, or when there are multiple secondary nodes.
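As a sketch of what the lines described above might look like, here is the form used in the FreeBSD Handbook. The vhid value 1, the password “examplepass” and the /32 prefix are assumptions made for illustration, not values taken from the original configuration, and the parameter order may differ from the one followed in the walkthrough above.

```shell
# /etc/rc.conf on node 1 (the preferred primary).
# Assumed placeholders: vhid 1, password "examplepass", /32 prefix.
ifconfig_em0_alias0="inet vhid 1 pass examplepass alias 192.0.2.100/32"

# /etc/rc.conf on node 2 (the secondary): same vhid and password,
# plus advskew 100 so that it announces itself later than node 1.
ifconfig_em0_alias0="inet vhid 1 advskew 100 pass examplepass alias 192.0.2.100/32"
```

Each assignment belongs in the /etc/rc.conf file of its own node; they are shown together here only for comparison.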

Now the change must be applied on each node.

For the first node we will execute:

And, in the second:
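One possible way to apply the alias to a running interface without rebooting is to pass the same parameters to “ifconfig” directly. The helper below is a hypothetical wrapper written for this sketch, and the vhid, password and prefix are the same assumed placeholders as before; it also assumes the “carp” module is already loaded.

```shell
# Hypothetical helper: adds the CARP alias to a running interface.
# Usage: carp_add_alias IFACE VHID PASSWORD ADDRESS/PREFIX [ADVSKEW]
carp_add_alias() {
    # advskew defaults to 0, meaning no extra announcement delay.
    ifconfig "$1" inet vhid "$2" advskew "${5:-0}" pass "$3" alias "$4"
}

# Node 1:  carp_add_alias em0 1 examplepass 192.0.2.100/32
# Node 2:  carp_add_alias em0 1 examplepass 192.0.2.100/32 100
```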

We can verify the correct functioning in several ways.

Using the “ifconfig em0” command in each node:

If we observe the last line, in the first server it shows “CARP: MASTER” and, in the second, “CARP: BACKUP”.
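To check the role from a script instead of reading the output by eye, a small filter over the “ifconfig” output may be enough. This is a sketch that assumes the status line starts with “carp:” (in any letter case) followed by the state; the function name is hypothetical.

```shell
# Hypothetical helper: prints the CARP state (MASTER or BACKUP) of an interface.
carp_state() {
    # Match the status line case-insensitively and print its second field.
    ifconfig "$1" | awk 'tolower($1) == "carp:" { print $2 }'
}

# Usage: carp_state em0
```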

Another option, where we will also see more information such as the choice of the “MASTER” node, state transitions, etc., is the /var/log/messages file of each node:

Verifying functionality

With the configuration applied and the nodes in their master and backup roles, it is time to run the tests needed to verify that the behavior is appropriate.

We’ll use the “ping” command from some system on the same 192.0.2.0/24 subnet to check the availability of the service IP address:
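A plain “ping 192.0.2.100” is enough for this check. For the failover tests that follow, a timestamped loop can make the length of an outage easier to read; the function below is a hypothetical sketch, not the command used to produce the results in this article.

```shell
# Hypothetical helper: pings an address once per second and logs up/DOWN.
# Usage: watch_service_ip ADDRESS [NUMBER_OF_PROBES]
watch_service_ip() {
    addr=$1
    count=${2:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        if ping -c 1 "$addr" >/dev/null 2>&1; then
            echo "$(date +%T) $addr up"
        else
            echo "$(date +%T) $addr DOWN"
        fi
        i=$((i + 1))
        sleep 1
    done
}

# Usage: watch_service_ip 192.0.2.100 60
```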

The first real test verifies what happens when the master node becomes unavailable because of a reboot or power failure; the second simulates a network failure.

In case of active node reboot

With a session open on both nodes, we reboot the master and watch what happens on the second node:

The backup node becomes the master and the service IP address is assigned to it:

The service IP address remains accessible from other systems on the network. The following output shows what a client perceived during the process, from the moment the first node failed until the secondary took control and started providing the service:

As can be seen, the virtual IP address was unavailable for about 3 seconds.

In case of physical network failure

It is possible to simulate the loss of a network interface with the following command:
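A common way to do this is to bring the interface down administratively with “ifconfig”. The two wrappers below are hypothetical names used for this sketch. Note that taking down “em0” also cuts any SSH session running over it, so this is best done from a console.

```shell
# Hypothetical helpers: take an interface down and bring it back up.
simulate_link_failure() {
    ifconfig "$1" down
}

restore_link() {
    ifconfig "$1" up
}

# On the current master:  simulate_link_failure em0
# To undo it afterwards:  restore_link em0
```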

We can verify that the slave node detects this fault in the master:

As in the previous case, the second server becomes MASTER:

The current service status is like the following image:


If we also disable the network interface on the second node:

With no interface enabled on either node, the service is no longer accessible.

Forcing a node as primary

We may want a particular node to always be the primary node.

For this task, we can use an automatic configuration that consists of adding a line to the /etc/sysctl.conf file of the node that we want to be primary:

If we do not want to restart the node at this time, we will activate the change in the following way:
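The CARP setting that usually serves this purpose is “net.inet.carp.preempt”; that this is the variable meant here is an assumption. The sketch below persists it and activates it in one step. The function name is hypothetical, and the optional argument only exists so the helper can be tried against a scratch file.

```shell
# Hypothetical helper: enables CARP preemption now and across reboots.
# Usage: carp_enable_preempt [SYSCTL_CONF_FILE]
carp_enable_preempt() {
    conf=${1:-/etc/sysctl.conf}
    # Persist the setting, unless a line for it is already present.
    grep -q '^net\.inet\.carp\.preempt=' "$conf" 2>/dev/null ||
        echo 'net.inet.carp.preempt=1' >> "$conf"
    # Activate it immediately, without restarting the node.
    sysctl net.inet.carp.preempt=1
}
```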

We can also temporarily set a node as primary using the following command in the current MASTER node:
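On FreeBSD 10 and later, “ifconfig” accepts a “state” keyword for a given vhid, so demoting the current MASTER lets the other node take over. The wrapper and the vhid value 1 below are assumptions for illustration.

```shell
# Hypothetical helper: forces the CARP state of a vhid on an interface.
# Usage: carp_set_state IFACE VHID master|backup
carp_set_state() {
    case $3 in
        master|backup)
            ifconfig "$1" vhid "$2" state "$3"
            ;;
        *)
            echo "state must be master or backup" >&2
            return 1
            ;;
    esac
}

# On the current MASTER:  carp_set_state em0 1 backup
```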

Final remarks

For simplicity, throughout these two articles on the service IP address we used a single physical network interface both for managing the server and for providing the service.

It is advisable to use multiple physical network interfaces, one for each task. Ideally, the physical network interface “em0” provides the service, while another network interface, “em1”, is used to administer the server and to carry the CARP traffic through which both nodes exchange their status.

Similarly, for simplicity, we used only one connection for each task. In production environments, where high availability is essential, each task should have a second network link, using “link aggregation” across separate network cards; I will post about it in the future.

High Availability on Linux: The service IP address, part 1

In the previous post we saw what high availability is and what we aim for when a consumer tries to access a service: an availability time as close as possible to 100%.

This entry describes what the service IP address is (the entry point to a service) and how to set it up using two servers.

To implement what is described below, we need two physical or virtual servers with a Linux distribution installed on them.

To avoid excessively long entries, I have split this one into two parts. This first part is an introduction and covers the preparation of the systems, while the second shows how to set up the service IP address itself.

The service IP address

When we access a service, the connection is made to an IP address, either directly (192.0.2.100) or through a host name (www.example.com).

Suppose we want to access a web page (http://www.example.com) and that its associated IP address is “192.0.2.100”. This IP address, through which the page is accessible, is called the “service IP address”.

The objective is to have this IP address always available. For consumers the perception should be like the following image:

To achieve this we need at least two servers: one has the service IP address assigned, and the other waits, ready to take it over if the first one fails.

If server “server-1” stopped working, server “server-2” would take over the service IP address. The service would be degraded, with one component in a failed state, but still operational and accessible for the consumer.

If no server is available, the service cannot be provided, because the service IP address cannot be assigned to any of them.

The consumer can’t access the service. Their perception will vary:

The method that allows a server to take over a service IP address previously assigned to another server that has become unavailable is called “IP failover”.

Several standard protocols implement IP failover; VRRP is the best known of them and the one we will use.

Required setup

Before we start we must have the necessary material prepared.

We will use two servers called “server-1” and “server-2”. I have used Debian 8, but the configuration is practically identical on other distributions such as CentOS, except for the path of the network configuration files and the software installation command.

Both servers are connected to a switch through their “eth0” network interface and belong to the 192.0.2.0/24 subnet. The switch is connected to a router whose IP address is 192.0.2.1 and which acts as the gateway.

The following diagram shows the equipment interconnection:

We will note down the details of both servers for later reference:

System preparation

On Linux there are several implementations of the VRRP protocol. In these articles we are going to use “keepalived” for its extra functionality, but another valid option would be “vrrpd”.

The following steps must be carried out on both servers.

Depending on the chosen distribution, we can install “keepalived” with “apt” or “yum”:
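A minimal sketch of the installation step, assuming the package is simply named “keepalived” in both package repositories and that the commands run as root. The function name is hypothetical.

```shell
# Hypothetical helper: installs keepalived with whichever package
# manager is available (apt-get on Debian-like systems, yum on CentOS-like).
install_keepalived() {
    if command -v apt-get >/dev/null 2>&1; then
        apt-get install -y keepalived
    elif command -v yum >/dev/null 2>&1; then
        yum install -y keepalived
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}
```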

We also need to load and configure the “dummy” kernel module on the server. It provides a virtual network interface where the service IP address will be assigned.

To load it and keep it loaded across server reboots, we need to run:

To configure it and keep that configuration across server reboots, we need to run:
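On Debian, these two steps could look like the sketch below. The file locations (/etc/modules and /etc/modprobe.d/dummy.conf) and the “numdummies=1” option are assumptions based on common Debian practice, and other distributions may use different paths. The function name and its optional root-prefix argument are hypothetical, the latter only there so the helper can be tried in a scratch directory.

```shell
# Hypothetical helper: loads the "dummy" module and makes it persistent.
# Usage: setup_dummy_module [ROOT_PREFIX]
setup_dummy_module() {
    root=${1:-}
    # Load the module right now.
    modprobe dummy
    # Load it on every boot (Debian reads /etc/modules), avoiding duplicates.
    grep -qx 'dummy' "$root/etc/modules" 2>/dev/null ||
        echo 'dummy' >> "$root/etc/modules"
    # Options used whenever the module is loaded: create one interface.
    echo 'options dummy numdummies=1' > "$root/etc/modprobe.d/dummy.conf"
}
```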

Once loaded, we will verify that the “dummy0” interface is available:

We’ll leave the interface ready to use:
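With the “ip” tool from iproute2, the check and the activation can be sketched as follows. The function name is hypothetical, and the commands are one common way to do this rather than necessarily the ones originally used.

```shell
# Hypothetical helper: checks that dummy0 exists, then brings it up.
prepare_dummy0() {
    if ! ip link show dummy0 >/dev/null 2>&1; then
        echo "dummy0 not found; is the dummy module loaded?" >&2
        return 1
    fi
    ip link set dummy0 up
}
```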

We need one last change to the operating system: we should allow processes (programs running in the background) to listen for network requests on an IP address that doesn’t belong to the server.

Since the service IP address is assigned to only one server at a time, the other server doesn’t have it and therefore won’t let programs listen on it. This means that every time the service IP address moves, someone would have to log in to the server that now holds it and start the necessary programs, which can then bind to the address successfully.

This would limit us too much and would not meet the “high availability” premise: the service IP address moves from one server to the other, but the programs won’t run without manual (human) intervention. So we will configure the system to allow processes to listen on any IP address, even one that doesn’t belong to the server.
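The kernel variable that controls this is “net.ipv4.ip_nonlocal_bind”, which is checked later in this entry. One way to set it now and on every boot is sketched below. The function name is hypothetical, and the optional file argument only exists so the helper can be tried against a scratch file.

```shell
# Hypothetical helper: allows binding to non-local IPv4 addresses,
# both immediately and across reboots.
# Usage: enable_nonlocal_bind [SYSCTL_CONF_FILE]
enable_nonlocal_bind() {
    conf=${1:-/etc/sysctl.conf}
    # Persist the setting, unless a line for it is already present.
    grep -q '^net\.ipv4\.ip_nonlocal_bind' "$conf" 2>/dev/null ||
        echo 'net.ipv4.ip_nonlocal_bind = 1' >> "$conf"
    # Apply it to the running kernel.
    sysctl -w net.ipv4.ip_nonlocal_bind=1
}
```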

At this point we will verify the following:

– Keepalived software is installed:

– The “dummy” module is loaded and configured:

– The “dummy0” interface exists and is ready to listen:

– The system has the net.ipv4.ip_nonlocal_bind variable set to 1:

Once everything has been checked, we can start configuring the service IP address, but that will be in the next entry of this series.

High availability: Introduction

This entry is the first of several I will write about high availability systems. It introduces the basic concepts needed to understand what high availability is and what it is for. The articles will focus on high availability in software.

The entry is meant both for newcomers and non-technical people who manage computer services, and for those already familiar with the subject who want to refresh their concepts or correct me wherever they see fit.

As the entries progress, they will become more technical and require deeper knowledge. Using a terminal on a Linux distribution, installing software and editing files will become essential requirements.

Along the way, some services, such as web servers or database servers, will be configured for high availability, so that the theoretical concepts can be turned into something tangible that can be implemented.

After this introduction, let’s start.

A computer services world

Today’s computing cannot be conceived without so-called “services”. A service is a mix of software and hardware that runs 100% of the time, is permanently connected to a computer network, and whose mission is to transform and transmit information.

If any of that software or hardware isn’t working, the service cannot be used and we cannot get or transform the information we want.

A service can be very simple, such as a single program on a single server (or home PC), or very complex, made up of many programs and pieces of hardware. Very popular sites like Google or Facebook run many programs on thousands of servers and other gear to provide their search, maps, photography, social network and other services.

When we talk about a service, we mean everything (software, hardware, etc.) that it needs in order to be used.

Examples of software used to provide services are “Apache HTTPD Server”, which serves web pages with a single program, and “Postfix”, which serves email and uses multiple programs to do so. Both can run on a single server or home PC.

What is availability?

Before we can answer what high availability is, we must know what availability is.

In computing, this term means that an existing service is accessible. A typical service is a web server, which stores pages that are accessed from web browsers, like this blog.

When a user tries to visit an existing web site through a web browser (having typed the address correctly) and it doesn’t load, we say the site is “unavailable”: we cannot view it.

Therefore, “availability” is what allows an existing service to be accessed and used without inconvenience.

Availability time

Availability can be measured as a yearly percentage, counted in minutes, from 0% (0 minutes) to 100% (525600 minutes in a 365-day year). Each 1% corresponds to 5256 minutes, or 87.6 hours, or 3.65 days.
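The arithmetic can be checked with a short calculation. The sketch below assumes a 365-day year (365 × 24 × 60 = 525600 minutes); the function name is hypothetical.

```shell
# Minutes in a 365-day year: 365 * 24 * 60.
MINUTES_PER_YEAR=525600

# Hypothetical helper: prints the yearly downtime, in minutes,
# that a given availability percentage allows.
downtime_minutes() {
    awk -v pct="$1" -v total="$MINUTES_PER_YEAR" \
        'BEGIN { printf "%.1f\n", total * (100 - pct) / 100 }'
}

downtime_minutes 99      # prints 5256.0 (1% of the year, or 87.6 hours)
downtime_minutes 99.9    # prints 525.6 (about 8.8 hours)
```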

This percentage is called “availability time” or “(service) uptime”, and the time the service has not been available is called “downtime”.

Some services calculate it from the first day of the year; others calculate it over annual periods starting on the activation day.

What is high availability?

We can define high availability as the technique used to provide uninterrupted service, allowing one or more elements of the service (software, servers, network gear, etc.) to fail without affecting its operation, or at least without the end user noticing.

High availability is achieved through redundancy of the elements of the service: network gear (switches, routers), servers and server components (two or more hard disks, two or more ECC RAM modules, two or more processors, two or more power supplies connected to two or more electrical sockets on two or more power strips, each fed by an independent electrical supplier), etc. Most importantly, the programs that provide the service must be ready to operate across all that redundant infrastructure.

There is also an article dedicated to high availability in Wikipedia.

What is high availability for?

From a user’s point of view, thanks to high availability we can access a service and use it at any time.

For a professional or a business, high availability translates into a better professional image. What happens if our users cannot access our website? They cannot see it. And what happens if we are a hosting provider and leave many users with their websites unavailable? Many users who have placed their trust in our business will see their websites inaccessible for a period of time, with the harm that causes them, and the harm it causes the business if it must compensate them financially for the unavailability.

It can be worse: let’s imagine that a bank service responsible for moving money between entities stops working; or the service that handles employee payroll; or the service that exchanges information between medical practices.

High availability allows services to be always available. It lets users enjoy them, and lets a professional or business ensure their services can be enjoyed for as much of the time as possible, as close to 100% as they can get.