In every work environment we deal with, certain servers must always be up and running for the business to keep functioning smoothly. These servers provide services that always need to be available, whether a database, DHCP, DNS, file, Web, firewall or mail server.
A cornerstone of any service that must stay up with no downtime is the ability to transfer the service gracefully from one system to another. On Linux, the magic that makes this happen is a service called Heartbeat, the main product of the High-Availability Linux Project.
Heartbeat is very flexible and powerful. In this article, we test only a basic two-member active/passive cluster, in which the active server provides the services and the passive server waits to take over if necessary.
In the past, high-availability (HA) clusters were expensive and usually required proprietary hardware and software. Today, with Heartbeat, users can build a cost-effective, high-availability environment for their business-critical applications.
The heartbeat mechanism is used to monitor the availability and health of cluster nodes. The availability of multiple heartbeat paths reduces the chance of losing communication between nodes. In general, if a heartbeat is not received along any channel after a predefined amount of time (typically a few heartbeat intervals), the remaining cluster nodes assume the silent node is dead.
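The detection rule above can be sketched as a small Bash function. This is a toy model, not Heartbeat's own code: the hypothetical peer_state takes the time (in seconds) at which the last heartbeat arrived, the current time, and two thresholds that correspond to the warntime and deadtime parameters described later for ha.cf, and reports whether the peer looks alive, late or dead.

```shell
#!/bin/bash
# Toy model of heartbeat-based failure detection (not Heartbeat's own code).
# peer_state LAST_SEEN NOW WARNTIME DEADTIME   (all values in seconds)
peer_state() {
  local last_seen=$1 now=$2 warntime=$3 deadtime=$4
  local silence=$(( now - last_seen ))   # seconds since the last heartbeat
  if (( silence >= deadtime )); then
    echo dead      # the surviving node would now take over the resources
  elif (( silence >= warntime )); then
    echo late      # only a "late heartbeat" warning would be logged
  else
    echo alive
  fi
}
```

For example, peer_state 100 115 10 30 prints late and peer_state 100 130 10 30 prints dead (the warning threshold of 10 seconds is an example value). With heartbeats every 2 seconds and a 30-second deadtime, a peer is declared dead only after about 15 consecutive heartbeats go missing, so a single lost packet does not trigger a failover.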
The two main channels used for heartbeat message transfers are Ethernet and serial lines. Fibre Channel introduces a third option. A heartbeat can be sent through the IP over Fibre Channel protocol, which, in addition to providing heartbeat packets, enables a cluster node to quickly recognize that it has lost its connection to shared storage. The figure illustrates the possible heartbeat connectivity between nodes.
We used this configuration to test Heartbeat on a two-node cluster. Both nodes are CentOS 6 virtual machines running on a CentOS host computer.
For each virtual machine, select a suitable network adapter and IP assignment option, then assign the following addresses:
|Primary Machine||Secondary Machine|
|IP: 192.168.31.100/24||IP: 192.168.31.150/24|
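Because the heartbeat in this setup is broadcast, both nodes must sit on the same subnet; broadcasts do not cross routers. A quick way to confirm that two addresses share a subnet is sketched below as a hypothetical Bash helper, same_subnet, which compares the network portions of two IPv4 addresses for a given prefix length.

```shell
#!/bin/bash
# Hypothetical helper: convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local o1 o2 o3 o4
  IFS=. read -r o1 o2 o3 o4 <<< "$1"
  echo $(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
}

# same_subnet IP1 IP2 PREFIX: succeeds if both addresses share the network part.
same_subnet() {
  local a b prefix=$3 mask
  a=$(ip_to_int "$1")
  b=$(ip_to_int "$2")
  # Build the netmask, e.g. prefix 24 -> 0xFFFFFF00
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( (a & mask) == (b & mask) ))
}
```

For the addresses used in the failover test later in this article, same_subnet 192.168.31.100 192.168.31.150 24 succeeds, whereas any 192.168.13.x address would fail the check against 192.168.31.100.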
# yum install heartbeat
This command connects to the internet and downloads the Heartbeat package from its repository.
yum then installs the package automatically; Heartbeat's configuration lives in the /etc/ha.d directory.
Three files are essential for configuring Heartbeat on your system: ha.cf, haresources and authkeys.
Note: These files are not present in the ha.d directory after installation. They have to be copied from /usr/share/doc/heartbeat into the ha.d directory.
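The copy step can be wrapped in a small hypothetical Bash function, copy_ha_samples, which stops with an error if any of the three samples is missing. The documentation path is the one given above; some packages use a versioned directory name instead, so the source directory is a parameter rather than hard-coded.

```shell
#!/bin/bash
# Hypothetical helper: copy the sample Heartbeat config files into place.
# copy_ha_samples DOC_DIR [HA_DIR]   (HA_DIR defaults to /etc/ha.d)
copy_ha_samples() {
  local doc_dir=$1 ha_dir=${2:-/etc/ha.d} f
  for f in ha.cf haresources authkeys; do
    if [ ! -f "$doc_dir/$f" ]; then
      echo "missing sample file: $doc_dir/$f" >&2
      return 1
    fi
    cp "$doc_dir/$f" "$ha_dir/$f" || return 1
  done
}
```

Run it as root on both machines, for example copy_ha_samples /usr/share/doc/heartbeat.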
The ha.cf file is where we specify the critical operating parameters for the Heartbeat service. It tells Heartbeat which media paths to use and how to configure them. The sample ha.cf in the documentation directory contains all the options you can use.
The parameters we configured in this file are:
Use a broadcast heartbeat over the eth1 interface (replace with eth0, eth2, or whichever interface you use).
Sets the time between heartbeats to 2 seconds.
Time in seconds before issuing a “late heartbeat” warning in the logs.
A node is pronounced dead after 30 seconds.
With some configurations, the network takes some time to start working after a reboot. This is a separate “deadtime” to handle that case. It should be at least twice the normal deadtime.
Use port number 694 for bcast or ucast communication. This is the default and the official IANA-registered port number.
The master listed in the haresources file holds all the resources until a failover, at which point the slave takes over. When auto_failback is set to on, the master takes everything back from the slave as soon as it comes back online. When it is set to off, the master does not re-acquire cluster resources after a failover.
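Mandatory. Hostname of the primary machine in the cluster; it must match the output of uname -n on that machine.
Mandatory. Hostname of the secondary machine in the cluster; it must match the output of uname -n on that machine.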
This is used to specify the path where Heartbeat’s debug logs will be stored.
This is used to specify the path where Heartbeat’s general logs will be stored.
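The timing parameters above depend on each other. The hypothetical check_ha_timing function below encodes the rule stated for initdead (at least twice deadtime) plus an ordering we assume is sensible: the warning threshold should sit between the heartbeat interval and deadtime. Treat that second rule as our assumption rather than a documented Heartbeat requirement.

```shell
#!/bin/bash
# Hypothetical sanity check for ha.cf timing values (all in seconds).
# check_ha_timing KEEPALIVE WARNTIME DEADTIME INITDEAD
check_ha_timing() {
  local keepalive=$1 warntime=$2 deadtime=$3 initdead=$4
  # Assumed ordering: keepalive < warntime < deadtime
  if (( keepalive >= warntime )); then
    echo "warntime should exceed keepalive" >&2; return 1
  fi
  if (( warntime >= deadtime )); then
    echo "deadtime should exceed warntime" >&2; return 1
  fi
  # Rule from the text: initdead at least twice deadtime
  if (( initdead < 2 * deadtime )); then
    echo "initdead should be at least twice deadtime" >&2; return 1
  fi
  echo ok
}
```

check_ha_timing 2 10 30 60 prints ok for a 2-second keepalive and 30-second deadtime; the warntime of 10 and initdead of 60 are example values, not taken from our setup.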
Figure: ha.cf configuration on the primary and secondary machines.
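A minimal ha.cf matching the parameters described above can be generated with a short Bash function. It is a sketch, not a copy of the file we used: the directive names are Heartbeat's standard ha.cf keywords, while the warntime, initdead, auto_failback and log-file values are assumptions to adjust. We place udpport ahead of bcast so the port is set before the broadcast interface that uses it.

```shell
#!/bin/bash
# Sketch: write ha.cf for a two-node cluster into HA_DIR (normally /etc/ha.d).
# Assumed values: warntime 10, initdead 60, auto_failback on, log paths below.
# write_ha_cf HA_DIR PRIMARY SECONDARY [IFACE]
write_ha_cf() {
  local ha_dir=$1 primary=$2 secondary=$3 iface=${4:-eth1}
  cat > "$ha_dir/ha.cf" <<EOF
debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 2
warntime 10
deadtime 30
initdead 60
udpport 694
bcast $iface
auto_failback on
node $primary
node $secondary
EOF
}
```

Running write_ha_cf /etc/ha.d www.arthar1.com www.arthar2.com on both nodes produces identical files, with the node names taken from uname -n on each machine.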
Once you’ve got your ha.cf set up, you need to configure haresources. This is a list of resources that move from machine to machine as nodes go down and come up in the cluster.
Note: This file must be the same on both nodes!
Syntax: node-name ip-address/subnet/interface
The node name at the front of the resource group is the name of the preferred node for running the service; it is not necessarily the name of the current machine. With auto_failback on, these services are started on the preferred node whenever it is up. With auto_failback off, the node name is used only when both nodes start up simultaneously.
The IP address is assigned to an interface that already has a route to that address. This means a network route must be set up outside of the high-availability configuration.
The subnet mask for the IP alias that is created defaults to the same netmask as the route that is selected in the above step.
The interface for the IP address defaults to the interface of the route selected above.
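A haresources line in the node-name ip-address/subnet/interface form can be produced with the hypothetical helper below, which refuses to write an obviously malformed IPv4 address. The floating address 192.168.31.200 in the usage note is a made-up example; use whichever address your clients connect to, and copy the resulting file unchanged to both nodes.

```shell
#!/bin/bash
# Hypothetical helper: write a one-line haresources file.
# write_haresources HA_DIR NODE IP PREFIX IFACE
write_haresources() {
  local dir=$1 node=$2 ip=$3 prefix=$4 iface=$5
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}$'
  # Reject anything that is not shaped like a dotted IPv4 address
  if ! [[ $ip =~ $re ]]; then
    echo "invalid IP address: $ip" >&2
    return 1
  fi
  printf '%s %s/%s/%s\n' "$node" "$ip" "$prefix" "$iface" > "$dir/haresources"
}
```

For example, write_haresources /etc/ha.d www.arthar1.com 192.168.31.200 24 eth0 writes the line www.arthar1.com 192.168.31.200/24/eth0.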
The authkeys file must be owned by root and have mode 600 (chmod 600). Its format is very simple: only two lines. The first is an auth directive with a method ID number, and the second gives the authentication method and key that go with that ID. Three authentication methods are supported: crc, md5 and sha1. Listing 1 shows an example. You can list more than one method ID, but this is useful only when you are changing authentication methods or keys. Make the key long: it improves security, and you never have to type it in again. If your heartbeat runs over a secure network, such as a dedicated crossover cable, use crc; it is the cheapest method in terms of resources. If the network is insecure but you are either not very paranoid or concerned about minimizing CPU usage, use md5. Finally, if you want the best authentication regardless of CPU cost, use sha1; it is the hardest to crack.
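Listing 1: Example method lines from the sample authkeys file
2 sha1 Hi!
1 md5 Hello!
The file must have exactly one auth directive at the beginning, naming the method ID to use. Each method line has the form:
<number> <authmethod> [<authkey>]
We used:
auth 1
1 md5 Hello!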
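Creating authkeys and fixing its permissions can be done in one step. The hypothetical write_authkeys function below writes the two-line file using method ID 1, accepts only the three supported methods (crc takes no key), and sets mode 600. Run it as root so that the file is owned by root.

```shell
#!/bin/bash
# Hypothetical helper: write an authkeys file and restrict its permissions.
# write_authkeys HA_DIR METHOD [KEY]
write_authkeys() {
  local dir=$1 method=$2 key=$3 line
  case $method in
    crc)
      line="1 crc" ;;                       # crc does not use a key
    md5|sha1)
      [ -n "$key" ] || { echo "$method needs a key" >&2; return 1; }
      line="1 $method $key" ;;
    *)
      echo "unsupported method: $method" >&2; return 1 ;;
  esac
  # Create the file with a restrictive umask, then enforce mode 600
  ( umask 077; printf 'auth 1\n%s\n' "$line" > "$dir/authkeys" )
  chmod 600 "$dir/authkeys"
}
```

write_authkeys /etc/ha.d md5 'Hello!' reproduces the file we used. For a longer key, something like write_authkeys /etc/ha.d sha1 "$(openssl rand -hex 32)" works where OpenSSL is installed; copy the resulting file to the other node so both share the same key.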
Once the above configuration is in place on both machines, start the Heartbeat daemon with:
# /etc/init.d/heartbeat start
Run this command on both machines at roughly the same time.
The following command can also be used as required; it makes Heartbeat start automatically at boot:
# chkconfig heartbeat on
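Because the daemon should come up on both nodes at about the same time, one option is to start it on both from a single shell. The hypothetical start_heartbeat_on function below assumes password-less root SSH to each node and runs the start and chkconfig commands on all of them in parallel, returning failure if any node fails. Setting RUNNER=echo turns it into a dry run that only prints what it would execute.

```shell
#!/bin/bash
# Hypothetical helper: start and enable Heartbeat on several nodes in parallel.
# Assumes password-less root SSH; override RUNNER (e.g. RUNNER=echo) for a dry run.
start_heartbeat_on() {
  local runner=${RUNNER:-ssh} node pids=() rc=0
  for node in "$@"; do
    $runner "root@$node" '/etc/init.d/heartbeat start && chkconfig heartbeat on' &
    pids+=($!)
  done
  # Wait for every background job; remember if any of them failed
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return $rc
}
```

For this setup that would be start_heartbeat_on www.arthar1.com www.arthar2.com.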
Once the Heartbeat services are running properly, you will see output like the following on your screens.
Figure: Heartbeat output on the primary and secondary machines.
With Heartbeat running on both machines, each machine monitors the other.
To test whether failover actually works, we did the following test run:
The primary machine (hostname: www.arthar1.com, IP: 192.168.31.100/24) and the secondary machine (hostname: www.arthar2.com, IP: 192.168.31.150/24) are left running Heartbeat for a while.
A third machine, with an IP address in the same range and the same subnet mask, pings the IP address of the primary machine. After a while, the primary machine is physically isolated from the network (we did this by unplugging its Ethernet cable). As soon as the primary machine is taken out, the ping window displays ‘Request timed out.’ in reply to its requests.
After an interval governed by the timing parameters in ha.cf (chiefly deadtime), the ping window starts getting replies from the primary machine's IP again. During that interval, the secondary machine notices that the primary has stopped sending heartbeats and concludes that the primary is down. The secondary therefore acquires the primary's resources and starts answering requests sent to the primary's IP.
Note: Without Heartbeat, the ping window would have shown ‘Destination Host Unreachable’ once the primary was disconnected.
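The failover observed in this test, and what happens when the primary returns, can be summarised with a toy Bash model. The hypothetical resource_owner function is not Heartbeat code; it replays a list of events and reports which node would hold the resources, depending on the auto_failback setting described for ha.cf.

```shell
#!/bin/bash
# Toy model (not Heartbeat code): which node holds the resources after events.
# resource_owner AUTO_FAILBACK(on|off) EVENT...   events: primary_down, primary_up
resource_owner() {
  local auto_failback=$1 owner=primary event
  shift
  for event in "$@"; do
    case $event in
      primary_down)
        owner=secondary ;;                  # secondary takes over after deadtime
      primary_up)
        if [ "$auto_failback" = on ]; then
          owner=primary                     # primary reclaims its resources
        fi ;;
      *)
        echo "unknown event: $event" >&2; return 1 ;;
    esac
  done
  echo "$owner"
}
```

resource_owner on primary_down prints secondary, matching the test above. resource_owner off primary_down primary_up still prints secondary, because with auto_failback off the recovered primary does not take its resources back.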