Scenario:
=========

We have 2 machines, each with the following configuration:

- RHEL 3.0 AS u3;
- clumanager 1.2.22-2;
- 2 NICs (Intel Gigabit): eth0 for the public network and eth1 for the private network;
- 1 HBA (QLogic 2200) attached to EMC storage;

This is the physical model:

                            +-------------+
                            |             |
                            |   router    |
                            | 172.27.2.1  |
                            |             |
                            +------+------+
                                   |
       eth0  +---------------------+--------------------+  eth0
172.27.2.35  |                                          |  172.27.2.36
             |        (virtual ip: 172.27.2.100)        |
             |                                          |
     +-------+------+ eth1                      +-------+------+
     |              | 10.0.0.1                  |              |
     | linpro035ias +---------heartbeat---------+ linpro036ias |
     |              |                  10.0.0.2 |              |
     +-------+------+                      eth1 +-------+------+
        hba  |                                          |  hba
             +---------------------+--------------------+
                                   |
                            +------+------+
                            |             |
                            | EMC storage |
                            |  /dev/sdc3  |
                            |             |
                            +-------------+

- the member names are linpro035ias (which points to 172.27.2.35) and linpro036ias (which points to 172.27.2.36);
- the virtual ip (172.27.2.100) is associated with the service in the cluster;
- the heartbeat is broadcast;
- the tiebreaker is network (ip 172.27.2.1, the router/gateway);
- there's a crossover cable for private networking (eth1);

The problem:
============

If clumembd%broadcast_primary_only is not defined (or set to no, the default setting), heartbeat packets are sent over both eth0 (public) and eth1 (private), but if I unplug the crossover cable, the cluster continues as if nothing had happened.

If clumembd%broadcast_primary_only is set to yes, heartbeat packets are sent over eth0 only.

So I conclude that the private network between the two nodes is not being used.

Steps to reproduce:
===================

01. set eth0 on the first machine to 172.27.2.35 and on the second machine to 172.27.2.36 (both connected to a router/gateway);
02. set eth1 on the first machine to 10.0.0.1 and on the second machine to 10.0.0.2 (connected using a crossover cable);
03. set a cluster service to use httpd (/etc/init.d/httpd);
04. set the service ip address to 172.27.2.100;
05. set a device (/dev/sdc3, attached to the storage) to mount /u02 as ext3;
06. /u02 must contain the www directory (mv /var/www /u02);
07. /etc/httpd/conf/httpd.conf must be edited to replace /var/www with /u02/www;
08. in the Cluster Daemon Properties, enable Broadcast Heartbeating and Network Tiebreaker (172.27.2.1 -> the router/gateway);
09. edit /etc/syslog.conf, append the following line, and restart the syslog service:
    local4.* /var/log/cluster
10. start the rawdevices service;
11. start the clumanager service;

What happened after you performed the steps above?
==================================================

1. if the crossover cable (which should be used for heartbeating) is unplugged, nothing happens; this can be monitored using 'tail -f /var/log/cluster';
2. if the crossover cable is plugged in again and clumembd%broadcast_primary_only is set to yes ( cludb -put clumembd%broadcast_primary_only yes ), the heartbeat packets go through eth0 (enhancement requested by Lon Hohberger), so eth1 is not used.

What should have happened instead?
==================================

Lon Hohberger said on => https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=144838

"Cluster Manager requires that all members coexist on the same fully connected subnet and that the link(s) used for cluster communication are the same link(s) used to monitor the tiebreaker IP address."

So, if the link used for cluster communication must be the same link used to monitor the tiebreaker IP address:

a) packets serving the httpd service go through eth0;
b) packets used to monitor the tiebreaker IP address go through eth0;
c) heartbeat packets go through eth0;
d) eth1 (which should be used for heartbeating) is useless (and never used with clumembd%broadcast_primary_only set to yes);

So, setting "clumembd%broadcast_primary_only" to yes makes the service packets (httpd), heartbeat packets and tiebreaker monitoring packets all go through the same physical interface (i.e. eth0).

I think that we should have a parameter, or an option in the cluster configuration GUI, to specify that broadcast heartbeating will use network interface X (e.g. eth1), like this:

# cludb -put clumembd%broadcast_interface eth1

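For reference, the command-line parts of the reproduction steps (interface addresses, the syslog line, and starting the services) could be scripted roughly as below. This is only a sketch: the /24 netmask is an assumption (the report does not state one), and the repro_steps function, the run helper and the DRY_RUN switch are names made up for illustration. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the command-line reproduction steps from this report.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it (needs root).
DRY_RUN=${DRY_RUN:-1}

# ASSUMPTION: /24 netmasks on both networks; the report does not give them.
NETMASK=${NETMASK:-255.255.255.0}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: $1 selects the member (1 = linpro035ias, 2 = linpro036ias).
repro_steps() {
    case "$1" in
        1) pub_ip=172.27.2.35; priv_ip=10.0.0.1 ;;
        2) pub_ip=172.27.2.36; priv_ip=10.0.0.2 ;;
        *) echo "usage: repro_steps 1|2" >&2; return 1 ;;
    esac

    # Public (eth0) and private crossover (eth1) addresses.
    run ifconfig eth0 "$pub_ip" netmask "$NETMASK" up
    run ifconfig eth1 "$priv_ip" netmask "$NETMASK" up

    # Send cluster log messages (local4) to /var/log/cluster.
    run sh -c 'echo "local4.* /var/log/cluster" >> /etc/syslog.conf'
    run service syslog restart

    # Start the raw devices and the cluster manager.
    run service rawdevices start
    run service clumanager start
}

repro_steps "${1:-1}"
```

The cluster service itself (httpd, the 172.27.2.100 service address, the /u02 mount, broadcast heartbeating and the tiebreaker) still has to be configured in the cluster configuration GUI as described in the steps; the sketch does not attempt that part.
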
This is expected behavior in all of these cases. I've attempted to explain this in a general manner here:

http://people.redhat.com/lhh/network-stuff.html

For additional configuration assistance, please contact Red Hat Support:

http://www.redhat.com/apps/support