Description of problem:
fence_xvmd cannot start if the default route is not set.

Version-Release number of selected component (if applicable):
cman-2.0.84-2.el5

How reproducible:
always

Steps to Reproduce:
1. Delete the default route, or do not set one.
2. Start fence_xvmd manually (with the -f option so that it does not daemonize).

Here is the log:

[root@cluster-1 ~]# rpm -qf /sbin/fence_xvmd
cman-2.0.84-2.el5
[root@cluster-1 ~]# fence_xvmd -LX -fddd
Debugging threshold is now 3
-- args @ 0xbfd70fbc --
args->addr = 225.0.0.12
args->domain = (null)
args->key_file = /etc/cluster/fence_xvm.key
args->op = 2
args->hash = 2
args->auth = 2
args->port = 1229
args->family = 2
args->timeout = 30
args->retr_time = 20
args->flags = 259
args->debug = 3
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0xbfd6ffbc (4096 max size)
Actual key length = 4096 bytesMy Node ID = 1
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
[root@cluster-1 ~]# ip route show
192.168.12.0/24 dev virbr0 proto kernel scope link src 192.168.12.1
192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.11
169.254.0.0/16 dev eth0 scope link
default via 192.168.122.1 dev eth0
[root@cluster-1 ~]# ip route del default via 192.168.122.1 dev eth0
[root@cluster-1 ~]# ip route show
192.168.12.0/24 dev virbr0 proto kernel scope link src 192.168.12.1
192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.11
169.254.0.0/16 dev eth0 scope link
[root@cluster-1 ~]# fence_xvmd -LX -fddd
Debugging threshold is now 3
-- args @ 0xbf93662c --
args->addr = 225.0.0.12
args->domain = (null)
args->key_file = /etc/cluster/fence_xvm.key
args->op = 2
args->hash = 2
args->auth = 2
args->port = 1229
args->family = 2
args->timeout = 30
args->retr_time = 20
args->flags = 259
args->debug = 3
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0xbf93562c (4096 max size)
Actual key length = 4096 bytesFailed to bind multicast receive socket to 225.0.0.12: No such device
Check network configuration.
Could not set up multicast listen socket
[root@cluster-1 ~]# ip route add default via 192.168.122.1 dev eth0
[root@cluster-1 ~]# fence_xvmd -LX -fddd
Debugging threshold is now 3
-- args @ 0xbfb76dbc --
args->addr = 225.0.0.12
args->domain = (null)
args->key_file = /etc/cluster/fence_xvm.key
args->op = 2
args->hash = 2
args->auth = 2
args->port = 1229
args->family = 2
args->timeout = 30
args->retr_time = 20
args->flags = 259
args->debug = 3
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0xbfb75dbc (4096 max size)
Actual key length = 4096 bytesMy Node ID = 1
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
[root@cluster-1 ~]#

Actual results:
fence_xvmd cannot start.

Expected results:
fence_xvmd starts without any problems.

Additional info:
I think this problem occurs because the listening interface is not selected explicitly in either the IPv4 or the IPv6 code. As I understand it, the Linux kernel selects the interface in this case, and if there is no specific route to the multicast network being joined, it picks the interface that leads to the default gateway (see do_ip_setsockopt, ip_mc_join_group and other related functions in kernel/net/ipv4/ip_sockglue.c). This is also consistent with the behavior of the BSD variants described in the multicast section of the well-known book UNIX Network Programming.

There is another problem behind this one. There is no way to tell fence_xvmd which interface to use if the host has multiple network interfaces, so fence_xvmd might listen on the wrong interface and be unable to communicate with fence_xvm at all.

To fix these problems, fence_xvmd needs a way to select the correct interface to listen on. I'll post a patch that solves both problems at once.
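
For illustration only (this is not the patch itself): a minimal C sketch of what explicit interface selection for an IPv4 multicast join could look like on Linux, assuming struct ip_mreqn and its imr_ifindex field are available. The helper name join_group_on_ifindex is hypothetical and does not come from the fence_xvmd sources.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not taken from fence_xvmd.
 * Joins the IPv4 multicast group `group` on socket `sock`.
 * ifindex == 0 leaves the interface choice to the kernel's routing
 * lookup (which fails with ENODEV when no route matches); a non-zero
 * ifindex pins the membership to that interface.
 * Returns 0 on success, -1 on failure.
 */
int join_group_on_ifindex(int sock, const char *group, unsigned int ifindex)
{
    struct ip_mreqn mreq;

    assert(group != NULL);
    memset(&mreq, 0, sizeof(mreq));

    /* Parse the dotted-quad group address, e.g. "225.0.0.12". */
    if (inet_pton(AF_INET, group, &mreq.imr_multiaddr) != 1)
        return -1;

    /* No specific local address; the interface index decides. */
    mreq.imr_address.s_addr = htonl(INADDR_ANY);
    mreq.imr_ifindex = (int)ifindex;

    if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                   &mreq, sizeof(mreq)) < 0)
        return -1;

    return 0;
}
```

With a UDP socket from socket(AF_INET, SOCK_DGRAM, 0), calling join_group_on_ifindex(sock, "225.0.0.12", 0) corresponds to the current behavior, while passing the index of virbr0 would avoid the dependency on a default route.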
Created attachment 314709
A patch for fence_xvmd that adds an option to select the correct network interface explicitly
Here is a log of fence_xvmd with my patch attached:

[root@cluster-1 ~]# /var/tmp/cman-2.0.84-2-root-root/sbin/fence_xvmd -LX -fddddd -I virbr0
Debugging threshold is now 5
-- args @ 0xbfc095b8 --
args->addr = 225.0.0.12
args->domain = (null)
args->key_file = /etc/cluster/fence_xvm.key
args->op = 2
args->hash = 2
args->auth = 2
args->port = 1229
args->ifindex = 5
args->family = 2
args->timeout = 30
args->retr_time = 20
args->flags = 259
args->debug = 5
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0xbfc085b8 (4096 max size)
Actual key length = 4096 bytesSetting up ipv4 multicast receive (225.0.0.12:1229)
Joining multicast group
ipv4_recv_sk: success, fd = 3
My Node ID = 1
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
[root@cluster-1 ~]# ip route del default via 192.168.122.1 dev eth0
[root@cluster-1 ~]# /var/tmp/cman-2.0.84-2-root-root/sbin/fence_xvmd -LX -fddddd -I virbr0
Debugging threshold is now 5
-- args @ 0xbf95b308 --
args->addr = 225.0.0.12
args->domain = (null)
args->key_file = /etc/cluster/fence_xvm.key
args->op = 2
args->hash = 2
args->auth = 2
args->port = 1229
args->ifindex = 5
args->family = 2
args->timeout = 30
args->retr_time = 20
args->flags = 259
args->debug = 5
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0xbf95a308 (4096 max size)
Actual key length = 4096 bytesSetting up ipv4 multicast receive (225.0.0.12:1229)
Joining multicast group
ipv4_recv_sk: success, fd = 3
My Node ID = 1
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
[root@cluster-1 ~]#

If no interface is specified with the "-I" option, the default interface index 0 (corresponding to INADDR_ANY in IPv4) is used. In the run below the default route is still not set, so fence_xvmd fails to start, which is the expected behavior.

[root@cluster-1 ~]# /var/tmp/cman-2.0.84-2-root-root/sbin/fence_xvmd -LX -fddddd
Debugging threshold is now 5
-- args @ 0xbffea1a8 --
args->addr = 225.0.0.12
args->domain = (null)
args->key_file = /etc/cluster/fence_xvm.key
args->op = 2
args->hash = 2
args->auth = 2
args->port = 1229
args->ifindex = 0
args->family = 2
args->timeout = 30
args->retr_time = 20
args->flags = 259
args->debug = 5
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0xbffe91a8 (4096 max size)
Actual key length = 4096 bytesSetting up ipv4 multicast receive (225.0.0.12:1229)
Joining multicast group
Failed to bind multicast receive socket to 225.0.0.12: No such device
Check network configuration.
Could not set up multicast listen socket
[root@cluster-1 ~]# ip route add default via 192.168.122.1 dev eth0
[root@cluster-1 ~]#
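
As a rough sketch of how an interface name given on the command line might be turned into the ifindex value shown in the logs above (5 for virbr0, 0 when no interface is given), the following uses the standard if_nametoindex() call. The helper name resolve_ifindex is hypothetical, and this is an assumption about the approach, not a copy of the patch.

```c
#include <assert.h>
#include <net/if.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the patch.
 * Maps an optional interface name to an interface index.
 * A NULL name yields index 0, meaning "let the kernel choose".
 * Returns 0 on success, -1 if the named interface does not exist.
 */
int resolve_ifindex(const char *ifname, unsigned int *ifindex)
{
    unsigned int idx;

    assert(ifindex != NULL);

    if (ifname == NULL) {
        /* No interface requested: fall back to the default index 0. */
        *ifindex = 0;
        return 0;
    }

    /* if_nametoindex() returns 0 when the interface is unknown. */
    idx = if_nametoindex(ifname);
    if (idx == 0)
        return -1;

    *ifindex = idx;
    return 0;
}
```

Rejecting an unknown name up front, instead of silently falling back to index 0, would keep a mistyped interface name from reintroducing the dependency on the default route.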
(In reply to comment #2)
> Here is a log of fence_xvmd with my patch attached:

s/attached/applied/ :P
Merged to master and RHEL5 branches.

Master: http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=5fd95228e8b58a6b42f78a01a480e686e53086c3
RHEL5: http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=a4f4f109ae329a96632f776d1ce166241f5aa1e1
Is this patch scheduled for inclusion in RHEL5.2? If so, do you have a release date? I'm willing to test this, if required.

Kind regards,
Jasper Capel
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0189.html