Red Hat Bugzilla – Bug 1400172
heartbeat: IPsrcaddr: fails unsetting due to duplicate route lines
Last modified: 2017-08-01 10:57:40 EDT
Description of problem:
The 'start' operation of /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr runs 'ip route change' to set the source route. This adds a second route line for the local network. This duplicate entry then confuses the 'stop' operation of the script.

Version-Release number of selected component (if applicable):
3.9.5-54.el7_2.17

How reproducible:
Always.

Steps to Reproduce:
Run:

export OCF_ROOT=/usr/lib/ocf
export PATH="/usr/sbin:/sbin:$PATH"
export OCF_RESKEY_ipaddress=192.168.0.253
export OCF_RESKEY_cidr_netmask=20
export OCF_RESKEY_nic=eth0
export OCF_RESKEY_ip=192.168.0.253
export OCF_RESKEY_cidr_netmask=20
/usr/lib/ocf/resource.d/heartbeat/IPaddr2 start
ip route show dev eth0
/usr/lib/ocf/resource.d/heartbeat/IPsrcaddr start
ip route show dev eth0
/usr/lib/ocf/resource.d/heartbeat/IPsrcaddr stop
ip route show dev eth0
/usr/lib/ocf/resource.d/heartbeat/IPaddr2 stop

Actual results:

default via 192.168.0.1 proto static metric 100
192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100

default via 192.168.0.1 proto static src 192.168.0.253 metric 100
192.168.0.0/20 scope link src 192.168.0.253
192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100

Error: either "to" is duplicate, or "192.168.0.0/20" is a garbage.
ocf-exit-reason:command 'ip route replace 192.168.0.0/20 192.168.0.0/20 dev eth0' failed

default via 192.168.0.1 proto static src 192.168.0.253 metric 100
192.168.0.0/20 scope link src 192.168.0.253
192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100

Expected results:
This is what I get after applying the patch mentioned below.

default via 192.168.0.1 dev eth0 proto static metric 100
192.168.0.0/20 dev eth0 proto kernel scope link src 192.168.0.196 metric 100

default via 192.168.0.1 dev eth0 proto static src 192.168.0.253 metric 100
192.168.0.0/20 dev eth0 scope link src 192.168.0.253
192.168.0.0/20 dev eth0 proto kernel scope link src 192.168.0.196 metric 100

default via 192.168.0.1 proto static metric 100
192.168.0.0/20 scope link
192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100

Additional info:
The fix I applied was to further filter the output of ip route:

--- /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr.orig 2016-11-30 15:43:23.896352263 +0000
+++ /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr 2016-11-30 15:43:39.419058286 +0000
@@ -469,7 +469,7 @@
 }

 INTERFACE=`echo $findif_out | awk '{print $1}'`
-NETWORK=`ip route list dev $INTERFACE scope link match $ipaddress|grep -o '^[^ ]*'`
+NETWORK=`ip route list dev $INTERFACE scope link proto kernel match $ipaddress|grep -o '^[^ ]*'`

 case $1 in
 start) srca_start $ipaddress
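The failing command in the actual results ('ip route replace 192.168.0.0/20 192.168.0.0/20 dev eth0') can be reproduced without touching the routing table. The sketch below is an illustration only, not part of the agent: it feeds the two scope-link route lines from this report through the same grep used in the NETWORK= assignment, and it assumes that a plain grep for 'proto kernel' is an adequate stand-in for the 'proto kernel' selector that the patch adds to 'ip route list'. The variable names are hypothetical.

```shell
#!/bin/sh
# Scope-link routes for the local network after 'IPsrcaddr start',
# copied from the "Actual results" of this report.
routes='192.168.0.0/20 scope link src 192.168.0.253
192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100'

# Unpatched parsing: take the first field of every matching line.
# With two route lines this yields the network twice.
network_unpatched=$(printf '%s\n' "$routes" | grep -o '^[^ ]*')

# Patched parsing, simulated: grep stands in for the 'proto kernel'
# selector (an assumption made for this illustration only).
network_patched=$(printf '%s\n' "$routes" | grep 'proto kernel' | grep -o '^[^ ]*')

# The unquoted expansion splits the two-line value into two arguments,
# which is the command line the 'stop' operation ends up running.
echo ip route replace $network_unpatched dev eth0
echo ip route replace $network_patched dev eth0
```

The first echo prints the duplicated prefix seen in the ocf-exit-reason message; the second prints the single-prefix form.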
Tested and working patch: https://github.com/ClusterLabs/resource-agents/pull/904
Thanks for that. However, I realized that things are a bit more complicated. The reason we get a duplicate route line is that 'ip route replace' was run with parameters that differ from those of the existing route, so a new route is created instead of the existing one being replaced. We have replaced the parsing of the routing table. A new patch will be added shortly.
Created attachment 1234728
Fix route editing for IPSrcaddr.sh
(In reply to Tzafrir Cohen from comment #4)
> Created attachment 1234728
> Fix route editing for IPSrcaddr.sh

Can you send me some more information about your setup? I don't see any issues when I change only the NETWORK= line (so it seems the first part of my patch isn't necessary), so I guess that should be enough to solve the issue unless there's some special case I'm not hitting.

-NETWORK=`ip route list dev $INTERFACE scope link match $ipaddress|grep -o '^[^ ]*'`
+NETWORK=`ip route list dev $INTERFACE scope link proto kernel match $ipaddress|grep -o '^[^ ]*'`
I have verified that source address is correctly added to default route when IPsrcaddr agent is running in resource-agents-3.9.5-80

----

* configure cluster with ipaddr and ipaddrsrc in a group [1]
* disable the group [2]

before the patch (resource-agents-3.9.5-80.el7)
===============================================

[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip (ocf::heartbeat:IPaddr2): Stopped (disabled)
     vip-src (ocf::heartbeat:IPsrcaddr): Stopped (disabled)

[root@host-035 ~]# ip ro
> default via 10.15.107.254 dev eth0 proto static metric 100
10.15.104.0/22 dev eth0 scope link
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100

[root@host-035 ~]# pcs resource enable vip-g
[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip (ocf::heartbeat:IPaddr2): Started host-035
     vip-src (ocf::heartbeat:IPsrcaddr): Started host-035

[root@host-035 ~]# ip ro
> default via 10.15.107.254 dev eth0 proto static metric 100
10.15.104.0/22 dev eth0 scope link
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100

after the patch (resource-agents-3.9.5-104.el7)
===============================================

[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip (ocf::heartbeat:IPaddr2): Stopped (disabled)
     vip-src (ocf::heartbeat:IPsrcaddr): Stopped (disabled)

[root@host-035 ~]# ip ro
default via 10.15.107.254 dev eth0 proto static metric 100
10.15.104.0/22 dev eth0 scope link
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100

[root@host-035 ~]# pcs resource enable vip-g
[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip (ocf::heartbeat:IPaddr2): Started host-035
     vip-src (ocf::heartbeat:IPsrcaddr): Started host-035

[root@host-035 ~]# ip ro
> default via 10.15.107.254 dev eth0 proto static src 10.15.107.150 metric 100
10.15.104.0/22 dev eth0 scope link src 10.15.107.150
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100

-----

> (1) pcs-config

[root@host-034 ~]# pcs config
Cluster Name: STSRHTS3691
Corosync Nodes:
 host-034 host-035
Pacemaker Nodes:
 host-034 host-035

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
 Group: vip-g
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=22 ip=10.15.107.150
   Operations: monitor interval=10s timeout=20s (vip-monitor-interval-10s)
               start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
  Resource: vip-src (class=ocf provider=heartbeat type=IPsrcaddr)
   Attributes: cidr_netmask=22 ipaddress=10.15.107.150
   Operations: monitor interval=10 timeout=20s (vip-src-monitor-interval-10)
               start interval=0s timeout=20s (vip-src-start-interval-0s)
               stop interval=0s timeout=20s (vip-src-stop-interval-0s)

Stonith Devices:
 Resource: fence-host-034 (class=stonith type=fence_xvm)
  Attributes: delay=5 pcmk_host_check=static-list pcmk_host_list=host-034 pcmk_host_map=host-034:host-034.virt.lab.msp.redhat.com
  Operations: monitor interval=60s (fence-host-034-monitor-interval-60s)
 Resource: fence-host-035 (class=stonith type=fence_xvm)
  Attributes: pcmk_host_check=static-list pcmk_host_list=host-035 pcmk_host_map=host-035:host-035.virt.lab.msp.redhat.com
  Operations: monitor interval=60s (fence-host-035-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS3691
 dc-version: 1.1.16-10.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1497508285
 no-quorum-policy: freeze

Quorum:
  Options:

> (2) pcs-status

[root@host-034 ~]# pcs status
Cluster name: STSRHTS3691
Stack: corosync
Current DC: host-035 (version 1.1.16-10.el7-94ff4df) - partition with quorum
Last updated: Thu Jun 15 01:46:02 2017
Last change: Thu Jun 15 01:45:59 2017 by root via cibadmin on host-034

2 nodes configured
8 resources configured (4 DISABLED)

Online: [ host-034 host-035 ]

Full list of resources:

 fence-host-034 (stonith:fence_xvm): Started host-035
 fence-host-035 (stonith:fence_xvm): Started host-034
 Clone Set: dlm-clone [dlm]
     Started: [ host-034 host-035 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ host-034 host-035 ]
 Resource Group: vip-g
     vip (ocf::heartbeat:IPaddr2): Stopped (disabled)
     vip-src (ocf::heartbeat:IPsrcaddr): Stopped (disabled)

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
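The check performed by eye in the verification above (does the default route carry a src address once vip-g is started?) could also be scripted. The following is a minimal sketch, not something used in this verification: default_route_src is a hypothetical helper that reads 'ip route' output on stdin, and the two sample lines are the default routes shown above with the group started before and after the patch.

```shell
#!/bin/sh
# default_route_src (hypothetical helper): read 'ip route' output on stdin
# and print the src address of the default route, or nothing if none is set.
default_route_src() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "src") { print $(i + 1); exit }
    }'
}

# Default route lines from the verification above, group started.
before='default via 10.15.107.254 dev eth0 proto static metric 100'
after='default via 10.15.107.254 dev eth0 proto static src 10.15.107.150 metric 100'

src_before=$(printf '%s\n' "$before" | default_route_src)
src_after=$(printf '%s\n' "$after" | default_route_src)

echo "before the patch: ${src_before:-no src on default route}"
echo "after the patch:  ${src_after:-no src on default route}"
```

On a live node the same helper would be fed from the routing table directly, for example 'ip route | default_route_src', and the result compared with the configured ipaddress attribute.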
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1844