Bug 1400172
| Summary: | heartbeat: IPsrcaddr: fails unsetting due to duplicate route lines | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Tzafrir Cohen <tzafrir> | ||||
| Component: | resource-agents | Assignee: | Oyvind Albrigtsen <oalbrigt> | ||||
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 7.2 | CC: | agk, cluster-maint, fdinitto, mnovacek, phagara, sbradley, tzafrir | ||||
| Target Milestone: | rc | Flags: | tzafrir: needinfo- | | | | |
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | resource-agents-3.9.5-88.el7 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-08-01 14:57:40 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Tested and working patch: https://github.com/ClusterLabs/resource-agents/pull/904

Thanks for that. However, I realized that things are a bit more complicated. The reason we get a duplicate route line is that 'ip route replace' was run with parameters that differ from those of the actual route, which causes a new route to be created. We replaced the parsing of the routing table. A new patch will be added shortly.

Created attachment 1234728 [details]
Fix route editing for IPSrcaddr.sh
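A minimal sketch of the point about mismatched parameters, assuming (as for IPv4 routes on Linux) that a route is identified by its table, destination prefix, TOS and metric. Under that assumption, a `replace` that omits `metric 100` addresses a different route and adds a new one next to the kernel-created entry. `build_src_replace` is a hypothetical helper, not code from IPsrcaddr or from the attached patch; it only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper (not part of IPsrcaddr): given one line of
# `ip route show dev IFACE` output, print an `ip route replace` command that
# keeps the route's metric (and proto, if present). Assumption: the IPv4 route
# key is (table, prefix, tos, metric), so keeping the metric makes the kernel
# update the existing entry instead of adding a second one.
build_src_replace() {
    route_line=$1   # e.g. "192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100"
    iface=$2
    new_src=$3

    # First field is the destination prefix.
    prefix=$(printf '%s\n' "$route_line" | awk '{print $1}')
    # Values following the "proto" and "metric" keywords, if present.
    proto=$(printf '%s\n' "$route_line" | awk '{for (i = 1; i < NF; i++) if ($i == "proto") print $(i + 1)}')
    metric=$(printf '%s\n' "$route_line" | awk '{for (i = 1; i < NF; i++) if ($i == "metric") print $(i + 1)}')

    # "scope link" is hard-coded: the sketch only targets on-link network routes.
    cmd="ip route replace $prefix dev $iface"
    [ -n "$proto" ] && cmd="$cmd proto $proto"
    cmd="$cmd scope link src $new_src"
    [ -n "$metric" ] && cmd="$cmd metric $metric"
    printf '%s\n' "$cmd"
}
```

For the kernel route in this report, `build_src_replace "192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100" eth0 192.168.0.253` prints `ip route replace 192.168.0.0/20 dev eth0 proto kernel scope link src 192.168.0.253 metric 100`, which targets the same route key as the existing entry.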
(In reply to Tzafrir Cohen from comment #4)
> Created attachment 1234728 [details]
> Fix route editing for IPSrcaddr.sh

Can you send me some more information about your setup? I don't see any issues when I just change the NETWORK= line (so it seems the first part of my patch isn't necessary), so I guess that should be enough to solve the issue unless there's some special case I'm not hitting.

-NETWORK=`ip route list dev $INTERFACE scope link match $ipaddress|grep -o '^[^ ]*'`
+NETWORK=`ip route list dev $INTERFACE scope link proto kernel match $ipaddress|grep -o '^[^ ]*'`
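The effect of the one-line NETWORK= change can be reproduced offline. This sketch feeds the two scope-link routes from the bug's "Actual results" (after IPsrcaddr start) to both pipelines. The `grep ' proto kernel '` step is an assumption standing in for ip's own `proto kernel` selector; it approximates the filtering for illustration and is not what iproute2 does internally.

```shell
#!/bin/sh
# Offline illustration: the scope-link routes for 192.168.0.0/20 on eth0 once
# IPsrcaddr has added its own entry (copied from the bug's Actual results).
routes='192.168.0.0/20 scope link src 192.168.0.253
192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100'

# Old extraction: first field of every matching line -> the prefix twice.
old_network=$(printf '%s\n' "$routes" | grep -o '^[^ ]*')

# New extraction: keep only the kernel-created route -> the prefix once.
# (grep approximates `ip route list ... proto kernel`.)
new_network=$(printf '%s\n' "$routes" | grep ' proto kernel ' | grep -o '^[^ ]*')

# Unquoted expansion word-splits the value, as in the agent's command line.
echo ip route replace $old_network dev eth0
echo ip route replace $new_network dev eth0
```

The first line printed is `ip route replace 192.168.0.0/20 192.168.0.0/20 dev eth0`, the same malformed command named in the ocf-exit-reason message; the second contains the prefix only once.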
I have verified that the source address is correctly added to the default route
when the IPsrcaddr agent is running in resource-agents-3.9.5-104.el7.
----
* configure cluster with ipaddr and ipaddrsrc in a group [1]
* disable the group [2]
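A sketch of steps [1] and [2] as pcs commands, using the values from the pcs config in (1). The command forms assume the RHEL 7 `pcs resource create ... --group` syntax; `setup_vip_group` and `disable_vip_group` are hypothetical wrapper names, and setting `PCS=echo` turns the script into a dry run.

```shell
#!/bin/sh
# Hypothetical wrappers for the test setup; PCS=echo prints instead of running.
PCS=${PCS:-pcs}

setup_vip_group() {
    # IPaddr2 brings the virtual address up on the interface first...
    $PCS resource create vip ocf:heartbeat:IPaddr2 \
        ip=10.15.107.150 cidr_netmask=22 --group vip-g
    # ...then IPsrcaddr, started after it in the same group, sets that
    # address as the preferred source (src) on the routes.
    $PCS resource create vip-src ocf:heartbeat:IPsrcaddr \
        ipaddress=10.15.107.150 cidr_netmask=22 --group vip-g
}

disable_vip_group() {
    # Stops both members of the group.
    $PCS resource disable vip-g
}
```

Putting both resources in one group relies on Pacemaker starting group members in order and stopping them in reverse, so the address exists before IPsrcaddr uses it.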
before the patch (resource-agents-3.9.5-80.el7)
===============================================
[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip (ocf::heartbeat:IPaddr2): Stopped (disabled)
     vip-src (ocf::heartbeat:IPsrcaddr): Stopped (disabled)
[root@host-035 ~]# ip ro
> default via 10.15.107.254 dev eth0 proto static metric 100
10.15.104.0/22 dev eth0 scope link
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100
[root@host-035 ~]# pcs resource enable vip-g
[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip (ocf::heartbeat:IPaddr2): Started host-035
     vip-src (ocf::heartbeat:IPsrcaddr): Started host-035
[root@host-035 ~]# ip ro
> default via 10.15.107.254 dev eth0 proto static metric 100
10.15.104.0/22 dev eth0 scope link
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100
after the patch (resource-agents-3.9.5-104.el7)
===============================================
[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip (ocf::heartbeat:IPaddr2): Stopped (disabled)
     vip-src (ocf::heartbeat:IPsrcaddr): Stopped (disabled)
[root@host-035 ~]# ip ro
default via 10.15.107.254 dev eth0 proto static metric 100
10.15.104.0/22 dev eth0 scope link
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100
[root@host-035 ~]# pcs resource enable vip-g
[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip (ocf::heartbeat:IPaddr2): Started host-035
     vip-src (ocf::heartbeat:IPsrcaddr): Started host-035
[root@host-035 ~]# ip ro
> default via 10.15.107.254 dev eth0 proto static src 10.15.107.150 metric 100
10.15.104.0/22 dev eth0 scope link src 10.15.107.150
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100
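The difference between the two enabled-group outputs is whether the default route carries `src 10.15.107.150`. A minimal sketch of that check follows; `default_route_has_src` is a hypothetical helper, not part of resource-agents, and it reads route lines on stdin so it can be tried on captured output.

```shell
#!/bin/sh
# Hypothetical check: succeed if a "default" route line in the input has
# "src <address>", fail otherwise. Input is `ip route` output on stdin.
default_route_has_src() {
    want_src=$1
    awk -v want="$want_src" '
        $1 == "default" {
            # Look for the "src" keyword followed by the wanted address.
            for (i = 1; i < NF; i++)
                if ($i == "src" && $(i + 1) == want) found = 1
        }
        END { if (found) exit 0; exit 1 }
    '
}
```

On a node, `ip route | default_route_has_src 10.15.107.150` would fail for the -80 output above and succeed for the -104 output.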
-----
> (1) pcs-config
[root@host-034 ~]# pcs config
Cluster Name: STSRHTS3691
Corosync Nodes:
 host-034 host-035
Pacemaker Nodes:
 host-034 host-035
Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
 Group: vip-g
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=22 ip=10.15.107.150
   Operations: monitor interval=10s timeout=20s (vip-monitor-interval-10s)
               start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
  Resource: vip-src (class=ocf provider=heartbeat type=IPsrcaddr)
   Attributes: cidr_netmask=22 ipaddress=10.15.107.150
   Operations: monitor interval=10 timeout=20s (vip-src-monitor-interval-10)
               start interval=0s timeout=20s (vip-src-start-interval-0s)
               stop interval=0s timeout=20s (vip-src-stop-interval-0s)
Stonith Devices:
 Resource: fence-host-034 (class=stonith type=fence_xvm)
  Attributes: delay=5 pcmk_host_check=static-list pcmk_host_list=host-034 pcmk_host_map=host-034:host-034.virt.lab.msp.redhat.com
  Operations: monitor interval=60s (fence-host-034-monitor-interval-60s)
 Resource: fence-host-035 (class=stonith type=fence_xvm)
  Attributes: pcmk_host_check=static-list pcmk_host_list=host-035 pcmk_host_map=host-035:host-035.virt.lab.msp.redhat.com
  Operations: monitor interval=60s (fence-host-035-monitor-interval-60s)
Fencing Levels:
Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY)
Ticket Constraints:
Alerts:
 No alerts defined
Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS3691
 dc-version: 1.1.16-10.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1497508285
 no-quorum-policy: freeze
Quorum:
  Options:
> (2) pcs-status
[root@host-034 ~]# pcs status
Cluster name: STSRHTS3691
Stack: corosync
Current DC: host-035 (version 1.1.16-10.el7-94ff4df) - partition with quorum
Last updated: Thu Jun 15 01:46:02 2017
Last change: Thu Jun 15 01:45:59 2017 by root via cibadmin on host-034
2 nodes configured
8 resources configured (4 DISABLED)
Online: [ host-034 host-035 ]
Full list of resources:
 fence-host-034 (stonith:fence_xvm): Started host-035
 fence-host-035 (stonith:fence_xvm): Started host-034
 Clone Set: dlm-clone [dlm]
     Started: [ host-034 host-035 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ host-034 host-035 ]
 Resource Group: vip-g
     vip (ocf::heartbeat:IPaddr2): Stopped (disabled)
     vip-src (ocf::heartbeat:IPsrcaddr): Stopped (disabled)
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1844
Description of problem:
The 'start' operation of /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr runs 'ip route change' to set the source route. This adds a second route line for the local network. This duplicate entry then confuses the 'stop' operation of the script.

Version-Release number of selected component (if applicable):
3.9.5-54.el7_2.17

How reproducible:
Always.

Steps to Reproduce:
Run:

export OCF_ROOT=/usr/lib/ocf
export PATH="/usr/sbin:/sbin:$PATH"
export OCF_RESKEY_ipaddress=192.168.0.253
export OCF_RESKEY_cidr_netmask=20
export OCF_RESKEY_nic=eth0
export OCF_RESKEY_ip=192.168.0.253
/usr/lib/ocf/resource.d/heartbeat/IPaddr2 start
ip route show dev eth0
/usr/lib/ocf/resource.d/heartbeat/IPsrcaddr start
ip route show dev eth0
/usr/lib/ocf/resource.d/heartbeat/IPsrcaddr stop
ip route show dev eth0
/usr/lib/ocf/resource.d/heartbeat/IPaddr2 stop

Actual results:

default via 192.168.0.1 proto static metric 100
192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100

default via 192.168.0.1 proto static src 192.168.0.253 metric 100
192.168.0.0/20 scope link src 192.168.0.253
192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100
Error: either "to" is duplicate, or "192.168.0.0/20" is a garbage.
ocf-exit-reason:command 'ip route replace 192.168.0.0/20 192.168.0.0/20 dev eth0' failed

default via 192.168.0.1 proto static src 192.168.0.253 metric 100
192.168.0.0/20 scope link src 192.168.0.253
192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100

Expected results:
This is what I get after applying the patch mentioned below.

default via 192.168.0.1 dev eth0 proto static metric 100
192.168.0.0/20 dev eth0 proto kernel scope link src 192.168.0.196 metric 100

default via 192.168.0.1 dev eth0 proto static src 192.168.0.253 metric 100
192.168.0.0/20 dev eth0 scope link src 192.168.0.253
192.168.0.0/20 dev eth0 proto kernel scope link src 192.168.0.196 metric 100

default via 192.168.0.1 proto static metric 100
192.168.0.0/20 scope link
192.168.0.0/20 proto kernel scope link src 192.168.0.196 metric 100

Additional info:
The fix I applied was to further filter the output of ip route:

--- /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr.orig	2016-11-30 15:43:23.896352263 +0000
+++ /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr	2016-11-30 15:43:39.419058286 +0000
@@ -469,7 +469,7 @@
 }
 
 INTERFACE=`echo $findif_out | awk '{print $1}'`
-NETWORK=`ip route list dev $INTERFACE scope link match $ipaddress|grep -o '^[^ ]*'`
+NETWORK=`ip route list dev $INTERFACE scope link proto kernel match $ipaddress|grep -o '^[^ ]*'`
 
 case $1 in
 	start)		srca_start $ipaddress
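When checking for this bug by hand, it can help to list prefixes that appear on more than one route line. A minimal sketch follows; `duplicate_prefixes` is a hypothetical helper, not part of resource-agents. It flags every repeated prefix, including entries that legitimately differ only by metric, so it is a diagnostic aid rather than proof that IPsrcaddr left a stray route.

```shell
#!/bin/sh
# Hypothetical helper: print (sorted) each destination prefix that occurs on
# more than one line of `ip route` output read from stdin.
duplicate_prefixes() {
    awk '{ count[$1]++ } END { for (p in count) if (count[p] > 1) print p }' | sort
}
```

Fed the second `ip route show dev eth0` output from "Actual results", it prints `192.168.0.0/20`.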