Bug 1400172

Summary: heartbeat: IPsrcaddr: fails unsetting due to duplicate route lines
Product: Red Hat Enterprise Linux 7
Reporter: Tzafrir Cohen <tzafrir>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Priority: unspecified
Version: 7.2
CC: agk, cluster-maint, fdinitto, mnovacek, phagara, sbradley, tzafrir
Target Milestone: rc
Flags: tzafrir: needinfo-
Hardware: Unspecified
OS: Unspecified
Fixed In Version: resource-agents-3.9.5-88.el7
Last Closed: 2017-08-01 14:57:40 UTC
Type: Bug
Attachments:
  Fix route editing for IPSrcaddr.sh (flags: none)

Description Tzafrir Cohen 2016-11-30 15:45:33 UTC
Description of problem:
The 'start' operation of /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr runs 'ip route change' to set the source address on the routes. This adds a second route line for the local network instead of modifying the existing one. When the 'stop' operation later looks up the local network, it picks up both lines, and the resulting 'ip route replace' command fails.

Version-Release number of selected component (if applicable):
3.9.5-54.el7_2.17

How reproducible:
Always.

Steps to Reproduce:
Run:

# Environment for running the OCF agents by hand
export OCF_ROOT=/usr/lib/ocf
export PATH="/usr/sbin:/sbin:$PATH"

# Parameters for IPsrcaddr
export OCF_RESKEY_ipaddress=192.168.0.253
export OCF_RESKEY_cidr_netmask=20

# Parameters for IPaddr2 (it also reads OCF_RESKEY_cidr_netmask, set above)
export OCF_RESKEY_nic=eth0
export OCF_RESKEY_ip=192.168.0.253

# Add the address, then start and stop IPsrcaddr, showing the routes
# on eth0 after each step
/usr/lib/ocf/resource.d/heartbeat/IPaddr2 start
ip route show dev eth0

/usr/lib/ocf/resource.d/heartbeat/IPsrcaddr start
ip route show dev eth0

/usr/lib/ocf/resource.d/heartbeat/IPsrcaddr stop
ip route show dev eth0

/usr/lib/ocf/resource.d/heartbeat/IPaddr2 stop


Actual results:

After IPaddr2 start:
default via 192.168.0.1  proto static  metric 100
192.168.0.0/20  proto kernel  scope link  src 192.168.0.196  metric 100

After IPsrcaddr start (a second 192.168.0.0/20 route appears):
default via 192.168.0.1  proto static  src 192.168.0.253  metric 100
192.168.0.0/20  scope link  src 192.168.0.253
192.168.0.0/20  proto kernel  scope link  src 192.168.0.196  metric 100

After IPsrcaddr stop (the operation fails and the routes are unchanged):
Error: either "to" is duplicate, or "192.168.0.0/20" is a garbage.
ocf-exit-reason:command 'ip route replace 192.168.0.0/20
192.168.0.0/20 dev eth0' failed

default via 192.168.0.1  proto static  src 192.168.0.253  metric 100
192.168.0.0/20  scope link  src 192.168.0.253
192.168.0.0/20  proto kernel  scope link  src 192.168.0.196  metric 100

Expected results:

This is the output after applying the patch shown under "Additional info".

After IPaddr2 start:
default via 192.168.0.1 dev eth0  proto static  metric 100
192.168.0.0/20 dev eth0  proto kernel  scope link  src 192.168.0.196  metric 100

After IPsrcaddr start:
default via 192.168.0.1 dev eth0  proto static  src 192.168.0.253  metric 100
192.168.0.0/20 dev eth0  scope link  src 192.168.0.253
192.168.0.0/20 dev eth0  proto kernel  scope link  src 192.168.0.196  metric 100

After IPsrcaddr stop:
default via 192.168.0.1  proto static  metric 100
192.168.0.0/20  scope link
192.168.0.0/20  proto kernel  scope link  src 192.168.0.196  metric 100

Additional info:

The fix I applied narrows the 'ip route list' filter so that it matches only the route created by the kernel ('proto kernel'):

--- /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr.orig    2016-11-30 15:43:23.896352263 +0000
+++ /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr 2016-11-30 15:43:39.419058286 +0000
@@ -469,7 +469,7 @@
 }
 
 INTERFACE=`echo $findif_out | awk '{print $1}'`
-NETWORK=`ip route list dev $INTERFACE scope link match $ipaddress|grep -o '^[^ ]*'`
+NETWORK=`ip route list dev $INTERFACE scope link proto kernel match $ipaddress|grep -o '^[^ ]*'`
 
 case $1 in
        start)          srca_start $ipaddress
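To see why the extra 'proto kernel' selector helps, here is a small offline sketch. It feeds the two 192.168.0.0/20 routes from the listing after 'IPsrcaddr start' (with 'dev eth0' included, as in the expected results) through awk filters that approximate the old and new NETWORK= lookups. The function names old_network and new_network are hypothetical, and awk only approximates iproute2's own filtering: the 'match $ipaddress' selector is not reproduced.

```shell
#!/bin/sh
# Offline sketch: emulate the old and new NETWORK= lookups from IPsrcaddr
# on the two local-network routes present after 'IPsrcaddr start'.
# Assumption: awk pattern matching stands in for iproute2's selectors.

routes='192.168.0.0/20 dev eth0 scope link src 192.168.0.253
192.168.0.0/20 dev eth0 proto kernel scope link src 192.168.0.196 metric 100'

# Old lookup: every scope-link route on the interface (first field only,
# like grep -o "^[^ ]*" in the agent).
old_network() {
    printf '%s\n' "$1" | awk '/ scope link/ { print $1 }'
}

# New lookup: additionally require the kernel-created route.
new_network() {
    printf '%s\n' "$1" | awk '/ proto kernel / && / scope link/ { print $1 }'
}

NETWORK_OLD=$(old_network "$routes")
NETWORK_NEW=$(new_network "$routes")

# With the old lookup NETWORK spans two lines, which reproduces the
# malformed command from the stop operation's error message.
printf 'old: ip route replace %s dev eth0\n' "$NETWORK_OLD"
printf 'new: ip route replace %s dev eth0\n' "$NETWORK_NEW"
```

The "old" line prints as 'ip route replace 192.168.0.0/20', a newline, then '192.168.0.0/20 dev eth0', the same text as in the ocf-exit-reason above; the "new" line is a single well-formed command.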

Comment 2 Oyvind Albrigtsen 2016-12-21 14:37:25 UTC
Tested and working patch: https://github.com/ClusterLabs/resource-agents/pull/904

Comment 3 Tzafrir Cohen 2016-12-22 11:46:32 UTC
Thanks for that. However, I realized that things are a bit more complicated. 

The reason we get a duplicate route line is that 'ip route replace' is run with parameters that differ from those of the existing route, so a new route is created instead of the existing one being modified.

We replaced the parsing of the routing table. A new patch will shortly be added.
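One likely candidate for the differing parameter is the metric. This is an assumption about kernel behaviour, not something stated in this report: for IPv4 the kernel identifies a route by its prefix together with TOS, metric and table, so a replace without 'metric 100' would match nothing and add a metric-0 route next to the kernel's metric-100 one (the extra '192.168.0.0/20 scope link' lines above carry no metric). The sketch below uses a hypothetical helper, replace_cmd_for, which is not the attached patch; it rebuilds the command from an existing route line so the metric is carried over, and only prints the command.

```shell
#!/bin/sh
# Hypothetical helper (not the attached patch): turn one line of 'ip route'
# output into an 'ip route replace' command that keeps the route's dev,
# proto, scope and metric and only changes the preferred source address.
# It prints the command instead of running it, so it needs no root.
replace_cmd_for() {
    # $1: a route line, e.g. from 'ip route list dev eth0'; $2: new src
    printf '%s\n' "$1" | awk -v newsrc="$2" '
    {
        cmd = "ip route replace " $1
        for (i = 2; i < NF; i++) {
            # Copy these keyword/value pairs; the old src pair is skipped.
            if ($i == "dev" || $i == "proto" || $i == "scope" || $i == "metric") {
                cmd = cmd " " $i " " $(i + 1)
                i++
            }
        }
        print cmd " src " newsrc
    }'
}

# The kernel route from this report. Keeping "metric 100" is what should
# (under the assumption above) let replace modify it in place.
replace_cmd_for \
    '192.168.0.0/20 dev eth0 proto kernel scope link src 192.168.0.196 metric 100' \
    192.168.0.253
```

This prints 'ip route replace 192.168.0.0/20 dev eth0 proto kernel scope link metric 100 src 192.168.0.253'.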

Comment 4 Tzafrir Cohen 2016-12-22 11:51:33 UTC
Created attachment 1234728 [details]
Fix route editing for IPSrcaddr.sh

Comment 6 Oyvind Albrigtsen 2017-02-24 13:34:05 UTC
(In reply to Tzafrir Cohen from comment #4)
> Created attachment 1234728 [details]
> Fix route editing for IPSrcaddr.sh

Can you send me some more information about your setup?

I don't see any issues when I change only the NETWORK= line (so the first part of my patch seems unnecessary), so that change alone should be enough to solve the issue, unless there is some special case I'm not hitting.

-NETWORK=`ip route list dev $INTERFACE scope link match $ipaddress|grep -o '^[^ ]*'`
+NETWORK=`ip route list dev $INTERFACE scope link proto kernel match $ipaddress|grep -o '^[^ ]*'`

Comment 8 michal novacek 2017-06-15 07:22:29 UTC
I have verified that the source address is correctly added to the default route
when the IPsrcaddr agent is running in resource-agents-3.9.5-104.el7.

----

* configure the cluster with an IPaddr2 resource (vip) and an IPsrcaddr resource (vip-src) in a group, vip-g [1]
* disable the group [2]

before the patch (resource-agents-3.9.5-80.el7)
===============================================

[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip        (ocf::heartbeat:IPaddr2):       Stopped (disabled)
     vip-src    (ocf::heartbeat:IPsrcaddr):     Stopped (disabled)

[root@host-035 ~]# ip ro
default via 10.15.107.254 dev eth0 proto static metric 100
10.15.104.0/22 dev eth0 scope link 
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100 

[root@host-035 ~]# pcs resource enable vip-g
[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip        (ocf::heartbeat:IPaddr2):       Started host-035
     vip-src    (ocf::heartbeat:IPsrcaddr):     Started host-035

[root@host-035 ~]# ip ro
default via 10.15.107.254 dev eth0 proto static metric 100
10.15.104.0/22 dev eth0 scope link 
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100 

after the patch (resource-agents-3.9.5-104.el7)
===============================================

[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip        (ocf::heartbeat:IPaddr2):       Stopped (disabled)
     vip-src    (ocf::heartbeat:IPsrcaddr):     Stopped (disabled)

[root@host-035 ~]# ip ro
default via 10.15.107.254 dev eth0 proto static metric 100 
10.15.104.0/22 dev eth0 scope link 
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100 

[root@host-035 ~]# pcs resource enable vip-g
[root@host-035 ~]# pcs resource
...
 Resource Group: vip-g
     vip        (ocf::heartbeat:IPaddr2):       Started host-035
     vip-src    (ocf::heartbeat:IPsrcaddr):     Started host-035

[root@host-035 ~]# ip ro
default via 10.15.107.254 dev eth0 proto static src 10.15.107.150 metric 100
10.15.104.0/22 dev eth0 scope link src 10.15.107.150 
10.15.104.0/22 dev eth0 proto kernel scope link src 10.15.105.35 metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.35 metric 100

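The check above is done by reading the 'ip ro' output. A scripted version of the same check could look like the following sketch: the hypothetical default_has_src helper (not part of resource-agents) reads a route listing on stdin and succeeds only if the default route carries the expected address as 'src'. The two listings are abbreviated from the "vip-g started" outputs before and after the patch.

```shell
#!/bin/sh
# Sketch: check that the default route uses the virtual IP as its source.
# default_has_src is a hypothetical helper, not part of resource-agents.

# Reads 'ip route' output on stdin; exits 0 only if a "default" route
# has "src <expected address>".
default_has_src() {
    awk -v want="$1" '
    $1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "src" && $(i + 1) == want)
                found = 1
    }
    END { exit (found ? 0 : 1) }'
}

# Default route with vip-g started, before the patch (3.9.5-80.el7).
before='default via 10.15.107.254 dev eth0 proto static metric 100'

# Default and subnet routes with vip-g started, after the patch (3.9.5-104.el7).
after='default via 10.15.107.254 dev eth0 proto static src 10.15.107.150 metric 100
10.15.104.0/22 dev eth0 scope link src 10.15.107.150'

for listing in "$before" "$after"; do
    if printf '%s\n' "$listing" | default_has_src 10.15.107.150; then
        echo "default route uses src 10.15.107.150"
    else
        echo "default route has no src 10.15.107.150"
    fi
done
```

On a live node the same check would be 'ip route show | default_has_src 10.15.107.150'.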

-----

[1] pcs config
[root@host-034 ~]# pcs config
Cluster Name: STSRHTS3691
Corosync Nodes:
 host-034 host-035
Pacemaker Nodes:
 host-034 host-035

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
 Group: vip-g
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=22 ip=10.15.107.150
   Operations: monitor interval=10s timeout=20s (vip-monitor-interval-10s)
               start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
  Resource: vip-src (class=ocf provider=heartbeat type=IPsrcaddr)
   Attributes: cidr_netmask=22 ipaddress=10.15.107.150
   Operations: monitor interval=10 timeout=20s (vip-src-monitor-interval-10)
               start interval=0s timeout=20s (vip-src-start-interval-0s)
               stop interval=0s timeout=20s (vip-src-stop-interval-0s)

Stonith Devices:
 Resource: fence-host-034 (class=stonith type=fence_xvm)
  Attributes: delay=5 pcmk_host_check=static-list pcmk_host_list=host-034 pcmk_host_map=host-034:host-034.virt.lab.msp.redhat.com
  Operations: monitor interval=60s (fence-host-034-monitor-interval-60s)
 Resource: fence-host-035 (class=stonith type=fence_xvm)
  Attributes: pcmk_host_check=static-list pcmk_host_list=host-035 pcmk_host_map=host-035:host-035.virt.lab.msp.redhat.com
  Operations: monitor interval=60s (fence-host-035-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS3691
 dc-version: 1.1.16-10.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1497508285
 no-quorum-policy: freeze

Quorum:
  Options:

[2] pcs status
[root@host-034 ~]# pcs status
Cluster name: STSRHTS3691
Stack: corosync
Current DC: host-035 (version 1.1.16-10.el7-94ff4df) - partition with quorum
Last updated: Thu Jun 15 01:46:02 2017
Last change: Thu Jun 15 01:45:59 2017 by root via cibadmin on host-034

2 nodes configured
8 resources configured (4 DISABLED)

Online: [ host-034 host-035 ]

Full list of resources:

 fence-host-034 (stonith:fence_xvm):    Started host-035
 fence-host-035 (stonith:fence_xvm):    Started host-034
 Clone Set: dlm-clone [dlm]
     Started: [ host-034 host-035 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ host-034 host-035 ]
 Resource Group: vip-g
     vip        (ocf::heartbeat:IPaddr2):       Stopped (disabled)
     vip-src    (ocf::heartbeat:IPsrcaddr):     Stopped (disabled)

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Comment 9 errata-xmlrpc 2017-08-01 14:57:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1844