1504055 – IPsrcaddr fails to stop when ipaddress is on interface not managed by NetworkManager

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	resource-agents
Sub Component:
Version:	7.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Oyvind Albrigtsen
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1546815 1647927 1757837

Reported:	2017-10-19 12:04 UTC by Ondrej Faměra
Modified:	2020-03-06 15:14 UTC
CC List:	11 users
Fixed In Version:	resource-agents-4.1.1-22.el7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1757837
Environment:
Last Closed:	2019-08-06 12:01:36 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	3551741	0	None	None	None	2018-08-03 16:55:00 UTC
Red Hat Product Errata	RHBA-2019:2012	0	None	None	None	2019-08-06 12:02:01 UTC

Description Ondrej Faměra 2017-10-19 12:04:31 UTC

== Description of problem:
When the interface is not managed by NetworkManager, the IPsrcaddr resource fails
to stop after a successful IPsrcaddr start and monitor.

== Version-Release number of selected component (if applicable):
resource-agents-3.9.5-105.el7 

== How reproducible:
Always

== Steps to Reproduce:
1. Configure interface ensX with IPADDR, GATEWAY and NM_CONTROLLED="no" (for example IPADDR=10.0.0.85, PREFIX=24, GATEWAY=10.0.0.1)
2. Configure an IPaddr2 resource (vip) in the cluster with an IP y.y.y.y that is in the same subnet as ensX but differs from IPADDR and GATEWAY (for example y.y.y.y=10.0.0.10)
3. Configure an IPsrcaddr resource (vip_route) in the cluster to use the IPaddr2 address as the source address of the default route (ipaddress=10.0.0.10 cidr_netmask=24)
4. Disable both resources, then run debug-start, debug-monitor and debug-stop on them on a single node
  # pcs resource disable vip
  # pcs resource disable vip_route
  # pcs resource debug-start vip
  # pcs resource debug-start vip_route
  # pcs resource debug-monitor vip_route
  # pcs resource debug-stop vip_route
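
Steps 1 to 3 could look roughly like the sketch below. The interface name ens7, the extra ifcfg keys (TYPE, BOOTPROTO, ONBOOT) and the function names are assumptions made for illustration; only IPADDR, PREFIX, GATEWAY, NM_CONTROLLED and the resource parameters come from this report.

```shell
#!/bin/sh
# Print an ifcfg-style body for an interface that NetworkManager must not manage.
# Arguments: device, address, prefix length, gateway.
render_ifcfg() {
    cat <<EOF
DEVICE=$1
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
IPADDR=$2
PREFIX=$3
GATEWAY=$4
NM_CONTROLLED="no"
EOF
}

# Steps 2 and 3. Wrapped in a function so that nothing touches the cluster
# until it is called explicitly on a cluster node.
create_resources() {
    pcs resource create vip ocf:heartbeat:IPaddr2 ip=10.0.0.10 cidr_netmask=24
    pcs resource create vip_route ocf:heartbeat:IPsrcaddr ipaddress=10.0.0.10 cidr_netmask=24
}

# Step 1: print the file body. As root it could be redirected to
# /etc/sysconfig/network-scripts/ifcfg-ens7.
render_ifcfg ens7 10.0.0.85 24 10.0.0.1
```

Calling create_resources on a cluster node would then create both resources before step 4 disables them.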

== Actual results:
  # pcs resource debug-start vip
  Operation start for vip (ocf:heartbeat:IPaddr2) returned 0

  # pcs resource debug-start vip_route
  Operation start for vip_route (ocf:heartbeat:IPsrcaddr) returned 0

  # pcs resource debug-monitor vip_route
  Operation monitor for vip_route (ocf:heartbeat:IPsrcaddr) returned 0

  # pcs resource debug-stop vip_route
Error performing operation: Operation not permitted
Operation stop for vip_route (ocf:heartbeat:IPsrcaddr) returned 1
 >  stderr: Usage: ip route { list | flush } SELECTOR
 >  stderr:        ip route save SELECTOR
 >  stderr:        ip route restore
 >  stderr:        ip route showdump
 >  stderr:        ip route get ADDRESS [ from ADDRESS iif STRING ]
 >  stderr:                             [ oif STRING ]  [ tos TOS ]
 >  stderr:                             [ mark NUMBER ]
 >  stderr:        ip route { add | del | change | append | replace } ROUTE
 >  stderr: SELECTOR := [ root PREFIX ] [ match PREFIX ] [ exact PREFIX ]
 >  stderr:             [ table TABLE_ID ] [ proto RTPROTO ]
 >  stderr:             [ type TYPE ] [ scope SCOPE ]
 >  stderr: ROUTE := NODE_SPEC [ INFO_SPEC ]
 >  stderr: NODE_SPEC := [ TYPE ] PREFIX [ tos TOS ]
 >  stderr:              [ table TABLE_ID ] [ proto RTPROTO ]
 >  stderr:              [ scope SCOPE ] [ metric METRIC ]
 >  stderr: INFO_SPEC := NH OPTIONS FLAGS [ nexthop NH ]...
 >  stderr: NH := [ via ADDRESS ] [ dev STRING ] [ weight NUMBER ] NHFLAGS
 >  stderr: OPTIONS := FLAGS [ mtu NUMBER ] [ advmss NUMBER ]
 >  stderr:            [ rtt TIME ] [ rttvar TIME ] [reordering NUMBER ]
 >  stderr:            [ window NUMBER ] [ cwnd NUMBER ] [ initcwnd NUMBER ]
 >  stderr:            [ ssthresh NUMBER ] [ realms REALM ] [ src ADDRESS ]
 >  stderr:            [ rto_min TIME ] [ hoplimit NUMBER ] [ initrwnd NUMBER ]
 >  stderr:            [ features FEATURES ] [ quickack BOOL ] [ congctl NAME ]
 >  stderr:            [ expires TIME ]
 >  stderr: TYPE := { unicast | local | broadcast | multicast | throw |
 >  stderr:           unreachable | prohibit | blackhole | nat }
 >  stderr: TABLE_ID := [ local | main | default | all | NUMBER ]
 >  stderr: SCOPE := [ host | link | global | NUMBER ]
 >  stderr: NHFLAGS := [ onlink | pervasive ]
 >  stderr: RTPROTO := [ kernel | boot | static | NUMBER ]
 >  stderr: TIME := NUMBER[s|ms]
 >  stderr: BOOL := [1|0]
 >  stderr: FEATURES := ecn
 >  stderr: ocf-exit-reason:command 'ip route replace  dev ensX' failed

== Expected results:
  # pcs resource debug-start vip
  Operation start for vip (ocf:heartbeat:IPaddr2) returned 0

  # pcs resource debug-start vip_route
  Operation start for vip_route (ocf:heartbeat:IPsrcaddr) returned 0

  # pcs resource debug-monitor vip_route
  Operation monitor for vip_route (ocf:heartbeat:IPsrcaddr) returned 0

  # pcs resource debug-stop vip_route
  Operation stop for vip_route (ocf:heartbeat:IPsrcaddr) returned 0

== Additional info:
Issue is not reproducible when the interface is managed by NetworkManager.
Below are outputs from 'ip route' command when NetworkManager is and is not in use.

=======
## bond0 without NetworkManager clean start
  default via 10.0.0.1 dev bond0
  10.0.0.0/24 dev bond0 proto kernel scope link src 10.0.0.85

  # after starting IPsrcaddr
  default via 10.0.0.1 dev bond0 src 10.0.0.10
  10.0.0.0/24 dev bond0 scope link src 10.0.0.10

  # attempting to stop IPsrcaddr results in error (exit code 1)

## bond0 with NetworkManager clean start
  default via 10.0.0.1 dev bond0 proto static metric 300  
  10.0.0.0/24 dev bond0 proto kernel scope link src 10.0.0.85 metric 300

  # after starting IPsrcaddr
  default via 10.0.0.1 dev bond0 proto static src 10.0.0.10 metric 300
  10.0.0.0/24 dev bond0 scope link src 10.0.0.10
  10.0.0.0/24 dev bond0 proto kernel scope link src 10.0.0.85 metric 300

  # after stopping the IPsrcaddr
  default via 10.0.0.1 dev bond0 proto static metric 300
  10.0.0.0/24 dev bond0 scope link
  10.0.0.0/24 dev bond0 proto kernel scope link src 10.0.0.85 metric 300
=======

The customer provided a patch containing a workaround that works in their environment (some values, such as the interface name bond1, are hardcoded).

--- /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr 2017-06-23 09:32:28.000000000 +0200
+++ /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr.custom 2017-10-16 15:55:31.617636470 +0200
@@ -172,7 +172,7 @@
                rc=$OCF_SUCCESS
                ocf_log info "The ip route has been already set.($NETWORK, $INTERFACE, $ROUTE_WO_SRC)"
        else
-               ip route replace $NETWORK dev $INTERFACE src $1 || \
+               ip route replace $NETWORK dev $INTERFACE proto kernel scope link src $1 || \
                        errorexit "command 'ip route replace $NETWORK dev $INTERFACE src $1' failed"
 
                $CMDCHANGE $ROUTE_WO_SRC src $1 || \
@@ -204,7 +204,10 @@
          
        [ $rc = 2 ] && errorexit "The address you specified to stop does not match the preferred source address"
 
-       ip route replace $NETWORK dev $INTERFACE || \
+# WORKAROUND !!!!
+       PRIMARY=`ip -4 addr show bond1 primary | fgrep inet | awk -c '{ print $2; }' | cut -f1 -d\/`
+
+       ip route replace $NETWORK dev $INTERFACE proto kernel scope link src ${PRIMARY} || \
                errorexit "command 'ip route replace $NETWORK dev $INTERFACE' failed"
 
        $CMDCHANGE $ROUTE_WO_SRC || \
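
The workaround hardcodes bond1 when looking up the primary address. A minimal sketch of the same lookup without the hardcoded name follows, assuming the one-line output format of `ip -4 -o addr show`; the function name and the sample line are hypothetical, and this is not the fix that was eventually merged.

```shell
#!/bin/sh
# Read `ip -4 -o addr show dev IFACE primary` output on stdin and print the
# first IPv4 address without its prefix length.
# In the one-line format the fields are: index, device, "inet", address/prefix.
primary_ipv4_from_addr_output() {
    awk '$3 == "inet" { split($4, parts, "/"); print parts[1]; exit }'
}

# Sample line shaped like the one-line output (values taken from this report).
sample='2: ens7    inet 10.0.0.85/24 brd 10.0.0.255 scope global ens7\       valid_lft forever preferred_lft forever'
printf '%s\n' "$sample" | primary_ipv4_from_addr_output
```

On a live node the function would be fed by `ip -4 -o addr show dev "$INTERFACE" primary`, so the interface found by the agent is used instead of a fixed name.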


Additional tests:
- test with the NetworkManager service completely off
  - no change in behaviour (systemctl disable NetworkManager and reboot, verified that NM was stopped)
- test adding the same route that NetworkManager adds, to see whether stop then works
  - adding the route _after_ starting IPsrcaddr gets us into a state from which things start to work correctly.
    However, we cannot add the same route _before_ starting IPsrcaddr, as the start of IPsrcaddr would then fail
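
For reference, the route in question is the `proto kernel` entry shown in the NetworkManager listing earlier in this report. The sketch below only builds the command text (a dry run); the function name is hypothetical, and whether the test used `ip route add` or another subcommand is an assumption.

```shell
#!/bin/sh
# Build (but do not run) a command that adds a kernel-style link route.
# Arguments: network/prefix, device, primary address, metric.
nm_style_route_cmd() {
    echo "ip route add $1 dev $2 proto kernel scope link src $3 metric $4"
}

nm_style_route_cmd 10.0.0.0/24 bond0 10.0.0.85 300
```

The printed command would have to be run as root, and only after IPsrcaddr has been started, as noted in the test result.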

The following commands seem to be causing the failure (shown with their line numbers in the agent):
207         ip route replace $NETWORK dev $INTERFACE || \

471 INTERFACE=`echo $findif_out | awk '{print $1}'`
472 NETWORK=`ip route list dev $INTERFACE scope link proto kernel match $ipaddress|grep -o '^[^ ]*'`

- In the scenario where it fails we see the following
>  stderr: + 17:02:43: srca_stop:207: ip route replace dev ens7

- In the scenario where it works we see the following
>  stderr: + 17:03:05: srca_stop:207: ip route replace 10.0.0.0/24 dev ens7

So it seems that $NETWORK is empty in the failing case. The outputs below show how NETWORK is determined in each scenario
# ip route
default via 10.0.0.1 dev ens7 src 10.0.0.10 
10.0.0.0/24 dev ens7 scope link src 10.0.0.10 
...
 >  stderr: ++ 17:02:43: 472: ip route list dev ens7 scope link proto kernel match 10.0.0.10
 >  stderr: ++ 17:02:43: 472: grep -o '^[^ ]*'
 >  stderr: + 17:02:43: 472: NETWORK=

# ip route
default via 10.0.0.1 dev ens7 src 10.0.0.10 
10.0.0.0/24 dev ens7 scope link src 10.0.0.10 
10.0.0.0/24 dev ens7 proto kernel scope link src 10.0.0.85 metric 100 
...
 >  stderr: ++ 17:08:13: 472: ip route list dev ens7 scope link proto kernel match 10.0.0.10
 >  stderr: ++ 17:08:13: 472: grep -o '^[^ ]*'
 >  stderr: + 17:08:13: 472: NETWORK=10.0.0.0/24

So it looks like NETWORK is not detected when the routing table has no "proto kernel" route for the subnet, as in the first output above
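
The two cases can be reproduced offline with the sketch below. It only approximates line 472: instead of calling `ip route list ... scope link proto kernel match ...`, it filters saved `ip route` text for lines carrying both `proto kernel` and `scope link`, and it ignores the `match` prefix logic. The function names and the guard are hypothetical and are not taken from the agent.

```shell
#!/bin/sh
# Approximate the NETWORK lookup on saved `ip route` output read from stdin:
# keep link-scope routes installed by the kernel and print their destination.
detect_network() {
    awk '/ proto kernel / && / scope link / { print $1 }'
}

# Hypothetical guard: refuse to continue with an empty NETWORK instead of
# running `ip route replace  dev IFACE` with a missing argument.
require_network() {
    if [ -z "$1" ]; then
        echo "could not determine NETWORK" >&2
        return 1
    fi
}

# Routing table from the failing case (no proto kernel route left).
without_kernel_route='default via 10.0.0.1 dev ens7 src 10.0.0.10
10.0.0.0/24 dev ens7 scope link src 10.0.0.10'

# Routing table from the working case.
with_kernel_route='default via 10.0.0.1 dev ens7 src 10.0.0.10
10.0.0.0/24 dev ens7 scope link src 10.0.0.10
10.0.0.0/24 dev ens7 proto kernel scope link src 10.0.0.85 metric 100'

failing=$(printf '%s\n' "$without_kernel_route" | detect_network)
working=$(printf '%s\n' "$with_kernel_route" | detect_network)

echo "failing case: NETWORK='$failing'"
echo "working case: NETWORK='$working'"
require_network "$failing" || echo "stop would fail here"
```

With the first table the filter finds nothing, which matches the empty NETWORK in the trace; with the second it yields 10.0.0.0/24.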

Comment 2 Oyvind Albrigtsen 2017-11-01 15:11:27 UTC

Bumping to 7.6.

Comment 11 Oyvind Albrigtsen 2019-04-05 08:43:27 UTC

https://github.com/ClusterLabs/resource-agents/pull/1311

Comment 14 errata-xmlrpc 2019-08-06 12:01:36 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2012