
Bug 1720153

Summary: nmcli ignores validate_active: true, validate_inactive: true when a team device is being created; see comment #3 for details
Product: Red Hat Enterprise Linux 7
Reporter: Michal Tesar <mtesar>
Component: NetworkManager
Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA
QA Contact: Desktop QE <desktop-qa-list>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.6
CC: atragler, bgalvani, fgiudici, fpokryvk, lrintel, lxin, network-qe, rkhan, sukulkar, thaller, vbenes
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: NetworkManager-1.18.0-4.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-06 13:17:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  [PATCH nm-1-18] libnm-core: fix conversion to json of team watcher flags (flags: none)

Description Michal Tesar 2019-06-13 09:23:10 UTC
Description of problem:

  +-RHEL7.6-machine---------------+
  |                               |
  |         192.168.1.1/24        |
  | +--------------------------+  |
  | |         teamdev0         |  | 
  | | +---------+  +---------+ |  | 
  | | |   em1   |  |   em2   | |  | 
  +---+----+----+--+----+----+----+
           |             |  
           |             |
      +--+-+-+--------+-+-+--+
      |  |   |        |   |  |
      |  +---+ switch +---+  |
      |        +---+         |
      |        |   |         |
      +--------+-+-+---------+
                 |
                 |
  +---------+----+----+-----------+
  |         |   NIC   |           |
  |         +---------+           |
  |        192.168.1.2/24         |
  |                               |
  +-RHEL7.6(ARP TARGET)-machine---+

# nmcli connection add type team con-name team0 ifname teamdev0 config '{"device": "teamdev0","link_watch": {"interval": 1000,"missed_max": 1,"name": "arp_ping","send_always": false,"source_host": "192.168.1.1","target_host": "192.168.1.2","validate_active": true,"validate_inactive": true},"ports": {"em2": {"prio": 100,"sticky": true},"em1": {"prio": 50}},"runner": {"name": "activebackup"}}' ip4 192.168.1.1/24

# nmcli connection add type team-slave con-name team0_em1 ifname em1 master teamdev0
# nmcli connection add type team-slave con-name team0_em2 ifname em2 master teamdev0
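A quick local sanity check (plain Python, nothing NetworkManager-specific; the string below is the `config` argument quoted verbatim from the nmcli command above) is to parse the config and pretty-print its `link_watch` section so the two validate flags stand out:

```python
import json

# The team.config JSON passed to nmcli above, quoted verbatim
config = json.loads(
    '{"device": "teamdev0","link_watch": {"interval": 1000,'
    '"missed_max": 1,"name": "arp_ping","send_always": false,'
    '"source_host": "192.168.1.1","target_host": "192.168.1.2",'
    '"validate_active": true,"validate_inactive": true},'
    '"ports": {"em2": {"prio": 100,"sticky": true},'
    '"em1": {"prio": 50}},"runner": {"name": "activebackup"}}'
)
# Pretty-print the watcher section: both validate flags are requested as true
print(json.dumps(config["link_watch"], indent=2, sort_keys=True))
```

The requested configuration is well-formed JSON and clearly asks for both flags, so whatever drops them happens after nmcli accepts the profile.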

When the ARP target is not answering, the active slave flaps between the two slaves and the fail counter increases on the em2 interface.
When the ARP target starts answering again, everything comes back up and works fine.

Version-Release number of selected component (if applicable):

# teamd --version
teamd 1.27

# yum info libteam
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
Name        : libteam
Arch        : x86_64
Version     : 1.27
Release     : 6.el7_6.1

# uname -r
3.10.0-957.21.2.el7.x86_64

How reproducible:

- unplug the cable to the ARP target and check the team state:

# teamdctl teamdev0 state
setup:
  runner: activebackup
ports:
  em1
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: arp_ping
        link: up
        down count: 0
  em2
    link watches:
      link summary: down
      instance[link_watch_0]:
        name: arp_ping
        link: down
        down count: 17850
runner:
  active port: em1

# teamdctl teamdev0 state
setup:
  runner: activebackup
ports:
  em1
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: arp_ping
        link: up
        down count: 0
  em2
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: arp_ping
        link: up
        down count: 17849 <------------
runner:
  active port: em2

Actual results:
The down count for the active slave increases and the active port flaps between the slaves.

Expected results:
The down count for the sticky slave should not increase and the active slave should not flap.

Additional info:

Reproduced in my local test environment, which is fully accessible to BZ participants on request.

Comment 2 Michal Tesar 2019-06-13 11:43:31 UTC
OK, it now looks like this behaviour is caused by the setting of these two flags:

    "ports": {
        "em1": {
            "ifinfo": {
                "dev_addr": "20:47:47:85:e3:d8",
                "dev_addr_len": 6,
                "ifindex": 2,
                "ifname": "em1"
            },
            "link": {
                "duplex": "full",
                "speed": 1000,
                "up": true
            },
            "link_watches": {
                "list": {
                    "link_watch_0": {
                        "down_count": 0,
                        "init_wait": 0,
                        "interval": 1000,
                        "missed": 0,
                        "missed_max": 1,
                        "name": "arp_ping",
                        "send_always": false,
                        "source_host": "192.168.1.1",
                        "target_host": "192.168.1.2",
                        "up": true,
                        "validate_active": false,  <--------
                        "validate_inactive": false <--------
                    }
                },
                "up": true
            }
        },
        "em2": {
            "ifinfo": {
                "dev_addr": "20:47:47:85:e3:d8",
                "dev_addr_len": 6,
                "ifindex": 3,
                "ifname": "em2"
            },
            "link": {
                "duplex": "full",
                "speed": 1000,
                "up": true
            },
            "link_watches": {
                "list": {
                    "link_watch_0": {
                        "down_count": 397,
                        "init_wait": 0,
                        "interval": 1000,
                        "missed": 2,
                        "missed_max": 1,
                        "name": "arp_ping",
                        "send_always": false,
                        "source_host": "192.168.1.1",
                        "target_host": "192.168.1.2",
                        "up": false,
                        "validate_active": false,    <------
                        "validate_inactive": false   <------

Despite the fact that the connection was created with:

#  nmcli connection add type team con-name team0 ifname teamdev0 config '{"device": "teamdev0","link_watch": {"interval": 1000,"missed_max": 1,"name": "arp_ping","send_always": false,"source_host": "192.168.1.1","target_host": "192.168.1.2","validate_active": true,"validate_inactive": true},"ports": {"em2": {"prio": 100,"sticky": true},"em1": {"prio": 50}},"runner": {"name": "activebackup"}}' ip4 192.168.1.1/24

# ps -ef | grep teamd
root       763  1841  0 13:14 ?        00:00:01 /usr/bin/teamd -o -n -U -D -N -t teamdev0 -c {"device": "teamdev0", "link_watch": {"name": "arp_ping", "interval": 1000, "missed_max": 1, "target_host": "192.168.1.2", "source_host": "192.168.1.1"}, "ports": {"em2": {"prio": 100, "sticky": true}, "em1": {"prio": 50}}, "runner": {"name": "activebackup"}}

So NetworkManager is simply ignoring these two flags.
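Comparing the `link_watch` keys requested via nmcli with the ones that actually reached teamd makes the dropped flags explicit. This is a plain illustration built from the values quoted in this comment and the `ps -ef` output above, not NetworkManager code:

```python
# link_watch as requested in the nmcli 'config' argument
requested = {
    "name": "arp_ping", "interval": 1000, "missed_max": 1,
    "source_host": "192.168.1.1", "target_host": "192.168.1.2",
    "send_always": False,
    "validate_active": True, "validate_inactive": True,
}
# link_watch as it appears in the teamd command line from `ps -ef`
received = {
    "name": "arp_ping", "interval": 1000, "missed_max": 1,
    "target_host": "192.168.1.2", "source_host": "192.168.1.1",
}
dropped = sorted(set(requested) - set(received))
print(dropped)
# -> ['send_always', 'validate_active', 'validate_inactive']
```

Note that send_always is missing from the teamd command line too, which is consistent with the attached patch title referring to the watcher flags in general.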

# rpm -q NetworkManager
NetworkManager-1.12.0-10.el7_6.x86_64

Michal

Comment 3 Michal Tesar 2019-06-13 12:06:33 UTC
Hello,

this works fine on NetworkManager-1.8.0-9.el7.x86_64

# rpm -q NetworkManager
NetworkManager-1.8.0-9.el7.x86_64

#  nmcli connection add type team con-name team0 ifname teamdev0 config '{"device": "teamdev0","link_watch": {"interval": 1000,"missed_max": 1,"name": "arp_ping","send_always": false,"source_host": "192.168.1.1","target_host": "192.168.1.2","validate_active": true,"validate_inactive": true},"ports": {"eth1": {"prio": 100,"sticky": true},"eth2": {"prio": 50}},"runner": {"name": "activebackup"}}' ip4 192.168.1.1/24
Connection 'team0' (bc88db90-585d-4932-8675-3a132698df37) successfully added.

# ps -ef | grep teamd
root     11693   752  0 13:53 ?        00:00:00 /usr/bin/teamd -o -n -U -D -N -t teamdev0 -c {"device": "teamdev0", "hwaddr": "B6:3D:CD:8E:84:CC", "link_watch": {"interval": 1000, "missed_max": 1, "name": "arp_ping", "send_always": false, "source_host": "192.168.1.1", "target_host": "192.168.1.2", "validate_active": true, "validate_inactive": true}, "ports": {"eth1": {"prio": 100, "sticky": true}, "eth2": {"prio": 50}}, "runner": {"name": "activebackup"}}
root     11720 11641  0 13:53 pts/0    00:00:00 grep --color=auto teamd

So there has to be a regression between NetworkManager-1.8.0-9.el7.x86_64 and NetworkManager-1.12.0-10.el7_6.x86_64 that causes these two parameters (and possibly more; I did not check any others) to be ignored.

Michal

Comment 4 Michal Tesar 2019-06-13 12:58:41 UTC
The original slave flapping is caused by the switch broadcasting the ARP request to the other slaves.
The team device then considers such an ARP request with a matching source/target as link up.

This is already fixed in RHEL 8 as well as upstream:

https://lists.fedorahosted.org/archives/list/libteam@lists.fedorahosted.org/thread/MGBS7RG24YUM7VW6UQZWM6JUVZFUNFXZ/

https://bugzilla.redhat.com/show_bug.cgi?id=1663093

So I am switching this BZ from libteam to NetworkManager due to the regression described in comment #3.

Regards, Michal

Comment 5 Thomas Haller 2019-06-13 13:30:10 UTC
This is probably fixed by the recent rework of team handling in libnm on upstream/master (upcoming 1.20). Not that that would help RHEL at this point...

Comment 7 Beniamino Galvani 2019-06-14 14:47:43 UTC
Created attachment 1580707 [details]
[PATCH nm-1-18] libnm-core: fix conversion to json of team watcher flags

Comment 8 Thomas Haller 2019-06-15 06:38:39 UTC
(In reply to Beniamino Galvani from comment #7)
> Created attachment 1580707 [details]
> [PATCH nm-1-18] libnm-core: fix conversion to json of team watcher flags

lgtm

Comment 13 errata-xmlrpc 2019-08-06 13:17:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2302