Bug 1720153
| Field | Value |
|---|---|
| Summary | nmcli ignores validate_active: true, validate_inactive: true when a team device is being created; see comment #3 for details |
| Product | Red Hat Enterprise Linux 7 |
| Component | NetworkManager |
| Version | 7.6 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Keywords | Regression |
| Reporter | Michal Tesar <mtesar> |
| Assignee | Beniamino Galvani <bgalvani> |
| QA Contact | Desktop QE <desktop-qa-list> |
| CC | atragler, bgalvani, fgiudici, fpokryvk, lrintel, lxin, network-qe, rkhan, sukulkar, thaller, vbenes |
| Target Milestone | rc |
| Target Release | --- |
| Fixed In Version | NetworkManager-1.18.0-4.el7 |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2019-08-06 13:17:02 UTC |
| Attachments | [PATCH nm-1-18] libnm-core: fix conversion to json of team watcher flags (attachment 1580707) |
OK, it now looks like this behaviour is caused by these two settings in the runtime team state:
"ports": {
"em1": {
"ifinfo": {
"dev_addr": "20:47:47:85:e3:d8",
"dev_addr_len": 6,
"ifindex": 2,
"ifname": "em1"
},
"link": {
"duplex": "full",
"speed": 1000,
"up": true
},
"link_watches": {
"list": {
"link_watch_0": {
"down_count": 0,
"init_wait": 0,
"interval": 1000,
"missed": 0,
"missed_max": 1,
"name": "arp_ping",
"send_always": false,
"source_host": "192.168.1.1",
"target_host": "192.168.1.2",
"up": true,
"validate_active": false, <--------
"validate_inactive": false <--------
}
},
"up": true
}
},
"em2": {
"ifinfo": {
"dev_addr": "20:47:47:85:e3:d8",
"dev_addr_len": 6,
"ifindex": 3,
"ifname": "em2"
},
"link": {
"duplex": "full",
"speed": 1000,
"up": true
},
"link_watches": {
"list": {
"link_watch_0": {
"down_count": 397,
"init_wait": 0,
"interval": 1000,
"missed": 2,
"missed_max": 1,
"name": "arp_ping",
"send_always": false,
"source_host": "192.168.1.1",
"target_host": "192.168.1.2",
"up": false,
"validate_active": false, <------
"validate_inactive": false <------
This is despite the fact that the connection was created with both flags set to true:
# nmcli connection add type team con-name team0 ifname teamdev0 config '{"device": "teamdev0","link_watch": {"interval": 1000,"missed_max": 1,"name": "arp_ping","send_always": false,"source_host": "192.168.1.1","target_host": "192.168.1.2","validate_active": true,"validate_inactive": true},"ports": {"em2": {"prio": 100,"sticky": true},"em1": {"prio": 50}},"runner": {"name": "activebackup"}}' ip4 192.168.1.1/24
# ps -ef | grep teamd
root 763 1841 0 13:14 ? 00:00:01 /usr/bin/teamd -o -n -U -D -N -t teamdev0 -c {"device": "teamdev0", "link_watch": {"name": "arp_ping", "interval": 1000, "missed_max": 1, "target_host": "192.168.1.2", "source_host": "192.168.1.1"}, "ports": {"em2": {"prio": 100, "sticky": true}, "em1": {"prio": 50}}, "runner": {"name": "activebackup"}
So NetworkManager is simply ignoring these two options: validate_active and validate_inactive never make it into the config passed to teamd (a quick way to cross-check this is sketched below).
# rpm -q NetworkManager
NetworkManager-1.12.0-10.el7_6.x86_64
Michal
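A quick way to narrow down where the options get lost is to compare the profile stored by NetworkManager with the configuration teamd actually runs with. This is only a sketch; the connection name team0 and device name teamdev0 are the ones used above:

# nmcli -g team.config connection show team0
# teamdctl teamdev0 config dump actual

If the validate_active/validate_inactive keys are missing from the teamd side, they are being dropped when NetworkManager generates the configuration it hands to teamd.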
Hello,
This works fine on NetworkManager-1.8.0-9.el7.x86_64:
# rpm -q NetworkManager
NetworkManager-1.8.0-9.el7.x86_64
# nmcli connection add type team con-name team0 ifname teamdev0 config '{"device": "teamdev0","link_watch": {"interval": 1000,"missed_max": 1,"name": "arp_ping","send_always": false,"source_host": "192.168.1.1","target_host": "192.168.1.2","validate_active": true,"validate_inactive": true},"ports": {"eth1": {"prio": 100,"sticky": true},"eth2": {"prio": 50}},"runner": {"name": "activebackup"}}' ip4 192.168.1.1/24
Connection 'team0' (bc88db90-585d-4932-8675-3a132698df37) successfully added.
# ps -ef | grep teamd
root 11693 752 0 13:53 ? 00:00:00 /usr/bin/teamd -o -n -U -D -N -t teamdev0 -c {"device": "teamdev0", "hwaddr": "B6:3D:CD:8E:84:CC", "link_watch": {"interval": 1000, "missed_max": 1, "name": "arp_ping", "send_always": false, "source_host": "192.168.1.1", "target_host": "192.168.1.2", "validate_active": true, "validate_inactive": true}, "ports": {"eth1": {"prio": 100, "sticky": true}, "eth2": {"prio": 50}}, "runner": {"name": "activebackup"}}
root 11720 11641 0 13:53 pts/0 00:00:00 grep --color=auto teamd
So there has to be a regression between NetworkManager-1.8.0-9.el7.x86_64 and NetworkManager-1.12.0-10.el7_6.x86_64 that causes these two parameters (and possibly more; others were not checked) to be ignored (a rough per-build check is sketched below).
Michal
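One way to check any given build for this regression is to look directly at the -c argument NetworkManager passes to teamd. A minimal sketch, assuming a single teamd instance is running for the team device:

# tr '\0' '\n' < /proc/$(pidof teamd)/cmdline | grep -o '"validate_[a-z]*": *[a-z]*'

On NetworkManager-1.8.0-9.el7 this prints both keys with the value true; on NetworkManager-1.12.0-10.el7_6 it prints nothing, because the keys never make it into the generated config.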
The original slave flapping is caused by the switch broadcasting the ARP request to the other slaves; the team device considers such an ARP request with a matching source/target as link up. This is already fixed in RHEL 8 as well as upstream:
https://lists.fedorahosted.org/archives/list/libteam@lists.fedorahosted.org/thread/MGBS7RG24YUM7VW6UQZWM6JUVZFUNFXZ/
https://bugzilla.redhat.com/show_bug.cgi?id=1663093

So I am switching this bz from libteam to NetworkManager due to the regression described in comment #3.
Regards
Michal

This is probably fixed by the recent rework of team handling in libnm on upstream/master (upcoming 1.20). Not that that would help RHEL at this point...

Created attachment 1580707 [details]
[PATCH nm-1-18] libnm-core: fix conversion to json of team watcher flags

(In reply to Beniamino Galvani from comment #7)
> Created attachment 1580707 [details]
> [PATCH nm-1-18] libnm-core: fix conversion to json of team watcher flags

lgtm

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2302
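As a sketch of how the fix can be verified (assuming the errata package NetworkManager-1.18.0-4.el7 or later is installed and the connection from comment #3 is re-created), the runtime team state should again report both flags as true for each port's link_watch_0 instance:

# rpm -q NetworkManager
# teamdctl teamdev0 state dump | grep validate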
Description of problem:

+-RHEL7.6-machine----------------------+
| 192.168.1.1/24                       |
|             teamdev0                 |
|    +-------+          +-------+      |
|    |  em1  |          |  em2  |      |
+----+---+---+----------+---+---+------+
         |                  |
     +---+------------------+---+
     |          switch          |
     +-------------+------------+
                   |
+------------------+-------------------+
|                 NIC                  |
|            192.168.1.2/24            |
+-RHEL7.6(ARP TARGET)-machine----------+

# nmcli connection add type team con-name team0 ifname teamdev0 config '{"device": "teamdev0","link_watch": {"interval": 1000,"missed_max": 1,"name": "arp_ping","send_always": false,"source_host": "192.168.1.1","target_host": "192.168.1.2","validate_active": true,"validate_inactive": true},"ports": {"em2": {"prio": 100,"sticky": true},"em1": {"prio": 50}},"runner": {"name": "activebackup"}}' ip4 192.168.1.1/24
# nmcli connection add type team-slave con-name team0_em1 ifname em1 master teamdev0
# nmcli connection add type team-slave con-name team0_em2 ifname em2 master teamdev0

When the ARP target is not answering, the active slave flaps between the slaves and the fail counter increases for the em2 interface. When the ARP target starts answering, everything comes up and works fine.

Version-Release number of selected component (if applicable):

# teamd --version
teamd 1.27
# yum info libteam
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
Name    : libteam
Arch    : x86_64
Version : 1.27
Release : 6.el7_6.1
# uname -r
3.10.0-957.21.2.el7.x86_64

How reproducible:
- unplug the cable to the arp target

# teamdctl teamdev0 state
setup:
  runner: activebackup
ports:
  em1
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: arp_ping
        link: up
        down count: 0
  em2
    link watches:
      link summary: down
      instance[link_watch_0]:
        name: arp_ping
        link: down
        down count: 17850
runner:
  active port: em1

# teamdctl teamdev0 state
setup:
  runner: activebackup
ports:
  em1
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: arp_ping
        link: up
        down count: 0
  em2
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: arp_ping
        link: up
        down count: 17849  <------------
runner:
  active port: em2

Actual results:
The active slave's down count increases and the active port flaps between the slaves.

Expected results:
The down count for the sticky slave does not increase and there is no active slave flapping.

Additional info:
Reproduced on my local test env. Fully accessible for BZ participants on request.
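For completeness, a small loop such as the following can be used to watch the active-port flapping while the ARP target is down. This is only a sketch and assumes the team device is named teamdev0:

while true; do
    teamdctl teamdev0 state item get runner.active_port
    sleep 1
done

In the failing case the printed port alternates between em1 and em2 instead of staying put.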