Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1183728

Summary: setting MTU=50000 over bond seems to succeed, but leaves system in unrecoverable state
Product: [oVirt] vdsm
Reporter: Michael Burman <mburman>
Component: General
Assignee: Petr Horáček <phoracek>
Status: CLOSED WONTFIX
QA Contact: Michael Burman <mburman>
Severity: medium
Docs Contact:
Priority: medium
Version: ---
CC: alkaplan, bazulay, bugs, danken, gklein, lpeer, lsurette, mburman, myakove, nyechiel, phoracek, srevivo, ykaul, ylavi
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1196271 (view as bug list)
Environment:
Last Closed: 2016-04-18 11:23:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1184005, 1196271    
Attachments:
  logs (flags: none)
  supervdsm log (flags: none)

Description Michael Burman 2015-01-19 15:38:16 UTC
Created attachment 981538 [details]
logs

Description of problem:
An MTU that the host does not support breaks the bond and its slaves, and afterwards no SetupNetworks operation can be performed.
With a bond of 2 slaves configured and a network attached to it, editing that network with a custom MTU the host does not support, say 50000, shows the following in the event log:
(1/1): Applying changes for network(s) topo_net on host navy-vds1.qa.lab.tlv.redhat.com. (User: admin)

But opening SetupNetworks afterwards shows that the bond is broken and the network is detached.

vdsCaps reports that the bond exists but has no slaves, while the GUI shows no bond at all.
From this point on there is no way to perform or approve any operation in SetupNetworks; vdsm keeps reporting a bond that exists without slaves.

Version-Release number of selected component (if applicable):
3.5.0-0.29.el6ev
vdsm-4.16.8.1-5.el7ev.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create Bond with 2 slaves via SetupNetworks and approve operation
2. Create network with default MTU and attach to bond via SetupNetworks
3. Edit the network with a custom unsupported MTU, e.g. 50000

Actual results:
Opening SetupNetworks shows that the bond is broken and the network is detached.
vdsCaps reports the bond without slaves.
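
(The broken state can also be confirmed from the host shell; a minimal check along these lines, with the bond/NIC names following the steps above:)

cat /sys/class/net/bond0/bonding/slaves     # comes back empty once the bond is broken
vdsClient -s 0 getVdsCaps | grep -A 2 bond0 # bond0 is still listed, but with no slaves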

Expected results:
The network should either be reported as un-synced, because the host does not support such an MTU, or the operation should be blocked.

Comment 1 Dan Kenigsberg 2015-01-21 10:00:09 UTC
MainProcess|Thread-1129::ERROR::2015-01-19 16:18:39,493::supervdsmServer::105::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 103, in wrapper
    res = func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer", line 223, in setupNetworks
    return setupNetworks(networks, bondings, **options)
  File "/usr/share/vdsm/network/api.py", line 642, in setupNetworks
    implicitBonding=False, _netinfo=_netinfo)
  File "/usr/share/vdsm/network/api.py", line 226, in wrapped
    ret = func(**attrs)
  File "/usr/share/vdsm/network/api.py", line 434, in delNetwork
    implicitBonding=implicitBonding)
  File "/usr/share/vdsm/network/api.py", line 108, in objectivizeNetwork
    nics, mtu, _netinfo, implicitBonding)
  File "/usr/share/vdsm/network/models.py", line 267, in objectivize
    raise ConfigNetworkError(ne.ERR_BAD_PARAMS, 'Missing required nics'
ConfigNetworkError: (21, 'Missing required nics for bonding device.')

Comment 2 Dan Kenigsberg 2015-01-21 12:00:10 UTC
Michael, would you please try to reproduce this outside of oVirt: define ifcfg-bond0 with MTU 50000 on top of slaves, and run `ifup bond0`. If it fails to create the proper bond0 but ends without an error, open an initscripts bug.

Vdsm should resolve its inconsistency regardless.
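
For reference, a minimal ifcfg sketch of that out-of-oVirt check (NIC names and the bonding options are placeholders, not taken from this host; as comment 3 below notes, the MTU has to sit on the bond itself for the failure to reproduce):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=balance-rr miimon=100"
MTU=50000
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth2 (and the same for ifcfg-eth3)
DEVICE=eth2
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes

# bring the bond up and check whether the slaves were actually enslaved
ifup bond0
cat /sys/class/net/bond0/bonding/slaves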

Comment 3 Michael Burman 2015-01-21 13:36:23 UTC
Dan,

I have reproduced this issue as you asked.
The MTU has to be defined on top of the bond, not on the slaves; then it reproduces.

After configuring the ifcfg files, I ran 'ifup bond0' and got:
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument

cat /sys/class/net/bond0/bonding/slaves
empty

cat /sys/class/net/bond0/
no slaves

So it fails to create the proper bond0 and the slaves are gone, but it ends without an error.

Comment 4 Lior Vernia 2015-01-26 14:39:49 UTC
*** Bug 1184005 has been marked as a duplicate of this bug. ***

Comment 5 Eyal Edri 2015-02-25 08:43:36 UTC
3.5.1 is already full of bugs (over 80), and since none of these bugs were marked as urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.

Comment 7 Michael Burman 2015-04-15 13:58:00 UTC
Tested on -  3.6.0-0.0.master.20150412172306.git55ba764.el6
with vdsm-4.17.0-632.git19a83a2.el7.x86_64

Followed my steps from the Description; failing ON_QA because the system is left in an unrecoverable state.
1) 'rhevm'/'ovirtmgmt' attached to eth0
2) bond0 from eth2 and eth3
3) 'net1' attached to eth1
4) change 'net1' MTU to 50000 (an unsupported MTU)
Result:
- bond0 is broken and 'net1' is detached from eth1
- I was able to attach 'net1' back as an un-synced network, and after changing its MTU back to the default it became synced, but
the bond stays broken: the slaves are down and I can't bring them up or recreate bond0 from those slaves via SN. The operation seems to succeed, but the bond is not created and the slaves stay down.

from vdsCaps:
bondings = {'bond0': {'active_slave': '', 

cat /sys/class/net/bond0/bonding/slaves
empty

engine.log:
2015-04-15 14:34:40,314 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-95) [2a12541a] Command 'GetAllVmStatsVDSCommand(HostName = navy-vds3.qa.lab.tlv.redhat.com, HostId = 900504c9-0397-460c-abc0-2346f825de35, vds=Host[navy-vds3.qa.lab.tlv.redhat.com,900504c9-0397-460c-abc0-2346f825de35])' execution failed: VDSGenericException: VDSNetworkException: Policy reset
2015-04-15 14:34:40,336 ERROR [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler_Worker-95) [2a12541a] Failed to invoke scheduled method vmsMonitoring: null


no errors in vdsm.log

Comment 8 Michael Burman 2015-04-16 06:01:22 UTC
Hi Petr, please note that a reboot brings the server back to a recoverable state: the host comes up with its slaves and bond0 up as well, and it is possible to attach networks to bond0, break it, or approve any SN operations.

Comment 9 Dan Kenigsberg 2015-04-21 09:03:04 UTC
I expected that it would be possible to fix the broken bond via a 
follow-up setupNetwork command with sane MTUs.

Michael, could you attach supervdsm.log of a failure to set the sane MTUs? Or does it fail in the Engine?
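
Such a follow-up attempt could be as simple as resubmitting the network with a sane MTU over the existing bond, along the lines of the vdsClient syntax used later in comment 12 (a sketch only; 'net1', 'bond0' and mtu:1500 follow the Description's topology and are illustrative, not taken from the failing host):

vdsClient -s 0 setupNetworks "networks={net1:{bonding:bond0,mtu:1500}}"
# afterwards the bond should again list its slaves
cat /sys/class/net/bond0/bonding/slaves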

Comment 10 Michael Burman 2015-04-22 05:22:30 UTC
Hi Dan

There is actually no failure in supervdsm.log at all; judging from supervdsm.log the bond was created successfully, but it is not. The slaves are down and I can't bring them up.

caps report:

bondings = {'bond0': {'active_slave': '', 

attaching supervdsm.log

Comment 11 Michael Burman 2015-04-22 05:23:07 UTC
Created attachment 1017232 [details]
supervdsm log

Comment 12 Petr Horáček 2015-07-13 13:57:31 UTC
(In reply to Michael Burman from comment #7)
> Tested on -  3.6.0-0.0.master.20150412172306.git55ba764.el6
> with vdsm-4.17.0-632.git19a83a2.el7.x86_64
> 
> Followed my steps from the Description; failing ON_QA because the system is
> left in an unrecoverable state.
> 1) 'rhevm'/'ovirtmgmt' attached to eth0
> 2) bond0 from eth2 and eth3
> 3) 'net1' attached to eth1
> 4) change 'net1' MTU to 50000 (an unsupported MTU)
> Result:
> - bond0 is broken and 'net1' is detached from eth1
> - I was able to attach 'net1' back as an un-synced network, and after changing
> its MTU back to the default it became synced, but
> the bond stays broken: the slaves are down and I can't bring them up or
> recreate bond0 from those slaves via SN. The operation seems to succeed, but
> the bond is not created and the slaves stay down.
> 
> from vdsCaps:
> bondings = {'bond0': {'active_slave': '', 
> 
> cat /sys/class/net/bond0/bonding/slaves
> empty

Hi Michael,
I tested it on vdsm-4.17.0-1127.git91c5728.el7 with commands:

$ vdsClient -s 0 setupNetworks "bondings={bond12:{nics:eth0}}"
Done
$ vdsClient -s 0 setupNetworks "networks={ovirtmgmt:{bonding:bond12,bootproto:dhcp}}"
Done
$ vdsClient -s 0 setupNetworks "networks={ovirtmgmt:{bonding:bond12,bootproto:dhcp,mtu:50000}}"
Done
$ cat /sys/class/net/bond12/bonding/slaves 
eth0 
$ ping www.google.com
PING www.google.com (216.58.196.132) 56(84) bytes of data.
64 bytes from sin01s18-in-f132.1e100.net (216.58.196.132): icmp_seq=1 ttl=46 time=304 ms
64 bytes from sin01s18-in-f4.1e100.net (216.58.196.132): icmp_seq=2 ttl=46 time=304 ms
^C
--- www.google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 304.497/304.578/304.660/0.557 ms

It seems OK to me; could you try it again? Maybe it was fixed by another patch, maybe it fails only with the Engine, or maybe I did something wrong.

Comment 13 Michael Burman 2015-07-13 14:39:00 UTC
Hi Petr,

Same as described in comment 7 above ^^, same result.

Tested with vdsm-4.17.0-1054.git562e711.el7.noarch

Comment 14 Michael Burman 2015-07-13 14:59:51 UTC
Rebooting does not solve this; the system stays in an unrecoverable state.

Comment 15 Petr Horáček 2015-10-15 12:19:35 UTC
I tried to reproduce it again:

$ yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
$ yum install vdsm
$ vdsm-tool configure --force
$ service vdsmd start

$ python
>>> from vdsm import vdscli
>>> c = vdscli.connect()
>>> # 1) 'rhevm'/'ovirtmgmt' attached to eth0
>>> c.setupNetworks({'ovirtmgmt': {'nic': 'eth0', 'bootproto': 'dhcp', 'bridged': True}}, {}, {'connectivityCheck': False})
>>> # 2) bond0 from eth2 and eth3
>>> c.setupNetworks({}, {'bond0': {'nics': ['ens10','ens11']}}, {'connectivityCheck': False})
>>> # 3) 'net1' attached to eth1
>>> c.setupNetworks({'net1': {'nic': 'ens9', 'bridged': True}}, {}, {'connectivityCheck': False})
>>> # 4) change 'net1' MTU to 50000(unsupported MTU)
>>> c.setupNetworks({'net1': {'nic': 'ens9', 'bridged': True, 'mtu': 50000}}, {}, {'connectivityCheck': False})


Everything is OK. The MTU is unchanged; when I set the MTU of net1 to 500 and then to 50000, it ends up set to 1500. The bond has its slaves and the Internet is reachable.

I'll try to reproduce it with Engine.

Comment 16 Petr Horáček 2015-10-16 11:10:54 UTC
I tried the flow described in comment 7 with the Engine and it seems to be OK. Instead of eth0 I used ens3, and instead of eth1-eth3 I used dummy_1-dummy_3.

[root@10-34-60-1 ~]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovirtmgmt state UP mode DEFAULT qlen 1000
    link/ether 00:1a:4a:d0:40:08 brd ff:ff:ff:ff:ff:ff
3: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT 
    link/ether 12:8a:d9:2b:9c:81 brd ff:ff:ff:ff:ff:ff
4: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT 
    link/ether ee:6c:c3:52:90:bb brd ff:ff:ff:ff:ff:ff
5: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT 
    link/ether 00:1a:4a:d0:40:08 brd ff:ff:ff:ff:ff:ff
6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT 
    link/ether 1a:0f:52:7e:10:74 brd ff:ff:ff:ff:ff:ff
10: dummy_1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 50000 qdisc noqueue master net1 state UNKNOWN mode DEFAULT 
    link/ether 4a:5b:33:6a:3b:e6 brd ff:ff:ff:ff:ff:ff
12: dummy_2: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN mode DEFAULT 
    link/ether 12:8a:d9:2b:9c:81 brd ff:ff:ff:ff:ff:ff
13: dummy_3: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN mode DEFAULT 
    link/ether 12:8a:d9:2b:9c:81 brd ff:ff:ff:ff:ff:ff
14: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 50000 qdisc noqueue state UP mode DEFAULT 
    link/ether 4a:5b:33:6a:3b:e6 brd ff:ff:ff:ff:ff:ff


ovirt-engine-3.6.0.1-1.el7.centos.noarch
ovirt-release36-001-0.5.beta.noarch
vdsm-4.17.9-0.el7.centos.noarch

Maybe it's OK because of the dummies, or it was fixed. Maybe I missed something; should I try other versions?

Comment 17 Petr Horáček 2015-10-16 11:36:55 UTC
I forgot about:
initscripts-9.49.30-1.el7.x86_64

Comment 18 Michael Burman 2015-10-18 07:37:54 UTC
Hi Petr,

This bug is easily reproduced with my steps from the description above.
It's not fixed at all; the actual result is still very much relevant.

bondings = {'bond0': {'active_slave': '',
The network that was attached to bond0 is not reported as un-synced.
bond0 is broken and can't be created again. Rebooting the server fixes this.

You can contact me and I will show it to you on my setup.
Tested on vdsm-4.17.9-1.el7ev.noarch and rhevm-3.6.0.1-0.1.el6.noarch.
This bug shouldn't be ON_QA; please assign it back to yourself, thanks.

Comment 19 Michael Burman 2015-10-18 07:58:45 UTC
Correcting myself: a reboot does not fix it; the host stays in an unrecoverable state.

Comment 21 Yaniv Kaul 2015-11-22 15:34:48 UTC
(In reply to Michael Burman from comment #8)
> Hi Petr, please note that a reboot brings the server back to a recoverable
> state: the host comes up with its slaves and bond0 up as well, and it is
> possible to attach networks to bond0, break it, or approve any SN operations.

Based on the above comment and the fact that this is a negative case, moving to medium severity.

Comment 22 Michael Burman 2015-11-23 06:29:42 UTC
Hi

To be clear, rebooting does not fix the issue; the host stays in an unrecoverable state.
See comments 14 and 19 above.

Comment 23 Red Hat Bugzilla Rules Engine 2015-12-02 00:25:04 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 24 Dan Kenigsberg 2016-04-18 11:23:29 UTC
Burman, I am sorry, but realistically we are not going to make it.

Comment 25 Red Hat Bugzilla 2023-09-14 23:58:08 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days