Bug 1183728
| Summary: | setting MTU=50000 over bond seems to succeed, but leaves system in unrecoverable state | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Michael Burman <mburman> |
| Component: | General | Assignee: | Petr Horáček <phoracek> |
| Status: | CLOSED WONTFIX | QA Contact: | Michael Burman <mburman> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | --- | CC: | alkaplan, bazulay, bugs, danken, gklein, lpeer, lsurette, mburman, myakove, nyechiel, phoracek, srevivo, ykaul, ylavi |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1196271 (view as bug list) | Environment: | |
| Last Closed: | 2016-04-18 11:23:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1184005, 1196271 | | |
| Attachments: | | | |
MainProcess|Thread-1129::ERROR::2015-01-19 16:18:39,493::supervdsmServer::105::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
File "/usr/share/vdsm/supervdsmServer", line 103, in wrapper
res = func(*args, **kwargs)
File "/usr/share/vdsm/supervdsmServer", line 223, in setupNetworks
return setupNetworks(networks, bondings, **options)
File "/usr/share/vdsm/network/api.py", line 642, in setupNetworks
implicitBonding=False, _netinfo=_netinfo)
File "/usr/share/vdsm/network/api.py", line 226, in wrapped
ret = func(**attrs)
File "/usr/share/vdsm/network/api.py", line 434, in delNetwork
implicitBonding=implicitBonding)
File "/usr/share/vdsm/network/api.py", line 108, in objectivizeNetwork
nics, mtu, _netinfo, implicitBonding)
File "/usr/share/vdsm/network/models.py", line 267, in objectivize
raise ConfigNetworkError(ne.ERR_BAD_PARAMS, 'Missing required nics'
ConfigNetworkError: (21, 'Missing required nics for bonding device.')
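
The ConfigNetworkError above is raised while vdsm objectivizes the existing bond inside delNetwork: the bond device is still listed, but no slave nics can be resolved for it. A rough, simplified illustration of that kind of check follows (placeholder names; not the actual vdsm source):

ERR_BAD_PARAMS = 21  # matches the error code shown in the traceback

class ConfigNetworkError(Exception):
    def __init__(self, err_code, message):
        super(ConfigNetworkError, self).__init__(err_code, message)
        self.err_code = err_code
        self.message = message

def objectivize_bond(name, nics, _netinfo):
    """Resolve the slave nics of a bond before (re)configuring or removing it."""
    if not nics:
        # Fall back to the slaves the running configuration reports.
        nics = _netinfo.get('bondings', {}).get(name, {}).get('slaves', [])
    if not nics:
        # The state this bug leaves behind: bond0 exists, but its slave list
        # is empty, so even tearing the network down fails with this error.
        raise ConfigNetworkError(ERR_BAD_PARAMS,
                                 'Missing required nics for bonding device.')
    return {'name': name, 'nics': nics}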
Michael, would you please try to reproduce this out of oVirt: define ifcfg-bond0 with MTU 50000 on top of the slaves, and run `ifup bond0`. If it fails to create the proper bond0 but ends without an error, open an initscripts bug. Vdsm should resolve its inconsistency regardless.

Dan, I have reproduced this issue as you asked. The MTU should be defined on top of the bond, not the slaves, and then it reproduces. After configuring the ifcfg files, I ran 'ifup bond0' and got:

RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
cat /sys/class/net/bond0/bonding/slaves
empty
cat /sys/class/net/bond0/
no slaves

So it fails to create the proper bond0 and the slaves are gone, but it ends without an error.

*** Bug 1184005 has been marked as a duplicate of this bug. ***

3.5.1 is already full of bugs (over 80), and since none of these bugs were marked urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.

Tested on 3.6.0-0.0.master.20150412172306.git55ba764.el6
with vdsm-4.17.0-632.git19a83a2.el7.x86_64

Followed my steps from the Description and it failed ON_QA, because it leaves the system in an unrecoverable state.
1) 'rhevm'/'ovirtmgmt' attached to eth0
2) bond0 from eth2 and eth3
3) 'net1' attached to eth1
4) change 'net1' MTU to 50000 (unsupported MTU)
Result:
- bond0 is broken, 'net1' detached from eth1
- I was able to attach 'net1' back as an unsynced network, and when I changed the MTU back to the default on 'net1' it became synced, but
the bond is still broken: the slaves are down and I can't bring them up or re-create bond0 from those slaves via Setup Networks. The operation seems to succeed, but the bond is not created and the slaves stay down.
from vdsCaps:
bondings = {'bond0': {'active_slave': '',
cat /sys/class/net/bond0/bonding/slaves
empty
engine.log:
2015-04-15 14:34:40,314 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-95) [2a12541a] Command 'GetAllVmStatsVDSCommand(HostName = navy-vds3.qa.lab.tlv.redhat.com, HostId = 900504c9-0397-460c-abc0-2346f825de35, vds=Host[navy-vds3.qa.lab.tlv.redhat.com,900504c9-0397-460c-abc0-2346f825de35])' execution failed: VDSGenericException: VDSNetworkException: Policy reset
2015-04-15 14:34:40,336 ERROR [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler_Worker-95) [2a12541a] Failed to invoke scheduled method vmsMonitoring: null
no errors in vdsm.log
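
For reference, the state reported here (bond present, slave list empty) can be read straight from sysfs; a minimal sketch, assuming the bond is named bond0 as above:

import os

def bond_state(bond='bond0'):
    """Read the bonding sysfs attributes that vdsCaps ultimately reflects."""
    base = '/sys/class/net/' + bond
    if not os.path.isdir(base):
        return None  # the bond device does not exist at all
    state = {}
    for attr in ('bonding/slaves', 'mtu', 'operstate'):
        with open(os.path.join(base, attr)) as f:
            state[attr] = f.read().split()
    return state

print(bond_state('bond0'))
# In the broken state described above, 'bonding/slaves' comes back empty
# even though the bond0 device itself still exists.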
Hi Petr, please note that a reboot brings the server back to a recoverable state: the host is up, with the slaves and bond0 up as well, and it's possible to attach networks to bond0, break it, or approve any Setup Networks operations.

I expected that it would be possible to fix the broken bond via a follow-up setupNetworks command with sane MTUs. Michael, could you attach the supervdsm.log of a failure to set the sane MTUs? Or does it fail in the Engine?

Hi Dan,
There is actually no failure in supervdsm.log, no failures at all; from supervdsm.log it seems the bond has been successfully created, but it is not. The slaves are down and I can't bring them up.
caps report:
bondings = {'bond0': {'active_slave': '',
attaching supervdsm.log
Created attachment 1017232 [details]
supervdsm log
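
The kind of follow-up setupNetworks with sane MTUs that Dan refers to above would look roughly like this through vdscli (the same client API used in the next comment); the net1/bond0/eth* names come from the reproduction steps and are otherwise examples:

from vdsm import vdscli

c = vdscli.connect()

# Re-apply the default MTU to net1 and re-create bond0 from its original
# slaves in a single setupNetworks call; connectivity check disabled as in
# the reproduction sessions in this bug.
c.setupNetworks(
    {'net1': {'nic': 'eth1', 'bridged': True, 'mtu': 1500}},
    {'bond0': {'nics': ['eth2', 'eth3']}},
    {'connectivityCheck': False})

Per the comments above, on an affected build a call like this reports success but still leaves bond0 without slaves.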
(In reply to Michael Burman from comment #7)

Hi Michael, I tested it on vdsm-4.17.0-1127.git91c5728.el7 with these commands:

$ vdsClient -s 0 setupNetworks "bondings={bond12:{nics:eth0}}"
Done
$ vdsClient -s 0 setupNetworks "networks={ovirtmgmt:{bonding:bond12,bootproto:dhcp}}"
Done
$ vdsClient -s 0 setupNetworks "networks={ovirtmgmt:{bonding:bond12,bootproto:dhcp,mtu:50000}}"
Done
$ cat /sys/class/net/bond12/bonding/slaves
eth0
$ ping www.google.com
PING www.google.com (216.58.196.132) 56(84) bytes of data.
64 bytes from sin01s18-in-f132.1e100.net (216.58.196.132): icmp_seq=1 ttl=46 time=304 ms
64 bytes from sin01s18-in-f4.1e100.net (216.58.196.132): icmp_seq=2 ttl=46 time=304 ms
^C
--- www.google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 304.497/304.578/304.660/0.557 ms

It looks OK to me; could you try it again? Maybe it was fixed by another patch, maybe it fails only with the Engine, or maybe I did something wrong.

Hi Petr,
Same as described in comment 7 above ^^, same result. Tested with vdsm-4.17.0-1054.git562e711.el7.noarch. A reboot does not solve this; the system stays in an unrecoverable state.

I tried to reproduce it again:

$ yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
$ yum install vdsm
$ vdsm-tool configure --force
$ service vdsmd start
$ python
>>> from vdsm import vdscli
>>> c = vdscli.connect()
>>> # 1) 'rhevm'/'ovirtmgmt' attached to eth0
>>> c.setupNetworks({'ovirtmgmt': {'nic': 'eth0', 'bootproto': 'dhcp', 'bridged': True}}, {}, {'connectivityCheck': False})
>>> # 2) bond0 from eth2 and eth3
>>> c.setupNetworks({}, {'bond0': {'nics': ['ens10','ens11']}}, {'connectivityCheck': False})
>>> # 3) 'net1' attached to eth1
>>> c.setupNetworks({'net1': {'nic': 'ens9', 'bridged': True}}, {}, {'connectivityCheck': False})
>>> # 4) change 'net1' MTU to 50000 (unsupported MTU)
>>> c.setupNetworks({'net1': {'nic': 'ens9', 'bridged': True, 'mtu': 50000}}, {}, {'connectivityCheck': False})

Everything is OK. The MTU is unchanged; when I set the MTU of net1 to 500 and then to 50000, it ends up at 1500. The bond has its slaves and the Internet is reachable. I'll try to reproduce it with the Engine.

I tried the flow described in comment 7 with the Engine and it seems to be OK. Instead of eth0 I used ens3, and instead of eth1-eth3 I used dummy_1-dummy_3.
[root@10-34-60-1 ~]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovirtmgmt state UP mode DEFAULT qlen 1000 link/ether 00:1a:4a:d0:40:08 brd ff:ff:ff:ff:ff:ff
3: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT link/ether 12:8a:d9:2b:9c:81 brd ff:ff:ff:ff:ff:ff
4: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT link/ether ee:6c:c3:52:90:bb brd ff:ff:ff:ff:ff:ff
5: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT link/ether 00:1a:4a:d0:40:08 brd ff:ff:ff:ff:ff:ff
6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT link/ether 1a:0f:52:7e:10:74 brd ff:ff:ff:ff:ff:ff
10: dummy_1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 50000 qdisc noqueue master net1 state UNKNOWN mode DEFAULT link/ether 4a:5b:33:6a:3b:e6 brd ff:ff:ff:ff:ff:ff
12: dummy_2: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN mode DEFAULT link/ether 12:8a:d9:2b:9c:81 brd ff:ff:ff:ff:ff:ff
13: dummy_3: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN mode DEFAULT link/ether 12:8a:d9:2b:9c:81 brd ff:ff:ff:ff:ff:ff
14: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 50000 qdisc noqueue state UP mode DEFAULT link/ether 4a:5b:33:6a:3b:e6 brd ff:ff:ff:ff:ff:ff

ovirt-engine-3.6.0.1-1.el7.centos.noarch
ovirt-release36-001-0.5.beta.noarch
vdsm-4.17.9-0.el7.centos.noarch

Maybe it's OK because of the dummies, or maybe it was fixed. Maybe I missed something; should I try other versions?

I forgot about: initscripts-9.49.30-1.el7.x86_64

Hi Petr,
This bug is easily reproduced with my steps from the description above ^^.
It's not fixed at all. The actual result is still very relevant.
bondings = {'bond0': {'active_slave': '',
The network that was attached to bond0 is not reported as unsynced.
bond0 is broken and can't be created again. Rebooting the server fixes this.
You can contact me and I will show it to you on my setup.
Tested on vdsm-4.17.9-1.el7ev.noarch and rhevm-3.6.0.1-0.1.el6.noarch
This bug shouldn't be ON_QA. Please assign it back to yourself, thanks.
Correcting myself: a reboot is not fixing it, the host stays in an unrecoverable state.

(In reply to Michael Burman from comment #8)
> Hi Petr, please note that a reboot brings the server back to a recoverable state: the host is up, with the slaves and bond0 up as well, and it's possible to attach networks to bond0, break it, or approve any Setup Networks operations.

Based on the above comment and the fact that it's a negative case, moving to medium severity.

Hi, to be clear, a reboot is not fixing the issue; the host stays in an unrecoverable state. See comments 14 + 19 ^^.

Bug tickets must have version flags set prior to targeting them to a release. Please ask the maintainer to set the correct version flags and only then set the target milestone.

Burman, I am sorry, but realistically we are not going to make it.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
Created attachment 981538 [details]
logs

Description of problem:
An unsupported MTU on the host breaks the bond and its slaves, and after this no SetupNetworks operations can be performed. If I have a bond with 2 slaves configured and a network attached to it, and I edit the network with some custom MTU that the host does not support, say 50000, the event log shows:

(1/1): Applying changes for network(s) topo_net on host navy-vds1.qa.lab.tlv.redhat.com. (User: admin)

But when I go to SetupNetworks I see that the bond is broken and the network is Detached. vdsCaps reports that the bond exists but has no slaves; the GUI shows no bond at all. From this point there is no way to perform or approve any operation in SetupNetworks. vdsm reports the bond exists without slaves.

Version-Release number of selected component (if applicable):
3.5.0-0.29.el6ev
vdsm-4.16.8.1-5.el7ev.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create a bond with 2 slaves via SetupNetworks and approve the operation
2. Create a network with the default MTU and attach it to the bond via SetupNetworks
3. Edit the network with a custom unsupported MTU, e.g. 50000

Actual results:
Go to SetupNetworks and find that the bond is broken and the network is Detached. vdsCaps reports the bond without slaves.

Expected results:
The network should become unsynced because the host does not support such an MTU, or the change should be blocked.
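
A rough vdscli equivalent of the Steps to Reproduce, modeled on the session Petr posted in this bug; the nic and network names are examples:

from vdsm import vdscli

c = vdscli.connect()
opts = {'connectivityCheck': False}

# 1. Create a bond with 2 slaves
c.setupNetworks({}, {'bond0': {'nics': ['eth2', 'eth3']}}, opts)
# 2. Create a network with the default MTU and attach it to the bond
c.setupNetworks({'net1': {'bonding': 'bond0', 'bridged': True}}, {}, opts)
# 3. Edit the network with an unsupported MTU, e.g. 50000
c.setupNetworks({'net1': {'bonding': 'bond0', 'bridged': True, 'mtu': 50000}},
                {}, opts)

On the affected versions, the last call returns success while the bond loses its slaves (empty /sys/class/net/bond0/bonding/slaves), which is the actual result described above.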