Created attachment 1170134
no ovirtmgmt bridge after reboot

Description of problem:
The ovirtmgmt bridge was not persisted correctly and vdsm could not restore it. This was discovered while working on BZ 1348103, but I'm not sure it depends on it. After add host failed and auto-recovered, it seems that the ovirtmgmt network was not persisted as it should have been, or something else was wrong with it. After a host reboot, vdsm could not find the ovirtmgmt bridge, ifcfg-ovirtmgmt did not exist, and vdsm failed to restore the network.

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py", line 188, in _getNetInfo
    'stp': bridges.stp_state(iface)})
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/bridges.py", line 66, in stp_state
    with open(BRIDGING_OPT % (bridge, 'stp_state')) as stp_file:
IOError: [Errno 2] No such file or directory: '/sys/class/net/ovirtmgmt/bridge/stp_state'

Version-Release number of selected component (if applicable):
4.0.0.5-0.1.el7ev
vdsm-4.18.3-0.el7ev.x86_64

How reproducible:
Sometimes

Steps to Reproduce:
1. Add host
2. Reboot host

Actual results:
Host orchid-vds2.qa.lab.tlv.redhat.com does not comply with the cluster CLUSTER1 networks, the following networks are missing on host: 'ovirtmgmt'
vdsm can't restore ovirtmgmt because it does not exist on the host.

Expected results:
Should work; ovirtmgmt should be present on the host after the reboot.

Additional info:
See also https://bugzilla.redhat.com/show_bug.cgi?id=1348103
Note: this may or may not be related to BZ 1348103. I can't identify the root cause; further investigation is needed.
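For illustration only: the traceback shows the failure comes from opening the bridge's stp_state file in sysfs when the bridge does not exist. The sketch below shows how such a read could tolerate a missing bridge. The path template is inferred from the path in the IOError message, and read_stp_state is a hypothetical helper, not vdsm code; it is written for Python 3, while the host runs Python 2.7.

```python
import errno

# Path layout inferred from the IOError message in the traceback (assumption).
BRIDGING_OPT = "/sys/class/net/%s/bridge/%s"


def read_stp_state(bridge, path_template=BRIDGING_OPT):
    """Return the raw stp_state value for a bridge, or None when the bridge
    has no sysfs entry. Hypothetical helper, not part of vdsm."""
    try:
        with open(path_template % (bridge, "stp_state")) as stp_file:
            return stp_file.readline().strip()
    except IOError as e:
        # ENOENT is the "[Errno 2] No such file or directory" case seen above.
        if e.errno == errno.ENOENT:
            return None
        raise
```

On the affected host, read_stp_state("ovirtmgmt") would return None instead of raising. That would only hide the symptom; the missing bridge itself is the real problem.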
[root@orchid-vds2 ~]# less /var/lib/vdsm/persistence/netconf
/var/lib/vdsm/persistence/netconf: No such file or directory
What does "badly persisted" mean? Comment 1 suggests that nothing was persisted. What does `virsh -r net-list` show? Can you attach your ifcfg-* files?
There is no ifcfg-ovirtmgmt. Anyhow, I have already taken the host for other work. Once it happens again, and I believe it will, I will let you know. As I said, it seems that the network was not persisted, although vdsm is trying to restore it.
This has reproduced again. Now I'm almost sure it is related to bug 1348103. Once the host auto-recovered from the add host failure, it did not persist ovirtmgmt; no set-safe call was made for the bridge during the auto-recovery.

- After host auto-recovery and before reboot:

[root@orchid-vds2 ~]# cat /var/lib/vdsm/persistence/netconf
cat: /var/lib/vdsm/persistence/netconf: No such file or directory

[root@orchid-vds2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
# Generated by VDSM version 4.18.3-0.el7ev
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
IPV6INIT=no

[root@orchid-vds2 ~]# cat /var/run/vdsm/netconf/nets/ovirtmgmt
{
    "ipv6autoconf": false,
    "bridged": true,
    "nic": "enp4s0",
    "mtu": 1500,
    "switch": "legacy",
    "dhcpv6": false,
    "stp": false,
    "hostQos": {
        "out": {
            "ls": {
                "m2": 50
            }
        }
    },
    "defaultRoute": true,
    "bootproto": "dhcp"

[root@orchid-vds2 ~]# virsh -r net-list
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 ;vdsmdummy;          active     no            no
 default              active     no            yes
 vdsm-ovirtmgmt       active     yes           yes

- After reboot:

[root@orchid-vds2 ~]# virsh -r net-list
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 ;vdsmdummy;          active     no            no
 vdsm-ovirtmgmt       active     yes           yes

[root@orchid-vds2 ~]# cat /var/run/vdsm/netconf/nets/ovirtmgmt
cat: /var/run/vdsm/netconf/nets/ovirtmgmt: No such file or directory

[root@orchid-vds2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
cat: /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt: No such file or directory

[root@orchid-vds2 ~]# cat /var/lib/vdsm/persistence/netconf
cat: /var/lib/vdsm/persistence/netconf: No such file or directory
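The manual checks in this comment could be collected into one small script, which may help when capturing the state the next time this reproduces. This is a minimal sketch: the three paths are the ones used in the commands above, while network_config_state and its root parameter are hypothetical names introduced here, not a vdsm API.

```python
import os

# Paths taken from the commands in this comment, relative to a root directory.
PERSISTENT_CONF = "var/lib/vdsm/persistence/netconf"
RUNNING_NET = "var/run/vdsm/netconf/nets/%s"
IFCFG_FILE = "etc/sysconfig/network-scripts/ifcfg-%s"


def network_config_state(net, root="/"):
    """Report which configuration artifacts exist for a network.
    Hypothetical diagnostic helper, not part of vdsm."""
    return {
        "persistent_conf": os.path.exists(os.path.join(root, PERSISTENT_CONF)),
        "running_conf": os.path.exists(os.path.join(root, RUNNING_NET % net)),
        "ifcfg": os.path.exists(os.path.join(root, IFCFG_FILE % net)),
    }


if __name__ == "__main__":
    print(network_config_state("ovirtmgmt"))
```

Going by the output above, before the reboot this would report the running config and the ifcfg file as present and the persistent config as missing; after the reboot all three would be missing.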
I managed to reproduce this bug without 1348103. After a successful add host (ovirtmgmt was persisted), I attached several networks to the host via setup networks. None of them were persisted and vdsm failed to restore all of them. This happens from time to time.
- The summary of the bug should be changed, because this happens to all networks, not only ovirtmgmt.

Dan, could it be related to the revert on this BZ https://bugzilla.redhat.com/show_bug.cgi?id=1340234 ?
I'm not sure I can even verify it ^^
(In reply to Michael Burman from comment #5)
> I managed to reproduce this bug without 1348103. After a successful add
> host(ovirtmgmt was persisted), i attached several network to the host via
> setup networks, non of them was persisted and all of them failed to restore
> by vdsm.
> This is happening from time to time.
> - The summery of the bug should be changed, cause it happening to all
> networks, not only ovirtmgmt.
>
> Dan, could it be related to the revert on this BZ
> https://bugzilla.redhat.com/show_bug.cgi?id=1340234 ?
> I'm not sure i even can verify it ^^

Clearly, if setSafeNetConfig was never sent from Engine, nothing would be persistent. The bug here is different: we got to an inconsistent state. What you describe in comment 5 is unrelated to this bug as well as to bug 1340234.
OK, I will do my best to reproduce it and report a new bug.
From the logs, VDSM did not receive setSafeNetworkConfig from Engine, so the config was not persisted. On reboot, the host came back with its original non-VDSM config.
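To illustrate why a missing setSafeNetworkConfig leaves nothing to restore: conceptually, persisting means copying the running configuration to a location that survives a reboot. The sketch below shows that idea only. It is not VDSM's implementation; persist_running_config and its directory arguments are hypothetical, and the only assumption taken from this bug is that the running config lives under /var/run/vdsm/netconf and the persistent one under /var/lib/vdsm/persistence/netconf.

```python
import os
import shutil


def persist_running_config(running_dir, persistent_dir):
    """Replace persistent_dir with a copy of running_dir.
    Hypothetical illustration of 'persisting', not vdsm code."""
    if not os.path.isdir(running_dir):
        raise FileNotFoundError(running_dir)
    # Copy to a staging directory first so a failed copy does not leave
    # a half-written persistent config behind.
    staging = persistent_dir + ".tmp"
    if os.path.exists(staging):
        shutil.rmtree(staging)
    shutil.copytree(running_dir, staging)
    if os.path.exists(persistent_dir):
        shutil.rmtree(persistent_dir)
    os.rename(staging, persistent_dir)
```

If a step like this never runs, the persistent location stays empty, which matches the "No such file or directory" output for /var/lib/vdsm/persistence/netconf earlier in this bug.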
This issue has not been reproduced in the last month. The investigation revealed only that the problem was caused by the missing setSafeNetworkConfig instruction from Engine. We are therefore closing this bug for now. If the problem is seen again, please reopen the bug and provide logs from Engine and VDSM (vdsm, supervdsm).