Bug 1348451 - Engine has not sent a setSafeNetworkConfig command to persist VDSM network management
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 4.0.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.0.2
Target Release: ---
Assignee: Edward Haas
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-06-21 08:18 UTC by Michael Burman
Modified: 2016-07-24 14:01 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-24 14:01:05 UTC
oVirt Team: Network
Embargoed:
ylavi: ovirt-4.0.z?
ylavi: exception?
mburman: planning_ack?
mburman: devel_ack?
mburman: testing_ack?


Attachments
no ovirtmgmt bridge after reboot (138.27 KB, application/x-gzip)
2016-06-21 08:18 UTC, Michael Burman

Description Michael Burman 2016-06-21 08:18:12 UTC
Created attachment 1170134
no ovirtmgmt bridge after reboot

Description of problem:
The ovirtmgmt bridge was badly persisted, and vdsm couldn't restore it.

This was discovered during BZ 1348103, but I'm not sure it depends on it.
After adding the host failed and the host auto-recovered, it seems that the ovirtmgmt network was not persisted as it should have been, or something else went wrong with it.

After a host reboot, vdsm couldn't find the ovirtmgmt bridge, and ifcfg-ovirtmgmt did not exist. vdsm failed to restore it.

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py", line 188, in _getNetInfo
    'stp': bridges.stp_state(iface)})
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/bridges.py", line 66, in stp_state
    with open(BRIDGING_OPT % (bridge, 'stp_state')) as stp_file:
IOError: [Errno 2] No such file or directory: '/sys/class/net/ovirtmgmt/bridge/stp_state'

Version-Release number of selected component (if applicable):
4.0.0.5-0.1.el7ev
vdsm-4.18.3-0.el7ev.x86_64

How reproducible:
sometimes

Steps to Reproduce:
1. Add host 
2. Reboot host

Actual results:
Host orchid-vds2.qa.lab.tlv.redhat.com does not comply with the cluster CLUSTER1 networks, the following networks are missing on host: 'ovirtmgmt'

vdsm can't restore ovirtmgmt because it doesn't exist on the host.

Expected results:
The ovirtmgmt network should be restored after the reboot.

Additional info:
See also https://bugzilla.redhat.com/show_bug.cgi?id=1348103
- Note: this may or may not be related to BZ 1348103. I can't identify the root cause; further investigation is needed.

Comment 1 Michael Burman 2016-06-21 08:26:54 UTC
[root@orchid-vds2 ~]# less /var/lib/vdsm/persistence/netconf 
/var/lib/vdsm/persistence/netconf: No such file or directory

Comment 2 Dan Kenigsberg 2016-06-21 12:22:36 UTC
What does "badly persisted" mean? Comment 1 claims that nothing was persisted.

What's `virsh -r net-list`?

Can you attach your ifcfg-*?

Comment 3 Michael Burman 2016-06-21 12:48:56 UTC
There is no ifcfg-ovirtmgmt.
Anyhow, I no longer have the host in that state. Once it happens again, and I believe it will, I will let you know. As I said, it seems that the network was not persisted, even though vdsm tried to restore it.

Comment 4 Michael Burman 2016-06-22 13:10:47 UTC
This reproduced again. Now I'm almost sure it is related to bug 1348103. Once the host auto-recovered from the add-host failure, ovirtmgmt was not persisted: no setSafeNetworkConfig was sent for the bridge during auto-recovery.

- After host auto-recovery and before reboot:

[root@orchid-vds2 ~]# cat /var/lib/vdsm/persistence/netconf
cat: /var/lib/vdsm/persistence/netconf: No such file or directory

[root@orchid-vds2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt 
# Generated by VDSM version 4.18.3-0.el7ev
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
IPV6INIT=no

  
[root@orchid-vds2 ~]# cat /var/run/vdsm/netconf/nets/ovirtmgmt 
{
    "ipv6autoconf": false, 
    "bridged": true, 
    "nic": "enp4s0", 
    "mtu": 1500, 
    "switch": "legacy", 
    "dhcpv6": false, 
    "stp": false, 
    "hostQos": {
        "out": {
            "ls": {
                "m2": 50
            }
        }
    }, 
    "defaultRoute": true, 
    "bootproto": "dhcp"
}

[root@orchid-vds2 ~]# virsh -r net-list
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 ;vdsmdummy;          active     no            no
 default              active     no            yes
 vdsm-ovirtmgmt       active     yes           yes

- After reboot:

[root@orchid-vds2 ~]# virsh -r net-list
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 ;vdsmdummy;          active     no            no
 vdsm-ovirtmgmt       active     yes           yes


[root@orchid-vds2 ~]# cat /var/run/vdsm/netconf/nets/ovirtmgmt 
cat: /var/run/vdsm/netconf/nets/ovirtmgmt: No such file or directory

[root@orchid-vds2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt 
cat: /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt: No such file or directory

[root@orchid-vds2 ~]# cat /var/lib/vdsm/persistence/netconf
cat: /var/lib/vdsm/persistence/netconf: No such file or directory
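The before/after listings above reduce to one question: does every network with a running config under /var/run/vdsm/netconf/nets also have a persisted copy? The minimal Python sketch below performs that check. It is an illustration, not a VDSM tool. It assumes that persisted per-network files live under /var/lib/vdsm/persistence/netconf/nets, mirroring the running layout; only the parent directory /var/lib/vdsm/persistence/netconf appears in this report. The function names are hypothetical.

```python
import os

# Running-config path as inspected in this report.
RUNNING_NETS_DIR = "/var/run/vdsm/netconf/nets"
# ASSUMPTION: the persisted per-network files mirror the running layout.
PERSISTED_NETS_DIR = "/var/lib/vdsm/persistence/netconf/nets"


def list_networks(nets_dir):
    """Return the set of network names that have a config file in nets_dir.

    A missing directory counts as "no networks", which is the state this
    report shows for the persistence directory.
    """
    if not os.path.isdir(nets_dir):
        return set()
    return {
        name for name in os.listdir(nets_dir)
        if os.path.isfile(os.path.join(nets_dir, name))
    }


def unpersisted_networks(running_dir=RUNNING_NETS_DIR,
                         persisted_dir=PERSISTED_NETS_DIR):
    """Sorted names of networks that are running but not persisted.

    Under the assumption above, these are the networks that a reboot would
    drop if no setSafeNetworkConfig is received first.
    """
    return sorted(list_networks(running_dir) - list_networks(persisted_dir))


if __name__ == "__main__":
    missing = unpersisted_networks()
    if missing:
        print("Not persisted (would be lost on reboot):", ", ".join(missing))
    else:
        print("Every running network has a persisted copy.")
```

Run it on the host after Setup Networks (or after auto-recovery) and before rebooting. In the state captured in comment 4, it would be expected to report ovirtmgmt.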

Comment 5 Michael Burman 2016-06-23 07:41:31 UTC
I managed to reproduce this bug without 1348103. After a successful add host (ovirtmgmt was persisted), I attached several networks to the host via Setup Networks; none of them was persisted, and vdsm failed to restore any of them.
This happens from time to time.
- The summary of the bug should be changed, because it happens to all networks, not only ovirtmgmt.

Dan, could it be related to the revert in this BZ https://bugzilla.redhat.com/show_bug.cgi?id=1340234 ?
I'm not sure I can even verify it ^^

Comment 6 Dan Kenigsberg 2016-06-23 11:53:13 UTC
(In reply to Michael Burman from comment #5)
> I managed to reproduce this bug without 1348103. After a successful add
> host (ovirtmgmt was persisted), I attached several networks to the host via
> Setup Networks; none of them was persisted, and vdsm failed to restore
> any of them.
> This happens from time to time.
> - The summary of the bug should be changed, because it happens to all
> networks, not only ovirtmgmt.
> 
> Dan, could it be related to the revert in this BZ
> https://bugzilla.redhat.com/show_bug.cgi?id=1340234 ?
> I'm not sure I can even verify it ^^

Clearly, if setSafeNetConfig was never sent from Engine, nothing would be persisted. The bug here is different: we got into an inconsistent state.
What you describe in comment 5 is unrelated to this bug as well as to bug 1340234.

Comment 7 Michael Burman 2016-06-23 12:01:57 UTC
OK, I will do my best to reproduce it and report a new bug.

Comment 8 Edward Haas 2016-07-24 13:48:01 UTC
From the logs, VDSM did not receive setSafeNetworkConfig from Engine; therefore, the configuration was not persisted.
On reboot, the host reverted to its original, non-VDSM configuration.
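Comments 6 and 8 describe a two-stage flow. Network changes first reach VDSM's running configuration, and only a setSafeNetworkConfig call from Engine makes them persistent. On reboot, the host comes back with whatever was persisted. The toy Python model below walks through that sequence under this simplifying assumption. The class and method names are hypothetical and do not correspond to VDSM's actual code, which also has to write ifcfg files and restore the host's pre-VDSM configuration.

```python
import copy


class HostNetworkState:
    """Hypothetical toy model of running vs. persisted network config."""

    def __init__(self, boot_config):
        # Start from the host's pre-VDSM configuration (e.g. a DHCP NIC).
        self.persisted = copy.deepcopy(boot_config)
        self.running = copy.deepcopy(boot_config)

    def setup_networks(self, changes):
        # Add Host / Setup Networks: only the running config changes.
        self.running.update(copy.deepcopy(changes))

    def set_safe_network_config(self):
        # The commit step Engine is expected to send after a successful setup.
        self.persisted = copy.deepcopy(self.running)

    def reboot(self):
        # On boot, only what was persisted comes back.
        self.running = copy.deepcopy(self.persisted)


if __name__ == "__main__":
    boot = {"enp4s0": {"bootproto": "dhcp"}}
    ovirtmgmt = {"ovirtmgmt": {"bridged": True, "nic": "enp4s0"}}

    # The scenario in this bug: no setSafeNetworkConfig before the reboot.
    host = HostNetworkState(boot)
    host.setup_networks(ovirtmgmt)
    host.reboot()
    print("ovirtmgmt" in host.running)  # False

    # The expected flow: commit first, then reboot.
    host = HostNetworkState(boot)
    host.setup_networks(ovirtmgmt)
    host.set_safe_network_config()
    host.reboot()
    print("ovirtmgmt" in host.running)  # True
```

In this model, keeping the two copies apart lets a reboot undo an uncommitted, possibly broken, setup. The flip side, which matches this bug, is that a missing commit discards a correct setup as well.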

Comment 9 Edward Haas 2016-07-24 14:01:05 UTC
This issue has not been reproduced in the last month.
The investigation only revealed that the problem was caused by the missing setSafeNetworkConfig instruction from Engine.

Therefore, we are closing this bug for now. If the problem is seen again, please reopen the bug and provide logs from Engine and VDSM (vdsm, supervdsm).

