Bug 1252268

Summary: Network definitions are missing after restoration of networks that were changed since the last network persistence.
Product: Red Hat Enterprise Virtualization Manager Reporter: Chaofeng Wu <cwu>
Component: ovirt-node    Assignee: Fabian Deutsch <fdeutsch>
Status: CLOSED ERRATA QA Contact: Chaofeng Wu <cwu>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.5.4    CC: bazulay, bmcclain, cshao, cwu, danken, dougsland, fdeutsch, gklein, huiwa, huzhao, lpeer, lsurette, mburman, mgoldboi, yaniwang, ycui, yeylon, ykaul
Target Milestone: ovirt-3.6.0-rc    Keywords: Reopened, ZStream
Target Release: 3.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-node-3.3.0-0.4.20150906git14a6024.el7ev Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1254305 (view as bug list) Environment:
Last Closed: 2016-03-09 14:35:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Node RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1254305    
Attachments:
Description Flags
libvirt connection error on TUI
none
libvirt error during system reboot
none
sosreport after install rhevh6.6
none
sosreport upgrade to rhevh6.7 before reboot
none
sosreport vdsmd and libvirtd are not running after rhevh6.7 reboot twice
none
sosreport
none
manual service restart failed
none
sosreport with static IP address
none
sosreport in a clean environment
none
patched ifcfg.py code
none
sosreport ifcfg.py
none

Description Chaofeng Wu 2015-08-11 06:05:58 UTC
Created attachment 1061329 [details]
libvirt connection error on TUI

Description of problem:
This bug is split from BUG1251040. After upgrading RHEV-H 6.6 to RHEV-H 6.7, we found that some network configurations are missing after the RHEV-H upgrade succeeds and the system is rebooted more than twice.

Version-Release number of selected component (if applicable):
rhevh-6.6-20150512.0.el6ev.iso
rhev-hypervisor6-6.7-20150804.0.iso

How reproducible:
80%

Steps to Reproduce:
1, PXE install rhevh-6.6-20150512.0.el6ev.iso, configure eth1 with VLAN tag 20, then register to RHEV-M 3.5.4
2, On the RHEV-M web portal, once the host status is up, create bond0 from eth0 and eth2, create Network testnet1 and drag it to bond0, create Network testnet2 and drag it to eth3, then save.
3, Once all the networks are up, reboot the system, then upgrade from rhevh-6.6-20150512.0.el6ev.iso to rhev-hypervisor6-6.7-20150804.0.iso.
4, After the upgrade succeeds and the system is up, break bond0, then create bond0 from eth0 and eth3, drag testnet2 to bond0, drag testnet1 to eth2, and save.
5, Once all the networks are up, reboot the system.
6, Reboot the system more than twice and check the vdsmd and libvirtd service status. Neither service is running.
7, Roll back to the previous rhevh6.6; sometimes vdsmd and libvirtd are running, and sometimes they are not.

Actual results:
Some network configurations, including rhevm, testnet2, and bond0, are missing, and the RHEV-H6.7 host status is down on the RHEV-M portal.

Expected results:
No network configurations are lost, and the RHEV-H6.7 host status is up on the RHEV-M portal.

Additional info:

Comment 1 Chaofeng Wu 2015-08-11 06:06:38 UTC
Created attachment 1061330 [details]
libvirt error during system reboot

Comment 2 Chaofeng Wu 2015-08-11 06:07:43 UTC
Created attachment 1061331 [details]
sosreport after install rhevh6.6

Comment 3 Chaofeng Wu 2015-08-11 06:11:02 UTC
Created attachment 1061332 [details]
sosreport upgrade to rhevh6.7 before reboot

Comment 4 Chaofeng Wu 2015-08-11 06:12:57 UTC
Created attachment 1061333 [details]
sosreport vdsmd and libvirtd are not running after rhevh6.7 reboot twice

Comment 5 Ido Barkan 2015-08-11 14:15:56 UTC
This does not reproduce, at least not in the scenario proposed in the Description. Also, if this is a bug, it is probably a duplicate of https://bugzilla.redhat.com/1251040, because network restoration is only remotely related to whether there are VLANs or not.

*** This bug has been marked as a duplicate of bug 1251040 ***

Comment 6 Chaofeng Wu 2015-08-12 09:03:46 UTC
Checked with the latest ISO, rhev-hypervisor6-6.7-20150811.0.iso, which was provided in BUG1251040 comment 31. Some network configurations are still missing, so reopening this bug.

After upgrading from rhevh-6.6-20150512 to rhevh-6.7-20150811, making some network changes, and rebooting the system, all the networks are fine. After rebooting the system again, the libvirtd service fails to start during boot. Once the system is up, checking the vdsmd and libvirtd service status shows that both are stopped.

Checking the network configuration in /var/lib/vdsm/persistence/netconf/, all the files are fine. It seems that the networks are not restored because the vdsmd service does not run after the system reboot.

Comment 7 Chaofeng Wu 2015-08-12 10:48:51 UTC
Created attachment 1061904 [details]
sosreport

Comment 8 Chaofeng Wu 2015-08-12 10:53:00 UTC
Created attachment 1061906 [details]
manual service restart failed

Comment 9 Meni Yakove 2015-08-12 11:30:11 UTC
Can't reproduce:

1. installed 6.6
2. create VLAN via TUI and register the host in the engine
3. attach some networks (over bond and over NIC) and reboot.
4. upgrade
5. make some changes in SN > reboot
6. another reboot

Comment 10 Fabian Deutsch 2015-08-13 10:12:49 UTC
One difference between the two networks might be that the DHCP responses are slower in the network where the failure appears.

Two ways to identify whether it is the slow response:
1. Try a small network with a DHCP server (the response should be fast)
2. Try using a static IP

Comment 11 Chaofeng Wu 2015-08-13 11:26:37 UTC
This bug still reproduces with a static IP.

Comment 12 Chaofeng Wu 2015-08-13 11:40:23 UTC
Created attachment 1062489 [details]
sosreport with static IP address

Comment 13 Chaofeng Wu 2015-08-14 09:49:29 UTC
We just tried the following steps in a clean environment with rhev-hypervisor6-6.7-20150811.0.iso and hit the same issue.

Steps:
1, PXE install rhev-hypervisor6-6.7-20150811.0.iso, configure eth1 with a static IP and VLAN tag 20, then register to RHEV-M 3.5.4
2, On the RHEV-M web portal, once the host status is up, create bond0 from eth0 and eth2, create Network testnet1 and drag it to bond0, create Network testnet2 and drag it to eth3, then save.
3, After all the networks are configured successfully, reboot the system.
4, After the system is up, break bond0, then create bond0 from eth0 and eth3, drag testnet2 to bond0, drag testnet1 to eth2, and save.
5, Once all the networks are up, reboot the system.
6, Reboot the system more than twice and check the vdsmd and libvirtd service status.

Comment 14 Chaofeng Wu 2015-08-14 10:08:51 UTC
Created attachment 1062957 [details]
sosreport in a clean environment

Comment 15 Ido Barkan 2015-08-16 14:12:38 UTC
Hi Chaofeng,
Can you please try to reproduce using this new ifcfg.py file?
I am trying to save us the painful turnaround of rebuilding vdsm and rhevh before we have a tested path.
steps to deploy on a rhevh server:
1. remount your root to be writable
   # mount -o rw,remount /
2. copy the attached ifcfg.py to /usr/share/vdsm/network/configurators/
3. compile the .py to .pyc by importing the module in the Python interactive interpreter (a non-interactive alternative is sketched after step 5).
   # python
   >>> import sys
   >>> sys.path.append('/usr/share/vdsm/')
   >>> from network.configurators import ifcfg
   >>> ^D
4. back in the shell, observe that the .pyc file is indeed newer:
   # ls -l /usr/share/vdsm/network/configurators/
5. persist the updated pyc file:
   # persist ifcfg.pyc
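
As an aside, a minimal non-interactive alternative for step 3 (a sketch only, assuming the stock py_compile module that ships with Python 2.6 on the host) is to byte-compile the file directly, which avoids executing the module on import:
   # python
   >>> import py_compile
   >>> py_compile.compile('/usr/share/vdsm/network/configurators/ifcfg.py')  # writes ifcfg.pyc next to the .py
   >>> ^D
Either way, the root filesystem still has to be remounted read-write first (step 1), and the resulting ifcfg.pyc still has to be persisted (step 5).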

Comment 16 Ido Barkan 2015-08-16 14:15:11 UTC
Created attachment 1063522 [details]
patched ifcfg.py code

Comment 17 Chaofeng Wu 2015-08-17 07:24:06 UTC
(In reply to Ido Barkan from comment #15)
> Hi Chaofeng,
> Can you please try to reproduce using this new ifcfg.py file?
> I am trying to save us the painful turnaround of rebuilding vdsm and rhevh
> before we have a tested path.
> steps to deploy a rhevh server:
> 1. remount your root to be writable
>    # mount -o rw,remount /
> 2. copy the attached ifcfg.py to /usr/share/vdsm/network/configurators/
> 3. compile the .py to .pyc by importing the code using python interactive
> code.
>    # python
>    >>> import sys
>    >>> sys.path.append('/usr/share/vdsm/')
>    >>> from network.configurators import ifcfg
>    >>> ^D
> 4. back to the shell. observe that the pyc file is indeed newer:
>    # ls -l /usr/share/vdsm/network/configurators/
> 5. persist the updated pyc file:
>    # persist ifcfg.pyc

Hi Ido,

Try the following steps:

1. Install rhev-hypervisor6-6.7-20150813.0.iso, configure the network, follow your steps to persist the new ifcfg.pyc on rhevh, and reboot the system.
2. Register to RHEV-M, then create bond0 from eth0 and eth2, create Network testnet1 and drag it to bond0, create Network testnet2 and drag it to eth3, then save.
3. After all the networks are configured successfully, reboot the system.

After step 3, we find that some networks are not up. I also find some errors in supervdsm.log:
restore-net::WARNING::2015-08-17 02:23:52,082::__init__::491::root.ovirt.node.utils.fs::(_persist_file) File "/etc/sysconfig/network-scripts/ifcfg-eth1
" had already been persisted
restore-net::ERROR::2015-08-17 02:23:52,082::__init__::52::root::(__exit__) Failed rollback transaction last known good network. ERR=%s
Traceback (most recent call last):
  File "/usr/share/vdsm/network/api.py", line 680, in setupNetworks
  File "/usr/share/vdsm/network/api.py", line 213, in wrapped
  File "/usr/share/vdsm/network/api.py", line 302, in addNetwork
  File "/usr/share/vdsm/network/models.py", line 160, in configure
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 87, in configureBridge
  File "/usr/share/vdsm/network/models.py", line 124, in configure
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 93, in configureVlan
  File "/usr/share/vdsm/network/models.py", line 97, in configure
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 154, in configureNic
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 623, in addNic
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 558, in _createConfFile
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 501, in writeConfFile
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/fs/__init__.py", line 435, in persist
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/fs/__init__.py", line 95, in restorecon
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/security.py", line 105, in restorecon
  File "/usr/lib64/python2.6/site-packages/selinux/__init__.py", line 76, in restorecon
TypeError: in method 'matchpathcon', argument 1 of type 'char const *'
MainProcess::DEBUG::2015-08-17 02:23:55,161::supervdsmServer::102::SuperVdsm.ServerCallback::(wrapper) call readMultipathConf with () {}

You can find details in the attachment.

Comment 18 Chaofeng Wu 2015-08-17 07:26:33 UTC
Created attachment 1063659 [details]
sosreport ifcfg.py

Comment 22 Dan Kenigsberg 2015-08-25 12:27:06 UTC
Calls to selinux.getfilecon() and selinux.chcon(), too, should be passed the UTF-8-encoded abspath.

restore-net::ERROR::2015-08-25 05:44:35,835::__init__::432::root.ovirt.node.utils.fs::(persist) Failed to persist "/etc/sysconfig/network-scripts/ifcfg-eth1"
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/fs/__init__.py", line 429, in persist
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/fs/__init__.py", line 505, in _persist_file
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/fs/__init__.py", line 451, in copy_attributes
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/security.py", line 112, in getcon
TypeError: in method 'getfilecon', argument 1 of type 'char const *'
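
For illustration only, a minimal Python 2 sketch of the kind of change this comment asks for (the helper names below are hypothetical, not the actual ovirt-node code): encode the absolute path to UTF-8 bytes before handing it to the selinux bindings, which accept only byte strings ('char const *') and otherwise raise the TypeError shown above.

   import os
   import selinux

   def _encoded_abspath(path):
       # The libselinux Python bindings (matchpathcon, getfilecon,
       # chcon, ...) expect a byte string ('char const *') in Python 2;
       # passing a unicode path triggers the TypeError seen above.
       path = os.path.abspath(path)
       if isinstance(path, unicode):
           path = path.encode('utf-8')
       return path

   def getcon(path):
       # Hypothetical helper: return the SELinux context of a file
       # (error handling omitted for brevity).
       rc, context = selinux.getfilecon(_encoded_abspath(path))
       return context

   def chcon(path, context):
       # Hypothetical helper: set the SELinux context of a file.
       selinux.chcon(_encoded_abspath(path), context)

The same encoding step would also cover the matchpathcon/restorecon failure from the traceback in comment 17.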

Comment 23 Fabian Deutsch 2015-09-02 09:08:00 UTC
*** Bug 1256742 has been marked as a duplicate of this bug. ***

Comment 24 Fabian Deutsch 2015-09-02 13:57:10 UTC
All of the patches have been merged in the master branch, moving this bug to MODIFIED.

Comment 26 Chaofeng Wu 2015-11-12 14:58:27 UTC
Verified on the rhev-hypervisor7-7.2-20151104 build.

Version-Release number of selected component (if applicable):
ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
rhev-hypervisor7-7.2-20151104.0.iso

Steps:
1. Install rhev-hypervisor7-7.1-20151015.0.iso, configure eth0 and register to RHEV-M 3.5.6
2. Create bond0 with eth1 and eth2, create Networks testnet0 and testnet1, drag testnet0 to bond0 and drag testnet1 to eth2, then save the changes.
3. Upgrade to rhev-hypervisor7-7.2-20151104 via RHEV-M
4. Break bond0, recreate bond0 with eth1 and eth3, drag testnet0 to bond0, drag testnet1 to eth2, save the changes.
5. Reboot the system more than two times, then check the vdsmd and libvirtd service status and check ovirt.log and ovirt-node.log.

Result:
After step 5, the vdsmd and libvirtd services were running successfully, and there were no errors in either ovirt.log or ovirt-node.log.

This bug is fixed, so changing the status to VERIFIED.

Comment 28 errata-xmlrpc 2016-03-09 14:35:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0378.html