Bug 1252268 - Networks definitions are missing after restoration of networks that were changed since last network persistence.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node
Version: 3.5.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assigned To: Fabian Deutsch
QA Contact: Chaofeng Wu
Keywords: Reopened, ZStream
Duplicates: 1256742
Depends On:
Blocks: 1254305
Reported: 2015-08-11 02:05 EDT by Chaofeng Wu
Modified: 2016-03-09 09:35 EST (History)

See Also:
Fixed In Version: ovirt-node-3.3.0-0.4.20150906git14a6024.el7ev
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1254305 (view as bug list)
Environment:
Last Closed: 2016-03-09 09:35:23 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
libvirt connection error on TUI (446.08 KB, image/png)
2015-08-11 02:05 EDT, Chaofeng Wu
libvirt error during system reboot (489.70 KB, image/png)
2015-08-11 02:06 EDT, Chaofeng Wu
sosreport after install rhevh6.6 (6.21 MB, application/x-xz)
2015-08-11 02:07 EDT, Chaofeng Wu
sosreport upgrade to rhevh6.7 before reboot (6.84 MB, application/x-xz)
2015-08-11 02:11 EDT, Chaofeng Wu
sosreport vdsmd are libvirtd are running after rhevh6.7 reboot twice (6.85 MB, application/x-xz)
2015-08-11 02:12 EDT, Chaofeng Wu
sosreport (6.93 MB, application/x-xz)
2015-08-12 06:48 EDT, Chaofeng Wu
manual restart service failed (298.22 KB, image/png)
2015-08-12 06:53 EDT, Chaofeng Wu
sosreport with static IP address (6.84 MB, application/x-xz)
2015-08-13 07:40 EDT, Chaofeng Wu
sosreport in a clean environment (6.86 MB, application/x-xz)
2015-08-14 06:08 EDT, Chaofeng Wu
patched ifcfg.py code (34.20 KB, text/plain)
2015-08-16 10:15 EDT, Ido Barkan
sosreport ifcfg.py (6.82 MB, application/x-xz)
2015-08-17 03:26 EDT, Chaofeng Wu


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 44924 master MERGED security: Encode filename before restoring it's label Never
oVirt gerrit 44926 ovirt-3.5 MERGED net: always persist ifcfg files. Never
oVirt gerrit 44929 ovirt-3.6 MERGED net: always persist ifcfg files. Never
oVirt gerrit 44941 ovirt-3.5 MERGED security: Encode filename before restoring it's label Never
oVirt gerrit 45301 master MERGED security: Add encode to getfilecon and chcon Never
oVirt gerrit 45312 ovirt-3.5 MERGED security: Add encode to getfilecon and chcon Never

Description Chaofeng Wu 2015-08-11 02:05:58 EDT
Created attachment 1061329 [details]
libvirt connection error on TUI

Description of problem:
This bug is split from BUG1251040. After upgrading RHEV-H 6.6 to RHEV-H 6.7, we found that some network configurations are missing once the upgrade has succeeded and the system has been rebooted more than twice.

Version-Release number of selected component (if applicable):
rhevh-6.6-20150512.0.el6ev.iso
rhev-hypervisor6-6.7-20150804.0.iso

How reproducible:
80%

Steps to Reproduce:
1, PXE install rhevh-6.6-20150512.0.el6ev.iso, configure eth1 with vlan tag 20, then register to RHEV-M 3.5.4.
2, On the RHEV-M web portal, once the host status is up, create bond0 from eth0 and eth2, create Network testnet1 and drag it to bond0, create Network testnet2 and drag it to eth3, then save.
3, Once all the networks are up, reboot the system, then upgrade from rhevh-6.6-20150512.0.el6ev.iso to rhev-hypervisor6-6.7-20150804.0.iso.
4, After the upgrade succeeds and the system is up, break bond0, then create bond0 from eth0 and eth3, drag testnet2 to bond0, drag testnet1 to eth2, and save.
5, Once all the networks are up, reboot the system.
6, Reboot the system more than twice, then check the vdsmd and libvirtd service status. Neither service is running.
7, Roll back to the previous rhevh6.6; sometimes vdsmd and libvirtd are running, sometimes they are not.

Actual results:
Some network configurations, including rhevm, testnet2, and bond0, are missing, and the RHEV-H 6.7 host status is down on the RHEV-M portal.

Expected results:
No network configurations are lost, and the RHEV-H 6.7 host status is up on the RHEV-M portal.

Additional info:
Comment 1 Chaofeng Wu 2015-08-11 02:06:38 EDT
Created attachment 1061330 [details]
libvirt error during system reboot
Comment 2 Chaofeng Wu 2015-08-11 02:07:43 EDT
Created attachment 1061331 [details]
sosreport after install rhevh6.6
Comment 3 Chaofeng Wu 2015-08-11 02:11:02 EDT
Created attachment 1061332 [details]
sosreport upgrade to rhevh6.7 before reboot
Comment 4 Chaofeng Wu 2015-08-11 02:12:57 EDT
Created attachment 1061333 [details]
sosreport vdsmd are libvirtd are running after rhevh6.7 reboot twice
Comment 5 Ido Barkan 2015-08-11 10:15:56 EDT
This does not reproduce, at least not in the scenario proposed in the Description. Also, if this is a bug, it is probably a duplicate of https://bugzilla.redhat.com/1251040, because network restoration is only remotely related to whether there are VLANs or not.

*** This bug has been marked as a duplicate of bug 1251040 ***
Comment 6 Chaofeng Wu 2015-08-12 05:03:46 EDT
Checked with the latest iso, rhev-hypervisor6-6.7-20150811.0.iso, which was provided in BUG1251040 comment 31. Some network configurations are still missing, so I am reopening this bug.

After upgrading from rhevh-6.6-20150512 to rhevh-6.7-20150811, making some network changes, and rebooting the system, all the networks are fine. After rebooting the system again, the libvirtd service fails to start during the boot process. Once the system is up, the vdsmd and libvirtd services are both stopped.

The network configuration files in /var/lib/vdsm/persistence/netconf/ are all fine. It seems that the networks are not restored because the vdsmd service does not run after the reboot.
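The check of the persisted definitions described above could be scripted roughly as follows. This is only a sketch: it assumes the directory holds `nets` and `bonds` subdirectories with one JSON file per network or bond, which is not shown in this report, and the function name `load_persisted_netconf` is hypothetical.

```python
import json
import os


def load_persisted_netconf(base_dir):
    """Read persisted network and bond definitions from base_dir.

    Assumption: base_dir contains 'nets' and 'bonds' subdirectories,
    each holding one JSON file per definition. Missing subdirectories
    are reported as empty.
    """
    result = {}
    for kind in ("nets", "bonds"):
        kind_dir = os.path.join(base_dir, kind)
        entries = {}
        if os.path.isdir(kind_dir):
            for name in sorted(os.listdir(kind_dir)):
                with open(os.path.join(kind_dir, name)) as f:
                    entries[name] = json.load(f)
        result[kind] = entries
    return result
```

On a host, this would be called with `/var/lib/vdsm/persistence/netconf` as `base_dir`, and the returned names compared against the networks expected from the RHEV-M setup.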
Comment 7 Chaofeng Wu 2015-08-12 06:48:51 EDT
Created attachment 1061904 [details]
sosreport
Comment 8 Chaofeng Wu 2015-08-12 06:53:00 EDT
Created attachment 1061906 [details]
manual restart service failed
Comment 9 Meni Yakove 2015-08-12 07:30:11 EDT
Can't reproduce:

1. installed 6.6
2. create VLAN via TUI and register the host in the engine
3. attach some networks, (over bond and over NIC) and reboot.
4. upgrade
5. make some changes in SN > reboot
6. another reboot
Comment 10 Fabian Deutsch 2015-08-13 06:12:49 EDT
One difference between the two networks might be that the DHCP responses are slower in the network where the failure appears.

Two ways to identify if it's the slow response:
1. Try a small network with a dhcp server (the response should be fast)
2. Try using a static IP
Comment 11 Chaofeng Wu 2015-08-13 07:26:37 EDT
This bug still reproduces with a static IP.
Comment 12 Chaofeng Wu 2015-08-13 07:40:23 EDT
Created attachment 1062489 [details]
sosreport with static IP address
Comment 13 Chaofeng Wu 2015-08-14 05:49:29 EDT
We just tried the following steps in a clean environment with rhev-hypervisor6-6.7-20150811.0.iso and hit the same issue.

Steps:
1, PXE install rhev-hypervisor6-6.7-20150811.0.iso, configure eth1 with a static IP and vlan tag 20, then register to RHEV-M 3.5.4.
2, On the RHEV-M web portal, once the host status is up, create bond0 from eth0 and eth2, create Network testnet1 and drag it to bond0, create Network testnet2 and drag it to eth3, then save.
3, After all the networks are configured successfully, reboot the system.
4, After the system is up, break bond0, then create bond0 from eth0 and eth3, drag testnet2 to bond0, drag testnet1 to eth2, and save.
5, Once all the networks are up, reboot the system.
6, Reboot the system more than twice, then check the vdsmd and libvirtd service status.
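The service check in the last step could be automated with something like the sketch below. It assumes that `service <name> status` exits with 0 when the service is running, as init scripts conventionally do; the function names and the injectable `run` parameter are hypothetical and exist only so the logic can be exercised without a real host.

```python
import os
import subprocess


def service_is_running(name, run=subprocess.call):
    """Return True if `service <name> status` exits with status 0.

    Assumption: exit status 0 means "running"; any other status is
    treated as "not running".
    """
    with open(os.devnull, "w") as devnull:
        status = run(["service", name, "status"],
                     stdout=devnull, stderr=devnull)
    return status == 0


def check_services(names=("vdsmd", "libvirtd"), run=subprocess.call):
    """Map each service name to True (running) or False (not running)."""
    return dict((name, service_is_running(name, run)) for name in names)
```

Calling `check_services()` after each reboot gives a quick record of whether the failure reproduced in that boot.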
Comment 14 Chaofeng Wu 2015-08-14 06:08:51 EDT
Created attachment 1062957 [details]
sosreport in a clean environment
Comment 15 Ido Barkan 2015-08-16 10:12:38 EDT
Hi Chaofeng,
Can you please try to reproduce using this new ifcfg.py file?
I am trying to save us the painful turnaround of rebuilding vdsm and rhevh before we have a tested patch.
Steps to deploy on a rhevh server:
1. remount your root to be writable
   # mount -o rw,remount /
2. copy the attached ifcfg.py to /usr/share/vdsm/network/configurators/
3. compile the .py to .pyc by importing the module from an interactive python session.
   # python
   >>> import sys
   >>> sys.path.append('/usr/share/vdsm/')
   >>> from network.configurators import ifcfg
   >>> ^D
4. back to the shell. observe that the pyc file is indeed newer:
   # ls -l /usr/share/vdsm/network/configurators/
5. persist the updated pyc file:
   # persist ifcfg.pyc
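As an alternative to importing the module interactively in step 3, the byte-compilation could be done with the standard `py_compile` module. This is a sketch rather than part of the procedure above; `compile_module` is a hypothetical helper, and writing `ifcfg.pyc` next to `ifcfg.py` follows the Python 2 layout that the paths in this report suggest.

```python
import py_compile


def compile_module(source_path, target_path=None):
    """Byte-compile source_path and return the path of the bytecode file.

    By default the bytecode is written next to the source as
    '<source_path>c' (foo.py -> foo.pyc). doraise=True turns a
    compilation failure into an exception instead of a printed message.
    """
    if target_path is None:
        target_path = source_path + "c"
    py_compile.compile(source_path, cfile=target_path, doraise=True)
    return target_path
```

For example, `compile_module('/usr/share/vdsm/network/configurators/ifcfg.py')` would produce the `ifcfg.pyc` that step 5 then persists.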
Comment 16 Ido Barkan 2015-08-16 10:15:11 EDT
Created attachment 1063522 [details]
patched ifcfg.py code
Comment 17 Chaofeng Wu 2015-08-17 03:24:06 EDT
(In reply to Ido Barkan from comment #15)

Hi Ido,

I tried the following steps:

1. Install rhev-hypervisor6-6.7-20150813.0.iso, configure the network, follow your steps to persist the new ifcfg.pyc in rhevh, then reboot the system.
2. Register to RHEV-M, then create bond0 from eth0 and eth2, create Network testnet1 and drag it to bond0, create Network testnet2 and drag it to eth3, then save.
3. After all the networks are configured successfully, reboot the system.

After step 3, some networks are not up. I also found some errors in supervdsm.log:
restore-net::WARNING::2015-08-17 02:23:52,082::__init__::491::root.ovirt.node.utils.fs::(_persist_file) File "/etc/sysconfig/network-scripts/ifcfg-eth1
" had already been persisted
restore-net::ERROR::2015-08-17 02:23:52,082::__init__::52::root::(__exit__) Failed rollback transaction last known good network. ERR=%s
Traceback (most recent call last):
  File "/usr/share/vdsm/network/api.py", line 680, in setupNetworks
  File "/usr/share/vdsm/network/api.py", line 213, in wrapped
  File "/usr/share/vdsm/network/api.py", line 302, in addNetwork
  File "/usr/share/vdsm/network/models.py", line 160, in configure
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 87, in configureBridge
  File "/usr/share/vdsm/network/models.py", line 124, in configure
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 93, in configureVlan
  File "/usr/share/vdsm/network/models.py", line 97, in configure
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 154, in configureNic
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 623, in addNic
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 558, in _createConfFile
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 501, in writeConfFile
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/fs/__init__.py", line 435, in persist
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/fs/__init__.py", line 95, in restorecon
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/security.py", line 105, in restorecon
  File "/usr/lib64/python2.6/site-packages/selinux/__init__.py", line 76, in restorecon
TypeError: in method 'matchpathcon', argument 1 of type 'char const *'
MainProcess::DEBUG::2015-08-17 02:23:55,161::supervdsmServer::102::SuperVdsm.ServerCallback::(wrapper) call readMultipathConf with () {}

You can find the details in the attachment.
Comment 18 Chaofeng Wu 2015-08-17 03:26:33 EDT
Created attachment 1063659 [details]
sosreport ifcfg.py
Comment 22 Dan Kenigsberg 2015-08-25 08:27:06 EDT
Calls to selinux.getfilecon() and selinux.chcon() should be passed the utf8-encoded abspath, too.

restore-net::ERROR::2015-08-25 05:44:35,835::__init__::432::root.ovirt.node.utils.fs::(persist) Failed to persist "/etc/sysconfig/network-scripts/ifcfg-eth1"
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/fs/__init__.py", line 429, in persist
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/fs/__init__.py", line 505, in _persist_file
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/fs/__init__.py", line 451, in copy_attributes
  File "/usr/lib/python2.6/site-packages/ovirt/node/utils/security.py", line 112, in getcon
TypeError: in method 'getfilecon', argument 1 of type 'char const *'
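The TypeError in these tracebacks is consistent with a C binding that rejects a unicode path where it expects a byte string, which is why encoding the path before the call is suggested. The sketch below illustrates that idea only; it is not the merged patch. `encode_path_arg` and `strict_binding` are hypothetical names, and `strict_binding` merely imitates the strictness of the real selinux functions so that the example runs without them.

```python
import functools


def encode_path_arg(func, encoding="utf-8"):
    """Wrap func so that its first argument is always passed as bytes."""
    @functools.wraps(func)
    def wrapper(path, *args, **kwargs):
        # Encode text paths; leave paths that are already bytes untouched.
        if not isinstance(path, bytes):
            path = path.encode(encoding)
        return func(path, *args, **kwargs)
    return wrapper


def strict_binding(path):
    """Hypothetical stand-in for a binding that only accepts byte strings."""
    if not isinstance(path, bytes):
        raise TypeError("argument 1 of type 'char const *'")
    return path


# The wrapped function accepts text paths and hands bytes to the binding.
safe_binding = encode_path_arg(strict_binding)
```

The same wrapper could be applied to each of the affected calls (restorecon, getfilecon, chcon), so that the encoding happens in one place rather than at every call site.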
Comment 23 Fabian Deutsch 2015-09-02 05:08:00 EDT
*** Bug 1256742 has been marked as a duplicate of this bug. ***
Comment 24 Fabian Deutsch 2015-09-02 09:57:10 EDT
All of the patches have been merged in the master branch, moving this bug to MODIFIED.
Comment 26 Chaofeng Wu 2015-11-12 09:58:27 EST
Verified on the rhev-hypervisor7-7.2-20151104 build.

Version-Release number of selected component (if applicable):
ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
rhev-hypervisor7-7.2-20151104.0.iso

Steps:
1. Install rhev-hypervisor7-7.1-20151015.0.iso, configure eth0, and register to RHEV-M 3.5.6.
2. Create bond0 with eth1 and eth2, create Network testnet0 and testnet1, drag testnet0 to bond0 and testnet1 to eth2, then save the changes.
3. Upgrade to rhev-hypervisor7-7.2-20151104 via RHEV-M.
4. Break bond0, recreate bond0 with eth1 and eth3, drag testnet0 to bond0, drag testnet1 to eth2, then save the changes.
5. Reboot the system more than twice, check the vdsmd and libvirtd service status, and check ovirt.log and ovirt-node.log.

Result:
After step 5, the vdsmd and libvirtd services were both running, and there were no errors in either ovirt.log or ovirt-node.log.

This bug is fixed, so I am changing the status to VERIFIED.
Comment 28 errata-xmlrpc 2016-03-09 09:35:23 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0378.html
