Bug 1205711 - VDSM script reset network configuration on every reboot when based on predefined bond
Summary: VDSM script reset network configuration on every reboot when based on predefined bond
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: x86_64
OS: Linux
Severity: urgent
Priority: urgent
Target Milestone: ---
Target Release: 3.5.1
Assignee: Ido Barkan
QA Contact: Michael Burman
URL:
Whiteboard: network
Duplicates: 1209401 1209486
Depends On: 1194553 1209486
Blocks: 1174707 1193058
 
Reported: 2015-03-25 13:54 UTC by rhev-integ
Modified: 2019-07-11 08:50 UTC
27 users

Fixed In Version: v4.16.12.1
Doc Type: Release Note
Doc Text:
Cause: rhev-3.5.0's vdsm failed to persist ifcfg-bond* files if they had been created via rhev-h's TUI. Consequence: After a reboot, including a reboot following an upgrade to rhev-3.5.1, rhev-h boots without the bond files and without the networks that depend on them. Workaround: Re-define the bonds manually in the TUI and re-define the networks on top of them.
Clone Of: 1194553
Environment:
Last Closed: 2015-05-27 07:19:17 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:
ylavi: Triaged+


Attachments
vdsm log and generated ifcfg files (186.39 KB, application/x-gzip), 2015-04-06 07:04 UTC, Michael Burman
vdsm logs (176.82 KB, application/x-gzip), 2015-04-07 08:49 UTC, Michael Burman


Links
Red Hat Knowledge Base (Solution) 1346443
oVirt gerrit 38369
oVirt gerrit 38597

Comment 1 Michael Burman 2015-04-02 08:51:09 UTC
Verified and tested successfully with new build rhevm-3.5.1-0.3.el6ev.noarch
vdsm-4.16.13-1.el6ev.x86_64
rhel 6.6
vdsm upgrade from 4.14 >> 4.16.13-1
Followed these steps:
1) bond0 and bond0.162 created manually outside RHEV-M (network restart)
2) installed server in RHEV-M setup successfully
3) attached VM VLAN-tagged network (162) to host via SN
4) copied the relevant repos (vt14.12) to the server and ran 'yum update'; vdsm upgraded successfully, but per BZ 1200467 the vdsmd service needed a manual restart. The restart succeeded and no network configuration was broken (capabilities refreshed).
5) rebooted the server successfully. All network configuration was preserved and nothing broke, including the manually created bond0 and bond0.162.

I would like to perform the same test with RHEV-H 6.6 including this fix before moving this bug to verified. Originally I managed to reproduce this issue only with RHEV-H.
Dan, do we have such a RHEV-H build with this fix?

Comment 2 Eyal Edri 2015-04-02 09:06:29 UTC
tolik - when do you plan to build rhev-h?

Comment 3 Michael Burman 2015-04-05 11:04:01 UTC
Verified and tested successfully with new rhev-hypervisor6-6.6-20150402.0.el6ev.noarch.rpm
that includes vdsm-4.16.13-1.el6ev.x86_64
vdsm upgrade from vdsm-4.14.18-6.el6ev >> vdsm-4.16.13-1.el6ev.x86_64
Followed these steps:
1) bond0 and bond0.162 created manually outside RHEV-M via the TUI
2) installed server in RHEV-M setup (vt14.2) successfully
3) attached VM VLAN-tagged network (162) to host via SN
4) downloaded and installed rhev-hypervisor6-6.6-20150402.0.el6ev.noarch.rpm in the engine
5) put the host into maintenance and ran 'upgrade' via RHEV-M
6) host rebooted successfully. All network configuration was preserved and nothing broke, including the TUI-configured bond0 and bond0.162.
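A quick way to confirm this result on a host is to check that the bond ifcfg files survived in both the live network-scripts directory and in the RHEV-H persistence store. This is only a diagnostic sketch; the /config path is the usual RHEV-H persistence location and is assumed here.

```shell
# Diagnostic sketch: report whether any ifcfg-bond* files exist in a
# directory. The paths used below are the usual RHEV-H locations (assumed).
check_bond_files() {
    dir=$1
    if ls "$dir"/ifcfg-bond* >/dev/null 2>&1; then
        ls "$dir"/ifcfg-bond*
    else
        echo "no ifcfg-bond* files in $dir"
    fi
}

check_bond_files /etc/sysconfig/network-scripts          # live config
check_bond_files /config/etc/sysconfig/network-scripts   # persisted copy
```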

Comment 4 Michael Burman 2015-04-05 11:53:17 UTC
Looks like the fix for this bug created another issue: bond0 and bond0.162 are persisted and nothing breaks, but none of the networks or NICs have a BOOTPROTO= line in their ifcfg files.

Comment 5 Michael Burman 2015-04-05 11:55:49 UTC
Moving back to ASSIGNED instead of creating a new BZ.

Comment 6 Dan Kenigsberg 2015-04-06 06:33:44 UTC
Do the networks ever have a BOOTPROTO line? Are they bridged?

Would you please attach fresh logs of the new effect? Include the generated ifcfg-* files and vdsm's own /var/lib/vdsm/persistence/netconf.

Comment 7 Michael Burman 2015-04-06 07:02:01 UTC
[root@navy-vds1 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;             8000.000000000000       no
net_bondash             8000.001018244afc       no              bond0.162
rhevm           8000.00145edd0924       no              eth2

Yes, the network should have a BOOTPROTO= line.

[root@navy-vds1 ~]# ls /var/lib/vdsm/persistence/netconf
bonds  nets
[root@navy-vds1 ~]# ls /var/lib/vdsm/persistence/netconf/nets/
net_bondash  rhevm
[root@navy-vds1 ~]# ls /var/lib/vdsm/persistence/netconf/bonds/
bond0

Note that when testing this bug, after reboot the host didn't get an IP because BOOTPROTO= was missing from the ifcfg-rhevm file; I added BOOTPROTO=dhcp manually.
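The manual fix can be sketched as follows. This is illustrative only: a temporary file stands in for the real target so the sketch is safe to run anywhere; on the affected host the file would be /etc/sysconfig/network-scripts/ifcfg-rhevm.

```shell
# Illustrative sketch of the manual workaround: append BOOTPROTO=dhcp to
# an ifcfg file that lacks a BOOTPROTO line. A temp file stands in for
# /etc/sysconfig/network-scripts/ifcfg-rhevm.
IFCFG=$(mktemp)
cat > "$IFCFG" <<'EOF'
DEVICE=rhevm
TYPE=Bridge
ONBOOT=yes
EOF

# Append BOOTPROTO=dhcp only if no BOOTPROTO line exists yet.
grep -q '^BOOTPROTO=' "$IFCFG" || echo 'BOOTPROTO=dhcp' >> "$IFCFG"

grep '^BOOTPROTO=' "$IFCFG"   # prints: BOOTPROTO=dhcp
rm -f "$IFCFG"
```

On a real host, an ifdown/ifup of the interface (or a network service restart) would then be needed for the change to take effect.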

Comment 8 Michael Burman 2015-04-06 07:04:39 UTC
Created attachment 1011272 [details]
vdsm log and generated ifcfg files

Comment 9 Dan Kenigsberg 2015-04-07 04:26:33 UTC
The interesting part is the content of the net_bondash and rhevm files, as well as supervdsm.log during boot time. Can you supply them?
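One way to gather the requested files into a single archive for attachment. The vdsm paths below are the usual defaults and are assumptions here; adjust to the host.

```shell
# Sketch: bundle the persisted net definitions, supervdsm.log and the
# generated ifcfg files into one tarball. Paths are vdsm's defaults and
# may differ on a given host.
collect_net_debug() {
    out=$1; shift
    tar czf "$out" "$@" 2>/dev/null
    echo "wrote $out"
}

collect_net_debug /tmp/netconf-debug.tgz \
    /var/lib/vdsm/persistence/netconf \
    /var/log/vdsm/supervdsm.log \
    /etc/sysconfig/network-scripts/ifcfg-*
```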

Comment 10 Michael Burman 2015-04-07 08:46:12 UTC
Hi Dan,

I ran more tests and saw several strange behaviors:

- upgrade from rhev-h 6.6 3.4 >> 3.5.1 (unsigned build rhev-hypervisor6-6.6-20150402.0.el6ev)
BOOTPROTO= line exists after reboot, host is up. vdsm.log attached.

- upgrade from rhev-h 6.6 3.5.0 >> 3.5.1 (unsigned build rhev-hypervisor6-6.6-20150402.0.el6ev)
No ifcfg files for 'rhevm' and 'net_bondash1' after reboot. Host is in a non-responsive state, with no IP of course. vdsm.log attached.

- upgrade from rhev-h 7.1 3.5.1 (vt14.1) >> 3.5.1 (vt14.2), unsigned build rhev-hypervisor6-6.6-20150402.0.el6ev
Can't install server in RHEV-M:
MainThread::ERROR::2015-04-07 08:34:45,251::vdsm::134::vds::(run) Exception raised
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 132, in run
    serve_clients(log)
  File "/usr/share/vdsm/vdsm", line 82, in serve_clients
    cif = clientIF.getInstance(irs, log)
  File "/usr/share/vdsm/clientIF.py", line 158, in getInstance
  File "/usr/share/vdsm/clientIF.py", line 112, in __init__
  File "/usr/share/vdsm/clientIF.py", line 162, in _createAcceptor
  File "/usr/share/vdsm/clientIF.py", line 173, in _createSSLContext
  File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 141, in __init__
  File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 166, in _initContext
  File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 145, in _loadCertChain
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Context.py", line 103, in load_cert_chain
SSLError: Permission denied

Dan,
we are investigating all of this; it's still not clear what is going on.
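For the SSLError above, a first thing to check is whether the vdsm certificate and key files are readable by the user vdsm runs as. The paths below are vdsm's usual defaults and are assumed here.

```shell
# Sketch: report readability of the vdsm TLS certificate/key files.
# "SSLError: Permission denied" from load_cert_chain usually means one
# of these files is unreadable by the vdsm process.
readable_or_report() {
    f=$1
    if [ -r "$f" ]; then
        echo "readable: $f"
    else
        echo "NOT readable: $f"
    fi
}

readable_or_report /etc/pki/vdsm/certs/vdsmcert.pem
readable_or_report /etc/pki/vdsm/keys/vdsmkey.pem
readable_or_report /etc/pki/vdsm/certs/cacert.pem
```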

Comment 11 Michael Burman 2015-04-07 08:49:20 UTC
Created attachment 1011659 [details]
vdsm logs

Comment 12 Lior Vernia 2015-04-20 07:58:50 UTC
*** Bug 1209486 has been marked as a duplicate of this bug. ***

Comment 13 Lior Vernia 2015-04-20 08:08:08 UTC
To my understanding, the 3.4.* --> 3.5.1 upgrade should be okay, but the 3.5.0 --> 3.5.1 upgrade needs release notes.

Dan, could you supply documentation on what goes wrong during a 3.5.0 --> 3.5.1 upgrade, and how to work around it once it breaks?

Comment 14 Yaniv Lavi 2015-04-21 13:11:41 UTC
*** Bug 1209401 has been marked as a duplicate of this bug. ***

Comment 15 Marina Kalinin 2015-04-22 15:01:38 UTC
Dan, reading your release notes, can we refer to this KCS solution for the workaround you are describing?
 https://access.redhat.com/solutions/1346443

Would it require additional steps?

Thanks!

Comment 16 Michael Burman 2015-04-27 12:25:06 UTC
Verified on - 3.5.1-0.4.el6ev


- rhev-h 6.6 3.4.z >> rhev-h 6.6 3.5.1
using the following builds:
rhev-h 6.6 3.4.z 20150123.1.el6ev >> rhev-h 6.6 3.5.1  20150421.0.el6ev

1) clean rhev-h 6.6 3.4.z 20150123.1.el6ev installed via USB
2) bond0.162 configured via TUI with dhcp
3) installed server in RHEV-M, rhevm network created on top of bond0
4) via SN attached network to other NIC

* Host is up after upgrade and reboot, all networks are attached to the server, the rhevm network got an IP, and the host is up in RHEV-M and can be activated on a 3.5 cluster.


- clean rhev-h 6.6 3.5.1  20150421.0.el6ev

1) clean rhev-h 6.6 3.5.1  20150421.0.el6ev installed via USB
2) bond0.162 configured via TUI with dhcp
3) installed server in RHEV-M, rhevm network created on top of bond0
4) via SN attached network to other NIC

* Host is up after reboot, all networks are attached to the server, the rhevm network got an IP, and the host is up in RHEV-M.


- rhev-h 7.1 3.5.1  20150420.0.el7ev

1) clean rhev-h 7.1 3.5.1  20150420.0.el7ev installed via USB
2) bond0.162 configured via TUI with dhcp
3) installed server in RHEV-M, rhevm network created on top of bond0
4) via SN attached network to other NIC

* Host is up after reboot, all networks are attached to the server, the rhevm network got an IP, and the host is up in RHEV-M.

Comment 17 Dan Kenigsberg 2015-04-28 11:27:21 UTC
(In reply to Marina from comment #15)

Frankly, I still do not understand why this bug bites us only on upgrade and not during any reboot. Hence, only testing will tell whether https://access.redhat.com/solutions/1346443 is helpful for the upgrade case.

Comment 18 Eyal Edri 2015-05-27 07:19:17 UTC
This was released as part of 3.5.1.

