Bug 1277937

Summary: [RFE] [TEST CLONE BUG] - PLEASE IGNORE, FOR TESTING ONLY
Product: Red Hat Enterprise Virtualization Manager
Reporter: Eyal Edri <eedri>
Component: vdsm
Assignee: Eyal Edri <eedri>
Status: CLOSED NOTABUG
QA Contact: Eyal Edri <eedri>
Severity: urgent
Priority: urgent
Docs Contact:
Version: 3.6.3
CC: amarchuk, rhev-integ
Target Milestone: ovirt-3.6.5
Keywords: FutureFeature, Reopened, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: test
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 1275371
Clones: 1278046 1278077 1278307 1312351 1312400 (view as bug list)
Environment:
Last Closed: 2016-02-26 15:57:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: External
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1278046, 1278077, 1278307, 1312351, 1312400

Description Eyal Edri 2015-11-04 11:44:07 UTC
+++ This bug was initially created as a clone of Bug #1275371 +++

Description of problem:
Network management bridge was not created during the hosted-engine deployment.
Failed to deploy the HE on Red Hat Enterprise Virtualization Hypervisor release 7.1 (20151015.0.el7ev) over iSCSI because of failure in configuring the management bridge on host during deployment of the HE, bridge was not created and connectivity was lost to the host.
After restarting the host manually, bridge was created, but FQDN of the host became "localhost" instead of originally received from the DHCP.

Version-Release number of selected component (if applicable):
ovirt-node-plugin-rhn-3.2.3-23.el7.noarch
ovirt-host-deploy-offline-1.3.0-3.el7ev.x86_64
ovirt-node-branding-rhev-3.2.3-23.el7.noarch
ovirt-node-selinux-3.2.3-23.el7.noarch
ovirt-hosted-engine-ha-1.2.7.2-1.el7ev.noarch
ovirt-node-plugin-hosted-engine-0.2.0-18.0.el7ev.noarch
ovirt-node-plugin-cim-3.2.3-23.el7.noarch
ovirt-node-plugin-snmp-3.2.3-23.el7.noarch
ovirt-node-3.2.3-23.el7.noarch
ovirt-host-deploy-1.3.2-1.el7ev.noarch
ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
ovirt-node-plugin-vdsm-0.2.0-26.el7ev.noarch
libvirt-1.2.8-16.el7_1.4.x86_64
mom-0.4.1-5.el7ev.noarch
qemu-kvm-rhev-2.1.2-23.el7_1.10.x86_64
vdsm-4.16.27-1.el7ev.x86_64
sanlock-3.2.2-2.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Install clean Red Hat Enterprise Virtualization Hypervisor release 7.1 (20151015.0.el7ev) on host.
2. Deploy HE over iSCSI via TUI.
3.

Actual results:
Management bridge not created and deployment failed, with customer being disconnected from host.

Expected results:
Deployment should pass.

Additional info:
See logs attached.

--- Additional comment from Nikolai Sednev on 2015-10-26 13:09 EDT ---



--- Additional comment from Sandro Bonazzola on 2015-10-27 08:07:58 EDT ---

The real reason for the failure is:
Traceback (most recent call last):
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 1136, in wrapper
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 554, in setupNetworks
  File "/usr/share/vdsm/API.py", line 1398, in setupNetworks
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
OSError: [Errno 16] Device or resource busy: '/etc/sysconfig/network-scripts/ifcfg-rhevm'

looks like a node issue

--- Additional comment from Ryan Barry on 2015-10-27 10:58:34 EDT ---

This is a node issue because vdsm is attempting to unlink a bind-mounted file. Bind mounting is part and parcel of persistence, and the persistence of ifcfg files is handled directly by vdsm, so there is no clean way to handle this from the node side: ifcfg-rhevm does not exist before hosted-engine-setup runs, and we don't have any kind of daemon monitoring for persistence or wrapping calls to unlink.
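The failure mode above can be sketched in a few lines of Python (a hypothetical helper, not actual vdsm code): a file persisted by ovirt-node is bind-mounted over itself, so unlink() fails with EBUSY ("Device or resource busy") rather than a permissions error, which is exactly the OSError in the traceback.

```python
import errno
import os

def classify_unlink(path):
    """Hypothetical helper: try to unlink a config file and report
    why it could not be removed. A file persisted by ovirt-node is
    bind-mounted over itself, so os.unlink() raises OSError with
    errno EBUSY, the error seen in the logs."""
    try:
        os.unlink(path)
        return "removed"
    except OSError as e:
        if e.errno == errno.EBUSY:
            return "bind-mounted (persisted); must be unpersisted first"
        if e.errno == errno.ENOENT:
            return "already removed"
        raise
```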

There are two problems:

MainProcess|Thread-17::INFO::2015-10-26 16:29:36,759::__init__::507::root.ovirt.node.utils.fs::(_persist_file) File "/etc/sysconfig/network-scripts/ifcfg-enp4s0" successfully persisted
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:36,761::utils::739::root::(execCmd) /usr/sbin/ifdown enp4s0 (cwd None)
sourceRoute::DEBUG::2015-10-26 16:29:36,879::sourceroutethread::39::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1445876976
sourceRoute::INFO::2015-10-26 16:29:36,880::sourceroutethread::60::root::(process_IN_CLOSE_WRITE_filePath) interface enp4s0 is not a libvirt interface
sourceRoute::WARNING::2015-10-26 16:29:36,880::utils::129::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/enp4s0 already removed
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:36,967::utils::759::root::(execCmd) FAILED: <err> = 'bridge rhevm does not exist!\n'; <rc> = 1
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:36,967::utils::739::root::(execCmd) /usr/bin/systemd-run --scope --slice=vdsm-dhclient /usr/sbin/ifup enp4s0 (cwd None)
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:37,125::utils::759::root::(execCmd) SUCCESS: <err> = 'Running as unit run-18679.scope.\n'; <rc> = 0
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:37,125::utils::739::root::(execCmd) /usr/bin/systemd-run --scope --slice=vdsm-dhclient /usr/sbin/ifup rhevm (cwd None)
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:42,693::utils::759::root::(execCmd) FAILED: <err> = 'Running as unit run-18709.scope.\n'; <rc> = 1
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:42,694::utils::739::root::(execCmd) /usr/sbin/ifdown rhevm (cwd None)
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:43,078::utils::759::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:43,078::utils::739::root::(execCmd) /usr/sbin/ifdown enp4s0 (cwd None)
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:43,525::utils::759::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-17::INFO::2015-10-26 16:29:43,525::ifcfg::332::root::(restoreAtomicNetworkBackup) Rolling back logical networks configuration (restoring atomic logical networks backup)
MainProcess|Thread-17::INFO::2015-10-26 16:29:43,525::ifcfg::372::root::(restoreAtomicBackup) Rolling back configuration (restoring atomic backup)
MainProcess|Thread-17::ERROR::2015-10-26 16:29:43,526::utils::132::root::(rmFile) Removing file: /etc/sysconfig/network-scripts/ifcfg-rhevm failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 126, in rmFile
OSError: [Errno 16] Device or resource busy: '/etc/sysconfig/network-scripts/ifcfg-rhevm'
MainProcess|Thread-17::ERROR::2015-10-26 16:29:43,526::supervdsmServer::106::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 104, in wrapper
    res = func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer", line 224, in setupNetworks
    return setupNetworks(networks, bondings, **options)
  File "/usr/share/vdsm/network/api.py", line 696, in setupNetworks
  File "/usr/share/vdsm/network/configurators/__init__.py", line 54, in __exit__
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 75, in rollback
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 454, in restoreBackups
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 375, in restoreAtomicBackup
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 126, in rmFile
OSError: [Errno 16] Device or resource busy: '/etc/sysconfig/network-scripts/ifcfg-rhevm'

The first problem is that it looks like configuring the networks failed and VDSM attempted to roll back. hosted-engine-setup handles creating that bridge, and I haven't seen this issue with NFS hosted engines in recent testing.

Nikolai: does this work with other storage backends?

The second problem is that, even though vdsm.api.network checks whether it's running on a node and imports the node persistence functions, vdsm.network.configurators.ifcfg.ConfigWriter.restoreAtomicBackup calls utils.rmFile directly, without checking whether it's running on node and whether that file is persisted. That should be filed as a separate (low-priority) bug.
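A persistence-aware removal along the lines described above could look like this sketch; the is_persisted/unpersist hooks are hypothetical stand-ins for the node persistence API, and on a plain RHEL host they stay None, so the function degrades to an ordinary unlink:

```python
import errno
import os

def rm_config_file(path, is_persisted=None, unpersist=None):
    """Remove an ifcfg-style file. On ovirt-node, persisted files are
    bind mounts and a plain unlink() raises EBUSY, so route the
    removal through the (hypothetical) persistence hooks when given."""
    if is_persisted is not None and is_persisted(path):
        unpersist(path)  # drop the bind mount and the persisted copy
        return
    try:
        os.unlink(path)
    except OSError as e:
        if e.errno != errno.ENOENT:  # tolerate "already removed"
            raise
```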

The problem is with configuring networks. Reassigning back.

--- Additional comment from Nikolai Sednev on 2015-10-27 12:52:28 EDT ---

I only tried this on iSCSI, not yet with NFS.

--- Additional comment from Ying Cui on 2015-10-27 23:23:58 EDT ---

I encountered bug 1270587 on a RHEL 7.2 host before.
Bug 1270587 - [hosted-engine-setup] Deployment fails in setup networks, 'bridged' is configured as 'True' by default by vdsm
Bug 1263311 - setupNetworks fails with a KeyError exception on 'bridged'
Version:
# rpm -qa kernel ovirt-hosted-engine-setup vdsm
kernel-3.10.0-322.el7.x86_64
ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch
vdsm-4.17.9-1.el7ev.noarch
kernel-3.10.0-320.el7.x86_64
# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 Beta (Maipo)

As in this bug's description, I can also reproduce "failure in configuring the management bridge on RHEL 7.2 host during deployment of the HE, bridge was not created and connectivity was lost to the RHEL 7.2 host" (ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch, vdsm-4.17.9-1.el7ev.noarch), and I tried this with NFSv4.

So it is not a node-specific issue.

--- Additional comment from Fabian Deutsch on 2015-11-02 03:50:45 EST ---

(In reply to Ryan Barry from comment #3)
…
> The second problem is that, even though vdsm.api.network checks whether it's
> running on a node and imports the node persistence functions,
> vdsm.network.configurators.ifcfg.ConfigWriter.restoreAtomicBackup directly
> uses utils.rmFile without checking whether it's running on node, and whether
> that file is persisted, which should be a different (low priority bug).

This should be fixed on the vdsm side, moving it over.

Comment 5 Sandro Bonazzola 2016-02-26 13:48:39 UTC
Reopened for testing purposes.