+++ This bug was initially created as a clone of Bug #1275371 +++

Description of problem:
The network management bridge was not created during hosted-engine deployment. Deploying the HE on Red Hat Enterprise Virtualization Hypervisor release 7.1 (20151015.0.el7ev) over iSCSI failed because configuring the management bridge on the host failed during HE deployment: the bridge was not created and connectivity to the host was lost. After restarting the host manually, the bridge was created, but the FQDN of the host became "localhost" instead of the one originally received from DHCP.

Version-Release number of selected component (if applicable):
ovirt-node-plugin-rhn-3.2.3-23.el7.noarch
ovirt-host-deploy-offline-1.3.0-3.el7ev.x86_64
ovirt-node-branding-rhev-3.2.3-23.el7.noarch
ovirt-node-selinux-3.2.3-23.el7.noarch
ovirt-hosted-engine-ha-1.2.7.2-1.el7ev.noarch
ovirt-node-plugin-hosted-engine-0.2.0-18.0.el7ev.noarch
ovirt-node-plugin-cim-3.2.3-23.el7.noarch
ovirt-node-plugin-snmp-3.2.3-23.el7.noarch
ovirt-node-3.2.3-23.el7.noarch
ovirt-host-deploy-1.3.2-1.el7ev.noarch
ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
ovirt-node-plugin-vdsm-0.2.0-26.el7ev.noarch
libvirt-1.2.8-16.el7_1.4.x86_64
mom-0.4.1-5.el7ev.noarch
qemu-kvm-rhev-2.1.2-23.el7_1.10.x86_64
vdsm-4.16.27-1.el7ev.x86_64
sanlock-3.2.2-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install a clean Red Hat Enterprise Virtualization Hypervisor release 7.1 (20151015.0.el7ev) on the host.
2. Deploy the HE over iSCSI via the TUI.

Actual results:
The management bridge was not created and deployment failed, with the customer being disconnected from the host.

Expected results:
Deployment should pass.

Additional info:
See attached logs.
--- Additional comment from Nikolai Sednev on 2015-10-26 13:09 EDT ---

--- Additional comment from Sandro Bonazzola on 2015-10-27 08:07:58 EDT ---

The real reason for the failure is:

Traceback (most recent call last):
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 1136, in wrapper
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 554, in setupNetworks
  File "/usr/share/vdsm/API.py", line 1398, in setupNetworks
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
OSError: [Errno 16] Device or resource busy: '/etc/sysconfig/network-scripts/ifcfg-rhevm'

Looks like a node issue.

--- Additional comment from Ryan Barry on 2015-10-27 10:58:34 EDT ---

This is a node issue because vdsm is attempting to unlink a bind-mounted file, but bind mounting is part and parcel of persistence, and the persistence of ifcfg files is handled directly by vdsm. There is no clean way to handle this from the node side, since ifcfg-rhevm does not exist before hosted-engine-setup runs, and we don't have any kind of daemon monitoring for persistence or wrapping calls to unlink.
There are two problems:

MainProcess|Thread-17::INFO::2015-10-26 16:29:36,759::__init__::507::root.ovirt.node.utils.fs::(_persist_file) File "/etc/sysconfig/network-scripts/ifcfg-enp4s0" successfully persisted
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:36,761::utils::739::root::(execCmd) /usr/sbin/ifdown enp4s0 (cwd None)
sourceRoute::DEBUG::2015-10-26 16:29:36,879::sourceroutethread::39::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1445876976
sourceRoute::INFO::2015-10-26 16:29:36,880::sourceroutethread::60::root::(process_IN_CLOSE_WRITE_filePath) interface enp4s0 is not a libvirt interface
sourceRoute::WARNING::2015-10-26 16:29:36,880::utils::129::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/enp4s0 already removed
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:36,967::utils::759::root::(execCmd) FAILED: <err> = 'bridge rhevm does not exist!\n'; <rc> = 1
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:36,967::utils::739::root::(execCmd) /usr/bin/systemd-run --scope --slice=vdsm-dhclient /usr/sbin/ifup enp4s0 (cwd None)
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:37,125::utils::759::root::(execCmd) SUCCESS: <err> = 'Running as unit run-18679.scope.\n'; <rc> = 0
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:37,125::utils::739::root::(execCmd) /usr/bin/systemd-run --scope --slice=vdsm-dhclient /usr/sbin/ifup rhevm (cwd None)
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:42,693::utils::759::root::(execCmd) FAILED: <err> = 'Running as unit run-18709.scope.\n'; <rc> = 1
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:42,694::utils::739::root::(execCmd) /usr/sbin/ifdown rhevm (cwd None)
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:43,078::utils::759::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:43,078::utils::739::root::(execCmd) /usr/sbin/ifdown enp4s0 (cwd None)
MainProcess|Thread-17::DEBUG::2015-10-26 16:29:43,525::utils::759::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-17::INFO::2015-10-26 16:29:43,525::ifcfg::332::root::(restoreAtomicNetworkBackup) Rolling back logical networks configuration (restoring atomic logical networks backup)
MainProcess|Thread-17::INFO::2015-10-26 16:29:43,525::ifcfg::372::root::(restoreAtomicBackup) Rolling back configuration (restoring atomic backup)
MainProcess|Thread-17::ERROR::2015-10-26 16:29:43,526::utils::132::root::(rmFile) Removing file: /etc/sysconfig/network-scripts/ifcfg-rhevm failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 126, in rmFile
OSError: [Errno 16] Device or resource busy: '/etc/sysconfig/network-scripts/ifcfg-rhevm'
MainProcess|Thread-17::ERROR::2015-10-26 16:29:43,526::supervdsmServer::106::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 104, in wrapper
    res = func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer", line 224, in setupNetworks
    return setupNetworks(networks, bondings, **options)
  File "/usr/share/vdsm/network/api.py", line 696, in setupNetworks
  File "/usr/share/vdsm/network/configurators/__init__.py", line 54, in __exit__
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 75, in rollback
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 454, in restoreBackups
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 375, in restoreAtomicBackup
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 126, in rmFile
OSError: [Errno 16] Device or resource busy: '/etc/sysconfig/network-scripts/ifcfg-rhevm'

The first problem is that it looks like configuring the networks failed and VDSM attempted to roll back. hosted-engine-setup handles creating that bridge, and I haven't seen this issue with NFS hosted engines in recent testing.

Nikolai: does this work with other storage backends?
The second problem is that, even though vdsm.api.network checks whether it's running on a node and imports the node persistence functions, vdsm.network.configurators.ifcfg.ConfigWriter.restoreAtomicBackup uses utils.rmFile directly, without checking whether it's running on node and whether that file is persisted. That should be a separate (low-priority) bug.

The problem here is with configuring networks. Reassigning back.

--- Additional comment from Nikolai Sednev on 2015-10-27 12:52:28 EDT ---

I only tried this on iSCSI, not yet with NFS.

--- Additional comment from Ying Cui on 2015-10-27 23:23:58 EDT ---

I encountered bug 1270587 on a RHEL 7.2 host before.
Bug 1270587 - [hosted-engine-setup] Deployment fails in setup networks, 'bridged' is configured as 'True' by default by vdsm
Bug 1263311 - setupNetworks fails with a KeyError exception on 'bridged'

Version:
# rpm -qa kernel ovirt-hosted-engine-setup vdsm
kernel-3.10.0-322.el7.x86_64
ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch
vdsm-4.17.9-1.el7ev.noarch
kernel-3.10.0-320.el7.x86_64
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 Beta (Maipo)

Per this bug's description, I can also reproduce "failure in configuring the management bridge on RHEL 7.2 host during deployment of the HE, bridge was not created and connectivity was lost to the RHEL 7.2 host" (ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch, vdsm-4.17.9-1.el7ev.noarch), and I tried this with NFSv4. So it is not a node-specific issue.

--- Additional comment from Fabian Deutsch on 2015-11-02 03:50:45 EST ---

(In reply to Ryan Barry from comment #3)
…
> The second problem is that, even though vdsm.api.network checks whether it's
> running on a node and imports the node persistence functions,
> vdsm.network.configurators.ifcfg.ConfigWriter.restoreAtomicBackup directly
> uses utils.rmFile without checking whether it's running on node, and whether
> that file is persisted, which should be a different (low priority bug).
This should be fixed on the vdsm side, moving it over.
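A persistence check of the kind described in comment #3 could look roughly like the sketch below: before removing an ifcfg file, test whether it is a bind-mount target. This is an illustration only, not vdsm's actual detection code; it assumes a Linux host with /proc mounted, and the function name is invented for this example. File bind mounts show up in /proc/self/mounts even though os.path.ismount() does not report them for regular files.

```python
import os


def is_mount_target(path):
    """Return True if `path` appears as a mount target in /proc/self/mounts.

    ovirt-node persistence bind-mounts the /config copy of a file over
    the original, so a persisted ifcfg file is a mount target and a
    plain unlink on it fails with EBUSY. Illustrative sketch only;
    not how vdsm actually detects node persistence.
    """
    target = os.path.realpath(path)
    with open("/proc/self/mounts") as mounts:
        for line in mounts:
            fields = line.split()
            # fields[1] is the mount point. (Spaces in mount paths are
            # octal-escaped in this file; ignored for this sketch.)
            if len(fields) >= 2 and fields[1] == target:
                return True
    return False
```

With such a guard, restoreAtomicBackup-style rollback code could unpersist a file first (or skip the raw unlink) instead of letting rmFile fail with EBUSY.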
Reopened for testing purposes.