Description of problem:
vdsm service fails to start if dhclient is already running and one of vdsm networks is in down state

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20150828.0.el6ev)
vdsm-4.16.26-1.el6ev.x86_64

How reproducible:

Steps to Reproduce:
1. Create network 'net1' and attach it to the host via setupnetworks
2. On the host ip link set down net1
3. restart vdsmd service

Actual results:
vdsm fails to restart/start

Expected results:
vdsm should be up after restart

[root@localhost ~]# /etc/init.d/vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop [ OK ]
vdsm: Running run_final_hooks [ OK ]
vdsm stop [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface em1: device em1 is already a member of a bridge; can't enslave it to bridge rhevm. [ OK ]
Bringing up interface em2: device em2 is already a member of a bridge; can't enslave it to bridge net1. [ OK ]
Bringing up interface net1: [ OK ]
Bringing up interface rhevm:
Determining IP information for rhevm...dhclient(20819) is already running - exiting.
This version of ISC DHCP is based on the release available on ftp.isc.org. Features have been added and other changes have been made to the base software release in order to make it work better with this distribution.
Please report for this software via the Red Hat Bugzilla site: http://bugzilla.redhat.com
exiting.
failed. [FAILED]
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
vdsm: Start dependent network [FAILED]
vdsm start [FAILED]
Can reproduce this issue on rhev-hypervisor6-6.7-20150828.

Version-Release number of selected component (if applicable):
vdsm-4.16.26-1.el6ev.x86_64
ovirt-node-3.2.3-20.el6.noarch
rhev-hypervisor6-6.7-20150828

Test steps:
1. PXE install rhev-hypervisor6-6.7-20150828, configure eth2 as dhcp mode, then register to RHEV-M3.5.4
2. On RHEV-M web portal, the host status is up, then create eth0 and eth1 as bond0, create Network testnet0 and drag to bond0, create Network testnet1 and drag to eth3, save and reboot.
3. Drop into shell, run command # ip link set down testnet1
4. run command # /etc/init.d/vdsmd restart

Test result:
vdsm fails to restart.

[root@localhost ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20150828.0.el6ev)
[root@localhost ~]# /etc/init.d/vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop [ OK ]
vdsm: not running [FAILED]
vdsm: Running run_final_hooks
vdsm stop [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface bond0: Device eth1 has different MAC address than expected, ignoring. device bond0 is already a member of a bridge; can't enslave it to bridge testnet0. [ OK ]
Bringing up interface eth2: device eth2 is already a member of a bridge; can't enslave it to bridge rhevm. [ OK ]
Bringing up interface eth3: device eth3 is already a member of a bridge; can't enslave it to bridge testnet1. [ OK ]
Bringing up interface rhevm:
Determining IP information for rhevm...dhclient(4248) is already running - exiting.
This version of ISC DHCP is based on the release available on ftp.isc.org. Features have been added and other changes have been made to the base software release in order to make it work better with this distribution.
Please report for this software via the Red Hat Bugzilla site: http://bugzilla.redhat.com
exiting.
failed. [FAILED]
Bringing up interface testnet0: RTNETLINK answers: File exists [ OK ]
Bringing up interface testnet1: [ OK ]
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
vdsm: Start dependent network [FAILED]
vdsm start [FAILED]
Raising priority as we can now see it in RHEVH reinstall flows too.
To me this issue does not look RHEV-H specific. Can it also be seen on RHEL-H?
And: It looks like a manual step (ip link down) is required to produce this bug. Can the bug also be reproduced without doing the manual step?
- Tested with rhel 6.7 - vdsm-4.16.26-1.el6ev.x86_64 with same flow and got -->

[root@rose01 ~]# /etc/init.d/vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop [ OK ]
vdsm: Running run_final_hooks [ OK ]
vdsm stop [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: device eth0 is already a member of a bridge; can't enslave it to bridge rhevm. [ OK ]
Bringing up interface eth1: device eth1 is already a member of a bridge; can't enslave it to bridge t1. [ OK ]
Bringing up interface rhevm:
Determining IP information for rhevm...dhclient(5226) is already running - exiting.
This version of ISC DHCP is based on the release available on ftp.isc.org. Features have been added and other changes have been made to the base software release in order to make it work better with this distribution.
Please report for this software via the Red Hat Bugzilla site: http://bugzilla.redhat.com
exiting.
failed. [FAILED]
Bringing up interface t1:
Determining IP information for t1... failed. [FAILED]
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
vdsm: Start dependent network [FAILED]
vdsm start [FAILED]

* But note, vdsmd is actually running --> although it failed to start, vdsmd starts running after a minute or so.

[root@rose01 ~]# /etc/init.d/vdsmd status
VDS daemon server is running

- Tested with rhev-h 6.7 20150826.0.el6ev and got the same, but please note, vdsmd is running -->

[root@localhost ~]# /etc/init.d/vdsm
vdsmd vdsm-reg
[root@localhost ~]# /etc/init.d/vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop [ OK ]
vdsm: Running run_final_hooks [ OK ]
vdsm stop [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface bond0: Device eth3 has different MAC address than expected, ignoring. device bond0 is already a member of a bridge; can't enslave it to bridge t1. [ OK ]
Bringing up interface eth1: device eth1 is already a member of a bridge; can't enslave it to bridge gggg. [ OK ]
Bringing up interface eth2: device eth2 is already a member of a bridge; can't enslave it to bridge rhevm. [ OK ]
Bringing up interface gggg: RTNETLINK answers: File exists [ OK ]
Bringing up interface rhevm:
Determining IP information for rhevm...dhclient(4368) is already running - exiting.
This version of ISC DHCP is based on the release available on ftp.isc.org. Features have been added and other changes have been made to the base software release in order to make it work better with this distribution.
Please report for this software via the Red Hat Bugzilla site: http://bugzilla.redhat.com
exiting.
failed. [FAILED]
Bringing up interface t1: RTNETLINK answers: File exists [ OK ]
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
vdsm: Start dependent network [FAILED]
vdsm start [FAILED]

[root@localhost ~]# /etc/init.d/vdsmd status
VDS daemon server is running

* Hosts got UP in rhev-m after a minute
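Since vdsmd reportedly comes up about a minute after the restart command reports failure, a small polling helper can make that observation repeatable. This is only a sketch: the function name, timeout and interval are made up for illustration, and it assumes the init script's status action exits 0 when the daemon is running, which this report does not confirm.

```python
import subprocess
import time


def wait_until_running(status_cmd, timeout=120.0, interval=5.0):
    """Poll status_cmd until it exits 0 or the timeout expires.

    Assumption: the status command exits 0 only when the service is running.
    Returns True if the service was seen running, False on timeout.
    """
    deadline = time.time() + timeout
    while True:
        if subprocess.call(status_cmd) == 0:
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage on a host, right after the failed restart:
#   wait_until_running(["/etc/init.d/vdsmd", "status"])
```

A False result after a generous timeout would correspond to the case where vdsmd really stays down, rather than the delayed start seen here.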
It's a nasty error, but the host works?
In our tests, VDSM was down when we had this error.
Dan, can you please provide some info about the code that starts the network, and whether we still need it now that we persist all ifcfg files?
Tested this with a dummy interface, using the following steps:
1) rhel 6.7 with vdsm-4.16.26-1.el6ev.x86_64
2) Add to rhev-m rhevm-3.5.4.2-1.3.el6ev.noarch
3) Create dummy interface and attach network to it via Setup Networks
4) Reboot server

Result -->
[root@rose01 ~]# /etc/init.d/supervdsmd status
Super VDSM daemon server is running
[root@rose01 ~]# /etc/init.d/libvirtd status
libvirtd (pid 6645) is running...
[root@rose01 ~]# /etc/init.d/vdsmd status
VDS daemon is not running

- vdsm is not running and not coming up

[root@rose01 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
rhevm           8000.d4ae52c48f27       no              eth0
t1              8000.d4ae52c48f28       no              eth1
vdsm_net        8000.000000000000       no

- dummy0 interface is gone ^^
- Host in non-responsive state in rhev-m.

Attaching vdsm logs
Created attachment 1069627: vdsm logs
A dummy interface is not a user flow, it's an automation flow, right?
yes
Not considered a blocker, please add release note and workaround (if there is one, like stopping the service prior to setup-networks or bringing up the nic).
Andrew, please note to add this to 3.5.4 known issues.
How to reproduce it just with vdsm and command line:

# PREPARE ENV
phoracek ➜ 1 sudo vagrant init dliappis/centos65minlibvirt
phoracek ➜ 1 sudo vagrant up
phoracek ➜ 1 sudo vagrant ssh
[vagrant@localhost ~]$ sudo yum -y update
[vagrant@localhost ~]$ sudo shutdown -r now
phoracek ➜ 1 sudo vagrant ssh
[vagrant@localhost ~]$ sudo yum -y install http://resources.ovirt.org/pub/yum-repo/ovirt-release35.rpm
[vagrant@localhost ~]$ sudo yum -y install vdsm vdsm
[vagrant@localhost ~]$ cat /etc/redhat-release
CentOS release 6.5 (Final)
[vagrant@localhost ~]$ uname -r
2.6.32-431.el6.x86_64
[vagrant@localhost ~]$ yum info vdsm | grep 'Version\|Release'
Version : 4.16.26
Release : 0.el6
[vagrant@localhost ~]$ sudo vdsm-tool configure --force
[vagrant@localhost ~]$ sudo service vdsmd start

# REPRODUCE
[vagrant@localhost ~]$ sudo ip l add dummy_1 type dummy
[vagrant@localhost ~]$ sudo python
>>> from vdsm import vdscli
>>> c = vdscli.connect()
>>> c.setupNetworks({'net1': {'nic': 'eth0', 'bootproto': 'dhcp', 'blockingdhcp': True}}, {}, {'connectivityCheck': False})
>>> c.setupNetworks({'net2': {'nic': 'dummy_1'}}, {}, {'connectivityCheck': False})
[vagrant@localhost ~]$ sudo ip link set net2 down
[vagrant@localhost ~]$ sudo /etc/init.d/vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop [ OK ]
vdsm: Running run_final_hooks [ OK ]
vdsm stop [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface dummy_1: device dummy_1 is already a member of a bridge; can't enslave it to bridge net2. [ OK ]
Bringing up interface eth0: device eth0 is already a member of a bridge; can't enslave it to bridge net1. [ OK ]
Bringing up interface net1:
Determining IP information for net1...dhclient(3397) is already running - exiting.
This version of ISC DHCP is based on the release available on ftp.isc.org. Features have been added and other changes have been made to the base software release in order to make it work better with this distribution.
Please report for this software via the CentOS Bugs Database: http://bugs.centos.org/
exiting.
failed. [FAILED]
Bringing up interface net2: [ OK ]
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
vdsm: Start dependent network [FAILED]
vdsm start
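For comparing the restart transcripts in this report (or checking them in automation), the per-interface results can be pulled out of the init script output. The following is a rough sketch, not part of vdsm: the function names are invented here, and it assumes each "Bringing up interface X:" step is followed by exactly one "[ OK ]" or "[FAILED]" marker, as in the transcripts above.

```python
import re

# Matches either the start of an interface step or a status marker.
_TOKEN = re.compile(
    r"Bringing up interface (?P<iface>[^\s:]+):"
    r"|\[\s*(?P<status>OK|FAILED)\s*\]"
)


def interface_results(output):
    """Return [(interface, status)] in the order the steps appear.

    Status markers that do not follow an interface step (for example the
    "vdsm stop" or loopback lines) are ignored.
    """
    results = []
    current = None
    for match in _TOKEN.finditer(output):
        if match.group("iface"):
            current = match.group("iface")
        elif current is not None:
            results.append((current, match.group("status")))
            current = None
    return results


def failed_interfaces(output):
    """Return the names of interfaces whose bring-up step reported FAILED."""
    return [name for name, status in interface_results(output)
            if status == "FAILED"]
```

Applied to the reproducer transcript above, this should report only net1, the DHCP-configured bridge, as failed.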
Was reproduced on non-node, so moving this to vdsm.
Workaround: kill the dhclient process.
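A minimal sketch of that workaround, for automation that needs to apply it before restarting vdsmd. It assumes the DHCP client for a bridge records its PID in /var/run/dhclient-<iface>.pid; that path, the function names and the choice of SIGTERM are assumptions for illustration, not something this report confirms. It also does not verify that the recorded PID really belongs to dhclient.

```python
import errno
import os
import signal


def dhclient_pid(iface, run_dir="/var/run"):
    """Return the PID recorded for iface's dhclient if it is alive, else None.

    Assumption: the PID file is named dhclient-<iface>.pid under run_dir.
    """
    pid_file = os.path.join(run_dir, "dhclient-%s.pid" % iface)
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        return None  # no PID file, unreadable, or not a number
    if pid <= 0:
        return None
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except OSError as e:
        if e.errno == errno.ESRCH:
            return None  # stale PID file, process is gone
        # Any other error (e.g. EPERM) means the process exists.
    return pid


def kill_dhclient(iface, run_dir="/var/run"):
    """Send SIGTERM to the dhclient recorded for iface; return its PID or None."""
    pid = dhclient_pid(iface, run_dir)
    if pid is not None:
        os.kill(pid, signal.SIGTERM)
    return pid
```

On the hosts in this report the call would be something like kill_dhclient("rhevm"), run as root before restarting vdsmd.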
It is easy to reproduce this by taking a vdsm-controlled interface down manually. HOWEVER, can anybody think of a case where it affects a host in a production environment? I'd rather touch this fragile code only if it affects real-life users.
We are not sure about such a case, but it is failing some of our automation tests.
In case we want to solve this bug: the attached patch successfully passed the reproducer from Comment 16.
3.6 does not need this, as it has no support for sysv.
Moving 'requires_doc_text' to '-' in response to Comment 23, as there is now no requirement for a release note.