Bug 1258551 - vdsm fails to start if dhclient is running
Summary: vdsm fails to start if dhclient is running
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.4
Hardware: x86_64
OS: Linux
unspecified
urgent
Target Milestone: ovirt-3.6.0-rc3
Target Release: 3.6.0
Assignee: Petr Horáček
QA Contact: Meni Yakove
URL:
Whiteboard: network
Depends On:
Blocks: 1267500
 
Reported: 2015-08-31 15:34 UTC by Meni Yakove
Modified: 2016-03-03 07:00 UTC
CC List: 21 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1267500
Environment:
Last Closed: 2015-09-30 08:20:31 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:


Attachments
vdsm logs (469.15 KB, application/x-gzip)
2015-09-03 06:45 UTC, Michael Burman


Links
System        ID     Private  Priority   Status  Summary                       Last Updated
oVirt gerrit  45787  0        ovirt-3.5  MERGED  net: fix vdsmd service start  2020-04-23 10:29:21 UTC

Description Meni Yakove 2015-08-31 15:34:19 UTC
Description of problem:
The vdsm service fails to start if dhclient is already running and one of the vdsm networks is in the down state.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20150828.0.el6ev)
vdsm-4.16.26-1.el6ev.x86_64

How reproducible:


Steps to Reproduce:
1. Create network 'net1' and attach it to the host via Setup Networks.
2. On the host, run: ip link set down net1
3. Restart the vdsmd service.

Actual results:
vdsm fails to restart/start

Expected results:
vdsm should be up after restart





[root@localhost ~]# /etc/init.d/vdsmd restart
Shutting down vdsm daemon: 
vdsm watchdog stop                                         [  OK  ]
vdsm: Running run_final_hooks                              [  OK  ]
vdsm stop                                                  [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface em1:  device em1 is already a member of a bridge; can't enslave it to bridge rhevm.
                                                           [  OK  ]
Bringing up interface em2:  device em2 is already a member of a bridge; can't enslave it to bridge net1.
                                                           [  OK  ]
Bringing up interface net1:                                [  OK  ]
Bringing up interface rhevm:  
Determining IP information for rhevm...dhclient(20819) is already running - exiting. 

This version of ISC DHCP is based on the release available
on ftp.isc.org.  Features have been added and other changes
have been made to the base software release in order to make
it work better with this distribution.

Please report for this software via the Red Hat Bugzilla site:
    http://bugzilla.redhat.com

exiting.
 failed.
                                                           [FAILED]
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
vdsm: Start dependent network                              [FAILED]
vdsm start                                                 [FAILED]
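For automation runs, it can help to check the init script output programmatically rather than by eye. The sketch below is an illustration only: it assumes the output format shown above (a `Bringing up interface <name>:` header, with a `[  OK  ]` or `[FAILED]` marker either on the same line or on a later line), and the functions `interface_results` and `step_status` are hypothetical helpers, not part of vdsm or initscripts.

```python
import re

# Assumed format, taken from the initscripts output in this report:
# "Bringing up interface <name>:" optionally followed, on the same or a
# later line, by a right-aligned "[  OK  ]" or "[FAILED]" marker.
IFACE_RE = re.compile(r"^Bringing up interface (?P<iface>\S+?):")
STATUS_RE = re.compile(r"\[\s*(?P<status>OK|FAILED)\s*\]\s*$")


def interface_results(output):
    """Map each interface brought up by the init script to 'OK' or 'FAILED'."""
    results = {}
    current = None
    for line in output.splitlines():
        header = IFACE_RE.match(line)
        if header:
            current = header.group("iface")
        status = STATUS_RE.search(line)
        if status and current is not None:
            # The first marker after a header belongs to that interface.
            results[current] = status.group("status")
            current = None
    return results


def step_status(output, step):
    """Return 'OK', 'FAILED', or None for a step label such as 'vdsm start'."""
    for line in output.splitlines():
        status = STATUS_RE.search(line)
        if status and line[: status.start()].strip() == step:
            return status.group("status")
    return None
```

Applied to the log above, `interface_results` would report `rhevm` as `FAILED` while the bridged NICs report `OK`, and `step_status(output, "vdsm start")` would return `"FAILED"`. Note that a `FAILED` start step alone does not prove vdsmd stayed down; the daemon may still come up later.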

Comment 2 Chaofeng Wu 2015-09-01 11:57:13 UTC
Can reproduce this issue on rhev-hypervisor6-6.7-20150828.

Version-Release number of selected component (if applicable):
vdsm-4.16.26-1.el6ev.x86_64
ovirt-node-3.2.3-20.el6.noarch
rhev-hypervisor6-6.7-20150828

Test steps:
1. PXE install rhev-hypervisor6-6.7-20150828, configure eth2 in DHCP mode, then register it to RHEV-M 3.5.4.
2. In the RHEV-M web portal, once the host status is Up, bond eth0 and eth1 as bond0, create network testnet0 and drag it to bond0, create network testnet1 and drag it to eth3, then save and reboot.
3. Drop into the shell and run: # ip link set down testnet1
4. Run: # /etc/init.d/vdsmd restart

Test result:
vdsm fails to restart.

[root@localhost ~]# cat /etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20150828.0.el6ev)

[root@localhost ~]# /etc/init.d/vdsmd  restart
Shutting down vdsm daemon: 
vdsm watchdog stop                                         [  OK  ]
vdsm: not running                                          [FAILED]
vdsm: Running run_final_hooks
vdsm stop                                                  [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface bond0:  Device eth1 has different MAC address than expected, ignoring.
device bond0 is already a member of a bridge; can't enslave it to bridge testnet0.
                                                           [  OK  ]
Bringing up interface eth2:  device eth2 is already a member of a bridge; can't enslave it to bridge rhevm.
                                                           [  OK  ]
Bringing up interface eth3:  device eth3 is already a member of a bridge; can't enslave it to bridge testnet1.
                                                           [  OK  ]
Bringing up interface rhevm:  
Determining IP information for rhevm...dhclient(4248) is already running - exiting. 

This version of ISC DHCP is based on the release available
on ftp.isc.org.  Features have been added and other changes
have been made to the base software release in order to make
it work better with this distribution.

Please report for this software via the Red Hat Bugzilla site:
    http://bugzilla.redhat.com

exiting.
 failed.
                                                           [FAILED]
Bringing up interface testnet0:  RTNETLINK answers: File exists
                                                           [  OK  ]
Bringing up interface testnet1:                            [  OK  ]
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
vdsm: Start dependent network                              [FAILED]
vdsm start                                                 [FAILED]

Comment 3 Gil Klein 2015-09-02 14:27:24 UTC
Raising priority as we can now see it in RHEV-H reinstall flows too.

Comment 4 Fabian Deutsch 2015-09-02 14:40:45 UTC
This issue does not look RHEV-H specific to me; can it also be seen on RHEL-H?

Comment 5 Fabian Deutsch 2015-09-02 14:44:41 UTC
Also, it looks like a manual step (ip link set down) is required to reproduce this bug.
Can the bug also be reproduced without the manual step?

Comment 6 Michael Burman 2015-09-02 15:20:24 UTC
- Tested with RHEL 6.7 (vdsm-4.16.26-1.el6ev.x86_64) using the same flow and got:

[root@rose01 ~]# /etc/init.d/vdsmd restart
Shutting down vdsm daemon: 
vdsm watchdog stop                                         [  OK  ]
vdsm: Running run_final_hooks                              [  OK  ]
vdsm stop                                                  [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:  device eth0 is already a member of a bridge; can't enslave it to bridge rhevm.
                                                           [  OK  ]
Bringing up interface eth1:  device eth1 is already a member of a bridge; can't enslave it to bridge t1.
                                                           [  OK  ]
Bringing up interface rhevm:  
Determining IP information for rhevm...dhclient(5226) is already running - exiting. 

This version of ISC DHCP is based on the release available
on ftp.isc.org.  Features have been added and other changes
have been made to the base software release in order to make
it work better with this distribution.

Please report for this software via the Red Hat Bugzilla site:
    http://bugzilla.redhat.com

exiting.
 failed.
                                                           [FAILED]
Bringing up interface t1:  
Determining IP information for t1... failed.
                                                           [FAILED]
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
vdsm: Start dependent network                              [FAILED]
vdsm start                                                 [FAILED]

* Note, however, that vdsmd is actually running: although the start was reported as failed, vdsmd starts running after a minute or so.

[root@rose01 ~]# /etc/init.d/vdsmd status
VDS daemon server is running

- Tested with RHEV-H 6.7 20150826.0.el6ev and got the same result, but note that vdsmd is running:

[root@localhost ~]# /etc/init.d/vdsm
vdsmd     vdsm-reg  
[root@localhost ~]# /etc/init.d/vdsmd restart
Shutting down vdsm daemon: 
vdsm watchdog stop                                         [  OK  ]
vdsm: Running run_final_hooks                              [  OK  ]
vdsm stop                                                  [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface bond0:  Device eth3 has different MAC address than expected, ignoring.
device bond0 is already a member of a bridge; can't enslave it to bridge t1.
                                                           [  OK  ]
Bringing up interface eth1:  device eth1 is already a member of a bridge; can't enslave it to bridge gggg.
                                                           [  OK  ]
Bringing up interface eth2:  device eth2 is already a member of a bridge; can't enslave it to bridge rhevm.
                                                           [  OK  ]
Bringing up interface gggg:  RTNETLINK answers: File exists
                                                           [  OK  ]
Bringing up interface rhevm:  
Determining IP information for rhevm...dhclient(4368) is already running - exiting. 

This version of ISC DHCP is based on the release available
on ftp.isc.org.  Features have been added and other changes
have been made to the base software release in order to make
it work better with this distribution.

Please report for this software via the Red Hat Bugzilla site:
    http://bugzilla.redhat.com

exiting.
 failed.
                                                           [FAILED]
Bringing up interface t1:  RTNETLINK answers: File exists
                                                           [  OK  ]
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
vdsm: Start dependent network                              [FAILED]
vdsm start                                                 [FAILED]

[root@localhost ~]# /etc/init.d/vdsmd status
VDS daemon server is running


* The hosts came Up in RHEV-M after about a minute.
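Because vdsmd was observed to come up about a minute after the init script reported `vdsm start [FAILED]`, an automated check that only looks at the restart result can be misleading. A minimal polling sketch follows. It assumes that `/etc/init.d/vdsmd status` prints `VDS daemon server is running` when the daemon is up (as in the output above); the helper names and the 120-second default timeout are illustrative choices, not values taken from vdsm.

```python
import subprocess
import time


def is_running_status(text):
    """Interpret the text printed by '/etc/init.d/vdsmd status' (assumed format)."""
    # "VDS daemon server is running" vs. "VDS daemon is not running".
    return "is running" in text and "not running" not in text


def vdsmd_running():
    """Run the SysV status action and check its output."""
    proc = subprocess.run(
        ["/etc/init.d/vdsmd", "status"], capture_output=True, text=True
    )
    return is_running_status(proc.stdout)


def wait_until(check, timeout=120.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test could then call `wait_until(vdsmd_running)` after the restart. Injecting `sleep` and `clock` keeps the polling logic testable without real delays.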

Comment 7 Yaniv Lavi 2015-09-02 15:39:33 UTC
It's a nasty error, but does the host work?

Comment 8 Meni Yakove 2015-09-02 15:52:35 UTC
In our tests, VDSM was down when we had this error.

Comment 9 Meni Yakove 2015-09-02 16:02:54 UTC
Dan, can you please provide some information about the code that starts the network, and whether we still need it now that we persist all ifcfg files?

Comment 10 Michael Burman 2015-09-03 06:44:56 UTC
Tested this with a dummy interface using the following steps:

1) RHEL 6.7 with vdsm-4.16.26-1.el6ev.x86_64
2) Add the host to RHEV-M (rhevm-3.5.4.2-1.3.el6ev.noarch)
3) Create a dummy interface and attach a network to it via Setup Networks
4) Reboot the server

Result:

[root@rose01 ~]# /etc/init.d/supervdsmd status
Super VDSM daemon server is running
[root@rose01 ~]# /etc/init.d/libvirtd status
libvirtd (pid  6645) is running...
[root@rose01 ~]# /etc/init.d/vdsmd status
VDS daemon is not running

- vdsm is not running and does not come up.

[root@rose01 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
rhevm           8000.d4ae52c48f27       no              eth0
t1              8000.d4ae52c48f28       no              eth1
vdsm_net                8000.000000000000       no

- The dummy0 interface is gone (see the brctl output above).

- The host is in the Non Responsive state in RHEV-M.

Attaching vdsm logs

Comment 11 Michael Burman 2015-09-03 06:45:30 UTC
Created attachment 1069627 [details]
vdsm logs

Comment 12 Yaniv Lavi 2015-09-03 08:08:19 UTC
A dummy interface is not a user flow; it's an automation flow, right?

Comment 13 Meni Yakove 2015-09-03 08:11:27 UTC
yes

Comment 14 Yaniv Lavi 2015-09-03 08:14:42 UTC
Not considered a blocker. Please add a release note and a workaround (if there is one, such as stopping the service prior to setup-networks or bringing up the NIC).
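Comment 14 mentions bringing up the NIC as a possible workaround. Since the trigger in the description is a vdsm network left in the down state, one way to sketch that is to check each device's administrative state before restarting vdsmd and bring up any that are down. This is an assumption-laden illustration: it reads the `IFF_UP` bit (0x1) from `/sys/class/net/<dev>/flags`, expects the caller to supply the list of vdsm network names (for example `net1`), and uses the standard `ip link set dev <dev> up` command. The helper names are hypothetical.

```python
import os
import subprocess

IFF_UP = 0x1  # administrative "up" flag from <linux/if.h>


def is_admin_up(flags_text):
    """Decode the hex value stored in /sys/class/net/<dev>/flags."""
    return bool(int(flags_text.strip(), 16) & IFF_UP)


def read_flags(dev, sys_root="/sys/class/net"):
    with open(os.path.join(sys_root, dev, "flags")) as f:
        return f.read()


def bring_up_commands(devs, read=read_flags):
    """Return an 'ip link set dev <dev> up' command for each device that is down."""
    return [["ip", "link", "set", "dev", dev, "up"]
            for dev in devs if not is_admin_up(read(dev))]


def bring_up(devs):
    # Requires root; intended to run before '/etc/init.d/vdsmd restart'.
    for cmd in bring_up_commands(devs):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution lets the logic be checked without root privileges or real interfaces.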

Comment 15 Yaniv Lavi 2015-09-03 08:15:19 UTC
Andrew, please make sure to add this to the 3.5.4 known issues.

Comment 16 Petr Horáček 2015-09-07 09:46:15 UTC
How to reproduce it with just vdsm and the command line:

# PREPARE ENV
phoracek ➜  1  sudo vagrant init dliappis/centos65minlibvirt
phoracek ➜  1  sudo vagrant up
phoracek ➜  1  sudo vagrant ssh
[vagrant@localhost ~]$ sudo yum -y update
[vagrant@localhost ~]$ sudo shutdown -r now
phoracek ➜  1  sudo vagrant ssh
[vagrant@localhost ~]$ sudo yum -y install http://resources.ovirt.org/pub/yum-repo/ovirt-release35.rpm 
[vagrant@localhost ~]$ sudo yum -y install vdsm vdsm
[vagrant@localhost ~]$ cat /etc/redhat-release 
CentOS release 6.5 (Final)
[vagrant@localhost ~]$ uname -r
2.6.32-431.el6.x86_64
[vagrant@localhost ~]$ yum info vdsm | grep 'Version\|Release'
Version     : 4.16.26
Release     : 0.el6
[vagrant@localhost ~]$ sudo vdsm-tool configure --force
[vagrant@localhost ~]$ sudo service vdsmd start

# REPRODUCE
[vagrant@localhost ~]$ sudo ip l add dummy_1 type dummy
[vagrant@localhost ~]$ sudo python
>>> from vdsm import vdscli
>>> c = vdscli.connect()
>>> c.setupNetworks({'net1': {'nic': 'eth0', 'bootproto': 'dhcp', 'blockingdhcp': True}}, {}, {'connectivityCheck': False})
>>> c.setupNetworks({'net2': {'nic': 'dummy_1'}}, {}, {'connectivityCheck': False})
[vagrant@localhost ~]$ sudo ip link set net2 down
[vagrant@localhost ~]$ sudo /etc/init.d/vdsmd restart
Shutting down vdsm daemon: 
vdsm watchdog stop                                         [  OK  ]
vdsm: Running run_final_hooks                              [  OK  ]
vdsm stop                                                  [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface dummy_1:  device dummy_1 is already a member of a bridge; can't enslave it to bridge net2.
                                                           [  OK  ]
Bringing up interface eth0:  device eth0 is already a member of a bridge; can't enslave it to bridge net1.
                                                           [  OK  ]
Bringing up interface net1:  
Determining IP information for net1...dhclient(3397) is already running - exiting. 

This version of ISC DHCP is based on the release available
on ftp.isc.org.  Features have been added and other changes
have been made to the base software release in order to make
it work better with this distribution.

Please report for this software via the CentOS Bugs Database:
    http://bugs.centos.org/

exiting.
 failed.
                                                           [FAILED]
Bringing up interface net2:                                [  OK  ]
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
vdsm: Start dependent network                              [FAILED]
vdsm start

Comment 17 Fabian Deutsch 2015-09-07 09:51:25 UTC
It was reproduced on a non-node host, so moving this to vdsm.

Comment 18 Michael Burman 2015-09-16 08:41:26 UTC
Workaround: kill the dhclient process.
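The workaround can be scripted. The sketch below assumes that each dhclient command line contains the interface name as a separate argument and that sending SIGTERM before restarting vdsmd is acceptable; neither point is confirmed in this report. It also shows how to pull the PID straight out of the `dhclient(NNNN) is already running` message seen in the logs. All function names are hypothetical.

```python
import os
import re
import signal

ALREADY_RUNNING_RE = re.compile(r"dhclient\((\d+)\) is already running")


def pids_from_ifup_output(output):
    """Extract PIDs from 'dhclient(NNNN) is already running' messages."""
    return [int(pid) for pid in ALREADY_RUNNING_RE.findall(output)]


def is_dhclient_for(argv, iface):
    """True if argv looks like a dhclient invocation for iface (assumed layout)."""
    return bool(argv) and os.path.basename(argv[0]) == "dhclient" and iface in argv[1:]


def dhclient_pids(iface, proc_root="/proc"):
    """Scan /proc/<pid>/cmdline for dhclient processes serving iface."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        argv = [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]
        if is_dhclient_for(argv, iface):
            pids.append(int(entry))
    return sorted(pids)


def kill_dhclient(iface):
    """Send SIGTERM to every matching dhclient; requires root."""
    for pid in dhclient_pids(iface):
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # already gone
```

For the logs in this report, `kill_dhclient("rhevm")` would target the dhclient serving the management bridge before `/etc/init.d/vdsmd restart` is run again.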

Comment 19 Dan Kenigsberg 2015-09-16 09:51:22 UTC
It is easy to reproduce this by taking a vdsm-controlled interface down manually. HOWEVER, can anybody think of a case where it affects a host in a production environment?

I'd rather touch this fragile code only if it affects real-life users.

Comment 20 Michael Burman 2015-09-16 12:26:35 UTC
We are not sure about such a case, but it is failing some of our automation tests.

Comment 21 Petr Horáček 2015-09-18 10:03:39 UTC
In case we want to solve this bug: the attached patch successfully passed the reproducer from Comment 16.

Comment 23 Dan Kenigsberg 2015-09-30 08:20:31 UTC
3.6 does not need this, as it has no support for sysv.

Comment 24 Lucy Bopf 2016-02-19 07:02:15 UTC
Moving 'requires_doc_text' to '-' in response to Comment 23, as there is now no requirement for a release note.

