Bug 1165203

Summary: [rhevh66] vdsm does not come up after first reboot after registration
Product: Red Hat Enterprise Virtualization Manager Reporter: Fabian Deutsch <fdeutsch>
Component: vdsm Assignee: Dan Kenigsberg <danken>
Status: CLOSED ERRATA QA Contact: Jiri Belka <jbelka>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.5.0 CC: adahms, bazulay, danken, dfediuck, dougsland, ecohen, fabian.deutsch, fdeutsch, gklein, iheim, istein, leiwang, lpeer, lsurette, lvernia, myakove, nyechiel, oourfali, sbonazzo, ybronhei, ycui, yeylon
Target Milestone: --- Keywords: Reopened
Target Release: 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: vdsm-4.16.8.1-6.el6ev Doc Type: Known Issue
Doc Text:
The vdsmd service is not started on Red Hat Enterprise Virtualization Hypervisor 6.6 hosts after the first reboot after registering the Red Hat Enterprise Virtualization Hypervisor 6.6 host with a Red Hat Enterprise Virtualization Manager. This results in Red Hat Enterprise Virtualization Hypervisor 6.6 hosts never being displayed as up. As a workaround, configure networking on the Red Hat Enterprise Virtualization Hypervisor 6.6 host, and start the libvirtd and vdsmd services manually.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-02-15 09:14:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1164308, 1164311    
Attachments:
Description                              Flags
RHEV-H side logs                         none
logs from right after the registration   none
logs after the first boot                none

Description Fabian Deutsch 2014-11-18 14:45:20 UTC
Description of problem:
vdsm and libvirt do not come up after reboot

Version-Release number of selected component (if applicable):
http://download.devel.redhat.com/brewroot/work/tasks/793/8250793/rhev-hypervisor6-6.6-20141114.0.iso

How reproducible:
always

Steps to Reproduce:
1. install 3.5 rhevm
2. install 3.5 6.6 rhevh from above
3. configure networking on rhevh, set password on rhevm page
4. add/register rhevh from rhevm side
5. put rhevh into maintenance mode
6. reboot rhevh from tui

Actual results:
After the reboot vdsm and libvirt are down.
host is down in RHEV-M

Expected results:
Host is up in RHEV-M

Additional info:

Comment 1 Fabian Deutsch 2014-11-18 14:46:20 UTC
This does not happen on the 7.0 build.

Comment 2 Fabian Deutsch 2014-11-18 14:50:57 UTC
Created attachment 958612 [details]
RHEV-H side logs

Comment 3 Douglas Schilling Landgraf 2014-11-19 05:27:48 UTC
Hi Fabian,

I couldn't reproduce the report. However, I see a few complaints about this environment in the vdsm logs you provided.

Thread-28::ERROR::2014-11-18 14:21:35,201::API::1692::vds::(_getHaInfo) failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory: '/etc/ovirt-hosted-engine/hosted-engine.conf''Is the Hosted Engine setup finished?

Thread-30::INFO::2014-11-18 14:21:36,214::logUtils::44::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID=u'00000002-0002-0002-0002-0000000002bb', options=None)
Thread-30::ERROR::2014-11-18 14:21:36,214::task::866::Storage.TaskManager.Task::(_setError) Task=`52ef1f68-60c7-429f-89b5-ab8f4486e641`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
  File "/usr/share/vdsm/storage/hsm.py", line 609, in getSpmStatus
  File "/usr/share/vdsm/storage/hsm.py", line 325, in getPool
StoragePoolUnknown: Unknown pool id, pool not connected: (u'00000002-0002-0002-0002-0000000002bb',)


Have you re-used your rhev-h instance or have you configured it as hosted-engine/storage anytime? Adding Sandro and Dan in CC too.
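
To pull entries like the two ERROR lines above out of a larger vdsm.log, a small filter can help. The following is only a sketch: the `::`-separated field layout (thread, level, timestamp, module, line number, logger, then `(function) message`) is inferred from the lines quoted in this comment, not from vdsm's logging configuration, and the names `VdsmLogLine`, `parse_vdsm_line` and `errors` are hypothetical.

```python
import re
from collections import namedtuple

# Field layout inferred from the log lines quoted above (an assumption):
# thread::level::timestamp::module::lineno::logger::(func) message
VdsmLogLine = namedtuple(
    "VdsmLogLine",
    "thread level timestamp module lineno logger func message",
)

# The last field starts with the function name in parentheses.
_FUNC_RE = re.compile(r"\((\w+)\)\s*(.*)", re.S)


def parse_vdsm_line(line):
    """Return a VdsmLogLine, or None for lines that do not match
    (for example traceback continuation lines)."""
    # Split at most 6 times so that '::' inside the message is preserved.
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, timestamp, module, lineno, logger, rest = parts
    match = _FUNC_RE.match(rest)
    if match is None or not lineno.isdigit():
        return None
    return VdsmLogLine(thread, level, timestamp, module, int(lineno),
                       logger, match.group(1), match.group(2))


def errors(lines):
    """Keep only the parsed records logged at ERROR level."""
    records = (parse_vdsm_line(line) for line in lines)
    return [r for r in records if r is not None and r.level == "ERROR"]
```

Calling `errors(open(path))` on a log file would return one record per ERROR entry; traceback lines that follow an entry are skipped rather than attached to it.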

Comment 4 Fabian Deutsch 2014-11-19 08:43:08 UTC
It might have been a single time. I also could not reproduce it anymore.

I might also have been too impatient. It took a while for the nodes to come up.

(That's where I wanted to leave these lines, but 1152916 comment 8).

Comment 5 Fabian Deutsch 2015-01-14 20:39:08 UTC
I've met this bug again.

vdsm does not come up because libvirt does not come up, and libvirt does not come up because it requires a configured network.

Comment 6 Fabian Deutsch 2015-01-14 20:43:06 UTC
A workaround is to
1. run dhclient to acquire an IP
2. start libvirtd manually
3. start vdsmd manually

Now and on all subsequent reboots vdsm will come up correctly (as far as I can tell).
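
The three workaround steps could be scripted roughly as follows. This is a sketch under assumptions, not a tested fix: it assumes the RHEL 6 based image manages services through the SysV `service` command, and it runs `dhclient` without an interface name (so dhclient picks the interfaces itself) because the report does not name one. `WORKAROUND_STEPS` and `apply_workaround` are hypothetical names.

```python
import subprocess

# Commands for the three manual steps, in order.
# Assumptions: SysV `service` is available, and `dhclient` is run
# without an interface argument because the report names none.
WORKAROUND_STEPS = [
    ["dhclient"],                      # 1. acquire an IP address
    ["service", "libvirtd", "start"],  # 2. start libvirtd
    ["service", "vdsmd", "start"],     # 3. start vdsmd
]


def apply_workaround(run=subprocess.check_call, steps=WORKAROUND_STEPS):
    """Run each step in order and return the commands that completed.

    The default runner raises CalledProcessError on a non-zero exit
    status, so later steps are not attempted after a failure.
    """
    done = []
    for cmd in steps:
        run(cmd)
        done.append(cmd)
    return done


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    apply_workaround(run=lambda cmd: print(" ".join(cmd)))
```

Stopping at the first failing step mirrors the dependency described in comment 5: there is no point starting vdsmd if libvirtd did not start, or libvirtd if the network is still unconfigured.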

Comment 7 Fabian Deutsch 2015-01-14 22:16:45 UTC
Created attachment 980216 [details]
logs from right after the registration

This attachment shows the logs and configuration right after the registration, when vdsmd is up.

Comment 8 Fabian Deutsch 2015-01-14 22:18:22 UTC
Created attachment 980217 [details]
logs after the first boot

This attachment shows the logs and configuration right after the first reboot after registration.

Comment 9 Fabian Deutsch 2015-01-15 05:01:33 UTC
Please note that the filenames of the attachments are wrong: the files are xz compressed, not bz2.

Comment 10 Ilanit Stein 2015-01-15 15:33:19 UTC
It did not reproduce with this version and flow:

Tested with rhevm 3.5 vt13.6, rhev-hypervisor6-6.6-20140114.0
(rhev-h was vdsm-upgraded from rhev-hypervisor6-6.6-20141218.0)
1. Add rhev-h to rhevm
2. Put host in maintenance.
3. restart rhev-h via TUI
4. Activate rhev-h  => became active.

Comment 11 Fabian Deutsch 2015-01-15 15:52:37 UTC
Reducing the priority because the reproducibility seems to be low.

Comment 18 Fabian Deutsch 2015-02-10 09:36:13 UTC
Will this bug be fixed in 3.5.0? If so, we should clear the doctext flag.

Comment 19 Jiri Belka 2015-02-10 11:58:27 UTC
ok, RHEV Hypervisor - 6.6 - 20150128.0.el6ev

Comment 20 Dan Kenigsberg 2015-02-10 12:56:25 UTC
Fixed in rhev-3.5.0 https://gerrit.eng.lab.tlv.redhat.com/15482 hence does not require a release note.

Comment 21 Andrew Dahms 2015-02-10 13:02:37 UTC
Thanks for the confirmation - dropped the known issue explanation from the advisory to reflect the '-' here.

Comment 23 Eyal Edri 2015-02-15 09:14:47 UTC
Bugs were moved by ERRATA to RELEASE PENDING; this bug was not closed, probably due to an errata error.
Closing as 3.5.0 is released.