Bug 880961 - 3.2 - [Upgrade] vdsm daemon not responding after upgrading from vdsm-4.9-113.4.el6_3 to vdsm-4.9.6-44.0.el6_3
Summary: 3.2 - [Upgrade] vdsm daemon not responding after upgrading from vdsm-4.9-11...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: ---
: 3.2.0
Assignee: Federico Simoncelli
QA Contact: Dafna Ron
URL:
Whiteboard: storage
Depends On:
Blocks: 883498 896509 915537
TreeView+ depends on / blocked
 
Reported: 2012-11-28 09:43 UTC by Leonid Natapov
Modified: 2022-07-09 05:51 UTC (History)
11 users (show)

Fixed In Version: vdsm-4.10.2-2.0
Doc Type: Bug Fix
Doc Text:
Previously, the VDSM daemon was unresponsive after upgrading from vdsm-4.9-113.4.el6_3 to vdsm-4.9.6-44.0.el6_3. This was due to a race condition that existed when VDSM started up and when it restarted itself. A patch ensures that the VDSM daemon is responsive after upgrading.
Clone Of:
Environment:
Last Closed: 2013-06-10 20:35:44 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
vdsm log (724.38 KB, application/x-gzip)
2012-11-28 10:03 UTC, Leonid Natapov
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-47033 0 None None None 2022-07-09 05:51:42 UTC
Red Hat Product Errata RHSA-2013:0886 0 normal SHIPPED_LIVE Moderate: rhev 3.2 - vdsm security and bug fix update 2013-06-11 00:25:02 UTC

Description Leonid Natapov 2012-11-28 09:43:37 UTC
Scenario:
Host with vdsm-4.9.6-44.0.el6_3.x86_64 was upgraded to  vdsm-4.9.6-44.0.el6_3.
(yum update vdsm). After upgrade vdsm daemon not responding to getVdsCapabilities. Getting connection refused from the daemon. VDSMD process is alive and service vdsmd status shows that vdsmd is running. libvird is also running. Full vdsm log attached.

here is the exception:
----------------------
Thread-17::INFO::2012-11-27 13:33:44,073::vm::604::vm.Vm::(_startUnderlyingVm) vmId=`098418b0-f484-4f2a-a371-f4ad1f4248da`::Skipping errors on recovery
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 597, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1396, in _run
    self._connection.lookupByUUIDString(self.id),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2682, in lookupByUUIDString
    if ret is None:raise libvirtError('virDomainLookupByUUIDString() failed', conn=self)
libvirtError: Cannot write data: Broken pipe
Thread-21::DEBUG::2012-11-27 13:33:44,073::task::588::TaskManager.Task::(_updateState) Task=`15d0ea2e-7c71-4568-a1c8-c5565ceae0d5`::moving from state init -> state preparing
Thread-14::DEBUG::2012-11-27 13:33:44,073::task::588::TaskManager.Task::(_updateState) Task=`8341ada6-d1ba-4e9e-92cf-4c70e1d5bf48`::moving from state preparing -> state finished
Thread-19::INFO::2012-11-27 13:33:44,074::vm::604::vm.Vm::(_startUnderlyingVm) vmId=`6ac9d011-0c0c-4945-a367-74fe5df5fd8d`::Skipping errors on recovery
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 597, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1396, in _run
    self._connection.lookupByUUIDString(self.id),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2682, in lookupByUUIDString
    if ret is None:raise libvirtError('virDomainLookupByUUIDString() failed', conn=self)

Comment 2 Leonid Natapov 2012-11-28 10:03:03 UTC
Created attachment 653402 [details]
vdsm log

Comment 4 Leonid Natapov 2012-11-28 14:45:56 UTC
yes. sorry. typo :( Changed the title .

Comment 8 Ayal Baron 2012-11-29 13:32:14 UTC
From initial investigation this looks like a race between vdsm starting up and restarting itself (due to libvirt restart after libvirt being upgraded).
reboot / kill -9 should solve the issue.
Since this is during upgrade no VMs should be running on the machine anyway.

Comment 10 Federico Simoncelli 2012-11-30 17:29:59 UTC
commit 9384951815875b72ae52d82f43dc610de2301e8a
Author: Federico Simoncelli <fsimonce>
Date:   Wed Nov 28 12:27:56 2012 -0500

    thread: daemonize HSM_MailMonitor and storageRefresh
    
    In order to successfully complete the prepareForShutdown command
    some additional threads have been switched to daemon.

http://gerrit.ovirt.org/#/c/9559/

Comment 11 Barak 2012-12-02 10:27:48 UTC
looks like the above patch will not handle the upgrade scenario (in the old code the threads are not daemonized) 

Fede ?

Comment 12 Federico Simoncelli 2012-12-02 12:30:25 UTC
(In reply to comment #11)
> looks like the above patch will not handle the upgrade scenario (in the old
> code the threads are not daemonized) 
> 
> Fede ?

That's what I thought too at the beginning (and that's why I didn't switch the bz to post earlier). Looking at the logs instead we see:

MainThread::INFO::2012-11-27 13:33:26,510::vdsm::70::vds::(run) I am the actual vdsm 4.9-44.0
[...]
Thread-21::ERROR::2012-11-27 13:33:44,060::libvirtconnection::93::vds::(wrapper) connection to libvirt broken. taking vdsm down.
[...]
MainThread::INFO::2012-11-27 13:33:44,897::vdsm::75::vds::(run) VDSM main thread ended. Waiting for 9 other threads...
MainThread::INFO::2012-11-27 13:33:44,898::vdsm::78::vds::(run) <_MainThread(MainThread, started 140425021429504)>
MainThread::INFO::2012-11-27 13:33:44,898::vdsm::78::vds::(run) <Thread(libvirtEventLoop, started daemon 140424934115072)>
MainThread::INFO::2012-11-27 13:33:44,898::vdsm::78::vds::(run) <WorkerThread(Thread-64, started daemon 140423955740416)>
MainThread::INFO::2012-11-27 13:33:44,899::vdsm::78::vds::(run) <WorkerThread(Thread-67, started daemon 140423903291136)>
MainThread::INFO::2012-11-27 13:33:44,899::vdsm::78::vds::(run) <WorkerThread(Thread-65, started daemon 140424383543040)>
MainThread::INFO::2012-11-27 13:33:44,899::vdsm::78::vds::(run) <Thread(Thread-43, started daemon 140423485978368)>
MainThread::INFO::2012-11-27 13:33:44,900::vdsm::78::vds::(run) <WorkerThread(Thread-66, started daemon 140423913780992)>
MainThread::INFO::2012-11-27 13:33:44,900::vdsm::78::vds::(run) <WorkerThread(Thread-68, started daemon 140423464998656)>
MainThread::INFO::2012-11-27 13:33:44,900::vdsm::78::vds::(run) <KsmMonitorThread(KsmMonitor, started daemon 140423945250560)>
MainThread::INFO::2012-11-27 13:33:44,901::vdsm::78::vds::(run) <Thread(Thread-12, started 140424373053184)>

This means that the vdsm shutting down (and stuck) was the new one.

Comment 14 Leonid Natapov 2012-12-17 08:47:09 UTC
si25. vdsm-4.9.6-44.2.el6_3.x86_64. was unable to reproduce.

Comment 17 Leonid Natapov 2013-02-13 11:27:10 UTC
sf6. vdsm-4.10.2-7.0.el6ev.x86_64 . looks ok.

Comment 19 Cheryn Tan 2013-04-03 07:02:16 UTC
This bug is currently attached to errata RHBA-2012:14332. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.

* Consequence: What happens when the bug presents.

* Fix: What was done to fix the bug.

* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.

Comment 20 Federico Simoncelli 2013-06-07 09:54:07 UTC
Doc Text is correct.

Comment 22 errata-xmlrpc 2013-06-10 20:35:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html


Note You need to log in before you can comment on or make changes to this bug.