Bug 880961 - 3.2 - [Upgrade] vdsm daemon not responding after upgrading from vdsm-4.9-113.4.el6_3 to vdsm-4.9.6-44.0.el6_3
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assigned To: Federico Simoncelli
QA Contact: Dafna Ron
Whiteboard: storage
Keywords: ZStream
Blocks: 883498 896509 915537
 
Reported: 2012-11-28 04:43 EST by Leonid Natapov
Modified: 2016-02-10 12:44 EST
CC List: 11 users
Fixed In Version: vdsm-4.10.2-2.0
Doc Type: Bug Fix
Doc Text:
Previously, the VDSM daemon was unresponsive after upgrading from vdsm-4.9-113.4.el6_3 to vdsm-4.9.6-44.0.el6_3. This was caused by a race condition between VDSM starting up and restarting itself (sketched below). A patch ensures that the VDSM daemon remains responsive after upgrading.
Last Closed: 2013-06-10 16:35:44 EDT
Type: Bug
oVirt Team: Storage
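
The race described in the Doc Text above boils down to non-daemon threads outliving the main thread when VDSM takes itself down. A minimal sketch of the hang in plain Python, with an invented thread body standing in for vdsm's loops (this is not vdsm code):

import threading
import time

def storage_refresh():
    # Stand-in for vdsm's storageRefresh / HSM_MailMonitor loops
    # (hypothetical body, for illustration only).
    while True:
        time.sleep(1)

# daemon=False (the default) is the problematic case: at interpreter
# exit Python waits for this thread, so the shutdown never completes
# and the process stays alive without serving its socket.
worker = threading.Thread(target=storage_refresh, name="storageRefresh")
worker.daemon = False  # flip to True to model the fix from comment 10
worker.start()

print("main thread ended, waiting for %d other threads..."
      % (threading.active_count() - 1))
# With daemon=False the process hangs here forever; with daemon=True
# it exits as soon as the main thread returns.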


Attachments
vdsm log (724.38 KB, application/x-gzip), 2012-11-28 05:03 EST, Leonid Natapov
Description Leonid Natapov 2012-11-28 04:43:37 EST
Scenario:
Host with vdsm-4.9-113.4.el6_3.x86_64 was upgraded to vdsm-4.9.6-44.0.el6_3
(yum update vdsm). After the upgrade, the vdsm daemon is not responding to getVdsCapabilities; connections to the daemon are refused. The vdsmd process is alive and "service vdsmd status" shows that vdsmd is running; libvirtd is also running. Full vdsm log attached.
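
A minimal way to check this symptom from a client, sketched in Python under the assumption that vdsm is listening on its default XML-RPC port 54321 (probe_vdsm is a hypothetical helper, not part of vdsm):

import socket

def probe_vdsm(host="127.0.0.1", port=54321, timeout=5):
    # Only checks whether anything accepts on the port; vdsm itself
    # answers getVdsCapabilities over XML-RPC on top of this socket.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "port %d accepting connections" % port
    except socket.error as e:
        # "Connection refused" here matches the symptom reported above.
        return "probe failed: %s" % e
    finally:
        s.close()

print(probe_vdsm())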

here is the exception:
----------------------
Thread-17::INFO::2012-11-27 13:33:44,073::vm::604::vm.Vm::(_startUnderlyingVm) vmId=`098418b0-f484-4f2a-a371-f4ad1f4248da`::Skipping errors on recovery
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 597, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1396, in _run
    self._connection.lookupByUUIDString(self.id),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2682, in lookupByUUIDString
    if ret is None:raise libvirtError('virDomainLookupByUUIDString() failed', conn=self)
libvirtError: Cannot write data: Broken pipe
Thread-21::DEBUG::2012-11-27 13:33:44,073::task::588::TaskManager.Task::(_updateState) Task=`15d0ea2e-7c71-4568-a1c8-c5565ceae0d5`::moving from state init -> state preparing
Thread-14::DEBUG::2012-11-27 13:33:44,073::task::588::TaskManager.Task::(_updateState) Task=`8341ada6-d1ba-4e9e-92cf-4c70e1d5bf48`::moving from state preparing -> state finished
Thread-19::INFO::2012-11-27 13:33:44,074::vm::604::vm.Vm::(_startUnderlyingVm) vmId=`6ac9d011-0c0c-4945-a367-74fe5df5fd8d`::Skipping errors on recovery
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 597, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1396, in _run
    self._connection.lookupByUUIDString(self.id),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2682, in lookupByUUIDString
    if ret is None:raise libvirtError('virDomainLookupByUUIDString() failed', conn=self)
Comment 2 Leonid Natapov 2012-11-28 05:03:03 EST
Created attachment 653402 [details]
vdsm log
Comment 4 Leonid Natapov 2012-11-28 09:45:56 EST
Yes, sorry, a typo :( Changed the title.
Comment 8 Ayal Baron 2012-11-29 08:32:14 EST
From initial investigation this looks like a race between vdsm starting up and restarting itself (due to the libvirt restart after libvirt was upgraded).
A reboot or kill -9 should solve the issue.
Since this is during an upgrade, no VMs should be running on the machine anyway.
Comment 10 Federico Simoncelli 2012-11-30 12:29:59 EST
commit 9384951815875b72ae52d82f43dc610de2301e8a
Author: Federico Simoncelli <fsimonce@redhat.com>
Date:   Wed Nov 28 12:27:56 2012 -0500

    thread: daemonize HSM_MailMonitor and storageRefresh
    
    In order to successfully complete the prepareForShutdown command
    some additional threads have been switched to daemon.

http://gerrit.ovirt.org/#/c/9559/
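
The pattern the commit applies, shown as a minimal sketch rather than the actual gerrit change (mail_monitor is a stand-in for the real loops): long-running service threads are marked as daemon threads, so prepareForShutdown can complete and the interpreter can exit while they run.

import threading
import time

def mail_monitor():
    # Stand-in for the HSM_MailMonitor loop.
    while True:
        time.sleep(2)

def start_service_thread(target, name):
    t = threading.Thread(target=target, name=name)
    # The essence of the fix: daemon threads no longer block
    # interpreter exit, so shutdown can complete while they run.
    t.daemon = True
    t.start()
    return t

start_service_thread(mail_monitor, "HSM_MailMonitor")
print("main thread can now exit without joining HSM_MailMonitor")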
Comment 11 Barak 2012-12-02 05:27:48 EST
looks like the above patch will not handle the upgrade scenario (in the old code the threads are not daemonized) 

Fede ?
Comment 12 Federico Simoncelli 2012-12-02 07:30:25 EST
(In reply to comment #11)
> looks like the above patch will not handle the upgrade scenario (in the old
> code the threads are not daemonized) 
> 
> Fede ?

That's what I thought too at the beginning (and that's why I didn't switch the bz to post earlier). Looking at the logs instead we see:

MainThread::INFO::2012-11-27 13:33:26,510::vdsm::70::vds::(run) I am the actual vdsm 4.9-44.0
[...]
Thread-21::ERROR::2012-11-27 13:33:44,060::libvirtconnection::93::vds::(wrapper) connection to libvirt broken. taking vdsm down.
[...]
MainThread::INFO::2012-11-27 13:33:44,897::vdsm::75::vds::(run) VDSM main thread ended. Waiting for 9 other threads...
MainThread::INFO::2012-11-27 13:33:44,898::vdsm::78::vds::(run) <_MainThread(MainThread, started 140425021429504)>
MainThread::INFO::2012-11-27 13:33:44,898::vdsm::78::vds::(run) <Thread(libvirtEventLoop, started daemon 140424934115072)>
MainThread::INFO::2012-11-27 13:33:44,898::vdsm::78::vds::(run) <WorkerThread(Thread-64, started daemon 140423955740416)>
MainThread::INFO::2012-11-27 13:33:44,899::vdsm::78::vds::(run) <WorkerThread(Thread-67, started daemon 140423903291136)>
MainThread::INFO::2012-11-27 13:33:44,899::vdsm::78::vds::(run) <WorkerThread(Thread-65, started daemon 140424383543040)>
MainThread::INFO::2012-11-27 13:33:44,899::vdsm::78::vds::(run) <Thread(Thread-43, started daemon 140423485978368)>
MainThread::INFO::2012-11-27 13:33:44,900::vdsm::78::vds::(run) <WorkerThread(Thread-66, started daemon 140423913780992)>
MainThread::INFO::2012-11-27 13:33:44,900::vdsm::78::vds::(run) <WorkerThread(Thread-68, started daemon 140423464998656)>
MainThread::INFO::2012-11-27 13:33:44,900::vdsm::78::vds::(run) <KsmMonitorThread(KsmMonitor, started daemon 140423945250560)>
MainThread::INFO::2012-11-27 13:33:44,901::vdsm::78::vds::(run) <Thread(Thread-12, started 140424373053184)>

This means that the vdsm instance that was shutting down (and got stuck) was the new one.
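
For reference, the per-thread lines above come from vdsm listing whatever is still alive once the main thread ends; a rough equivalent of that diagnostic (a sketch, not the vdsm source) is:

import threading

def log_lingering_threads():
    others = [t for t in threading.enumerate()
              if t is not threading.current_thread()]
    print("VDSM main thread ended. Waiting for %d other threads..."
          % len(others))
    for t in others:
        # Non-daemon entries in this list are the ones that can hang
        # the process; in the log above, Thread-12 is the one whose
        # repr lacks the "daemon" flag.
        print(repr(t))

log_lingering_threads()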
Comment 14 Leonid Natapov 2012-12-17 03:47:09 EST
si25, vdsm-4.9.6-44.2.el6_3.x86_64: was unable to reproduce.
Comment 17 Leonid Natapov 2013-02-13 06:27:10 EST
sf6, vdsm-4.10.2-7.0.el6ev.x86_64: looks OK.
Comment 19 Cheryn Tan 2013-04-03 03:02:16 EDT
This bug is currently attached to errata RHBA-2012:14332. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or, if you do not have permission to alter the flag, leave a "Doc Text" value of "--no tech note required".

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.

* Consequence: What happens when the bug presents.

* Fix: What was done to fix the bug.

* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Comment 20 Federico Simoncelli 2013-06-07 05:54:07 EDT
Doc Text is correct.
Comment 22 errata-xmlrpc 2013-06-10 16:35:44 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html
