Bug 723579 - [vdsm] PrepareForShutdown may hang forever when single vm is running and restarting libvirtd.
Summary: [vdsm] PrepareForShutdown may hang forever when single vm is running and restarting libvirtd.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Saggi Mizrahi
QA Contact: David Naori
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-07-20 15:51 UTC by David Naori
Modified: 2011-12-06 07:31 UTC
CC List: 11 users

Fixed In Version: vdsm-4.9-98.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 07:31:55 UTC
Target Upstream Version:


Attachments
log (579.80 KB, application/x-xz)
2011-07-20 15:51 UTC, David Naori
vdsm.log (355.47 KB, application/x-gzip)
2011-09-25 14:48 UTC, Daniel Paikov


Links
System ID: Red Hat Product Errata RHEA-2011:1782
Priority: normal
Status: SHIPPED_LIVE
Summary: new packages: vdsm
Last Updated: 2011-12-06 11:55:51 UTC

Description David Naori 2011-07-20 15:51:43 UTC
Created attachment 514045
log

Description of problem:
PrepareForShutdown may hang forever when a single VM is running and libvirtd is restarted.

Version-Release number of selected component (if applicable):
vdsm-4.9-83.el6.x86_64

How reproducible:


Steps to Reproduce:
Run the RUTH test vmTests.LibvirtConnectionTests.test, or:
1) Run a VM (single host acting as SPM, on an iSCSI storage domain).
2) Restart libvirtd.

Actual results:
prepareForShutdown hangs forever...

Expected results:


Additional info:
full log attached
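
For context, the logs in this bug (see comment 7) show the VDSM main thread ending and then waiting on every other thread, so a single thread that is never told to stop blocks shutdown indefinitely. A minimal Python sketch of that wait loop, with illustrative names rather than vdsm's actual code:

import threading

# Sketch of the shutdown wait: the main thread enumerates the remaining
# threads and joins each one ("VDSM main thread ended. Waiting for N
# other threads..."). A thread with no stop() path hangs this forever.
def waitForOtherThreads():
    current = threading.current_thread()
    for t in threading.enumerate():
        if t is not current:
            print(t)
            t.join()  # never returns if t loops without a stop flag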

Comment 3 Federico Simoncelli 2011-08-03 08:35:09 UTC
What was the libvirt version that you used?
vmTests.LibvirtConnectionTests.test is now passing with:

vdsm-4.9-87.el6.x86_64
libvirt-0.9.4-0rc1.2.el6.x86_64

Is this still reproducible for you?

Comment 4 David Naori 2011-08-03 14:39:05 UTC
(In reply to comment #3)
> What was the libvirt version that you used?
> vmTests.LibvirtConnectionTests.test is now passing with:
> 
> vdsm-4.9-87.el6.x86_64
> libvirt-0.9.4-0rc1.2.el6.x86_64
> 
> Is this still reproducible for you?

Yes. Maybe not with this RUTH test, but I can reproduce it.

Comment 5 Federico Simoncelli 2011-08-03 15:09:52 UTC
commit 3e5874b6d5171ac1480cf22133775c032bb28531
Author: Federico Simoncelli <fsimonce>
Date:   Wed Aug 3 14:58:57 2011 +0000

    BZ#723579 AdvancedStatsThread should expose stop()
    
    The VmStatsThread wasn't exposing the stop method and therefore
    it was never stopped during prepareForShutdown.
    Even if it was declared as 'daemon' thread it kept the rest of
    the application running satisfying its requests.
    
    Change-Id: If8501b5286394e4e862e4bae965431cd395101fd

http://gerrit.usersys.redhat.com/784
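
In outline, the fix gives the stats thread an explicit stop() that prepareForShutdown can invoke. A minimal sketch of the pattern, with simplified, illustrative names rather than the actual vdsm code:

import threading

class AdvancedStatsThread(object):
    """Sketch of a stats thread exposing stop(); simplified, not the
    actual vdsm implementation."""

    def __init__(self, interval=2):
        self._interval = interval
        self._stopEvent = threading.Event()
        # 'daemon' alone is not enough: as the commit notes, a daemon
        # thread can still keep the application busy serving requests,
        # so prepareForShutdown needs an explicit stop() to call.
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True

    def start(self):
        self._thread.start()

    def stop(self):
        # Called from prepareForShutdown; without this the thread runs
        # forever and shutdown never completes.
        self._stopEvent.set()
        self._thread.join()

    def _collect(self):
        pass  # gather VM statistics here

    def _run(self):
        while not self._stopEvent.is_set():
            self._collect()
            self._stopEvent.wait(self._interval)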

Comment 7 David Naori 2011-08-29 08:27:20 UTC
Still happens in some scenarios:
Single host acting as SPM.
Restart libvirtd, then try to run a VM:


vdsm-4.9-96.el6.x86_64

Thread-61859::ERROR::2011-08-29 10:01:53,067::libvirtconnection::73::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-61859::INFO::2011-08-29 10:01:53,068::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, args: ()
Thread-61859::DEBUG::2011-08-29 10:01:53,069::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: moving from state init -> state preparing
Thread-61859::DEBUG::2011-08-29 10:01:53,070::spm::199::Storage.SPM::(__cleanupMasterMount) unmounting /rhev/data-center/mnt/blockSD/078f4e60-a883-4d68-9c30-c35d7ae014d8/master
MainThread::INFO::2011-08-29 10:01:53,076::vdsm::76::vds::(run) VDSM main thread ended. Waiting for 28 other threads...
MainThread::INFO::2011-08-29 10:01:53,077::vdsm::79::vds::(run) <_MainThread(MainThread, started 139966079985408)>
MainThread::INFO::2011-08-29 10:01:53,079::vdsm::79::vds::(run) <Thread(libvirtEventLoop, started daemon 139965987526400)>
MainThread::INFO::2011-08-29 10:01:53,080::vdsm::79::vds::(run) <WorkerThread(Thread-4, started 139965736417024)>
MainThread::INFO::2011-08-29 10:01:53,080::vdsm::79::vds::(run) <KsmMonitorThread(KsmMonitor, started daemon 139965235185408)>
MainThread::INFO::2011-08-29 10:01:53,081::vdsm::79::vds::(run) <Thread(Thread-61859, started 139963913987840)>
MainThread::INFO::2011-08-29 10:01:53,081::vdsm::79::vds::(run) <WorkerThread(Thread-79, started 139964285171456)>
MainThread::INFO::2011-08-29 10:01:53,082::vdsm::79::vds::(run) <WorkerThread(Thread-9, started 139965683967744)>
MainThread::INFO::2011-08-29 10:01:53,083::vdsm::79::vds::(run) <StatsThread(Thread-15, started 139965224695552)>
MainThread::INFO::2011-08-29 10:01:53,083::vdsm::79::vds::(run) <WorkerThread(Thread-76, started 139964316641024)>
MainThread::INFO::2011-08-29 10:01:53,084::vdsm::79::vds::(run) <WorkerThread(4ad0f804-fbff-4ca1-a674-a1e6b98f58d1, started 139965704947456)>
MainThread::INFO::2011-08-29 10:01:53,085::vdsm::79::vds::(run) <WorkerThread(ea0a2b34-50e9-4939-b11c-39075ce1efb7, started 139965715437312)>
MainThread::INFO::2011-08-29 10:01:53,085::vdsm::79::vds::(run) <WorkerThread(Thread-75, started 139965256165120)>
MainThread::INFO::2011-08-29 10:01:53,086::vdsm::79::vds::(run) <WorkerThread(a11ff014-7fa2-4eb6-9aec-38b3e13df563, started 139965266654976)>
MainThread::INFO::2011-08-29 10:01:53,087::vdsm::79::vds::(run) <WorkerThread(Thread-26, started 139964748531456)>
MainThread::INFO::2011-08-29 10:01:53,087::vdsm::79::vds::(run) <WorkerThread(Thread-25, started 139964759021312)>
MainThread::INFO::2011-08-29 10:01:53,088::vdsm::79::vds::(run) <WorkerThread(Thread-8, started 139965694457600)>
MainThread::INFO::2011-08-29 10:01:53,088::vdsm::79::vds::(run) <WorkerThread(Thread-22, started 139964769511168)>
MainThread::INFO::2011-08-29 10:01:53,089::vdsm::79::vds::(run) <WorkerThread(Thread-23, started 139964780001024)>
MainThread::INFO::2011-08-29 10:01:53,090::vdsm::79::vds::(run) <WorkerThread(Thread-24, started 139964790490880)>
MainThread::INFO::2011-08-29 10:01:53,090::vdsm::79::vds::(run) <WorkerThread(Thread-78, started 139964295661312)>
MainThread::INFO::2011-08-29 10:01:53,091::vdsm::79::vds::(run) <HSM_MailMonitor(Thread-27, started 139964327130880)>
MainThread::INFO::2011-08-29 10:01:53,092::vdsm::79::vds::(run) <WorkerThread(Thread-77, started 139964306151168)>
MainThread::INFO::2011-08-29 10:01:53,092::vdsm::79::vds::(run) <Thread(Thread-61864, stopped 139965245675264)>
MainThread::INFO::2011-08-29 10:01:53,093::vdsm::79::vds::(run) <StatsThread(Thread-16, started 139965214205696)>
MainThread::INFO::2011-08-29 10:01:53,094::vdsm::79::vds::(run) <WorkerThread(Thread-5, started 139965725927168)>
MainThread::INFO::2011-08-29 10:01:53,094::vdsm::79::vds::(run) <_DummyThread(Dummy-80, started daemon 139964274681600)>
<WorkerThread(9acc2b95-dbf2-4180-b0f1-e5deb5936ead, started 139965974914816)>
MainThread::INFO::2011-08-29 10:01:53,096::vdsm::79::vds::(run) <WorkerThread(5c1e9dbc-b154-44ae-ae7c-cc038b88ec37, started 139965953935104)>
MainThread::INFO::2011-08-29 10:01:53,096::vdsm::79::vds::(run) <WorkerThread(8fa890b3-c210-4a6a-bf04-13d7e34373a0, started 139965964424960)>
Thread-61859::DEBUG::2011-08-29 10:01:53,135::fileUtils::112::Storage.Misc.excCmd::(umount) '/usr/bin/sudo -n /bin/umount -f /rhev/data-center/mnt/blockSD/078f4e60-a883-4d68-9c30-c35d7ae014d8/master' (cwd None)
Thread-61859::DEBUG::2011-08-29 10:01:53,169::fileUtils::112::Storage.Misc.excCmd::(umount) SUCCESS: <err> = ''; <rc> = 0
Thread-61859::DEBUG::2011-08-29 10:01:53,171::spm::205::Storage.SPM::(__cleanupMasterMount) master `/rhev/data-center/mnt/blockSD/66b3da26-d97b-4477-b7b5-7f2f88d1c39a/master` is not mounted, skipping
Thread-61859::DEBUG::2011-08-29 10:01:53,171::spm::205::Storage.SPM::(__cleanupMasterMount) master `/rhev/data-center/mnt/blockSD/9fb00621-9820-45b9-a3fb-8dc723a30443/master` is not mounted, skipping
Thread-61859::DEBUG::2011-08-29 10:01:53,172::spm::278::Storage.Misc.excCmd::(__releaseLocks) '/usr/bin/killall -g -USR1 spmprotect.sh' (cwd None)
Thread-61859::DEBUG::2011-08-29 10:01:53,207::spm::278::Storage.Misc.excCmd::(__releaseLocks) SUCCESS: <err> = ''; <rc> = 0
Thread-61859::WARNING::2011-08-29 10:01:53,236::spm::283::Storage.SPM::(__releaseLocks) SPM: found lease locks, releasing
Dummy-80::DEBUG::2011-08-29 10:01:54,025::storage_mailbox::623::Storage.Misc.excCmd::(_checkForMail) 'dd if=/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Dummy-80::DEBUG::2011-08-29 10:01:54,144::storage_mailbox::623::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0465316 s, 22.0 MB/s\n'; <rc> = 0
Thread-61859::DEBUG::2011-08-29 10:01:54,238::spm::287::Storage.Misc.excCmd::(__releaseLocks) '/usr/bin/killall -0 spmprotect.sh' (cwd None)
Thread-61859::DEBUG::2011-08-29 10:01:54,267::spm::287::Storage.Misc.excCmd::(__releaseLocks) FAILED: <err> = 'spmprotect.sh: no process killed\n'; <rc> = 1
Thread-61859::DEBUG::2011-08-29 10:01:54,269::spm::214::Storage.SPM::(__cleanupSPMLinks) cleaning links; ['/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/vms'] ['/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/tasks']
Dummy-80::INFO::2011-08-29 10:01:56,156::storage_mailbox::673::Storage.MailBox.SpmMailMonitor::(run) SPM_MailMonitor - Incoming mail monitoring thread stopped
Thread-27::INFO::2011-08-29 10:01:57,212::storage_mailbox::434::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mail monitoring thread stopped, clearing outgoing mail
Thread-27::INFO::2011-08-29 10:01:57,213::storage_mailbox::330::Storage.MailBox.HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - ['dd', 'of=/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=512', 'seek=8']
Thread-27::DEBUG::2011-08-29 10:01:57,214::storage_mailbox::334::Storage.Misc.excCmd::(_sendMail) 'dd of=/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=8' (cwd None)
Thread-27::DEBUG::2011-08-29 10:01:58,233::storage_mailbox::334::Storage.Misc.excCmd::(_sendMail) SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes (4.1 kB) copied, 1.00008 s, 4.1 kB/s\n'; <rc> = 0
Thread-61859::DEBUG::2011-08-29 10:01:59,109::sp::1325::Storage.StoragePool::(stopRepoStats) 078f4e60-a883-4d68-9c30-c35d7ae014d8 stat <StatsThread(Thread-15, stopped 139965224695552)>
Thread-61859::DEBUG::2011-08-29 10:01:59,110::sp::1325::Storage.StoragePool::(stopRepoStats) bab701c1-8baf-4180-9bf1-3445e71a96a2 stat <StatsThread(Thread-16, stopped 139965214205696)>
Thread-61859::DEBUG::2011-08-29 10:01:59,110::taskManager::69::TaskManager::(prepareForShutdown) Request to stop all tasks
Thread-61859::DEBUG::2011-08-29 10:02:02,108::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: finished: None
Thread-61859::DEBUG::2011-08-29 10:02:02,109::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: moving from state preparing -> state finished
Thread-61859::DEBUG::2011-08-29 10:02:02,110::resourceManager::786::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-61859::DEBUG::2011-08-29 10:02:02,110::resourceManager::821::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-61859::DEBUG::2011-08-29 10:02:02,111::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: ref 0 aborting False
Thread-61859::INFO::2011-08-29 10:02:02,112::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, Return response: {'status': {'message': 'OK', 'code': 0}}
Thread-61859::DEBUG::2011-08-29 10:02:02,112::vm::356::vm.Vm::(_startUnderlyingVm) vmId=`bae8e60b-1712-429a-972b-16f734bff1d0`::_ongoingCreations released
Thread-61859::ERROR::2011-08-29 10:02:02,113::vm::380::vm.Vm::(_startUnderlyingVm) vmId=`bae8e60b-1712-429a-972b-16f734bff1d0`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 346, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 973, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/share/vdsm/libvirtconnection.py", line 63, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1989, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Cannot write data: Broken pipe
Thread-61859::DEBUG::2011-08-29 10:02:02,139::vm::741::vm.Vm::(setDownStatus) vmId=`bae8e60b-1712-429a-972b-16f734bff1d0`::Changed state to Down: Cannot write data: Broken pipe

Probably the new process pool problem.
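
For reference, the wrapper seen in the traceback (libvirtconnection.py) is what detects the dead connection and initiates shutdown ("connection to libvirt broken. taking vdsm down."). A minimal sketch of that pattern; the specific error-code check and the callback wiring are assumptions, not the actual implementation:

import threading
import libvirt

def wrap(f, prepareForShutdown):
    # Sketch of the error-checking wrapper from the traceback; the
    # error-code test and callback name here are assumptions.
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except libvirt.libvirtError as e:
            # A dead socket after a libvirtd restart surfaces as a
            # system error ("Cannot write data: Broken pipe"); the
            # cached connection is unusable, so take vdsm down.
            if e.get_error_code() == libvirt.VIR_ERR_SYSTEM_ERROR:
                threading.Thread(target=prepareForShutdown).start()
            raise
    return wrapper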

Comment 8 Saggi Mizrahi 2011-08-29 13:23:09 UTC
http://gerrit.usersys.redhat.com/#change,863

Comment 10 Daniel Paikov 2011-09-25 14:48:55 UTC
Created attachment 524788
vdsm.log

Still reproduces when restarting libvirtd and running a VM, on vdsm 4.9-104.

Comment 11 Ayal Baron 2011-09-25 19:46:20 UTC
Daniel, any reason you assigned this bug to yourself?

Comment 12 Daniel Paikov 2011-09-26 07:51:37 UTC
Interesting. Some kind of Bugzilla hiccup.

Comment 13 Daniel Paikov 2011-10-05 15:11:02 UTC
Checked on 4.9-104, as discussed with dnaori and fsimonce.

Comment 14 errata-xmlrpc 2011-12-06 07:31:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html

