Hide Forgot
Created attachment 514045 [details] log Description of problem: PrepareForShutdown may hang forever when single vm is running and restarting libvirtd. Version-Release number of selected component (if applicable): vdsm-4.9-83.el6.x86_64 How reproducible: Steps to Reproduce: you can run ruth test vmTests.LibvirtConnectionTests.test: or: 1) run vm (single host spm on iscsi sd) 2) restart libvirtd Actual results: prepareForShutdown hangs forever... Expected results: Additional info: full log attached
What was the libvirt version that you used? vmTests.LibvirtConnectionTests.test is now passing with: vdsm-4.9-87.el6.x86_64 libvirt-0.9.4-0rc1.2.el6.x86_64 Is this still reproducible for you?
(In reply to comment #3) > What was the libvirt version that you used? > vmTests.LibvirtConnectionTests.test is now passing with: > > vdsm-4.9-87.el6.x86_64 > libvirt-0.9.4-0rc1.2.el6.x86_64 > > Is this still reproducible for you? yes maybe not with this ruth test, but i can reproduce it.
commit 3e5874b6d5171ac1480cf22133775c032bb28531 Author: Federico Simoncelli <fsimonce> Date: Wed Aug 3 14:58:57 2011 +0000 BZ#723579 AdvancedStatsThread should expose stop() The VmStatsThread wasn't exposing the stop method and therefore it was never stopped during prepareForShutdown. Even if it was declared as 'daemon' thread it kept the rest of the application running satisfying its requests. Change-Id: If8501b5286394e4e862e4bae965431cd395101fd http://gerrit.usersys.redhat.com/784
Still happens in some scenarios: single host, spm. restart libvirtd, try to run vm: vdsm-4.9-96.el6.x86_64 Thread-61859::ERROR::2011-08-29 10:01:53,067::libvirtconnection::73::vds::(wrapper) connection to libvirt broken. taking vdsm down. Thread-61859::INFO::2011-08-29 10:01:53,068::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, args: () Thread-61859::DEBUG::2011-08-29 10:01:53,069::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: moving from state init -> state preparing Thread-61859::DEBUG::2011-08-29 10:01:53,070::spm::199::Storage.SPM::(__cleanupMasterMount) unmounting /rhev/data-center/mnt/blockSD/078f4e60-a883-4d68-9c30-c35d7ae014d8/master MainThread::INFO::2011-08-29 10:01:53,076::vdsm::76::vds::(run) VDSM main thread ended. Waiting for 28 other threads... MainThread::INFO::2011-08-29 10:01:53,077::vdsm::79::vds::(run) <_MainThread(MainThread, started 139966079985408)> MainThread::INFO::2011-08-29 10:01:53,079::vdsm::79::vds::(run) <Thread(libvirtEventLoop, started daemon 139965987526400)> MainThread::INFO::2011-08-29 10:01:53,080::vdsm::79::vds::(run) <WorkerThread(Thread-4, started 139965736417024)> MainThread::INFO::2011-08-29 10:01:53,080::vdsm::79::vds::(run) <KsmMonitorThread(KsmMonitor, started daemon 139965235185408)> MainThread::INFO::2011-08-29 10:01:53,081::vdsm::79::vds::(run) <Thread(Thread-61859, started 139963913987840)> MainThread::INFO::2011-08-29 10:01:53,081::vdsm::79::vds::(run) <WorkerThread(Thread-79, started 139964285171456)> MainThread::INFO::2011-08-29 10:01:53,082::vdsm::79::vds::(run) <WorkerThread(Thread-9, started 139965683967744)> MainThread::INFO::2011-08-29 10:01:53,083::vdsm::79::vds::(run) <StatsThread(Thread-15, started 139965224695552)> MainThread::INFO::2011-08-29 10:01:53,083::vdsm::79::vds::(run) <WorkerThread(Thread-76, started 139964316641024)> MainThread::INFO::2011-08-29 10:01:53,084::vdsm::79::vds::(run) <WorkerThread(4ad0f804-fbff-4ca1-a674-a1e6b98f58d1, started 139965704947456)> MainThread::INFO::2011-08-29 10:01:53,085::vdsm::79::vds::(run) <WorkerThread(ea0a2b34-50e9-4939-b11c-39075ce1efb7, started 139965715437312)> MainThread::INFO::2011-08-29 10:01:53,085::vdsm::79::vds::(run) <WorkerThread(Thread-75, started 139965256165120)> MainThread::INFO::2011-08-29 10:01:53,086::vdsm::79::vds::(run) <WorkerThread(a11ff014-7fa2-4eb6-9aec-38b3e13df563, started 139965266654976)> MainThread::INFO::2011-08-29 10:01:53,087::vdsm::79::vds::(run) <WorkerThread(Thread-26, started 139964748531456)> MainThread::INFO::2011-08-29 10:01:53,087::vdsm::79::vds::(run) <WorkerThread(Thread-25, started 139964759021312)> MainThread::INFO::2011-08-29 10:01:53,088::vdsm::79::vds::(run) <WorkerThread(Thread-8, started 139965694457600)> MainThread::INFO::2011-08-29 10:01:53,088::vdsm::79::vds::(run) <WorkerThread(Thread-22, started 139964769511168)> MainThread::INFO::2011-08-29 10:01:53,089::vdsm::79::vds::(run) <WorkerThread(Thread-23, started 139964780001024)> MainThread::INFO::2011-08-29 10:01:53,090::vdsm::79::vds::(run) <WorkerThread(Thread-24, started 139964790490880)> MainThread::INFO::2011-08-29 10:01:53,090::vdsm::79::vds::(run) <WorkerThread(Thread-78, started 139964295661312)> MainThread::INFO::2011-08-29 10:01:53,091::vdsm::79::vds::(run) <HSM_MailMonitor(Thread-27, started 139964327130880)> MainThread::INFO::2011-08-29 10:01:53,092::vdsm::79::vds::(run) <WorkerThread(Thread-77, started 139964306151168)> MainThread::INFO::2011-08-29 10:01:53,092::vdsm::79::vds::(run) <Thread(Thread-61864, stopped 139965245675264)> MainThread::INFO::2011-08-29 10:01:53,093::vdsm::79::vds::(run) <StatsThread(Thread-16, started 139965214205696)> MainThread::INFO::2011-08-29 10:01:53,094::vdsm::79::vds::(run) <WorkerThread(Thread-5, started 139965725927168)> MainThread::INFO::2011-08-29 10:01:53,094::vdsm::79::vds::(run) <_DummyThread(Dummy-80, started daemon 139964274681600)> <WorkerThread(9acc2b95-dbf2-4180-b0f1-e5deb5936ead, started 139965974914816)> MainThread::INFO::2011-08-29 10:01:53,096::vdsm::79::vds::(run) <WorkerThread(5c1e9dbc-b154-44ae-ae7c-cc038b88ec37, started 139965953935104)> MainThread::INFO::2011-08-29 10:01:53,096::vdsm::79::vds::(run) <WorkerThread(8fa890b3-c210-4a6a-bf04-13d7e34373a0, started 139965964424960)> Thread-61859::DEBUG::2011-08-29 10:01:53,135::fileUtils::112::Storage.Misc.excCmd::(umount) '/usr/bin/sudo -n /bin/umount -f /rhev/data-center/mnt/blockSD/078f4e60-a883-4d68-9c30-c35d7ae014d8/master' (cwd None) Thread-61859::DEBUG::2011-08-29 10:01:53,169::fileUtils::112::Storage.Misc.excCmd::(umount) SUCCESS: <err> = ''; <rc> = 0 Thread-61859::DEBUG::2011-08-29 10:01:53,171::spm::205::Storage.SPM::(__cleanupMasterMount) master `/rhev/data-center/mnt/blockSD/66b3da26-d97b-4477-b7b5-7f2f88d1c39a/master` is not mounted, skipping Thread-61859::DEBUG::2011-08-29 10:01:53,171::spm::205::Storage.SPM::(__cleanupMasterMount) master `/rhev/data-center/mnt/blockSD/9fb00621-9820-45b9-a3fb-8dc723a30443/master` is not mounted, skipping Thread-61859::DEBUG::2011-08-29 10:01:53,172::spm::278::Storage.Misc.excCmd::(__releaseLocks) '/usr/bin/killall -g -USR1 spmprotect.sh' (cwd None) Thread-61859::DEBUG::2011-08-29 10:01:53,207::spm::278::Storage.Misc.excCmd::(__releaseLocks) SUCCESS: <err> = ''; <rc> = 0 Thread-61859::WARNING::2011-08-29 10:01:53,236::spm::283::Storage.SPM::(__releaseLocks) SPM: found lease locks, releasing Dummy-80::DEBUG::2011-08-29 10:01:54,025::storage_mailbox::623::Storage.Misc.excCmd::(_checkForMail) 'dd if=/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None) Dummy-80::DEBUG::2011-08-29 10:01:54,144::storage_mailbox::623::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0465316 s, 22.0 MB/s\n'; <rc> = 0 Thread-61859::DEBUG::2011-08-29 10:01:54,238::spm::287::Storage.Misc.excCmd::(__releaseLocks) '/usr/bin/killall -0 spmprotect.sh' (cwd None) Thread-61859::DEBUG::2011-08-29 10:01:54,267::spm::287::Storage.Misc.excCmd::(__releaseLocks) FAILED: <err> = 'spmprotect.sh: no process killed\n'; <rc> = 1 Thread-61859::DEBUG::2011-08-29 10:01:54,269::spm::214::Storage.SPM::(__cleanupSPMLinks) cleaning links; ['/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/vms'] ['/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/tasks'] Dummy-80::INFO::2011-08-29 10:01:56,156::storage_mailbox::673::Storage.MailBox.SpmMailMonitor::(run) SPM_MailMonitor - Incoming mail monitoring thread stopped Thread-27::INFO::2011-08-29 10:01:57,212::storage_mailbox::434::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mail monitoring thread stopped, clearing outgoing mail Thread-27::INFO::2011-08-29 10:01:57,213::storage_mailbox::330::Storage.MailBox.HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - ['dd', 'of=/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=512', 'seek=8'] Thread-27::DEBUG::2011-08-29 10:01:57,214::storage_mailbox::334::Storage.Misc.excCmd::(_sendMail) 'dd of=/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=8' (cwd None) Thread-27::DEBUG::2011-08-29 10:01:58,233::storage_mailbox::334::Storage.Misc.excCmd::(_sendMail) SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes (4.1 kB) copied, 1.00008 s, 4.1 kB/s\n'; <rc> = 0 Thread-61859::DEBUG::2011-08-29 10:01:59,109::sp::1325::Storage.StoragePool::(stopRepoStats) 078f4e60-a883-4d68-9c30-c35d7ae014d8 stat <StatsThread(Thread-15, stopped 139965224695552)> Thread-61859::DEBUG::2011-08-29 10:01:59,110::sp::1325::Storage.StoragePool::(stopRepoStats) bab701c1-8baf-4180-9bf1-3445e71a96a2 stat <StatsThread(Thread-16, stopped 139965214205696)> Thread-61859::DEBUG::2011-08-29 10:01:59,110::taskManager::69::TaskManager::(prepareForShutdown) Request to stop all tasks Thread-61859::DEBUG::2011-08-29 10:02:02,108::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: finished: None Thread-61859::DEBUG::2011-08-29 10:02:02,109::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: moving from state preparing -> state finished Thread-61859::DEBUG::2011-08-29 10:02:02,110::resourceManager::786::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {} Thread-61859::DEBUG::2011-08-29 10:02:02,110::resourceManager::821::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Thread-61859::DEBUG::2011-08-29 10:02:02,111::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: ref 0 aborting False Thread-61859::INFO::2011-08-29 10:02:02,112::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, Return response: {'status': {'message': 'OK', 'code': 0}} Thread-61859::DEBUG::2011-08-29 10:02:02,112::vm::356::vm.Vm::(_startUnderlyingVm) vmId=`bae8e60b-1712-429a-972b-16f734bff1d0`::_ongoingCreations released Thread-61859::ERROR::2011-08-29 10:02:02,113::vm::380::vm.Vm::(_startUnderlyingVm) vmId=`bae8e60b-1712-429a-972b-16f734bff1d0`::The vm start process failed Traceback (most recent call last): File "/usr/share/vdsm/vm.py", line 346, in _startUnderlyingVm self._run() File "/usr/share/vdsm/libvirtvm.py", line 973, in _run self._connection.createXML(domxml, flags), File "/usr/share/vdsm/libvirtconnection.py", line 63, in wrapper ret = f(*args, **kwargs) File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1989, in createXML if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self) libvirtError: Cannot write data: Broken pipe Thread-61859::DEBUG::2011-08-29 10:02:02,139::vm::741::vm.Vm::(setDownStatus) vmId=`bae8e60b-1712-429a-972b-16f734bff1d0`::Changed state to Down: Cannot write data: Broken pipe (END) Probably the new process pool problem.
http://gerrit.usersys.redhat.com/#change,863
Created attachment 524788 [details] vdsm.log Still reproduces on restart libvirt/run VM on 4.9-104.
Daniel, any reason you assigned this bug to yourself?
Interesting. Some kind of a Bugzilla hiccup.
Checked on 4.9-104, as discussed with dnaori and fsimonce.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2011-1782.html