| Summary: | [vdsm] PrepareForShutdown may hang forever when single vm is running and restarting libvirtd. |
|---|---|
| Product: | Red Hat Enterprise Linux 6 |
| Reporter: | David Naori <dnaori> |
| Component: | vdsm |
| Assignee: | Saggi Mizrahi <smizrahi> |
| Status: | CLOSED ERRATA |
| QA Contact: | David Naori <dnaori> |
| Severity: | high |
| Priority: | unspecified |
| Version: | 6.1 |
| CC: | abaron, bazulay, danken, dnaori, dpaikov, hateya, iheim, ilvovsky, mgoldboi, syeghiay, ykaul |
| Target Milestone: | rc |
| Keywords: | Regression |
| Target Release: | --- |
| Hardware: | x86_64 |
| OS: | Linux |
| Fixed In Version: | vdsm-4.9-98.el6 |
| Doc Type: | Bug Fix |
| Last Closed: | 2011-12-06 07:31:55 UTC |
What was the libvirt version that you used? vmTests.LibvirtConnectionTests.test is now passing with:

vdsm-4.9-87.el6.x86_64
libvirt-0.9.4-0rc1.2.el6.x86_64

Is this still reproducible for you?

(In reply to comment #3)
> What was the libvirt version that you used?
> vmTests.LibvirtConnectionTests.test is now passing with:
>
> vdsm-4.9-87.el6.x86_64
> libvirt-0.9.4-0rc1.2.el6.x86_64
>
> Is this still reproducible for you?

Yes. Maybe not with this ruth test, but I can reproduce it.

commit 3e5874b6d5171ac1480cf22133775c032bb28531
Author: Federico Simoncelli <fsimonce>
Date: Wed Aug 3 14:58:57 2011 +0000
BZ#723579 AdvancedStatsThread should expose stop()
The VmStatsThread wasn't exposing the stop method and therefore
it was never stopped during prepareForShutdown.
Even though it was declared as a 'daemon' thread, it kept the rest of
the application running by satisfying its requests.
Change-Id: If8501b5286394e4e862e4bae965431cd395101fd
http://gerrit.usersys.redhat.com/784
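The fix above can be illustrated with a minimal sketch (illustrative stand-in, not vdsm's actual VmStatsThread code): a periodic stats thread whose loop can only end once a `stop()` method is exposed and called during prepareForShutdown. Marking it `daemon` alone does not help while it keeps serving requests.

```python
import threading

class StatsThread(threading.Thread):
    """Minimal stand-in for vdsm's VmStatsThread; illustrative only."""

    def __init__(self, interval=1.0):
        threading.Thread.__init__(self)
        self.daemon = True  # being a daemon thread is not enough: while it
                            # runs it keeps satisfying the app's requests
        self._stop_event = threading.Event()
        self._interval = interval

    def run(self):
        # Sampling loop: wakes up every `interval` seconds until stopped.
        while not self._stop_event.is_set():
            # ... collect VM statistics here ...
            self._stop_event.wait(self._interval)

    def stop(self):
        # The method the fix exposes so prepareForShutdown can end the loop.
        self._stop_event.set()

t = StatsThread(interval=0.1)
t.start()
t.stop()           # without an exposed stop(), shutdown has nothing to call
t.join(timeout=5)
print(t.is_alive())  # False once the loop has exited
```

Shutdown code can then stop and join the thread deterministically instead of hoping the process dies around it.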
Still happens in some scenarios: single host, SPM.
Restart libvirtd, then try to run a VM:

vdsm-4.9-96.el6.x86_64
Thread-61859::ERROR::2011-08-29 10:01:53,067::libvirtconnection::73::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-61859::INFO::2011-08-29 10:01:53,068::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, args: ()
Thread-61859::DEBUG::2011-08-29 10:01:53,069::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: moving from state init -> state preparing
Thread-61859::DEBUG::2011-08-29 10:01:53,070::spm::199::Storage.SPM::(__cleanupMasterMount) unmounting /rhev/data-center/mnt/blockSD/078f4e60-a883-4d68-9c30-c35d7ae014d8/master
MainThread::INFO::2011-08-29 10:01:53,076::vdsm::76::vds::(run) VDSM main thread ended. Waiting for 28 other threads...
MainThread::INFO::2011-08-29 10:01:53,077::vdsm::79::vds::(run) <_MainThread(MainThread, started 139966079985408)>
MainThread::INFO::2011-08-29 10:01:53,079::vdsm::79::vds::(run) <Thread(libvirtEventLoop, started daemon 139965987526400)>
MainThread::INFO::2011-08-29 10:01:53,080::vdsm::79::vds::(run) <WorkerThread(Thread-4, started 139965736417024)>
MainThread::INFO::2011-08-29 10:01:53,080::vdsm::79::vds::(run) <KsmMonitorThread(KsmMonitor, started daemon 139965235185408)>
MainThread::INFO::2011-08-29 10:01:53,081::vdsm::79::vds::(run) <Thread(Thread-61859, started 139963913987840)>
MainThread::INFO::2011-08-29 10:01:53,081::vdsm::79::vds::(run) <WorkerThread(Thread-79, started 139964285171456)>
MainThread::INFO::2011-08-29 10:01:53,082::vdsm::79::vds::(run) <WorkerThread(Thread-9, started 139965683967744)>
MainThread::INFO::2011-08-29 10:01:53,083::vdsm::79::vds::(run) <StatsThread(Thread-15, started 139965224695552)>
MainThread::INFO::2011-08-29 10:01:53,083::vdsm::79::vds::(run) <WorkerThread(Thread-76, started 139964316641024)>
MainThread::INFO::2011-08-29 10:01:53,084::vdsm::79::vds::(run) <WorkerThread(4ad0f804-fbff-4ca1-a674-a1e6b98f58d1, started 139965704947456)>
MainThread::INFO::2011-08-29 10:01:53,085::vdsm::79::vds::(run) <WorkerThread(ea0a2b34-50e9-4939-b11c-39075ce1efb7, started 139965715437312)>
MainThread::INFO::2011-08-29 10:01:53,085::vdsm::79::vds::(run) <WorkerThread(Thread-75, started 139965256165120)>
MainThread::INFO::2011-08-29 10:01:53,086::vdsm::79::vds::(run) <WorkerThread(a11ff014-7fa2-4eb6-9aec-38b3e13df563, started 139965266654976)>
MainThread::INFO::2011-08-29 10:01:53,087::vdsm::79::vds::(run) <WorkerThread(Thread-26, started 139964748531456)>
MainThread::INFO::2011-08-29 10:01:53,087::vdsm::79::vds::(run) <WorkerThread(Thread-25, started 139964759021312)>
MainThread::INFO::2011-08-29 10:01:53,088::vdsm::79::vds::(run) <WorkerThread(Thread-8, started 139965694457600)>
MainThread::INFO::2011-08-29 10:01:53,088::vdsm::79::vds::(run) <WorkerThread(Thread-22, started 139964769511168)>
MainThread::INFO::2011-08-29 10:01:53,089::vdsm::79::vds::(run) <WorkerThread(Thread-23, started 139964780001024)>
MainThread::INFO::2011-08-29 10:01:53,090::vdsm::79::vds::(run) <WorkerThread(Thread-24, started 139964790490880)>
MainThread::INFO::2011-08-29 10:01:53,090::vdsm::79::vds::(run) <WorkerThread(Thread-78, started 139964295661312)>
MainThread::INFO::2011-08-29 10:01:53,091::vdsm::79::vds::(run) <HSM_MailMonitor(Thread-27, started 139964327130880)>
MainThread::INFO::2011-08-29 10:01:53,092::vdsm::79::vds::(run) <WorkerThread(Thread-77, started 139964306151168)>
MainThread::INFO::2011-08-29 10:01:53,092::vdsm::79::vds::(run) <Thread(Thread-61864, stopped 139965245675264)>
MainThread::INFO::2011-08-29 10:01:53,093::vdsm::79::vds::(run) <StatsThread(Thread-16, started 139965214205696)>
MainThread::INFO::2011-08-29 10:01:53,094::vdsm::79::vds::(run) <WorkerThread(Thread-5, started 139965725927168)>
MainThread::INFO::2011-08-29 10:01:53,094::vdsm::79::vds::(run) <_DummyThread(Dummy-80, started daemon 139964274681600)>
<WorkerThread(9acc2b95-dbf2-4180-b0f1-e5deb5936ead, started 139965974914816)>
MainThread::INFO::2011-08-29 10:01:53,096::vdsm::79::vds::(run) <WorkerThread(5c1e9dbc-b154-44ae-ae7c-cc038b88ec37, started 139965953935104)>
MainThread::INFO::2011-08-29 10:01:53,096::vdsm::79::vds::(run) <WorkerThread(8fa890b3-c210-4a6a-bf04-13d7e34373a0, started 139965964424960)>
Thread-61859::DEBUG::2011-08-29 10:01:53,135::fileUtils::112::Storage.Misc.excCmd::(umount) '/usr/bin/sudo -n /bin/umount -f /rhev/data-center/mnt/blockSD/078f4e60-a883-4d68-9c30-c35d7ae014d8/master' (cwd None)
Thread-61859::DEBUG::2011-08-29 10:01:53,169::fileUtils::112::Storage.Misc.excCmd::(umount) SUCCESS: <err> = ''; <rc> = 0
Thread-61859::DEBUG::2011-08-29 10:01:53,171::spm::205::Storage.SPM::(__cleanupMasterMount) master `/rhev/data-center/mnt/blockSD/66b3da26-d97b-4477-b7b5-7f2f88d1c39a/master` is not mounted, skipping
Thread-61859::DEBUG::2011-08-29 10:01:53,171::spm::205::Storage.SPM::(__cleanupMasterMount) master `/rhev/data-center/mnt/blockSD/9fb00621-9820-45b9-a3fb-8dc723a30443/master` is not mounted, skipping
Thread-61859::DEBUG::2011-08-29 10:01:53,172::spm::278::Storage.Misc.excCmd::(__releaseLocks) '/usr/bin/killall -g -USR1 spmprotect.sh' (cwd None)
Thread-61859::DEBUG::2011-08-29 10:01:53,207::spm::278::Storage.Misc.excCmd::(__releaseLocks) SUCCESS: <err> = ''; <rc> = 0
Thread-61859::WARNING::2011-08-29 10:01:53,236::spm::283::Storage.SPM::(__releaseLocks) SPM: found lease locks, releasing
Dummy-80::DEBUG::2011-08-29 10:01:54,025::storage_mailbox::623::Storage.Misc.excCmd::(_checkForMail) 'dd if=/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Dummy-80::DEBUG::2011-08-29 10:01:54,144::storage_mailbox::623::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0465316 s, 22.0 MB/s\n'; <rc> = 0
Thread-61859::DEBUG::2011-08-29 10:01:54,238::spm::287::Storage.Misc.excCmd::(__releaseLocks) '/usr/bin/killall -0 spmprotect.sh' (cwd None)
Thread-61859::DEBUG::2011-08-29 10:01:54,267::spm::287::Storage.Misc.excCmd::(__releaseLocks) FAILED: <err> = 'spmprotect.sh: no process killed\n'; <rc> = 1
Thread-61859::DEBUG::2011-08-29 10:01:54,269::spm::214::Storage.SPM::(__cleanupSPMLinks) cleaning links; ['/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/vms'] ['/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/tasks']
Dummy-80::INFO::2011-08-29 10:01:56,156::storage_mailbox::673::Storage.MailBox.SpmMailMonitor::(run) SPM_MailMonitor - Incoming mail monitoring thread stopped
Thread-27::INFO::2011-08-29 10:01:57,212::storage_mailbox::434::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mail monitoring thread stopped, clearing outgoing mail
Thread-27::INFO::2011-08-29 10:01:57,213::storage_mailbox::330::Storage.MailBox.HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - ['dd', 'of=/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=512', 'seek=8']
Thread-27::DEBUG::2011-08-29 10:01:57,214::storage_mailbox::334::Storage.Misc.excCmd::(_sendMail) 'dd of=/rhev/data-center/54d75ab9-3a7e-42f4-bb8d-e7c799862b11/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=8' (cwd None)
Thread-27::DEBUG::2011-08-29 10:01:58,233::storage_mailbox::334::Storage.Misc.excCmd::(_sendMail) SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes (4.1 kB) copied, 1.00008 s, 4.1 kB/s\n'; <rc> = 0
Thread-61859::DEBUG::2011-08-29 10:01:59,109::sp::1325::Storage.StoragePool::(stopRepoStats) 078f4e60-a883-4d68-9c30-c35d7ae014d8 stat <StatsThread(Thread-15, stopped 139965224695552)>
Thread-61859::DEBUG::2011-08-29 10:01:59,110::sp::1325::Storage.StoragePool::(stopRepoStats) bab701c1-8baf-4180-9bf1-3445e71a96a2 stat <StatsThread(Thread-16, stopped 139965214205696)>
Thread-61859::DEBUG::2011-08-29 10:01:59,110::taskManager::69::TaskManager::(prepareForShutdown) Request to stop all tasks
Thread-61859::DEBUG::2011-08-29 10:02:02,108::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: finished: None
Thread-61859::DEBUG::2011-08-29 10:02:02,109::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: moving from state preparing -> state finished
Thread-61859::DEBUG::2011-08-29 10:02:02,110::resourceManager::786::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-61859::DEBUG::2011-08-29 10:02:02,110::resourceManager::821::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-61859::DEBUG::2011-08-29 10:02:02,111::task::492::TaskManager.Task::(_debug) Task 2ca47b83-91e8-41f7-8095-1f345c0ca26a: ref 0 aborting False
Thread-61859::INFO::2011-08-29 10:02:02,112::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, Return response: {'status': {'message': 'OK', 'code': 0}}
Thread-61859::DEBUG::2011-08-29 10:02:02,112::vm::356::vm.Vm::(_startUnderlyingVm) vmId=`bae8e60b-1712-429a-972b-16f734bff1d0`::_ongoingCreations released
Thread-61859::ERROR::2011-08-29 10:02:02,113::vm::380::vm.Vm::(_startUnderlyingVm) vmId=`bae8e60b-1712-429a-972b-16f734bff1d0`::The vm start process failed
Traceback (most recent call last):
File "/usr/share/vdsm/vm.py", line 346, in _startUnderlyingVm
self._run()
File "/usr/share/vdsm/libvirtvm.py", line 973, in _run
self._connection.createXML(domxml, flags),
File "/usr/share/vdsm/libvirtconnection.py", line 63, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1989, in createXML
if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Cannot write data: Broken pipe
Thread-61859::DEBUG::2011-08-29 10:02:02,139::vm::741::vm.Vm::(setDownStatus) vmId=`bae8e60b-1712-429a-972b-16f734bff1d0`::Changed state to Down: Cannot write data: Broken pipe
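The "connection to libvirt broken. taking vdsm down." line at the top of the log comes from a wrapper that vdsm's libvirtconnection module puts around libvirt calls. A hedged sketch of that pattern follows; the names (`LibvirtError` stand-in class, `broken_connection_guard`, `on_broken`) are assumptions for illustration, not vdsm's real API:

```python
import functools

class LibvirtError(Exception):
    """Stand-in for libvirt.libvirtError (real bindings not required here)."""

def broken_connection_guard(f, on_broken):
    """Wrap a libvirt call so that a failure from a dead daemon connection
    triggers a controlled shutdown instead of leaving callers stuck."""
    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except LibvirtError:
            on_broken()   # in vdsm's case: log and start prepareForShutdown
            raise         # the caller still sees the original failure
    return wrapped

# Simulate the failure from the traceback: createXML dies with a broken pipe.
events = []
def create_xml():
    raise LibvirtError("Cannot write data: Broken pipe")

guarded = broken_connection_guard(create_xml,
                                  lambda: events.append("shutdown"))
try:
    guarded()
except LibvirtError:
    pass
print(events)  # ['shutdown']
```

The re-raise matters: the VM start still fails visibly (as in the traceback above) while the shutdown path runs in parallel.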
Probably the new process pool problem.
Created attachment 524788 [details]
vdsm.log
Still reproduces on libvirtd restart / VM run with vdsm 4.9-104.
Daniel, any reason you assigned this bug to yourself?

Interesting. Some kind of a Bugzilla hiccup.

Checked on 4.9-104, as discussed with dnaori and fsimonce.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html
Created attachment 514045 [details]
log

Description of problem:
PrepareForShutdown may hang forever when a single VM is running and libvirtd is restarted.

Version-Release number of selected component (if applicable):
vdsm-4.9-83.el6.x86_64

How reproducible:

Steps to Reproduce:
You can run the ruth test vmTests.LibvirtConnectionTests.test, or:
1) Run a VM (single host, SPM, on an iSCSI SD).
2) Restart libvirtd.

Actual results:
prepareForShutdown hangs forever.

Expected results:

Additional info:
Full log attached.
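The hang mechanism behind "VDSM main thread ended. Waiting for 28 other threads..." in the log can be sketched as follows. This is an assumed reconstruction of the shutdown accounting (the function name and log wiring are illustrative, not vdsm's exact code): the interpreter only exits once every non-daemon thread finishes, so a single stuck WorkerThread is enough to hang prepareForShutdown forever.

```python
import threading
import time

def wait_for_threads(log=print):
    """List remaining threads at shutdown, then wait on the non-daemon ones."""
    others = [t for t in threading.enumerate()
              if t is not threading.current_thread()]
    log("VDSM main thread ended. Waiting for %d other threads..." % len(others))
    for t in threading.enumerate():
        log(repr(t))
    # Python will not exit while any non-daemon thread runs, so joining them
    # here makes the hang explicit (and debuggable) instead of silent.
    for t in others:
        if not t.daemon:
            t.join()

# Usage: one short-lived non-daemon worker; shutdown waits until it is done.
worker = threading.Thread(target=lambda: time.sleep(0.2))
worker.start()
messages = []
wait_for_threads(log=messages.append)
print(messages[0])
```

If `worker` never returned, `wait_for_threads` would block indefinitely, which is exactly the symptom this bug describes.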