Bug 802759
| Summary: | 3.1 - deadlock after activateStorageDomain ran | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Avihai Shoham <ashoham> |
| Component: | vdsm | Assignee: | Saggi Mizrahi <smizrahi> |
| Status: | CLOSED ERRATA | QA Contact: | Jakub Libosvar <jlibosva> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.3 | CC: | abaron, acathrow, bazulay, cpelland, danken, dron, hateya, iheim, ilvovsky, jbiddle, jlibosva, smizrahi, syeghiay, yeylon, ykaul, zdover |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | storage | | |
| Fixed In Version: | vdsm-4.9.6-18.0 | Doc Type: | Bug Fix |
| Doc Text: | Previously, a race condition in Python's subprocess Popen caused virtual machine creation to fail. A patch to VDSM prevents virtual machine failure when this race condition is present. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-12-04 18:55:09 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | attachment 571126: repro on March19 | | |
Description Avihai Shoham 2012-03-13 12:58:23 UTC
This is actually not a revenge of

According to Saggi, this bug was fixed "ages ago".

Well, apparently it isn't. I put up new patches that are a bit more robust. Please test with the patches applied.

Created attachment 571126 [details]
repro on March19

This log reproduces the issue on March19.
(In reply to comment #7)
> this log repro this issue on March19

But with which vdsm? Saggi's patch was taken upstream only yesterday.

I've managed to reproduce this when testing patches. It happens randomly when forking/execing in Python. Some days it doesn't happen at all, and some days it happens all the time. Take into account that I've been doing (because of testing) about 10,000 forks per test run, running a test run on every code change, and still it happened only rarely for me. The origin is a deadlock in Python whose root cause I can't quite nail down: I know where it's stuck, I just don't know why. If you are interested, Python deadlocks trying to get the local thread context lock after a fork() in order to reinit the GIL. In any case, it should be fixed with this stack, which avoids the problem by avoiding forking altogether:
http://gerrit.ovirt.org/#q,status:open+project:vdsm+branch:master+topic:coop,n,z
It solves problems with forking/execing/the process pool.

So if there is no way of reproducing, since we do not know exactly why it happens, and we are adding new code that avoids forking altogether, how can we verify this bug? Removing QA_ACK until we get instructions on what needs to be tested here.

Setting QE conditional NACK on Reproducer and Requirements. According to Saggi's comment 11, it is a nasty, not completely understood, race condition in Python's subprocess.Popen. We do not have a clear reproducer for this, or a simple way to verify the bug. The best I can see for QE is to stress-test Vdsm with multiple (as many as possible) block storage domains. Saggi assumes that the problem noticed by Avihai will be gone once his http://gerrit.ovirt.org/3944 is in.

We didn't encounter it in our various automation runs, nor in manual storage sanity, nor in scalability testing (tried it myself with a domain constructed from 100 PVs).
vdsm-4.9.6-21.0.el6_3.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2012-1508.html
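For context, here is a minimal stress sketch of the failure mode Saggi describes. This is an illustration under our own assumptions, not the original reproducer and not VDSM code: many threads fork/exec concurrently via subprocess.Popen, at roughly the scale of forks mentioned above. On the affected Python 2.x interpreters, a child forked while another thread holds interpreter state can hang between fork() and exec(), which matches the reported deadlock while reinitializing the GIL.

```python
# Hypothetical stress sketch: concurrent fork/exec from many threads.
# The constants and function names below are our own, chosen only to
# mimic the ~10,000-forks-per-run scale described in the comments.
import subprocess
import threading

THREADS = 20
ITERATIONS = 500  # 20 threads x 500 spawns = ~10,000 forks per run

def spawn_loop():
    # Each Popen forks the multi-threaded interpreter, then execs.
    # The hazard window is in the child, between fork() and exec().
    for _ in range(ITERATIONS):
        p = subprocess.Popen(["/bin/true"])
        p.wait()

workers = [threading.Thread(target=spawn_loop) for _ in range(THREADS)]
for t in workers:
    t.start()
for t in workers:
    t.join()  # on an affected interpreter this may occasionally never return
print("completed all forks without deadlock this run")
```

Note that, as the comments say, even a loop like this only trips the race rarely, which is why the fix went the other way: rather than trying to make fork-from-a-threaded-daemon safe, the patch stack linked above avoids forking from the threaded process altogether.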