Previously, a race condition in Python's subprocess.Popen could cause virtual machine creation to fail. A patch to VDSM prevents virtual machine creation from failing when this race condition occurs.
When trying to create a VM, I saw an error in the engine log: "Please handle Storage Domain issues and retry the operation".
I executed getDeviceList, which never returned any results.
Looking at the old vdsm log, you can see the last thread that holds a lock on a resource; see Thread-114297 in the attached vdsm.log.2.xz.
Other threads that request the lock never get it.
I've managed to reproduce this while testing patches.
It happens randomly when forking/execing in Python. Some days it doesn't happen at all, and some days it happens all the time.
Bear in mind that, because of testing, I've been doing about 10,000 forks per test run, running a test run on every code change, and still it happened only rarely for me.
The origin is a deadlock in Python whose root cause I can't quite nail down: I know where it's stuck, but not why.
If you are interested: Python deadlocks trying to acquire the thread-local context lock after a fork() while reinitializing the GIL.
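The general failure mode described here (a lock held by another thread at fork() time being copied into the child in its locked state) can be sketched in plain Python. This is an illustrative demo of the hazard, not VDSM code and not the exact lock involved in this bug:

```python
import os
import threading
import time

lock = threading.Lock()

def holder():
    # Hold the lock long enough for the fork to happen while it is taken.
    with lock:
        time.sleep(2)

t = threading.Thread(target=holder)
t.start()
time.sleep(0.2)  # make sure the holder thread owns the lock before forking

pid = os.fork()
if pid == 0:
    # Child: only the forking thread survives, but the lock was copied
    # in its locked state, so a plain acquire() would hang forever.
    stuck = not lock.acquire(timeout=0.5)
    os._exit(0 if stuck else 1)

_, status = os.waitpid(pid, 0)
print("child saw a permanently stuck lock:", os.WEXITSTATUS(status) == 0)
t.join()
```

The child times out instead of blocking forever only because the demo uses acquire(timeout=...); real code taking the lock unconditionally would deadlock, which matches the hang observed here.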
In any case, it should be fixed by this patch stack, which avoids the problem by avoiding forking altogether:
http://gerrit.ovirt.org/#q,status:open+project:vdsm+branch:master+topic:coop,n,z
It solves the problems with forking/execing and the process pool.
So, given that we have no way of reproducing this (since we do not know exactly why it happens) and we are adding new code that avoids forking altogether, how can we verify this bug?
According to Saggi's comment 11, this is a nasty, not completely understood race condition in Python's subprocess.Popen. We have neither a clear reproducer nor a simple way to verify the bug.
The best I can suggest for QE is to stress-test VDSM with multiple (as many as possible) block storage domains.
Saggi assumes that the problem Avihai noticed will be gone once his
http://gerrit.ovirt.org/3944
is merged.
We didn't encounter it in our various automation runs, nor in manual storage sanity testing, nor in scalability testing (I tried it myself with a domain constructed from 100 PVs).
vdsm-4.9.6-21.0.el6_3.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2012-1508.html