Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1022924

Summary:

hosted engine | two qemu processes with the same VM ID runs on two different machines. Also the sanlock resource taken on two machines on the same storage.

Product:

Red Hat Enterprise Linux 6

Reporter:

Leonid Natapov <lnatapov>

Component:

libvirt

Assignee:

Michal Privoznik <mprivozn>

Status:

CLOSED ERRATA

QA Contact:

Virtualization Bugs <virt-bugs>

Severity:

urgent

Docs Contact:

Priority:

urgent

Version:

6.5

CC:

bazulay, berrange, dfediuck, dyuan, eblake, fsimonce, gpadgett, iheim, jdenemar, lcui, lnatapov, lpeer, mjenner, msivak, mzhan, rbalakri, scohen, ydu, yeylon, zhwang

Target Milestone:

Keywords:

ZStream

Target Release:

6.5

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

storage

Fixed In Version:

libvirt-0.10.2-30.el6

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2014-10-14 04:18:02 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1029629, 1029632

Attachments:

Description	Flags
green	none
purple	none
Script to analyse libvirtd.log for race conditions	none
Analysis of libvirtd.log.13.xz on machine purple	none

Description Leonid Natapov 2013-10-24 09:32:06 UTC

qemu process runs on two hosts. When I do cat /proc/5204/cmdline | less on both hosts I see the same VM's uuid.
Also the sanlock resource is taken on two hosts on the same storage.
Actually,no specific actions were done to lead this situation. Happened overnight. vdsm logs from both hosts attached.


[root@purple-vds2 ~]# sanlock status
daemon fc5aa354-0e24-4007-a378-cf1b04820c4b.purple-vds
p -1 helper
p -1 listener
p -1 status
p 14892 HostedEngine
s ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:2:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/dom_md/ids:0
s 8550cb2d-6a96-4dcc-90f5-45d9e696bc97:2:/dev/8550cb2d-6a96-4dcc-90f5-45d9e696bc97/ids:0
s fe86dfb9-b53f-4d5d-b24a-f9f90c45cf25:2:/dev/fe86dfb9-b53f-4d5d-b24a-f9f90c45cf25/ids:0
s hosted-engine:2:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/ha_agent/hosted-engine.lockspace:0
r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0:241 p 14892




[root@green-vdsc ~]# sanlock status
daemon 2fef2dd6-a966-46e8-9654-54887b39a449.green-vdsc
p -1 helper
p -1 listener
p 5204 5204
p -1 
p 15942 HostedEngine
p -1 status
s 8550cb2d-6a96-4dcc-90f5-45d9e696bc97:3:/dev/8550cb2d-6a96-4dcc-90f5-45d9e696bc97/ids:0
s fe86dfb9-b53f-4d5d-b24a-f9f90c45cf25:3:/dev/fe86dfb9-b53f-4d5d-b24a-f9f90c45cf25/ids:0
s ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:3:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/dom_md/ids:0
s hosted-engine:3:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/ha_agent/hosted-engine.lockspace:0
r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0:0 p 15942

Comment 1 Leonid Natapov 2013-10-24 09:33:27 UTC

Created attachment 815713 [details]
green

Comment 2 Leonid Natapov 2013-10-24 09:34:01 UTC

Created attachment 815714 [details]
purple

Comment 3 Doron Fediuck 2013-10-24 10:18:59 UTC

Leonid,
can you please verify the host id in the configuration and meta data file?

Comment 4 Eduardo Warszawski 2013-10-24 19:32:09 UTC

On green-vdsc, at Oct24 03:55, qemu was started.
libvirt reported an error to acquire the lock

Thread-258284::ERROR::2013-10-24 03:55:30,277::vm::2132::vm.Vm:_startUnderlyingVm) vmId=`ac687a24-05cd-43fb-9f9d-c53169fa8cd0`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 2092, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/vm.py", line 2959, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error Failed to acquire lock: error -243

but libvirt then wrote to its own log

2013-10-24 00:55:30.275+0000: 8283: debug : qemuMonitorSetCapabilities:1171 : mon=0x7f3f6c185ed0
Caught Segmentation violation dumping internal log buffer:


    ====== start of log =====

^@772 : Removing cgroup /cgroup/cpuacct/libvirt/qemu/HostedEngine/ and all child cgroups

and leaked the recently-started qemu process.

There's probably a bug in sanlock or on libvirt's usage thereof.

Comment 5 Eduardo Warszawski 2013-10-24 19:41:04 UTC

Need to check correlation with:
https://bugzilla.redhat.com/show_bug.cgi?id=1019229

Comment 7 Leonid Natapov 2013-10-27 10:10:05 UTC

Fede,you can connect to green-vdsc.qa.lab and purple-vds2.qa.lab
I didn't touch them since we discovered the problem.

Comment 8 Federico Simoncelli 2013-10-28 09:52:20 UTC

This seems entirely a libvirt issue. As described in comment 4 on green-vdsc libvirt crashed, but also on purple-vds2 the vm got lost somehow:

# ps aux | grep qemu
qemu     14892  4.3 42.4 6652168 4296664 ?     Sl   Oct24 275:24 /usr/libexec/qemu-kvm -name HostedEngine -S -M rhel6.5.0 ...

# stat -c %z /proc/14892
2013-10-24 03:17:54.878902958 +0300

from libvirt log:

2013-10-24 00:17:54.875+0000: 2387: debug : virCommandRunAsync:2229 : About to run LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name HostedEngine -S -M rhel6.5.0  ...
...
2013-10-24 00:17:55.149+0000: 2387: debug : qemuMonitorOpenInternal:765 : QEMU_MONITOR_NEW: mon=0x7ff6b80a0040 refs=2 fd=30
...
2013-10-24 00:18:19.333+0000: 2353: debug : qemuMonitorEmitResume:1028 : mon=0x7ff6b80a0040

But now running:

# virsh -r list
 Id    Name                           State
----------------------------------------------------

the vm is not reported.

Libvirt was never restarted (or crashed) since the vm creation. Last restart was:

2013-10-23 12:06:23.139+0000: 2353: info : libvirt version: 0.10.2, package: 29.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2013-10-09-06:25:35, x86-026.build.eng.bos.redhat.com)

And the process is still the same:

# ps aux | grep libvirt
root      2353  0.1  0.3 935904 33984 ?        SLsl Oct23   7:31 /usr/sbin/libvirtd --listen

# stat -c %z /proc/2353
2013-10-23 15:06:22.067687916 +0300

I haven't been able to identify (from the logs) exactly when the vm stopped to be reported.

At the moment the status is:

1. on green-vdsc libvirt crashed (apparently during the vm creation), the lease was not acquired, it seems to me that the vm was never unpaused
2. on purple-vds2 the lease was acquired and the vm was unpaused but then libvirt (never restarted or crashed) lost track of the vm

At the moment trying to start the vm on any host fails because the volume lease is correctly acquired by the vm running on purple-vds2 (but not reported by libvirt).

In my opinion there was no disk corruption as the vm on green-vdsc failed to acquire the lease and it wasn't unpaused but I suppose we'll find out about this only after we finished debugging.


What worried me the most was that the volume resource was reported on both hosts (according to the bug description):

[root@purple-vds2 ~]# sanlock status
daemon fc5aa354-0e24-4007-a378-cf1b04820c4b.purple-vds
...
r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0:241 p 14892


[root@green-vdsc ~]# sanlock status
daemon 2fef2dd6-a966-46e8-9654-54887b39a449.green-vdsc
...
r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0:0 p 15942


analyzing this closely I'm almost certain that it's a reporting issue. In fact on purple-vds2 (where the vm took the lease and was unpaused) the lease was successfully acquired:

2013-10-24 03:18:15+0300 43963 [4203]: s4:r13 resource ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com:_export_hosted__engine1/ef4a31eb-6
88f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0 for 4,20,14892
2013-10-24 03:18:19+0300 43967 [2329]: s4:r14 resource ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0 for 2,17,15173
2013-10-24 03:18:19+0300 43967 [2329]: r14 acquire_token resource exists

and it is still present now in the status:

[root@purple-vds2 ~]# sanlock status
daemon fc5aa354-0e24-4007-a378-cf1b04820c4b.purple-vds
...
r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0:241 p 14892


but on green-vdsc disappeared as in fact "sanlock status" reports also the on-fly requests. In particular the one in the description was on-fly when the "sanlock status" was issued:

r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0:0 p 15942

but the request was refused (see 15942):

2013-10-24 12:09:13+0300 329125 [5698]: s5:r2865 resource ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0 for 4,20,15942
2013-10-24 12:09:20+0300 329132 [5698]: r2865 paxos_acquire owner 2 delta 2 28 75828 alive
2013-10-24 12:09:20+0300 329132 [5698]: r2865 cmd_acquire 4,20,15942 acquire_token -243

2013-10-24 09:09:20.303+0000: 29535: error : virLockManagerSanlockAcquire:971 : internal error Failed to acquire lock: error -243
2013-10-24 09:09:20.303+0000: 29535: debug : qemuProcessStop:4244 : Shutting down VM 'HostedEngine' pid=15942 flags=0

Moreover I've also been able to reproduce the reporting issue on my hosts.

Comment 11 Eric Blake 2013-10-28 17:55:59 UTC

danpb posted two race fixes where libvirt could lose track of a running guest (I'm not sure if these have any relation to the problems here, but the fact that this bug describes a running VM that libvirt lost track of means it is worth mentioning):
https://www.redhat.com/archives/libvir-list/2013-October/msg01136.html
https://www.redhat.com/archives/libvir-list/2013-October/msg01113.html

I've logged in to both machines, and verified that on both purple and green, a qemu process for the VM named HostedEngine is still running, and that in both cases, libvirtd has lost track of that child process.  Which host was it running on first?  I'm also attempting to manually determine what state each of the two qemu processes are in without killing any running processes yet...

Comment 12 Eric Blake 2013-10-28 19:40:28 UTC

I'm stumped on how to determine what state the qemu processes on both machines are still at.  Normally, when libvirtd dies but leaves a qemu process running, it leaves /var/lib/libvirt/qemu/$dom.monitor as a socket file.  While libvirtd is off, you can then 'nc -U /var/lib/libvirt/qemu/$dom.monitor' to send QMP commands to to the stranded guest and determine its state:

# service libvirtd stop
Redirecting to /bin/systemctl stop  libvirtd.service
# nc -U /var/lib/libvirt/qemu/f18x86_64.monitor
{"QMP": {"version": {"qemu": {"micro": 1, "minor": 6, "major": 1}, "package": ""}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"query-status"}
{"return": {"status": "running", "singlestep": false, "running": true}}
^C

but on both green and purple, the file /var/lib/libvirt/qemu/HostedEngine.monitor is missing, so the nc -U fails.  I can recreate a socket with 'nc -l -U /var/lib/libvirt/qemu/HostedEngine.monitor', but can't seem to make that socket do anything useful.  Without some more ideas, I'm not sure if it's possible to connect to the qemu processes to learn about their state.

Comment 13 Eric Blake 2013-10-28 23:10:06 UTC

I see this in green's logs prior to its first crash:

2013-10-24 00:55:13.468+0000: 8282: debug : virNetworkFree:10622 : network=0x7f3
f70010360
2013-10-24 00:55:18.000+0000: 8285: warning : qemuDomainObjBeginJobInternal:839 
: Cannot start job (modify, none) for domain HostedEngine; current job is (modif
y, none) owned by (8284, 0)
2013-10-24 00:55:18.000+0000: 8285: error : qemuDomainObjBeginJobInternal:843 : 
Timed out during operation: cannot acquire state change lock

I wonder if commit c7ac2519b may be relevant, as that patch was designed to fix a case of incorrect async job handling, although it was more related to migration.  Beyond that, nothing else jumped out at me in the logs as an obvious source of a bug.

Comment 14 Eric Blake 2013-10-28 23:24:02 UTC

(In reply to Eric Blake from comment #13)

> 
> I wonder if commit c7ac2519b may be relevant, as that patch was designed to
> fix a case of incorrect async job handling, although it was more related to
> migration.

See bug 1018267 for more analysis about backporting c7ac2519; which right now is on 7.0 but not 6.5.

Comment 15 Daniel Berrangé 2013-10-29 17:46:32 UTC

There is something really very strange & unexplained going on here. I've been looking at the logs on purple-vds2.

 - There is a QEMU process running with PID 14892
 - This process has threads 4892  14895  14896  3713  3762  4057  4083, of which 14895 and 14896 will be the 2 VCPU threads

 - In /var/log/libvirt/libvirtd.log.13.xz we find the monitor command related to these vCPUs

  2013-10-24 00:18:15.013+0000: 2353: debug : qemuMonitorIOProcess:354 : QEMU_MONITOR_IO_PROCESS: mon=0x7ff6b80a0040 buf={"return": [{"current": true, "CPU": 0, "pc": 4294967280, "halted": false, "thread_id": 14895}, {"current": false, "CPU": 1, "pc": 4294967280, "halted": true, "thread_id": 14896}], "id": "libvirt-5"}

 - Looking back in time for all log messages related to  mon=0x7ff6b80a0040 we can see where the monitor was first opened

  2013-10-24 00:17:55.141+0000: 2387: debug : qemuProcessWaitForMonitor:1768 : Connect monitor to 0x7ff6a8000920 'HostedEngine'
  2013-10-24 00:17:55.149+0000: 2387: debug : qemuMonitorOpenInternal:765 : QEMU_MONITOR_NEW: mon=0x7ff6b80a0040 refs=2 fd=30


 - Looking back further in thread 2387 we can see where the create attempt started

  2013-10-24 00:17:31.857+0000: 2387: debug : virDomainCreateXML:1937 : conn=0x7ff6b4001160, xmlDesc=<?xml version="1.0" encoding="utf-8"?><domain type="kvm">

 - Looking forward we can see where startup completed successfully, with all leases acquired

  2013-10-24 00:18:15.189+0000: 2387: debug : qemuProcessStart:4072 : Starting domain CPUs
  2013-10-24 00:18:15.189+0000: 2387: debug : qemuProcessStartCPUs:2865 : Using lock state '(null)'
  2013-10-24 00:18:19.330+0000: 2387: debug : qemuMonitorStartCPUs:1197 : mon=0x7ff6b80a0040
  2013-10-24 00:18:19.332+0000: 2387: debug : qemuMonitorSend:904 : QEMU_MONITOR_SEND_MSG: mon=0x7ff6b80a0040 msg={"execute":"cont","id":"libvirt-8"}
  2013-10-24 00:18:19.401+0000: 2387: debug : qemuProcessStart:4093 : Writing domain status to disk


So to summarize

 - At 00:17:31.857 we started creation of guest "HostedEngine"
 - At 00:18:19.401 we completed creation of guest and its CPUs are running


Now here is where things get wierd... in between those two times, we see 3 other attempts to start the same guest


 - 2013-10-24 00:17:44.199+0000: 2389: debug : virDomainCreateXML:1937 : conn=0x7ff6b4001160, xmlDesc=<?xml version="1.0" encoding="utf-8"?><domain type="kvm">
 - 2013-10-24 00:17:56.256+0000: 2388: debug : virDomainCreateXML:1937 : conn=0x7ff6b4001160, xmlDesc=<?xml version="1.0" encoding="utf-8"?><domain type="kvm">
 - 2013-10-24 00:18:08.350+0000: 2390: debug : virDomainCreateXML:1937 : conn=0x7ff6b4001160, xmlDesc=<?xml version="1.0" encoding="utf-8"?><domain type="kvm">

Looking at the first of these attempts - thread 2389 we can see it goes all the way through startup - as far as trying to start CPUs

 - 2013-10-24 00:17:55.398+0000: 2389: debug : qemuProcessStart:4072 : Starting domain CPUs
 - 2013-10-24 00:17:55.398+0000: 2389: debug : qemuProcessStartCPUs:2865 : Using lock state '(null)'
 - 2013-10-24 00:18:13.448+0000: 2389: error : virLockManagerSanlockAcquire:971 : internal error Failed to acquire lock: error -243

So sanlock is doing a good job here and preventing the attempt to start the same VM twice, and so that thread shuts down the QEMU instance it spawned.

 - 2013-10-24 00:18:13.448+0000: 2389: debug : qemuProcessStop:4244 : Shutting down VM 'HostedEngine' pid=14723 flags=0

So I think we have a pretty clear race condition here. The first thread 2387 which successfully starts QEMU, is racing with one of the other 3 attempts to start QEMU and those other 3 threads never see that there is already a VM called "HostedEngine" with UUID ac687a24-05cd-43fb-9f9d-c53169fa8cd0 that is being started. The cleanup code from these 3 failed attempts are deleting the monitor socket of the QEMU that actually started succesfully, and also causing libvirt to loose all trace of this running VM.

Clearly this is fubar in libvirt somewhere, fortunately sanlock is working as designed and protecting us.

I've not looked at the second machine 'green-vdsc', but I'm betting the same thing is true of libvirt on that host.

Comment 16 Daniel Berrangé 2013-10-29 17:50:48 UTC

BTW, while we are debugging this please do *NOT* reboot or change anything running on 'green-vdsc' and 'purple-vds2' - these are the only machines showing the problem, so their state is critical to diagnosing the problem

Comment 17 Daniel Berrangé 2013-10-31 15:30:51 UTC

Created attachment 817966 [details]
Script to analyse libvirtd.log for race conditions

This script is intended to be run against libvirtd.log files and will look at all domain startup attempts and report on race conditions detected. It is hardcoded to examine the guest "HostedEngine".

Comment 18 Daniel Berrangé 2013-10-31 15:48:58 UTC

Created attachment 817968 [details]
Analysis of libvirtd.log.13.xz on machine purple

This is the output of the race condition analysis on purple - only libvirtd.log.13.xz showed signs of a race.

The problem scanario starts at timestamp 2013-10-24 00:16:30.558+000 when HostedEngine is requested to boot. This is handled by thread 2388

Before this boot attempt is completed, RHEV makes a second request at time 2013-10-24 00:16:42.842+0000, handled by thread 2387. This somehow gets access to the same virDomainObjPtr as thread 2387, and then waits on a lock held by thread 2388

When thread 2388's start attempt fails, because it is a transient guest, it removes the virDomainObjPtr from the list of VMs. Thread 2387 still has a reference to this orphaned domain object instance, and now begins to start it too.

Before this boot attempt completes, RHEV makes a third request to start the same guest at time 2013-10-24 00:16:55.100+0000, handled by thread 2389 and a fourth request handled by 2388. The fourth request is quickly rejected.

The third request, however, starts the bootup process and has a completely separate virDomainObjPtr, since the instance owned by thread 2387 was orphaned from the list of domains.

So now we have 2 threads starting the same named VM in parallel and libvirtd only remembers one of them.

This race sequence occurs several more times. Nearly all the boot attempts on host purple are being aborted as sanlock reports the lock is still held.

At  2013-10-24 00:18:19.332+0000 there is however a successful start attempt, presumably because the sanlock lease became free.  Unfortunately this thread had one of the orphaned virDomainObjPtr instances, so the QEMU is now running, but unknown to libvirtd.

There are at minimum 2 bugs here

 - RHEV is making multiple parallel requests to start the same named VM
 - libvirt is not correctl handling mutual exclusion when concurrent start requests are made.

I believe the flaw is in the virDomainObjIsDuplicate() method. When booting a transient VM, it is permissible that the VM with matching name/uuid pre-exist, provided it is shutoff and persistent.  We are only checking whether it is shutoff though, not persistent. This would be ok, provided we set dom->def->id to indicate it is running, before it is unlocked again.

Unfortunately qemuDomainObjBeginJobWithDriver will unlock & relock virDomainObjPtr, so there is a tiny window where the 2nd request may see a shutoff virDomainObjPtr in the domain list and think that is valid.

When starting transient VMs, virDomainObjIsDuplicate needs to check whether the virDomainObjPtr is persistent or not.

Comment 20 Daniel Berrangé 2013-10-31 18:44:04 UTC

Fix for the race posted upstream

  https://www.redhat.com/archives/libvir-list/2013-October/msg01348.html

This will need re-writing for 0.10.2 trees, since the code has changed. The check will need to go in virDomainObjIsDuplicate instead.

Also 0.10.2 does not exhibit the libvirtd crash scenario, only the orphan VM scenario

Comment 21 Michal Privoznik 2013-11-06 14:17:24 UTC

Moving to POST:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2013-November/msg00486.html

Comment 33 yanbing du 2014-04-14 07:46:51 UTC

1. Reproduce this bug with libvirt-0.10.2-29.el6.x86_64
Following the steps in comment 27, create/destroy VM from 3 remote clients, after a while, there'd be orphaned qemu process appearing:
# virsh list --all
 Id    Name                           State
--------------------------------------------------


# ps aux|grep qemu
qemu     24031 25.6  0.0 1283008 20772 ?       Sl   15:29   0:07 /usr/libexec/qemu-kvm -name rhel6 -S -M rhel6.5.0 -enable-kvm -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid d78dce96-23e3-b6ad-d40a-34ca8e0f5e0d -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel6.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -netdev tap,fd=28,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:d6:77:5f,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
root     25847  0.0  0.0 103252   852 pts/4    S+   15:29   0:00 grep qemu


2. Verify this bug with libvirt-0.10.2-32.el6.x86_64, no more orphaned qemu process appear.

Comment 35 errata-xmlrpc 2014-10-14 04:18:02 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1374.html