Bug 1022924
Summary: | hosted engine | two qemu processes with the same VM ID run on two different machines; the sanlock resource is also taken on two machines on the same storage
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.5
Reporter: Leonid Natapov <lnatapov>
Assignee: Michal Privoznik <mprivozn>
QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Keywords: ZStream
Target Milestone: rc
Target Release: 6.5
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version: libvirt-0.10.2-30.el6
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-10-14 04:18:02 UTC
Bug Blocks: 1029629, 1029632
CC: bazulay, berrange, dfediuck, dyuan, eblake, fsimonce, gpadgett, iheim, jdenemar, lcui, lnatapov, lpeer, mjenner, msivak, mzhan, rbalakri, scohen, ydu, yeylon, zhwang
Description (Leonid Natapov, 2013-10-24 09:32:06 UTC)

Created attachment 815713 [details]: green
Created attachment 815714 [details]: purple
Leonid, can you please verify the host id in the configuration and metadata file?

On green-vdsc, at Oct 24 03:55, qemu was started. libvirt reported an error acquiring the lock:

Thread-258284::ERROR::2013-10-24 03:55:30,277::vm::2132::vm.Vm::(_startUnderlyingVm) vmId=`ac687a24-05cd-43fb-9f9d-c53169fa8cd0`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 2092, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/vm.py", line 2959, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None: raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error Failed to acquire lock: error -243

but libvirt then wrote to its own log:

2013-10-24 00:55:30.275+0000: 8283: debug : qemuMonitorSetCapabilities:1171 : mon=0x7f3f6c185ed0
Caught Segmentation violation dumping internal log buffer:
====== start of log =====
^@772 : Removing cgroup /cgroup/cpuacct/libvirt/qemu/HostedEngine/ and all child cgroups

and leaked the recently-started qemu process. There's probably a bug in sanlock or in libvirt's usage thereof. Need to check correlation with: https://bugzilla.redhat.com/show_bug.cgi?id=1019229

Fede, you can connect to green-vdsc.qa.lab and purple-vds2.qa.lab. I haven't touched them since we discovered the problem.

This seems entirely a libvirt issue. As described in comment 4, on green-vdsc libvirt crashed, but on purple-vds2 the vm also got lost somehow:

# ps aux | grep qemu
qemu 14892 4.3 42.4 6652168 4296664 ? Sl Oct24 275:24 /usr/libexec/qemu-kvm -name HostedEngine -S -M rhel6.5.0 ...
# stat -c %z /proc/14892
2013-10-24 03:17:54.878902958 +0300

from the libvirt log:

2013-10-24 00:17:54.875+0000: 2387: debug : virCommandRunAsync:2229 : About to run LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name HostedEngine -S -M rhel6.5.0 ...
...
2013-10-24 00:17:55.149+0000: 2387: debug : qemuMonitorOpenInternal:765 : QEMU_MONITOR_NEW: mon=0x7ff6b80a0040 refs=2 fd=30
...
2013-10-24 00:18:19.333+0000: 2353: debug : qemuMonitorEmitResume:1028 : mon=0x7ff6b80a0040

But now, running:

# virsh -r list
 Id    Name                           State
----------------------------------------------------

the vm is not reported. Libvirt was never restarted (and never crashed) since the vm creation. The last restart was:

2013-10-23 12:06:23.139+0000: 2353: info : libvirt version: 0.10.2, package: 29.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2013-10-09-06:25:35, x86-026.build.eng.bos.redhat.com)

and the process is still the same:

# ps aux | grep libvirt
root 2353 0.1 0.3 935904 33984 ? SLsl Oct23 7:31 /usr/sbin/libvirtd --listen
# stat -c %z /proc/2353
2013-10-23 15:06:22.067687916 +0300

I haven't been able to identify from the logs exactly when the vm stopped being reported. At the moment the status is:

1. on green-vdsc libvirt crashed (apparently during the vm creation); the lease was not acquired, and it seems to me that the vm was never unpaused
2. on purple-vds2 the lease was acquired and the vm was unpaused, but then libvirt (which never restarted or crashed) lost track of the vm

At the moment, trying to start the vm on any host fails because the volume lease is correctly held by the vm running on purple-vds2 (even though that vm is not reported by libvirt). In my opinion there was no disk corruption, as the vm on green-vdsc failed to acquire the lease and was never unpaused, but I suppose we'll find out for sure only after we finish debugging.
What worried me the most was that the volume resource was reported on both hosts (according to the bug description):

[root@purple-vds2 ~]# sanlock status
daemon fc5aa354-0e24-4007-a378-cf1b04820c4b.purple-vds
...
r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0:241 p 14892

[root@green-vdsc ~]# sanlock status
daemon 2fef2dd6-a966-46e8-9654-54887b39a449.green-vdsc
...
r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0:0 p 15942

Analyzing this closely, I'm almost certain that it's a reporting issue. In fact, on purple-vds2 (where the vm took the lease and was unpaused) the lease was successfully acquired:

2013-10-24 03:18:15+0300 43963 [4203]: s4:r13 resource ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0 for 4,20,14892
2013-10-24 03:18:19+0300 43967 [2329]: s4:r14 resource ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0 for 2,17,15173
2013-10-24 03:18:19+0300 43967 [2329]: r14 acquire_token resource exists

and it is still present now in the status:

[root@purple-vds2 ~]# sanlock status
daemon fc5aa354-0e24-4007-a378-cf1b04820c4b.purple-vds
...
r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0:241 p 14892

On green-vdsc, however, it disappeared; "sanlock status" in fact also reports in-flight requests, and the one in the description was in flight when "sanlock status" was issued:

r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0:0 p 15942

but the request was refused (see pid 15942):

2013-10-24 12:09:13+0300 329125 [5698]: s5:r2865 resource ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0 for 4,20,15942
2013-10-24 12:09:20+0300 329132 [5698]: r2865 paxos_acquire owner 2 delta 2 28 75828 alive
2013-10-24 12:09:20+0300 329132 [5698]: r2865 cmd_acquire 4,20,15942 acquire_token -243
2013-10-24 09:09:20.303+0000: 29535: error : virLockManagerSanlockAcquire:971 : internal error Failed to acquire lock: error -243
2013-10-24 09:09:20.303+0000: 29535: debug : qemuProcessStop:4244 : Shutting down VM 'HostedEngine' pid=15942 flags=0

Moreover, I've also been able to reproduce the reporting issue on my own hosts.
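The resource lines above pack lockspace, resource, lease path, offset, and holder pid into one colon-delimited record, with backslash-escaped colons inside the NFS path. A small parser (a sketch; the field layout is inferred from the output quoted in this bug, and the meaning of the trailing number is assumed from context: 0 on green's in-flight request, 241 on purple's held lease) makes the two hosts' output easier to compare:

```python
import re

def parse_sanlock_resource(line):
    """Parse a 'sanlock status' resource line of the shape seen in this bug:
        r LOCKSPACE:RESOURCE:PATH:OFFSET:N p PID
    PATH may contain backslash-escaped colons (the NFS mount point does here).
    The trailing number N is assumed to distinguish in-flight requests (0)
    from held leases (nonzero), per the analysis above."""
    m = re.match(r'r\s+(\S+)\s+p\s+(\d+)$', line.strip())
    if not m:
        raise ValueError("not a resource line: %r" % line)
    record, pid = m.groups()
    # Split on ':' except where it is escaped as '\:' inside the lease path.
    lockspace, resource, path, offset, trailing = re.split(r'(?<!\\):', record)
    return {
        "lockspace": lockspace,
        "resource": resource,
        "path": path.replace("\\:", ":"),
        "offset": int(offset),
        "trailing": int(trailing),
        "pid": int(pid),
    }

purple_line = (r"r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:"
               r"/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/"
               r"ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/"
               r"586a0e62-5f38-4e93-848f-57588e128261.lease:0:241 p 14892")
info = parse_sanlock_resource(purple_line)
print(info["pid"], info["trailing"])  # -> 14892 241
```

Running it over both hosts' output shows the same lease path with holder pid 14892 on purple and the refused in-flight request (trailing 0, pid 15942) on green.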
danpb posted two race fixes where libvirt could lose track of a running guest (I'm not sure whether these relate to the problems here, but since this bug describes a running VM that libvirt lost track of, they are worth mentioning):

https://www.redhat.com/archives/libvir-list/2013-October/msg01136.html
https://www.redhat.com/archives/libvir-list/2013-October/msg01113.html

I've logged in to both machines and verified that on both purple and green a qemu process for the VM named HostedEngine is still running, and that in both cases libvirtd has lost track of that child process. Which host was it running on first?

I'm also attempting to manually determine what state each of the two qemu processes is in, without killing any running processes yet... I'm stumped on how to determine what state the qemu processes on both machines are in. Normally, when libvirtd dies but leaves a qemu process running, it leaves /var/lib/libvirt/qemu/$dom.monitor behind as a socket file. While libvirtd is off, you can then 'nc -U /var/lib/libvirt/qemu/$dom.monitor' to send QMP commands to the stranded guest and determine its state:

# service libvirtd stop
Redirecting to /bin/systemctl stop libvirtd.service
# nc -U /var/lib/libvirt/qemu/f18x86_64.monitor
{"QMP": {"version": {"qemu": {"micro": 1, "minor": 6, "major": 1}, "package": ""}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"query-status"}
{"return": {"status": "running", "singlestep": false, "running": true}}
^C

But on both green and purple the file /var/lib/libvirt/qemu/HostedEngine.monitor is missing, so the nc -U fails. I can recreate a socket with 'nc -l -U /var/lib/libvirt/qemu/HostedEngine.monitor', but can't seem to make that socket do anything useful. Without more ideas, I'm not sure it's possible to connect to the qemu processes to learn their state.
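The manual nc probe can also be scripted. This is a sketch, not a robust QMP client: the socket path is the one from this bug (and was exactly what turned out to be missing on both hosts), and a real client would also skip asynchronous event lines that QMP may interleave with replies. Only the reply parsing is exercised below, against the response captured on a healthy host.

```python
import json
import socket

def parse_run_state(reply):
    """Pull the run state out of a query-status reply such as
    {"return": {"status": "running", "singlestep": false, "running": true}}."""
    ret = reply["return"]
    return ret["status"], ret["running"]

def qmp_query_status(sock_path="/var/lib/libvirt/qemu/HostedEngine.monitor"):
    """Ask a stranded qemu for its run state over its QMP monitor socket.
    Requires a live socket file, which was missing on both green and purple.
    Sketch only: no timeout handling, and async QMP events are not skipped."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    f = s.makefile("rw")
    f.readline()                              # discard the QMP greeting banner
    reply = None
    for cmd in ("qmp_capabilities", "query-status"):
        f.write(json.dumps({"execute": cmd}) + "\n")
        f.flush()
        reply = json.loads(f.readline())
    s.close()
    return parse_run_state(reply)

# Replaying the reply Eric captured with nc on a healthy host:
sample = json.loads('{"return": {"status": "running", "singlestep": false, "running": true}}')
print(parse_run_state(sample))  # -> ('running', True)
```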
I see this in green's logs prior to its first crash:

2013-10-24 00:55:13.468+0000: 8282: debug : virNetworkFree:10622 : network=0x7f3f70010360
2013-10-24 00:55:18.000+0000: 8285: warning : qemuDomainObjBeginJobInternal:839 : Cannot start job (modify, none) for domain HostedEngine; current job is (modify, none) owned by (8284, 0)
2013-10-24 00:55:18.000+0000: 8285: error : qemuDomainObjBeginJobInternal:843 : Timed out during operation: cannot acquire state change lock

I wonder if commit c7ac2519b may be relevant, as that patch was designed to fix a case of incorrect async job handling, although it was more related to migration. Beyond that, nothing else jumped out at me in the logs as an obvious source of a bug.

(In reply to Eric Blake from comment #13)
> I wonder if commit c7ac2519b may be relevant, as that patch was designed to
> fix a case of incorrect async job handling, although it was more related to
> migration.

See bug 1018267 for more analysis about backporting c7ac2519, which right now is on 7.0 but not 6.5.

There is something really very strange and unexplained going on here. I've been looking at the logs on purple-vds2.
- There is a QEMU process running with PID 14892
- This process has threads 4892 14895 14896 3713 3762 4057 4083, of which 14895 and 14896 will be the 2 VCPU threads
- In /var/log/libvirt/libvirtd.log.13.xz we find the monitor command related to these vCPUs:

2013-10-24 00:18:15.013+0000: 2353: debug : qemuMonitorIOProcess:354 : QEMU_MONITOR_IO_PROCESS: mon=0x7ff6b80a0040 buf={"return": [{"current": true, "CPU": 0, "pc": 4294967280, "halted": false, "thread_id": 14895}, {"current": false, "CPU": 1, "pc": 4294967280, "halted": true, "thread_id": 14896}], "id": "libvirt-5"}

- Looking back in time for all log messages related to mon=0x7ff6b80a0040, we can see where the monitor was first opened:

2013-10-24 00:17:55.141+0000: 2387: debug : qemuProcessWaitForMonitor:1768 : Connect monitor to 0x7ff6a8000920 'HostedEngine'
2013-10-24 00:17:55.149+0000: 2387: debug : qemuMonitorOpenInternal:765 : QEMU_MONITOR_NEW: mon=0x7ff6b80a0040 refs=2 fd=30

- Looking back further in thread 2387, we can see where the create attempt started:

2013-10-24 00:17:31.857+0000: 2387: debug : virDomainCreateXML:1937 : conn=0x7ff6b4001160, xmlDesc=<?xml version="1.0" encoding="utf-8"?><domain type="kvm">

- Looking forward, we can see where startup completed successfully, with all leases acquired:

2013-10-24 00:18:15.189+0000: 2387: debug : qemuProcessStart:4072 : Starting domain CPUs
2013-10-24 00:18:15.189+0000: 2387: debug : qemuProcessStartCPUs:2865 : Using lock state '(null)'
2013-10-24 00:18:19.330+0000: 2387: debug : qemuMonitorStartCPUs:1197 : mon=0x7ff6b80a0040
2013-10-24 00:18:19.332+0000: 2387: debug : qemuMonitorSend:904 : QEMU_MONITOR_SEND_MSG: mon=0x7ff6b80a0040 msg={"execute":"cont","id":"libvirt-8"}
2013-10-24 00:18:19.401+0000: 2387: debug : qemuProcessStart:4093 : Writing domain status to disk

So to summarize:

- At 00:17:31.857 we started creation of guest "HostedEngine"
- At 00:18:19.401 we completed creation of the guest, and its CPUs are running

Now here is where things get weird...
In between those two times, we see 3 other attempts to start the same guest:

- 2013-10-24 00:17:44.199+0000: 2389: debug : virDomainCreateXML:1937 : conn=0x7ff6b4001160, xmlDesc=<?xml version="1.0" encoding="utf-8"?><domain type="kvm">
- 2013-10-24 00:17:56.256+0000: 2388: debug : virDomainCreateXML:1937 : conn=0x7ff6b4001160, xmlDesc=<?xml version="1.0" encoding="utf-8"?><domain type="kvm">
- 2013-10-24 00:18:08.350+0000: 2390: debug : virDomainCreateXML:1937 : conn=0x7ff6b4001160, xmlDesc=<?xml version="1.0" encoding="utf-8"?><domain type="kvm">

Looking at the first of these attempts (thread 2389), we can see it goes all the way through startup, as far as trying to start CPUs:

- 2013-10-24 00:17:55.398+0000: 2389: debug : qemuProcessStart:4072 : Starting domain CPUs
- 2013-10-24 00:17:55.398+0000: 2389: debug : qemuProcessStartCPUs:2865 : Using lock state '(null)'
- 2013-10-24 00:18:13.448+0000: 2389: error : virLockManagerSanlockAcquire:971 : internal error Failed to acquire lock: error -243

So sanlock is doing a good job here, preventing the attempt to start the same VM twice, and that thread shuts down the QEMU instance it spawned:

- 2013-10-24 00:18:13.448+0000: 2389: debug : qemuProcessStop:4244 : Shutting down VM 'HostedEngine' pid=14723 flags=0

So I think we have a pretty clear race condition here. The first thread, 2387, which successfully starts QEMU, is racing with the other 3 attempts to start QEMU, and those other 3 threads never see that there is already a VM called "HostedEngine" with UUID ac687a24-05cd-43fb-9f9d-c53169fa8cd0 being started. The cleanup code from these 3 failed attempts deletes the monitor socket of the QEMU that actually started successfully, and also causes libvirt to lose all trace of this running VM. Clearly this is fubar in libvirt somewhere; fortunately sanlock is working as designed and protecting us.
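The check-then-act window behind this race can be sketched abstractly. The following is a toy model, not libvirt code (all names invented), of a domain list where the duplicate check and the later state changes are not atomic, replayed in the interleaving the logs suggest: a failed transient start removes the shared object, orphaning the reference held by the thread whose start later succeeds.

```python
class DomainList:
    """Toy model of libvirtd's in-memory domain list (invented API)."""
    def __init__(self):
        self.doms = {}                        # name -> domain object

    def lookup_or_add(self, name):
        # virDomainObjIsDuplicate-style check: a shutoff object is reused,
        # regardless of whether it is a persistent definition or a transient
        # object another thread is mid-way through starting. That is the bug.
        if name in self.doms and self.doms[name]["state"] == "shutoff":
            return self.doms[name]
        dom = {"name": name, "state": "shutoff", "persistent": False}
        self.doms[name] = dom
        return dom

    def remove(self, name):
        # A failed start of a transient guest removes the object entirely.
        del self.doms[name]

# Replaying the interleaving deterministically:
dlist = DomainList()
dom_a = dlist.lookup_or_add("HostedEngine")   # first start attempt
dom_b = dlist.lookup_or_add("HostedEngine")   # second attempt, sees shutoff obj
assert dom_a is dom_b                         # both threads share one object
dlist.remove("HostedEngine")                  # first attempt fails -> removed
dom_c = dlist.lookup_or_add("HostedEngine")   # third attempt gets a NEW object
assert dom_c is not dom_b
dom_b["state"] = "running"                    # second attempt starts qemu...
assert dlist.doms["HostedEngine"] is dom_c    # ...but libvirtd only tracks dom_c
print("orphaned running domain:", dom_b["state"])
```

The final assertions capture the outcome seen on purple-vds2: a qemu started under an object (dom_b here) that is no longer in the list, so libvirtd reports nothing.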
I've not looked at the second machine 'green-vdsc', but I'm betting the same thing is true of libvirt on that host.

BTW, while we are debugging this please do *NOT* reboot or change anything running on 'green-vdsc' and 'purple-vds2' - these are the only machines showing the problem, so their state is critical to diagnosing it.

Created attachment 817966 [details]
Script to analyse libvirtd.log for race conditions
This script is intended to be run against libvirtd.log files and will look at all domain startup attempts and report on race conditions detected. It is hardcoded to examine the guest "HostedEngine".
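The attached script is not reproduced here, but its core idea can be sketched: scan a libvirtd.log for per-thread start markers (virDomainCreateXML) and completion/abort markers, and flag any start attempt that begins while another is still in flight. This is a sketch with the log line shapes taken from the excerpts quoted in this bug, and like the attachment it assumes every create in the log is for the same guest.

```python
import re

START = re.compile(r'^(\S+ \S+): (\d+): debug : virDomainCreateXML:')
DONE  = re.compile(r'^(\S+ \S+): (\d+): debug : qemuProcessStart:\d+ : Writing domain status')
ABORT = re.compile(r'^(\S+ \S+): (\d+): debug : qemuProcessStop:')

def find_overlapping_starts(lines):
    """Return (timestamp, thread, other-in-flight-starts) for every start
    attempt that overlaps an earlier, still-unfinished one."""
    in_flight = {}                   # thread id -> start timestamp
    races = []
    for line in lines:
        m = START.match(line)
        if m:
            ts, tid = m.groups()
            if in_flight:            # another start is still running: race
                races.append((ts, tid, dict(in_flight)))
            in_flight[tid] = ts
            continue
        m = DONE.match(line) or ABORT.match(line)
        if m:
            in_flight.pop(m.group(2), None)
    return races

# Synthetic excerpt modeled on the purple-vds2 log lines quoted above:
log = [
    "2013-10-24 00:17:31.857+0000: 2387: debug : virDomainCreateXML:1937 : conn=0x7ff6b4001160, ...",
    "2013-10-24 00:17:44.199+0000: 2389: debug : virDomainCreateXML:1937 : conn=0x7ff6b4001160, ...",
    "2013-10-24 00:18:13.448+0000: 2389: debug : qemuProcessStop:4244 : Shutting down VM 'HostedEngine'",
    "2013-10-24 00:18:19.401+0000: 2387: debug : qemuProcessStart:4093 : Writing domain status to disk",
]
races = find_overlapping_starts(log)
print(len(races))  # -> 1
```

On the real libvirtd.log.13.xz this approach would flag the 3 extra create attempts that landed between thread 2387's create and its completion.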
Created attachment 817968 [details]
Analysis of libvirtd.log.13.xz on machine purple
This is the output of the race condition analysis on purple - only libvirtd.log.13.xz showed signs of a race.
The problem scenario starts at timestamp 2013-10-24 00:16:30.558+0000 when HostedEngine is requested to boot. This is handled by thread 2388.
Before this boot attempt is completed, RHEV makes a second request at time 2013-10-24 00:16:42.842+0000, handled by thread 2387. This thread somehow gets access to the same virDomainObjPtr as thread 2388, and then waits on a lock held by thread 2388.
When thread 2388's start attempt fails, because it is a transient guest, it removes the virDomainObjPtr from the list of VMs. Thread 2387 still has a reference to this orphaned domain object instance, and now begins to start it too.
Before this boot attempt completes, RHEV makes a third request to start the same guest at time 2013-10-24 00:16:55.100+0000, handled by thread 2389 and a fourth request handled by 2388. The fourth request is quickly rejected.
The third request, however, starts the bootup process and has a completely separate virDomainObjPtr, since the instance owned by thread 2387 was orphaned from the list of domains.
So now we have 2 threads starting the same named VM in parallel and libvirtd only remembers one of them.
This race sequence occurs several more times. Nearly all the boot attempts on host purple are being aborted as sanlock reports the lock is still held.
At 2013-10-24 00:18:19.332+0000 there is however a successful start attempt, presumably because the sanlock lease became free. Unfortunately this thread had one of the orphaned virDomainObjPtr instances, so the QEMU is now running, but unknown to libvirtd.
There are at minimum 2 bugs here:
- RHEV is making multiple parallel requests to start the same named VM
- libvirt is not correctly handling mutual exclusion when concurrent start requests are made.
I believe the flaw is in the virDomainObjIsDuplicate() method. When booting a transient VM, it is permissible that the VM with matching name/uuid pre-exist, provided it is shutoff and persistent. We are only checking whether it is shutoff though, not persistent. This would be ok, provided we set dom->def->id to indicate it is running, before it is unlocked again.
Unfortunately qemuDomainObjBeginJobWithDriver will unlock & relock virDomainObjPtr, so there is a tiny window where the 2nd request may see a shutoff virDomainObjPtr in the domain list and think that is valid.
When starting transient VMs, virDomainObjIsDuplicate needs to check whether the virDomainObjPtr is persistent or not.
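The proposed fix amounts to refusing to reuse an existing object unless it is a persistent, shut-off definition. A sketch in Python rather than the actual C of virDomainObjIsDuplicate (field and error names paraphrased, not libvirt's):

```python
def check_duplicate(doms, name, uuid):
    """Sketch of the corrected duplicate check for a transient-VM start.
    Returns the existing domain object only when it is safe to reuse:
    a shut-off *persistent* definition. Paraphrased logic, not libvirt C."""
    dom = doms.get(name)
    if dom is None:
        return None                    # no clash; caller adds a fresh object
    if dom["uuid"] != uuid:
        raise RuntimeError("domain name already in use by a different VM")
    if dom["state"] != "shutoff":
        raise RuntimeError("domain is already active")
    if not dom["persistent"]:
        # A shut-off *transient* object can only mean another start attempt
        # is mid-flight (its id is not yet set): refuse instead of racing.
        raise RuntimeError("domain is already being started")
    return dom

# The racing case from this bug: a transient object left briefly unlocked
# in the shutoff state by a concurrent start attempt is now rejected.
doms = {"HostedEngine": {"uuid": "ac687a24", "state": "shutoff", "persistent": False}}
try:
    check_duplicate(doms, "HostedEngine", "ac687a24")
except RuntimeError as e:
    print(e)  # -> domain is already being started
```

The persistent-definition case (a defined, shut-off guest being started transiently) still succeeds, which is the behavior the original check was trying to allow.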
Fix for the race posted upstream: https://www.redhat.com/archives/libvir-list/2013-October/msg01348.html

This will need rewriting for 0.10.2 trees, since the code has changed; the check will need to go in virDomainObjIsDuplicate instead. Also, 0.10.2 does not exhibit the libvirtd crash scenario, only the orphan VM scenario.

Moving to POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2013-November/msg00486.html

1. Reproduce this bug with libvirt-0.10.2-29.el6.x86_64. Following the steps in comment 27, create/destroy a VM from 3 remote clients; after a while, an orphaned qemu process appears:

# virsh list --all
 Id    Name                           State
--------------------------------------------------

# ps aux | grep qemu
qemu 24031 25.6 0.0 1283008 20772 ? Sl 15:29 0:07 /usr/libexec/qemu-kvm -name rhel6 -S -M rhel6.5.0 -enable-kvm -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid d78dce96-23e3-b6ad-d40a-34ca8e0f5e0d -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel6.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -netdev tap,fd=28,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:d6:77:5f,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
root 25847 0.0 0.0 103252 852 pts/4 S+ 15:29 0:00 grep qemu

2. Verify this bug with libvirt-0.10.2-32.el6.x86_64: no more orphaned qemu processes appear.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1374.html
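The verification procedure (concurrent create/destroy of the same guest from several clients) can be driven by a small harness. This is a sketch under stated assumptions: comment 27's exact steps are not quoted in this report, the domain XML path and guest name are invented, and the real test used 3 remote virsh clients rather than local threads. The command runner is injectable so the loop logic can be exercised without virsh present.

```python
import subprocess
import threading

DOMXML = "/tmp/rhel6.xml"       # hypothetical transient-domain definition
GUEST = "rhel6"                 # guest name from the reproduction above

def start_destroy_loop(runner, iterations=50):
    """One 'client': repeatedly try to create and destroy the same transient
    guest. Failures are expected and ignored; the point is the concurrency."""
    for _ in range(iterations):
        runner(["virsh", "create", DOMXML])
        runner(["virsh", "destroy", GUEST])

def stress(n_clients=3, runner=None, iterations=50):
    """Run n_clients concurrent create/destroy loops. With the default runner
    this shells out to virsh; pass a fake runner to dry-run the harness."""
    runner = runner or subprocess.call
    threads = [threading.Thread(target=start_destroy_loop, args=(runner, iterations))
               for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

After a run against the unfixed libvirt, `virsh list --all` showing no domains while `ps aux | grep qemu` still shows a qemu-kvm process is the orphan signature described above; with the fix, no orphans should remain.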