Bug 1134154
Summary: | snapshot's race condition | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Jincheng Miao <jmiao> | |
Component: | libvirt | Assignee: | Peter Krempa <pkrempa> | |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | |
Severity: | medium | Docs Contact: | ||
Priority: | medium | |||
Version: | 7.1 | CC: | dyuan, eblake, jdenemar, mzhan, pkrempa, rbalakri, shyu, yanyang | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | libvirt-1.2.8-2.el7 | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1143900 (view as bug list) | Environment: | ||
Last Closed: | 2015-03-05 07:43:10 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1079162 | |||
Bug Blocks: | 1143900 |
Description
Jincheng Miao
2014-08-27 01:52:02 UTC
Looks like this might fix it https://www.redhat.com/archives/libvir-list/2014-August/msg01293.html Fixed upstream: commit 03b994fac0434c08cca13f51c3980c1b1d218047 Author: Jincheng Miao <jmiao> Date: Wed Aug 27 10:10:33 2014 +0800 qemu: call endjob in RevertToSnapshot In qemuDomainRevertToSnapshot(), it will check snap->def->state. But when the state is PMSUSPENDED/NOSTATE/BLOCKED, it forgets to call qemuDomainObjEndJob. https://bugzilla.redhat.com/show_bug.cgi?id=1134154 Bug introduced in commit 1e833899. $ git describe 03b994fac0434c08cca13f51c3980c1b1d218047 v1.2.7-241-g03b994f That was only a partial fix; according to the mail thread, there is a missing lock where concurrent snapshot operations can cause data corruption, which still needs further patching. Hi Eric, I met libvirtd crash when test with latest libvirt.git ## git describe v1.2.7-241-g03b994f Step is as commet 0 Thread 5: ==9932== Invalid read of size 8 ==9932== at 0x210191A5: qemuDomainRevertToSnapshot (qemu_driver.c:14094) ==9932== by 0x5195370: virDomainRevertToSnapshot (libvirt.c:19324) ==9932== by 0x133CEC: remoteDispatchDomainRevertToSnapshotHelper (remote_dispatch.h:7121) ==9932== by 0x51DE391: virNetServerProgramDispatch (virnetserverprogram.c:437) ==9932== by 0x15151C: virNetServerHandleJob (virnetserver.c:172) ==9932== by 0x50DAB14: virThreadPoolWorker (virthreadpool.c:145) ==9932== by 0x50DA4AD: virThreadHelper (virthread.c:197) ==9932== by 0x831DF32: start_thread (pthread_create.c:309) ==9932== by 0x8A2FDEC: clone (clone.S:111) ==9932== Address 0x24075130 is 0 bytes inside a block of size 40 free'd ==9932== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==9932== by 0x5083B89: virFree (viralloc.c:582) ==9932== by 0x511A1FD: virDomainSnapshotObjListDataFree (snapshot_conf.c:753) ==9932== by 0x50A86E7: virHashRemoveEntry (virhash.c:475) ==9932== by 0x20FB7CD9: qemuDomainSnapshotDiscard (qemu_domain.c:2096) ==9932== by 0x21018D29: qemuDomainSnapshotDelete (qemu_driver.c:14474) ==9932== by 0x51955A0: virDomainSnapshotDelete (libvirt.c:19386) ==9932== by 0x133B3C: remoteDispatchDomainSnapshotDeleteHelper (remote_dispatch.h:8700) ==9932== by 0x51DE391: virNetServerProgramDispatch (virnetserverprogram.c:437) ==9932== by 0x15151C: virNetServerHandleJob (virnetserver.c:172) ==9932== by 0x50DAB14: virThreadPoolWorker (virthreadpool.c:145) ==9932== by 0x50DA4AD: virThreadHelper (virthread.c:197) ==9932== ==9932== Invalid read of size 4 ==9932== at 0x210191A8: qemuDomainRevertToSnapshot (qemu_driver.c:14094) ==9932== by 0x5195370: virDomainRevertToSnapshot (libvirt.c:19324) ==9932== by 0x133CEC: remoteDispatchDomainRevertToSnapshotHelper (remote_dispatch.h:7121) ==9932== by 0x51DE391: virNetServerProgramDispatch (virnetserverprogram.c:437) ==9932== by 0x15151C: virNetServerHandleJob (virnetserver.c:172) ==9932== by 0x50DAB14: virThreadPoolWorker (virthreadpool.c:145) ==9932== by 0x50DA4AD: virThreadHelper (virthread.c:197) ==9932== by 0x831DF32: start_thread (pthread_create.c:309) ==9932== by 0x8A2FDEC: clone (clone.S:111) ==9932== Address 0x23fe3e60 is 32 bytes inside a block of size 80 free'd ==9932== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==9932== by 0x5083B89: virFree (viralloc.c:582) ==9932== by 0x511A17C: virDomainSnapshotDefFree (snapshot_conf.c:105) ==9932== by 0x511A1F3: virDomainSnapshotObjListDataFree (snapshot_conf.c:752) ==9932== by 0x50A86E7: virHashRemoveEntry (virhash.c:475) ==9932== by 0x20FB7CD9: qemuDomainSnapshotDiscard (qemu_domain.c:2096) ==9932== by 0x21018D29: qemuDomainSnapshotDelete (qemu_driver.c:14474) ==9932== by 0x51955A0: virDomainSnapshotDelete (libvirt.c:19386) ==9932== by 0x133B3C: remoteDispatchDomainSnapshotDeleteHelper (remote_dispatch.h:8700) ==9932== by 0x51DE391: virNetServerProgramDispatch (virnetserverprogram.c:437) ==9932== by 0x15151C: virNetServerHandleJob (virnetserver.c:172) ==9932== by 0x50DAB14: virThreadPoolWorker (virthreadpool.c:145) ==9932== ==9932== Invalid read of size 8 ==9932== at 0x2101936F: qemuDomainRevertToSnapshot (qemu_driver.c:14152) ==9932== by 0x5195370: virDomainRevertToSnapshot (libvirt.c:19324) ==9932== by 0x133CEC: remoteDispatchDomainRevertToSnapshotHelper (remote_dispatch.h:7121) ==9932== by 0x51DE391: virNetServerProgramDispatch (virnetserverprogram.c:437) ==9932== by 0x15151C: virNetServerHandleJob (virnetserver.c:172) ==9932== by 0x50DAB14: virThreadPoolWorker (virthreadpool.c:145) ==9932== by 0x50DA4AD: virThreadHelper (virthread.c:197) ==9932== by 0x831DF32: start_thread (pthread_create.c:309) ==9932== by 0x8A2FDEC: clone (clone.S:111) ==9932== Address 0x24075130 is 0 bytes inside a block of size 40 free'd ==9932== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==9932== by 0x5083B89: virFree (viralloc.c:582) ==9932== by 0x511A1FD: virDomainSnapshotObjListDataFree (snapshot_conf.c:753) ==9932== by 0x50A86E7: virHashRemoveEntry (virhash.c:475) ==9932== by 0x20FB7CD9: qemuDomainSnapshotDiscard (qemu_domain.c:2096) ==9932== by 0x21018D29: qemuDomainSnapshotDelete (qemu_driver.c:14474) ==9932== by 0x51955A0: virDomainSnapshotDelete (libvirt.c:19386) ==9932== by 0x133B3C: remoteDispatchDomainSnapshotDeleteHelper (remote_dispatch.h:8700) ==9932== by 0x51DE391: virNetServerProgramDispatch (virnetserverprogram.c:437) ==9932== by 0x15151C: virNetServerHandleJob (virnetserver.c:172) ==9932== by 0x50DAB14: virThreadPoolWorker (virthreadpool.c:145) ==9932== by 0x50DA4AD: virThreadHelper (virthread.c:197) ==9932== ==9932== Invalid read of size 8 ==9932== at 0x21019379: qemuDomainRevertToSnapshot (qemu_driver.c:14152) ==9932== by 0x5195370: virDomainRevertToSnapshot (libvirt.c:19324) ==9932== by 0x133CEC: remoteDispatchDomainRevertToSnapshotHelper (remote_dispatch.h:7121) ==9932== by 0x51DE391: virNetServerProgramDispatch (virnetserverprogram.c:437) ==9932== by 0x15151C: virNetServerHandleJob (virnetserver.c:172) ==9932== by 0x50DAB14: virThreadPoolWorker (virthreadpool.c:145) ==9932== by 0x50DA4AD: virThreadHelper (virthread.c:197) ==9932== by 0x831DF32: start_thread (pthread_create.c:309) ==9932== by 0x8A2FDEC: clone (clone.S:111) ==9932== Address 0x23fe3e40 is 0 bytes inside a block of size 80 free'd ==9932== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==9932== by 0x5083B89: virFree (viralloc.c:582) ==9932== by 0x511A17C: virDomainSnapshotDefFree (snapshot_conf.c:105) ==9932== by 0x511A1F3: virDomainSnapshotObjListDataFree (snapshot_conf.c:752) ==9932== by 0x50A86E7: virHashRemoveEntry (virhash.c:475) ==9932== by 0x20FB7CD9: qemuDomainSnapshotDiscard (qemu_domain.c:2096) ==9932== by 0x21018D29: qemuDomainSnapshotDelete (qemu_driver.c:14474) ==9932== by 0x51955A0: virDomainSnapshotDelete (libvirt.c:19386) ==9932== by 0x133B3C: remoteDispatchDomainSnapshotDeleteHelper (remote_dispatch.h:8700) ==9932== by 0x51DE391: virNetServerProgramDispatch (virnetserverprogram.c:437) ==9932== by 0x15151C: virNetServerHandleJob (virnetserver.c:172) ==9932== by 0x50DAB14: virThreadPoolWorker (virthreadpool.c:145) ==9932== ==9932== Invalid read of size 1 ==9932== at 0x20FDE9D8: qemuMonitorEscapeArg (qemu_monitor.c:156) ==9932== by 0x20FEDE4E: qemuMonitorTextLoadSnapshot (qemu_monitor_text.c:2761) ==9932== by 0x21019380: qemuDomainRevertToSnapshot (qemu_driver.c:14152) ==9932== by 0x5195370: virDomainRevertToSnapshot (libvirt.c:19324) ==9932== by 0x133CEC: remoteDispatchDomainRevertToSnapshotHelper (remote_dispatch.h:7121) ==9932== by 0x51DE391: virNetServerProgramDispatch (virnetserverprogram.c:437) ==9932== by 0x15151C: virNetServerHandleJob (virnetserver.c:172) ==9932== by 0x50DAB14: virThreadPoolWorker (virthreadpool.c:145) ==9932== by 0x50DA4AD: virThreadHelper (virthread.c:197) ==9932== by 0x831DF32: start_thread (pthread_create.c:309) ==9932== by 0x8A2FDEC: clone (clone.S:111) ==9932== Address 0x0 is not stack'd, malloc'd or (recently) free'd ==9932== ==9932== ==9932== Process terminating with default action of signal 11 (SIGSEGV) ==9932== Access not within mapped region at address 0x0 ==9932== at 0x20FDE9D8: qemuMonitorEscapeArg (qemu_monitor.c:156) ==9932== by 0x20FEDE4E: qemuMonitorTextLoadSnapshot (qemu_monitor_text.c:2761) ==9932== by 0x21019380: qemuDomainRevertToSnapshot (qemu_driver.c:14152) ==9932== by 0x5195370: virDomainRevertToSnapshot (libvirt.c:19324) ==9932== by 0x133CEC: remoteDispatchDomainRevertToSnapshotHelper (remote_dispatch.h:7121) ==9932== by 0x51DE391: virNetServerProgramDispatch (virnetserverprogram.c:437) ==9932== by 0x15151C: virNetServerHandleJob (virnetserver.c:172) ==9932== by 0x50DAB14: virThreadPoolWorker (virthreadpool.c:145) ==9932== by 0x50DA4AD: virThreadHelper (virthread.c:197) ==9932== by 0x831DF32: start_thread (pthread_create.c:309) ==9932== by 0x8A2FDEC: clone (clone.S:111) ==9932== If you believe this happened as a result of a stack ==9932== overflow in your program's main thread (unlikely but ==9932== possible), you can try to increase the size of the ==9932== main thread stack using the --main-stacksize= flag. ==9932== The main thread stack size used in this run was 8388608. ==9932== ==9932== HEAP SUMMARY: ==9932== in use at exit: 3,205,933 bytes in 24,542 blocks ==9932== total heap usage: 189,650 allocs, 165,108 frees, 691,331,922 bytes allocated ==9932== Fixed upstream by commits: commit a4065dc3e7b17a5fa6916fca2fb28a0d723fe13d Author: Jincheng Miao <jmiao> Date: Thu Aug 28 19:27:04 2014 +0800 qemu: snapshot: Acquire job earlier on snapshot revert/delete The code would lookup the snapshot object before acquiring the job. This could lead to a crash as one thread could delete the snapshot object, while a second thread already had the reference. Signed-off-by: Jincheng Miao <jmiao> Signed-off-by: Peter Krempa <pkrempa> commit b3d2a42e80aaee1f322fc9beb98b6ed541574ab3 Author: Peter Krempa <pkrempa> Date: Wed Sep 3 15:17:50 2014 +0200 qemu: snapshot: Fix job handling when creating snapshots Creating snapshots modifies the domain state. Currently we wouldn't enter the job for certain operations although they would modify the state. Refactor job handling so that everything is covered by an async job. v1.2.8-49-ga4065dc Try to verify this bug with libvirt-1.2.8-2.el7 Steps: 1) Prepare a running guest with OS installed 2) Create internal snapshot for guest 3) Open three terminals, run three cmd at almost same time terminal 1 #while true;do virsh snapshot-revert rh7-g s;done terminal 2 #while true;do virsh snapshot-delete rh7-g s;done terminal 3 # while true;do virsh snapshot-create-as rh7-g s;done Result: 1) Libvirtd will not crash 2) No error like "Invalid target domain state 'nostate'" appear, instead of error "error: Timed out during operation: cannot acquire state change lock" 3) grep error /tmp/libvirtd.log 2014-09-18 05:47:57.000+0000: 4585: error : qemuDomainObjBeginJobInternal:1327 : Timed out during operation: cannot acquire state change lock 4) After long time step 3), finish the three cmds and try to create another snapshot will succeed I have a problem, try to do "virsh save rh7-g /tmp/rh7-g.save" when step 3) is in process will fail with same error "can't acquire state change lock". Moreover, when there is a snapshot-create job, destroy guest will take long time # time virsh destroy rh7-g Domain rh7-g destroyed real 0m10.288s user 0m0.012s sys 0m0.007s Is this expected ? or a little over-protected? If it is not a problem, I will change this bug to VERIFIED status. Thanks Besides, Should I clone this one to RHEL6.6 ? Since revert/delete/create same snapshot at same time will easily crash libvirtd. (In reply to Shanzhi Yu from comment #9) > I have a problem, try to do "virsh save rh7-g /tmp/rh7-g.save" when step 3) > is in process will fail with same error "can't acquire state change lock". > Moreover, when there is a snapshot-create job, destroy guest will take long > time > > # time virsh destroy rh7-g > Domain rh7-g destroyed > > real 0m10.288s > user 0m0.012s > sys 0m0.007s > > Is this expected ? or a little over-protected? If it is not a problem, I > will change this bug to VERIFIED status. Internal snapshot apparently makes qemu get stuck. We re-try killing (sometimes with more force) until it finishes. If the destroy was successful after a while it should be okay. (In reply to Shanzhi Yu from comment #10) > Besides, Should I clone this one to RHEL6.6 ? > Since revert/delete/create same snapshot at same time will easily crash > libvirtd. It's too late for 6.6. 6.7 might be better if you are going to clone it. Change to VERIFIED status according comment 10, and clone new onw to RHEL6 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0323.html |