Bug 1032370
| Summary: | Snapshots on GlusterFS w/ libgfapi enabled | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Shanzhi Yu <shyu> |
| Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.0 | CC: | acathrow, ajia, bili, bsarathy, chhu, dallan, deepakcs, dyuan, eblake, eharney, fdeutsch, howey.vernon, iheim, josh, juzhang, lyarwood, mzhan, ndipanov, pkrempa, rbryant, rcyriac, sbonazzo, shyu, yeylon |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-1.1.1-25.el7 | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1017289 | Environment: | |
| Last Closed: | 2014-06-13 12:16:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1022961 | | |
Comment 2
Peter Krempa
2013-11-22 14:10:00 UTC
Test for patch: qemu: Avoid crash in qemuDiskGetActualType

1. Partly reproduced with libvirt-1.0.6-1.el7.x86_64 and qemu-kvm-1.5.0-2.el7.x86_64. Hit "error: Unable to read from monitor: Connection reset by peer", but libvirtd is still running.

# virsh define test1.xml
Domain test defined from test1.xml

# virsh dumpxml test | grep disk -A 5
<disk type='volume' device='cdrom'>
  <driver name='qemu' type='raw'/>
  <target dev='hdc' bus='ide'/>
  <readonly/>
  <address type='drive' controller='0' bus='1' target='0' unit='0'/>
</disk>
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/test.img'/>
  <target dev='hdc' bus='ide'/>
  <address type='drive' controller='0' bus='0' target='0' unit='2'/>
</disk>

# virsh start test
error: Failed to start domain test
error: Unable to read from monitor: Connection reset by peer

# service libvirtd status
Redirecting to /bin/systemctl status libvirtd.service
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Fri 2014-03-21 11:43:02 CST; 2min 51s ago
......

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     test                           shut off

# service libvirtd stop
Redirecting to /bin/systemctl stop libvirtd.service
# service libvirtd start
Redirecting to /bin/systemctl start libvirtd.service

# virsh start test
error: Failed to start domain test
error: internal error process exited while connecting to monitor: qemu-kvm: -device ide-hd,bus=ide.0,unit=2,drive=drive-ide0-0-2,id=ide0-0-2,bootindex=1: Can't create IDE unit 2, bus supports only 2 units
qemu-kvm: -device ide-hd,bus=ide.0,unit=2,drive=drive-ide0-0-2,id=ide0-0-2,bootindex=1: Device initialization failed.
qemu-kvm: -device ide-hd,bus=ide.0,unit=2,drive=drive-ide0-0-2,id=ide0-0-2,bootindex=1: Device 'ide-hd' could not be initialized

2. Tested on libvirt-1.1.1-28.el7.x86_64 and qemu-kvm-1.5.3-53.el7.x86_64: got the qemu-kvm error message.
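The qemu-kvm rejection above is expected from the test XML: both disks claim target dev='hdc', and the second one requests IDE unit='2', while an IDE bus only exposes units 0 and 1. As an illustration only (this helper is hypothetical, not libvirt or qemu code), a sketch of the addressing checks that end up failing:

```python
# Hypothetical validator sketch: flags duplicate <target dev=...> names and
# IDE <address> units outside the 0..1 range, the two problems in the test XML.
import xml.etree.ElementTree as ET

def disk_address_conflicts(domain_xml: str) -> list:
    problems = []
    seen_targets = set()
    for disk in ET.fromstring(domain_xml).iter('disk'):
        target = disk.find('target')
        addr = disk.find('address')
        dev = target.get('dev') if target is not None else None
        if dev in seen_targets:
            problems.append(f"duplicate target dev '{dev}'")
        if dev:
            seen_targets.add(dev)
        if (target is not None and target.get('bus') == 'ide'
                and addr is not None and int(addr.get('unit', '0')) > 1):
            problems.append(f"disk '{dev}': IDE unit {addr.get('unit')} invalid, "
                            "bus supports only units 0 and 1")
    return problems

# The disk definitions from test1.xml above, trimmed to the relevant elements.
domain = """<domain>
  <disk type='volume' device='cdrom'>
    <target dev='hdc' bus='ide'/>
    <address type='drive' controller='0' bus='1' target='0' unit='0'/>
  </disk>
  <disk type='file' device='disk'>
    <target dev='hdc' bus='ide'/>
    <address type='drive' controller='0' bus='0' target='0' unit='2'/>
  </disk>
</domain>"""
print(disk_address_conflicts(domain))
```

Run against the XML above, it reports both the duplicated hdc target and the out-of-range IDE unit, matching the two complaints qemu-kvm prints.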
# virsh define test1.xml
Domain test defined from test1.xml

# virsh dumpxml test | grep disk -A 5
<disk type='volume' device='cdrom'>
  <driver name='qemu' type='raw'/>
  <target dev='hdc' bus='ide'/>
  <readonly/>
  <address type='drive' controller='0' bus='1' target='0' unit='0'/>
</disk>
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/test.img'/>
  <target dev='hdc' bus='ide'/>
  <address type='drive' controller='0' bus='0' target='0' unit='2'/>
</disk>

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     test                           shut off

# virsh start test
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: qemu-kvm: -device ide-hd,bus=ide.0,unit=2,drive=drive-ide0-0-2,id=ide0-0-2,bootindex=1: Can't create IDE unit 2, bus supports only 2 units
qemu-kvm: -device ide-hd,bus=ide.0,unit=2,drive=drive-ide0-0-2,id=ide0-0-2,bootindex=1: Device initialization failed.
qemu-kvm: -device ide-hd,bus=ide.0,unit=2,drive=drive-ide0-0-2,id=ide0-0-2,bootindex=1: Device 'ide-hd' could not be initialized

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     test                           shut off

# service libvirtd status
Redirecting to /bin/systemctl status libvirtd.service
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Fri 2014-03-21 10:30:49 CST; 1h 19min ago
 Main PID: 14340 (libvirtd)
......
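The point being verified in both runs is that a dead qemu monitor socket becomes an ordinary per-domain error rather than a libvirtd crash. A rough stand-in model with plain sockets (the names below are mine; this is not libvirt's monitor implementation):

```python
# Toy model of the crash-safety requirement: the "qemu" end of a monitor
# socket dies mid-command, and the "libvirtd" end must surface a normal
# error and keep running.
import socket

class MonitorEOFError(Exception):
    """Raised when the monitor peer has gone away."""

def monitor_command(sock: socket.socket, cmd: bytes) -> bytes:
    try:
        sock.sendall(cmd)
        reply = sock.recv(4096)
    except OSError as exc:          # e.g. BrokenPipeError / ECONNRESET
        raise MonitorEOFError(f"Unable to read from monitor: {exc}") from exc
    if reply == b"":                # orderly close: the peer has exited
        raise MonitorEOFError("Unable to read from monitor: connection closed")
    return reply

libvirtd_end, qemu_end = socket.socketpair()
qemu_end.close()                    # simulate qemu-kvm being killed

try:
    monitor_command(libvirtd_end, b'{"execute":"blockdev-snapshot-sync"}')
    outcome = "unexpected success"
except MonitorEOFError as err:
    outcome = f"handled: {err}"     # the daemon would log this and carry on

print(outcome)
```

The dead peer surfaces as a caught MonitorEOFError, mirroring the "Unable to read from monitor" message in the transcripts while the process stays alive.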
# service libvirtd stop
Redirecting to /bin/systemctl stop libvirtd.service
# service libvirtd start
Redirecting to /bin/systemctl start libvirtd.service

# virsh start test
error: Failed to start domain test
error: internal error: process exited while connecting to monitor: qemu-kvm: -device ide-hd,bus=ide.0,unit=2,drive=drive-ide0-0-2,id=ide0-0-2,bootindex=1: Can't create IDE unit 2, bus supports only 2 units
qemu-kvm: -device ide-hd,bus=ide.0,unit=2,drive=drive-ide0-0-2,id=ide0-0-2,bootindex=1: Device initialization failed.
qemu-kvm: -device ide-hd,bus=ide.0,unit=2,drive=drive-ide0-0-2,id=ide0-0-2,bootindex=1: Device 'ide-hd' could not be initialized

Test results: no libvirtd core dump.

Test for patch: qemu: snapshot: Avoid libvirtd crash when qemu crashes while snapshotting

May be related to this qemu-kvm bug (happened only once):
Bug 959102 - core dump happens when quitting qemu via monitor

Tried the attempts below, but haven't reproduced the libvirtd crash with packages:
libvirt-1.0.4-1.1.el7.x86_64
qemu-kvm-1.4.0-3.el7.x86_64

1.1 Start a guest.
1.2 In the guest: echo "c" > /proc/sysrq-trigger
1.3 Do an external snapshot:
    # virsh snapshot-create-as r7g-qcow2 snap1 --disk-only --diskspec vda,file=/tmp/snap1
    Or kill the qemu-kvm and at the same time run 1.3.

2.1 Inside the guest:
    # echo 1 > /proc/sys/kernel/panic_on_io_nmi
    # echo 1 > /proc/sys/kernel/unknown_nmi_panic
    # echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
    # echo "c" > /proc/sysrq-trigger
2.2
    # virsh snapshot-create-as r7g-qcow2 snap1 --disk-only --diskspec vda,file=/tmp/snap1
    Or kill the qemu-kvm and at the same time run 2.2.

(In reply to chhu from comment #10)
> Test for patch: qemu: snapshot: Avoid libvirtd crash when qemu crashes while
> snapshotting
>
> May related to this qemu-kvm bug: happend only once
> Bug 959102 - core dump happens when quitting qemu via monitor
>
> Did these attempts below, haven't reproduced the libvirtd crash
> with packages:
> libvirt-1.0.4-1.1.el7.x86_64
> qemu-kvm-1.4.0-3.el7.x86_64

Or with packages:
libvirt-1.1.1-28.el7.x86_64
qemu-kvm-1.5.3-53.el7.x86_64

> 1.1 start a guest
> 1.2. In the guest: echo "c" > /proc/sysrq-trigger
> 1.3. do external snapshot
> #virsh snapshot-create-as r7g-qcow2 snap1 --disk-only --diskspec
> vda,file=/tmp/snap1
> Or kill the qemu-kvm, then run 1.3
>
> 2.1 Inside guest:
> #echo 1 >/proc/sys/kernel/panic_on_io_nmi
> #echo 1 >/proc/sys/kernel/unknown_nmi_panic
> #echo 1 >/proc/sys/kernel/panic_on_unrecovered_nmi
> #echo "c" > /proc/sysrq-trigger
> 2.2. #virsh snapshot-create-as r7g-qcow2 snap1 --disk-only --diskspec
> vda,file=/tmp/snap1
> Or kill the qemu-kvm, then run 2.2

Hi, Peter,
Do you think the testing for this patch is enough? Or do you have some advice for reproducing the libvirtd crash described in this patch? Thank you!

Thanks for pkrempa's help.

Test for patch: qemu: snapshot: Avoid libvirtd crash when qemu crashes while snapshotting:

Test results:
Reproduced the libvirt error "error: Unable to read from monitor: Connection reset by peer", with no libvirtd core dump.

Reproduced with packages:
libvirt-1.0.4-1.1.el7.x86_64
qemu-kvm-1.4.0-3.el7.x86_64

And:
libvirt-1.1.1-28.el7.x86_64
qemu-kvm-1.5.3-55.el7.x86_64

Test steps:
1. Open a terminal A and start a guest.
# virsh create r7g-qcow2.xml
Domain r7g-qcow2 created from r7g-qcow2.xml

# virsh dumpxml r7g-qcow2 | grep disk -A 7
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/var/lib/libvirt/images/r7g-qcow2.img'/>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>

# ps -ef | grep qemu-kvm
qemu      5864     1 73 15:53 ?        00:00:45 /usr/libexec/qemu-kvm -name r7g-qcow2 .......

# kill -SIGSTOP 5864

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 7     r7g-qcow2                      running

2. Open another terminal B and start a snapshot:
# virsh snapshot-create-as r7g-qcow2 snap1 --disk-only --diskspec vda,file=/tmp/snap1

3. Kill the qemu-kvm from terminal A:
# kill -9 5864

4. Got the error message on terminal B:
# virsh snapshot-create-as r7g-qcow2 snap1 --disk-only --diskspec vda,file=/tmp/snap1
error: Unable to read from monitor: Connection reset by peer

5. Check the libvirtd status on terminal A; it is still running.
# virsh list --all
 Id    Name                           State
----------------------------------------------------

# service libvirtd status
Redirecting to /bin/systemctl status libvirtd.service
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Fri 2014-03-21 14:26:50 CST; 3 days ago
 Main PID: 31128 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           ├─ 1461 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf
           └─31128 /usr/sbin/libvirtd

Did the related tests below for other patches:
case: 320797: [virtual disks] Create an iscsi network disk/lun with chap authentication: PASS
case: 280660: [storage] - support username/password for iSCSI pools: PASS
case: 216539: Dir based storage pool: PASS
case: 216542: Disk based storage pool: PASS
case: 216558: Logical based storage pool: PASS
case: 216559: mpath based storage pool: PASS
case: 216555: iSCSI based storage pool: PASS

(In reply to chhu from comment #13)
> Test for patch: qemu: snapshot: Avoid libvirtd crash when qemu crashes
> while snapshotting:
>
> Test results:
> Reproduce the libvirt error: "error: Unable to read from monitor: Connection
> reset by peer", no libvirtd core dump.

Checked with pkrempa: "if the VM exits the error message above is correct".

Try to include a patch to reproduce the libvirtd crash:
[libvirt] [PATCHv2 1/2] DO NOT APPLY UPSTREAM: Reproducer for disk snapshot crash
http://www.redhat.com/archives/libvir-list/2014-January/msg00904.html

Step 1.
Tested with libvirt-1.1.1-20.el7, with the patch above: starting libvirtd failed.

# service libvirtd start
Starting libvirtd (via systemctl):  Job for libvirtd.service failed. See 'systemctl status libvirtd.service' and 'journalctl -xn' for details.
[FAILED]

# journalctl -xn
-- Logs begin at Thu 2014-03-20 15:41:33 CST, end at Mon 2014-03-24 19:58:55 CST. --
Mar 24 19:55:56 intel-5310-32-1.englab.nay.redhat.com kvm[29151]: 1 guest now active
Mar 24 19:55:56 intel-5310-32-1.englab.nay.redhat.com kernel: SELinux: initialized (dev mqueue, type mqueue),
Mar 24 19:55:56 intel-5310-32-1.englab.nay.redhat.com kernel: SELinux: initialized (dev proc, type proc), uses
Mar 24 19:55:56 intel-5310-32-1.englab.nay.redhat.com kvm[29153]: 0 guests now active
Mar 24 19:55:56 intel-5310-32-1.englab.nay.redhat.com kernel: SELinux: initialized (dev mqueue, type mqueue),
Mar 24 19:55:56 intel-5310-32-1.englab.nay.redhat.com kernel: SELinux: initialized (dev proc, type proc), uses
Mar 24 19:57:25 intel-5310-32-1.englab.nay.redhat.com systemd[1]: libvirtd.service operation timed out. Termin
Mar 24 19:58:55 intel-5310-32-1.englab.nay.redhat.com systemd[1]: libvirtd.service stopping timed out (2). Kil
Mar 24 19:58:55 intel-5310-32-1.englab.nay.redhat.com systemd[1]: Failed to start Virtualization daemon.
-- Subject: Unit libvirtd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libvirtd.service has failed.
--
-- The result is failed.
Mar 24 19:58:55 intel-5310-32-1.englab.nay.redhat.com systemd[1]: Unit libvirtd.service entered failed state.

Step 2. Tested with libvirt-1.1.1-28.el7, with the patch above: libvirtd started successfully, and the snapshot succeeded.
# virsh define r7g-qcow2.xml
Domain r7g-qcow2 defined from r7g-qcow2.xml

# virsh start r7g-qcow2
Domain r7g-qcow2 started

# virsh snapshot-create-as r7g-qcow2 snap1 --disk-only --diskspec vda,file=/tmp/snap1
error: Operation not supported: live disk snapshot not supported with this QEMU binary

# virsh destroy r7g-qcow2
Domain r7g-qcow2 destroyed

# virsh snapshot-create-as r7g-qcow2 snap1 --disk-only --diskspec vda,file=/tmp/snap1
Domain snapshot snap1 created

Hi, pkrempa,
How about the test result in step 1? Do you think the patch is verified?

Step 3. Tested with libvirt-1.1.1-24.el7 and qemu-kvm-rhev-1.5.3-49.el7.x86_64, with the patch above in comment 15: libvirtd started successfully, the snapshot failed, and there was no libvirtd crash.

# virsh create r7g-qcow2.xml
Domain r7g-qcow2 created from r7g-qcow2.xml

# virsh snapshot-create-as r7g-qcow2 snap1 --disk-only --diskspec vda,file=/tmp/snap1
error: internal error: End of file from monitor

# virsh list --all
 Id    Name                           State
----------------------------------------------------

# service libvirtd status
libvirtd.service - LSB: daemon for libvirt virtualization API
   Loaded: loaded (/etc/rc.d/init.d/libvirtd)
   Active: active (running) since Tue 2014-03-25 10:36:50 CST; 2min 3s ago
  Process: 13712 ExecStart=/etc/rc.d/init.d/libvirtd start (code=exited, status=0/SUCCESS)
 Main PID: 13718 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           ├─ 3654 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf
           └─13718 libvirtd --daemon

== Summary of the test results for patch: qemu: snapshot: Avoid libvirtd crash when qemu crashes while snapshotting:

1. Tested on libvirt-1.1.1-24.el7 and libvirt-1.1.1-20.el7, with the extra patch "[libvirt] [PATCHv2 1/2] DO NOT APPLY UPSTREAM: Reproducer for disk snapshot crash" == failed to start libvirtd, or the snapshot failed.
2.
Tested on libvirt-1.1.1-28.el7, with the extra patch "[libvirt] [PATCHv2 1/2] DO NOT APPLY UPSTREAM: Reproducer for disk snapshot crash" == libvirtd started successfully, the snapshot succeeded, and there was no libvirtd crash.

I think the patch in the current version works well.

Ran the related test cases for this bug:
Case: 353064: [storage] - gluster based storage pool --BZ1072141
Case: 353090: [storage] - volume operations in gluster pool --BZ1075299
Case: 354292: [storage] - netfs based storage pool with source format type=glusterfs --BZ1072714
Case: 354297: [storage] - volume operations in netfs pool with source format type=glusterfs --BZ1072653
Case: 353257: [Virtual disks] dompmsuspend --target disk/mem with glusterfs volume with disk type=network
Case: 353258: [Virtual disks] dompmsuspend --target disk/mem with glusterfs volume with disk type=network
Case: 354453: [Storage] gluster pool with symlink volume pointing to a raw/qcow2 image
Case: 318495: [Virtual disks] Define/start/destroy/save/restore/internal snapshot a domain with glusterfs volume with disk type=network --BZ1032370
Case: 318945: [Snapshot] external snapshot with glusterfs volume with disk type=network --BZ1032370, 1031943
Case: 318497: [Virtual disks] Attach/detach a gluster volume disk with type=network to/from a guest --BZ1045107

Hi, pkrempa,
Please let me know if I need to do more testing besides comments 6, 7, 8, 9, 11, 14, 15, 16, 17; if you agree, I'll set the bug status to "verified". Thank you!

(In reply to chhu from comment #18)
> Hi, pkrempa
>
> Please let me know if I need to do more test beside comment
> 6,7,8,9,11,14,15,16,17, if you agree, I'll set the bug status to "verified".
> Thank you!

Given the complexity of the code required for this feature to work, I think that the testing you've done has pretty good coverage (in breadth) of the positive and negative code paths a user might take while trying to use the feature.

I don't have anything else I could recommend testing.
(In reply to Peter Krempa from comment #19)
> (In reply to chhu from comment #18)
> > Hi, pkrempa
> >
> > Please let me know if I need to do more test beside comment
> > 6,7,8,9,11,14,15,16,17, if you agree, I'll set the bug status to "verified".
> > Thank you!
>
> Given the complexity of the code required for this feature to work I think
> that the testing you've done has pretty good coverage (in breadth) of the
> positive and negative code paths a user might take while trying to use the
> feature.
>
> I don't have anything else I could recommend testing.

Thank you! Changing the status to verified.

This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.

Moving 1017289 (closed, wontfix) from "depends on" to "see also", keeping it for reference but not having it in the dependency tree blocking other bugs.