Bug 1294941
| Summary: | QEMU crash on snapshot revert when using Cirrus | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Yang Yang <yanyang> |
| Component: | qemu-kvm | Assignee: | Marc-Andre Lureau <marcandre.lureau> |
| Status: | CLOSED ERRATA | QA Contact: | Ping Li <pingl> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.7 | CC: | ailan, aliang, areis, chayang, coli, dyuan, hachen, hhan, juzhang, kraxel, marcandre.lureau, meyang, michen, mkenneth, mzhan, ngu, pingl, qizhu, rbalakri, virt-maint, xuwei, yanyang, ykawada |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-0.12.1.2-2.497.el6 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-21 09:36:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1359965, 1392287 | | |
| Attachments: | | | |
Well, the guest will be shut off or paused when doing snapshot-revert. It's easily reproduced after writing some data to the guest.
You can use the following script to reproduce it:
##
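#!/bin/bash
# Take a snapshot, write data inside the guest, revert, and check that the
# guest still answers over SSH; repeat 200 times and count the failures.
DOM=n1
GUEST_IP=192.168.122.107
error=0
for i in {1..200}; do
    virsh snapshot-create-as "$DOM" s1
    # Write 500 MB of random data in the guest (quoted so ~ expands remotely)
    ssh "$GUEST_IP" "dd if=/dev/urandom of=~/$i bs=500M count=1" 2>/dev/null
    virsh snapshot-revert "$DOM" s1
    if ! ssh "$GUEST_IP" cat /etc/hosts; then
        echo "Error found" >> ./err
        error=$((error + 1))
    fi
    virsh snapshot-delete "$DOM" s1
done
echo "The number of errors is: $error"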
##
I hit the bug 180 times when running the script.
(In reply to Han Han from comment #2)
> Well, guest will be shutoff or pause when doing snapshot-revert. It's easily
> reproduced after writing some data to guest.
[...]
> I meet the bug for 180 times when running the script.

And since --live was not specified, this seems like expected behavior to me. I wonder what happens if you wait a little longer, and I also wonder whether virsh should be handling it.

Anyway, reassigning to libvirt for further investigation and triage, because it's hard to see what the expected behavior is based on the documentation of virsh snapshot. If you can confirm this is unexpected behavior on QEMU's part, then please elaborate on what virsh is doing under the hood.

(In reply to yangyang from comment #0)
> Description of problem:
> Guest occasionally hang after reverting to internal running snapshot
[...]
> 3. revert to snapshot s1
> # virsh snapshot-revert vm1 s1
>
> cannot login to guest
> # ssh 192.168.122.102
> ssh_exchange_identification: Connection closed by remote host
>
> # ping 192.168.122.102
> PING 192.168.122.102 (192.168.122.102) 56(84) bytes of data.
> 64 bytes from 192.168.122.102: icmp_seq=1 ttl=64 time=0.225 ms
>
> Actual results:
> guest hang

The above indicates that the SSH connection was refused by the remote host. Did you verify via VNC/SPICE or the serial connection that the guest actually froze? The ping test suggests the guest is actually working.

(In reply to Han Han from comment #2)
> Well, guest will be shutoff or pause when doing snapshot-revert. [...]
> I meet the bug for 180 times when running the script.

This seems to be different from the original report. Additionally, it really lacks data (logs, etc.) to support the case. Also, the 50% reproducibility of the original report doesn't seem to hold if the script is run 180 times before hitting the issue.

(In reply to Peter Krempa from comment #4)
[...]
> The above indicates that the SSH connection was refused by the remote host.
> Did you verify that the guest did actually freeze via vnc/spice or the
> serial connection? The ping test looks like the guest is actually working.

I connected to the guest via virt-viewer. The screen is blank, and what I type is not displayed on the screen. It works well via the serial connection.

Created attachment 1127207 [details]
virt-viewer screen
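The triage in comment 4 separates a guest that is really frozen from one whose services merely refuse connections: ping still answering while SSH closes the connection points at the latter, and the working serial console in the follow-up confirms it. A minimal bash sketch of that layered check follows. `triage_guest` is a hypothetical helper name, and the sketch assumes `virsh`, iputils `ping` (for the `-W` timeout) and OpenSSH are available on the host.

```bash
#!/bin/bash
# Hypothetical helper: probe three layers of a reverted guest separately,
# following the triage in the comments: libvirt domain state, ICMP
# reachability, and the SSH daemon.
# Usage: triage_guest DOMAIN GUEST_IP
triage_guest() {
    local dom=$1 ip=$2
    local state
    # libvirt's view of the domain (running, paused, shut off, ...)
    state=$(virsh domstate "$dom" 2>/dev/null)
    echo "domstate: ${state:-unknown}"

    # One ICMP echo with a 2 second timeout
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
        echo "ping: ok"
    else
        echo "ping: failed"
    fi

    # BatchMode avoids hanging on a password prompt; a refused or closed
    # connection here does not by itself mean the guest is frozen
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$ip" true >/dev/null 2>&1; then
        echo "ssh: ok"
    else
        echo "ssh: failed"
    fi
}
```

For the guest in comment 0, `triage_guest vm1 192.168.122.102` would be expected to report a running domain that answers ping but fails SSH, which is the "alive but refusing connections" case rather than a hard hang. The display and serial console still have to be checked by hand.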
(In reply to Ademar Reis from comment #3)
> (In reply to Han Han from comment #2)
[...]
> > I meet the bug for 180 times when running the script.
>
> And since --live was not specified, this seems like expected behavior to me.
> I wonder what happens if you wait a little longer and I also wonder if virsh
> shouldn't handle it.

QEMU does not support --live for internal snapshots. --live means that the CPUs keep running while the snapshot is being taken. There is no such thing for reverting a snapshot: the complete memory image has to be loaded before execution continues.

> Anyway, reassigning to libvirt for further investigation and triage, because
[...]
> part, then please elaborate on what virsh is doing under the hood.

Further investigation suggests that certain things (SPICE/VNC?) stop working when reverting a snapshot, while the SSH connection is refused by the SSH daemon in the VM. Since the serial connection works, the VM was at least partially reverted correctly.

Moving back to qemu. From libvirt's point of view, everything was done correctly in this case. QEMU should investigate why the VNC/SPICE connection does not work afterwards. The failure to SSH into the guest might be a nuance of the SSH protocol, since the VM was recently rolled back.

From comments #1 and #6, I am not sure whether or how the guest was configured with SPICE and/or VNC. I see that a screenshot was attached and the bug title was later changed, though. QA, could you update the reproducer case? Is this only happening on a RHEL 6 host, or can it be reproduced with RHEL 7 and Fedora?
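The thread asks what happens if you wait a little longer, and suggests the SSH refusal may be a protocol nuance right after the rollback. One hedged way to test "wait longer" is to retry the SSH probe for a bounded time instead of failing on the first attempt. `retry_for` is a hypothetical helper name, not part of virsh or any package.

```bash
#!/bin/bash
# Hypothetical helper: run COMMAND once per second until it succeeds or
# TIMEOUT retries' worth of sleeping has elapsed. "elapsed" counts seconds
# slept, not wall-clock time spent inside COMMAND itself.
# Usage: retry_for TIMEOUT COMMAND [ARGS...]
retry_for() {
    local timeout=$1
    shift
    local elapsed=0
    until "$@"; do
        if (( elapsed >= timeout )); then
            echo "gave up after ${timeout}s: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

In the reproducer loop, the single `ssh "$GUEST_IP" cat /etc/hosts` check could become `retry_for 60 ssh -o BatchMode=yes -o ConnectTimeout=5 "$GUEST_IP" cat /etc/hosts`. A guest that recovers within the window is then not counted as an error, while a real hang or crash still is.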
(For the record, I tried to reproduce on Fedora with the following line, without success; spicy-screenshot connects to and disconnects from SPICE, checking that SPICE still "works":
$ while true ; do virsh snapshot-create-as rhel7.0 s1; virsh snapshot-revert rhel7.0 s1; spicy-screenshot --uri spice://localhost:5900; virsh snapshot-delete rhel7.0 s1; done)

Created attachment 1145894 [details]
gdb backtrace info

I reproduced a SIGSEGV with a script like the one in comment 2:
##
#!/bin/bash
DOM=avocado-vt-vm1
GUEST_IP=192.168.122.148
found=0
for i in {1..200}; do
    # By default the snapshot is named after the current time in seconds;
    # grep pulls that number out of virsh's output
    SNAP=$(virsh snapshot-create "$DOM" | grep -o '[0-9]*')
    ssh "$GUEST_IP" "dd if=/dev/urandom of=~/$i bs=500M count=1" 2>/dev/null
    virsh snapshot-revert "$DOM" "$SNAP"
    if ! ssh "$GUEST_IP" cat /etc/hosts; then
        echo "Error found" >> ./err
        found=1
        break
    fi
    virsh snapshot-delete "$DOM" "$SNAP"
done
if [ "$found" -eq 1 ]; then echo "Found error"; fi
##
Then I got these error messages:
error: Unable to read from monitor: Connection reset by peer
ssh: connect to host 192.168.122.148 port 22: No route to host
Found error

ABRT also reports a SIGSEGV in qemu; the backtrace is in the attachment. The SIGSEGV is reproduced on:
qemu-kvm-rhev-0.12.1.2-2.491.el6.x86_64
libvirt-0.10.2-60.el6.x86_64

I can't find qemu-kvm-rhev-0.12.1.2-2.491.el6.x86_64 yet (do you have a repo, or do I need to build it?). I couldn't reproduce with qemu-kvm-0.12.1.2-2.491.el6.x86_64; could you?

I installed qemu-kvm-rhev-0.12.1.2-2.491.el6.x86_64 and still couldn't reproduce the segv.
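The "Unable to read from monitor: Connection reset by peer" error typically means libvirt lost its monitor connection because the QEMU process went away, which fits the SIGSEGV that ABRT reported. To tell such a crash apart from a guest that merely refuses SSH, the reproducer could record libvirt's state reason when a check fails. The sketch below assumes `virsh domstate --reason` prints `STATE (REASON)`, for example `shut off (crashed)`; `qemu_crashed` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: succeed if libvirt records the domain as crashed.
# Assumes "virsh domstate --reason" output of the form "STATE (REASON)",
# e.g. "shut off (crashed)" after the QEMU process died unexpectedly.
# Usage: qemu_crashed DOMAIN
qemu_crashed() {
    local dom=$1 state
    state=$(virsh domstate --reason "$dom" 2>/dev/null)
    [[ $state == *"(crashed)"* ]]
}
```

Inside the reproducer's error branch, `qemu_crashed "$DOM" && echo "QEMU crashed" >> ./err` would annotate the error log so crashes and SSH-only failures can be counted separately.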
FWIW, I am testing with an F24 guest and the following domain XML:
<domain type='kvm' id='16'>
<name>f24</name>
<uuid>778c1386-c746-1568-3d9d-fc1b224ba2ff</uuid>
<memory unit='KiB'>1048576</memory>
<currentMemory unit='KiB'>1048576</currentMemory>
<vcpu placement='static'>1</vcpu>
<os>
<type arch='x86_64' machine='rhel6.6.0'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<pae/>
</features>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/libexec/qemu-kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/var/lib/libvirt/images/fedora-24.qcow2'/>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
<controller type='usb' index='0' model='ich9-ehci1'>
<alias name='usb0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
</controller>
<controller type='usb' index='0' model='ich9-uhci1'>
<alias name='usb0'/>
<master startport='0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
</controller>
<controller type='usb' index='0' model='ich9-uhci2'>
<alias name='usb0'/>
<master startport='2'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
</controller>
<controller type='usb' index='0' model='ich9-uhci3'>
<alias name='usb0'/>
<master startport='4'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
</controller>
<controller type='ide' index='0'>
<alias name='ide0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
</controller>
<controller type='virtio-serial' index='0'>
<alias name='virtio-serial0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</controller>
<interface type='network'>
<mac address='52:54:00:78:07:c3'/>
<source network='default'/>
<target dev='vnet0'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
<serial type='pty'>
<source path='/dev/pts/2'/>
<target port='0'/>
<alias name='serial0'/>
</serial>
<console type='pty' tty='/dev/pts/2'>
<source path='/dev/pts/2'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
</console>
<channel type='spicevmc'>
<target type='virtio' name='com.redhat.spice.0'/>
<alias name='channel0'/>
<address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>
<input type='mouse' bus='ps2'/>
<graphics type='spice' port='5900' autoport='yes' listen='127.0.0.1'>
<listen type='address' address='127.0.0.1'/>
</graphics>
<sound model='ich6'>
<alias name='sound0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</sound>
<video>
<model type='cirrus' vram='9216' heads='1'/>
<alias name='video0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>
<memballoon model='virtio'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</memballoon>
</devices>
<seclabel type='dynamic' model='selinux' relabel='yes'>
<label>unconfined_u:system_r:svirt_t:s0:c756,c850</label>
<imagelabel>unconfined_u:object_r:svirt_image_t:s0:c756,c850</imagelabel>
</seclabel>
</domain>
Please provide further details to reproduce. Thanks.
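The bug title ties the crash to the Cirrus video adapter, which the domain XML above selects with `<model type='cirrus' vram='9216' heads='1'/>`. A quick way to check whether another domain is configured the same way is to search its XML. `uses_cirrus` is a hypothetical helper, and a regular expression is only a heuristic here, not an XML parser.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the libvirt domain XML read on stdin
# selects the Cirrus video model, i.e. contains <model type='cirrus' ...>
# (single or double quotes around the attribute value are accepted).
# Usage: virsh dumpxml DOMAIN | uses_cirrus
uses_cirrus() {
    grep -Eq "<model[[:space:]]+type=['\"]cirrus['\"]"
}
```

For example, `virsh dumpxml f24 | uses_cirrus && echo "f24 uses Cirrus"` would report the guest above.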
Created attachment 1212321 [details]
attach

Hi Marc-Andre, sorry for the late reply.

Versions:
kernel-2.6.32-642.el6.x86_64
glibc-2.12-1.203.el6.x86_64
libvirt-0.10.2-61.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.494.el6.x86_64

I reproduced the qemu-kvm-rhev SIGSEGV when running the script in comment 2. The guest XML and gdb backtrace are in the attachment.

Fix included in qemu-kvm-0.12.1.2-2.497.el6

Set the bug as verified according to comment 20.

*** Bug 1289035 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0621.html
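The fix shipped in qemu-kvm-0.12.1.2-2.497.el6. To check whether a host already has it, one can compare the installed VERSION-RELEASE against that string. In the sketch below, `version_ge` and `FIXED_VR` are hypothetical names. GNU `sort -V` only approximates RPM's version ordering, and which package to query (qemu-kvm or qemu-kvm-rhev) is an assumption about the host.

```bash
#!/bin/bash
# "Fixed In Version" of this bug, as VERSION-RELEASE
FIXED_VR="0.12.1.2-2.497.el6"

# Hypothetical helper: true if $1 sorts at or after $2 under GNU version
# ordering (sort -V). This approximates rpm's comparison; rpmdev-vercmp
# from rpmdevtools is authoritative for unusual version strings.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (package name is an assumption; RHEV hosts ship qemu-kvm-rhev):
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' qemu-kvm)
#   version_ge "$installed" "$FIXED_VR" && echo "fix present"
```

With this ordering, the 2.491 and 2.494 builds used in the reproductions above sort before 2.497, so they would be reported as lacking the fix.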
Description of problem:
The guest occasionally hangs after reverting to an internal snapshot taken while it was running.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-55.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.482.el6.x86_64

How reproducible:
50%

Steps to Reproduce:
1. Prepare a running guest with the following XML:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/var/lib/libvirt/images/vm1.qcow2'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
<interface type='network'>
  <mac address='52:54:00:51:9a:d2'/>
  <source network='default'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

2. Take an internal snapshot:
# virsh snapshot-create-as vm1 s1

# arp -a | grep 52:54:00:51:9a:d2
? (192.168.122.102) at 52:54:00:51:9a:d2 [ether] on virbr0

# ssh 192.168.122.102
root@192.168.122.102's password:
Last login: Thu Dec 31 00:54:15 2015 from 192.168.122.1
[root@dhcp-66-87-202 ~]# ls
anaconda-ks.cfg install.log install.log.syslog

Guest read/write works.

3. Revert to snapshot s1:
# virsh snapshot-revert vm1 s1

Cannot log in to the guest:
# ssh 192.168.122.102
ssh_exchange_identification: Connection closed by remote host

# ping 192.168.122.102
PING 192.168.122.102 (192.168.122.102) 56(84) bytes of data.
64 bytes from 192.168.122.102: icmp_seq=1 ttl=64 time=0.225 ms

Actual results:
The guest hangs.

Expected results:
The guest works well.

Additional info:
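Step 2 finds the guest's address by grepping the host's ARP cache for the MAC address from the domain XML. If this lookup is scripted, it helps to extract just the IPv4 address from the `arp -a` line. The sketch below assumes the `? (IP) at MAC [ether] on IFACE` layout shown above; `guest_ip_from_arp` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: print the IPv4 address of the first "arp -a" line
# (read on stdin) whose MAC matches $1, compared case-insensitively.
# Expected line layout: ? (192.168.122.102) at 52:54:00:51:9a:d2 [ether] on virbr0
# Usage: arp -a | guest_ip_from_arp 52:54:00:51:9a:d2
guest_ip_from_arp() {
    local mac=$1
    # Field 4 is the MAC; field 2 is the address wrapped in parentheses
    awk -v mac="$mac" 'tolower($4) == tolower(mac) { gsub(/[()]/, "", $2); print $2; exit }'
}
```

The entry only appears once the guest has exchanged traffic with the host, so an empty result does not necessarily mean the guest is down.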