Bug 1294941 - QEMU crash on snapshot revert when using Cirrus
Summary: QEMU crash on snapshot revert when using Cirrus
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Marc-Andre Lureau
QA Contact: Ping Li
URL:
Whiteboard:
Duplicates: 1289035 (view as bug list)
Depends On:
Blocks: 1359965 1392287
 
Reported: 2015-12-31 07:01 UTC by Yang Yang
Modified: 2018-03-22 03:31 UTC
CC List: 23 users

Fixed In Version: qemu-kvm-0.12.1.2-2.497.el6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-21 09:36:03 UTC
Target Upstream Version:
Embargoed:


Attachments
virt-viewer screen (12.21 KB, image/png), 2016-02-15 09:45 UTC, Yang Yang
gdb backtrace info (4.37 KB, application/x-gzip), 2016-04-11 09:57 UTC, Han Han
attach (5.15 KB, application/x-gzip), 2016-10-20 03:43 UTC, Han Han


Links
Red Hat Product Errata RHSA-2017:0621 (normal, SHIPPED_LIVE): Moderate: qemu-kvm security and bug fix update. Last updated 2017-03-21 12:28:31 UTC.

Description Yang Yang 2015-12-31 07:01:45 UTC
Description of problem:
Guest occasionally hangs after reverting to an internal running snapshot

Version-Release number of selected component (if applicable):
libvirt-0.10.2-55.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.482.el6.x86_64

How reproducible:
50%

Steps to Reproduce:
1. Prepare a running guest with the following XML:
<disk type='file' device='disk'>
        <driver name='qemu' type='qcow2' cache='none'/>
        <source file='/var/lib/libvirt/images/vm1.qcow2'/>
        <target dev='vda' bus='virtio'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
      </disk>
<interface type='network'>
        <mac address='52:54:00:51:9a:d2'/>
        <source network='default'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
      </interface>

2. Take an internal snapshot
# virsh snapshot-create-as vm1 s1

# arp -a | grep 52:54:00:51:9a:d2
? (192.168.122.102) at 52:54:00:51:9a:d2 [ether] on virbr0

# ssh 192.168.122.102
root@192.168.122.102's password: 
Last login: Thu Dec 31 00:54:15 2015 from 192.168.122.1
[root@dhcp-66-87-202 ~]# ls
anaconda-ks.cfg  install.log  install.log.syslog

guest r/w works

3. Revert to snapshot s1
# virsh snapshot-revert vm1 s1

Cannot log in to the guest:
# ssh 192.168.122.102
ssh_exchange_identification: Connection closed by remote host

# ping 192.168.122.102
PING 192.168.122.102 (192.168.122.102) 56(84) bytes of data.
64 bytes from 192.168.122.102: icmp_seq=1 ttl=64 time=0.225 ms

Actual results:
guest hangs

Expected results:
guest works well

Additional info:
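A quick way to tell a truly hung guest from an SSH-only failure, sketched here assuming the domain name vm1 from the steps above, is to ask libvirt directly:

# virsh domstate --reason vm1
# virsh console vm1

If domstate still reports "running" and the serial console echoes keystrokes (exit it with Ctrl+]), the guest itself is alive and only the display/SSH path is affected.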

Comment 2 Han Han 2016-01-04 02:00:36 UTC
Well, the guest will be shut off or paused when doing snapshot-revert. It's easily reproduced after writing some data to the guest.
You can use the following script to reproduce it:
##
#!/bin/bash
DOM=n1
GUEST_IP=192.168.122.107
error=0
for i in {1..200};do
    virsh snapshot-create-as $DOM s1
    ssh $GUEST_IP dd if=/dev/urandom of=~/$i bs=500M count=1 2>/dev/null
    virsh snapshot-revert $DOM s1
    ssh $GUEST_IP cat /etc/hosts
    if [ $? -ne 0 ] ; then
        echo Error found >> ./err
        error=`expr $error + 1`
    fi  
    virsh snapshot-delete $DOM s1
done
echo The error numbers is: $error
##

I hit the bug 180 times when running the script.
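To collect supporting data while the loop runs, the per-domain qemu log can be tailed in parallel (path assumed from the default libvirt layout; adjust for your domain name):

# tail -F /var/log/libvirt/qemu/n1.log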

Comment 3 Ademar Reis 2016-01-11 13:11:18 UTC
(In reply to Han Han from comment #2)
> Well, guest will be shutoff or pause when doing snapshot-revert. It's easily
> reproduced after writing some data to guest.
> You can use following script to reproduce it:
> ##
> #!/bin/bash
> DOM=n1
> GUEST_IP=192.168.122.107
> error=0
> for i in {1..200};do
>     virsh snapshot-create-as $DOM s1
>     ssh $GUEST_IP dd if=/dev/urandom of=~/$i bs=500M count=1 2>/dev/null
>     virsh snapshot-revert $DOM s1
>     ssh $GUEST_IP cat /etc/hosts
>     if [ $? -ne 0 ] ; then
>         echo Error found >> ./err
>         error=`expr $error + 1`
>     fi  
>     virsh snapshot-delete $DOM s1
> done
> echo The error numbers is: $error
> ##
> 
> I meet the bug for 180 times when running the script.


And since --live was not specified, this seems like expected behavior to me. I wonder what happens if you wait a little longer, and I also wonder whether virsh shouldn't handle it.

Anyway, reassigning to libvirt for further investigation and triage, because it's hard to see what the expected behavior is based on the documentation of virsh snapshot. If you can confirm this is unexpected behavior on QEMU's part, then please elaborate on what virsh is doing under the hood.

Comment 4 Peter Krempa 2016-02-10 13:09:08 UTC
(In reply to yangyang from comment #0)
> Description of problem:
> Guest occasionally hang after reverting to internal running snapshot

[...]

> Steps to Reproduce:
> 1.prepare a running guest with following xml

[...]

> 
> 2. take internal snapshot
> # virsh snapshot-create-as vm1 s1
> 
> # arp -a | grep 52:54:00:51:9a:d2
> ? (192.168.122.102) at 52:54:00:51:9a:d2 [ether] on virbr0
> 
> # ssh 192.168.122.102
> root.122.102's password: 
> Last login: Thu Dec 31 00:54:15 2015 from 192.168.122.1
> [root@dhcp-66-87-202 ~]# ls
> anaconda-ks.cfg  install.log  install.log.syslog
> 
> guest r/w work
> 
> 3. revert to snapshot s1
> # virsh snapshot-revert vm1 s1
> 
> cannot login to guest
> # ssh 192.168.122.102
> ssh_exchange_identification: Connection closed by remote host
> 
> # ping 192.168.122.102
> PING 192.168.122.102 (192.168.122.102) 56(84) bytes of data.
> 64 bytes from 192.168.122.102: icmp_seq=1 ttl=64 time=0.225 ms
> 
> Actual results:
> guest hang

The above indicates that the SSH connection was refused by the remote host. Did you verify that the guest did actually freeze via vnc/spice or the serial connection? The ping test looks like the guest is actually working.
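For reference, the graphical console can also be probed without a spice/vnc client; a minimal sketch, assuming the domain is named vm1:

# virsh screenshot vm1 /tmp/vm1-after-revert.png

If the screenshot comes back black while the serial console still works, the problem is more likely in the emulated display than in the guest as a whole.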

Comment 5 Peter Krempa 2016-02-10 13:12:49 UTC
(In reply to Han Han from comment #2)
> Well, guest will be shutoff or pause when doing snapshot-revert. It's easily
> reproduced after writing some data to guest.
> You can use following script to reproduce it:
> ##
> #!/bin/bash
> DOM=n1
> GUEST_IP=192.168.122.107
> error=0
> for i in {1..200};do
>     virsh snapshot-create-as $DOM s1
>     ssh $GUEST_IP dd if=/dev/urandom of=~/$i bs=500M count=1 2>/dev/null
>     virsh snapshot-revert $DOM s1
>     ssh $GUEST_IP cat /etc/hosts
>     if [ $? -ne 0 ] ; then
>         echo Error found >> ./err
>         error=`expr $error + 1`
>     fi  
>     virsh snapshot-delete $DOM s1
> done
> echo The error numbers is: $error
> ##
> 
> I meet the bug for 180 times when running the script.

This seems to be different from the original report. Additionally, it really lacks data (logs, etc.) to support the case. Moreover, the 50% reproducibility from the original report doesn't seem to hold if the script ran 180 times before hitting the issue.

Comment 6 Yang Yang 2016-02-15 09:42:42 UTC
(In reply to Peter Krempa from comment #4)
> (In reply to yangyang from comment #0)
> > Description of problem:
> > Guest occasionally hang after reverting to internal running snapshot
> 
> [...]
> 
> > Steps to Reproduce:
> > 1.prepare a running guest with following xml
> 
> [...]
> 
> > 
> > 2. take internal snapshot
> > # virsh snapshot-create-as vm1 s1
> > 
> > # arp -a | grep 52:54:00:51:9a:d2
> > ? (192.168.122.102) at 52:54:00:51:9a:d2 [ether] on virbr0
> > 
> > # ssh 192.168.122.102
> > root.122.102's password: 
> > Last login: Thu Dec 31 00:54:15 2015 from 192.168.122.1
> > [root@dhcp-66-87-202 ~]# ls
> > anaconda-ks.cfg  install.log  install.log.syslog
> > 
> > guest r/w work
> > 
> > 3. revert to snapshot s1
> > # virsh snapshot-revert vm1 s1
> > 
> > cannot login to guest
> > # ssh 192.168.122.102
> > ssh_exchange_identification: Connection closed by remote host
> > 
> > # ping 192.168.122.102
> > PING 192.168.122.102 (192.168.122.102) 56(84) bytes of data.
> > 64 bytes from 192.168.122.102: icmp_seq=1 ttl=64 time=0.225 ms
> > 
> > Actual results:
> > guest hang
> 
> The above indicates that the SSH connection was refused by the remote host.
> Did you verify that the guest did actually freeze via vnc/spice or the
> serial connection? The ping test looks like the guest is actually working.

I connected to the guest via virt-viewer. The screen is blank, and what I type is not displayed on the screen.

It works well via the serial connection.

Comment 7 Yang Yang 2016-02-15 09:45:02 UTC
Created attachment 1127207 [details]
virt-viewer screen

Comment 8 Peter Krempa 2016-02-15 13:27:01 UTC
(In reply to Ademar Reis from comment #3)
> (In reply to Han Han from comment #2)

[...]

> > I meet the bug for 180 times when running the script.
> 
> 
> And since --live was not specified, this seems like expected behavior to me.
> I wonder what happens if you wait a little longer and I also wonder if virsh
> shouldn't handle it.

--live is not supported for internal snapshots by qemu. --live denotes that the CPUs keep running while the snapshot is being taken. For reverting snapshots there is no such thing: the complete memory image has to be loaded before execution can continue.
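For contrast, a live snapshot has to be an external one. A rough sketch with newer virsh syntax (flags and file paths here are illustrative, not taken from this bug; the exact options depend on the libvirt version):

# virsh snapshot-create-as vm1 ext1 --live \
    --memspec file=/var/lib/libvirt/images/vm1.mem,snapshot=external \
    --diskspec vda,file=/var/lib/libvirt/images/vm1-ext1.qcow2

Reverting still has to load the complete memory image either way, so the guest is always paused during the revert itself.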

> 
> Anyway, reassigning to libvirt for further investigation and triage, because
> it's hard to see what the expected behavior is based on the documentation of
> virsh snapshot. If you can confirm this is unexpected behavior on QEMU's
> part, then please elaborate on what virsh is doing under the hood.

From further investigation it seems that certain things stop working while reverting a snapshot (spice/vnc?), while the ssh connection is rejected by the SSH daemon in the VM.

Since the serial connection works, the VM was at least partially reverted correctly. Moving back to qemu. Everything from libvirt's point of view was done correctly in this case.

Qemu should investigate why the vnc/spice connection does not work afterwards. The issue with not being able to ssh to the guest might be a nuance of the ssh protocol since the VM was rolled back recently.
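On the SSH angle, the client's verbose mode usually shows where the exchange dies; a minimal check:

# ssh -v 192.168.122.102 true

If the TCP connection opens and the banner exchange is then closed (as in the ssh_exchange_identification error above), the guest's sshd is reachable but dropping connections, which points at guest userspace rather than the network path.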

Comment 9 Marc-Andre Lureau 2016-04-06 16:05:35 UTC
From comments #1 and #6, I am not sure whether the guest was configured with spice and/or vnc, or how.

I see the screenshot, and the bug title was later changed, though.

QA, could you update the reproducer case?

Is this only happening on a rhel6 host, or can it be reproduced with rhel7 and fedora?

For the record, I tried to reproduce with the following line on fedora, without success (spicy-screenshot connects to and disconnects from spice, checking that spice should still "work"):

$ while true ; do virsh snapshot-create-as rhel7.0 s1; virsh snapshot-revert rhel7.0 s1; spicy-screenshot --uri spice://localhost:5900; virsh snapshot-delete rhel7.0 s1; done
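The same loop, unrolled for readability and stopping on the first failed spice probe (behavior otherwise unchanged):

#!/bin/bash
# Repeatedly create and revert an internal snapshot, probing spice
# after each revert; stop as soon as the probe fails.
DOM=rhel7.0
while true; do
    virsh snapshot-create-as $DOM s1
    virsh snapshot-revert $DOM s1
    if ! spicy-screenshot --uri spice://localhost:5900; then
        echo "spice probe failed after revert" >&2
        break
    fi
    virsh snapshot-delete $DOM s1
done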

Comment 10 Han Han 2016-04-11 09:57:15 UTC
Created attachment 1145894 [details]
gdb backtrace info

I reproduced a SIGSEGV with a script like the one in comment 2:
#!/bin/bash
DOM=avocado-vt-vm1
GUEST_IP=192.168.122.148
for i in {1..200};do
    SNAP=`virsh snapshot-create $DOM |grep -o '[0-9]*'`
    ssh $GUEST_IP dd if=/dev/urandom of=~/$i bs=500M count=1 2>/dev/null
    virsh snapshot-revert $DOM $SNAP
    ssh $GUEST_IP cat /etc/hosts
    if [ $? -ne 0 ] ; then
        echo Error found >> ./err
        break
    fi
    virsh snapshot-delete $DOM $SNAP
done
echo Found error

Then I found the error message:
error: Unable to read from monitor: Connection reset by peer
ssh: connect to host 192.168.122.148 port 22: No route to host
Found error

Abrt also reported a SIGSEGV in qemu.
The backtrace is in the attachment.
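For anyone extracting the backtrace from an abrt core themselves, a generic sketch (the core path is a placeholder; abrt keeps dumps under /var/spool/abrt/, and the qemu debuginfo packages need to be installed first):

# gdb -batch -ex 'thread apply all bt full' \
    /usr/libexec/qemu-kvm /var/spool/abrt/ccpp-*/coredump > qemu-backtrace.txt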

Comment 11 Han Han 2016-04-11 10:02:18 UTC
The SIGSEGV was reproduced on:
qemu-kvm-rhev-0.12.1.2-2.491.el6.x86_64
libvirt-0.10.2-60.el6.x86_64

Comment 12 Marc-Andre Lureau 2016-09-21 18:08:23 UTC
I can't find qemu-kvm-rhev-0.12.1.2-2.491.el6.x86_64 yet (do you have a repo, or do I need to build it?). I couldn't reproduce with qemu-kvm-0.12.1.2-2.491.el6.x86_64; could you?

Comment 13 Marc-Andre Lureau 2016-09-21 19:48:54 UTC
I installed qemu-kvm-rhev-0.12.1.2-2.491.el6.x86_64, and still couldn't reproduce the SIGSEGV.

FWIW, I am testing with an f24 guest and the following domain XML:

<domain type='kvm' id='16'>
  <name>f24</name>
  <uuid>778c1386-c746-1568-3d9d-fc1b224ba2ff</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.6.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/fedora-24.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb0'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb0'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb0'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:78:07:c3'/>
      <source network='default'/>
      <target dev='vnet0'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <graphics type='spice' port='5900' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <sound model='ich6'>
      <alias name='sound0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>unconfined_u:system_r:svirt_t:s0:c756,c850</label>
    <imagelabel>unconfined_u:object_r:svirt_image_t:s0:c756,c850</imagelabel>
  </seclabel>
</domain>

Please provide further details to reproduce. Thanks.
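For example, the exact guest configuration and package set on the reproducing host can be captured like this (domain name taken from the script in comment 10):

# virsh dumpxml avocado-vt-vm1 > guest.xml
# rpm -q qemu-kvm-rhev libvirt kernel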

Comment 14 Han Han 2016-10-20 03:43:13 UTC
Created attachment 1212321 [details]
attach

Hi Marc-Andre, sorry for the late reply.
Versions:
kernel-2.6.32-642.el6.x86_64
glibc-2.12-1.203.el6.x86_64
libvirt-0.10.2-61.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.494.el6.x86_64

I reproduced the qemu-kvm-rhev SIGSEGV when running the script in comment 2.
The guest XML and gdb backtrace are in the attachment.

Comment 19 Yash Mankad 2016-12-02 10:05:38 UTC
Fix included in qemu-kvm-0.12.1.2-2.497.el6

Comment 22 Ping Li 2016-12-15 15:39:03 UTC
Setting the bug to VERIFIED according to comment 20.

Comment 23 Gerd Hoffmann 2017-01-04 09:55:15 UTC
*** Bug 1289035 has been marked as a duplicate of this bug. ***

Comment 29 errata-xmlrpc 2017-03-21 09:36:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0621.html

