Bug 742615

Summary: libvirt snapshot revert should require force in some risky cases
Product: Red Hat Enterprise Linux 6 Reporter: Eric Blake <eblake>
Component: libvirtAssignee: Eric Blake <eblake>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.2CC: acathrow, ajia, dallan, mzhan, nzhang, rwu, weizhan, whuang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-0.9.4-15.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 11:34:50 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 638510    
Bug Blocks: 747120    

Description Eric Blake 2011-09-30 18:59:09 UTC
Description of problem:
There are two risky snapshot revert situations, which we do not want to allow by default, but where acknowledging the risk can allow a useful revert when the user is sure the risks won't cause problems:
1. reverting to a snapshot created prior to RHEL 6.2. Older snapshots lacked <domain> information in the snapshot metadata, so we can't guarantee that the current configuration matches the configuration in effect at the time the snapshot was created.  In fact, there are cases where mismatch of configuration can cause irreversible corruption to the guest disk images.
2. with an active domain (running or paused), reverting to either an ABI-incompatible active snapshot, or to an inactive snapshot combined with the --running or --paused flag to boot the inactive image, requires that the current qemu be halted and a new qemu process be created.  Any existing VNC or Spice connections to the domain will thus be disconnected, meaning the snapshot reversion is not seamless.

Version-Release number of selected component (if applicable):
libvirt-0.9.4-14.el6


How reproducible:
100%

Steps to Reproduce:
Problem 1: create an inactive snapshot without full <domain> details (by creating with 0.9.4-7.el6 or older, or by using 'virsh snapshot-edit dom snap' to nuke the <domain> details).  All remaining steps are done with the libvirtd version under test.  Optionally - modify the domain configuration, such as by adding or removing a disk.  Then try reverting to the snapshot.

Problem 2a: With libvirt-0.9.4-8.el6 or newer, create an active snapshot.  Then make an incompatible change, such as hot-plugging a disk.  Then try to revert to the snapshot.

Problem 2b: With libvirt-0.9.4-8.el6 or newer, create an inactive snapshot.  Then start the domain, and try to revert using 'virsh snapshot-revert --running dom snap' to revert to the disk state but also boot.
  
Actual results:
Problem 1: If the optional step was taken, and the server lacks force support, and the optional step was taken to modify config, then 'virsh snapshot-revert dom snap' might fail, but more likely it will succeed while not changing the disk config back to what it was at the time of the snapshot.  Furthermore, booting the guest will be using the wrong disks, and can result in irreversible data loss.  If the optional step was not taken, then reverting is safe, but there was no extra effort required to promise that the revert was safe.

Problem 2a: There is no way to do the revert - it is rejected as ABI-incompatible.

Problem 2b: The revert happens, but any existing VNC or Spice connection is dropped, because a new qemu process is created.

Expected results:
Problem 1: Whether or not the optional step was taken, attempting to revert without forcing should fail with a message stating that force is required because domain info was not available.  Use of 'virsh snapshot-revert --force dom snap' should succeed.  If the optional step was taken of modifying config, this means that the config is wrong, but that is user error for using --force in an unsafe manner, and not a libvirt bug.  If the optional step was not taken, then the use of --force allows the user to regain the functionality that they had for doing a safe revert without <domain> info.

Problem 2a: The revert should be forbidden by default, but using --force should let it succeed (the break in VNC or Spice connection is expected).

Problem 2b: The revert should be forbidden without --force, such that --force acknowledges that the break in VNC or Spice connection is expected.

Additional info:
See this upstream patch set:
https://www.redhat.com/archives/libvir-list/2011-September/msg01314.html

Comment 4 Alex Jia 2011-10-08 08:49:58 UTC
Eric, I met different test result from you for Problem 2a, and libvirt complaints again in the last step.

The following are some details (ignore some steps):

For Problem 1:

# qemu-img create -f qcow2 /var/lib/libvirt/images/bar.img 10M
Formatting '/var/lib/libvirt/images/bar.img', fmt=qcow2 size=10485760 encryption=off cluster_size=65536

# virsh edit dom (add the second disk)
Domain dom XML configuration edited.

# virsh dumpxml dom
<domain type='qemu'>
......
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/foo.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/bar.img'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
......
</domain>

# virsh snapshot-revert dom snap

# virsh dumpxml dom
<domain type='qemu'>
......
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/foo.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
......
</domain>

Notes, the second disk is gone.

...... (ignore others test steps in here, please Comment 1)

# virsh snapshot-revert dom snap
error: revert requires force: snapshot 'snap' lacks domain 'dom' rollback info

# virsh snapshot-revert dom snap --force
error: internal error Child process (/usr/bin/qemu-img snapshot -a snap /var/lib/libvirt/images/bar.img) status unexpected: exit status 1

# virsh edit dom (remove that second disk)
Domain dom XML configuration edited.

# virsh snapshot-revert dom snap --force

# echo $?
0

# virsh snapshot-list dom
 Name                 Creation Time             State
------------------------------------------------------------
 snap                 2011-10-08 16:01:09 +0800 shutoff

Actual result: works well for me.

For Problem 2a:

# cat > /tmp/dom.xml <<EOF
<domain type='qemu'>
  <name>dom</name>
  <memory>219200</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
    <boot dev='cdrom'/>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/foo.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <input type='mouse' bus='ps2'/>
    <graphics type='spice' autoport='yes' listen='0.0.0.0'/>
  </devices>
</domain>
EOF

# virsh snapshot-revert dom snap
error: revert requires force: Target domain current memory 219200 does not match source 220160

Eric, I met different result from you, so is this a expected result too?

# pidof qemu-kvm
30079 27390

# virsh snapshot-revert dom snap --force

# pidof qemu-kvm
30129 27390

# virsh snapshot-revert dom snap
error: revert requires force: Target domain current memory 219200 does not match source 220160

Notes, libvirt complaints again.

Other sides are fine for me.

Actual result: FAILED.

For Problem 2b:
# virsh list
 Id Name                 State
----------------------------------
  6 vr-rhel5u4-x86_64-kvm running
 11 dom                  running

# virsh snapshot-revert dom snap

# virsh list
 Id Name                 State
----------------------------------
  6 vr-rhel5u4-x86_64-kvm running

# virsh snapshot-revert dom snap --running

# virsh list
 Id Name                 State
----------------------------------
  6 vr-rhel5u4-x86_64-kvm running
 12 dom                  running

# virsh edit dom (add the second disk)
Domain dom XML configuration not changed.

# virsh start dom
Domain dom started

# virsh snapshot-revert dom snap --running
error: revert requires force: must respawn qemu to start inactive snapshot

# virsh snapshot-revert dom snap --running --force

# virsh list
 Id Name                 State
----------------------------------
  6 vr-rhel5u4-x86_64-kvm running
 14 dom                  running

# virsh dumpxml dom
<domain type='qemu'>
......
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/foo.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
......
</domain>

Notes, the second disk is gone.

Actual result: works well for me.

Comment 5 Eric Blake 2011-10-10 15:54:52 UTC
(In reply to comment #4)
> Eric, I met different test result from you for Problem 2a, and libvirt
> complaints again in the last step.
> 
> The following are some details (ignore some steps):
> 

> For Problem 2a:
> 
> # cat > /tmp/dom.xml <<EOF
> <domain type='qemu'>
>   <name>dom</name>
>   <memory>219200</memory>

This is the value you requested, but it doesn't match with qemu minimum granularity.  So when qemu is actually started, it is getting rounded up at run time:

> # virsh snapshot-revert dom snap
> error: revert requires force: Target domain current memory 219200 does not
> match source 220160

At this point, if you were to 'virsh dumpxml snap', you'd see the difference in memory.

Try repeating the test with dom.xml set to have a memory value that won't get rounded in the first place.

> 
> Eric, I met different result from you, so is this a expected result too?
> 
> # pidof qemu-kvm
> 30079 27390
> 
> # virsh snapshot-revert dom snap --force
> 
> # pidof qemu-kvm
> 30129 27390

This confirms that the --force was sufficient to overcome the difference in memory, due to the forced rounding.  It might be nice if libvirt were taught to perform the same memory rounding before declaring two domain xml as ABI incompatible (as a new BZ), but it's late enough in the 6.2 cycle that I don't think it is worth trying to do that now.

> 
> # virsh snapshot-revert dom snap
> error: revert requires force: Target domain current memory 219200 does not
> match source 220160
> 
> Notes, libvirt complaints again.

Yes, this means that the revert is restoring the config value, not the runtime-rounded value, so once again there is a difference between config and runtime, so the --force is still necessary to overcome the fact that rounding must take place.  Again, once you fix dom.xml to not need rounding in the first place, I think you will be able to get results more like what I was predicting.

Comment 6 Alex Jia 2011-10-11 05:32:31 UTC
(In reply to comment #5)
> > # cat > /tmp/dom.xml <<EOF
> > <domain type='qemu'>
> >   <name>dom</name>
> >   <memory>219200</memory>
> 
> This is the value you requested, but it doesn't match with qemu minimum
> granularity.  So when qemu is actually started, it is getting rounded up at run

Yeah, you're right.

> This confirms that the --force was sufficient to overcome the difference in
> memory, due to the forced rounding.  It might be nice if libvirt were taught to
> perform the same memory rounding before declaring two domain xml as ABI
> incompatible (as a new BZ), but it's late enough in the 6.2 cycle that I don't
> think it is worth trying to do that now.
Agree, it's a another issue, I will file a new bug if necessary.
> 
> > 
> > # virsh snapshot-revert dom snap
> > error: revert requires force: Target domain current memory 219200 does not
> > match source 220160
> > 
> > Notes, libvirt complaints again.
> 
> place, I think you will be able to get results more like what I was predicting.

Yeah, I get the same result like you:

# virsh snapshot-revert dom snap
error: revert requires force: Target domain disk count 1 does not match source 2

# pidof qemu-kvm
13699 12186
# virsh snapshot-revert dom snap --force

# echo $?
0

# pidof qemu-kvm
13719 12186

And other steps are also fine, so move the bug to VERIFIED status.

Version-Release number of selected component:
# uname -r
2.6.32-206.el6.x86_64

# rpm -q libvirt
libvirt-0.9.4-16.el6.x86_64

# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.195.el6.x86_64

Comment 7 errata-xmlrpc 2011-12-06 11:34:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html