Bug 1089921

Summary:	Files are lost in the guest after blockcommit when the guest uses a non-cached qcow2 disk as its source file
Product:	Red Hat Enterprise Linux 7
Reporter:	Shanzhi Yu <shyu>
Component:	libvirt
Assignee:	Libvirt Maintainers <libvirt-maint>
Status:	CLOSED NOTABUG
QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	high
Priority:	high
Version:	7.0
CC:	acathrow, bili, chhu, dyuan, eblake, jdenemar, jmiao, juzhang, mzhan, shu
Target Milestone:	rc
Hardware:	Unspecified
OS:	Unspecified
Doc Type:	Bug Fix
Clones:	1089924
Last Closed:	2014-04-23 12:01:20 UTC
Type:	Bug
Bug Blocks:	1089924

Description Shanzhi Yu 2014-04-22 09:00:26 UTC

Description of problem:

Files are lost in the guest after blockcommit when the guest uses a non-cached qcow2 disk as its source file.

Version-Release number of selected component (if applicable):

qemu-kvm-rhev-1.5.3-60.el7ev.x86_64
libvirt-1.1.1-29.el7.x86_64


How reproducible:

100%

Steps to Reproduce:

1. Prepare a guest with a non-cached (cache='none') qcow2 disk as the source file

# virsh list
 Id    Name                           State
----------------------------------------------------
 17    rhel6                          running

# virsh domblklist rhel6
Target     Source
------------------------------------------------
sda        /var/lib/libvirt/images/base.img

# virsh dumpxml rhel6 | grep disk -A 4

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/base.img'/>
      <target dev='sda' bus='scsi'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>


2. Create an external disk-only snapshot, then log in to the guest and create
a file with the same name as the snapshot

# virsh snapshot-create-as rhel6 s1 --disk-only  
Domain snapshot s1 created

Log in to the guest and create file s1

[guest]# echo "hello s1" > s1

3. Repeat step 2, replacing s1 with s2, s3 and s4

# virsh snapshot-list rhel6
 Name                 Creation Time             State
------------------------------------------------------------
 s1                   2014-04-21 17:35:44 +0800 disk-snapshot
 s2                   2014-04-21 17:36:13 +0800 disk-snapshot
 s3                   2014-04-21 17:36:27 +0800 disk-snapshot
 s4                   2014-04-21 17:36:42 +0800 disk-snapshot

# virsh domblklist rhel6
Target     Source
------------------------------------------------
sda        /var/lib/libvirt/images/base.s4

Log in to the guest and check the files created in steps 2 and 3

[guest]# ll s*
-rw-r--r--. 1 root root     9 Apr 21 18:10 s1
-rw-r--r--. 1 root root     9 Apr 21 18:10 s2
-rw-r--r--. 1 root root     9 Apr 21 18:10 s3
-rw-r--r--. 1 root root     9 Apr 21 18:11 s4


4. Run blockcommit from s2 to base, then check the disk chain

# virsh blockcommit rhel6 sda --top /var/lib/libvirt/images/base.s2  --base /var/lib/libvirt/images/base.img  --verbose --wait
Block Commit: [100 %]
Commit complete

# qemu-img info  --backing-chain /var/lib/libvirt/images/base.s4

/var/lib/libvirt/images/base.img <-- /var/lib/libvirt/images/base.s3 <-- /var/lib/libvirt/images/base.s4


5. Change the guest source file from base.s4 to base.img

# virsh domblklist rhel6
Target     Source
------------------------------------------------
sda        /var/lib/libvirt/images/base.img

Log in to the guest and check the files created in steps 2 and 3

[guest]# ll s*
-rw-r--r--. 1 root root 0 Apr 21 18:10 s1
-rw-r--r--. 1 root root 0 Apr 21 18:10 s2

6. Change the guest source file from base.img back to base.s4,
restart the guest, then log in and check the files created in steps 2 and 3

[guest]# ll s*

ls: cannot access s3: Input/output error
-rw-r--r--. 1 root root 0 Apr 21 18:10 s1
-rw-r--r--. 1 root root 0 Apr 21 18:10 s2
srwxr-xr-x. 1 gdm  gdm  0 Apr 21 18:17 s4

7. Run blockcommit from s3 to base, then check the disk chain

# virsh blockcommit rhel6 sda --top /var/lib/libvirt/images/base.s3 --base /var/lib/libvirt/images/base.img  --verbose --wait
Block Commit: [100 %]
Commit complete

# qemu-img info  --backing-chain /var/lib/libvirt/images/base.s4

/var/lib/libvirt/images/base.img <-- /var/lib/libvirt/images/base.s4

8. Change the guest source file from base.s4 to base.img,
restart the guest, then log in and check the files created in steps 2 and 3

[guest]# ll s*
-rw-r--r--. 1 root root 0 Apr 21 18:10 s1
-rw-r--r--. 1 root root 0 Apr 21 18:10 s2




Actual results:

In step 5, the contents of files s1 and s2 are lost (both are 0 bytes).
In step 6, file s3 is lost (Input/output error) and file s4 has changed to a socket file.
In step 8, file s3 is lost.

Expected results:



Additional info:

base.img is a qcow2 v2 format file with RHEL 6.5 installed.

Comment 1 Jiri Denemark 2014-04-23 12:01:20 UTC

By booting the domain from base.img in step 5, you made any image based on base.img completely useless (i.e., base.s3 and base.s4 contain just garbage). This is because booting from the image changes it (and in the case of ext3, even a read-only mount of an image that was not cleanly unmounted would change it too).

Comment 2 Shanzhi Yu 2014-04-23 16:48:23 UTC

(In reply to Jiri Denemark from comment #1)
> By booting the domain from base.img in step 5, you made any image based on
> base.img completely useless (i.e., base.s3 and base.s4 contain just
> garbage). This is because booting from the image changes it (and in case of
> ext3 even mounting the image which was not cleanly unmounted read-only would
> change it too).

Jiri,
 
Following your explanation, I can understand that steps 6, 7 and 8 are useless here. But after committing base.s2 to base.img, should base.img include both files s1 and s2? If not, what did blockcommit really do here?

Comment 3 Eric Blake 2014-04-23 17:08:37 UTC

Visually, look at it this way, where XX represents a cluster that refers back to the parent file.

Pre-commit, you have:

base.img AA BB CC DD       # Guest saw AA BB CC DD at this point
base.s1  EE XX FF XX       # Guest saw EE BB FF DD at this point
base.s2  GG HH XX XX       # Guest saw GG HH FF DD at this point

After committing base.s2 into base.img, you have:

base.img GG HH FF DD       # base.img now contains all content from s1 and s2
base.s1  EE XX FF XX       # Reading this image would see EE HH FF DD, but
                           # that never happened - the image is now corrupt
base.s2  GG HH XX XX       # this would still read as GG HH FF DD, but since
                           # it relies on corrupt base.s1, it's risky

so after the commit, the best thing is to declare base.s1 and base.s2 useless and delete them.  The point of commit is to shorten the chain by modifying the base image and discarding the snapshots that are no longer needed now that the base image includes the same content.
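
The cluster picture above can be replayed with a small toy model. This is only a sketch and not how qcow2 or QEMU is implemented: each image is a plain list of clusters, `None` plays the role of XX (defer to the parent), and the helper names `read_image` and `commit` are made up for this illustration. A real commit copies only the clusters allocated in the overlays, but the resulting base content is the same.

```python
# Toy model of a backing chain; None stands for "XX" (read from the parent).
# Illustration only, not the qcow2 on-disk format.

def read_image(chain, index):
    """Return what a guest would see when reading chain[index].

    chain is ordered from the base (index 0) to the newest overlay.
    """
    view = []
    for cluster in range(len(chain[0])):
        # Walk from the requested image down toward the base until
        # some layer actually holds data for this cluster.
        for layer in range(index, -1, -1):
            data = chain[layer][cluster]
            if data is not None:
                view.append(data)
                break
    return view


def commit(chain, top, base):
    """Write the guest-visible content of chain[top] into chain[base] in place."""
    chain[base][:] = read_image(chain, top)


base_img = ["AA", "BB", "CC", "DD"]
base_s1 = ["EE", None, "FF", None]
base_s2 = ["GG", "HH", None, None]
chain = [base_img, base_s1, base_s2]

before_s1 = read_image(chain, 1)   # ['EE', 'BB', 'FF', 'DD']
before_s2 = read_image(chain, 2)   # ['GG', 'HH', 'FF', 'DD']

commit(chain, top=2, base=0)

after_base = read_image(chain, 0)  # ['GG', 'HH', 'FF', 'DD']
after_s1 = read_image(chain, 1)    # ['EE', 'HH', 'FF', 'DD'], a state the guest never saw
print(before_s1, after_s1)
```

In this model base.s1 itself is never modified, yet its visible content changes from EE BB FF DD to EE HH FF DD because the base underneath it was rewritten, which is why the intermediate images should be discarded after the commit.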

Comment 4 Eric Blake 2014-04-23 17:20:24 UTC

One other thing to be aware of: you took a --disk-only snapshot, but without requesting --quiesce.  Reverting to that snapshot behaves the same as if you had pulled the power cord from a running machine.  If the OS had not flushed the files to disk before you yanked the cord (that is, before the time you took the disk snapshot), then changes you made to the filesystem prior to the snapshot may not appear after reverting to that state, because they had not yet been flushed.

Remember that the state of the disk often lags behind the state of the file system of a running system, so pulling the power cord abruptly may lose up to several seconds' worth of changes as the filesystem rolls back to the last known safe journaling point that was actually recorded on the disk.  This is intentional: if all file system operations waited for the disk to be consistent, your system would be much slower.  The point of journaling filesystems is to cache in-flight file system changes for several seconds and only later catch the disk up to that state, so that a running system has better throughput, which works only as long as you can guarantee that power isn't yanked abruptly.

You probably want to ensure that the guest runs sync between creating a file and taking the snapshot, and/or use the --quiesce flag when creating the snapshot. Both serve to ensure that the state of the disk at the time of the snapshot contains enough file system state that reverting to your snapshot will see the files you created between snapshots.
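
The effect of a missing flush can be sketched with a toy model. This is an assumption-laden illustration, not a description of real page-cache writeback or journaling (a real kernel also flushes on its own after some delay): `ToyGuest` and its methods are hypothetical names, with a dictionary standing in for the in-memory cache and another for the disk that a disk-only snapshot captures.

```python
# Toy model: guest writes land in an in-memory cache first and reach the
# disk only on sync. A disk-only snapshot captures the disk, not the cache.

class ToyGuest:
    def __init__(self):
        self.cache = {}   # dirty data not yet written to disk
        self.disk = {}    # what a disk-only snapshot would capture

    def write(self, name, content):
        self.cache[name] = content

    def sync(self):
        # Flush all dirty data to disk, like running sync in the guest.
        self.disk.update(self.cache)
        self.cache.clear()

    def disk_snapshot(self):
        # Copy only the on-disk state; cached writes are not included.
        return dict(self.disk)


guest = ToyGuest()
guest.write("s1", "hello s1")
unsynced = guest.disk_snapshot()     # taken before any flush

guest.sync()
synced = guest.disk_snapshot()       # taken after the flush

print("s1" in unsynced, "s1" in synced)  # prints: False True
```

In this model the file exists from the guest's point of view as soon as it is written, but only the snapshot taken after the sync contains it.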