Bug 1006122

Summary: [qemu][rhel6]win8-32 qcow2 image damaged when do system_reset during wakeup from s3
Product: Red Hat Enterprise Linux 6
Reporter: guo jiang <jguo>
Component: qemu-kvm
Assignee: Kevin Wolf <kwolf>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.5
CC: acathrow, amit.shah, bcao, bsarathy, chayang, famz, flang, hreitz, juzhang, kwolf, michen, mkenneth, qzhang, stefanha, virt-maint, vrozenfe
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-05 22:15:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 912287
Attachments (Description / Flags):
Screenshot of the guest during wakeup from S3, the state in which system_reset was issued / none
qemu-img map/check output (with debug messages added to check) / none

Description guo jiang 2013-09-10 04:40:25 UTC
Description of problem:
The win8-32 qcow2 image is damaged when system_reset is issued during wakeup from S3.

Version-Release number of selected component (if applicable):
  kernel-2.6.32-416.el6.x86_64
  qemu-kvm-rhev-0.12.1.2-2.398.el6.x86_64
  virtio-win-prewhql-0.1.68
  spice-server-0.12.4-2.el6.x86_64  
  seabios-0.6.1.2-28.el6.x86_64
  vgabios-0.6b-3.7.el6.noarch

How reproducible:
1/2

Steps to Reproduce:
1. Boot the guest with QXL (-spice port=5931,disable-ticketing -vga qxl -global qxl-vga.revision=3).
CLI:
/usr/libexec/qemu-kvm \
-M rhel6.5.0 \
-m 2G \
-smp 2,cores=2 \
-cpu 'SandyBridge' \
-usb \
-device usb-tablet \
-enable-kvm \
-drive file=win8-32.qcow2,format=qcow2,if=none,id=drive-blk,cache=writeback,rerror=stop,werror=stop,serial=disk0 \
-device ide-drive,drive=drive-blk,id=blk0-0-0-0,bootindex=1 \
-netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no \
-device e1000,netdev=hostnet0,mac=fe:23:40:21:22:31,id=net0 \
-uuid 8ff96365-d565-4304-8534-f1282aa90267 \
-no-kvm-pit-reinjection \
-chardev socket,id=111a,path=/tmp/monitor-win8-32,server,nowait \
-mon chardev=111a,mode=readline \
-name win8-32 \
-spice port=5931,disable-ticketing \
-vga qxl \
-global qxl-vga.revision=3 \
-rtc base=localtime,clock=host,driftfix=slew \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \
-monitor stdio

2. Install the QXL driver qxlwddm-0.2-1 in the guest.

3. Suspend the guest to S3.

4. Wake the guest up.

5. While step 4 is in progress, issue system_reset.

6. Repeat steps 3-5 many times.


Actual results:
 After step 6, the win8-32 qcow2 image is damaged.
  qemu-img check output:
    $qemu-img check win8-32-old.qcow2 
      ERROR cluster 343662 refcount=1 reference=2

      1 errors were found on the image.
      Data may be corrupted, or further writes to the image may corrupt it.
      Image end offset: 23912579072
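
For reading the check output above: refcount= is the value stored in the image's refcount metadata, and reference= is the number of references that qemu-img check counted while walking the image. A minimal Python sketch that extracts both numbers from such a line follows. The function name parse_check_line and the returned dictionary are illustrative only and not part of qemu-img; the "Leaked" alternative is included on the assumption that over-counted clusters are reported with the same line layout.

```python
import re

# Matches lines such as "ERROR cluster 343662 refcount=1 reference=2".
LINE_RE = re.compile(
    r"^\s*(ERROR|Leaked) cluster (\d+) refcount=(\d+) reference=(\d+)"
)


def parse_check_line(line):
    """Return the fields of one check line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    kind = m.group(1)
    cluster, refcount, reference = (int(g) for g in m.groups()[1:])
    return {
        "kind": kind,
        "cluster": cluster,
        "refcount": refcount,    # value stored in the image
        "reference": reference,  # references actually found by the check
        # More users than the stored count accounts for: the dangerous case.
        "under_counted": refcount < reference,
    }


info = parse_check_line("      ERROR cluster 343662 refcount=1 reference=2")
print(info)
```

Here the stored refcount (1) is lower than the number of references found (2), which is why the check warns that further writes may corrupt the image.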


Expected results:
No errors are found on the image after issuing system_reset during wakeup from S3.

Additional info:
  win8-32 could not wake up successfully from S3.

Comment 2 guo jiang 2013-09-10 04:45:58 UTC
Created attachment 795820 [details]
Screenshot of the guest during wakeup from S3; system_reset was issued in this state.

Comment 5 Amit Shah 2013-09-10 09:39:31 UTC
system_reset in the qemu monitor is like pressing the reset button on a physical computer, or unplugging the power cord and plugging it back in.  Since there's no guest cooperation, the guest could be in the middle of using the disk or have unwritten data in memory, and all of that is lost when system_reset is invoked.

Not surprising this can cause image corruption.

Comment 6 Qunfang Zhang 2013-09-12 06:01:16 UTC
(In reply to Amit Shah from comment #5)
> system_reset in qemu monitor is like pressing the reset button on a physical
> computer or unplugging the power cord and plugging it back in.  Since
> there's no guest cooperation, the guest could be in the middle of using the
> disk, have unwritten data in memory, and all that will be lost when
> system_reset is invoked.
> 
> Not surprising this can cause image corruption.

Amit, does that mean this is not a bug?

Comment 7 Kevin Wolf 2013-09-12 07:10:04 UTC
This looks very much like a bug, ERRORs reported by qemu-img check are almost
always bugs.

Can you try if this is reproducible on RHEL 7?

Is it really necessary to install the QXL driver or does it happen without it
as well?

Do you still have the image around, so I could have a look at it?

Comment 8 guo jiang 2013-09-12 07:23:41 UTC
(In reply to Kevin Wolf from comment #7)
> This looks very much like a bug, ERRORs reported by qemu-img check are almost
> always bugs.
> 
> Can you try if this is reproducible on RHEL 7?
> 
> Is it really necessary to install the QXL driver or does it happen without it
> as well?
> 
> Do you still have the image around, so I could have a look at it?

Hi, Kevin

I could not reproduce it on the RHEL 6 host, either with or without the QXL driver installed, so I could not analyze the cause of the image corruption. I still have the image and will upload it.

Comment 10 Kevin Wolf 2013-09-12 09:36:28 UTC
This is a case of two data clusters pointing to the same cluster in the image
file, as shown by the following 'qemu-img map' output:

Offset          Length          Mapped to       File
...
0x546d70000     0x10000         0x53e6e0000     win8-32-old.qcow2
...
0x7e13f0000     0x90000         0x53e670000     win8-32-old.qcow2
...

The corrupted cluster (343662 * 64k = 0x53e6e0000) is contained in the "mapped
to" area of both allocations.
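
The arithmetic in this comment can be re-checked with a short Python sketch. The 64 KiB cluster size and the two map rows are taken from the comment itself; the helper names contains and guest_offset_for and the tuple layout are only for illustration.

```python
CLUSTER_SIZE = 64 * 1024  # 64k clusters, as stated in the comment

# (guest offset, length, host offset) rows copied from the 'qemu-img map' output
MAPPINGS = [
    (0x546d70000, 0x10000, 0x53e6e0000),
    (0x7e13f0000, 0x90000, 0x53e670000),
]


def contains(mapping, host_offset):
    """Return True if host_offset lies inside the mapping's host range."""
    _, length, host_start = mapping
    return host_start <= host_offset < host_start + length


def guest_offset_for(mapping, host_offset):
    """Translate a host offset inside the mapping back to a guest offset."""
    guest_start, _, host_start = mapping
    return guest_start + (host_offset - host_start)


corrupted = 343662 * CLUSTER_SIZE
print(hex(corrupted))

# Collect every guest offset that resolves to the corrupted host cluster.
guest_offsets = [
    hex(guest_offset_for(m, corrupted)) for m in MAPPINGS if contains(m, corrupted)
]
print(guest_offsets)
```

Both rows contain the host offset 0x53e6e0000, so two different guest offsets (0x546d70000 and 0x7e1460000) resolve to the same cluster in the image file.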

Comment 11 Kevin Wolf 2013-09-12 09:50:01 UTC
Created attachment 796723 [details]
qemu-img map/check output (with debug messages added to check)

Attaching some qemu-img outputs I gathered on the machine with the broken image.
One is the output of 'qemu-img map' as of current upstream master, the other one
the output of a 'qemu-img check' with added debug output for each reference that
is found and accounted for.

Comment 12 Kevin Wolf 2013-09-12 09:58:30 UTC
Not sure how relevant it is, but from the qemu-img check output:

UPDATE: cluster offset=0x473400000 -> refcount 1
UPDATE: cluster offset=0x473410000 -> refcount 1
UPDATE: cluster offset=0x53e6e0000 -> refcount 1
UPDATE: cluster offset=0x540c90000 -> refcount 1
UPDATE: cluster offset=0x540ca0000 -> refcount 1
...
UPDATE: cluster offset=0x53e6c0000 -> refcount 1
UPDATE: cluster offset=0x53e6d0000 -> refcount 1
UPDATE: cluster offset=0x53e6e0000 -> refcount 2
UPDATE: cluster offset=0x53e6f0000 -> refcount 1
UPDATE: cluster offset=0x53e740000 -> refcount 1

This shows that one of the allocations is a single cluster allocation, whereas
the other one is part of a longer contiguous allocation.
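
The counting behind this debug output can be sketched in Python: each UPDATE line records one more reference found to a host cluster, so any offset that appears more than once is referenced by more than one mapping. The line format is the one from the ad-hoc debug patch quoted above, not standard qemu-img output, and count_references is a hypothetical helper.

```python
import re
from collections import Counter

UPDATE_RE = re.compile(r"UPDATE: cluster offset=(0x[0-9a-fA-F]+) -> refcount (\d+)")

# Lines copied from the debug output in this comment; "..." marks elided output.
DEBUG_LINES = """\
UPDATE: cluster offset=0x473400000 -> refcount 1
UPDATE: cluster offset=0x473410000 -> refcount 1
UPDATE: cluster offset=0x53e6e0000 -> refcount 1
UPDATE: cluster offset=0x540c90000 -> refcount 1
UPDATE: cluster offset=0x540ca0000 -> refcount 1
...
UPDATE: cluster offset=0x53e6c0000 -> refcount 1
UPDATE: cluster offset=0x53e6d0000 -> refcount 1
UPDATE: cluster offset=0x53e6e0000 -> refcount 2
UPDATE: cluster offset=0x53e6f0000 -> refcount 1
UPDATE: cluster offset=0x53e740000 -> refcount 1
"""


def count_references(lines):
    """Count how many times each host cluster offset is referenced."""
    counts = Counter()
    for line in lines:
        m = UPDATE_RE.search(line)
        if m:  # skips the "..." elision and any unrelated lines
            counts[int(m.group(1), 16)] += 1
    return counts


counts = count_references(DEBUG_LINES.splitlines())
shared = [hex(offset) for offset, n in counts.items() if n > 1]
print(shared)
```

On the excerpt above, the only offset counted twice is 0x53e6e0000, matching the cluster reported by qemu-img check.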

Comment 15 Ademar Reis 2014-06-05 22:15:40 UTC
S3/S4 support is tech-preview in RHEL6 and it'll be promoted to fully supported
at some point, but only in RHEL7.

Therefore we're closing all S3/S4 related bugs in RHEL6. New bugs will be
considered only if they're regressions or break some important use-case or
certification.

RHEL7 is being more extensively tested and effort from QE is underway in
certifying that this particular bug is not present there.

Please reopen with a justification if you believe this bug should not be
closed. We'll consider them on a case-by-case basis following a best effort
approach.


Thank you.