Bug 998443 - _highWrite should not extend a drive if the highest allocated extent is outside the capacity of the volume.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64 Linux
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Federico Simoncelli
QA Contact: Leonid Natapov
Whiteboard: storage
Keywords: ZStream
Depends On:
Blocks: 1032106 3.3snap3
Reported: 2013-08-19 06:30 EDT by Lee Yarwood
Modified: 2016-02-10 11:43 EST (History)

See Also:
Fixed In Version: is24
Doc Type: Bug Fix
Doc Text:
QEMU sometimes returned a value for the highest allocated extent of a volume that was greater than the capacity of the qcow2 volume. In such a case VDSM attempted to extend the volume with every run of _highWrite. It did not ensure that the highest allocated extent was within the capacity of the volume before proceeding. Both _highWrite and _onAbnormalStop now share the same logic about the volume extension.
Story Points: ---
Clone Of:
Clones: 1032106
Environment:
Last Closed: 2014-01-21 11:13:09 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
abaron: Triaged+


Attachments
corrupt.stp reproducer (677 bytes, text/plain)
2013-08-20 05:25 EDT, Lee Yarwood
corrupt.stp reproducer (677 bytes, text/plain)
2013-10-16 09:51 EDT, Lee Yarwood


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 429853 None None None Never
oVirt gerrit 21150 None None None Never
oVirt gerrit 21382 None None None Never
oVirt gerrit 21383 None None None Never

Description Lee Yarwood 2013-08-19 06:30:11 EDT
Description of problem:

QEMU may return a value for the highest allocated extent of a volume that is greater than the capacity of the (corrupt) qcow2 volume (see BZ#993611 for an example). In that case VDSM attempts to extend the volume on every run of _highWrite (BZ#568098 C#25).

Although VDSM is not responsible for the corrupt qcow2 volume, it should avoid entering an extend loop by ensuring that the highest allocated extent is within the capacity of the volume before proceeding.

Version-Release number of selected component (if applicable):
vdsm-bootstrap-4.10.2-24.1.el6ev

How reproducible:
Always with a corrupt qcow2 volume where QEMU returns an alloc value greater than the capacity of the device.

Steps to Reproduce:
Looking into a simple reproducer now.

Actual results:
Each run of _highWrite against the volume results in the volume being extended.

Expected results:
_highWrite continues to the next drive without extending when the alloc value is greater than the capacity of the volume, resulting in the guest pausing.

Ideally, errors should be logged to the engine through another check within _onAbnormalStop.

Additional info:
Comment 1 Federico Simoncelli 2013-08-20 04:38:36 EDT
(In reply to Lee Yarwood from comment #0)
> Description of problem:
> While VDSM is not responsible for the corrupt qcow2 volume behind this it
> should avoid entering an extend loop by ensuring that the highest allocated
> extent is within the capacity of the volume before proceeding.

Let's keep in mind that qcow2 has some overhead that grows linearly with the image size. I think that somewhere else in the code we already estimate that as +10% (even though it's probably far less).
Comment 2 Lee Yarwood 2013-08-20 05:22:50 EDT
(In reply to Federico Simoncelli from comment #1)
> (In reply to Lee Yarwood from comment #0)
> > Description of problem:
> > While VDSM is not responsible for the corrupt qcow2 volume behind this it
> > should avoid entering an extend loop by ensuring that the highest allocated
> > extent is within the capacity of the volume before proceeding.
> 
> Let's keep in mind that qcow2 has some overhead that grows linearly with the
> image size. I think that somewhere else in the code we already estimate that
> as +10% (even though it's probably far less).

AFAIK capacity should refer to the virtual size of the volume and not the physical size. With alloc being somewhere within the bounds of the virtual size / capacity of the volume.
Comment 3 Lee Yarwood 2013-08-20 05:25:34 EDT
Created attachment 788397 [details]
corrupt.stp reproducer

Attached is a systemtap script that modifies the value of wr_highest_sector for a given volume. Using this script as shown below, I am able to cause VDSM to enter the extend loop until all space on the given domain is taken up.

# stap -g corrupt.stp "/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/ef3f45b0-8c40-46e9-ad12-7311c834502d/images/eb1c951b-2fab-4e43-ae55-778e1652f6de/f64e24f3-f023-4080-9e2d-332e52e4069b" 9223090561878130176
Setting wr_highest_sector to 0x7fff00000000fe00 for device /rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/ef3f45b0-8c40-46e9-ad12-7311c834502d/images/eb1c951b-2fab-4e43-ae55-778e1652f6de/f64e24f3-f023-4080-9e2d-332e52e4069b
qemu-kvm(4676): 0x7f7d3991e639 /rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/ef3f45b0-8c40-46e9-ad12-7311c834502d/images/eb1c951b-2fab-4e43-ae55-778e1652f6de/f64e24f3-f023-4080-9e2d-332e52e4069b
qemu-kvm(4676): 0x7f7d3991e639: /rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/ef3f45b0-8c40-46e9-ad12-7311c834502d/images/eb1c951b-2fab-4e43-ae55-778e1652f6de/f64e24f3-f023-4080-9e2d-332e52e4069b bs->wr_highest_sector - old value: 0x8ba4ff
qemu-kvm(4676): 0x7f7d3991e639: /rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/ef3f45b0-8c40-46e9-ad12-7311c834502d/images/eb1c951b-2fab-4e43-ae55-778e1652f6de/f64e24f3-f023-4080-9e2d-332e52e4069b bs->wr_highest_sector - new value: 0x7fff00000000fe00

Thread-38874::INFO::2013-08-20 05:14:43,216::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 5368709120 capacity: 10737418240, alloc: 18302628885666988032, phys: 5368709120
Thread-38874::INFO::2013-08-20 05:14:45,269::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 5368709120 capacity: 10737418240, alloc: 18302628885666988032, phys: 6442450944
Thread-38874::INFO::2013-08-20 05:14:47,302::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 6442450944 capacity: 10737418240, alloc: 18302628885666988032, phys: 6442450944
Thread-38874::INFO::2013-08-20 05:14:49,310::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 6442450944 capacity: 10737418240, alloc: 18302628885666988032, phys: 6442450944
Thread-38874::INFO::2013-08-20 05:14:51,318::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 6442450944 capacity: 10737418240, alloc: 18302628885666988032, phys: 7516192768
Thread-38874::INFO::2013-08-20 05:14:53,325::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 7516192768 capacity: 10737418240, alloc: 18302628885666988032, phys: 7516192768
Thread-38874::INFO::2013-08-20 05:14:55,331::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 7516192768 capacity: 10737418240, alloc: 18302628885666988032, phys: 7516192768
Thread-38874::INFO::2013-08-20 05:14:57,338::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 7516192768 capacity: 10737418240, alloc: 18302628885666988032, phys: 8589934592
Thread-38874::INFO::2013-08-20 05:14:59,345::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 7516192768 capacity: 10737418240, alloc: 18302628885666988032, phys: 8589934592
Thread-38874::INFO::2013-08-20 05:15:01,357::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 7516192768 capacity: 10737418240, alloc: 18302628885666988032, phys: 8589934592
Thread-38874::INFO::2013-08-20 05:15:03,363::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 8589934592 capacity: 10737418240, alloc: 18302628885666988032, phys: 8589934592
Thread-38874::INFO::2013-08-20 05:15:05,369::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 8589934592 capacity: 10737418240, alloc: 18302628885666988032, phys: 9663676416
Thread-38874::INFO::2013-08-20 05:15:07,376::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 8589934592 capacity: 10737418240, alloc: 18302628885666988032, phys: 9663676416
Thread-38874::INFO::2013-08-20 05:15:09,383::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 9663676416 capacity: 10737418240, alloc: 18302628885666988032, phys: 9663676416
Thread-38874::INFO::2013-08-20 05:15:11,391::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 9663676416 capacity: 10737418240, alloc: 18302628885666988032, phys: 9663676416
Thread-38874::INFO::2013-08-20 05:15:13,397::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 9663676416 capacity: 10737418240, alloc: 18302628885666988032, phys: 10737418240
Thread-38874::INFO::2013-08-20 05:15:15,403::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 10737418240 capacity: 10737418240, alloc: 18302628885666988032, phys: 10737418240
Thread-38874::INFO::2013-08-20 05:15:17,423::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 10737418240 capacity: 10737418240, alloc: 18302628885666988032, phys: 10737418240
Thread-38874::INFO::2013-08-20 05:15:19,465::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 10737418240 capacity: 10737418240, alloc: 18302628885666988032, phys: 11811160064
Thread-38874::INFO::2013-08-20 05:15:21,474::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 11811160064 capacity: 10737418240, alloc: 18302628885666988032, phys: 11811160064
Thread-38874::INFO::2013-08-20 05:15:23,483::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 11811160064 capacity: 10737418240, alloc: 18302628885666988032, phys: 11811160064
Thread-38874::INFO::2013-08-20 05:15:25,492::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 11811160064 capacity: 10737418240, alloc: 18302628885666988032, phys: 12884901888
Thread-38874::INFO::2013-08-20 05:15:27,499::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 12884901888 capacity: 10737418240, alloc: 18302628885666988032, phys: 12884901888
Thread-38874::INFO::2013-08-20 05:15:29,506::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 12884901888 capacity: 10737418240, alloc: 18302628885666988032, phys: 12884901888
Thread-38874::INFO::2013-08-20 05:15:31,518::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 12884901888 capacity: 10737418240, alloc: 18302628885666988032, phys: 13958643712
Thread-38874::INFO::2013-08-20 05:15:33,525::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 13958643712 capacity: 10737418240, alloc: 18302628885666988032, phys: 13958643712
Thread-38874::INFO::2013-08-20 05:15:35,531::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 13958643712 capacity: 10737418240, alloc: 18302628885666988032, phys: 13958643712
Thread-38874::INFO::2013-08-20 05:15:37,541::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 13958643712 capacity: 10737418240, alloc: 18302628885666988032, phys: 15032385536
Thread-38874::INFO::2013-08-20 05:15:39,550::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 15032385536 capacity: 10737418240, alloc: 18302628885666988032, phys: 15032385536
Thread-38874::INFO::2013-08-20 05:15:41,560::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 15032385536 capacity: 10737418240, alloc: 18302628885666988032, phys: 15032385536
Thread-38874::INFO::2013-08-20 05:15:43,568::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 15032385536 capacity: 10737418240, alloc: 18302628885666988032, phys: 16106127360
Thread-38874::INFO::2013-08-20 05:15:45,577::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 16106127360 capacity: 10737418240, alloc: 18302628885666988032, phys: 16106127360
Thread-38874::INFO::2013-08-20 05:15:47,622::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 16106127360 capacity: 10737418240, alloc: 18302628885666988032, phys: 16106127360
Thread-38874::INFO::2013-08-20 05:15:49,628::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 16106127360 capacity: 10737418240, alloc: 18302628885666988032, phys: 17179869184
Thread-38874::INFO::2013-08-20 05:15:51,635::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 17179869184 capacity: 10737418240, alloc: 18302628885666988032, phys: 17179869184
Thread-38874::INFO::2013-08-20 05:15:53,642::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 17179869184 capacity: 10737418240, alloc: 18302628885666988032, phys: 17179869184
Thread-38874::INFO::2013-08-20 05:15:55,648::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 17179869184 capacity: 10737418240, alloc: 18302628885666988032, phys: 18253611008
Thread-38874::INFO::2013-08-20 05:15:57,655::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 18253611008 capacity: 10737418240, alloc: 18302628885666988032, phys: 18253611008
Thread-38874::INFO::2013-08-20 05:15:59,661::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 18253611008 capacity: 10737418240, alloc: 18302628885666988032, phys: 18253611008
Thread-38874::INFO::2013-08-20 05:16:01,673::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 18253611008 capacity: 10737418240, alloc: 18302628885666988032, phys: 19327352832
Thread-38874::INFO::2013-08-20 05:16:03,680::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 19327352832 capacity: 10737418240, alloc: 18302628885666988032, phys: 19327352832
Thread-38874::INFO::2013-08-20 05:16:05,687::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 19327352832 capacity: 10737418240, alloc: 18302628885666988032, phys: 19327352832
Thread-38874::INFO::2013-08-20 05:16:07,693::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 19327352832 capacity: 10737418240, alloc: 18302628885666988032, phys: 20401094656
Thread-38874::INFO::2013-08-20 05:16:09,701::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 20401094656 capacity: 10737418240, alloc: 18302628885666988032, phys: 20401094656
Thread-38874::INFO::2013-08-20 05:16:11,708::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 20401094656 capacity: 10737418240, alloc: 18302628885666988032, phys: 21474836480
Thread-38874::INFO::2013-08-20 05:16:13,731::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 20401094656 capacity: 10737418240, alloc: 18302628885666988032, phys: 21474836480
Thread-38874::INFO::2013-08-20 05:16:15,737::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 21474836480 capacity: 10737418240, alloc: 18302628885666988032, phys: 21474836480
Thread-38874::INFO::2013-08-20 05:16:17,750::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 21474836480 capacity: 10737418240, alloc: 18302628885666988032, phys: 21474836480
Thread-38874::INFO::2013-08-20 05:16:19,756::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 21474836480 capacity: 10737418240, alloc: 18302628885666988032, phys: 21474836480
Thread-38874::INFO::2013-08-20 05:16:21,764::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 21474836480 capacity: 10737418240, alloc: 18302628885666988032, phys: 21474836480
Thread-38874::INFO::2013-08-20 05:16:23,785::libvirtvm::141::vm.Vm::(_highWrite) vmId=`05960e70-83e3-4b88-9d6f-16e8c22fe016`::ef3f45b0-8c40-46e9-ad12-7311c834502d/f64e24f3-f023-4080-9e2d-332e52e4069b apparent: 21474836480 capacity: 10737418240, alloc: 18302628885666988032, phys: 21474836480
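
The growth pattern in the log above can be checked mechanically. The following is an illustrative sketch only, not part of VDSM: the helper names parse_sizes and phys_steps are hypothetical, the regular expression is written against the field layout visible in these log lines, and the sample lines have their thread/vmId prefix shortened to "..." for brevity.

```python
import re

# Matches the size fields of a _highWrite INFO line as they appear in the log above.
FIELDS = re.compile(r"apparent: (\d+) capacity: (\d+), alloc: (\d+), phys: (\d+)")


def parse_sizes(line):
    """Return the four size fields of one log line as ints, or None if absent."""
    match = FIELDS.search(line)
    if match is None:
        return None
    apparent, capacity, alloc, phys = (int(g) for g in match.groups())
    return {"apparent": apparent, "capacity": capacity, "alloc": alloc, "phys": phys}


def phys_steps(lines):
    """Return the phys values in order, dropping consecutive repeats."""
    steps = []
    for line in lines:
        sizes = parse_sizes(line)
        if sizes and (not steps or sizes["phys"] != steps[-1]):
            steps.append(sizes["phys"])
    return steps


# First four entries of the log above, prefix shortened.
sample = [
    "... apparent: 5368709120 capacity: 10737418240, alloc: 18302628885666988032, phys: 5368709120",
    "... apparent: 5368709120 capacity: 10737418240, alloc: 18302628885666988032, phys: 6442450944",
    "... apparent: 6442450944 capacity: 10737418240, alloc: 18302628885666988032, phys: 6442450944",
    "... apparent: 6442450944 capacity: 10737418240, alloc: 18302628885666988032, phys: 7516192768",
]

steps = phys_steps(sample)
deltas = [b - a for a, b in zip(steps, steps[1:])]
print(deltas)  # each extension adds 1073741824 bytes (1 GiB)
```

Run against the full log, the same helper shows phys growing from 5368709120 to 21474836480 in 1 GiB steps while alloc stays fixed at 18302628885666988032.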
Comment 4 Federico Simoncelli 2013-08-20 06:03:10 EDT
(In reply to Lee Yarwood from comment #2)
> (In reply to Federico Simoncelli from comment #1)
> > (In reply to Lee Yarwood from comment #0)
> > > Description of problem:
> > > While VDSM is not responsible for the corrupt qcow2 volume behind this it
> > > should avoid entering an extend loop by ensuring that the highest allocated
> > > extent is within the capacity of the volume before proceeding.
> > 
> > Let's keep in mind that qcow2 has some overhead that grows linearly with the
> > image size. I think that somewhere else in the code we already estimate that
> > as +10% (even though it's probably far less).
> 
> AFAIK capacity should refer to the virtual size of the volume and not the
> physical size. With alloc being somewhere within the bounds of the virtual
> size / capacity of the volume.

Yes, if (in the semantics we're using) capacity == virtual size (what the guest sees as the disk size), then using it as an upper bound is wrong, because the allocated space required (physical LV size) should be higher in order to also contain the qcow2 overhead (roughly +10%, which is a generous estimate).
The physical size is irrelevant because it's the current LV size.

This means that (wr_highest_sector * 512) must be < (capacity * 1.1).
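
As a worked example of this bound, the sketch below applies it to the values from comment 3. It is an illustration only: the function names are hypothetical, the 512-byte sector size and the 1.1 factor are taken from the sentence above, and the 64-bit wraparound is an assumption that happens to reproduce the alloc value shown in the _highWrite log. Integer arithmetic (alloc * 10 < capacity * 11) stands in for capacity * 1.1 to avoid float rounding.

```python
SECTOR_SIZE = 512  # bytes per sector, as used in the bound above


def alloc_in_bytes(wr_highest_sector):
    """Convert a sector number to bytes, wrapping like an unsigned 64-bit value.

    The wraparound is an assumption; it reproduces the alloc value in the log.
    """
    return (wr_highest_sector * SECTOR_SIZE) % 2**64


def within_overhead_bound(alloc, capacity):
    """True when alloc < capacity * 1.1, computed with integers only."""
    return alloc * 10 < capacity * 11


capacity = 10737418240  # from the _highWrite log in comment 3

old_alloc = alloc_in_bytes(0x8ba4ff)            # value before corrupt.stp ran
new_alloc = alloc_in_bytes(0x7fff00000000fe00)  # value set by corrupt.stp

print(old_alloc, within_overhead_bound(old_alloc, capacity))
print(new_alloc, within_overhead_bound(new_alloc, capacity))
```

With these inputs the original sector value stays inside the bound, while the corrupted one yields 18302628885666988032 bytes, the same alloc reported in the log, and fails the check.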
Comment 5 Federico Simoncelli 2013-08-20 06:48:43 EDT
Probably it's as easy as adding an additional >= 0 check here:

 def _highWrite(self):
     ...
     capacity, alloc, physical = self._vm._dom.blockInfo(vmDrive.path, 0)

     if physical - alloc >= vmDrive.watermarkLimit:
         continue
     ...

Since physical - alloc < 0 is impossible.
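
A minimal standalone sketch of that idea follows. It is not the VDSM implementation or the merged fix: should_extend is a hypothetical helper, and the 512 MiB watermark is an assumed value chosen only to make the example concrete. It treats a negative physical - alloc as invalid and skips the extension, which corresponds to the `continue` branch in the snippet above.

```python
def should_extend(alloc, physical, watermark_limit):
    """Decide whether a drive needs extending.

    Extend only when the free space (physical - alloc) is non-negative and
    below the watermark. A negative result is treated as an invalid alloc
    value, so no extension is requested.
    """
    free = physical - alloc
    if free < 0:
        # alloc lies beyond the physical size: do not extend
        return False
    return free < watermark_limit


WATERMARK = 512 * 1024 * 1024  # assumed value, for illustration only

# Values from the first _highWrite log line in comment 3 (corrupt alloc).
print(should_extend(18302628885666988032, 5368709120, WATERMARK))

# Hypothetical healthy drive that is running low on free space.
print(should_extend(5000000000, 5368709120, WATERMARK))
```

The first call returns False because physical - alloc is negative, so the drive is skipped instead of being extended on every run; the second returns True because the remaining free space is below the assumed watermark.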
Comment 6 Lee Yarwood 2013-08-21 06:32:26 EDT
(In reply to Federico Simoncelli from comment #5)
> Probably it's as easy as adding an additional >= 0 check here:
> 
>  def _highWrite(self):
>      ...
>      capacity, alloc, physical = self._vm._dom.blockInfo(vmDrive.path, 0)
> 
>      if physical - alloc >= vmDrive.watermarkLimit:
>          continue
>      ...
> 
> Since physical - alloc < 0 is impossible.

So I've tested this approach downstream, continuing to the next volume if physical - alloc < 0.

I had assumed QEMU would pass an ENOSPC error back through libvirt once the volume becomes fully allocated and the guest is paused. However, thus far I've only seen QEMU throw EINVAL errors back through libvirt to VDSM.

Haim previously found this in 6.2 with BZ#710176 but it was closed out as WORKSFORME. I'm going to review the QMP traffic for any clues and will reopen the BZ later today.
Comment 7 Lee Yarwood 2013-10-16 09:51:54 EDT
Created attachment 812936 [details]
corrupt.stp reproducer

A fresh reproducer for qemu-kvm-rhev-0.12.1.2-2.355.el6_4.9.x86_64. As before, install stap and the kernel and qemu-kvm-rhev debuginfo packages, then run:

# stap -g corrupt.stp "/full/rhev/path/to/volume" ${sector_outside_of_virtual_size}
Comment 12 Leonid Natapov 2013-11-27 11:48:02 EST
Sergey, how can I test it?
Comment 13 Sergey Gotliv 2013-11-27 17:47:26 EST
Leonid,

1. Create and run VM with qcow2 volume
2. Lee provided a script (see comment#3) to corrupt that volume. Please, run it.
3. Verify that VM is paused.
Comment 15 Leonid Natapov 2013-12-12 08:36:48 EST
Tested on is26 with the corrupt.stp script. The VM is paused after the script is executed.
Comment 16 errata-xmlrpc 2014-01-21 11:13:09 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html
