Bug 1176673

Summary: [RHEL 7.1] After live storage migration on block storage, vdsm extends migrated drive using all free space in the VG

Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.5.0
Hardware: Unspecified
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Ori Gofen <ogofen>
Assignee: Nir Soffer <nsoffer>
QA Contact: Ori Gofen <ogofen>
CC: acanan, adahms, alitke, amureini, bazulay, bhamrick, cmestreg, cww, eblake, fromani, gklein, lbopf, lpeer, lsurette, michele, mkalinin, nsoffer, ogofen, rlocke, scohen, vanhoof, yeylon, ykaul, ylavi
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Keywords: AutomationBlocker, ZStream
Doc Type: Bug Fix
Clone Of: 1174791
Clones: 1195461 1196049 1197615 1198128 (view as bug list)
Last Closed: 2016-03-09 19:27:55 UTC
Type: Bug
oVirt Team: Storage
Bug Depends On: 1091094, 1174791, 1196066, 1196067
Bug Blocks: 1035038, 1082754, 1196049, 1197615, 1198128
Comment 1
Allon Mureinik
2014-12-23 08:10:11 UTC
libvirt-daemon-driver-network-1.2.8-10.el7.x86_64
libvirt-daemon-driver-nodedev-1.2.8-10.el7.x86_64
libvirt-lock-sanlock-1.2.8-10.el7.x86_64
libvirt-client-1.2.8-10.el7.x86_64
libvirt-daemon-driver-qemu-1.2.8-10.el7.x86_64
libvirt-daemon-driver-secret-1.2.8-10.el7.x86_64
libvirt-daemon-config-nwfilter-1.2.8-10.el7.x86_64
libvirt-python-1.2.8-6.el7.x86_64
libvirt-daemon-1.2.8-10.el7.x86_64
libvirt-daemon-driver-interface-1.2.8-10.el7.x86_64
libvirt-daemon-kvm-1.2.8-10.el7.x86_64
libvirt-daemon-driver-storage-1.2.8-10.el7.x86_64
libvirt-daemon-driver-nwfilter-1.2.8-10.el7.x86_64

(In reply to Allon Mureinik from comment #1)
> Ori, iiuc, the version information above (except for VDSM) pertains to a 7.0

libvirt version on the rhel7.0 host:

# rpm -q libvirt-daemon
libvirt-daemon-1.1.1-29.el7_0.4.x86_64

*** Bug 1186348 has been marked as a duplicate of this bug. ***

Patch 37726 does not fix the root cause of this issue, but at least it limits the damage. Instead of growing until the VG is full, the device will grow to the virtual size, making the device practically pre-allocated.

The description of this bug contains some mistakes, so I'll briefly sum things up here to avoid any misunderstandings. This bug deals with live migration of a COW sparse disk under heavy I/O (for example, installing an OS during live storage migration). The result is that the live snapshot volume on the target domain loops through LV extensions, using up all free space on the target domain. The behavior on the source domain seems correct; the LVs are successfully removed.

I'm trying to see if this is a libvirt problem by isolating it down to a smaller test case. Is the problem here that you are doing some action that creates a snapshot, and libvirt then treats the <disk> as a file instead of a block device? Libvirt reports allocation differently for files (based on file size) than for block devices (based on querying qemu for the highest sector written), so anything that accidentally converts a block device to a file could cause vdsm to see a much larger allocation number, and enter a death spiral of trying to enlarge the device to account for the larger allocation. To know for sure, I'd need to know what the domain XML looks like before and after the point where it goes into the allocation growth spiral, and it would also be nice to have a trace of what libvirt commands are being used (especially if there is a tight loop of querying disk statistics to learn the allocation number).

Created attachment 994165 [details]
vdsm debug log on fedora 21
This log shows the libvirt domain XML during the live storage migration flow.
In this flow:
1. We create a snapshot on block storage
2. We use blockRebase to mirror the snapshot to a disk on another block storage domain
3. Finally, we use blockAbort to pivot to the new disk
To find the domain XML, search for _logDomainXML.
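Step 1 of the flow above can be sketched as building the snapshot XML that is passed to libvirt. This is a minimal illustrative sketch, not vdsm's actual code; the function name and example path are hypothetical. The one grounded detail is that the new volume path is carried in both dev= and file= attributes, for compatibility with older libvirt versions:

```python
# Sketch: build the <domainsnapshot> XML for an external block snapshot.
# Function name and path are illustrative, not vdsm's implementation.
import xml.etree.ElementTree as ET

def make_snapshot_xml(disk_name, volume_path):
    root = ET.Element("domainsnapshot")
    disks = ET.SubElement(root, "disks")
    disk = ET.SubElement(
        disks, "disk", name=disk_name, snapshot="external", type="block")
    # The new volume path goes into both dev= and file= so that older
    # libvirt versions that do not understand type="block" still find it.
    ET.SubElement(disk, "source",
                  dev=volume_path, file=volume_path, type="block")
    return ET.tostring(root, encoding="unicode")

snapshot_xml = make_snapshot_xml("vda", "/dev/vg-example/lv-example")
```

The resulting XML has the same shape as the snapshot XML logged in the attachment (see the next comment).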
(In reply to Eric Blake from comment #7)
> for the larger allocation. To know for sure, I'd need to know what the
> domain XML looks like before and after the point where it goes into the
> allocation growth spiral

Please check the domain XML in attachment 994165 [details] - look for _logDomainXML.

1. Before snapshotCreateXML:

<disk type='block' device='disk' snapshot='no'>
  <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
  <source dev='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/54ba4f22-ee1a-4851-9710-0b242ebdc289/images/04e5c687-b7e4-4968-9f45-d359aa98938b/4b4e65a2-22dc-4166-a097-fc9662c80d11'/>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
  <serial>04e5c687-b7e4-4968-9f45-d359aa98938b</serial>
  <boot order='1'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>

2. Snapshot XML:

<domainsnapshot>
  <disks>
    <disk name="vda" snapshot="external" type="block">
      <source dev="/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/54ba4f22-ee1a-4851-9710-0b242ebdc289/images/04e5c687-b7e4-4968-9f45-d359aa98938b/466e7cf0-ee80-499d-8d2a-c05e605a07cc" file="/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/54ba4f22-ee1a-4851-9710-0b242ebdc289/images/04e5c687-b7e4-4968-9f45-d359aa98938b/466e7cf0-ee80-499d-8d2a-c05e605a07cc" type="block"/>
    </disk>
  </disks>
</domainsnapshot>

(We are using both dev= and file= as suggested by you to support older libvirt versions which do not support type=block.)

3. After snapshotCreateXML:

<disk type='block' device='disk' snapshot='no'>
  <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
  <source dev='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/54ba4f22-ee1a-4851-9710-0b242ebdc289/images/04e5c687-b7e4-4968-9f45-d359aa98938b/466e7cf0-ee80-499d-8d2a-c05e605a07cc'/>
  <backingStore type='block' index='1'>
    <format type='qcow2'/>
    <source dev='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/54ba4f22-ee1a-4851-9710-0b242ebdc289/images/04e5c687-b7e4-4968-9f45-d359aa98938b/../04e5c687-b7e4-4968-9f45-d359aa98938b/4b4e65a2-22dc-4166-a097-fc9662c80d11'/>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <serial>04e5c687-b7e4-4968-9f45-d359aa98938b</serial>
  <boot order='1'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>

4. After blockRebase:

<disk type='block' device='disk' snapshot='no'>
  <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
  <source dev='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/54ba4f22-ee1a-4851-9710-0b242ebdc289/images/04e5c687-b7e4-4968-9f45-d359aa98938b/466e7cf0-ee80-499d-8d2a-c05e605a07cc'/>
  <backingStore type='block' index='1'>
    <format type='qcow2'/>
    <source dev='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/54ba4f22-ee1a-4851-9710-0b242ebdc289/images/04e5c687-b7e4-4968-9f45-d359aa98938b/../04e5c687-b7e4-4968-9f45-d359aa98938b/4b4e65a2-22dc-4166-a097-fc9662c80d11'/>
    <backingStore/>
  </backingStore>
  <mirror type='file' file='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/661f1926-04cc-405a-85eb-802563b48ed3/images/04e5c687-b7e4-4968-9f45-d359aa98938b/466e7cf0-ee80-499d-8d2a-c05e605a07cc' format='qcow2' job='copy'>
    <format type='qcow2'/>
    <source file='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/661f1926-04cc-405a-85eb-802563b48ed3/images/04e5c687-b7e4-4968-9f45-d359aa98938b/466e7cf0-ee80-499d-8d2a-c05e605a07cc'/>
  </mirror>
  <target dev='vda' bus='virtio'/>
  <serial>04e5c687-b7e4-4968-9f45-d359aa98938b</serial>
  <boot order='1'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>

Note the <mirror type='file' ...> element.

5. After blockAbort:

<disk type='file' device='disk' snapshot='no'>
  <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
  <source file='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/661f1926-04cc-405a-85eb-802563b48ed3/images/04e5c687-b7e4-4968-9f45-d359aa98938b/466e7cf0-ee80-499d-8d2a-c05e605a07cc'/>
  <backingStore type='block' index='1'>
    <format type='qcow2'/>
    <source dev='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/661f1926-04cc-405a-85eb-802563b48ed3/images/04e5c687-b7e4-4968-9f45-d359aa98938b/../04e5c687-b7e4-4968-9f45-d359aa98938b/4b4e65a2-22dc-4166-a097-fc9662c80d11'/>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <serial>04e5c687-b7e4-4968-9f45-d359aa98938b</serial>
  <boot order='1'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>

The disk type was converted to file, explaining the bad blockInfo readings.

The attached patches are a partial fix that limits the damage. For the real fix we are blocked on libvirt bug 1195461.

The new patch (http://gerrit.ovirt.org/38142) fixes the root cause, depending on fixed versions of libvirt and libvirt-python.

The current proposed doctext is:

"""
Libvirt 1.2.8 introduced a regression where the disk type is converted from "block" to "file" after live storage migration. This breaks the disk extension logic, leading to unwanted extension using all free space in the storage domain. libvirt-daemon-1.2.8-16.el7_1.1 and libvirt-python-1.2.8-6.el7_1.1 fixed this issue by adding a new flag. RHEV-3.5.0-2 uses the new flag when available, fixing this issue. However, the fix in RHEV-3.5.0-2 is effective only when using fixed versions of libvirt-daemon and libvirt-python.
Do not perform live storage migration on block storage in RHEL 7.1 unless you have the fixed versions of libvirt-daemon (libvirt-daemon-1.2.8-16.el7_1.1) and libvirt-python (libvirt-python-1.2.8-6.el7_1.1). These versions should be available as a zero-day async release with RHEL 7.1.
"""

Most of this is no longer relevant in RHEV 3.6.0. RHEV 3.6.0 requires libvirt and libvirt-python versions with the relevant fix, so this message becomes a non-issue. Should I set requires-doctext-?

Hi Allon,

Thank you for the needinfo request.

Shall I add this bug to the release notes for the RHEV 3.5.1 release? The flag is already set to '?', so we can review the text and add it in for you.

Also, understood that most of this will not be relevant for 3.6.0.

Kind regards,

Andrew

(In reply to Andrew Dahms from comment #15)
> Shall I add this bug to the release notes for the RHEV 3.5.1 release? The
> flag is already set to '?', so we can review the text and add it in for you.
>
> Also, understood that most of this will not be relevant for 3.6.0.

The current text is no longer relevant for either 3.6.0 or 3.5.1. Can you please remove it from both release notes?

Hi Allon,

Understood, and thank you for the clarification. Done and done!

Please let me know if there is anything else I can do for you.

Kind regards,

Andrew

Verified on downstream.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html
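For reference, the damage-limiting behavior of patch 37726 discussed above (grow toward the virtual size instead of until the VG is exhausted) can be sketched as a pure decision function. The function name, the watermark, and the chunk size are illustrative assumptions, not vdsm's actual code:

```python
def next_extension_size(allocation, current_size, virtual_size,
                        watermark=512 * 2**20, chunk=2**30):
    """Return the new LV size in bytes, or None if no extension is needed.

    Illustrative sketch, not vdsm's implementation. Extends by `chunk`
    once free space inside the LV drops below `watermark`, but never
    past `virtual_size` -- so a runaway allocation reading (e.g. after
    the disk type flipped from 'block' to 'file') at worst makes the
    volume fully pre-allocated instead of consuming the whole VG.
    """
    if current_size - allocation >= watermark:
        return None  # enough headroom, nothing to do
    if current_size >= virtual_size:
        return None  # already at the cap, never grow past virtual size
    return min(current_size + chunk, virtual_size)
```

With a bogus file-based allocation reading far above the real usage, repeated calls still stop once the LV reaches the virtual size.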