Description of problem:
Deleting a snapshot disk while the VM is live causes lvextend to loop and use up all available space.

Version-Release number of selected component (if applicable):
rhevm-3.5.1-0.1.el6ev.noarch
vdsm-4.16.12-2.el7ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a VM with 2 block disks
2. Create a snapshot
3. Start the VM
4. Via the Storage > Disk Snapshot tab, delete one of the snapshot disks
5. Delete the second snapshot disk; the storage domain is then reported as having run out of disk space

Actual results:
The storage domain ran out of disk space due to an lvextend exception.

Expected results:
The lvextend exception should not be reached.

Additional info:

engine.log.........................................................................

2015-03-03 18:36:21,262 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-36) [1804c42c] Correlation ID: 1804c42c, Job ID: c0a7dab4-9c77-463a-9a11-f27b1335dcb9, Call Stack: null, Custom Event ID: -1, Message: Disk '1177138_Disk1' from Snapshot(s) 'snap1' of VM '1177138' deletion was initiated by admin@internal.
...
2015-03-03 18:38:53,928 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-56) [38d525f8] Correlation ID: 1804c42c, Job ID: c0a7dab4-9c77-463a-9a11-f27b1335dcb9, Call Stack: null, Custom Event ID: -1, Message: Disk '1177138_Disk1' from Snapshot(s) '<UNKNOWN>' of VM '1177138' deletion has been completed (User: admin@internal).

vdsm.log.........................................................................

6c683fa6-a14b-441a-8065-2f4a976357a1::ERROR::2015-03-03 18:36:42,427::storage_mailbox::172::Storage.SPM.Messages.Extend::(processRequest) processRequest: Exception caught while trying to extend volume: 26b70af6-e0b3-4cac-b9cb-c5fafd1ce611 in domain: ef0105d0-d3cb-403d-8f68-61a2393f9798
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 166, in processRequest
    pool.extendVolume(volume['domainID'], volume['volumeID'], size)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1303, in extendVolume
    sdCache.produce(sdUUID).extendVolume(volumeUUID, size, isShuttingDown)
  File "/usr/share/vdsm/storage/blockSD.py", line 1315, in extendVolume
    lvm.extendLV(self.sdUUID, volumeUUID, size)  # , isShuttingDown)
  File "/usr/share/vdsm/storage/lvm.py", line 1152, in extendLV
    _resizeLV("lvextend", vgName, lvName, size)
  File "/usr/share/vdsm/storage/lvm.py", line 1146, in _resizeLV
    free_size / constants.MEGAB))
VolumeGroupSizeError: Volume Group not big enough: ('ef0105d0-d3cb-403d-8f68-61a2393f9798/26b70af6-e0b3-4cac-b9cb-c5fafd1ce611 4096 > 768 (MiB)',)

2015-03-03 18:34:22,776::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm lvextend --config ' devices
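To make the failure above concrete: the traceback ends with a size guard rejecting an extension request of 4096 MiB because the volume group only has 768 MiB of free extents left. Below is a minimal illustrative sketch of that kind of check; the function name and MiB-based signature are assumptions for illustration, not vdsm's actual lvm.py API.

    class VolumeGroupSizeError(Exception):
        """Raised when an extension request does not fit in the VG's free space."""

    def check_vg_can_extend(vg_name, lv_name, requested_mib, free_mib):
        # Simplified guard corresponding to the check in the traceback above:
        # refuse the lvextend if the requested size exceeds the free space
        # left in the volume group.
        if requested_mib > free_mib:
            raise VolumeGroupSizeError(
                "Volume Group not big enough: ('%s/%s %d > %d (MiB)',)"
                % (vg_name, lv_name, requested_mib, free_mib))

    # Values from the log above: a 4096 MiB request against 768 MiB of free space.
    try:
        check_vg_can_extend("ef0105d0-d3cb-403d-8f68-61a2393f9798",
                            "26b70af6-e0b3-4cac-b9cb-c5fafd1ce611",
                            4096, 768)
    except VolumeGroupSizeError as e:
        print(e)  # same failure as reported in the vdsm log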
Created attachment 997803 [details]
server, engine and vdsm logs

Adding logs
Possible duplicate of bug 1196049. Adam/Nir - please take a look?
Kevin, please attach the output of:

cat /etc/redhat-release
rpm -qa
Created attachment 998038 [details] vdsm log with domainxml
In the vdsm logs Kevin posted, we can see a lot of extendDrivesIfNeeded calls showing wrong allocation/physical readings from libvirt. These readings happen when a block disk is treated as a file disk by libvirt.

I could not reproduce this with master on Fedora 21:

1. Use an existing vm with one block disk
2. Add 2 thin provisioned block disks, 1G each
3. Create a snapshot including only the 2 new disks
4. Start the vm
5. Select the "Storage" > "Disk snapshots" tab
6. Delete one disk snapshot
7. Delete the other

No unusual extension requests were seen in the log, and the disk type seems to be correct.

Kevin, can you reproduce this using the steps above?
Created attachment 998048 [details] vdsm log with domainxml using ovirt-3.5
I could not reproduce it on Fedora 21 with ovirt-3.5, which is similar to rhev-3.5.1. So it is either RHEL 7.1 related, or the steps to reproduce are incorrect.
Unreproducible for me on RHEL-7.1 beta with the following steps:

1. Create a vm with 2 block disks
2. Create a snapshot
3. Start the vm
4. Via the Storage > Disk Snapshot tab, delete one of the disk snapshots
5. Via the Storage > Disk Snapshot tab, delete the final disk snapshot

Both are removed successfully and no unwarranted extensions were observed.

Software versions:
libvirt-1.2.8-16.el7.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.11.x86_64
vdsm-4.16.10-8.gitc937927.el7.x86_64

I'd really appreciate knowing the package environment of the initial reproduction. Also, please make sure you are testing on a fully updated system.
(In reply to Nir Soffer from comment #3)
> Kevin, please attach the output of
> cat /etc/redhat-release

Red Hat Enterprise Linux Server release 7.1 (Maipo)

> rpm -qa

vdsm-4.16.12-2.el7ev.x86_64
qemu-kvm-rhev-2.1.2-23.el7_1.1.x86_64
libvirt-client-1.2.8-16.el7.x86_64
libvirt-daemon-config-nwfilter-1.2.8-16.el7.x86_64
libvirt-daemon-driver-nodedev-1.2.8-16.el7.x86_64
libvirt-daemon-driver-storage-1.2.8-16.el7.x86_64
libvirt-daemon-1.2.8-16.el7.x86_64
libvirt-python-1.2.8-7.el7.x86_64
libvirt-daemon-driver-interface-1.2.8-16.el7.x86_64
libvirt-lock-sanlock-1.2.8-16.el7.x86_64
libvirt-daemon-driver-network-1.2.8-16.el7.x86_64
libvirt-daemon-kvm-1.2.8-16.el7.x86_64
libvirt-daemon-driver-nwfilter-1.2.8-16.el7.x86_64
libvirt-daemon-driver-secret-1.2.8-16.el7.x86_64
libvirt-daemon-driver-qemu-1.2.8-16.el7.x86_64
Evidence from the vdsm logs, explaining comment 5:

1. The logs do not show the creation of the disks - next time please post logs that cover the entire tested flow.

2. As soon as we finish the *first* merge, we start to get bad readings of the current volume watermarks:

Thread-523::INFO::2015-03-03 18:34:12,472::logUtils::47::dispatcher::(wrapper) Run and protect: imageSyncVolumeChain, Return response: None
...
Thread-523::INFO::2015-03-03 18:34:12,473::vm::6097::vm.Vm::(run) vmId=`d593af3c-f40a-4047-b543-4e28ff5e5cb2`::Synchronization completed (job 30b7ac94-beab-4bfa-80a8-e2f8c85781b5)
...
Thread-456::INFO::2015-03-03 18:34:12,576::vm::2547::vm.Vm::(extendDrivesIfNeeded) vmId=`d593af3c-f40a-4047-b543-4e28ff5e5cb2`::Requesting extension for volume 5e22f70f-84ba-4723-9ad5-34aeec7cccdf on domain ef0105d0-d3cb-403d-8f68-61a2393f9798 (apparent: 1073741824, capacity: 3221225472, allocated: 3221225472, physical: 3221225472)
...
Thread-456::INFO::2015-03-03 18:34:16,619::vm::2547::vm.Vm::(extendDrivesIfNeeded) vmId=`d593af3c-f40a-4047-b543-4e28ff5e5cb2`::Requesting extension for volume 5e22f70f-84ba-4723-9ad5-34aeec7cccdf on domain ef0105d0-d3cb-403d-8f68-61a2393f9798 (apparent: 4294967296, capacity: 4294967296, allocated: 4294967296, physical: 4294967296)
...
Thread-456::INFO::2015-03-03 18:34:20,637::vm::2547::vm.Vm::(extendDrivesIfNeeded) vmId=`d593af3c-f40a-4047-b543-4e28ff5e5cb2`::Requesting extension for volume 5e22f70f-84ba-4723-9ad5-34aeec7cccdf on domain ef0105d0-d3cb-403d-8f68-61a2393f9798 (apparent: 5368709120, capacity: 5368709120, allocated: 5368709120, physical: 5368709120)

Note allocated == physical - it looks like the drive is treated as type="file", but we don't have any info from libvirt to understand what happened.

Kevin, we need to reproduce this again on your setup. When you reproduce, please dump the libvirt domain xml before and after each snapshot deletion. To dump the domain xml you can do:

virsh dumpxml vm-name > before-removing-snapshot1.xml
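For context on why allocated == physical readings are harmful: the thin-provisioning monitor asks for more LV space whenever the gap between what the guest has written (allocation) and the current LV size (physical) drops below a watermark. If libvirt reports the drive with raw/file semantics, allocation always equals physical, so the gap is permanently zero and an extension is requested on every poll. A minimal sketch of that decision follows; the needs_extension helper, the BlockStats holder and the 512 MiB watermark are illustrative assumptions, not vdsm's actual code or defaults.

    from collections import namedtuple

    # Illustrative stand-in for the values printed by extendDrivesIfNeeded in
    # the log above (capacity / allocation / physical, all in bytes).
    BlockStats = namedtuple("BlockStats", "capacity allocation physical")

    MIB = 2 ** 20
    WATERMARK = 512 * MIB  # assumed free-space threshold, not vdsm's real default

    def needs_extension(stats):
        # Request more space for a thin-provisioned (qcow2-on-LV) drive when the
        # gap between the written data and the current LV size shrinks below
        # the watermark.
        return stats.physical - stats.allocation < WATERMARK

    # Healthy qcow2-on-block reading: 1 GiB written into a 2 GiB LV -> no extension.
    print(needs_extension(BlockStats(capacity=3072 * MIB,
                                     allocation=1024 * MIB,
                                     physical=2048 * MIB)))   # False

    # Reading like the ones in the log: allocation == physical, so the free gap
    # is always zero and an extension is requested on every poll, growing the
    # LV until the volume group runs out of free extents.
    print(needs_extension(BlockStats(capacity=3072 * MIB,
                                     allocation=3072 * MIB,
                                     physical=3072 * MIB)))   # True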
Ok, I managed to reproduce it on ovirt-3.5. The correct steps to reproduce are:

1. Create a *preallocated* disk on a block storage domain
2. Create a snapshot including this disk
3. Start the vm
4. From "Storage" > "Disk snapshots", remove the disk snapshot

As soon as we finish the merge, we start to get bad watermark readings, leading to unlimited extension of the drive.

Looking at the libvirt domain xml, we can see that the drive format was converted to "raw".

Before the pivot operation:

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/54ba4f22-ee1a-4851-9710-0b242ebdc289/images/e64fa26b-8c43-4090-8a9b-f7dbe293982b/2d031756-70ed-4c74-9672-121d621038d6'/>
      <backingStore type='block' index='1'>
        <format type='raw'/>
        <source dev='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/54ba4f22-ee1a-4851-9710-0b242ebdc289/images/e64fa26b-8c43-4090-8a9b-f7dbe293982b/../e64fa26b-8c43-4090-8a9b-f7dbe293982b/f6c44444-2dfd-4fe9-a989-cd48fddf9031'/>
        <backingStore/>
      </backingStore>
      <mirror type='block' job='active-commit' ready='yes'>
        <format type='raw'/>
        <source dev='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/54ba4f22-ee1a-4851-9710-0b242ebdc289/images/e64fa26b-8c43-4090-8a9b-f7dbe293982b/../e64fa26b-8c43-4090-8a9b-f7dbe293982b/f6c44444-2dfd-4fe9-a989-cd48fddf9031'/>
      </mirror>
      <target dev='vdb' bus='virtio'/>
      <serial>e64fa26b-8c43-4090-8a9b-f7dbe293982b</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </disk>

After the pivot operation:

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/cca1d436-8c16-4887-8c07-2a9ecc1c0830/54ba4f22-ee1a-4851-9710-0b242ebdc289/images/e64fa26b-8c43-4090-8a9b-f7dbe293982b/../e64fa26b-8c43-4090-8a9b-f7dbe293982b/f6c44444-2dfd-4fe9-a989-cd48fddf9031'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>e64fa26b-8c43-4090-8a9b-f7dbe293982b</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </disk>

Since the disk format was changed to "raw", the libvirt watermark readings are incorrect. Although the disk format was changed to "raw", vdsm continues to check whether the drive needs extension:

Thread-6767::INFO::2015-03-05 12:26:23,981::vm::2547::vm.Vm::(extendDrivesIfNeeded) vmId=`5f28c331-4460-45fc-bcd1-ff9f7bc67415`::Requesting extension for volume f6c44444-2dfd-4fe9-a989-cd48fddf9031 on domain 54ba4f22-ee1a-4851-9710-0b242ebdc289 (apparent: 1073741824, capacity: 3221225472, allocated: 3221225472, physical: 3221225472)

Vdsm should stop checking such a drive, but it seems that the drive format was not updated after the pivot operation.

This issue was fixed upstream in:

commit 2369e831b157aab3fadde64c1507410b1484198b
Author: Adam Litke <alitke>
Date:   Tue Jan 13 11:46:28 2015 -0500

    Live Merge: Update drive.format after active layer merge

    A merge of the active layer can change the drive format from cow -> raw
    if a snapshot was merged into a raw backing file. In that case we must
    correct the VM Drive metadata to ensure the drive is handled properly
    after the merge has finished.

    Change-Id: Ieb64bbfe798a27896442a173b7dac41cebc92543
    Signed-off-by: Adam Litke <alitke>
    Reviewed-on: http://gerrit.ovirt.org/36923
    Reviewed-by: Francesco Romani <fromani>
    Reviewed-by: Nir Soffer <nsoffer>
    Reviewed-by: Dan Kenigsberg <danken>

This patch was not backported to 3.5.
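Conceptually, the fix updates the VM's drive metadata after the pivot so the extension monitor stops treating the drive as a thin-provisioned qcow2 LV. The sketch below only paraphrases the commit message above; the Drive class, the chunked property and the helper name are illustrative, not the actual patch.

    class Drive(object):
        """Illustrative stand-in for vdsm's per-VM drive metadata."""
        def __init__(self, name, fmt, on_block_storage):
            self.name = name
            self.format = fmt                  # 'cow' (qcow2) or 'raw'
            self.on_block_storage = on_block_storage

        @property
        def chunked(self):
            # Only qcow2 volumes on block storage are thin-provisioned LVs
            # that need watermark monitoring and live extension.
            return self.on_block_storage and self.format == 'cow'

    def finalize_active_layer_merge(drive, new_top_volume_format):
        # After the pivot, the drive is backed by what used to be the backing
        # volume. If that volume is raw, record it so chunked becomes False
        # and the extension monitor skips this drive from now on.
        drive.format = new_top_volume_format

    drive = Drive('vdb', fmt='cow', on_block_storage=True)
    print(drive.chunked)                       # True - monitored before the merge

    finalize_active_layer_merge(drive, 'raw')  # snapshot merged into a raw base
    print(drive.chunked)                       # False - no more bogus extension requests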
Created attachment 998498 [details] vdsm log with domainxml using ovirt-3.5 and preallocated disk
Removing the needinfo, as this bug is resolved.
Fixed the component to VDSM, which reset the flags. Yaniv/Aharon, please re-ack.
Tested using version v3.6:
ovirt-engine-3.6.0-0.0.master.20150412172306.git55ba764.el6.noarch
vdsm-4.17.0-632.git19a83a2.el7.x86_64

Test scenario:
1. Create a vm with 2 block disks
2. Create a snapshot
3. Start the vm
4. Via the Storage > Disk Snapshot tab, delete one of the snapshot disks
5. Via the Storage > Disk Snapshot tab, delete the second snapshot disk

Both snapshot disks are deleted successfully and the storage domain displays the correct space allocation.

Moving to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0362.html