Created attachment 1699239 [details]
engine and vdsm logs

Description of problem:
Creating a snapshot for a running iSCSI VM (created from a template) with different disk permutations fails with a 'wrong capacity' vdsm exception [1]. I cannot say with 100% certainty that this is a regression because I couldn't reproduce it, but it is still a bug and needs to be handled.

[1]:
2020-06-29 22:37:55,012+0300 INFO  (tasks/3) [storage.Volume] Request to create snapshot 1d3b56ad-73f1-494a-b990-cd5030540cff/ad4a59c3-5107-49f6-be40-d04217870c79 of volume 1d3b56ad-73f1-494a-b990-cd5030540cff/4071b1f9-60e4-4918-82a2-67218292165e with capacity 10737418240 (blockVolume:527)
2020-06-29 22:37:55,014+0300 DEBUG (tasks/5) [storage.Volume] cloning volume /rhev/data-center/mnt/blockSD/cf04a2e3-6ca1-40f9-b3eb-64393f991d96/images/0170a32e-3dbd-42a0-89a8-f2b16125a14d/619d9509-c182-41e0-913d-1cbaab7e5064 to /rhev/data-center/mnt/blockSD/cf04a2e3-6ca1-40f9-b3eb-64393f991d96/images/0170a32e-3dbd-42a0-89a8-f2b16125a14d/aa8334d1-df5d-4294-90a4-1639c62003cc (volume:956)
2020-06-29 22:37:55,014+0300 DEBUG (tasks/5) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-15 /usr/bin/dd iflag=direct skip=2432 bs=512 if=/dev/cf04a2e3-6ca1-40f9-b3eb-64393f991d96/metadata count=1 (cwd None) (commands:211)
2020-06-29 22:37:55,026+0300 DEBUG (tasks/3) [storage.TaskManager.Task] (Task='022e276a-578d-4964-b08d-3267ad467b26') _save: orig /rhev/data-center/33972a72-d9cd-426d-a688-353cf42f7eb3/mastersd/master/tasks/022e276a-578d-4964-b08d-3267ad467b26 temp /rhev/data-center/33972a72-d9cd-426d-a688-353cf42f7eb3/mastersd/master/tasks/022e276a-578d-4964-b08d-3267ad467b26.temp (task:781)
2020-06-29 22:37:55,034+0300 DEBUG (tasks/7) [storage.Misc.excCmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n512 bytes copied, 0.000510976 s, 1.0 MB/s\n'; <rc> = 0 (commands:224)
2020-06-29 22:37:55,034+0300 DEBUG (tasks/7) [storage.Misc] err: [b'1+0 records in', b'1+0 records out', b'512 bytes copied, 0.000510976 s, 1.0 MB/s'], size: 512 (misc:114)
2020-06-29 22:37:55,035+0300 ERROR (tasks/7) [storage.Volume] Requested capacity 1073741824 < parent capacity 10737418240 (volume:1138)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Trying to release resource '01_img_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.03079d6a-b2fa-406c-9087-4dd626fd6839' (resourceManager:548)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Released resource '01_img_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.03079d6a-b2fa-406c-9087-4dd626fd6839' (0 active users) (resourceManager:566)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Resource '01_img_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.03079d6a-b2fa-406c-9087-4dd626fd6839' is free, finding out if anyone is waiting for it. (resourceManager:572)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Trying to release resource '02_vol_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.bc9f19ff-02d1-4d90-8745-e9b67e56ddf2' (resourceManager:548)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Released resource '02_vol_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.bc9f19ff-02d1-4d90-8745-e9b67e56ddf2' (0 active users) (resourceManager:566)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Resource '02_vol_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.bc9f19ff-02d1-4d90-8745-e9b67e56ddf2' is free, finding out if anyone is waiting for it. (resourceManager:572)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] No one is waiting for resource '02_vol_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.bc9f19ff-02d1-4d90-8745-e9b67e56ddf2', Clearing records. (resourceManager:580)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] No one is waiting for resource '01_img_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.03079d6a-b2fa-406c-9087-4dd626fd6839', Clearing records. (resourceManager:580)
2020-06-29 22:37:55,036+0300 ERROR (tasks/7) [storage.TaskManager.Task] (Task='71ccdf08-e8f1-4257-bfd9-3d718fc7f6a3') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1874, in createVolume
    initial_size=initialSize)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 993, in createVolume
    initial_size=initial_size)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/volume.py", line 1139, in create
    raise se.InvalidParameterException("capacity", capacity)
vdsm.storage.exception.InvalidParameterException: Invalid parameter: 'capacity=1073741824'

Version-Release number of selected component (if applicable):
rhv-4.4.1-5

How reproducible:
Reproduced only once by an automation run. Tried to reproduce manually without success.

Steps to Reproduce:
1. Clone a VM from the pre-defined template latest-rhel-guest-image-8.2-infra with {'name': 'vm_TestCase5061_2922354448', 'cluster': 'golden_env_mixed_1', 'clone': False, 'vol_sparse': True, 'vol_format': 'cow', 'storagedomain': 'iscsi_2', 'virtio_scsi': True, 'display_type': 'spice', 'type': 'desktop', 'timeout': 900, 'wait': True}
2. Start the VM
3. Create all disk permutations:
   virtio-scsi cow
   virtio-scsi raw
   virtio cow
   virtio raw
4. Add a snapshot to VM vm_TestCase5061_2922354448 with {'persist_memory': False, 'description': 'snap_TestCase5061_2922375307', 'wait': True}

Actual results:
Snapshot creation fails

Expected results:
Shouldn't fail
Tried to reproduce with what Benny suggested in 1758048#c51: "it's easy to reproduce by creating a raw disk on iscsi with <1G size and then trying to create a snapshot", but the snapshot succeeded for me.
Might be related to bug 1826365
Ilan, did we see this issue again? If not, I suggest closing it as 'WORKSFORME'.
(In reply to Eyal Shenitzky from comment #3)
> Ilan, did we saw this issue again?
> If not I suggest closing it as 'WORKSFORME'.

The described issue was never seen again. Closing as 'WORKSFORME'.
I am reopening as we were able to reproduce with vdsm-4.40.50.9-1.el8ev.x86_64 during scale tests. https://gitlab.com/nirs/ovirt-stress
vdsm.log.4.xz:2021-04-05 03:31:53,852+0200 DEBUG (tasks/2) [storage.TaskManager.Task] (Task='8e702233-a269-4297-84ba-6d9a4ecf58af') aborting: Task is aborted: "value=Invalid parameter: 'capacity=1073741824' abortedcode=100" (task:935)
vdsm.log.4.xz:2021-04-05 03:45:12,035+0200 DEBUG (tasks/1) [storage.TaskManager.Task] (Task='c6525f99-5d32-4f6c-a507-dd2ae84ffa2e') aborting: Task is aborted: "value=Invalid parameter: 'capacity=1073741824' abortedcode=100" (task:935)
vdsm.log.4.xz:2021-04-05 03:53:42,817+0200 DEBUG (tasks/1) [storage.TaskManager.Task] (Task='d05b8095-00df-48c4-b6ab-fe8819a0cf3e') aborting: Task is aborted: "value=Invalid parameter: 'capacity=1073741824' abortedcode=100" (task:935)
Benny, can you please take a look?
(In reply to Roman Hodain from comment #10)
> vdsm.log.4.xz:2021-04-05 03:31:53,852+0200 DEBUG (tasks/2)
> [storage.TaskManager.Task] (Task='8e702233-a269-4297-84ba-6d9a4ecf58af')
> aborting: Task is aborted: "value=Invalid parameter: 'capacity=1073741824'
...

Roman, do you have complete logs from these incidents? We need the vdsm log from the API call to create a volume, showing which values were sent by engine, until the call fails with "Invalid parameter: 'capacity=1073741824'".
(In reply to Nir Soffer from comment #14) Roman, I found the logs you attached, sorry for the noise.
I managed to reproduce this issue locally:

1. Import the 'Cirros' image from Glance as a template and use a file-based storage domain for it.
2. Create a VM from the template as 'clone'; in the resource allocation, select a block-based domain for the disk and select 'RAW' format.
3. Try to create a (cold) snapshot for the VM created from the template.

Snapshot creation failed with the following message:

2021-11-01 03:52:42,842-04 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-69) [] BaseAsyncTask::logEndTaskFailure: Task '0544902b-59e1-41ec-936c-e311a3ed60cf' (Parent Command 'CreateSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = value=Invalid parameter: 'capacity=46137344' abortedcode=100, code = 100',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = value=Invalid parameter: 'capacity=46137344' abortedcode=100, code = 100'

Engine and VDSM logs added.
(In reply to Eyal Shenitzky from comment #20)
> I managed to reproduce this issue locally -
>
> 1. Import 'Cirros' image from a glance as a template and use a file-based
> storage domain for it.
> 2. Create a VM from the template as 'clone', in the resource allocation
> select block-based domain for the disk and select 'RAW' format.
> 3. Try to create a snapshot (cold) for the VM created from the template.
>
> Snapshot creation failed with the following message -
>
> 2021-11-01 03:52:42,842-04 ERROR
> [org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-69) []
> BaseAsyncTask::logEndTaskFailure: Task
> '0544902b-59e1-41ec-936c-e311a3ed60cf' (Parent Command 'CreateSnapshot',
> Parameters Type
> 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with
> failure:
> -- Result: 'cleanSuccess'
> -- Message: 'VDSGenericException: VDSErrorException: Failed to
> HSMGetAllTasksStatusesVDS, error = value=Invalid parameter:
> 'capacity=46137344' abortedcode=100, code = 100',
> -- Exception: 'VDSGenericException: VDSErrorException: Failed to
> HSMGetAllTasksStatusesVDS, error = value=Invalid parameter:
> 'capacity=46137344' abortedcode=100, code = 100'
>
> Engine and VDSM logs added.
This is the VDSM error:

2021-11-01 03:52:33,066-0400 ERROR (tasks/2) [storage.Volume] Requested capacity 46137344 < parent capacity 134217728 (volume:1223)
2021-11-01 03:52:33,066-0400 ERROR (tasks/2) [storage.TaskManager.Task] (Task='0544902b-59e1-41ec-936c-e311a3ed60cf') Unexpected error (task:877)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1945, in createVolume
    initial_size=initialSize, add_bitmaps=addBitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1216, in createVolume
    initial_size=initial_size, add_bitmaps=add_bitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/volume.py", line 1224, in create
    raise se.InvalidParameterException("capacity", capacity)
vdsm.storage.exception.InvalidParameterException: Invalid parameter: 'capacity=46137344'
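For context, the traceback shows vdsm rejecting a snapshot volume whose requested capacity is smaller than its parent's. Below is a minimal Python sketch of that validation; the class and function names are illustrative stand-ins, not vdsm's actual code, and only the check and message format follow the log above:

```python
# Minimal sketch of the capacity validation seen in the traceback above.
# check_parent_capacity and InvalidParameterException are illustrative
# stand-ins for vdsm's internals, not its real API.

class InvalidParameterException(Exception):
    """Mirrors the log's message format: Invalid parameter: 'name=value'."""
    def __init__(self, name, value):
        super().__init__(f"Invalid parameter: '{name}={value}'")


def check_parent_capacity(capacity, parent_capacity):
    # A child (snapshot) volume must be at least as large as its parent;
    # vdsm logs "Requested capacity %d < parent capacity %d" and aborts.
    if parent_capacity is not None and capacity < parent_capacity:
        raise InvalidParameterException("capacity", capacity)


# The failing case from this comment: engine requested 46137344 bytes
# (44 MiB) for a snapshot of a 134217728-byte (128 MiB) parent volume.
try:
    check_parent_capacity(46137344, 134217728)
except InvalidParameterException as e:
    print(e)  # Invalid parameter: 'capacity=46137344'
```

This illustrates why the fix belongs on the engine side: the snapshot request must carry a capacity at least equal to the template volume's, otherwise vdsm's guard aborts the task.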
QE doesn't have the capacity to verify this bug during 4.5.1; the team is focused on CBT tasks.
I didn't find a 'Cirros' image in Glance, so I tried to use another image from Glance and imported it as a template (latest_rhcos). I couldn't reproduce the issue on a version without the fix (used: ovirt-engine-4.5.0.6-0.7.el8ev, vdsm-4.50.0.13-1.el8ev), so I don't know if these are the right steps to reproduce it; maybe something is wrong with the image I used, because I performed all the other steps exactly as described in comment 20:
"
1. Import 'Cirros' image from a glance as a template and use a file-based storage domain for it.
2. Create a VM from the template as 'clone', in the resource allocation select block-based domain for the disk and select 'RAW' format.
3. Try to create a snapshot (cold) for the VM created from the template.
"
Verified.

CirrOS wasn't available in Glance, so I uploaded a CirrOS image smaller than 128 MiB, created a template from the VM with this image, and continued the flow according to the steps in comment 20. I could reproduce the issue on a version without the fix and could not reproduce it on the latest version.

Versions:
ovirt-engine-4.5.1-0.62.el8ev
vdsm-4.50.1.1-1.el8ev
Sophie, on which version did you reproduce this issue?
Benny, how was this issue fixed? We know that this happens when the system is under load (e.g. during a delete-snapshot stress test). I think we don't understand yet why this happens, and since no patch is attached to this bug I assume it was never fixed.
(In reply to Nir Soffer from comment #32)
> Benny, how this issue was fixed? We know that this happens when system is
> under load
> (e.g. during delete snapshot stress test). I think we don't understand yet
> why this
> happens and since no patch is attached to this bug I assume this was never
> fixed.

I fixed it based on Eyal's reproducer, which is consistent (comment #20).
(In reply to Nir Soffer from comment #32)
> I think we don't understand yet why this happens and since no patch is
> attached to this bug I assume this was never fixed.

The status here is incorrect - the attached PR (https://github.com/oVirt/ovirt-engine/pull/267) is merged.
(In reply to Nir Soffer from comment #31)
> Sophie, on which version did you reproduce this issue?

ovirt-engine-4.5.0.7-0.9.el8ev
vdsm-4.50.0.13-1.el8ev
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5555