Bug 1852308 - Snapshot fails to create with 'Invalid parameter: 'capacity=1073741824'' Exception
Summary: Snapshot fails to create with 'Invalid parameter: 'capacity=1073741824'' Exce...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: unspecified
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ovirt-4.5.1
: ---
Assignee: Benny Zlotnik
QA Contact: sshmulev
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-06-30 06:05 UTC by Ilan Zuckerman
Modified: 2022-08-22 06:34 UTC (History)
12 users (show)

Fixed In Version: ovirt-engine-4.5.1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-14 12:54:30 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
engine and vdsm logs (670.92 KB, application/zip)
2020-06-30 06:05 UTC, Ilan Zuckerman
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github oVirt ovirt-engine pull 267 0 None Draft core: always update volume size when creating a new image 2022-04-13 10:39:39 UTC
Red Hat Product Errata RHSA-2022:5555 0 None None None 2022-07-14 12:55:15 UTC

Description Ilan Zuckerman 2020-06-30 06:05:47 UTC
Created attachment 1699239 [details]
engine and vdsm logs

Description of problem:

Creating a snapshot for a running ISCSI vm (from template) with different disk permutations fails with 'wrong capacity' vdsm exception [1].
I can not say with 100% certinty that this is regression bug because i couldnt reproduce it. But, this is still a bug and need to be treated.


[1]:
2020-06-29 22:37:55,012+0300 INFO  (tasks/3) [storage.Volume] Request to create snapshot 1d3b56ad-73f1-494a-b990-cd5030540cff/ad4a59c3-5107-49f6-be40-d04217870c79 of volume 1d3b56ad-73f1-494a-b990-cd5030540cff/4071b1f9-60e4-4918-82a2-67218292165e with capacity 10737418240 (blockVolume:527)
2020-06-29 22:37:55,014+0300 DEBUG (tasks/5) [storage.Volume] cloning volume /rhev/data-center/mnt/blockSD/cf04a2e3-6ca1-40f9-b3eb-64393f991d96/images/0170a32e-3dbd-42a0-89a8-f2b16125a14d/619d9509-c182-41e0-913d-1cbaab7e5064 to /rhev/data-center/mnt/blockSD/cf04a2e3-6ca1-40f9-b3eb-64393f991d96/images/0170a32e-3dbd-42a0-89a8-f2b16125a14d/aa8334d1-df5d-4294-90a4-1639c62003cc (volume:956)
2020-06-29 22:37:55,014+0300 DEBUG (tasks/5) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-15 /usr/bin/dd iflag=direct skip=2432 bs=512 if=/dev/cf04a2e3-6ca1-40f9-b3eb-64393f991d96/metadata count=1 (cwd None) (commands:211)
2020-06-29 22:37:55,026+0300 DEBUG (tasks/3) [storage.TaskManager.Task] (Task='022e276a-578d-4964-b08d-3267ad467b26') _save: orig /rhev/data-center/33972a72-d9cd-426d-a688-353cf42f7eb3/mastersd/master/tasks/022e276a-578d-4964-b08d-3267ad467b26 temp /rhev/data-center/33972a72-d9cd-426d-a688-353cf42f7eb3/mastersd/master/tasks/022e276a-578d-4964-b08d-3267ad467b26.temp (task:781)
2020-06-29 22:37:55,034+0300 DEBUG (tasks/7) [storage.Misc.excCmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n512 bytes copied, 0.000510976 s, 1.0 MB/s\n'; <rc> = 0 (commands:224)
2020-06-29 22:37:55,034+0300 DEBUG (tasks/7) [storage.Misc] err: [b'1+0 records in', b'1+0 records out', b'512 bytes copied, 0.000510976 s, 1.0 MB/s'], size: 512 (misc:114)
2020-06-29 22:37:55,035+0300 ERROR (tasks/7) [storage.Volume] Requested capacity 1073741824 < parent capacity 10737418240 (volume:1138)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Trying to release resource '01_img_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.03079d6a-b2fa-406c-9087-4dd626fd6839' (resourceManager:548)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Released resource '01_img_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.03079d6a-b2fa-406c-9087-4dd626fd6839' (0 active users) (resourceManager:566)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Resource '01_img_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.03079d6a-b2fa-406c-9087-4dd626fd6839' is free, finding out if anyone is waiting for it. (resourceManager:572)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Trying to release resource '02_vol_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.bc9f19ff-02d1-4d90-8745-e9b67e56ddf2' (resourceManager:548)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Released resource '02_vol_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.bc9f19ff-02d1-4d90-8745-e9b67e56ddf2' (0 active users) (resourceManager:566)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] Resource '02_vol_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.bc9f19ff-02d1-4d90-8745-e9b67e56ddf2' is free, finding out if anyone is waiting for it. (resourceManager:572)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] No one is waiting for resource '02_vol_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.bc9f19ff-02d1-4d90-8745-e9b67e56ddf2', Clearing records. (resourceManager:580)
2020-06-29 22:37:55,035+0300 DEBUG (tasks/7) [storage.ResourceManager] No one is waiting for resource '01_img_cf04a2e3-6ca1-40f9-b3eb-64393f991d96.03079d6a-b2fa-406c-9087-4dd626fd6839', Clearing records. (resourceManager:580)
2020-06-29 22:37:55,036+0300 ERROR (tasks/7) [storage.TaskManager.Task] (Task='71ccdf08-e8f1-4257-bfd9-3d718fc7f6a3') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1874, in createVolume
    initial_size=initialSize)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 993, in createVolume
    initial_size=initial_size)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/volume.py", line 1139, in create
    raise se.InvalidParameterException("capacity", capacity)
vdsm.storage.exception.InvalidParameterException: Invalid parameter: 'capacity=1073741824'


Version-Release number of selected component (if applicable):
rhv-4.4.1-5

How reproducible:
Reproduced only once by automation run.
Tried to reproduce manually without success.


Steps to Reproduce:

1. clone vm from a pre-defined template latest-rhel-guest-image-8.2-infra with {'name': 'vm_TestCase5061_2922354448', 'cluster': 'golden_env_mixed_1', 'clone': False, 'vol_sparse': True, 'vol_format': 'cow', 'storagedomain': 'iscsi_2', 'virtio_scsi': True, 'display_type': 'spice', 'type': 'desktop', 'timeout': 900, 'wait': True}

2. Start the vm

3. Creating all disk permutations:
virtio scsi cow
virtio scsi raw
virtio cow
virtio raw

4. Add snapshot to VM vm_TestCase5061_2922354448 with {'persist_memory': False, 'description': 'snap_TestCase5061_2922375307', 'wait': True}


Actual results:
Snapshot fails to create

Expected results:
Shouldnt fail

Comment 1 Ilan Zuckerman 2020-07-01 09:01:12 UTC
Tried to reproduced with what Benny suggested in 1758048#c51 :
"it's easy to reproduce by creating a raw disk on iscsi with <1G size and then trying to create a snapshot" 

But the snapshot succeeded for me.

Comment 2 Tal Nisan 2020-07-06 13:21:10 UTC
Might be related to bug 1826365

Comment 3 Eyal Shenitzky 2021-03-10 07:17:44 UTC
Ilan, did we saw this issue again?
If not I suggest closing it as 'WORKSFORME'.

Comment 4 Ilan Zuckerman 2021-03-10 07:22:38 UTC
(In reply to Eyal Shenitzky from comment #3)
> Ilan, did we saw this issue again?
> If not I suggest closing it as 'WORKSFORME'.

The described issue was Never seen again. Closing as 'WORKSFORME'

Comment 5 Roman Hodain 2021-04-06 08:00:03 UTC
I am reopening as we were able to reproduce with vdsm-4.40.50.9-1.el8ev.x86_64 during scale tests.  

https://gitlab.com/nirs/ovirt-stress

Comment 10 Roman Hodain 2021-04-06 08:11:17 UTC
vdsm.log.4.xz:2021-04-05 03:31:53,852+0200 DEBUG (tasks/2) [storage.TaskManager.Task] (Task='8e702233-a269-4297-84ba-6d9a4ecf58af') aborting: Task is aborted: "value=Invalid parameter: 'capacity=1073741824' abortedcode=100" (task:935)
vdsm.log.4.xz:2021-04-05 03:45:12,035+0200 DEBUG (tasks/1) [storage.TaskManager.Task] (Task='c6525f99-5d32-4f6c-a507-dd2ae84ffa2e') aborting: Task is aborted: "value=Invalid parameter: 'capacity=1073741824' abortedcode=100" (task:935)
vdsm.log.4.xz:2021-04-05 03:53:42,817+0200 DEBUG (tasks/1) [storage.TaskManager.Task] (Task='d05b8095-00df-48c4-b6ab-fe8819a0cf3e') aborting: Task is aborted: "value=Invalid parameter: 'capacity=1073741824' abortedcode=100" (task:935)

Comment 11 Eyal Shenitzky 2021-04-07 05:58:20 UTC
Benny, can you please take a look?

Comment 14 Nir Soffer 2021-06-07 12:03:14 UTC
(In reply to Roman Hodain from comment #10)
> vdsm.log.4.xz:2021-04-05 03:31:53,852+0200 DEBUG (tasks/2)
> [storage.TaskManager.Task] (Task='8e702233-a269-4297-84ba-6d9a4ecf58af')
> aborting: Task is aborted: "value=Invalid parameter: 'capacity=1073741824'
...

Roman, do you have complete logs from these incedents?

We need vdsm log from the API call to create a volume, showing which values were sent
by engine, until the call fail with "Invalid parameter: 'capacity=1073741824'".

Comment 15 Nir Soffer 2021-06-07 15:00:00 UTC
(In reply to Nir Soffer from comment #14)
Roman, I found the logs you attached, sorry for the noise.

Comment 20 Eyal Shenitzky 2021-11-01 08:11:03 UTC
I managed to reproduce this issue locally - 

1. Import 'Cirros' image from a glance as a template and use a file-based storage domain for it.
2. Create a VM from the template as 'clone', in the resource allocation select block-based domain for the disk and select 'RAW' format.
3. Try to create a snapshot (cold) for the VM created from the template.

Snapshot creation failed with the following message - 

2021-11-01 03:52:42,842-04 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-69) [] BaseAsyncTask::logEndTaskFailure: Task '0544902b-59e1-41ec-936c-e311a3ed60cf' (Parent Command 'CreateSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = value=Invalid parameter: 'capacity=46137344' abortedcode=100, code = 100',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = value=Invalid parameter: 'capacity=46137344' abortedcode=100, code = 100'

Engine and VDSM logs added.

Comment 23 Eyal Shenitzky 2021-11-01 08:21:58 UTC
(In reply to Eyal Shenitzky from comment #20)
> I managed to reproduce this issue locally - 
> 
> 1. Import 'Cirros' image from a glance as a template and use a file-based
> storage domain for it.
> 2. Create a VM from the template as 'clone', in the resource allocation
> select block-based domain for the disk and select 'RAW' format.
> 3. Try to create a snapshot (cold) for the VM created from the template.
> 
> Snapshot creation failed with the following message - 
> 
> 2021-11-01 03:52:42,842-04 ERROR
> [org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-69) []
> BaseAsyncTask::logEndTaskFailure: Task
> '0544902b-59e1-41ec-936c-e311a3ed60cf' (Parent Command 'CreateSnapshot',
> Parameters Type
> 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with
> failure:
> -- Result: 'cleanSuccess'
> -- Message: 'VDSGenericException: VDSErrorException: Failed to
> HSMGetAllTasksStatusesVDS, error = value=Invalid parameter:
> 'capacity=46137344' abortedcode=100, code = 100',
> -- Exception: 'VDSGenericException: VDSErrorException: Failed to
> HSMGetAllTasksStatusesVDS, error = value=Invalid parameter:
> 'capacity=46137344' abortedcode=100, code = 100'
> 
> Engine and VDSM logs added.

This is the VDSM error - 

2021-11-01 03:52:33,066-0400 ERROR (tasks/2) [storage.Volume] Requested capacity 46137344 < parent capacity 134217728 (volume:1223)
2021-11-01 03:52:33,066-0400 ERROR (tasks/2) [storage.TaskManager.Task] (Task='0544902b-59e1-41ec-936c-e311a3ed60cf') Unexpected error (task:877)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1945, in createVolume
    initial_size=initialSize, add_bitmaps=addBitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1216, in createVolume
    initial_size=initial_size, add_bitmaps=add_bitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/volume.py", line 1224, in create
    raise se.InvalidParameterException("capacity", capacity)
vdsm.storage.exception.InvalidParameterException: Invalid parameter: 'capacity=46137344'

Comment 26 Shir Fishbain 2022-05-30 20:22:23 UTC
QE doesn't have the capacity to verify this bug during 4.5.1. The team focuses on CBT tasks.

Comment 27 sshmulev 2022-06-01 07:16:28 UTC
I didn't find 'Cirros' img from glance, so I tried to use some other image from glance and import it as a template (latest_rhcos).
I couldn't reproduce it on a version without the fix (used: ovirt-engine4.5.0.6-0.7.el8ev, vdsm-4.50.0.13-1.el8ev) so I don't know if these are the right steps to reproduce it, maybe something is wrong with the image I used? cause all the rest of the steps I did the exact as described in comment 20:

"
1. Import 'Cirros' image from a glance as a template and use a file-based storage domain for it.
2. Create a VM from the template as 'clone', in the resource allocation select block-based domain for the disk and select 'RAW' format.
3. Try to create a snapshot (cold) for the VM created from the template.
"

Comment 30 sshmulev 2022-06-07 07:55:19 UTC
Verified.

CirrOS wasn't inside glance so I uploaded a CirrOS image less than 128MB, and then created a template from the VM with this image and continued the flow according to the steps in comment 20.
could reproduce It on a version without the fix and didn't reproduced in the latest version.

Versions:
ovirt-engine4.5.1-0.62.el8ev
vdsm-4.50.1.1-1.el8ev

Comment 31 Nir Soffer 2022-06-07 08:25:06 UTC
Sophie, on which version did you reproduce this issue?

Comment 32 Nir Soffer 2022-06-07 08:27:33 UTC
Benny, how this issue was fixed? We know that this happens when system is under load
(e.g. during delete snapshot stress test). I think we don't understand yet why this
happens and since no patch is attached to this bug I assume this was never fixed.

Comment 33 Benny Zlotnik 2022-06-07 08:32:07 UTC
(In reply to Nir Soffer from comment #32)
> Benny, how this issue was fixed? We know that this happens when system is
> under load
> (e.g. during delete snapshot stress test). I think we don't understand yet
> why this
> happens and since no patch is attached to this bug I assume this was never
> fixed.

I fixed it based on Eyal's reproducer which is consistent (comment #20)

Comment 34 Arik 2022-06-07 09:13:33 UTC
(In reply to Nir Soffer from comment #32)
> I think we don't understand yet why this happens and since no patch is attached to this bug I assume this was never fixed.

The status here is incorrect - the attached PR (https://github.com/oVirt/ovirt-engine/pull/267) is merged

Comment 35 sshmulev 2022-06-07 09:17:51 UTC
(In reply to Nir Soffer from comment #31)
> Sophie, on which version did you reproduce this issue?

ovirt-engine-4.5.0.7-0.9.el8ev
vdsm-4.50.0.13-1.el8ev

Comment 40 errata-xmlrpc 2022-07-14 12:54:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5555


Note You need to log in before you can comment on or make changes to this bug.