Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1575986

Summary: Sometimes failing task "Wait for resize"
Product: [oVirt] ovirt-engine
Reporter: Petr Kubica <pkubica>
Component: BLL.Storage
Assignee: Eyal Shenitzky <eshenitz>
Status: CLOSED WONTFIX
QA Contact: Avihai <aefrat>
Severity: medium
Docs Contact:
Priority: unspecified
Version: ---
CC: bugs, lsvaty, michal.skrivanek, mperina, pkubica, tnisan
Target Milestone: ---
Keywords: Automation, AutomationBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-05-11 07:29:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: engine db and logs

Description Petr Kubica 2018-05-08 13:34:09 UTC
Description of problem:
TASK [oVirt.image-template : Wait for resize] ********************************************
FAILED - RETRYING: Wait for resize (30 retries left).
FAILED - RETRYING: Wait for resize (29 retries left).
.. output shortened ...
FAILED - RETRYING: Wait for resize (2 retries left).
FAILED - RETRYING: Wait for resize (1 retries left).
fatal: [she-test-09.rhev.lab.eng.brq.redhat.com]: FAILED! => {"attempts": 30, "changed": false, "disk": {"actual_size": 1132511232, "alias": "pkubica-rhel-72-host", "content_type": "data", "disk_profile": {"href": "/ovirt-engine/api/diskprofiles/34b359ff-9c88-444e-849e-fef8d71375d1", "id": "34b359ff-9c88-444e-849e-fef8d71375d1"}, "format": "cow", "href": "/ovirt-engine/api/disks/36868cfe-1a9f-492c-a790-4606c5e73f8f", "id": "36868cfe-1a9f-492c-a790-4606c5e73f8f", "image_id": "effbf8dc-f3cb-42f8-9755-74001a9cbcd1", "name": "pkubica-rhel-72-host", "permissions": [], "propagate_errors": false, "provisioned_size": 51539607552, "qcow_version": "qcow2_v3", "quota": {"href": "/ovirt-engine/api/datacenters/06868ef8-528c-11e8-b8bd-001a4a014208/quotas/118ed436-528c-11e8-916c-001a4a014208", "id": "118ed436-528c-11e8-916c-001a4a014208"}, "shareable": false, "sparse": true, "statistics": [], "status": "ok", "storage_domains": [{"href": "/ovirt-engine/api/storagedomains/b0277d7f-d00f-4b9b-af63-e59389a2235b", "id": "b0277d7f-d00f-4b9b-af63-e59389a2235b"}], "storage_type": "image", "total_size": 0, "wipe_after_delete": false}, "id": "36868cfe-1a9f-492c-a790-4606c5e73f8f"}

It is not due to a timeout. In webadmin I only see that the disk is up.
My guess is that the task checks the disk status incorrectly.

There is nothing in engine.log during that task (from the transfer until the delete after the task failed), and in webadmin I didn't see any failed task in the Tasks tab or any failed event.
 
2018-05-08 15:10:06,458+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-9) [0b54458c-9e45-4885-a483-d5e4fb2c9ff6] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[36868cfe-1a9f-492c-a790-4606c5e73f8f=DISK]'}'
2018-05-08 15:10:06,467+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-9) [0b54458c-9e45-4885-a483-d5e4fb2c9ff6] EVENT_ID: TRANSFER_IMAGE_SUCCEEDED(1,032), Image Upload with disk pkubica-rhel-72-host succeeded.
2018-05-08 15:10:15,546+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] Setting new tasks map. The map contains now 1 tasks
2018-05-08 15:10:57,184+02 INFO  [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-24) [58f8a395] Lock Acquired to object 'EngineLock:{exclusiveLocks='[bd9bbfdc-2e3d-4a3b-a8fe-356d5690ce29=PROVIDER]', sharedLocks=''}'
2018-05-08 15:10:57,192+02 INFO  [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-24) [58f8a395] Running command: SyncNetworkProviderCommand internal: true.
2018-05-08 15:10:57,283+02 INFO  [org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default task-17) [] User admin@internal successfully logged in with scopes: ovirt-app-api ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2018-05-08 15:10:57,430+02 INFO  [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-24) [58f8a395] Lock freed to object 'EngineLock:{exclusiveLocks='[bd9bbfdc-2e3d-4a3b-a8fe-356d5690ce29=PROVIDER]', sharedLocks=''}'
2018-05-08 15:11:04,242+02 INFO  [org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default task-31) [] User admin@internal successfully logged in with scopes: ovirt-app-admin ovirt-app-api ovirt-app-portal ovirt-ext=auth:sequence-priority=~ ovirt-ext=revoke:revoke-all ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2018-05-08 15:11:04,426+02 INFO  [org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default task-40) [51e423f] Running command: CreateUserSessionCommand internal: false.
2018-05-08 15:11:04,435+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-40) [51e423f] EVENT_ID: USER_VDC_LOGIN(30), User admin@internal-authz connecting from '10.40.204.19' using session '1TJPaKZOgbTr3b0NLFbv23mBwYD3xgyeHZDgKURU1+iXmdCupd8fvpGYmkrxzvIUq/N/Hg3D3vUXNkV2Un/uyQ==' logged in.
2018-05-08 15:11:15,546+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-68) [] Setting new tasks map. The map contains now 0 tasks
2018-05-08 15:11:15,547+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-68) [] Cleared all tasks of pool '06868ef8-528c-11e8-b8bd-001a4a014208'.
2018-05-08 15:11:39,404+02 INFO  [org.ovirt.engine.core.bll.RemoveVmCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] Lock Acquired to object 'EngineLock:{exclusiveLocks='[61990ff5-7f72-4080-8610-a0082cae39a8=VM]', sharedLocks=''}'
2018-05-08 15:11:39,436+02 INFO  [org.ovirt.engine.core.bll.RemoveVmCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] Running command: RemoveVmCommand internal: false. Entities affected :  ID: 61990ff5-7f72-4080-8610-a0082cae39a8 Type: VMAction group DELETE_VM with role type USER
2018-05-08 15:11:39,438+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVmStatusVDSCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] START, SetVmStatusVDSCommand( SetVmStatusVDSCommandParameters:{vmId='61990ff5-7f72-4080-8610-a0082cae39a8', status='ImageLocked', exitStatus='Normal'}), log id: 32e25c75
2018-05-08 15:11:39,443+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVmStatusVDSCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] FINISH, SetVmStatusVDSCommand, log id: 32e25c75
2018-05-08 15:11:39,445+02 INFO  [org.ovirt.engine.core.bll.RemoveVmCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] Lock freed to object 'EngineLock:{exclusiveLocks='[61990ff5-7f72-4080-8610-a0082cae39a8=VM]', sharedLocks=''}'
2018-05-08 15:11:39,456+02 INFO  [org.ovirt.engine.core.bll.RemoveMemoryVolumesCommand] (default task-43) [60aa9b16] Running command: RemoveMemoryVolumesCommand internal: true.
2018-05-08 15:11:39,484+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.RemoveAllVmImagesCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] Running command: RemoveAllVmImagesCommand internal: true. Entities affected :  ID: 61990ff5-7f72-4080-8610-a0082cae39a8 Type: VM
2018-05-08 15:11:39,501+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] Running command: RemoveImageCommand internal: true. Entities affected :  ID: 00000000-0000-0000-0000-000000000000 Type: Storage

Version-Release number of selected component (if applicable):
ansible-2.5.2-1.el7.noarch
oVirt.image-template (1.1.6)
ovirt-engine-4.2.3.2-0.1.el7.noarch

How reproducible:
In about 20% of cases; reproduced in two different environments.

Steps to Reproduce:
Upload a normal image to the engine; it is non-deterministic why it fails.

Some information about the images used:
The images were created with the command: qemu-img create -f qcow2 foobar.qcow2 50000MB
In the playbook I have:
template_disk_size: 50GiB
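As a side observation, the size notations in this report do not all resolve to the same byte count. A minimal sketch, assuming qemu-img's binary suffix convention (M/MB parsed as MiB) and using the `provisioned_size` value from the failed-task output above:

```python
# Compare the byte values behind the size notations seen in this report.
# Assumption (not stated in the report): qemu-img parses "50000MB" as 50000 MiB.
MiB = 1024 ** 2
GiB = 1024 ** 3

qemu_img_size = 50000 * MiB      # qemu-img create ... 50000MB
template_disk_size = 50 * GiB    # template_disk_size: 50GiB in the playbook
reported = 51539607552           # provisioned_size from the failed task output

print(qemu_img_size)             # 52428800000
print(template_disk_size)        # 53687091200
print(reported // GiB)           # 48 -> exactly 48 GiB, below the 50 GiB target
```

Whatever the API keeps returning here is exactly 48 GiB, i.e. still below the 50 GiB target, so a wait condition comparing against the target can never pass on this stale value.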

Comment 1 Martin Perina 2018-05-09 11:58:09 UTC
Could you please attach full engine and VDSM logs so we can investigate the failure?

Comment 2 Petr Kubica 2018-05-09 15:59:48 UTC
Created attachment 1433938 [details]
logs from ansible output, engine.log and vdsm.log

yes, attached

Comment 3 Ondra Machacek 2018-07-13 10:28:31 UTC
The extend operation is successfully executed:

 2018-05-09 17:39:38,029+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-5993) [] EVENT_ID: USER_EXTEND_DISK_SIZE_SUCCESS(371), Size of the disk 'pkubica-rhel-72-test3' was successfully updated to 50 GB by admin@internal-authz.

But the API does not report the new value:

 ...
 "provisioned_size": 51539607552, 

In vdsm I can see:

 2018-05-09 17:39:38,157+0200 INFO  (jsonrpc/0) [storage.VolumeManifest] b0277d7f-d00f-4b9b-af63-e59389a2235b/e38487a6-6b9f-4e9a-9ad7-31cb2b46d781/35805e2b-841b-4b6c-a2e2-907195dbd37f info is {'status': 'OK', 'domain': 'b0277d7f-d00f-4b9b-af63-e59389a2235b', 'voltype': 'LEAF', 'description': '{"DiskAlias":"pkubica-rhel-72-test3","DiskDescription":""}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': 'e38487a6-6b9f-4e9a-9ad7-31cb2b46d781', 'ctime': '1525880305', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '1131938592', 'children': [], 'pool': '', 'capacity': '53687091200', 'uuid': u'35805e2b-841b-4b6c-a2e2-907195dbd37f', 'truesize': '1132511232', 'type': 'SPARSE', 'lease': {'path': u'/rhev/data-center/mnt/str-03s1.rhev.lab.eng.brq.redhat.com:_pkubica_he__lab_storage/b0277d7f-d00f-4b9b-af63-e59389a2235b/images/e38487a6-6b9f-4e9a-9ad7-31cb2b46d781/35805e2b-841b-4b6c-a2e2-907195dbd37f.lease', 'owners': [], 'version': None, 'offset': 0}} (volume:279)

The size is reported as '53687091200', i.e. 50 GiB.

So it looks like the data may not be updated in the database in some rare cases.
An engine debug log and a database dump taken after the failure would be helpful for resolving the issue.
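The failure mode described here can be illustrated with a minimal polling sketch (not the role's actual code; `get_provisioned_size` is a hypothetical stand-in for the API query, and the retry counts mirror the 30 attempts seen in the Ansible output): if the API keeps returning the stale pre-resize value, the loop exhausts its retries no matter how long it waits.

```python
import time

GiB = 1024 ** 3

def wait_for_resize(get_provisioned_size, target_bytes, retries=30, delay=3.0):
    """Poll the API-reported disk size until it reaches target_bytes.

    get_provisioned_size is a hypothetical callable standing in for the
    oVirt API query. Returns True on success, False when retries run out.
    """
    for _ in range(retries):
        if get_provisioned_size() >= target_bytes:
            return True
        time.sleep(delay)
    return False

# An API stuck on the stale pre-resize value (as in this bug) never succeeds,
# regardless of the number of retries:
stale_api = lambda: 48 * GiB
```

The sketch makes the diagnosis concrete: since vdsm already reports the correct capacity, the retries are wasted only because the engine's database row (and hence the API response) was never refreshed.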

Comment 4 Petr Kubica 2018-07-23 08:18:09 UTC
It was difficult to reproduce; after turning on DEBUG level logging, I reproduced it on the 12th attempt. I have attached the database dump together with engine.log.

fatal: [localhost]: FAILED! => {
    "attempts": 30, 
    "changed": false, 
    "diff": {
        "after": {}, 
        "before": {}
    }, 
    "disk": {
        "actual_size": 2269462528, 
        "alias": "test1", 
        "content_type": "data", 
        "disk_profile": {
            "href": "/ovirt-engine/api/diskprofiles/34b359ff-9c88-444e-849e-fef8d71375d1", 
            "id": "34b359ff-9c88-444e-849e-fef8d71375d1"
        }, 
        "format": "cow", 
        "href": "/ovirt-engine/api/disks/aeddb27a-c865-47e0-b501-c8d96b39dda9", 
        "id": "aeddb27a-c865-47e0-b501-c8d96b39dda9", 
        "image_id": "7b4ab07b-4e7d-4870-857d-1a6a95f65685", 
        "name": "test1", 
        "permissions": [], 
        "propagate_errors": false, 
        "provisioned_size": 52613349376, 
        "qcow_version": "qcow2_v3", 
        "quota": {
            "href": "/ovirt-engine/api/datacenters/06868ef8-528c-11e8-b8bd-001a4a014208/quotas/118ed436-528c-11e8-916c-001a4a014208", 
            "id": "118ed436-528c-11e8-916c-001a4a014208"
        }, 
        "shareable": false, 
        "sparse": true, 
        "statistics": [], 
        "status": "ok", 
        "storage_domains": [
            {
                "href": "/ovirt-engine/api/storagedomains/b0277d7f-d00f-4b9b-af63-e59389a2235b", 
                "id": "b0277d7f-d00f-4b9b-af63-e59389a2235b"
            }
        ], 
        "storage_type": "image", 
        "total_size": 2269462528, 
        "wipe_after_delete": false
    }, 
    "id": "aeddb27a-c865-47e0-b501-c8d96b39dda9", 
    "invocation": {
        "module_args": {
            "auth": {
                "ca_file": "/etc/pki/ovirt-engine/ca.pem", 
                "compress": true, 
                "headers": null, 
                "insecure": true, 
                "kerberos": false, 
                "timeout": 0, 
                "token": "_Hg2DJEY5B4sV6nrWAdHv3_QI35kcKAjPtXrSxPzvvgyfs9_4HsiO-EzYE7riIVz44D44wFUNGv6rjly-Zo3yQ", 
                "url": "https://she-test-09.rhev.lab.eng.brq.redhat.com/ovirt-engine/api"
            }, 
            "bootable": null, 
            "description": null, 
            "download_image_path": null, 
            "fetch_nested": false, 
            "force": false, 
            "format": "cow", 
            "id": "aeddb27a-c865-47e0-b501-c8d96b39dda9", 
            "image_provider": null, 
            "interface": null, 
            "logical_unit": null, 
            "name": null, 
            "nested_attributes": [], 
            "openstack_volume_type": null, 
            "poll_interval": 3, 
            "profile": null, 
            "quota_id": null, 
            "shareable": null, 
            "size": null, 
            "sparse": null, 
            "sparsify": null, 
            "state": "present", 
            "storage_domain": null, 
            "storage_domains": null, 
            "timeout": 180, 
            "upload_image_path": null, 
            "vm_id": null, 
            "vm_name": null, 
            "wait": true
        }
    }
}

Comment 5 Petr Kubica 2018-07-23 08:18:52 UTC
Created attachment 1469904 [details]
engine db and logs

Comment 6 Martin Perina 2018-08-01 11:33:45 UTC
From the above it seems like a race in the storage code inside the engine. Tal, could you please take a look?

Comment 7 Sandro Bonazzola 2019-01-28 09:34:37 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 8 Michal Skrivanek 2021-05-11 07:29:12 UTC
No update in 3 years; it seems this doesn't bother anyone, or it got fixed in the meantime. Closing.