Bug 1575986
| Summary: | Sometimes failing task "Wait for resize" | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Petr Kubica <pkubica> |
| Component: | BLL.Storage | Assignee: | Eyal Shenitzky <eshenitz> |
| Status: | CLOSED WONTFIX | QA Contact: | Avihai <aefrat> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | --- | CC: | bugs, lsvaty, michal.skrivanek, mperina, pkubica, tnisan |
| Target Milestone: | --- | Keywords: | Automation, AutomationBlocker |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-11 07:29:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Could you please attach full engine and VDSM logs so the failure can be investigated?

Created attachment 1433938 [details]
logs from ansible output, engine.log and vdsm.log

Yes, attached.
The extend operation is successfully executed:
2018-05-09 17:39:38,029+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-5993) [] EVENT_ID: USER_EXTEND_DISK_SIZE_SUCCESS(371), Size of the disk 'pkubica-rhel-72-test3' was successfully updated to 50 GB by admin@internal-authz.
But the API does not report the new value:
...
"provisioned_size": 51539607552,
In VDSM I can see:
2018-05-09 17:39:38,157+0200 INFO (jsonrpc/0) [storage.VolumeManifest] b0277d7f-d00f-4b9b-af63-e59389a2235b/e38487a6-6b9f-4e9a-9ad7-31cb2b46d781/35805e2b-841b-4b6c-a2e2-907195dbd37f info is {'status': 'OK', 'domain': 'b0277d7f-d00f-4b9b-af63-e59389a2235b', 'voltype': 'LEAF', 'description': '{"DiskAlias":"pkubica-rhel-72-test3","DiskDescription":""}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': 'e38487a6-6b9f-4e9a-9ad7-31cb2b46d781', 'ctime': '1525880305', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '1131938592', 'children': [], 'pool': '', 'capacity': '53687091200', 'uuid': u'35805e2b-841b-4b6c-a2e2-907195dbd37f', 'truesize': '1132511232', 'type': 'SPARSE', 'lease': {'path': u'/rhev/data-center/mnt/str-03s1.rhev.lab.eng.brq.redhat.com:_pkubica_he__lab_storage/b0277d7f-d00f-4b9b-af63-e59389a2235b/images/e38487a6-6b9f-4e9a-9ad7-31cb2b46d781/35805e2b-841b-4b6c-a2e2-907195dbd37f.lease', 'owners': [], 'version': None, 'offset': 0}} (volume:279)
The size is reported as '53687091200', i.e. 50 GiB.
So it looks like the data is not updated in the engine database in some rare cases.
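To make the mismatch concrete, here is the arithmetic on the two values quoted above (plain Python, binary GiB):

```python
GiB = 1024 ** 3  # 1073741824 bytes

print(53687091200 / GiB)  # 50.0 -> capacity VDSM reports after the extend
print(51539607552 / GiB)  # 48.0 -> provisioned_size the API still returns
```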
An engine debug log and a database dump taken after the failure would be helpful for resolving the issue.
It was difficult to reproduce: after I turned on DEBUG level logging, I reproduced it on the 12th attempt. I have attached the database dump together with the engine.log.
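For anyone inspecting the attached DB dump, a minimal sketch of how the stale value could be checked, assuming the engine schema stores a disk's virtual size in the images.size column (column names may differ between versions; the connection parameters are placeholders, not this environment's credentials):

```python
import psycopg2

# Placeholder credentials; adjust for wherever the dump is restored.
conn = psycopg2.connect(dbname='engine', user='engine', host='localhost')

with conn.cursor() as cur:
    # Assumed schema: images.size holds the virtual size in bytes,
    # keyed to the disk via image_group_id.
    cur.execute(
        "SELECT image_guid, size FROM images WHERE image_group_id = %s",
        ('aeddb27a-c865-47e0-b501-c8d96b39dda9',),  # disk id from the output below
    )
    for image_guid, size in cur.fetchall():
        # If the race occurred, size here lags behind the capacity VDSM reports.
        print(image_guid, size)

conn.close()
```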
fatal: [localhost]: FAILED! => {
"attempts": 30,
"changed": false,
"diff": {
"after": {},
"before": {}
},
"disk": {
"actual_size": 2269462528,
"alias": "test1",
"content_type": "data",
"disk_profile": {
"href": "/ovirt-engine/api/diskprofiles/34b359ff-9c88-444e-849e-fef8d71375d1",
"id": "34b359ff-9c88-444e-849e-fef8d71375d1"
},
"format": "cow",
"href": "/ovirt-engine/api/disks/aeddb27a-c865-47e0-b501-c8d96b39dda9",
"id": "aeddb27a-c865-47e0-b501-c8d96b39dda9",
"image_id": "7b4ab07b-4e7d-4870-857d-1a6a95f65685",
"name": "test1",
"permissions": [],
"propagate_errors": false,
"provisioned_size": 52613349376,
"qcow_version": "qcow2_v3",
"quota": {
"href": "/ovirt-engine/api/datacenters/06868ef8-528c-11e8-b8bd-001a4a014208/quotas/118ed436-528c-11e8-916c-001a4a014208",
"id": "118ed436-528c-11e8-916c-001a4a014208"
},
"shareable": false,
"sparse": true,
"statistics": [],
"status": "ok",
"storage_domains": [
{
"href": "/ovirt-engine/api/storagedomains/b0277d7f-d00f-4b9b-af63-e59389a2235b",
"id": "b0277d7f-d00f-4b9b-af63-e59389a2235b"
}
],
"storage_type": "image",
"total_size": 2269462528,
"wipe_after_delete": false
},
"id": "aeddb27a-c865-47e0-b501-c8d96b39dda9",
"invocation": {
"module_args": {
"auth": {
"ca_file": "/etc/pki/ovirt-engine/ca.pem",
"compress": true,
"headers": null,
"insecure": true,
"kerberos": false,
"timeout": 0,
"token": "_Hg2DJEY5B4sV6nrWAdHv3_QI35kcKAjPtXrSxPzvvgyfs9_4HsiO-EzYE7riIVz44D44wFUNGv6rjly-Zo3yQ",
"url": "https://she-test-09.rhev.lab.eng.brq.redhat.com/ovirt-engine/api"
},
"bootable": null,
"description": null,
"download_image_path": null,
"fetch_nested": false,
"force": false,
"format": "cow",
"id": "aeddb27a-c865-47e0-b501-c8d96b39dda9",
"image_provider": null,
"interface": null,
"logical_unit": null,
"name": null,
"nested_attributes": [],
"openstack_volume_type": null,
"poll_interval": 3,
"profile": null,
"quota_id": null,
"shareable": null,
"size": null,
"sparse": null,
"sparsify": null,
"state": "present",
"storage_domain": null,
"storage_domains": null,
"timeout": 180,
"upload_image_path": null,
"vm_id": null,
"vm_name": null,
"wait": true
}
}
}
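For reference, the failing task essentially polls the disk until the API returns the expected provisioned_size (30 attempts, poll_interval 3, per the module_args above). The role itself uses the ovirt_disk Ansible module; the following is only an approximate sketch of the same check with the Python SDK (ovirt-engine-sdk4), with placeholder URL and credentials:

```python
import time
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details -- not the real environment from this bug.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)

disk_service = connection.system_service().disks_service() \
    .disk_service('aeddb27a-c865-47e0-b501-c8d96b39dda9')  # disk id from the output above

expected = 50 * 1024 ** 3  # 53687091200 bytes, the extend target

# Poll like the "Wait for resize" task: up to 30 attempts, 3 s apart.
for attempt in range(30):
    disk = disk_service.get()
    if disk.status == types.DiskStatus.OK and disk.provisioned_size >= expected:
        break
    time.sleep(3)
else:
    raise RuntimeError('disk never reported the new provisioned_size')

connection.close()
```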
Created attachment 1469904 [details]
engine db and logs
From the above it seems like a race in the storage code inside the engine. Tal, could you please take a look?

This bug has not been marked as a blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

No update in 3 years; it seems it doesn't bother anyone, or it got fixed in the meantime. Closing...
Description of problem:

TASK [oVirt.image-template : Wait for resize] ********************************************
FAILED - RETRYING: Wait for resize (30 retries left).
FAILED - RETRYING: Wait for resize (29 retries left).
... output shortened ...
FAILED - RETRYING: Wait for resize (2 retries left).
FAILED - RETRYING: Wait for resize (1 retries left).
fatal: [she-test-09.rhev.lab.eng.brq.redhat.com]: FAILED! => {"attempts": 30, "changed": false, "disk": {"actual_size": 1132511232, "alias": "pkubica-rhel-72-host", "content_type": "data", "disk_profile": {"href": "/ovirt-engine/api/diskprofiles/34b359ff-9c88-444e-849e-fef8d71375d1", "id": "34b359ff-9c88-444e-849e-fef8d71375d1"}, "format": "cow", "href": "/ovirt-engine/api/disks/36868cfe-1a9f-492c-a790-4606c5e73f8f", "id": "36868cfe-1a9f-492c-a790-4606c5e73f8f", "image_id": "effbf8dc-f3cb-42f8-9755-74001a9cbcd1", "name": "pkubica-rhel-72-host", "permissions": [], "propagate_errors": false, "provisioned_size": 51539607552, "qcow_version": "qcow2_v3", "quota": {"href": "/ovirt-engine/api/datacenters/06868ef8-528c-11e8-b8bd-001a4a014208/quotas/118ed436-528c-11e8-916c-001a4a014208", "id": "118ed436-528c-11e8-916c-001a4a014208"}, "shareable": false, "sparse": true, "statistics": [], "status": "ok", "storage_domains": [{"href": "/ovirt-engine/api/storagedomains/b0277d7f-d00f-4b9b-af63-e59389a2235b", "id": "b0277d7f-d00f-4b9b-af63-e59389a2235b"}], "storage_type": "image", "total_size": 0, "wipe_after_delete": false}, "id": "36868cfe-1a9f-492c-a790-4606c5e73f8f"}

It is not due to a timeout. In webadmin I only see that the disk is up. My guess is that the disk status is checked incorrectly. There is nothing in engine.log during that task (from the transfer until the delete after the task failed), and in webadmin I didn't see any failed task in the Tasks tab or any failed event.

2018-05-08 15:10:06,458+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-9) [0b54458c-9e45-4885-a483-d5e4fb2c9ff6] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[36868cfe-1a9f-492c-a790-4606c5e73f8f=DISK]'}'
2018-05-08 15:10:06,467+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-9) [0b54458c-9e45-4885-a483-d5e4fb2c9ff6] EVENT_ID: TRANSFER_IMAGE_SUCCEEDED(1,032), Image Upload with disk pkubica-rhel-72-host succeeded.
2018-05-08 15:10:15,546+02 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] Setting new tasks map. The map contains now 1 tasks
2018-05-08 15:10:57,184+02 INFO [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-24) [58f8a395] Lock Acquired to object 'EngineLock:{exclusiveLocks='[bd9bbfdc-2e3d-4a3b-a8fe-356d5690ce29=PROVIDER]', sharedLocks=''}'
2018-05-08 15:10:57,192+02 INFO [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-24) [58f8a395] Running command: SyncNetworkProviderCommand internal: true.
2018-05-08 15:10:57,283+02 INFO [org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default task-17) [] User admin@internal successfully logged in with scopes: ovirt-app-api ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2018-05-08 15:10:57,430+02 INFO [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-24) [58f8a395] Lock freed to object 'EngineLock:{exclusiveLocks='[bd9bbfdc-2e3d-4a3b-a8fe-356d5690ce29=PROVIDER]', sharedLocks=''}'
2018-05-08 15:11:04,242+02 INFO [org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default task-31) [] User admin@internal successfully logged in with scopes: ovirt-app-admin ovirt-app-api ovirt-app-portal ovirt-ext=auth:sequence-priority=~ ovirt-ext=revoke:revoke-all ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2018-05-08 15:11:04,426+02 INFO [org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default task-40) [51e423f] Running command: CreateUserSessionCommand internal: false.
2018-05-08 15:11:04,435+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-40) [51e423f] EVENT_ID: USER_VDC_LOGIN(30), User admin@internal-authz connecting from '10.40.204.19' using session '1TJPaKZOgbTr3b0NLFbv23mBwYD3xgyeHZDgKURU1+iXmdCupd8fvpGYmkrxzvIUq/N/Hg3D3vUXNkV2Un/uyQ==' logged in.
2018-05-08 15:11:15,546+02 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-68) [] Setting new tasks map. The map contains now 0 tasks
2018-05-08 15:11:15,547+02 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-68) [] Cleared all tasks of pool '06868ef8-528c-11e8-b8bd-001a4a014208'.
2018-05-08 15:11:39,404+02 INFO [org.ovirt.engine.core.bll.RemoveVmCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] Lock Acquired to object 'EngineLock:{exclusiveLocks='[61990ff5-7f72-4080-8610-a0082cae39a8=VM]', sharedLocks=''}'
2018-05-08 15:11:39,436+02 INFO [org.ovirt.engine.core.bll.RemoveVmCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] Running command: RemoveVmCommand internal: false. Entities affected : ID: 61990ff5-7f72-4080-8610-a0082cae39a8 Type: VMAction group DELETE_VM with role type USER
2018-05-08 15:11:39,438+02 INFO [org.ovirt.engine.core.vdsbroker.SetVmStatusVDSCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] START, SetVmStatusVDSCommand( SetVmStatusVDSCommandParameters:{vmId='61990ff5-7f72-4080-8610-a0082cae39a8', status='ImageLocked', exitStatus='Normal'}), log id: 32e25c75
2018-05-08 15:11:39,443+02 INFO [org.ovirt.engine.core.vdsbroker.SetVmStatusVDSCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] FINISH, SetVmStatusVDSCommand, log id: 32e25c75
2018-05-08 15:11:39,445+02 INFO [org.ovirt.engine.core.bll.RemoveVmCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] Lock freed to object 'EngineLock:{exclusiveLocks='[61990ff5-7f72-4080-8610-a0082cae39a8=VM]', sharedLocks=''}'
2018-05-08 15:11:39,456+02 INFO [org.ovirt.engine.core.bll.RemoveMemoryVolumesCommand] (default task-43) [60aa9b16] Running command: RemoveMemoryVolumesCommand internal: true.
2018-05-08 15:11:39,484+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.RemoveAllVmImagesCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] Running command: RemoveAllVmImagesCommand internal: true. Entities affected : ID: 61990ff5-7f72-4080-8610-a0082cae39a8 Type: VM
2018-05-08 15:11:39,501+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand] (default task-43) [66596f81-32ce-468b-b8cf-cd1dd3860249] Running command: RemoveImageCommand internal: true. Entities affected : ID: 00000000-0000-0000-0000-000000000000 Type: Storage

Version-Release number of selected component (if applicable):
ansible-2.5.2-1.el7.noarch
oVirt.image-template (1.1.6)
ovirt-engine-4.2.3.2-0.1.el7.noarch

How reproducible:
20% of cases - reproduced in two different environments

Steps to Reproduce:
Upload a normal image to the engine; it is non-deterministic why it fails.

Some information about the used images: the images were created by the command
qemu-img create -f qcow2 foobar.qcow2 50000MB
and in the playbook I have:
template_disk_size: 50GiB
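A quick check of the sizes in the reproduction step, assuming qemu-img parses the M suffix as MiB (2^20 bytes), as QEMU's size parser traditionally does:

```python
GiB = 1024 ** 3

created = 50000 * 1024 ** 2   # qemu-img create ... 50000MB, assuming M = MiB
target = 50 * GiB             # template_disk_size: 50GiB

print(created)                # 52428800000 (~48.83 GiB)
print(target)                 # 53687091200
# The uploaded image is smaller than 50 GiB, so the role really has to
# extend the disk, and "Wait for resize" then polls the API until it
# reports the new provisioned_size -- which is exactly what intermittently
# never happens here.
```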