Bug 1637405 - Failed to create/delete vDisk - Starting with Failed in 'HSMGetAllTasksStatusesVDS' method
Summary: Failed to create/delete vDisk - Starting with Failed in 'HSMGetAllTasksStatus...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.6.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent with 1 vote
Target Milestone: ovirt-4.4.1
Target Release: ---
Assignee: Nir Soffer
QA Contact: Avihai
URL:
Whiteboard:
Duplicates: 1666207 (view as bug list)
Depends On: 1553133
Blocks:
 
Reported: 2018-10-09 08:24 UTC by Evelina Shames
Modified: 2020-04-01 07:27 UTC (History)
12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-26 11:23:24 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.4+
aoconnor: blocker-


Attachments
logs (3.79 MB, application/zip)
2018-10-09 08:24 UTC, Evelina Shames
no flags Details
create/delete volume additional TC's logs (14.57 MB, application/x-gzip)
2018-10-09 08:50 UTC, Avihai
no flags Details
4.2.7 Logs (2018-11-07) (4.32 MB, application/zip)
2018-11-08 08:52 UTC, Shir Fishbain
no flags Details
TestCase18268 (1.28 MB, application/gzip)
2018-12-13 10:29 UTC, Yosi Ben Shimon
no flags Details
TestCase19265 (1.24 MB, application/gzip)
2018-12-13 10:29 UTC, Yosi Ben Shimon
no flags Details
TC6004_rhv-4.2.8-7_logs (6.20 MB, application/gzip)
2019-02-17 09:19 UTC, Avihai
no flags Details
ovirt-engine-4.2.8.3-0.1_Tier1_TestCase6038_glusterfs_logs (11.08 MB, application/zip)
2019-02-27 14:03 UTC, Avihai
no flags Details
TestCase6038_ovirt-engine-4.3.3.6-0.1_rep_delete_volume_scenario (5.19 MB, application/zip)
2019-05-05 06:18 UTC, Avihai
no flags Details
TestCase6038_ovirt-engine-4.3.4.3-0.1_rep_delete_volume_scenario (4.13 MB, application/zip)
2019-06-23 06:28 UTC, Avihai
no flags Details
TestCase6052_ovirt-engine-4.3.4.3-0.1_rep_create_volume_scenario (1.63 MB, application/zip)
2019-06-23 06:49 UTC, Avihai
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1553133 0 urgent CLOSED Creating many thin clones corrupts vg metadata 2023-10-06 17:44:13 UTC

Internal Links: 1553133 1811391

Description Evelina Shames 2018-10-09 08:24:14 UTC
Created attachment 1491972 [details]
logs

Description of problem:
Volume creation failed with the following errors when creating a live snapshot after live storage migration:

Engine:
2018-09-20 05:38:08,956+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command HSMGetAllTasksStatusesVDS failed: Error creating a new volume: ('Volume creation 982d8647-c9b2-41e8-8274-1fc7afcc0ccc failed: Failed reload: 982d8647-c9b2-41e8-8274-1fc7afcc0ccc',)

2018-09-20 05:38:08,963+03 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] BaseAsyncTask::logEndTaskFailure: Task '24ed00c0-2af2-45e5-9c23-ff27060f5ee7' (Parent Command 'AddImageFromScratch', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume: ('Volume creation 982d8647-c9b2-41e8-8274-1fc7afcc0ccc failed: Failed reload: 982d8647-c9b2-41e8-8274-1fc7afcc0ccc',), code = 205',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume: ('Volume creation 982d8647-c9b2-41e8-8274-1fc7afcc0ccc failed: Failed reload: 982d8647-c9b2-41e8-8274-1fc7afcc0ccc',), code = 205'

2018-09-20 05:38:22,163+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [snapshots_create_2b65d9fe-3de7-4584] Command 'CreateSnapshotForVm' id: '19100aa8-67bd-4ce9-aba5-ddab3393eee2' child commands '[769ff171-b4b0-46f9-9618-e2d6ac46054d, 8f6404d3-1bad-4711-917d-c3a8dec675f4, 1a549d3b-ee7f-4871-aee3-50ad6113b993]' executions were completed, status 'FAILED'
2018-09-20 05:38:23,200+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [snapshots_create_2b65d9fe-3de7-4584] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand' with failure.

VDSM:
2018-09-20 05:37:58,652+0300 ERROR (tasks/9) [storage.Volume] Unexpected error (volume:1249)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1218, in create
    apparentSize = int(dom.getVSize(imgUUID, volUUID) /
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 717, in getVSize
    return self._manifest.getVSize(imgUUID, volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 453, in getVSize
    size = lvm.getLV(self.sdUUID, volUUID).size
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 82, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: 982d8647-c9b2-41e8-8274-1fc7afcc0ccc

2018-09-20 05:38:23,254+0300 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='fd02f906-4c21-4f61-acc3-e7756ab9f51f') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in revertTask
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2305, in revertTask
    return self.taskMng.revertTask(taskID=taskID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/taskManager.py", line 165, in revertTask
    t = self.__getTask(taskID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/taskManager.py", line 85, in __getTask
    raise se.UnknownTask(taskID)
UnknownTask: Task id unknown: ('eebf168e-4a6e-4b58-9871-4a16b06a4b2e',)


Version-Release number of selected component (if applicable):
vdsm-4.20.40-1.el7ev.x86_64
ovirt-engine-4.2.7-0.1.el7ev.noarch


How reproducible:
Many times; seen in at least 9 test cases.

Steps to Reproduce:
1. Create and run a VM
2. Migrate the VM's disk to another SD (of the same type)
3. Create a live snapshot (a rough SDK sketch of this flow is included at the end of this comment)

Actual results:
Failed to create snapshot.

Expected results:
Should succeed.

Additional info:
Relevant logs are attached.
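
For reference, the "Steps to Reproduce" above roughly map to the following ovirt-engine-sdk4 sketch. This is only an illustration (the engine URL, credentials, VM name and target storage domain name are placeholders, and waiting/error handling is minimal); it is not the automation code used by the failing tests.

import time
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='password',
    insecure=True,
)
system_service = connection.system_service()
vms_service = system_service.vms_service()

# Step 1: an existing, running VM (created and started beforehand).
vm = vms_service.list(search='name=test_vm')[0]
vm_service = vms_service.vm_service(vm.id)

# Step 2: live-migrate the VM's disk to another SD of the same type.
attachment = vm_service.disk_attachments_service().list()[0]
disk_service = system_service.disks_service().disk_service(attachment.disk.id)
disk_service.move(storage_domain=types.StorageDomain(name='target_sd'))
while disk_service.get().status != types.DiskStatus.OK:
    time.sleep(5)

# Step 3: create a live snapshot while the VM is still running.
vm_service.snapshots_service().add(
    types.Snapshot(description='after_lsm', persist_memorystate=False),
)

connection.close()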

Comment 1 Avihai 2018-10-09 08:44:02 UTC
This also happened to me in the same Tier3 run, on an additional 7 TCs:

These TCs also failed during disk/volume creation:
1) TestCase6048.test_consecutive_snapshot_deletion_of_snapshots[iscsi]
2) TestCase6050.test_snapshot_merge_during_snapshot_merge[iscsi]
3) TestCase6062.test_live_merge_during_lsm[iscsi]
4) TestCase12215.test_snapshot_deletion_of_all_snapshots[iscsi]
5) TestCase18912.test_snapshot_deletion_of_all_snapshots[iscsi]

These 2 TCs failed during volume deletion during snapshot merge/preview:
6) TestCase18904.test_consecutive_snapshot_deletion_of_snapshots[iscsi]
7) TestCase18899.test_basic_snapshot_deletion[iscsi]

Logs are attached for:
1) The volume creation issue seen in TestCase6048
2) The volume deletion issue during preview in TestCase18904
3) The volume deletion issue during cold merge in TestCase18899.

In all the tests, including the original description from Evelina's comment, it starts with this engine error:
2018-10-04 19:32:52,722+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2018-10-04 19:32:52,727+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command HSMGetAllTasksStatusesVDS failed: Error creating a new volume: (u'Volume creation 6db2da3a-bcc0-4496-ad51-5c17cdb31a1d failed: Failed reload: 6db2da3a-bcc0-4496-ad51-5c17cdb31a1d',)

And the VDSM logs all show "AttributeError: Failed reload:" issues, either when creating a new volume (code = 205) or when deleting a volume/DestroyImage (code = 100).
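
The "Failed reload" string comes from the lazy LV reload pattern visible in the tracebacks (vdsm/storage/lvm.py, __getattr__ at line 82). A minimal sketch of that pattern, reconstructed from the traceback only (not the exact vdsm code):

class UnreadableLV(object):
    # Placeholder object returned when an LV's metadata could not be
    # reloaded from the VG; any attribute access fails loudly.
    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # Triggered e.g. by .size in blockSD.getVSize(), producing
        # "AttributeError: Failed reload: <volUUID>" as seen above.
        raise AttributeError("Failed reload: %s" % self.name)

lv = UnreadableLV("982d8647-c9b2-41e8-8274-1fc7afcc0ccc")
try:
    lv.size
except AttributeError as e:
    print(e)  # Failed reload: 982d8647-c9b2-41e8-8274-1fc7afcc0ccc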

Comment 2 Avihai 2018-10-09 08:50:23 UTC
Created attachment 1491996 [details]
create/delete volume additional TC's logs

Attached are the engine and VDSM logs.

There are 3 directories for the 2 types of issues:

1) For the create volume issue -> directory "create_volume"
2) For the destroy/delete volume issue -> two directories, one per TC:
"preview_TestCase18899"
"delete_snapshot_TestCase18904"

Comment 3 Avihai 2018-10-22 14:22:33 UTC
Checked again in the same Tier3 run on the new build ovirt-engine-4.2.7.3-0.1.el7ev.noarch and the issue does not reproduce.

All the previously failed TCs passed in this run [1] and the ERROR reported here was not seen in the engine logs.

[1] https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier3/88/

Comment 4 Shir Fishbain 2018-11-08 08:52:43 UTC
Created attachment 1503279 [details]
4.2.7 Logs (2018-11-07)

Comment 5 Shir Fishbain 2018-11-08 09:07:37 UTC
The bug reproduces in Tier1 (TestCase6004, test_storage_migration) on 4.2.7_4.3:

From vdsm.log - hypervisor-lynx22_vdsm.log:

2018-11-06 21:17:14,389+0200 INFO  (tasks/1) [storage.LVM] Deactivating lvs: vg=bbb3d1af-bae6-45cb-a6fc-956408328a61 lvs=[u'bc5c0b5e-a94c-4a5f-a6ba-00d5420ee307'] (lvm:1310)
2018-11-06 21:17:14,965+0200 WARN  (tasks/1) [storage.LVM] bc5c0b5e-a94c-4a5f-a6ba-00d5420ee307 can't be reloaded, please check your storage connections. (lvm:81)
2018-11-06 21:17:14,965+0200 ERROR (tasks/1) [storage.Volume] Unexpected error (volume:1249)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1218, in create
    apparentSize = int(dom.getVSize(imgUUID, volUUID) /
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 717, in getVSize
    return self._manifest.getVSize(imgUUID, volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 453, in getVSize
    size = lvm.getLV(self.sdUUID, volUUID).size
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 82, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: bc5c0b5e-a94c-4a5f-a6ba-00d5420ee307

2018-11-06 21:17:14,967+0200 ERROR (tasks/1) [storage.TaskManager.Task] (Task='4285558a-9305-411f-adcf-9ba322a14a2f') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1943, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 820, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1251, in create
    (volUUID, e))
VolumeCreationError: Error creating a new volume: (u'Volume creation bc5c0b5e-a94c-4a5f-a6ba-00d5420ee307 failed: Failed reload: bc5c0b5e-a94c-4a5f-a6ba-00d5420ee307',)

2018-11-06 21:17:14,967+0200 DEBUG (tasks/1) [storage.TaskManager.Task] (Task='4285558a-9305-411f-adcf-9ba322a14a2f') aborting: Task is aborted: "Error creating a new volume: (u'Volume creation bc5c0b5e-a94c-4a5f-a6ba-00d5420ee307 failed: Failed reload: bc5c0b5e-a94c-4a5f-a6ba-00d5420ee307',)" - code 205 (task:928)

Comment 6 Fred Rolland 2018-11-11 09:35:53 UTC
It looks like the LVM metadata is corrupted:

2018-11-06 21:17:14,954+0200 DEBUG (tasks/1) [storage.Misc.excCmd] FAILED: <err> = '  Metadata on /dev/mapper/3600a09803830447a4f244c4657595035 at 857600 has wrong VG name "}\n\n3ca717a4-01cb-4502-84de-c8adbb69ff52 {\nid = "rc5PFT-7AMW-e5LU-rBvo-HIRk-2LOu-LEoFEV"\nstatus = ["READ", "WRITE", "VISIBLE"]\nfl" expected bbb3d1af-bae6-45cb-a6fc-956408328a61.\n  Metadata on /dev/mapper/3600a09803830447a4f244c4657595035 at 857600 has wrong VG name "}\n\n3ca717a4-01cb-4502-84de-c8adbb69ff52 {\nid = "rc5PFT-7AMW-e5LU-rBvo-HIRk-2LOu-LEoFEV"\nstatus = ["READ", "WRITE", "VISIBLE"]\nfl" expected bbb3d1af-bae6-45cb-a6fc-956408328a61.\n  Not repairing metadata for VG bbb3d1af-bae6-45cb-a6fc-956408328a61.\n  Recovery of volume group "bbb3d1af-bae6-45cb-a6fc-956408328a61" failed.\n  Cannot process volume group bbb3d1af-bae6-45cb-a6fc-956408328a61\n'; <rc> = 5 (commands:86)

Could be a similar issue as :
https://bugzilla.redhat.com/show_bug.cgi?id=1553133

Nir, do you think that the following bug is also related? ( Separate lvm activation from other lvm commands)
https://bugzilla.redhat.com/show_bug.cgi?id=1639360
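
For reference, corruption like this can normally be confirmed directly on the SPM host with the LVM tools. A minimal sketch (assumes root access and the lvm2 CLI; illustration only, not part of vdsm):

import subprocess

VG = "bbb3d1af-bae6-45cb-a6fc-956408328a61"  # VG name taken from the log above
rc = subprocess.call(["vgck", VG])           # checks VG metadata consistency
print("vgck exit status: %d" % rc)           # non-zero indicates metadata problems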

Comment 7 Elad 2018-11-25 12:57:14 UTC
Raising severity as we encounter this in snapshot-related test scenarios in our automation.
Do we have any new information following comment #6?

Comment 8 Nir Soffer 2018-11-25 13:01:37 UTC
(In reply to Fred Rolland from comment #6)
> It looks like the LVM metadata is corrupted:
> 
> 2018-11-06 21:17:14,954+0200 DEBUG (tasks/1) [storage.Misc.excCmd] FAILED:
> <err> = '  Metadata on /dev/mapper/3600a09803830447a4f244c4657595035 at
> 857600 has wrong VG name "}\n\n3ca717a4-01cb-4502-84de-c8adbb69ff52 {\nid =
> "rc5PFT-7AMW-e5LU-rBvo-HIRk-2LOu-LEoFEV"\nstatus = ["READ", "WRITE",
> "VISIBLE"]\nfl" expected bbb3d1af-bae6-45cb-a6fc-956408328a61.\n
...
> Could be a similar issue as :
> https://bugzilla.redhat.com/show_bug.cgi?id=1553133

Yes, this looks like the same issue.

> Nir, do you think that the following bug is also related? ( Separate lvm
> activation from other lvm commands)
> https://bugzilla.redhat.com/show_bug.cgi?id=1639360

I don't think it is related.

For bug 1553133 we could reproduce the issue only once, and since then we have failed to reproduce it
several times.

What is the minimal test code reproducing this? Do you have a single test running
with a clean environment that reproduces this issue?

Comment 9 Benny Zlotnik 2018-11-25 13:02:33 UTC
(In reply to Elad from comment #7)
> Raising severity as we encounter this in snapshot related test scenarios in
> our automation. 
> Do we have any new information following comment #6?

Not at the moment. Does the failure occur on the same storage machine (NetApp) in all the failures?

Comment 10 Elad 2018-11-25 13:04:24 UTC
So far yes, only on NetApp.

Comment 11 Elad 2018-12-05 14:22:17 UTC
It also occurs with other storage (Kaminario). See https://bugzilla.redhat.com/show_bug.cgi?id=1553133#c61

Comment 12 Yosi Ben Shimon 2018-12-13 10:20:57 UTC
Happened again in a Tier2 execution on a 4.3 environment.
The test cases are:
storage_streaming_api (download image) - TestCase18268 & TestCase19265.
Both on FCP SD.

Version-Release number:
ovirt-engine-4.3.0-0.6.alpha2.el7.noarch
vdsm-4.30.4-1.el7ev.x86_64


From the engine log:


2018-12-12 09:12:56,090+02 INFO  [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (default task-50) [disks_create_1ee96e6f-0f69-4b90] Running command: AddDiskCommand internal: false. Entities affected :  ID: ad097cbe-eff1-4c67-ac1d-b59fb3109fcd Type: StorageAction group CREATE_DISK with role type USER
2018-12-12 09:12:56,129+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (default task-50) [disks_create_1ee96e6f-0f69-4b90] Running command: AddImageFromScratchCommand internal: true. Entities affected :  ID: ad097cbe-eff1-4c67-ac1d-b59fb3109fcd Type: Storage
2018-12-12 09:12:56,173+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (default task-50) [disks_create_1ee96e6f-0f69-4b90] START, CreateImageVDSCommand( CreateImageVDSCommandParameters:{storagePoolId='ef05d454-fd4f-11e8-b2b0-001a4a168bfc', ignoreFailoverLimit='false', storageDomainId='ad097cbe-eff1-4c67-ac1d-b59fb3109fcd', imageGroupId='4d75d747-a86a-4db9-9939-69ce405d103b', imageSizeInBytes='1073741824', volumeFormat='RAW', newImageId='e9a32dc8-76ba-4d71-8a85-7c522a2564fb', imageType='Preallocated', newImageDescription='{"DiskAlias":"disk_virtioraw_1209125485","DiskDescription":""}', imageInitialSizeInBytes='0'}), log id: 3c56dea
2018-12-12 09:12:56,174+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (default task-50) [disks_create_1ee96e6f-0f69-4b90] -- executeIrsBrokerCommand: calling 'createVolume' with two new parameters: description and UUID
2018-12-12 09:12:56,391+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (default task-50) [disks_create_1ee96e6f-0f69-4b90] FINISH, CreateImageVDSCommand, return: e9a32dc8-76ba-4d71-8a85-7c522a2564fb, log id: 3c56dea
2018-12-12 09:12:56,404+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (default task-50) [disks_create_1ee96e6f-0f69-4b90] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command 'bca3e47c-b4e6-4513-b66c-eecfd9bf6e08'
2018-12-12 09:12:56,404+02 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (default task-50) [disks_create_1ee96e6f-0f69-4b90] CommandMultiAsyncTasks::attachTask: Attaching task '566b2774-6d81-43c7-9122-a23f36774184' to command 'bca3e47c-b4e6-4513-b66c-eecfd9bf6e08'.
2018-12-12 09:12:56,429+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (default task-50) [disks_create_1ee96e6f-0f69-4b90] Adding task '566b2774-6d81-43c7-9122-a23f36774184' (Parent Command 'AddImageFromScratch', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
2018-12-12 09:12:56,447+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-78) [disks_create_bc00dabe-4a7c-4cd1] Command 'AddDisk' (id: 'bbabe6b5-ebb7-41b8-b1fc-957c42655bfa') waiting on child command id: 'a89f74c7-724a-4b6a-bfa6-37391e3c26db' type:'AddImageFromScratch' to complete
2018-12-12 09:12:56,454+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (default task-50) [disks_create_1ee96e6f-0f69-4b90] BaseAsyncTask::startPollingTask: Starting to poll task '566b2774-6d81-43c7-9122-a23f36774184'.
2018-12-12 09:12:56,496+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-50) [disks_create_1ee96e6f-0f69-4b90] EVENT_ID: USER_ADD_DISK(2,020), Add-Disk operation of 'disk_virtioraw_1209125485' was initiated by admin@internal-authz.

2018-12-12 09:12:58,451+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-76) [disks_create_1ee96e6f-0f69-4b90] Command 'AddDisk' (id: 'de3c90ac-047f-4785-a9ca-1c1081e5cf64') waiting on child command id: 'bca3e47c-b4e6-4513-b66c-eecfd9bf6e08' type:'AddImageFromScratch' to complete

2018-12-12 09:13:00,468+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (EE-ManagedThreadFactory-engine-Thread-10208) [disks_create_1ee96e6f-0f69-4b90] Command [id=bca3e47c-b4e6-4513-b66c-eecfd9bf6e08]: Updating status to 'FAILED', The command end method logic will be executed by one of its parent commands.
2018-12-12 09:13:00,468+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-34) [] BaseAsyncTask::onTaskEndSuccess: Task '8baa8f3e-b5bc-4515-baae-3a945edc2203' (Parent Command 'AddImageFromScratch', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2018-12-12 09:13:00,468+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-10208) [disks_create_1ee96e6f-0f69-4b90] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'AddImageFromScratch' completed, handling the result.
2018-12-12 09:13:00,468+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-10208) [disks_create_1ee96e6f-0f69-4b90] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'AddImageFromScratch' succeeded, clearing tasks.
2018-12-12 09:13:00,468+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-10208) [disks_create_1ee96e6f-0f69-4b90] SPMAsyncTask::ClearAsyncTask: Attempting to clear task '566b2774-6d81-43c7-9122-a23f36774184'
2018-12-12 09:13:00,468+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (EE-ManagedThreadFactory-engine-Thread-10208) [disks_create_1ee96e6f-0f69-4b90] START, SPMClearTaskVDSCommand( SPMTaskGuidBaseVDSCommandParameters:{storagePoolId='ef05d454-fd4f-11e8-b2b0-001a4a168bfc', ignoreFailoverLimit='false', taskId='566b2774-6d81-43c7-9122-a23f36774184'}), log id: 255e1ce5
2018-12-12 09:13:00,468+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-34) [] CommandAsyncTask::endActionIfNecessary: All tasks of command 'a89f74c7-724a-4b6a-bfa6-37391e3c26db' has ended -> executing 'endAction'

2018-12-12 09:13:00,509+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (EE-ManagedThreadFactory-engine-Thread-10208) [disks_create_1ee96e6f-0f69-4b90] START, HSMClearTaskVDSCommand(HostName = host_mixed_1, HSMTaskGuidBaseVDSCommandParameters:{hostId='6cd3bf70-c103-42b5-8ca3-1a269545959b', taskId='566b2774-6d81-43c7-9122-a23f36774184'}), log id: 6898b681

2018-12-12 09:13:00,545+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (EE-ManagedThreadFactory-engine-Thread-10208) [disks_create_1ee96e6f-0f69-4b90] FINISH, HSMClearTaskVDSCommand, return: , log id: 6898b681
2018-12-12 09:13:00,545+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (EE-ManagedThreadFactory-engine-Thread-10208) [disks_create_1ee96e6f-0f69-4b90] FINISH, SPMClearTaskVDSCommand, return: , log id: 255e1ce5

2018-12-12 09:13:00,553+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-10208) [disks_create_1ee96e6f-0f69-4b90] BaseAsyncTask::removeTaskFromDB: Removed task '566b2774-6d81-43c7-9122-a23f36774184' from DataBase
2018-12-12 09:13:00,553+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-10208) [disks_create_1ee96e6f-0f69-4b90] CommandAsyncTask::HandleEndActionResult [within thread]: Removing CommandMultiAsyncTasks object for entity 'bca3e47c-b4e6-4513-b66c-eecfd9bf6e08'

2018-12-12 09:13:02,472+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-17) [disks_create_1ee96e6f-0f69-4b90] Command 'AddDisk' id: 'de3c90ac-047f-4785-a9ca-1c1081e5cf64' child commands '[bca3e47c-b4e6-4513-b66c-eecfd9bf6e08]' executions were completed, status 'FAILED'

2018-12-12 09:13:06,378+02 ERROR [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-5) [disks_create_1ee96e6f-0f69-4b90] Ending command 'org.ovirt.engine.core.bll.storage.disk.AddDiskCommand' with failure.
2018-12-12 09:13:06,386+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-5) [disks_create_1ee96e6f-0f69-4b90] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand' with failure.


From VDSM log:

2018-12-12 09:12:58,714+0200 WARN  (tasks/0) [storage.LVM] e9a32dc8-76ba-4d71-8a85-7c522a2564fb can't be reloaded, please check your storage connections. (lvm:81)
2018-12-12 09:12:58,714+0200 ERROR (tasks/0) [storage.Volume] Unexpected error (volume:1247)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1216, in create
    apparentSize = int(dom.getVSize(imgUUID, volUUID) /
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 714, in getVSize
    return self._manifest.getVSize(imgUUID, volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 453, in getVSize
    size = lvm.getLV(self.sdUUID, volUUID).size
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 82, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: e9a32dc8-76ba-4d71-8a85-7c522a2564fb
2018-12-12 09:12:58,715+0200 DEBUG (tasks/0) [storage.ResourceManager] Trying to release resource '01_img_ad097cbe-eff1-4c67-ac1d-b59fb3109fcd.4d75d747-a86a-4db9-9939-69ce405d103b' (resourceManager:568)
2018-12-12 09:12:58,716+0200 DEBUG (tasks/0) [storage.ResourceManager] Released resource '01_img_ad097cbe-eff1-4c67-ac1d-b59fb3109fcd.4d75d747-a86a-4db9-9939-69ce405d103b' (0 active users) (resourceManager:586)
2018-12-12 09:12:58,716+0200 DEBUG (tasks/0) [storage.ResourceManager] Resource '01_img_ad097cbe-eff1-4c67-ac1d-b59fb3109fcd.4d75d747-a86a-4db9-9939-69ce405d103b' is free, finding out if anyone is waiting for it. (resourceManager:592)
2018-12-12 09:12:58,716+0200 DEBUG (tasks/0) [storage.ResourceManager] No one is waiting for resource '01_img_ad097cbe-eff1-4c67-ac1d-b59fb3109fcd.4d75d747-a86a-4db9-9939-69ce405d103b', Clearing records. (resourceManager:600)
2018-12-12 09:12:58,717+0200 ERROR (tasks/0) [storage.TaskManager.Task] (Task='566b2774-6d81-43c7-9122-a23f36774184') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1940, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 817, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1249, in create
    (volUUID, e))
VolumeCreationError: Error creating a new volume: (u'Volume creation e9a32dc8-76ba-4d71-8a85-7c522a2564fb failed: Failed reload: e9a32dc8-76ba-4d71-8a85-7c522a2564fb',)
2018-12-12 09:12:58,717+0200 DEBUG (tasks/0) [storage.TaskManager.Task] (Task='566b2774-6d81-43c7-9122-a23f36774184') Task._run: 566b2774-6d81-43c7-9122-a23f36774184 () {} failed - stopping task (task:894)
2018-12-12 09:12:58,718+0200 DEBUG (tasks/0) [storage.TaskManager.Task] (Task='566b2774-6d81-43c7-9122-a23f36774184') stopping in state running (force False) (task:1256)
2018-12-12 09:12:58,718+0200 DEBUG (tasks/0) [storage.TaskManager.Task] (Task='566b2774-6d81-43c7-9122-a23f36774184') ref 1 aborting True (task:1002)
2018-12-12 09:12:58,718+0200 DEBUG (tasks/0) [storage.TaskManager.Task] (Task='566b2774-6d81-43c7-9122-a23f36774184') aborting: Task is aborted: "Error creating a new volume: (u'Volume creation e9a32dc8-76ba-4d71-8a85-7c522a2564fb failed: Failed reload: e9a32dc8-76ba-4d71-8a85-7c522a2564fb',)" - code 205 (task:928)


Attaching two logs for two test cases.

Comment 13 Yosi Ben Shimon 2018-12-13 10:29:04 UTC
Created attachment 1513976 [details]
TestCase18268

Comment 14 Yosi Ben Shimon 2018-12-13 10:29:32 UTC
Created attachment 1513977 [details]
TestCase19265

Comment 16 Tal Nisan 2019-01-07 09:51:07 UTC
Seems like the same issue as bug 1553133, postponing to async

Comment 17 Fred Rolland 2019-01-21 09:11:22 UTC
*** Bug 1666207 has been marked as a duplicate of this bug. ***

Comment 18 Fred Rolland 2019-01-28 09:55:10 UTC
Targeting 4.2.8-2 when it becomes available.

Comment 19 Sandro Bonazzola 2019-02-15 14:15:15 UTC
If this still needs fixes, can you please re-target to 4.2.8-3?

Comment 21 Avihai 2019-02-17 09:19:23 UTC
Created attachment 1535638 [details]
TC6004_rhv-4.2.8-7_logs

Comment 23 Avihai 2019-02-27 14:03:40 UTC
Created attachment 1539161 [details]
ovirt-engine-4.2.8.3-0.1_Tier1_TestCase6038_glusterfs_logs

Comment 26 Avihai 2019-05-05 06:15:38 UTC
Another encounter of this issue, with the delete volume scenario, on ovirt-engine-4.3.3.6-0.1.el7.noarch/vdsm-4.30.13-1.el7ev.x86_64.
Occurred in Tier1 automation TestCase6038.
Logs attached.


Engine:
2019-05-04 21:55:33,280+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-75) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command HSMGetAllTasksStatusesVDS failed: Cannot delete volume which has children (non-ethical): ['sdUUID: ecd720bf-6e98-42d7-99b6-b2f59e12435d', u'imgUUID: b1a530b6-fdd7-4424-8945-1749f25156b6', u'volUUID: a858870b-f9e1-48a2-95b9-01337e5ba4f5']
2019-05-04 21:55:33,280+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-75) [] SPMAsyncTask::PollTask: Polling task 'dcb6363c-e7b0-4ad3-bd34-6a139cd34d4d' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'running'.
2019-05-04 21:55:33,280+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-75) [] SPMAsyncTask::PollTask: Polling task '3f8954bd-093f-4dff-a69e-266965267a33' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2019-05-04 21:55:33,280+03 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-75) [] BaseAsyncTask::logEndTaskFailure: Task '3f8954bd-093f-4dff-a69e-266965267a33' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Cannot delete volume which has children (non-ethical): ['sdUUID: ecd720bf-6e98-42d7-99b6-b2f59e12435d', u'imgUUID: b1a530b6-fdd7-4424-8945-1749f25156b6', u'volUUID: a858870b-f9e1-48a2-95b9-01337e5ba4f5'], code = 100',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Cannot delete volume which has children (non-ethical): ['sdUUID: ecd720bf-6e98-42d7-99b6-b2f59e12435d', u'imgUUID: b1a530b6-fdd7-4424-8945-1749f25156b6', u'volUUID: a858870b-f9e1-48a2-95b9-01337e5ba4f5'], code = 100'

VDSM:
2019-05-04 21:55:30,237+0300 ERROR (tasks/7) [storage.TaskManager.Task] (Task='3f8954bd-093f-4dff-a69e-266965267a33') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1967, in deleteVolume
    vol.delete(postZero=postZero, force=force, discard=discard)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 588, in delete
    self.validateDelete()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1239, in validateDelete
    self._manifest.validateDelete()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 509, in validateDelete
    raise se.VolumeImageHasChildren(self)
VolumeImageHasChildren: Cannot delete volume which has children (non-ethical): ['sdUUID: ecd720bf-6e98-42d7-99b6-b2f59e12435d', u'imgUUID: b1a530b6-fdd7-4424-8945-1749f25156b6', u'volUUID: a858870b-f9e1-48a2-95b9-01337e5ba4f5']
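
For context, the failure above is the delete-side validation in vdsm refusing to remove a volume that still has children in the image chain. A simplified sketch of that check, reconstructed from the traceback only (not the exact vdsm code):

class VolumeImageHasChildren(Exception):
    pass

class VolumeManifestSketch(object):
    def __init__(self, children):
        self._children = children

    def getChildren(self):
        return self._children

    def validateDelete(self):
        # A volume with children (e.g. a snapshot layer that was not merged
        # or removed yet) must not be deleted, otherwise its children would
        # be left pointing at a missing parent.
        if self.getChildren():
            raise VolumeImageHasChildren(
                "Cannot delete volume which has children (non-ethical)")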

Comment 27 Avihai 2019-05-05 06:18:34 UTC
Created attachment 1563759 [details]
TestCase6038_ovirt-engine-4.3.3.6-0.1_rep_delete_volume_scenario

Comment 28 Nir Soffer 2019-05-28 22:33:01 UTC
If we have automated tests reproducing this issue, we should run them with
the patch for bug 1693075. It may reveal important LVM warnings which are 
hidden in the default log level.

Comment 29 Avihai 2019-05-29 10:26:43 UTC
(In reply to Nir Soffer from comment #28)
> If we have automated tests reproducing this issue, we should run them with
> the patch for bug 1693075. It may reveal important LVM warnings which are 
> hidden in the default log level.

Unfortunately, this bug (which covers both create and delete volume failures) is very random and does not reproduce for a specific test case.
We encounter the failures sporadically when running Tier1/2/3; in some builds we do not see them at all, and then suddenly they appear again in other builds.

Please push the patch as planned and hopefully we will catch it when we run 4.3.5.
We do not have the capacity to run temporary patches at the moment.

Comment 30 Avihai 2019-06-23 06:26:09 UTC
Another encounter of this issue with the delete volume scenario (Tier1 TestCase6038).

Very strangely, I cannot see any ERROR in the VDSM logs.
Nir, did something change? Can you take a peek?

Builds:
ovirt-engine-4.3.4.3-0.1.el7.noarch
vdsm-4.30.17-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.12.x86_64
qemu-img-rhev-2.12.0-33.el7.x86_64
sanlock-3.6.0-1.el7.x86_64


Engine:
2019-06-22 09:07:28,853+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2019-06-22 09:07:28,860+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM 
host_mixed_1 command HSMGetAllTasksStatusesVDS failed: Cannot delete volume which has children (non-ethical): [u'sdUUID: 77d0c50b-bf2f-4103-a1e3-b70024f9b734', u'imgUUID: 53d22544-f178-4587-a9a8-e35bb3634184', 
u'volUUID: 843c8df0-1ec8-4ef4-bcb8-8af174ca50cb']

2019-06-22 09:07:28,866+03 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [] BaseAsyncTask::logEndTaskFailure: Task '4a181bc1-f240-48be-8ba9-36bc19d4c52
b' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Cannot delete volume which has children (non-ethical): [u'sdUUID: 77d0c50b-bf2f-4103-a1e3-b70024f9b734', u'imgUUID: 53d22544-f178-4587-a9a8-e35bb3634184', u'volUUID: 843c8df0-1ec8-4ef4-bcb8-8af174ca50cb'], code = 100',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Cannot delete volume which has children (non-ethical): [u'sdUUID: 77d0c50b-bf2f-4103-a1e3-b70024f9b734', u'imgUUID: 53d22544-f178-4587-a9a8-e35bb3634184', u'volUUID: 843c8df0-1ec8-4ef4-bcb8-8af174ca50cb'], code = 100'

2019-06-22 09:07:33,891+03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [snapshots_delete_0ee04ba3-dcb5-4c18] Merging of snapshot 'b29d3af3-acca-4b4c-874d-f24c4d9f23b9' images '060c78fe-ba3d-44f0-9b8a-5788cc451b0e'..'843c8df0-1ec8-4ef4-bcb8-8af174ca50cb' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2019-06-22 09:07:33,893+03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [snapshots_delete_0ee04ba3-dcb5-4c18] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand' with failure.
2019-06-22 09:07:55,885+03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-18) [snapshots_delete_0ee04ba3-dcb5-4c18] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' with failure.
2019-06-22 09:07:55,916+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-18) [snapshots_delete_0ee04ba3-dcb5-4c18] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Failed to delete snapshot 'snapshot_6038_nfs_1' for VM 'vm_TestCase6038_2208503035'.

Comment 31 Avihai 2019-06-23 06:28:22 UTC
Created attachment 1583625 [details]
TestCase6038_ovirt-engine-4.3.4.3-0.1_rep_delete_volume_scenario

Comment 32 Avihai 2019-06-23 06:48:44 UTC
Another issue that looks like a create snapshot failure - not sure, this needs Nir's ack.
Automation Tier1 TestCase6052.

Builds:
ovirt-engine-4.3.4.3-0.1.el7.noarch
vdsm-4.30.17-1.el7ev.ppc64le
libvirt-4.5.0-10.el7_6.12.ppc64le


Logs look more detailed than before.

Engine:

2019-06-22 01:13:21,017+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (default task-60) [snapshots_create_f3de4fca-23ee-460a] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandP
arameters:{storagePoolId='99a910aa-f9d9-4d21-ba13-b966e02f5920', ignoreFailoverLimit='false', storageDomainId='0c5ae784-9441-42d8-85e1-4f378a15a1fa', imageGroupId='73c56871-dcd1-4c05-bde4-9d1799186ff1', imageSi
zeInBytes='10737418240', volumeFormat='COW', newImageId='49b8220f-4ed8-4805-8f7e-79cb4b0f6f6c', imageType='Sparse', newImageDescription='', imageInitialSizeInBytes='0', imageId='54204868-cffb-4c7f-8e95-a9d3f6e2
a104', sourceImageGroupId='73c56871-dcd1-4c05-bde4-9d1799186ff1'}), log id: 286616ce
2019-06-22 01:13:21,239+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-60) [snapshots_create_f3de4fca-23ee-460a] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command CreateVolumeVDS failed: Can't persist task: (u'708997a5-84e2-42f9-8021-7a2ce2139852: cannot access/create taskdir /rhev/data-center/99a910aa-f9d9-4d21-ba13-b966e02f5920/mastersd/master/tasks/708997a5-84e2-42f9-8021-7a2ce2139852: [Errno 11] Resource temporarily unavailable',)
2019-06-22 01:13:21,240+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (default task-60) [snapshots_create_f3de4fca-23ee-460a] Command 'CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='99a910aa-f9d9-4d21-ba13-b966e02f5920', ignoreFailoverLimit='false', storageDomainId='0c5ae784-9441-42d8-85e1-4f378a15a1fa', imageGroupId='73c56871-dcd1-4c05-bde4-9d1799186ff1', imageSizeInBytes='10737418240', volumeFormat='COW', newImageId='49b8220f-4ed8-4805-8f7e-79cb4b0f6f6c', imageType='Sparse', newImageDescription='', imageInitialSizeInBytes='0', imageId='54204868-cffb-4c7f-8e95-a9d3f6e2a104', sourceImageGroupId='73c56871-dcd1-4c05-bde4-9d1799186ff1'})' execution failed: IRSGenericException: IRSErrorException: Can't persist task: (u'708997a5-84e2-42f9-8021-7a2ce2139852: cannot access/create taskdir /rhev/data-center/99a910aa-f9d9-4d21-ba13-b966e02f5920/mastersd/master/tasks/708997a5-84e2-42f9-8021-7a2ce2139852: [Errno 11] Resource temporarily unavailable',)
2019-06-22 01:13:21,240+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (default task-60) [snapshots_create_f3de4fca-23ee-460a] FINISH, CreateVolumeVDSCommand, return: 00000000-0000-0000-0000-000000000000, log id: 286616ce
2019-06-22 01:13:21,240+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand] (default task-60) [snapshots_create_f3de4fca-23ee-460a] Failed creating snapshot from image id '54204868-cffb-4c7f-8e95-a9d3f6e2a104'

2019-06-22 01:13:37,601+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2019-06-22 01:13:37,606+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command HSMGetAllTasksStatusesVDS failed: Logical volume does not exist: (u'0c5ae784-9441-42d8-85e1-4f378a15a1fa/c02be00e-dc30-4ad7-880f-6cc70099b018',)
2019-06-22 01:13:37,606+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2019-06-22 01:13:37,607+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2019-06-22 01:13:37,607+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [] SPMAsyncTask::PollTask: Polling task 'ae0b383b-b069-47b9-8c13-d81d834fca27' (Parent Command 'CreateSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2019-06-22 01:13:37,610+03 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [] BaseAsyncTask::logEndTaskFailure: Task 'ae0b383b-b069-47b9-8c13-d81d834fca27' (Parent Command 'CreateSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSTaskResultNotSuccessException: TaskState contained successful return code, but a non-success result ('cleanSuccess').',
-- Exception: 'VDSGenericException: VDSTaskResultNotSuccessException: TaskState contained successful return code, but a non-success result ('cleanSuccess').'
2019-06-22 01:13:37,610+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [] CommandAsyncTask::endActionIfNecessary: All tasks of command '4d9789f3-8ffc-4de2-b159-493c0972715a' has ended -> executing 'endAction'
2019-06-22 01:13:37,610+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [] CommandAsyncTask::endAction: Ending action for '1' tasks (command ID: '4d9789f3-8ffc-4de2-b159-493c0972715a'): calling endAction '.
2019-06-22 01:13:37,611+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [] SPMAsyncTask::PollTask: Polling task '6937cdb3-2b12-4fd8-a485-70ad53ce2284' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2019-06-22 01:13:37,611+03 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [] BaseAsyncTask::logEndTaskFailure: Task '6937cdb3-2b12-4fd8-a485-70ad53ce2284' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Logical volume does not exist: (u'0c5ae784-9441-42d8-85e1-4f378a15a1fa/2c42218a-16f2-428a-a8cf-1662f89c0270',), code = 610',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Logical volume does not exist: (u'0c5ae784-9441-42d8-85e1-4f378a15a1fa/2c42218a-16f2-428a-a8cf-1662f89c0270',), code = 610'

VDSM:
2019-06-22 01:13:21,117+0300 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='708997a5-84e2-42f9-8021-7a2ce2139852') Unexpected error (task:1117)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1115, in setPersistence
    getProcPool().fileUtils.createdir(taskDir)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 169, in createdir
    self._iop.mkdir(tmpPath)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 510, in mkdir
    self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 11] Resource temporarily unavailable
2019-06-22 01:13:21,117+0300 INFO  (jsonrpc/0) [vdsm.api] FINISH createVolume error=Can't persist task: (u'708997a5-84e2-42f9-8021-7a2ce2139852: cannot access/create taskdir /rhev/data-center/99a910aa-f9d9-4d21-ba13-b966e02f5920/mastersd/master/tasks/708997a5-84e2-42f9-8021-7a2ce2139852: [Errno 11] Resource temporarily unavailable',) from=::ffff:10.46.16.72,47188, flow_id=snapshots_create_f3de4fca-23ee-460a, task_id=708997a5-84e2-42f9-8021-7a2ce2139852 (api:52)
2019-06-22 01:13:21,117+0300 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='708997a5-84e2-42f9-8021-7a2ce2139852') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in createVolume
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1459, in createVolume
    volUUID, desc, srcImgUUID, srcVolUUID, initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 923, in _spmSchedule
    name, func, *args)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/taskManager.py", line 74, in scheduleJob
    task.setPersistence(store, cleanPolicy=TaskCleanType.manual)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1119, in setPersistence
    " %s: %s" % (self, taskDir, e))
TaskPersistError: Can't persist task: (u'708997a5-84e2-42f9-8021-7a2ce2139852: cannot access/create taskdir /rhev/data-center/99a910aa-f9d9-4d21-ba13-b966e02f5920/mastersd/master/tasks/708997a5-84e2-42f9-8021-7a2ce2139852: [Errno 11] Resource temporarily unavailable',)

Comment 33 Avihai 2019-06-23 06:49:49 UTC
Created attachment 1583626 [details]
TestCase6052_ovirt-engine-4.3.4.3-0.1_rep_create_volume_scenario

Comment 35 Nir Soffer 2020-01-16 20:14:56 UTC
(In reply to Avihai from comment #32)
> Another what looks like create snapshot issue - Not sure, Needs Nir ack.
> Automation Tier1 TestCase6052.

Does not look related, the issue is:

> OSError: [Errno 11] Resource temporarily unavailable

Meaning there is some issue accessing file-based storage using ioprocess.
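
For reference, errno 11 on Linux is EAGAIN ("Resource temporarily unavailable"): the ioprocess mkdir of the task directory on the master domain could not be completed at that moment, which points at file-storage access, not at the LVM "Failed reload" problem tracked by this bug. A one-liner to map the number to its name:

import errno, os
print("%d %s" % (errno.EAGAIN, os.strerror(errno.EAGAIN)))  # 11 Resource temporarily unavailable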

Comment 36 Nir Soffer 2020-01-22 14:03:31 UTC
This bug looks like a mix of random errors that are not related to each other.

I don't think we can do anything with such a bug. Please file a separate bug for every
error, with clear reproduction steps for each one.

The only issue we should handle here is the one reported in comment 0.

Comment 39 Avihai 2020-03-17 16:04:12 UTC
I just saw the delete volume scenario issue again in rhv-4.3.9-6 / engine 4.3.9.3-0.1.el7 / vdsm-4.30.42-1.el7ev.x86_64.

As requested, I opened new bug 1814318 on this issue.

VDSM:
2020-03-17 02:57:36,194+0200 ERROR (tasks/5) [storage.TaskManager.Task] (Task='4d812043-4937-4ef7-b994-8518697ca36c') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1967, in deleteVolume
    vol.delete(postZero=postZero, force=force, discard=discard)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 563, in delete
    self.validateDelete()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1292, in validateDelete
    self._manifest.validateDelete()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 545, in validateDelete
    raise se.VolumeImageHasChildren(self)
VolumeImageHasChildren: Cannot delete volume which has children (non-ethical): [u'sdUUID: 4f3fbe9e-e4a4-45e7-9b62-27fafe535028', u'imgUUID: 1f99972b-560a-4477-833c-f0261fb28f9f', u'volUUID: 61dd1b0f-d1d3-4bd9-b15d-13da7aa927c4']

