Red Hat Bugzilla – Bug 1312741
Engine continually shows Migrating Disk job while no running tasks are reported in connected vdsm hosts
Last modified: 2016-06-23 00:52:01 EDT
Created attachment 1131432
engine and vdsm logs
Description of problem:
The Migrating Disk job hangs in the engine after a number of migrations have been performed, even though the VDSM hosts no longer show any running tasks.
Version-Release number of selected component (if applicable):
How reproducible:
50% (roughly every second full tier 2 live storage migration run)
Steps to Reproduce:
1. Run live storage migrations of disks between storage domains of the same type and of different types.
2. Repeat the migrations using all available permutations.
Actual results:
The Migrating Disk job is never shown as complete in the engine.
Expected results:
The job/task state should be in sync between the engine and the VDSM hosts.
Here's the start of the disk migration job:
2016-02-29 03:10:41,822 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-32) [3c171353] Correlation ID: 3c171353, Job ID: 0e96ff1f-6a49-4626-b756-929c2e44f258, Call Stack: null, Custom Event ID: -1, Message: User admin@internal moving disk disk_TestCase5988_REST_ISCSI_2016-02-29_03-01-50_Disk_virtio_cow_sparse-True_alias to domain iscsi_1
Here's the failed attempt to remove the VM, rejected because the disk is still reported as being migrated more than 5.5 hours later:
2016-02-29 08:45:33,457 WARN [org.ovirt.engine.core.bll.RemoveVmCommand] (org.ovirt.thread.pool-6-thread-17)  CanDoAction of action 'RemoveVm' failed for user admin@internal. Reasons: VAR__ACTION__REMOVE,VAR__TYPE__VM,ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED,$DiskName disk_TestCase5988_REST_ISCSI_2016-02-29_03-01-50_Disk_virtio_cow_sparse-True_alias
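The gap between the two log lines can be checked directly from their timestamps. Below is a minimal Python sketch, assuming only that engine.log lines begin with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp, as the two lines above do; `log_timestamp` and `elapsed_between` are hypothetical helpers for illustration, not part of oVirt.

```python
import re
from datetime import datetime, timedelta

# Assumption: engine.log lines start with "YYYY-MM-DD HH:MM:SS,mmm" followed by a space.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_timestamp(line: str) -> datetime:
    """Return the timestamp at the start of an engine.log line (hypothetical helper)."""
    match = TS_RE.match(line)
    if match is None:
        raise ValueError(f"no engine.log timestamp at start of: {line[:40]!r}")
    # %f accepts the 3-digit millisecond field and right-pads it to microseconds.
    return datetime.strptime(match.group(1), TS_FORMAT)


def elapsed_between(first: str, second: str) -> timedelta:
    """Time elapsed between two log lines, second minus first (hypothetical helper)."""
    return log_timestamp(second) - log_timestamp(first)


# Prefixes of the two log lines quoted above; the rest of each line is omitted.
job_start = ("2016-02-29 03:10:41,822 INFO "
             "[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]")
remove_failed = ("2016-02-29 08:45:33,457 WARN "
                 "[org.ovirt.engine.core.bll.RemoveVmCommand]")

elapsed = elapsed_between(job_start, remove_failed)
print(elapsed)                                   # 5:34:51.635000
print(round(elapsed.total_seconds() / 3600, 2))  # 5.58
```

The difference is about 5 hours 35 minutes, i.e. the job was still considered active well over 5.5 hours after it started.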
Please find the logs attached.
Gilad, can you please specify the engine and VDSM [rpm] versions?
After reproducing the described scenario:
* What is the status of the disks? I.e. are they locked?
* Did any of the disks fail to migrate?
* Does it reproduce only in scale tests?
* What was the scale of the migrations (how many disks were migrated) in the described flow?
Daniel, https://gerrit.ovirt.org/#/c/54730/ and its backport https://gerrit.ovirt.org/54776 are both merged. Is there anything else we need for this BZ? If not, can it be moved to MODIFIED?
(In reply to Allon Mureinik from comment #4)
> Daniel, https://gerrit.ovirt.org/#/c/54730/ and its backport
> https://gerrit.ovirt.org/54776 are both merged. Is there anything else we
> need for this BZ? If not, can it be moved to MODIFIED?
Yes, it should be rechecked on the latest build.
Bugs were moved prematurely to ON_QA because they did not have a target release.
Notice that only bugs with a set target release will move to ON_QA.
Verified as fixed in 3.6.5: ran 5 full live storage migration runs and encountered no stuck tasks.