Created attachment 1131432
engine and vdsm logs

Description of problem:
The Migrating Disk job hangs in the engine after a number of migrations are performed. The VDSM hosts no longer show any running tasks.

Version-Release number of selected component (if applicable):

How reproducible:
50% (about every second full tier 2 live migration run)

Steps to Reproduce:
1. Run live migrations for disks using same and different domain types
2. Repeat migrations using all available permutations

Actual results:
The Migrating Disk job does not show up as complete in the engine.

Expected results:
The job/task state should be in sync between the engine and the VDSM hosts.

Additional info:
Here's the start of the disk migration job:

2016-02-29 03:10:41,822 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-32) [3c171353] Correlation ID: 3c171353, Job ID: 0e96ff1f-6a49-4626-b756-929c2e44f258, Call Stack: null, Custom Event ID: -1, Message: User admin@internal moving disk disk_TestCase5988_REST_ISCSI_2016-02-29_03-01-50_Disk_virtio_cow_sparse-True_alias to domain iscsi_1

Here's the failed remove on the disk, showing that the migration is still ongoing about 5.5 hours later:

2016-02-29 08:45:33,457 WARN [org.ovirt.engine.core.bll.RemoveVmCommand] (org.ovirt.thread.pool-6-thread-17) [68343745] CanDoAction of action 'RemoveVm' failed for user admin@internal. Reasons: VAR__ACTION__REMOVE,VAR__TYPE__VM,ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED,$DiskName disk_TestCase5988_REST_ISCSI_2016-02-29_03-01-50_Disk_virtio_cow_sparse-True_alias

Please find attached logs.
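To check how long the disk stayed in the "being migrated" state, the gap between the two engine log entries quoted above can be computed from their leading timestamps. The following is a minimal sketch, not part of the attached logs or of any oVirt tooling: the helper names `log_time` and `elapsed` are hypothetical, and the log lines are shortened to the timestamp and level for brevity.

```python
import re
from datetime import datetime, timedelta

# Leading timestamp of an engine.log line, e.g. "2016-02-29 03:10:41,822".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def log_time(line: str) -> datetime:
    """Parse the timestamp at the start of an engine log line (hypothetical helper)."""
    match = TIMESTAMP.match(line)
    if match is None:
        raise ValueError(f"no leading timestamp in line: {line!r}")
    # %f accepts the three millisecond digits that follow the comma.
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def elapsed(start_line: str, end_line: str) -> timedelta:
    """Time between two log lines (hypothetical helper)."""
    return log_time(end_line) - log_time(start_line)


# Shortened copies of the two log lines from the description.
migration_started = "2016-02-29 03:10:41,822 INFO [AuditLogDirector] moving disk"
remove_rejected = "2016-02-29 08:45:33,457 WARN [RemoveVmCommand] CanDoAction failed"

print(elapsed(migration_started, remove_rejected))  # prints 5:34:51.635000
```

The same two helpers can be pointed at any pair of lines from the attached engine log, for example the first "moving disk" audit message and the last ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED warning for the same disk.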
Gilad, can you please specify the engine and VDSM [rpm] versions?
Allon, sure:
Engine: 3.6.3.2-0.1
VDSM: 4.17.23-0
Hi Gilad,

After reproducing the described scenario:
* What is the status of the disks? I.e., are they locked?
* Did any of the disks fail to migrate?
* Does it reproduce only in scale tests?
* What was the magnitude of migration in the described flow?

Thanks!
Daniel
Daniel, https://gerrit.ovirt.org/#/c/54730/ and its backport https://gerrit.ovirt.org/54776 are both merged. Is there anything else we need for this BZ? If not, can it be moved to MODIFIED?
(In reply to Allon Mureinik from comment #4)
> Daniel, https://gerrit.ovirt.org/#/c/54730/ and its backport
> https://gerrit.ovirt.org/54776 are both merged. Is there anything else we
> need for this BZ? If not, can it be moved to MODIFIED?

Yes, it should be rechecked on the latest build.
Bugs were moved to ON_QA prematurely, since they didn't have a target release. Note that only bugs with a set target release will move to ON_QA.
Verified as fixed with 3.6.5: ran 5 full live storage migration runs and encountered no stuck tasks.