Bug 1312741 - Engine continually shows Migrating Disk job while no running tasks are reported in connected vdsm hosts
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-3.6.5
Target Release: 3.6.5
Assigned To: Daniel Erez
QA Contact: Gilad Lazarovich
Whiteboard: storage
Keywords: Automation, AutomationBlocker
Depends On:
Blocks: 1315960
Reported: 2016-02-29 02:24 EST by Gilad Lazarovich
Modified: 2016-06-23 00:52 EDT
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-21 10:39:49 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
amureini: ovirt-3.6.z?
glazarov: planning_ack?
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments
engine and vdsm logs (12.04 MB, application/octet-stream)
2016-02-29 02:24 EST, Gilad Lazarovich


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 54730 None None None 2016-03-15 13:08 EDT

Description Gilad Lazarovich 2016-02-29 02:24:07 EST
Created attachment 1131432 [details]
engine and vdsm logs

Description of problem:
The Migrating Disk job hangs in the engine after a number of migrations are performed, while the VDSM hosts no longer report any running tasks.

Version-Release number of selected component (if applicable):


How reproducible:
50% (roughly every second full tier 2 live storage migration run)

Steps to Reproduce:
1. Run live migrations for disks between storage domains of the same and of different types
2. Repeat the migrations across all available permutations (see the sketch below for one way this loop can be driven through the REST API)
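
For reference, a minimal sketch of such a migration loop against the oVirt 3.6 REST API (v3). The engine URL, credentials, disk UUID and domain names are placeholders, and the /disks/{id}/move action plus the XML status check are assumptions rather than a verified transcript of the automation run:

# Hypothetical reproduction sketch; all identifiers below are placeholders.
import time
import requests

ENGINE = "https://engine.example.com/api"
AUTH = ("admin@internal", "password")
DISK_ID = "00000000-0000-0000-0000-000000000000"
TARGET_DOMAINS = ["iscsi_1", "nfs_0", "iscsi_0"]  # cycle through domain permutations

session = requests.Session()
session.auth = AUTH
session.verify = False  # self-signed engine CA in a test setup
session.headers.update({"Content-Type": "application/xml"})

def wait_until_unlocked(disk_id, poll=10):
    """Poll the disk until the engine no longer reports it as locked."""
    while True:
        resp = session.get(f"{ENGINE}/disks/{disk_id}")
        if "<state>locked</state>" not in resp.text:
            return
        time.sleep(poll)

for domain in TARGET_DOMAINS:
    # Ask the engine to (live) migrate the disk to the next target domain.
    body = f"<action><storage_domain><name>{domain}</name></storage_domain></action>"
    session.post(f"{ENGINE}/disks/{DISK_ID}/move", data=body)
    wait_until_unlocked(DISK_ID)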

Actual results:
Migrating Disk job does not show up as complete in the engine.

Expected results:
The job/task state should stay in sync between the engine and the vdsm hosts.
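
A minimal diagnostic sketch for comparing the two views, assuming local access to the engine database and ssh access to the hosts; the hostnames, credentials and the exact 'job' table columns are assumptions rather than values verified against this environment:

# Compare jobs the engine still considers running with the tasks vdsm reports.
import subprocess
import psycopg2

HOSTS = ["host1.example.com", "host2.example.com"]  # placeholder hypervisors

# Engine side: jobs still marked STARTED in the engine database.
conn = psycopg2.connect(dbname="engine", user="engine",
                        password="password", host="localhost")
with conn.cursor() as cur:
    cur.execute("SELECT job_id, action_type, status FROM job WHERE status = 'STARTED'")
    for job_id, action_type, status in cur.fetchall():
        print(f"engine job {job_id}: {action_type} ({status})")

# Host side: tasks each vdsm host reports as running.
for host in HOSTS:
    result = subprocess.run(
        ["ssh", f"root@{host}", "vdsClient", "-s", "0", "getAllTasksStatuses"],
        capture_output=True, text=True)
    print(f"{host} tasks:\n{result.stdout.strip() or '(none)'}")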

Additional info:
Here's the start of the disk migration job:
2016-02-29 03:10:41,822 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-32) [3c171353] Correlation ID: 3c171353, Job ID: 0e96ff1f-6a49-4626-b756-929c2e44f258, Call Stack: null, Custom Event ID: -1, Message: User admin@internal moving disk disk_TestCase5988_REST_ISCSI_2016-02-29_03-01-50_Disk_virtio_cow_sparse-True_alias to domain iscsi_1

Here's the failed VM removal, showing that the engine still considers the disk migration to be ongoing about 5.5 hours later:
2016-02-29 08:45:33,457 WARN  [org.ovirt.engine.core.bll.RemoveVmCommand] (org.ovirt.thread.pool-6-thread-17) [68343745] CanDoAction of action 'RemoveVm' failed for user admin@internal. Reasons: VAR__ACTION__REMOVE,VAR__TYPE__VM,ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED,$DiskName disk_TestCase5988_REST_ISCSI_2016-02-29_03-01-50_Disk_virtio_cow_sparse-True_alias
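
For completeness, a hedged sketch of how the disk's image lock state could be checked directly in the engine database; the table/column names and status codes (1=OK, 2=LOCKED, 4=ILLEGAL) are assumptions based on my understanding of the engine schema, and the credentials and alias pattern are placeholders:

# Hypothetical check of the image lock state for the stuck disk.
import psycopg2

conn = psycopg2.connect(dbname="engine", user="engine",
                        password="password", host="localhost")
with conn.cursor() as cur:
    cur.execute("""
        SELECT bd.disk_alias, i.image_guid, i.imagestatus
        FROM images i
        JOIN base_disks bd ON bd.disk_id = i.image_group_id
        WHERE bd.disk_alias LIKE %s
    """, ("disk_TestCase5988%",))
    for alias, image_guid, status in cur.fetchall():
        # Map the assumed numeric status codes to readable names.
        print(alias, image_guid, {1: "OK", 2: "LOCKED", 4: "ILLEGAL"}.get(status, status))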

Please find the attached logs.
Comment 1 Allon Mureinik 2016-02-29 07:10:48 EST
Gilad, can you please specify the engine and VDSM rpm versions?
Comment 2 Gilad Lazarovich 2016-03-02 03:42:50 EST
Allon, sure:
Engine: 3.6.3.2-0.1
VDSM: 4.17.23-0
Comment 3 Daniel Erez 2016-03-14 12:25:17 EDT
Hi Gilad,

After reproducing the described scenario:
* What is the status of the disks? I.e., are they locked?
* Did any of the disks fail to migrate?
* Does it reproduce only in scale tests?
* How many migrations were performed in the described flow?

Thanks!
Daniel
Comment 4 Allon Mureinik 2016-03-27 10:30:07 EDT
Daniel, https://gerrit.ovirt.org/#/c/54730/ and its backport https://gerrit.ovirt.org/54776 are both merged. Is there anything else we need for this BZ? If not, can it be moved to MODIFIED?
Comment 5 Daniel Erez 2016-03-28 04:38:25 EDT
(In reply to Allon Mureinik from comment #4)
> Daniel, https://gerrit.ovirt.org/#/c/54730/ and its backport
> https://gerrit.ovirt.org/54776 are both merged. Is there anything else we
> need for this BZ? If not, can it be moved to MODIFIED?

Yes, it should be rechecked on the latest build.
Comment 6 Eyal Edri 2016-03-31 04:36:09 EDT
Bugs moved prematurely to ON_QA since they didn't have a target release.
Note that only bugs with a set target release will move to ON_QA.
Comment 7 Gilad Lazarovich 2016-04-10 08:55:05 EDT
Verified as fixed with 3.6.5; ran 5 full live storage migration runs with no stuck tasks encountered.
