| Summary: | Engine continually shows Migrating Disk job while no running tasks are reported in connected vdsm hosts | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Gilad Lazarovich <glazarov> |
| Component: | BLL.Storage | Assignee: | Daniel Erez <derez> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Gilad Lazarovich <glazarov> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.6.0 | CC: | acanan, amureini, bugs, derez, glazarov |
| Target Milestone: | ovirt-3.6.5 | Keywords: | Automation, AutomationBlocker |
| Target Release: | 3.6.5 | Flags: | amureini: ovirt-3.6.z?; glazarov: planning_ack?; rule-engine: devel_ack+; rule-engine: testing_ack+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | storage | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-04-21 14:39:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 1315960 | | |
| Attachments: | | | |
Gilad, can you please specify the engine and VDSM rpm versions?

Allon, sure:
Engine: 3.6.3.2-0.1
VDSM: 4.17.23-0

Hi Gilad,
After reproducing the described scenario:
* What is the status of the disks? I.e., are they locked?
* Did any of the disks fail to migrate?
* Does it reproduce only in scale tests?
* What was the magnitude of the migration in the described flow?
Thanks!
Daniel

Daniel, https://gerrit.ovirt.org/#/c/54730/ and its backport https://gerrit.ovirt.org/54776 are both merged. Is there anything else we need for this BZ? If not, can it be moved to MODIFIED?

(In reply to Allon Mureinik from comment #4)
> Daniel, https://gerrit.ovirt.org/#/c/54730/ and its backport
> https://gerrit.ovirt.org/54776 are both merged. Is there anything else we
> need for this BZ? If not, can it be moved to MODIFIED?

Yes, it should be rechecked on the latest build.

Bugs were moved prematurely to ON_QA since they didn't have a target release set. Note that only bugs with a set target release will move to ON_QA.

Verified as fixed with 3.6.5: ran 5 full live storage migration runs, and no stuck tasks were encountered.
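As a side note on the verification above: confirming that no task is stuck amounts to cross-checking the engine's running "Migrating Disk" jobs against the tasks each connected VDSM host actually reports. The host-side half can be scripted roughly as below; this is only a sketch, assuming the 3.6-era vdsClient CLI is present on the host and supports the getAllTasksStatuses verb (the output format varies between VDSM builds, so treat the printing/parsing as illustrative).

```python
# Sketch: ask a VDSM host which tasks it still considers running.
# Assumes the 3.6-era vdsClient CLI and its getAllTasksStatuses verb are
# available on the host; the exact output format is build-dependent.
import subprocess

def vdsm_task_statuses():
    # "-s 0" connects to the local vdsmd over SSL on the default port.
    out = subprocess.check_output(
        ["vdsClient", "-s", "0", "getAllTasksStatuses"],
        universal_newlines=True,
    )
    return out.strip()

if __name__ == "__main__":
    tasks = vdsm_task_statuses()
    # An empty task map here while the engine still shows a running
    # "Migrating Disk" job is exactly the desync described in this bug.
    print(tasks if tasks else "No running tasks reported by VDSM")
```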
Created attachment 1131432 [details]
engine and vdsm logs

Description of problem:
The Migrating Disk job hangs in the engine after a number of migrations are performed. The VDSM hosts no longer show any running tasks.

Version-Release number of selected component (if applicable):

How reproducible:
50% (about every second full tier 2 live migration run)

Steps to Reproduce:
1. Run live migrations for disks using the same and different domain types
2. Repeat the migrations using all available permutations

Actual results:
The Migrating Disk job does not show up as complete in the engine.

Expected results:
The job/task state should stay in sync between the engine and the VDSM hosts.

Additional info:
Here is the start of the disk migration job:

2016-02-29 03:10:41,822 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-32) [3c171353] Correlation ID: 3c171353, Job ID: 0e96ff1f-6a49-4626-b756-929c2e44f258, Call Stack: null, Custom Event ID: -1, Message: User admin@internal moving disk disk_TestCase5988_REST_ISCSI_2016-02-29_03-01-50_Disk_virtio_cow_sparse-True_alias to domain iscsi_1

Here is the failed remove of the disk, noting that the migration is still reported as ongoing more than 6.5 hours later:

2016-02-29 08:45:33,457 WARN [org.ovirt.engine.core.bll.RemoveVmCommand] (org.ovirt.thread.pool-6-thread-17) [68343745] CanDoAction of action 'RemoveVm' failed for user admin@internal. Reasons: VAR__ACTION__REMOVE,VAR__TYPE__VM,ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED,$DiskName disk_TestCase5988_REST_ISCSI_2016-02-29_03-01-50_Disk_virtio_cow_sparse-True_alias

Please find the logs attached.
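To make the flow the automation exercises concrete, here is a minimal sketch of a single live storage migration step with a polling timeout, so a hung "Migrating Disk" job fails the run instead of blocking it indefinitely. It is written with the v4 Python SDK (ovirtsdk4) purely for illustration; the 3.6 environment in this bug predates SDK v4 and used the v3 API, and the engine URL, credentials, disk alias, and target domain are placeholders.

```python
# Sketch only: live-migrate a disk and poll until it is no longer locked.
# Written with ovirtsdk4 for illustration; the 3.6 setup in this bug
# predates SDK v4. URL, credentials, alias, and domain are placeholders.
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,  # lab setup; use ca_file in production
)

disks_service = connection.system_service().disks_service()
disk = disks_service.list(search='alias=disk_TestCase5988')[0]  # placeholder alias
disk_service = disks_service.disk_service(disk.id)

# Start the live storage migration to the target domain.
disk_service.move(storage_domain=types.StorageDomain(name='iscsi_1'))

# Poll until the disk leaves LOCKED, failing loudly instead of hanging
# forever if the engine never completes the "Migrating Disk" job.
deadline = time.time() + 2 * 60 * 60  # generous 2-hour cap per migration
while disk_service.get().status == types.DiskStatus.LOCKED:
    if time.time() > deadline:
        raise RuntimeError('Migrating Disk job appears stuck (disk still locked)')
    time.sleep(30)

connection.close()
```

In a run hitting this bug, the loop above would never exit on its own; the timeout is what turns the engine/VDSM desync into a visible test failure rather than a silently stuck job.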