Bug 1585950
Summary: [downstream clone - 4.2.8] Live Merge failed on engine with "still in volume chain", but merge on host was successful

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | RHV bug bot <rhv-bugzilla-bot> |
| Component: | ovirt-engine | Assignee: | Eyal Shenitzky <eshenitz> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Avihai <aefrat> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 4.1.9 | CC: | aefrat, amarchuk, asabadra, bcholler, bzlotnik, cshao, dfodor, dhuertas, ebenahar, eedri, eshames, eshenitz, gveitmic, gwatson, jspanko, lsurette, lveyde, michael.moir, mkalinin, ratamir, rbalakri, Rhev-m-bugs, sirao, srevivo, tnisan |
| Target Milestone: | ovirt-4.2.8 | Keywords: | Reopened, ZStream |
| Target Release: | --- | Flags: | jspanko: needinfo? |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-4.2.4.4 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1554369 | Environment: | |
| Last Closed: | 2019-01-25 12:50:23 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1554369 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
RHV bug bot, 2018-06-05 07:38:01 UTC
Ala - please take a look. I'm tentatively targeting this for 4.2.2, just because it's quite late to get anything into 4.1.10. If we *do* find a quick [and safe!] fix, this should definitely be a candidate for 4.1.z. (Originally by amureini)

Moving to 4.2.4 for now, as the issue isn't reproducible and needs more analysis. (Originally by Ala Hino)

Evelina, please take a look. (Originally by Elad Ben Aharon)

Waiting for customer's response. (Originally by Evelina Shames)

(In reply to Evelina Shames from comment #17)
> Waiting for customer's response.

What response are we waiting for? The customer provided the script they use, and the request was to attempt to reproduce the bug on a 4.2 env. What am I missing here? (Originally by amureini)

Ala and I had a few questions about the script; Gordon sent them to the customer. (Originally by Evelina Shames)

Hi Evelina,

Let's try to come up with a script on our side that simulates what the customer is trying to do:

1. Create a VM
2. Create a snapshot
3. Delete up to N snapshots. When deleting a snapshot, we have to check that the status of all snapshots is OK before proceeding to the next deletion.

Pseudo code:

    vm = _create_vm()
    _create_snapshot(vm)
    # if there are X snapshots and X > N, _list_vm_snapshots returns X-N snapshots
    snapshots = _list_vm_snapshots(vm.id)
    for s in snapshots:
        _delete_snapshot(s)
        _del_completed = _check_snapshot_status(vm)
        while not _del_completed:
            _del_completed = _check_snapshot_status(vm)

As a reference, please see listSnapsToDelete and checkSnapStateOK in the customer script. Let me know if you need any help with the script. (Originally by Ala Hino)

This is happening very frequently when the customer uses Commvault. I just reviewed the 3 cases attached above; they happened with:

    rhvm-4.2.3.5-0.1.el7.noarch
    vdsm-4.19.50-1.el7ev.x86_64

The engine saw the volume still in the chain, but all was fine on the host. A retry fixed the issue.
(Originally by Germano Veit Michel)

Examined our latest automation executions for the 4.2.4-5 build; the snapshot removal tests passed. Used:

    rhvm-4.2.4.4-0.1.el7_3.noarch
    vdsm-4.20.31-1.el7ev.x86_64
    libvirt-3.9.0-14.el7_5.6.x86_64
    qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2071

Managed to reproduce this bug with the following steps, on a 4.3 engine with a cluster < 4.2:

1. Create vm1 with a disk
2. Create backup_vm with a disk
3. Create snapshot 'snap1' on vm1
4. Run vm1 and backup_vm
5. Attach snap1 to backup_vm
6. Power off backup_vm
7. Remove backup_vm
8. Remove snap1

Live merge failed with the following error:

    2018-10-11 11:56:48,883+03 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-10) [c1dce6ca-df36-4e6f-a5f1-e3cc063ccf6a] Failed to live merge. Top volume e9ffc75b-caca-4188-aa3f-a983ae2554d1 is still in qemu chain [ca9223cb-960a-4da7-8c6e-6166b269c813, e9ffc75b-caca-4188-aa3f-a983ae2554d1]

Created attachment 1507931 [details]
engine log
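The snapshot-deletion pseudo code from Ala Hino's comment above can be fleshed out as a small runnable Python sketch. The engine backend here (`FakeEngine`) is a purely illustrative in-memory stand-in, not the real oVirt SDK; its method names and the `keep` parameter are assumptions made for the example.

```python
import time

class FakeEngine:
    """Illustrative in-memory stand-in for the engine API (not real oVirt SDK)."""
    def __init__(self):
        self.snapshots = {}   # snapshot id -> status
        self._next_id = 0

    def create_snapshot(self, vm_id):
        sid = "snap-%d" % self._next_id
        self._next_id += 1
        self.snapshots[sid] = "OK"
        return sid

    def list_vm_snapshots(self, vm_id, keep=0):
        # If there are X snapshots and X > keep, return the X - keep oldest.
        ids = sorted(self.snapshots)
        return ids[: max(len(ids) - keep, 0)]

    def delete_snapshot(self, sid):
        # Deletion is asynchronous: the snapshot goes LOCKED until merged.
        self.snapshots[sid] = "LOCKED"

    def all_snapshots_ok(self, vm_id):
        # Simulate async completion: a LOCKED snapshot finishes when polled.
        for sid, status in list(self.snapshots.items()):
            if status == "LOCKED":
                del self.snapshots[sid]   # merge completed
                return False              # caller must poll again
        return True

def delete_old_snapshots(engine, vm_id, keep=1, poll_interval=0.0):
    """Delete snapshots beyond `keep`, waiting for OK status between deletions."""
    for sid in engine.list_vm_snapshots(vm_id, keep=keep):
        engine.delete_snapshot(sid)
        while not engine.all_snapshots_ok(vm_id):
            time.sleep(poll_interval)

engine = FakeEngine()
for _ in range(5):
    engine.create_snapshot("vm1")
delete_old_snapshots(engine, "vm1", keep=2)
print(len(engine.snapshots))  # 2 snapshots remain
```

The key point the pseudo code makes is the inner polling loop: each deletion must be confirmed complete before the next one starts, mirroring `checkSnapStateOK` in the customer script.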
This bug should have been moved to ON_QA. Same as the other bug: the bz bot that acks bugs wasn't working until yesterday, so it got acked only today. Please check it on the next build, scheduled for today.

QE verification bot: the bug was verified upstream.
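Several comments note that the engine reported the top volume "still in qemu chain" while the merge had actually completed on the host, and that a retry fixed it. The sketch below illustrates that idea as a generic poll-and-retry check; it is not the engine's actual `MergeStatusCommand` logic, and `get_chain` is a hypothetical callable standing in for whatever reports the host-side volume chain.

```python
import time

def volume_gone_from_chain(top_volume, chain):
    # The merge is done once the top volume has left the qemu chain.
    return top_volume not in chain

def wait_for_merge(get_chain, top_volume, retries=5, delay=0.0):
    """Poll the host-reported chain a few times before declaring failure."""
    for _ in range(retries):
        if volume_gone_from_chain(top_volume, get_chain()):
            return True
        time.sleep(delay)
    return False

# Simulate a chain where the merged volume disappears on the second poll,
# as in the transient failure described in this bug.
polls = [
    ["ca9223cb", "e9ffc75b"],  # first poll: top volume still listed
    ["ca9223cb"],              # second poll: merge has completed
]
result = wait_for_merge(lambda: polls.pop(0) if polls else ["ca9223cb"],
                        "e9ffc75b")
print(result)  # True
```

A single immediate check races against the host finishing the merge; re-polling before failing the command is what the manual "retry" in the customer cases effectively did.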