Bug 1327140
| Field | Value |
|---|---|
| Summary | Failed to MergeVDS, error = Drive image file could not be found, code = 13 |
| Product | [oVirt] ovirt-engine |
| Component | General |
| Version | 3.6.4.1 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | unspecified |
| Reporter | sha |
| Assignee | Ala Hino <ahino> |
| QA Contact | Elad <ebenahar> |
| CC | acanan, ahino, amureini, bugs, ebenahar, gregor_forum, sbonazzo, sha, tnisan, ylavi |
| Target Milestone | ovirt-3.6.6 |
| Target Release | 3.6.6.2 |
| Flags | rule-engine: ovirt-3.6.z+, ylavi: planning_ack+, rule-engine: devel_ack+, acanan: testing_ack+ |
| Doc Type | Bug Fix |
| Type | Bug |
| oVirt Team | Storage |
| Last Closed | 2016-05-30 10:56:02 UTC |
Description (sha, 2016-04-14 10:31:38 UTC)
Ala, seems like something that the patchset you're working on should also solve, no?

Hi, I have the same problem here. The snapshot is not marked illegal; its status is "OK".

Environment:
CentOS 7.2.1511
kernel-3.10.0-327.13.1.el7.x86_64
ovirt-engine-3.6.4.1-1.el7.centos.noarch

...
2016-04-19 19:42:02,243 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-7-thread-2) [111550ed] Command 'org.ovirt.engine.core.bll.MergeCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13 (Failed with error imageErr and code 13)
...

UPDATE: Now I see that the disks inside the snapshots are marked as Illegal, for all snapshots.

I solved the problem; these were my steps:
- power off all VMs
- reboot the host
- now all VMs should be off
- delete the snapshots; this took very, very long, but it finished, the snapshots were removed, and the VM is able to boot ;-)

(In reply to Allon Mureinik from comment #1)
> Ala, seems like something that the patchset you're working on should also
> solve, no?

Yes, this BZ is supposed to be fixed by https://gerrit.ovirt.org/56734. Basically, when a volume becomes illegal as a result of a failure during live merge, the patch enables the user to recover from this state by running live merge again on the same volume.

Ala, is there a workaround in the meantime? I just tried creating another snapshot and deleting it on a different VM and got the exact same issue, except this time it's being referenced in the python check_disk tool.

Alpheus currently has three snapshots listed under the "Snapshots" tab:

Current: OK (3273de02-507d-4917-a446-a18b0219f79a)
_backup_alpheus_201605091325: Illegal (98bcd787-8212-49fc-b029-24166eafe9ee)
Base CentOS7 install: OK (d8a02d4b-c92d-4d08-97a5-d155d9cd2347)

Output of check_disk.py:

VM alpheus
Disk 4e5dac5e-6a41-49b3-99de-60ccf0c3a20a (sd:8d6835a5-7b8f-4cc7-a8bb-3a07926d522d)
Volumes:
  d8a02d4b-c92d-4d08-97a5-d155d9cd2347
  98bcd787-8212-49fc-b029-24166eafe9ee

I've attached two logs: inital_snapshot-deletion.txt is the first attempt at deleting the snapshot; snapshot-deletion.txt is a subsequent attempt.

Created attachment 1155351 [details]
First attempt at deleting the snapshot
Created attachment 1155352 [details]
Subsequent attempt at deleting the snapshot
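Ala's recovery suggestion above (running live merge again on the same volume) corresponds, on the engine API side, to simply deleting the illegal snapshot a second time. A minimal sketch using ovirt-engine-sdk-python 3.x, the SDK that appears in the package list later in this thread; the engine URL and credentials are placeholders, and the VM name and snapshot id are taken from the check_disk.py output above:

```python
# Minimal sketch: re-attempt a failed live merge by deleting the
# illegal snapshot again (ovirt-engine-sdk-python 3.x). URL and
# credentials are placeholders.
from ovirtsdk.api import API

api = API(url='https://engine.example.com/ovirt-engine/api',
          username='admin@internal',
          password='...',
          insecure=True)  # or pass ca_file= to verify the engine cert

vm = api.vms.get(name='alpheus')
snapshot = vm.snapshots.get(id='98bcd787-8212-49fc-b029-24166eafe9ee')
snapshot.delete()  # with the VM running, this starts another live merge
api.disconnect()
```

As noted in the following comment, recovering from the ILLEGAL state by re-attempting the merge this way is only expected to work on builds that include the gerrit 56734 patch (3.6.6 and later).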
From the initial log it seems that the volume was not removed from the chain:

2016-05-09 14:03:18,508 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (pool-7-thread-1) [4b362aae] Failed to live merge, still in volume chain: [3273de02-507d-4917-a446-a18b0219f79a, 98bcd787-8212-49fc-b029-24166eafe9ee]

Can you please attach the vdsm log? The log will help us understand why the second merge failed too.

BTW, are you using a build that includes the https://gerrit.ovirt.org/56734 patch? The patch is in 3.6.6. Please note that recovery by attempting live merge again is only supported in 3.6.6 and later versions.

Created attachment 1155610 [details]
vdsm.log from host running VM
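One way to cross-check the "still in volume chain" error from the engine log is to inspect the qcow2 backing chain directly on the host. A hedged sketch; the mount point in the path is illustrative (file-based storage domains keep volumes under /rhev/data-center/mnt/...), while the storage-domain, image and volume UUIDs are the ones from the check_disk.py output above:

```python
# Sketch: print the qcow2 backing chain of the active volume so it can
# be compared against the engine's "still in volume chain" list.
# The mount point is illustrative; adjust it to the actual domain.
import subprocess

volume = ('/rhev/data-center/mnt/server:_export/'
          '8d6835a5-7b8f-4cc7-a8bb-3a07926d522d/images/'
          '4e5dac5e-6a41-49b3-99de-60ccf0c3a20a/'
          '3273de02-507d-4917-a446-a18b0219f79a')
print(subprocess.check_output(['qemu-img', 'info', '--backing-chain', volume]))
```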
I've attached the VDSM log from around the time of the two merges.

It looks like I'm only running 3.6.5.3-1.el6. Does this mean live merge does not work for this version? Is this a regression (I'm sure it used to work)?

Ovirt packages installed on the engine are:

[root@ovirt-hosted-00 ~]# rpm -qa ovirt-*
ovirt-engine-setup-plugin-vmconsole-proxy-helper-3.6.5.3-1.el6.noarch
ovirt-engine-extensions-api-impl-3.6.5.3-1.el6.noarch
ovirt-engine-webadmin-portal-3.6.5.3-1.el6.noarch
ovirt-engine-dbscripts-3.6.5.3-1.el6.noarch
ovirt-engine-wildfly-8.2.1-1.el6.x86_64
ovirt-vmconsole-proxy-1.0.0-1.el6.noarch
ovirt-guest-agent-1.0.11-1.el6.noarch
ovirt-engine-lib-3.6.5.3-1.el6.noarch
ovirt-setup-lib-1.0.1-1.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.6.5.3-1.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.6.5.3-1.el6.noarch
ovirt-engine-extension-aaa-jdbc-1.0.6-1.el6.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.6.5.3-1.el6.noarch
ovirt-engine-vmconsole-proxy-helper-3.6.5.3-1.el6.noarch
ovirt-engine-tools-backup-3.6.5.3-1.el6.noarch
ovirt-engine-userportal-3.6.5.3-1.el6.noarch
ovirt-engine-restapi-3.6.5.3-1.el6.noarch
ovirt-engine-backend-3.6.5.3-1.el6.noarch
ovirt-engine-jboss-as-7.1.1-1.el6.x86_64
ovirt-engine-cli-3.6.2.0-1.el6.noarch
ovirt-image-uploader-3.6.0-1.el6.noarch
ovirt-vmconsole-1.0.0-1.el6.noarch
ovirt-host-deploy-java-1.4.1-1.el6.noarch
ovirt-engine-wildfly-overlay-8.0.5-1.el6.noarch
ovirt-release36-007-1.noarch
ovirt-engine-setup-base-3.6.5.3-1.el6.noarch
ovirt-engine-setup-3.6.5.3-1.el6.noarch
ovirt-engine-websocket-proxy-3.6.5.3-1.el6.noarch
ovirt-engine-sdk-python-3.6.3.0-1.el6.noarch
ovirt-engine-tools-3.6.5.3-1.el6.noarch
ovirt-iso-uploader-3.6.0-1.el6.noarch
ovirt-engine-3.6.5.3-1.el6.noarch
ovirt-host-deploy-1.4.1-1.el6.noarch

Live merge works for 3.6.5; there is no regression here. We enhanced the live merge recovery mechanism in 3.6.6 to enable users to recover from cases where volumes become ILLEGAL by attempting live merge again. If there is a "real" storage issue, re-attempting live merge will not help. I will look at the vdsm log now and see what I can find.

Does the bug verification require a hosted-engine environment?

Were you able to reproduce the issue without the fix? If so, I'd recommend verifying that the issue is resolved with the fix.

Ala, I have verified this on my environment. I was asking for your best judgment: should this be enough, or should the verification include reproducing the illegal snapshot as a result of a hosted-engine shutdown?

As the bug was originally encountered on a hosted-engine environment, I do recommend verifying the fix on a similar setup.

Tested the following:
- Deployed hosted-engine over 2 hosts
- Created a domain, a VM, and a live snapshot
- Started a live merge and, during the operation, took down the hosted-engine. While the hosted-engine was down and the deleteImage task was marked as finished on the SPM, I cleared the task (vdsClient -s 0 clearTask)
- Engine started; the snapshot was marked as ILLEGAL
- Tried to live merge the same snapshot again, got [1] in vdsm, and the snapshot was deleted successfully

[1] jsonrpc.Executor/5::ERROR::2016-05-19 19:21:21,447::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Volume does not exist: (u'0eae32f6-0952-43d9-8216-95d23590de12',)", 'code': 201}}

Verified using:
vdsm-4.17.28-0.el7ev.noarch
rhevm-3.6.6.2-0.1.el6.noarch
ovirt-hosted-engine-setup-1.3.6.1-1.el7ev.noarch

Ala, is there anything we need to document here, or is it documented elsewhere?
Please either provide the doc text, or comment with the BZ tracking the doc text and set requires-doctext-.

BZ 1323629 documents the behavior.

Did this fix make the 3.6.6 (2016-05-23) release?

Yes, it did.
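The SPM task-clearing step in the verification flow above can be scripted rather than typed by hand. A sketch wrapping vdsClient in subprocess; the task UUID is a placeholder to be read from the getAllTasksStatuses output:

```python
# Sketch: locate and clear a finished SPM task, mirroring the manual
# "vdsClient -s 0 clearTask" step used during verification.
import subprocess

def vds(*args):
    # -s: talk to the local vdsm over TLS (port 54321)
    return subprocess.check_output(('vdsClient', '-s', '0') + args)

print(vds('getAllTasksStatuses'))  # find the finished deleteImage task
vds('clearTask', '<task-uuid>')    # placeholder: the task's UUID
```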