Bug 1349950
| Field | Value |
|---|---|
| Summary | Snapshot deletion failed: Merge failed |
| Product | [oVirt] ovirt-engine |
| Component | BLL.Storage |
| Version | 3.6.6 |
| Status | CLOSED CURRENTRELEASE |
| Severity | unspecified |
| Priority | unspecified |
| Reporter | nicolas |
| Assignee | Ala Hino <ahino> |
| QA Contact | Raz Tamir <ratamir> |
| CC | amureini, bugs, mgoldboi, mperina, nsoffer, ratamir, ylavi |
| Target Milestone | ovirt-3.6.7 |
| Target Release | --- |
| Flags | rule-engine: ovirt-3.6.z+; ylavi: planning_ack+; rule-engine: devel_ack+; rule-engine: testing_ack+ |
| Hardware | x86_64 |
| OS | Linux |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Last Closed | 2016-07-04 12:30:19 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | Storage |
| Cloudforms Team | --- |
| Attachments | 1171987 (Engine + SPM VDSM logs), 1172802 (new logs) |
Description (nicolas, 2016-06-24 15:47:18 UTC)
Created attachment 1171987 [details]
Engine + SPM VDSM logs
Nir, is that infra or storage?

Moran Goldboim (comment #3): Dup of Bug 1344479 - "Live Merge completes but the 'job' table entry in the database is not marked as finished"?

Ala Hino:

(In reply to Moran Goldboim from comment #3)
> dup of Bug 1344479 - Live Merge completes but the 'job' table entry in the
> database is not marked as finished ?

No. This BZ relates to the live merge recovery work done in 3.6.6, which enables the user to retry a live merge after a failure; see BZ 1323629 for documentation. I am moving this BZ to ON_QA so that QA can keep me honest here. I would like the following flow to be verified (an illustrative SDK sketch of the retry steps follows the log excerpts below):

1. Prepare a long live merge, i.e. copy a large amount of data
2. Create a snapshot
3. Delete that snapshot (while the VM is up)
4. Stop vdsm while the merge is running
5. Wait for the merge to complete (with an error) on the engine
6. Start vdsm
7. Retry the merge
8. Assuming the job is still running, the merge will fail
9. Wait for the job to complete
10. Retry the merge; this time it should succeed

Raz Tamir: Hi Ala, I followed the steps you provided and got ERRORs in the engine log:

```
2016-06-27 11:43:31,057 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [5c46b46f] Failed in 'MergeVDS' method
2016-06-27 11:43:31,066 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-5-thread-2) [5c46b46f] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_3 command failed: Drive image file could not be found
2016-06-27 11:43:31,066 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [5c46b46f] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=13, message=Drive image file could not be found]]'
2016-06-27 11:43:31,067 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [5c46b46f] HostName = host_mixed_3
2016-06-27 11:43:31,068 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [5c46b46f] Command 'MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{runAsync='true', hostId='f5b87851-6c15-4552-b349-cc4c297bc30b', vmId='8ee7b94f-b1a3-4f54-9794-37e8259371fe', storagePoolId='7a2d027d-f58b-4645-8339-99c324fcfcd6', storageDomainId='a3f7bd63-2ffd-40e5-be72-d44ea1534cb1', imageGroupId='4b97a780-5529-4187-aaa7-564b14c00cb6', imageId='0cb90c2e-3a2f-4000-9432-d82a3aab97cb', baseImageId='8c63eae4-43c5-4cb9-9242-cd6634571c6c', topImageId='0cb90c2e-3a2f-4000-9432-d82a3aab97cb', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13
2016-06-27 11:43:31,070 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [5c46b46f] FINISH, MergeVDSCommand, log id: 450cd3bb
2016-06-27 11:43:31,071 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-5-thread-2) [5c46b46f] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13 (Failed with error imageErr and code 13)
```

And from vdsm.log:

```
jsonrpc.Executor/5::ERROR::2016-06-27 11:45:34,199::task::866::Storage.TaskManager.Task::(_setError) Task=`f445fe59-5476-48db-87e8-130497810da9`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3162, in getVolumeInfo
    volUUID=volUUID).getInfo()
  File "/usr/share/vdsm/storage/sd.py", line 457, in produceVolume
    volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 80, in __init__
    volume.Volume.__init__(self, repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 187, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/blockVolume.py", line 88, in validate
    raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: ('0cb90c2e-3a2f-4000-9432-d82a3aab97cb',)
```

New logs attached.

Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Created attachment 1172802 [details]
new logs
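For reference, here is a minimal sketch of driving the retry portion of Ala's verification flow (steps 7-10) with the oVirt Python SDK v3 (`ovirtsdk`). The engine URL, credentials, and the VM/snapshot names `test_vm` and `snap` are assumptions taken from the logs; this is an illustration, not the exact procedure QA used.

```python
# Sketch: retry a failed live merge (snapshot deletion) until it succeeds,
# using the oVirt Python SDK v3. Adjust URL, credentials, and names.
import time

from ovirtsdk.api import API

ENGINE_URL = 'https://engine.example.com/ovirt-engine/api'  # assumption

api = API(url=ENGINE_URL, username='admin@internal',
          password='secret', insecure=True)
try:
    vm = api.vms.get(name='test_vm')
    for attempt in range(2):
        # Look up the snapshot by description; None means it is gone.
        snap = next((s for s in vm.snapshots.list()
                     if s.get_description() == 'snap'), None)
        if snap is None:
            print('Snapshot gone: merge completed')
            break
        try:
            snap.delete()  # retry the live merge
        except Exception as e:
            # Expected while the vdsm-side job is still running (step 8);
            # wait for the job to complete (step 9) before retrying.
            print('Retry failed, waiting before next attempt: %s' % e)
            time.sleep(60)
finally:
    api.disconnect()
```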
Ala Hino: Hi Raz, this exception is expected. Here is what happened: we started a merge (deletion of snapshot 'snap' for VM 'test_vm') that failed. We then retried the merge, but in the meantime the merge job had *completed* on the storage/vdsm side, and we got that imageErr (indicating the base image no longer exists). The engine in 3.6.6 handles that error by continuing the flow and, as can be seen in the log, the snapshot was removed:

```
2016-06-27 11:45:49,192 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-1) [41b38968] Correlation ID: 41b38968, Job ID: ca381f80-90f9-414f-9701-843c31e5bb1d, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'snap' deletion for VM 'test_vm' has been completed.
```

Can you please confirm that the snapshot was indeed removed after retrying the live merge?

Raz Tamir: Yes, the snapshot was eventually removed.

According to comment #9, moving to VERIFIED.
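To make the recovery semantics Ala describes concrete, here is a hedged Python pseudocode sketch (an assumption for illustration only; the actual ovirt-engine code is Java and structured differently) of how a retried merge can treat "volume does not exist" as evidence that the previous merge already completed:

```python
# Illustrative sketch, NOT the real engine implementation: on retry,
# a missing volume means the earlier merge already finished on the
# storage side, so the flow continues instead of failing.

class MergeFailed(Exception):
    pass

class VolumeDoesNotExist(MergeFailed):
    """Raised when the volume was already merged away and deleted."""

def retry_live_merge(merge_volume, volume_id):
    try:
        merge_volume(volume_id)
    except VolumeDoesNotExist:
        # The prior merge completed and the volume was removed; proceed
        # to clean up the snapshot bookkeeping rather than fail the retry.
        return 'already-merged'
    return 'merged'
```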