Created attachment 1643633 [details]
Partial log of engine when I take a snapshot

Description of problem:
I am unable to take a snapshot of a specific VM. This VM has several disks, allocated on different storage domains and with different formats:
- 4 GB on NFS storage, THICK (RAW) format.
- 1 GB on iSCSI storage, THICK (RAW) format.
- 1 GB on iSCSI storage, THIN (COW) format.
- 1 GB on NFS storage, THIN (COW) format.

A week ago I could take a snapshot without problems, but now I cannot, and I cannot explain why. I have been digging through the engine log and it seems there is a problem with the volumes that are created, but I cannot find the exact cause or a solution. The disks' volumes are not broken: I can run this VM without problems and create files inside it.

I have attached the portion of engine.log from when I take a snapshot of this VM, and also from when I start it. Please ask if you need any further information.

How reproducible:
I don't know how this state was reached.

Actual results:
Error when creating a snapshot of this VM.

Expected results:
The snapshot of this VM is created correctly.
Created attachment 1643634 [details]
Partial log of engine when I start the virtual machine
The documentation text flag should only be set after the 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.
Sorry, I set it by mistake. I did not mean to flag this as a documentation issue. Is it OK now?
Please provide full engine and vdsm logs (SPM and the host where the VM is running)
Created attachment 1644019 [details]
vdsm.log

Log of vdsm on the host where the VM is running
Created attachment 1644020 [details]
Full engine.log
I attached the full vdsm and engine logs. If you need more info, or need me to perform any action on the virtual machine, please ask.
I see

2019-12-11 16:50:41,941+0100 ERROR (tasks/3) [storage.Image] There is no leaf in the image 71c430cc-17e0-471d-827f-02703c1c318d (image:209)
2019-12-11 16:50:41,941+0100 WARN  (tasks/3) [storage.ResourceManager] Resource factory failed to create resource '01_img_adb25706-a077-443a-b0df-69721cfc9957.71c430cc-17e0-471d-827f-02703c1c318d'. Canceling request. (resourceManager:544)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 540, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 210, in getChain
    raise se.ImageIsNotLegalChain(imgUUID)
ImageIsNotLegalChain: Image is not a legal chain: (u'71c430cc-17e0-471d-827f-02703c1c318d',)

It looks like the chain may have been corrupted previously. Can you share the output of:

`vdsm-tool dump-volum-chains <sd-id>`

where <sd-id> is the storage domain on which the snapshot's disk resides?
Thanks for your fast answer :)

I attached two outputs (one for each storage domain that holds the VM's disks). By the way, the command is actually: `vdsm-tool dump-volumE-chains <sd-id>` hehehe
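For reference, this is roughly what I ran on the host (the NFS domain ID is the one from the resource name in the traceback above; the listing verb is an assumption, it may differ between vdsm versions):

    # List the storage domains this host knows about, to find each <sd-id>
    # (assumed verb; check the available verbs on your vdsm-client version)
    vdsm-client Host getStorageDomains

    # Dump every volume chain in the NFS domain that holds the image
    vdsm-tool dump-volume-chains adb25706-a077-443a-b0df-69721cfc9957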
Created attachment 1644535 [details]
vdsm-tool dump-volume-chains output for the iSCSI storage domain

Created attachment 1644536 [details]
vdsm-tool dump-volume-chains output for the NFS storage domain
The image is on the NFS storage domain.
image:    71c430cc-17e0-471d-827f-02703c1c318d

             - 48b7b1dd-41f6-4259-a996-9038ad253d11
               status: OK, voltype: INTERNAL, format: RAW, legality: LEGAL, type: SPARSE, capacity: 1073741824, truesize: 222797824

It looks like the voltype is INTERNAL while it should be LEAF, since this is the only volume in the chain. This likely indicates a botched snapshot removal. I don't see any indication of a snapshot removal in the logs, though. If you have older logs, look up errors related to image 71c430cc-17e0-471d-827f-02703c1c318d.

Theoretically, you could edit the volume's voltype using `vdsm-client SDM update_volume` [1], but I am not sure what the state of the volume is, so it is hard to tell whether it would work. If you choose to do this, use it with caution and make sure you have everything backed up.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1409380
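For illustration only, an update.json for that verb might look roughly like this. The field names and allowed values are assumptions based on the SDM.update_volume schema and may differ on your vdsm version, so verify against the API schema shipped on the host (e.g. vdsm-api.yml) before running anything:

    # NOTE: assumed payload layout -- verify against your vdsm API schema.
    # job_id is an arbitrary new UUID; generation must match what
    # Volume getInfo reports for this volume.
    cat > update.json << 'EOF'
    {
        "job_id": "11111111-2222-3333-4444-555555555555",
        "vol_info": {
            "sd_id": "adb25706-a077-443a-b0df-69721cfc9957",
            "img_id": "71c430cc-17e0-471d-827f-02703c1c318d",
            "vol_id": "48b7b1dd-41f6-4259-a996-9038ad253d11",
            "generation": 0
        },
        "vol_attr": {
            "type": "LEAF"
        }
    }
    EOF

    vdsm-client -f update.json SDM update_volume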
Created attachment 1645549 [details]
update.json (the JSON used to edit the voltype)

Created attachment 1645550 [details]
get Info volume

Created attachment 1645551 [details]
HostJobs

Created attachment 1645552 [details]
FindOlderLogs.txt

Created attachment 1645553 [details]
vdsm-update-volume log
Thanks for your help and your time.

First, I checked the older logs, but I did not find any error messages related to image 71c430cc-17e0-471d-827f-02703c1c318d (attached: FindOlderLogs.txt). I did find several warning messages with this format:

engine.log-20191121: 2019-11-20 16:12:41,235+01 WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [8fea739a-f977-4281-9ebd-e97fe646d4cf] Trying to release a shared lock for key: '71c430cc-17e0-471d-827f-02703c1c318dDISK' , but lock does not exist

Second, I tried to edit the volume's voltype with the method from [1], but it raises an exception:

2019-12-16 11:35:50,681+0100 ERROR (tasks/9) [storage.Image] There is no leaf in the image 71c430cc-17e0-471d-827f-02703c1c318d (image:209)
2019-12-16 11:35:50,681+0100 WARN  (tasks/9) [storage.ResourceManager] Resource factory failed to create resource '01_img_adb25706-a077-443a-b0df-69721cfc9957.71c430cc-17e0-471d-827f-02703c1c318d'. Canceling request. (resourceManager:544)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 540, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 210, in getChain
    raise se.ImageIsNotLegalChain(imgUUID)

I have attached several files:
- update.json -> the file used to edit the volume.
- vdsm-update-volume log -> vdsm log from when I ran: vdsm-client -f update.json SDM update_volume
- HostJobs -> the jobs on the host where the image is located.
- get Info volume -> info on the volume with image 71c430cc-17e0-471d-827f-02703c1c318d.

Any idea about this error?
If that's the case, I can only suggest directly editing the volume's metadata; it should be the 48b7b1dd-41f6-4259-a996-9038ad253d11.meta file. But again, make sure you have backups.
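On a file-based (NFS) domain the metadata is a plain key=value file next to the volume, so the edit is a one-line change. A minimal sketch, assuming the usual vdsm layout where the file lives under the mount at <sd-id>/images/<img-id>/ (the path and the backup step here are illustrative):

    # Back up the metadata file first
    cp 48b7b1dd-41f6-4259-a996-9038ad253d11.meta \
       48b7b1dd-41f6-4259-a996-9038ad253d11.meta.bak

    # Flip the volume type from INTERNAL to LEAF; VOLTYPE is the metadata
    # key behind the "voltype" field in the dump-volume-chains output,
    # and every other key in the file stays untouched
    sed -i 's/^VOLTYPE=INTERNAL$/VOLTYPE=LEAF/' \
        48b7b1dd-41f6-4259-a996-9038ad253d11.meta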
Thanks for your help and your time, and sorry for the delay.

I edited the volume's metadata directly, and... it works! =D

I attached the before/after of the volume's metadata edit.

Once more, thank you for your help =) although I don't fully understand the change I made.
Created attachment 1648256 [details]
Before/after change of the volume's metadata
(In reply to francisco.garcia from comment #21)
> Thanks for your help and your time, and sorry for the delay.
>
> I edited the volume's metadata directly, and... it works! =D
>
> I attached the before/after of the volume's metadata edit.
>
> Once more, thank you for your help =) although I don't fully understand the
> change I made.

Glad to hear it worked!

The simple explanation is that an attempted snapshot removal (which we have no logs of) corrupted the chain. The chain consists of a single volume, but in its metadata its type was INTERNAL, which means a volume that has a child further down the chain. That cannot be correct here, since it is the only volume in the chain. The last volume in a chain must have type LEAF, so vdsm blocked the operation because the chain did not make sense, and you had to fix the metadata manually.

Since the issue is resolved, I will be closing this bug.
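For anyone who hits this later, a conceptual picture of the difference (illustrative only, volume names shortened):

    # A healthy two-volume chain after taking one snapshot:
    #   base (RAW)  voltype=INTERNAL   <- has a child volume
    #   snap (COW)  voltype=LEAF       <- last volume, no child, writable
    #
    # This VM's broken chain:
    #   48b7b1dd (RAW)  voltype=INTERNAL   <- only volume; should be LEAF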