Bug 1781682 - Create snapshots problems with a unique vm
Summary: Create snapshots problems with a unique vm
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backup-Restore.VMs
Version: 4.3.3.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Benny Zlotnik
QA Contact: Avihai
bugs@ovirt.org
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-10 11:40 UTC by francisco.garcia
Modified: 2019-12-29 08:34 UTC (History)
2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-12-29 08:34:29 UTC
oVirt Team: Storage
Embargoed:


Attachments (Terms of Use)
Partial log of engine when I do a snapshot (59.24 KB, text/plain)
2019-12-10 11:40 UTC, francisco.garcia
no flags Details
Partial log of engine when I execute the virtual machine (40.42 KB, text/plain)
2019-12-10 11:41 UTC, francisco.garcia
no flags Details
vdsm.log (7.50 MB, text/plain)
2019-12-11 15:57 UTC, francisco.garcia
no flags Details
Full engine.log (319.77 KB, text/plain)
2019-12-11 15:57 UTC, francisco.garcia
no flags Details
vdsm-tool of iSCSI Storage (28.32 KB, text/plain)
2019-12-12 18:30 UTC, francisco.garcia
no flags Details
vdsm-tool NFS Storage (31.80 KB, text/plain)
2019-12-12 18:31 UTC, francisco.garcia
no flags Details
update.json (320 bytes, text/plain)
2019-12-16 11:33 UTC, francisco.garcia
no flags Details
get Info volume (1.15 KB, text/plain)
2019-12-16 11:33 UTC, francisco.garcia
no flags Details
HostJobs (416 bytes, text/plain)
2019-12-16 11:34 UTC, francisco.garcia
no flags Details
FindOlderLogs.txt (1.22 KB, text/plain)
2019-12-16 11:34 UTC, francisco.garcia
no flags Details
vdsm-update-volume log (351.30 KB, text/plain)
2019-12-16 11:35 UTC, francisco.garcia
no flags Details
Before/after change of the volume's metadata (956 bytes, text/plain)
2019-12-28 20:23 UTC, francisco.garcia
no flags Details

Description francisco.garcia 2019-12-10 11:40:41 UTC
Created attachment 1643633 [details]
Partial log of engine when I do a snapshot

Description of problem:
I am not able to do a snapshot to a specific VM. This VM has several disks, these are allocated in different storages and with different formats:
 - 4 Gb in NFS storage and THICK(RAW) format.
 - 1 Gb in iSCSI storage and THICK(RAW) format.
 - 1 Gb in iSCSI storage and THIN(COW) format.
 - 1 Gb in NFS storage and THIN(COW) format.

A week earlier I could create a snapshot without problems, but now I can't, and I can't explain why. I have been researching in the engine log, and it seems there is a problem with the volumes that are created, but I cannot find the exact cause or its solution. The disks' volumes aren't broken: I can run this VM without problems and create files inside it.

I have attached the portion of engine.log from when I create a snapshot of this VM, and also from when I start it. If you need any more information, please ask.


How reproducible:
I don't know how this state was reached.


Actual results:
Error when creating a snapshot of this VM.


Expected results:
A snapshot is created successfully for this VM.

Comment 1 francisco.garcia 2019-12-10 11:41:51 UTC
Created attachment 1643634 [details]
Partial log of engine when I execute the virtual machine

Comment 2 RHEL Program Management 2019-12-11 11:56:29 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 3 francisco.garcia 2019-12-11 15:19:27 UTC
Sorry, I set it by mistake. I did not mean it was a documentation issue. Is it OK now?

Comment 4 Benny Zlotnik 2019-12-11 15:34:02 UTC
Please provide full engine and vdsm logs (SPM and the host where the VM is running)

Comment 5 francisco.garcia 2019-12-11 15:57:05 UTC
Created attachment 1644019 [details]
vdsm.log

vdsm log from the host where the VM is running.

Comment 6 francisco.garcia 2019-12-11 15:57:45 UTC
Created attachment 1644020 [details]
Full engine.log

Full engine.log

Comment 7 francisco.garcia 2019-12-11 16:02:43 UTC
I attached the full vdsm and engine logs. If you need more info, or want me to apply an action on the virtual machine, please ask.

Comment 8 Benny Zlotnik 2019-12-12 09:29:16 UTC
I see 
2019-12-11 16:50:41,941+0100 ERROR (tasks/3) [storage.Image] There is no leaf in the image 71c430cc-17e0-471d-827f-02703c1c318d (image:209)
2019-12-11 16:50:41,941+0100 WARN  (tasks/3) [storage.ResourceManager] Resource factory failed to create resource '01_img_adb25706-a077-443a-b0df-69721cfc9957.71c430cc-17e0-471d-827f-02703c1c318d'. Canceling request. (resourceManager:544)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 540, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 210, in getChain
    raise se.ImageIsNotLegalChain(imgUUID)
ImageIsNotLegalChain: Image is not a legal chain: (u'71c430cc-17e0-471d-827f-02703c1c318d',)

Looks like the chain may have been corrupted previously; can you share the output of:
`vdsm-tool dump-volum-chains <sd-id>`

Where <sd-id> is the storage domain where the snapshot's disk resides

Comment 9 francisco.garcia 2019-12-12 18:30:16 UTC
Thanks for your quick answer :)

I attached two outputs (one for each storage domain that holds the VM's disks).

By the way, the command was: `vdsm-tool dump-volumE-chains <sd-id>` hehe

Comment 10 francisco.garcia 2019-12-12 18:30:44 UTC
Created attachment 1644535 [details]
vdsm-tool of iSCSI Storage

Comment 11 francisco.garcia 2019-12-12 18:31:15 UTC
Created attachment 1644536 [details]
vdsm-tool NFS Storage

Comment 12 francisco.garcia 2019-12-12 18:32:40 UTC
The image is in the NFS storage.

Comment 13 Benny Zlotnik 2019-12-15 09:07:17 UTC
   image:    71c430cc-17e0-471d-827f-02703c1c318d

             - 48b7b1dd-41f6-4259-a996-9038ad253d11
               status: OK, voltype: INTERNAL, format: RAW, legality: LEGAL, type: SPARSE, capacity: 1073741824, truesize: 222797824

Looks like the voltype is INTERNAL while it should be LEAF, since this is the only volume in the chain. This likely indicates a botched snapshot removal.
I don't see any indication of a snapshot removal in the logs, though.

If you have older logs, look up errors related to image 71c430cc-17e0-471d-827f-02703c1c318d.
Theoretically, you could edit the volume's voltype using `vdsm-client SDM update_volume`[1], but I am not sure what the state of the volume is, so it's hard to tell whether it would work.
If you choose to do this, use it with caution and make sure you have everything backed up.




[1] https://bugzilla.redhat.com/show_bug.cgi?id=1409380
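For reference, a request file for that verb might look roughly like the following. The field names follow the SDM.update_volume schema as I understand it, and the job_id and generation values are placeholders, so treat this as a sketch rather than the exact update.json attached later in this bug:

```json
{
  "job_id": "00000000-0000-0000-0000-000000000000",
  "vol_info": {
    "sd_id": "adb25706-a077-443a-b0df-69721cfc9957",
    "img_id": "71c430cc-17e0-471d-827f-02703c1c318d",
    "vol_id": "48b7b1dd-41f6-4259-a996-9038ad253d11",
    "generation": 0
  },
  "vol_attr": {
    "type": "LEAF"
  }
}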

Comment 14 francisco.garcia 2019-12-16 11:33:05 UTC
Created attachment 1645549 [details]
update.json

JSON to edit voltype

Comment 15 francisco.garcia 2019-12-16 11:33:38 UTC
Created attachment 1645550 [details]
get Info volume

Comment 16 francisco.garcia 2019-12-16 11:34:15 UTC
Created attachment 1645551 [details]
HostJobs

Comment 17 francisco.garcia 2019-12-16 11:34:55 UTC
Created attachment 1645552 [details]
FindOlderLogs.txt

Comment 18 francisco.garcia 2019-12-16 11:35:57 UTC
Created attachment 1645553 [details]
vdsm-update-volume log

Comment 19 francisco.garcia 2019-12-16 11:41:43 UTC
Thanks for your help and your time.

First, I checked the older logs, but I didn't find any error messages related to image 71c430cc-17e0-471d-827f-02703c1c318d (attached: FindOlderLogs.txt).

I found several warning messages in this format:
engine.log-20191121:
    2019-11-20 16:12:41,235+01 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [8fea739a-f977-4281-9ebd-e97fe646d4cf] Trying to release a shared lock for key: '71c430cc-17e0-471d-827f-02703c1c318dDISK' , but lock does not exist


On the other hand, I tried to edit the volume's voltype with the method from [1], but it raised an exception:
2019-12-16 11:35:50,681+0100 ERROR (tasks/9) [storage.Image] There is no leaf in the image 71c430cc-17e0-471d-827f-02703c1c318d (image:209)
2019-12-16 11:35:50,681+0100 WARN  (tasks/9) [storage.ResourceManager] Resource factory failed to create resource '01_img_adb25706-a077-443a-b0df-69721cfc9957.71c430cc-17e0-471d-827f-02703c1c318d'. Canceling request. (resourceManager:544)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 540, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 210, in getChain
    raise se.ImageIsNotLegalChain(imgUUID)



I attach several files:
 - update.json -> file used to edit the volume.
 - vdsm-update-volume log -> vdsm log from when I ran: vdsm-client -f update.json SDM update_volume
 - HostJobs -> jobs of the host where the image is allocated.
 - get Info volume -> info of the volume with image 71c430cc-17e0-471d-827f-02703c1c318d

Any idea about this error?

Comment 20 Benny Zlotnik 2019-12-16 13:48:59 UTC
If that's the case, I can only suggest directly editing the volume's metadata; it should be the 48b7b1dd-41f6-4259-a996-9038ad253d11.meta file. But again, make sure you have backups.
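For illustration, the kind of in-place edit being suggested can be sketched like this. It runs against a sample file: the key layout below is an assumption based on typical vdsm volume metadata, not the user's actual file, and a backup is taken first as advised above:

```shell
# Build a sample .meta file (assumed key layout, for demonstration only).
META=sample.meta
printf 'FORMAT=RAW\nVOLTYPE=INTERNAL\nLEGALITY=LEGAL\n' > "$META"

# Always back up the metadata before touching it.
cp "$META" "$META.bak"

# Flip the volume type from INTERNAL to LEAF (GNU sed in-place edit).
sed -i 's/^VOLTYPE=INTERNAL$/VOLTYPE=LEAF/' "$META"

grep '^VOLTYPE=' "$META"   # -> VOLTYPE=LEAF
```

On a real setup the .meta file lives under the storage domain's images directory next to the volume, and the engine/vdsm should be quiesced while editing it.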

Comment 21 francisco.garcia 2019-12-28 20:23:14 UTC
Thanks for your help and your time, and sorry for the delay.

I edited the volume's metadata directly, and... it works! =D

I attached the before/after of the volume's metadata.

Once more, thank you for your help =) although I don't fully understand the change I made.

Comment 22 francisco.garcia 2019-12-28 20:23:58 UTC
Created attachment 1648256 [details]
Before/after change of the volume's metadata

Comment 23 Benny Zlotnik 2019-12-29 08:34:29 UTC
(In reply to francisco.garcia from comment #21)
> Thanks for your help and your time, and sorry for the delay.
> 
> I edited the volume's metadata directly, and... it works! =D
> 
> I attached the before/after of the volume's metadata.
> 
> Once more, thank you for your help =) although I don't fully understand
> the change I made.

Glad to hear it worked!

The simple explanation is that an attempted snapshot removal (which we have no logs of) corrupted the chain.
The chain consists of a single volume, but in its metadata its type is INTERNAL, which means a volume with both a parent and a child; that isn't correct in practice, since it is the only volume in the chain. The last volume in a chain must have type LEAF, so vdsm blocked the operation because the chain didn't make sense, and you had to fix the metadata manually.

Since the issue is resolved, I will be closing this bug.

