Created attachment 1137063 [details]
engine.log

Description of problem:
We have a source NFS backend holding a single disk (plus 2 OVF_STORE disks), and we want to either copy or move it to an iSCSI-based backend. We start the task and after a while it fails with these events:

User admin@internal finished with error copying disk local-disk to domain iscsi02.
VDSM command failed: Image does not exist in domain: u'image=e828ae39-6154-4e49-8258-3f11b33ffc57, domain=b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8'
VDSM host04.domain.com command failed: low level Image copy failed

Version-Release number of selected component (if applicable):

How reproducible:
Always; the only difference is that on each attempt the image UUID in the second error line changes.

Steps to Reproduce:
1. Go to the 'Disks' tab.
2. Find the disk currently located on the NFS backend.
3. Click Copy/Move and choose the iSCSI backend as the target.
4. Click OK.

Actual results:
Copying/moving starts and after a while the errors above show up.

Additional info:
1) engine.log attached
2) vdsm.log of the SPM host attached
3) Directory tree of the NFS backend:

.
├── 4f1659ee-652a-49b0-98b3-d4bcc0f99132
│   ├── dom_md
│   │   ├── ids
│   │   ├── inbox
│   │   ├── leases
│   │   ├── metadata
│   │   └── outbox
│   └── images
│       ├── 8be3f486-dc5e-419b-b2a5-9fd365db698c
│       │   ├── 8e206348-3efe-4d93-b427-44adb2353516
│       │   ├── 8e206348-3efe-4d93-b427-44adb2353516.lease
│       │   └── 8e206348-3efe-4d93-b427-44adb2353516.meta
│       ├── a017fe48-17b3-4786-83d5-bb5e0a31e1bf
│       │   ├── 5e34b99f-ee39-4c8c-b3c3-f4694768e969
│       │   ├── 5e34b99f-ee39-4c8c-b3c3-f4694768e969.lease
│       │   └── 5e34b99f-ee39-4c8c-b3c3-f4694768e969.meta
│       └── ade17b46-d54f-40d5-bc77-d3cd7cac1df0
│           ├── e865c309-5cef-4bb7-b126-4ace8f45ee11
│           ├── e865c309-5cef-4bb7-b126-4ace8f45ee11.lease
│           └── e865c309-5cef-4bb7-b126-4ace8f45ee11.meta
├── __DIRECT_IO_TEST__
└── lost+found

Space usage per directory:
1,1M  8be3f486-dc5e-419b-b2a5-9fd365db698c
4,2G  a017fe48-17b3-4786-83d5-bb5e0a31e1bf
1,1M  ade17b46-d54f-40d5-bc77-d3cd7cac1df0

I'm assuming 8be3f486-dc5e-419b-b2a5-9fd365db698c and ade17b46-d54f-40d5-bc77-d3cd7cac1df0 are the OVF_STORE disks, so a017fe48-17b3-4786-83d5-bb5e0a31e1bf must be the disk.
Created attachment 1137064 [details] vdsm.log
Forgot to mention: oVirt engine version: 3.6.3.4-1
The relevant part of the log sounds awfully familiar:

CopyImageError: low level Image copy failed: ("ecode=1, stdout=[], stderr=['qemu-img: error while writing sector 39219200: No space left on device'], message=None",)

What version of VDSM are you using? Also, has this disk ever been live merged?
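For reference, the failing sector number gives a rough idea of how much data had been written before the destination ran out of space. A minimal sketch, assuming the conventional 512-byte sector size used in qemu-img's error reporting:

```python
# Rough arithmetic: where did the copy hit ENOSPC?
# Assumes 512-byte sectors, qemu-img's usual reporting unit.
SECTOR_SIZE = 512
failing_sector = 39219200  # from the error message

offset_bytes = failing_sector * SECTOR_SIZE
print(offset_bytes)                       # 20080230400 bytes
print(round(offset_bytes / 1024**3, 1))   # ~18.7 GiB written before ENOSPC
```

So roughly 18.7 GiB had been copied when the write failed, which may hint at the size of the destination volume rather than the free space in the domain.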
(In reply to Allon Mureinik from comment #3)
> The relevant part of the log sounds awfully familiar:
> CopyImageError: low level Image copy failed: ("ecode=1, stdout=[],
> stderr=['qemu-img: error while writing sector 39219200: No space left on
> device'], message=None",)

One would expect this very specific message to be shown to the user - is there a specific error code we can use to propagate this to the engine?
VDSM version is 4.17.23. This disk was created by copying an original disk to an iSCSI-based destination storage domain (background: bug #1314959). It was then moved to a file-based NFS storage domain (where the disk currently seems to reside), and then *theoretically* moved back to iSCSI - however, this last move does not seem to have happened, even though it didn't fail. The other strange thing is that there's plenty of free space on both the source and destination storage domains: 44 GB and 1 TB respectively.
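One possible reason "plenty of free space" and ENOSPC can coexist: on file storage (NFS) an image may be sparse, so its apparent size exceeds the space it occupies on disk, while a copy to a block-based (iSCSI) volume needs room for the full written extent. A minimal sketch of the apparent-vs-allocated distinction (not vdsm code, just an illustration on a sparse-capable filesystem):

```python
# Illustration only: a sparse file's apparent size (st_size) can be far
# larger than the space it actually occupies (st_blocks * 512).
import os
import tempfile

with tempfile.NamedTemporaryFile() as f:
    f.truncate(1 << 30)              # 1 GiB apparent size, no data written
    st = os.stat(f.name)
    apparent = st.st_size            # 1073741824
    allocated = st.st_blocks * 512   # near zero on a sparse-capable FS
    print(apparent, allocated)
```

If the source image were sparse in this way, a destination volume sized from the occupied space rather than the virtual size would run out of room mid-copy.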
This looks eerily familiar, but last time I saw it was due to Live Merge misbehaving. Nir, as this week's contact, can you take a look please? Thanks!
Just FWIW, this is a storage backend we used temporarily to migrate our former storage infrastructure to a new one, we don't use it anymore but I've left it configured as it was so I can make any additional tests if needed.
Indeed, a017fe48-17b3-4786-83d5-bb5e0a31e1bf is the disk that you tried to copy. I tried to reproduce the issue, but couldn't. Did you try copying the disk while the VM was up?
I tried it both with the VM up and with it down, with the same result. Truth be told, I've used this intermediate backend to move about 100 images, and all went smoothly except this one. I guess this must be an unusual case, or maybe the image is corrupt... If you haven't received more bug reports like this, we can close it and reopen it if more cases turn up.
OK, let's close for now until we have a reproducer.