Bug 1318335 - Copying/moving from NFS to iSCSI backend fails: low level Image copy failed
Summary: Copying/moving from NFS to iSCSI backend fails: low level Image copy failed
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.3.3
Hardware: x86_64
OS: Linux
unspecified
unspecified vote
Target Milestone: ovirt-3.6.6
: ---
Assignee: Idan Shaby
QA Contact: Aharon Canan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-16 14:40 UTC by nicolas
Modified: 2016-04-20 08:38 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-20 08:38:42 UTC
oVirt Team: Storage
amureini: ovirt-3.6.z?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
engine.log (50.41 KB, text/plain)
2016-03-16 14:40 UTC, nicolas
vdsm.log (1.04 MB, text/plain)
2016-03-16 14:40 UTC, nicolas

Description nicolas 2016-03-16 14:40:14 UTC
Created attachment 1137063 [details]
engine.log

Description of problem:

We have a source NFS backend that holds only one disk (plus 2 OVF_STORE disks), and we want to either copy or move it to an iSCSI-based backend. We start the task and after a while it fails with these events:

  User admin@internal finished with error copying disk local-disk to domain iscsi02.
  VDSM command failed: Image does not exist in domain: u'image=e828ae39-6154-4e49-8258-3f11b33ffc57, domain=b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8'
  VDSM host04.domain.com command failed: low level Image copy failed

Version-Release number of selected component (if applicable):

How reproducible:

Always. The only difference is that on each attempt the image UUID in the second error line above changes.

Steps to Reproduce:
1. Go to 'Disks' tab
2. Find the disk currently located in the NFS backend.
3. Click on copy/move and choose the iSCSI backend as target.
4. Click on OK

Actual results:

The copy/move starts, and after a while the errors above show up.

Additional info:

1) engine.log attached
2) vdsm.log of SPM host attached
3) Directory tree of the NFS backend is:

.
├── 4f1659ee-652a-49b0-98b3-d4bcc0f99132
│   ├── dom_md
│   │   ├── ids
│   │   ├── inbox
│   │   ├── leases
│   │   ├── metadata
│   │   └── outbox
│   └── images
│       ├── 8be3f486-dc5e-419b-b2a5-9fd365db698c
│       │   ├── 8e206348-3efe-4d93-b427-44adb2353516
│       │   ├── 8e206348-3efe-4d93-b427-44adb2353516.lease
│       │   └── 8e206348-3efe-4d93-b427-44adb2353516.meta
│       ├── a017fe48-17b3-4786-83d5-bb5e0a31e1bf
│       │   ├── 5e34b99f-ee39-4c8c-b3c3-f4694768e969
│       │   ├── 5e34b99f-ee39-4c8c-b3c3-f4694768e969.lease
│       │   └── 5e34b99f-ee39-4c8c-b3c3-f4694768e969.meta
│       └── ade17b46-d54f-40d5-bc77-d3cd7cac1df0
│           ├── e865c309-5cef-4bb7-b126-4ace8f45ee11
│           ├── e865c309-5cef-4bb7-b126-4ace8f45ee11.lease
│           └── e865c309-5cef-4bb7-b126-4ace8f45ee11.meta
├── __DIRECT_IO_TEST__
└── lost+found

Space usage per directory:

1,1M	8be3f486-dc5e-419b-b2a5-9fd365db698c
4,2G	a017fe48-17b3-4786-83d5-bb5e0a31e1bf
1,1M	ade17b46-d54f-40d5-bc77-d3cd7cac1df0

I'm assuming 8be3f486-dc5e-419b-b2a5-9fd365db698c and ade17b46-d54f-40d5-bc77-d3cd7cac1df0 are OVF_STORE, so a017fe48-17b3-4786-83d5-bb5e0a31e1bf must be the disk.
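
The reasoning above (the two ~1 MB directories are OVF_STORE, so the large one is the disk) can be sketched in Python. This is only an illustration that parses the `du`-style listing quoted above. The helper names `parse_du_size` and `largest_entry` are hypothetical, and the comma decimal separator is assumed from the locale shown in the report.

```python
# Binary unit suffixes as printed by a human-readable du listing.
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}

# Listing copied from the report (size, tab, image directory UUID).
DU_OUTPUT = """\
1,1M\t8be3f486-dc5e-419b-b2a5-9fd365db698c
4,2G\ta017fe48-17b3-4786-83d5-bb5e0a31e1bf
1,1M\tade17b46-d54f-40d5-bc77-d3cd7cac1df0
"""


def parse_du_size(text):
    """Convert a size such as '4,2G' or '1.1M' to a number of bytes."""
    text = text.strip().replace(",", ".")  # assumption: comma is the decimal mark
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)  # no suffix: treat the value as plain bytes


def largest_entry(du_output):
    """Return the name of the entry with the largest reported size."""
    entries = []
    for line in du_output.splitlines():
        if not line.strip():
            continue
        size, name = line.split(None, 1)
        entries.append((parse_du_size(size), name.strip()))
    return max(entries)[1]


print(largest_entry(DU_OUTPUT))
```

Size alone is only a heuristic here. It does not prove which directories are OVF_STORE.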

Comment 1 nicolas 2016-03-16 14:40:45 UTC
Created attachment 1137064 [details]
vdsm.log

Comment 2 nicolas 2016-03-16 14:41:49 UTC
Forgot to mention: ovirt engine version: 3.6.3.4-1

Comment 3 Allon Mureinik 2016-03-17 04:38:23 UTC
The relevant part of the log sounds awfully familiar:
CopyImageError: low level Image copy failed: ("ecode=1, stdout=[], stderr=['qemu-img: error while writing sector 39219200: No space left on device'], message=None",)

What version of VDSM are you using?
Also, has this disk ever been live merged?
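
To put the sector number in the error above into perspective, here is a minimal Python sketch that converts it to a byte offset. It assumes qemu-img counts 512-byte sectors in this message. The helper name `failed_write_offset` is hypothetical and not part of VDSM or qemu.

```python
import re

SECTOR_SIZE = 512  # assumption: the sector number refers to 512-byte sectors


def failed_write_offset(stderr_line, sector_size=SECTOR_SIZE):
    """Return the byte offset of the failed write, or None if no sector is found."""
    match = re.search(r"error while writing sector (\d+)", stderr_line)
    if match is None:
        return None
    return int(match.group(1)) * sector_size


# stderr text copied from the CopyImageError quoted above.
line = "qemu-img: error while writing sector 39219200: No space left on device"
offset = failed_write_offset(line)
print(offset)          # 20080230400 bytes
print(offset / 2**20)  # 19150.0 MiB, roughly 18.7 GiB
```

Under that assumption the write failed about 18.7 GiB into the destination. That is well past the 4,2G the image directory occupies on NFS, which on its own is not unusual for a sparse image and does not identify the cause.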

Comment 4 Yaniv Kaul 2016-03-17 07:51:53 UTC
(In reply to Allon Mureinik from comment #3)
> The relevant part of the log sounds awfully familiar:
> CopyImageError: low level Image copy failed: ("ecode=1, stdout=[],
> stderr=['qemu-img: error while writing sector 39219200: No space left on
> device'], message=None",)

One would expect this very specific message to be shown to the user. Is there a specific error code we can propagate to the engine?

Comment 5 nicolas 2016-03-17 08:18:53 UTC
VDSM version is 4.17.23.

This disk was created by copying an original disk to an iSCSI-based destination storage domain (background: bug #1314959). It was then moved to a file-based NFS storage domain (where the disk currently seems to be), and then *theoretically* moved back to iSCSI. However, this last move seems not to have happened, though it didn't report a failure.

The other strange thing is that there's plenty of space on both source and destination datastores, 44GB and 1TB of free space respectively.

Comment 6 Allon Mureinik 2016-03-21 10:10:41 UTC
This looks eerily familiar, but last time I saw it was due to Live Merge misbehaving.

Nir, as this week's contact, can you take a look please?
Thanks!

Comment 7 nicolas 2016-04-19 06:45:50 UTC
Just FWIW, this is a storage backend we used temporarily to migrate our former storage infrastructure to a new one. We don't use it anymore, but I've left it configured as it was so I can run any additional tests if needed.

Comment 8 Idan Shaby 2016-04-19 11:03:53 UTC
Indeed a017fe48-17b3-4786-83d5-bb5e0a31e1bf is the disk that you tried to copy.
I tried to reproduce it but couldn't.
Did you try to copy the disk while the VM was up?

Comment 9 nicolas 2016-04-19 18:41:01 UTC
I tried both when the VM was up and down, with the same result. Truth be told, I've used this intermediary backend for moving about 100 images and all went smoothly except this one.

I guess this must be an unusual case, or maybe the image is corrupt... If you haven't received more bug reports like this we can close it and reopen it if more cases are found.

Comment 10 Allon Mureinik 2016-04-20 08:38:42 UTC
OK, let's close for now until we have a reproducer.

