Bug 967495 - [vdsm] 'CopyImageError: low level Image copy failed' during copyImage on nfs vers=2
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.2.1
Assigned To: Nobody's working on this, feel free to take it
Whiteboard: storage
Depends On:
Blocks: 967628
Reported: 2013-05-27 05:49 EDT by Elad
Modified: 2016-02-10 12:51 EST (History)
9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 967628 (view as bug list)
Environment:
Last Closed: 2013-05-28 11:04:11 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
logs (855.31 KB, application/x-gzip)
2013-05-27 05:49 EDT, Elad

Description Elad 2013-05-27 05:49:17 EDT
Created attachment 753535 [details]
logs

Description of problem:
copyImage is failing with 'low level Image copy failed' 

Version-Release number of selected component (if applicable):
vdsm-4.10.2-21.0.el6ev.x86_64
rhevm-3.2.0-11.28.el6ev.noarch

How reproducible:
50%

Steps to Reproduce:
1. on NFS pool:
2. create template or export vm/template to an export domain 

Actual results:

6ce4c345-3ff8-4223-8c30-bc427909864c::ERROR::2013-05-27 12:23:04,925::task::850::TaskManager.Task::(_setError) Task=`6ce4c345-3ff8-4223-8c30-bc427909864c`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1708, in copyImage
    volType, volFormat, preallocate, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 862, in copy
    raise se.CopyImageError(str(e))
CopyImageError: low level Image copy failed: ('General Storage Exception: (\'rc: 1, err: ["qemu-img: Could not open \\\'/rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d\\\'", "qemu-img: Could not open \\\'/rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d\\\'"]\',)',)



Additional info: logs
Comment 3 Allon Mureinik 2013-05-27 07:26:15 EDT
Elad,

What is the impact of the failure?
Can it be retried later?
Is load a factor here? Is size a factor?
Does this reproduce only on NFS?
Did you try a different NFS server?
Comment 4 Elad 2013-05-27 08:27:32 EDT
(In reply to Allon Mureinik from comment #3)

> What is the impact of the failure?
user fails in create template or in copy image to an export domain
> Can it be retried later? in case of copy the image to an export domain, there is no roll back and user cannot delete the image from the domain.
> Is load a factor here? Is size a factor? no
> Does this reproduce only on NFS? so far, yes
> Did you try a different NFS server?  so far, I've managed to reproduce only on one storage server.
Comment 5 Allon Mureinik 2013-05-27 10:52:15 EDT
(In reply to Elad from comment #4)
> (In reply to Allon Mureinik from comment #3)
> 
> > What is the impact of the failure?
> user fails in create template or in copy image to an export domain
And if he tries again, does it fail again?

Also, please answer the following:
> > Can it be retried later? in case of copy the image to an export domain, there is no roll back and user cannot delete the image from the domain.
> > Is load a factor here? Is size a factor? no
> > Does this reproduce only on NFS? so far, yes
> > Did you try a different NFS server?  so far, I've managed to reproduce only on one storage server.
Comment 6 Allon Mureinik 2013-05-27 10:53:14 EDT
(In reply to Allon Mureinik from comment #5)
> (In reply to Elad from comment #4)
> > (In reply to Allon Mureinik from comment #3)
> > 
> > > What is the impact of the failure?
> > user fails in create template or in copy image to an export domain
> And if he tries again, does it fail again?
> 
> Also, please answer the following:
> > > Can it be retried later? in case of copy the image to an export domain, there is no roll back and user cannot delete the image from the domain.
> > > Is load a factor here? Is size a factor? no
> > > Does this reproduce only on NFS? so far, yes
> > > Did you try a different NFS server?  so far, I've managed to reproduce only on one storage server.

Disregard; because of BZ's color scheme, I did not notice you had answered inline.
Comment 7 Federico Simoncelli 2013-05-27 10:56:51 EDT
Looking at the logs I couldn't spot anything unusual about the flow:

Thread-61280::INFO::2013-05-27 12:22:25,522::logUtils::40::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='69b4335c-0e57-4384-a173-6fef05302a20', spUUID='121b58e3-71e3-4138-a934-543b3ad0328b', imgUUID='8c6b8b29-8498-49c6-ad6f-65a27a3efbdb', size='3221225472', volFormat=5, preallocate=2, diskType=2, volUUID='332b9176-8d8a-4df6-93df-27b2d8d7040d', desc='', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000')

The volume was healthy up until the copyImage request:

Thread-61304::INFO::2013-05-27 12:22:56,844::logUtils::40::dispatcher::(wrapper) Run and protect: getVolumeInfo(sdUUID='69b4335c-0e57-4384-a173-6fef05302a20', spUUID='121b58e3-71e3-4138-a934-543b3ad0328b', imgUUID='8c6b8b29-8498-49c6-ad6f-65a27a3efbdb', volUUID='332b9176-8d8a-4df6-93df-27b2d8d7040d', options=None)
Thread-61304::INFO::2013-05-27 12:22:56,865::logUtils::42::dispatcher::(wrapper) Run and protect: getVolumeInfo, Return response: {'info': {'status': 'OK', 'domain': '69b4335c-0e57-4384-a173-6fef05302a20', 'voltype': 'LEAF', 'description': '', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'image': '8c6b8b29-8498-49c6-ad6f-65a27a3efbdb', 'ctime': '1369646547', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '1369646547', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'capacity': '3221225472', 'uuid': '332b9176-8d8a-4df6-93df-27b2d8d7040d', 'truesize': '0', 'type': 'SPARSE'}}
Thread-61304::DEBUG::2013-05-27 12:22:56,866::task::1168::TaskManager.Task::(prepare) Task=`cf8b8fe6-f74d-46f7-8cbd-1920cb1c3459`::finished: {'info': {'status': 'OK', 'domain': '69b4335c-0e57-4384-a173-6fef05302a20', 'voltype': 'LEAF', 'description': '', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'image': '8c6b8b29-8498-49c6-ad6f-65a27a3efbdb', 'ctime': '1369646547', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '1369646547', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'capacity': '3221225472', 'uuid': '332b9176-8d8a-4df6-93df-27b2d8d7040d', 'truesize': '0', 'type': 'SPARSE'}}

Thread-61305::INFO::2013-05-27 12:22:56,929::logUtils::40::dispatcher::(wrapper) Run and protect: copyImage(sdUUID='69b4335c-0e57-4384-a173-6fef05302a20', spUUID='121b58e3-71e3-4138-a934-543b3ad0328b', vmUUID='', srcImgUUID='8c6b8b29-8498-49c6-ad6f-65a27a3efbdb', srcVolUUID='332b9176-8d8a-4df6-93df-27b2d8d7040d', dstImgUUID='31ab876b-62e5-417b-9abd-c4317d23674a', dstVolUUID='8b405965-2bb8-4ef9-84b7-c2aecd06b97b', description='Active VM', dstSdUUID='69b4335c-0e57-4384-a173-6fef05302a20', volType=6, volFormat=5, preallocate=2, postZero='false', force='false')
...
6ce4c345-3ff8-4223-8c30-bc427909864c::INFO::2013-05-27 12:23:04,556::volume::709::Storage.Volume::(prepare) Volume: preparing volume 69b4335c-0e57-4384-a173-6fef05302a20/332b9176-8d8a-4df6-93df-27b2d8d7040d
6ce4c345-3ff8-4223-8c30-bc427909864c::INFO::2013-05-27 12:23:04,561::volume::709::Storage.Volume::(prepare) Volume: preparing volume 69b4335c-0e57-4384-a173-6fef05302a20/8b405965-2bb8-4ef9-84b7-c2aecd06b97b
6ce4c345-3ff8-4223-8c30-bc427909864c::DEBUG::2013-05-27 12:23:04,616::volume::1012::Storage.Volume::(qemuConvert) (qemuConvert): COPY /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d (raw) to /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/31ab876b-62e5-417b-9abd-c4317d23674a/8b405965-2bb8-4ef9-84b7-c2aecd06b97b (raw) START
6ce4c345-3ff8-4223-8c30-bc427909864c::DEBUG::2013-05-27 12:23:04,617::misc::83::Storage.Misc.excCmd::(<lambda>) '/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img convert -t none -f raw /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d -O raw /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/31ab876b-62e5-417b-9abd-c4317d23674a/8b405965-2bb8-4ef9-84b7-c2aecd06b97b' (cwd None)
...
6ce4c345-3ff8-4223-8c30-bc427909864c::DEBUG::2013-05-27 12:23:04,921::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = ["qemu-img: Could not open '/rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d'", "qemu-img: Could not open '/rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d'"]; <rc> = 1
6ce4c345-3ff8-4223-8c30-bc427909864c::DEBUG::2013-05-27 12:23:04,922::volume::1029::Storage.Volume::(qemuConvert) (qemuConvert): COPY /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d to /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/31ab876b-62e5-417b-9abd-c4317d23674a/8b405965-2bb8-4ef9-84b7-c2aecd06b97b DONE
6ce4c345-3ff8-4223-8c30-bc427909864c::ERROR::2013-05-27 12:23:04,922::image::874::Storage.Image::(copy) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 862, in copy
    raise se.CopyImageError(str(e))
Comment 10 Elad 2013-05-28 03:29:30 EDT
I managed to reproduce it:


708495b0-5532-45bd-86ad-8fc2aea15191::ERROR::2013-05-28 10:22:14,306::task::850::TaskManager.Task::(_setError) Task=`708495b0-5532-45bd-86ad-8fc2aea15191`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1708, in copyImage
    volType, volFormat, preallocate, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 862, in copy
    raise se.CopyImageError(str(e))
CopyImageError: low level Image copy failed: ('General Storage Exception: (\'rc: 1, err: ["qemu-img: Could not open \\\'/rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/bf11ae0e-0cfc-47b6-8de9-73ebef1b64f2/b9327626-47e6-4648-b662-ee4087a7f080\\\'"]\',)',)


This was done with another host and another NFS storage server.
Comment 11 Federico Simoncelli 2013-05-28 05:10:34 EDT
It looks like this is related to RAW volumes and qemu-img. I've been able to reproduce the issue manually:

# su -s /bin/bash vdsm
$ mkdir /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62
$ truncate -s 3221225472 /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89
$ chmod 664 /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89
$ hexdump -n 512 -C /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/2a371c42-d376-4d38-b634-d1b4d333010d/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200
$ hexdump -n 512 -C /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200
$ strace /usr/bin/qemu-img convert -t none -f raw /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/2a371c42-d376-4d38-b634-d1b4d333010d/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89 -O raw /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89
...
open("/rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89", O_RDWR|O_DIRECT|O_CLOEXEC) = 5
fstatfs(5, {f_type="NFS_SUPER_MAGIC", f_bsize=8192, f_blocks=129144408, f_bfree=13020403, f_bavail=6461051, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=8192}) = 0
lseek(5, 0, SEEK_END)                   = -1 EINVAL (Invalid argument)
write(2, "qemu-img: ", 10qemu-img: )              = 10
write(2, "Could not open '/rhev/data-cente"..., 189Could not open '/rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89') = 189

From lseek(2):
  EINVAL whence is not valid.  Or: the resulting file offset would be negative, or beyond the end of a seekable device.

I think we can move this bug to qemu-kvm. Technically we could use "dd" for raw copies, but it doesn't handle sparse files (not until bug 908980 is closed). I also suggest sticking to one tool.
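For reference, a dd-style raw copy would need explicit hole detection to stay sparse. A minimal sketch of such a sparse-preserving copy using Linux SEEK_DATA/SEEK_HOLE (a hypothetical illustration, not vdsm's actual code; the function name and chunk size are made up):

```python
import os

def sparse_copy(src, dst, chunk_size=1 << 20):
    """Copy a raw file while preserving holes, using Linux
    SEEK_DATA/SEEK_HOLE to skip unallocated ranges.
    Hypothetical sketch; not the vdsm implementation."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        size = os.fstat(fin.fileno()).st_size
        fout.truncate(size)  # set final size; skipped ranges stay as holes
        offset = 0
        while offset < size:
            try:
                # find the next extent that actually contains data
                data = os.lseek(fin.fileno(), offset, os.SEEK_DATA)
            except OSError:  # ENXIO: only a trailing hole remains
                break
            hole = os.lseek(fin.fileno(), data, os.SEEK_HOLE)
            fin.seek(data)
            fout.seek(data)
            remaining = hole - data
            while remaining > 0:
                chunk = fin.read(min(remaining, chunk_size))
                if not chunk:
                    break
                fout.write(chunk)
                remaining -= len(chunk)
            offset = hole
        return size
```

On filesystems without SEEK_DATA support the kernel's generic fallback reports the whole file as one data extent, so the copy is still correct, just not sparse.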
Comment 12 Federico Simoncelli 2013-05-28 05:15:11 EDT
I forgot to mention that this is specific to NFS, the same qemu-img command is successful on a regular file-system, so we can't exclude issues in the NFS stack (client/server).
Comment 13 Federico Simoncelli 2013-05-28 05:56:12 EDT
This is probably related to nfs version 2:

$ grep nfs_mount_options /etc/vdsm/vdsm.conf
nfs_mount_options = soft,nosharecache,vers=2

$ mount | grep elad5
orion.qa.lab.tlv.redhat.com:/export/elad5 on /rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com:_export_elad5 type nfs (rw,soft,nosharecache,vers=2,addr=10.35.64.10)

$ truncate -s 1999M test1.raw
$ /tmp/test_lseek test1.raw
lseek test1.raw = 2096103424 (errno = 0)

$ truncate -s 2G test1.raw
$ /tmp/test_lseek test1.raw
lseek test1.raw = -1 (errno = 22)

On nfs version 3:

$ truncate -s 2G test1.raw
$ /tmp/test_lseek test1.raw
lseek test1.raw = 2147483648 (errno = 0)
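The source of the /tmp/test_lseek helper is not attached to the bug. A minimal Python equivalent can be sketched, assuming it opens the file with the same O_RDONLY|O_DIRECT flags seen in the strace in comment 11 (the `direct` parameter here is purely illustrative):

```python
import os
import sys

def probe_end(path, direct=True):
    """Open `path` (with O_DIRECT, as qemu-img's '-t none' mode does)
    and seek to the end; return (offset, errno), offset -1 on failure.
    Hypothetical reconstruction of the /tmp/test_lseek helper above."""
    flags = os.O_RDONLY | (os.O_DIRECT if direct else 0)
    fd = os.open(path, flags)
    try:
        return os.lseek(fd, 0, os.SEEK_END), 0
    except OSError as e:
        return -1, e.errno
    finally:
        os.close(fd)

if __name__ == "__main__":
    off, err = probe_end(sys.argv[1])
    print("lseek %s = %d (errno = %d)" % (sys.argv[1], off, err))
```

On a local filesystem this prints the file size with errno 0; on the NFSv2 mount above it returns -1 with errno 22 (EINVAL) once the file reaches 2 GB.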
Comment 14 Federico Simoncelli 2013-05-28 10:55:04 EDT
NOTABUG: on NFS v2 the file size limit is 2 GB.
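The numbers in comment 13 line up with a signed 32-bit offset limit (assuming the Linux NFSv2 client caps file offsets at 2^31, which is consistent with the observed EINVAL):

```python
# NFSv2 file offsets effectively fit in a signed 32-bit value (assumption
# about the Linux NFSv2 client, matching the behavior seen in comment 13)
NFS2_LIMIT = 2**31                     # 2 GB

assert 1999 * 1024**2 == 2096103424    # the 1999M file from comment 13
assert 1999 * 1024**2 < NFS2_LIMIT     # below the limit: lseek succeeded
assert 2 * 1024**3 == 2147483648       # the 2G file
assert 2 * 1024**3 >= NFS2_LIMIT       # at the limit: lseek returned EINVAL
```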
Comment 15 Elad 2013-05-28 11:04:11 EDT
Closing this as NOTABUG due to the NFSv2 file size limitation.
