Created attachment 753535 [details]
logs

Description of problem:
copyImage is failing with 'low level Image copy failed'

Version-Release number of selected component (if applicable):
vdsm-4.10.2-21.0.el6ev.x86_64
rhevm-3.2.0-11.28.el6ev.noarch

How reproducible:
50%

Steps to Reproduce:
1. On an NFS pool:
2. Create a template, or export a VM/template to an export domain

Actual results:

6ce4c345-3ff8-4223-8c30-bc427909864c::ERROR::2013-05-27 12:23:04,925::task::850::TaskManager.Task::(_setError) Task=`6ce4c345-3ff8-4223-8c30-bc427909864c`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1708, in copyImage
    volType, volFormat, preallocate, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 862, in copy
    raise se.CopyImageError(str(e))
CopyImageError: low level Image copy failed: ('General Storage Exception: (\'rc: 1, err: ["qemu-img: Could not open \\\'/rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d\\\'", "qemu-img: Could not open \\\'/rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d\\\'"]\',)',)

Additional info: logs
Elad,

What is the impact of the failure?
Can it be retried later?
Is load a factor here? Is size a factor?
Does this reproduce only on NFS?
Did you try a different NFS server?
(In reply to Allon Mureinik from comment #3)

> What is the impact of the failure?
The user fails to create a template or to copy an image to an export domain.

> Can it be retried later?
In the case of copying an image to an export domain, there is no rollback and the user cannot delete the image from the domain.

> Is load a factor here? Is size a factor?
No.

> Does this reproduce only on NFS?
So far, yes.

> Did you try a different NFS server?
So far, I've managed to reproduce it only on one storage server.
(In reply to Elad from comment #4)
> (In reply to Allon Mureinik from comment #3)
>
> > What is the impact of the failure?
> The user fails to create a template or to copy an image to an export domain.
And if he tries again, does it fail again?

Also, please answer the following:

> > Can it be retried later?
> In the case of copying an image to an export domain, there is no rollback and the user cannot delete the image from the domain.
> > Is load a factor here? Is size a factor?
> No.
> > Does this reproduce only on NFS?
> So far, yes.
> > Did you try a different NFS server?
> So far, I've managed to reproduce it only on one storage server.
(In reply to Allon Mureinik from comment #5)
> And if he tries again, does it fail again?
>
> Also, please answer the following:
Disregard; because of BZ's color scheme, I did not notice you had answered inline.
Looking at the logs I couldn't spot anything unusual about the flow:

Thread-61280::INFO::2013-05-27 12:22:25,522::logUtils::40::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='69b4335c-0e57-4384-a173-6fef05302a20', spUUID='121b58e3-71e3-4138-a934-543b3ad0328b', imgUUID='8c6b8b29-8498-49c6-ad6f-65a27a3efbdb', size='3221225472', volFormat=5, preallocate=2, diskType=2, volUUID='332b9176-8d8a-4df6-93df-27b2d8d7040d', desc='', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000')

The volume was healthy up until the copyImage request:

Thread-61304::INFO::2013-05-27 12:22:56,844::logUtils::40::dispatcher::(wrapper) Run and protect: getVolumeInfo(sdUUID='69b4335c-0e57-4384-a173-6fef05302a20', spUUID='121b58e3-71e3-4138-a934-543b3ad0328b', imgUUID='8c6b8b29-8498-49c6-ad6f-65a27a3efbdb', volUUID='332b9176-8d8a-4df6-93df-27b2d8d7040d', options=None)
Thread-61304::INFO::2013-05-27 12:22:56,865::logUtils::42::dispatcher::(wrapper) Run and protect: getVolumeInfo, Return response: {'info': {'status': 'OK', 'domain': '69b4335c-0e57-4384-a173-6fef05302a20', 'voltype': 'LEAF', 'description': '', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'image': '8c6b8b29-8498-49c6-ad6f-65a27a3efbdb', 'ctime': '1369646547', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '1369646547', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'capacity': '3221225472', 'uuid': '332b9176-8d8a-4df6-93df-27b2d8d7040d', 'truesize': '0', 'type': 'SPARSE'}}
Thread-61304::DEBUG::2013-05-27 12:22:56,866::task::1168::TaskManager.Task::(prepare) Task=`cf8b8fe6-f74d-46f7-8cbd-1920cb1c3459`::finished: {'info': {'status': 'OK', 'domain': '69b4335c-0e57-4384-a173-6fef05302a20', 'voltype': 'LEAF', 'description': '', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'image': '8c6b8b29-8498-49c6-ad6f-65a27a3efbdb', 'ctime': '1369646547', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '1369646547', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'capacity': '3221225472', 'uuid': '332b9176-8d8a-4df6-93df-27b2d8d7040d', 'truesize': '0', 'type': 'SPARSE'}}
Thread-61305::INFO::2013-05-27 12:22:56,929::logUtils::40::dispatcher::(wrapper) Run and protect: copyImage(sdUUID='69b4335c-0e57-4384-a173-6fef05302a20', spUUID='121b58e3-71e3-4138-a934-543b3ad0328b', vmUUID='', srcImgUUID='8c6b8b29-8498-49c6-ad6f-65a27a3efbdb', srcVolUUID='332b9176-8d8a-4df6-93df-27b2d8d7040d', dstImgUUID='31ab876b-62e5-417b-9abd-c4317d23674a', dstVolUUID='8b405965-2bb8-4ef9-84b7-c2aecd06b97b', description='Active VM', dstSdUUID='69b4335c-0e57-4384-a173-6fef05302a20', volType=6, volFormat=5, preallocate=2, postZero='false', force='false')
...
6ce4c345-3ff8-4223-8c30-bc427909864c::INFO::2013-05-27 12:23:04,556::volume::709::Storage.Volume::(prepare) Volume: preparing volume 69b4335c-0e57-4384-a173-6fef05302a20/332b9176-8d8a-4df6-93df-27b2d8d7040d
6ce4c345-3ff8-4223-8c30-bc427909864c::INFO::2013-05-27 12:23:04,561::volume::709::Storage.Volume::(prepare) Volume: preparing volume 69b4335c-0e57-4384-a173-6fef05302a20/8b405965-2bb8-4ef9-84b7-c2aecd06b97b
6ce4c345-3ff8-4223-8c30-bc427909864c::DEBUG::2013-05-27 12:23:04,616::volume::1012::Storage.Volume::(qemuConvert) (qemuConvert): COPY /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d (raw) to /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/31ab876b-62e5-417b-9abd-c4317d23674a/8b405965-2bb8-4ef9-84b7-c2aecd06b97b (raw) START
6ce4c345-3ff8-4223-8c30-bc427909864c::DEBUG::2013-05-27 12:23:04,617::misc::83::Storage.Misc.excCmd::(<lambda>) '/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img convert -t none -f raw /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d -O raw /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/31ab876b-62e5-417b-9abd-c4317d23674a/8b405965-2bb8-4ef9-84b7-c2aecd06b97b' (cwd None)
...
6ce4c345-3ff8-4223-8c30-bc427909864c::DEBUG::2013-05-27 12:23:04,921::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = ["qemu-img: Could not open '/rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d'", "qemu-img: Could not open '/rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d'"]; <rc> = 1
6ce4c345-3ff8-4223-8c30-bc427909864c::DEBUG::2013-05-27 12:23:04,922::volume::1029::Storage.Volume::(qemuConvert) (qemuConvert): COPY /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/8c6b8b29-8498-49c6-ad6f-65a27a3efbdb/332b9176-8d8a-4df6-93df-27b2d8d7040d to /rhev/data-center/121b58e3-71e3-4138-a934-543b3ad0328b/69b4335c-0e57-4384-a173-6fef05302a20/images/31ab876b-62e5-417b-9abd-c4317d23674a/8b405965-2bb8-4ef9-84b7-c2aecd06b97b DONE
6ce4c345-3ff8-4223-8c30-bc427909864c::ERROR::2013-05-27 12:23:04,922::image::874::Storage.Image::(copy) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 862, in copy
    raise se.CopyImageError(str(e))
I managed to reproduce:

708495b0-5532-45bd-86ad-8fc2aea15191::ERROR::2013-05-28 10:22:14,306::task::850::TaskManager.Task::(_setError) Task=`708495b0-5532-45bd-86ad-8fc2aea15191`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1708, in copyImage
    volType, volFormat, preallocate, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 862, in copy
    raise se.CopyImageError(str(e))
CopyImageError: low level Image copy failed: ('General Storage Exception: (\'rc: 1, err: ["qemu-img: Could not open \\\'/rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/bf11ae0e-0cfc-47b6-8de9-73ebef1b64f2/b9327626-47e6-4648-b662-ee4087a7f080\\\'"]\',)',)

This was done with another host and another NFS storage server.
It looks like this is related to RAW volumes and qemu-img. I've been able to reproduce the issue manually:

# su -s /bin/bash vdsm
$ mkdir /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62
$ truncate -s 3221225472 /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89
$ chmod 664 /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89
$ hexdump -n 512 -C /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/2a371c42-d376-4d38-b634-d1b4d333010d/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200
$ hexdump -n 512 -C /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200
$ strace /usr/bin/qemu-img convert -t none -f raw /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/2a371c42-d376-4d38-b634-d1b4d333010d/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89 -O raw /rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89
...
open("/rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89", O_RDWR|O_DIRECT|O_CLOEXEC) = 5
fstatfs(5, {f_type="NFS_SUPER_MAGIC", f_bsize=8192, f_blocks=129144408, f_bfree=13020403, f_bavail=6461051, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=8192}) = 0
lseek(5, 0, SEEK_END) = -1 EINVAL (Invalid argument)
write(2, "qemu-img: ", 10qemu-img: ) = 10
write(2, "Could not open '/rhev/data-cente"..., 189Could not open '/rhev/data-center/67e767fb-c134-4bcd-916b-3ce09989ce31/9ab02025-d3bd-452a-9f74-1d2603707fb0/images/2c827c9e-0371-4287-9b8d-e61726253d62/ef6f9c77-788d-4df0-9a02-603e08605a89') = 189

From lseek(2):
  EINVAL: whence is not valid. Or: the resulting file offset would be negative, or beyond the end of a seekable device.

I think we can move this bug to qemu-kvm. Technically we could use "dd" for raw copies, but it doesn't handle sparse files (not until bug 908980 is closed). I also suggest sticking to one tool.
I forgot to mention that this is specific to NFS: the same qemu-img command succeeds on a regular file system, so we can't exclude issues in the NFS stack (client/server).
This is probably related to NFS version 2:

$ grep nfs_mount_options /etc/vdsm/vdsm.conf
nfs_mount_options = soft,nosharecache,vers=2

$ mount | grep elad5
orion.qa.lab.tlv.redhat.com:/export/elad5 on /rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com:_export_elad5 type nfs (rw,soft,nosharecache,vers=2,addr=10.35.64.10)

$ truncate -s 1999M test1.raw
$ /tmp/test_lseek test1.raw
lseek test1.raw = 2096103424 (errno = 0)

$ truncate -s 2G test1.raw
$ /tmp/test_lseek test1.raw
lseek test1.raw = -1 (errno = 22)

On NFS version 3:

$ truncate -s 2G test1.raw
$ /tmp/test_lseek test1.raw
lseek test1.raw = 2147483648 (errno = 0)
NOTABUG: on NFSv2 the file size limit is 2 GB.
Closing this as NOTABUG due to the NFSv2 file size limitation.
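For anyone hitting this: a possible workaround, assuming the storage server also exports NFS version 3 (as the successful vers=3 test in comment 13 suggests), is to change vdsm's NFS mount options in /etc/vdsm/vdsm.conf and re-mount the storage domains:

```
# /etc/vdsm/vdsm.conf
# Assumption: the NFS server supports version 3 exports.
nfs_mount_options = soft,nosharecache,vers=3
```

With vers=3 the client uses 64-bit file offsets, so the 2 GB lseek failure no longer applies.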