Description of problem:

I cannot upload a qcow2 disk. The test in the upload disk dialog works, but the upload is changed to "Paused by System" after some time.

- daemon.log

...
2018-09-10 11:55:41,631 INFO (Thread-17) [images] [10.37.136.71] WRITE size=0 offset=943718400 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:41,954 INFO (Thread-18) [images] [10.37.136.71] WRITE size=0 offset=1048576000 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:42,327 INFO (Thread-19) [images] [10.37.136.71] WRITE size=0 offset=1153433600 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:42,328 ERROR (Thread-19) [web] ERROR [10.37.136.71] PUT /images/9eabe166-374c-4100-ad3a-a9e1f314c07b [500] [Errno 22] Invalid argument [request=0.000938, operation=0.000242, sync=0.000014]
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/web.py", line 99, in __call__
    resp = self.dispatch(request, clock)
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/web.py", line 128, in dispatch
    return method(*match.groups())
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 258, in put
    ticket.run(op)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/tickets.py", line 107, in run
    operation.run()
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/directio.py", line 81, in run
    self._run()
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/directio.py", line 176, in _run
    remaining = self._seek_before_first_block(dst)
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/directio.py", line 195, in _seek_before_first_block
    dst.seek(self._offset)
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/directio.py", line 445, in seek
    return self._fio.seek(pos, how)
IOError: [Errno 22] Invalid argument
2018-09-10 11:55:46,267 INFO (Thread-21) [images] [10.37.136.71] WRITE size=0 offset=1153433600 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
...

- proxy log

(Thread-15 ) DEBUG 2018-09-10 12:14:56,922 connectionpool:383:requests.packages.urllib3.connectionpool:(_make_request) "PUT /images/3286e69d-f25d-49e8-b288-f22a3701a2d6 HTTP/1.1" 500 189
(Thread-15 ) DEBUG 2018-09-10 12:14:56,923 images:234:root:(make_imaged_request) Incoming headers from host:
content-length: 189
content-type: application/json; charset=UTF-8
date: Mon, 10 Sep 2018 10:14:56 GMT
server: WSGIServer/0.1 Python/2.7.5
(Thread-15 ) ERROR 2018-09-10 12:14:56,923 web:136:web:(log_error) ERROR [10.34.130.244] PUT /images/3286e69d-f25d-49e8-b288-f22a3701a2d6 [500] Failed response from host: 500 {"explanation": "The server has either erred or is incapable of performing the requested operation.", "code": 500, "detail": "[Errno 22] Invalid argument", "title": "Internal Server Error"} [request=0.012144]
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/web.py", line 99, in __call__
    resp = self.dispatch(request, clock)
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/web.py", line 128, in dispatch
    return method(*match.groups())
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/http_helper.py", line 86, in wrapper
    return func(self, *args)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/http_helper.py", line 59, in wrapper
    ret = func(self, *args)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/images.py", line 153, in put
    read_timeout=self.config.imaged_read_timeout_sec)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/images.py", line 239, in make_imaged_request
    raise exc.status_map[imaged_resp.status_code](s)
HTTPInternalServerError: Failed response from host: 500 {"explanation": "The server has either erred or is incapable of performing the requested operation.", "code": 500, "detail": "[Errno 22] Invalid argument", "title": "Internal Server Error"}

Version-Release number of selected component (if applicable):
ovirt-imageio-daemon-1.4.4-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Try to upload a qcow2 disk.

Actual results:
After some time the upload status is switched to "Paused by System".

Expected results:
The upload should work.

Additional info:
I was using the uncompressed image from https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1804_02.qcow2.xz
## The image

$ ls -l CentOS-7-x86_64-GenericCloud-1804_02.qcow2
-rw-rw-r--. 1 nsoffer nsoffer 935067648 May 19 04:35 CentOS-7-x86_64-GenericCloud-1804_02.qcow2

$ qemu-img info CentOS-7-x86_64-GenericCloud-1804_02.qcow2
image: CentOS-7-x86_64-GenericCloud-1804_02.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 882M
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16

(Note that "disk size" is the allocated size, not the file size; the file size is the value relevant for upload.)

## vdsm log

1. Creating volume of size 8589934592 (8 GiB) (expected)

2018-09-10 11:54:52,493+0200 INFO (jsonrpc/7) [vdsm.api] START createVolume(sdUUID=u'94884ba9-9f37-4009-8b70-2bc11a97afbe', spUUID=u'7030d3e7-38de-43ae-adad-358190e4e99d', imgUUID=u'b1e28231-47a3-4ee3-aabb-6d9bbadfa995', size=u'8589934592', volFormat=4, preallocate=2, diskType=u'DATA', volUUID=u'ebaaa479-0ac0-42d8-8324-b1833fca0b17', desc=u'{"DiskAlias":"CentOS-7-x86_64-GenericCloud-1804_02.qcow2","DiskDescription":"CentOS-7-x86_64-GenericCloud-1804_02.qcow2"}', srcImgUUID=u'00000000-0000-0000-0000-000000000000', srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=u'935067648') from=::ffff:10.37.136.71,58508, flow_id=b3cf0397-3c89-41d1-9c7b-760b0511a897, task_id=d0190393-3971-41fa-a7e3-08920739e8bf (api:46)

The disk is using volFormat=4, preallocate=2. These cryptic numbers mean:

COW_FORMAT = 4
SPARSE_VOL = 2

So the disk was created as a thin provisioned disk using qcow2 format (expected).
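The relevant numbers above can be double-checked with a little arithmetic (a quick sketch; the byte values are taken from the ls and qemu-img output above):

```python
# Sizes reported above, in bytes.
FILE_SIZE = 935067648       # size of the qcow2 file on disk (ls -l)
VIRTUAL_SIZE = 8589934592   # guest-visible size (qemu-img "virtual size")

MiB = 1024 ** 2
GiB = 1024 ** 3

# The client has to upload the qcow2 file itself, not the virtual disk.
upload_mib = FILE_SIZE / MiB
virtual_gib = VIRTUAL_SIZE / GiB

print(upload_mib)    # 891.75 - the amount of data the client must upload
print(virtual_gib)   # 8.0    - the size the guest will see
```

This is where the 891.75 MiB figure used throughout the analysis comes from.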
2. Creating LV for the volume

2018-09-10 11:54:52,811+0200 INFO (tasks/2) [storage.LVM] Creating LV (vg=94884ba9-9f37-4009-8b70-2bc11a97afbe, lv=ebaaa479-0ac0-42d8-8324-b1833fca0b17, size=981m, activate=True, contiguous=False, initialTags=('OVIRT_VOL_INITIALIZING',)) (lvm:1139)
2018-09-10 11:54:53,016+0200 WARN (tasks/2) [root] File: /rhev/data-center/mnt/blockSD/94884ba9-9f37-4009-8b70-2bc11a97afbe/images/b1e28231-47a3-4ee3-aabb-6d9bbadfa995/ebaaa479-0ac0-42d8-8324-b1833fca0b17 already removed (fileutils:51)

LVM rounds up the volume size to the next extent, 1024 MiB (extent size = 128 MiB). So we have a 1 GiB LV (expected).

## daemon log

1. Adding ticket

2018-09-10 11:55:08,178 INFO (Thread-1) [tickets] [local] ADD ticket={u'uuid': u'9eabe166-374c-4100-ad3a-a9e1f314c07b', u'ops': [u'write'], u'url': u'file:///rhev/data-center/mnt/blockSD/94884ba9-9f37-4009-8b70-2bc11a97afbe/images/b1e28231-47a3-4ee3-aabb-6d9bbadfa995/ebaaa479-0ac0-42d8-8324-b1833fca0b17', u'sparse': False, u'timeout': 300, u'size': 8589934592}

First issue - the ticket size 8589934592 (8 GiB) is wrong. Since we upload an 891.75 MiB image, we expect size = 1073741824 (the LV size).
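The size computation above can be sketched in a few lines. The extent rounding is stated in the text; the ~10% qcow2 metadata overhead factor is an assumption inferred from the 981m request for a 891.75 MiB image, not taken from the vdsm source:

```python
import math

MiB = 1024 ** 2

def round_up_to_extent(size_mib, extent_mib=128):
    """Round a size up to the next LVM extent boundary (extent size per the log)."""
    return math.ceil(size_mib / extent_mib) * extent_mib

initial_size = 935067648  # initialSize from the createVolume call, in bytes

# Assumed ~10% qcow2 metadata overhead -> the 981 MiB LV request seen in the log.
requested = math.ceil(initial_size * 1.1 / MiB)

# LVM rounds 981 MiB up to a whole number of 128 MiB extents -> 1024 MiB.
lv_size = round_up_to_extent(requested)

print(requested, lv_size)  # 981 1024
```

Note that lv_size * MiB is 1073741824 - the value the ticket size should have been.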
2. Then we see the normal upload flow

2018-09-10 11:55:10,797 INFO (Thread-3) [images] [10.37.136.71] OPTIONS ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:11,190 INFO (Thread-4) [images] [10.37.136.71] WRITE size=104857600 offset=0 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:13,706 INFO (Thread-6) [images] [10.37.136.71] WRITE size=104857600 offset=104857600 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:17,511 INFO (Thread-8) [images] [10.37.136.71] WRITE size=104857600 offset=209715200 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:20,931 INFO (Thread-9) [images] [10.37.136.71] WRITE size=104857600 offset=314572800 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:23,865 INFO (Thread-11) [images] [10.37.136.71] WRITE size=104857600 offset=419430400 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:27,290 INFO (Thread-12) [images] [10.37.136.71] WRITE size=104857600 offset=524288000 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:31,204 INFO (Thread-13) [images] [10.37.136.71] WRITE size=104857600 offset=629145600 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:34,548 INFO (Thread-15) [images] [10.37.136.71] WRITE size=104857600 offset=734003200 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b

Last successful request - write of 96206848 bytes at offset 838860800:

2018-09-10 11:55:37,623 INFO (Thread-16) [images] [10.37.136.71] WRITE size=96206848 offset=838860800 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b

The total transferred size is now 935067648 - all data was uploaded successfully (image size 935067648). At this point the uploading program should have stopped, since all data was uploaded.
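The chunking pattern in the log can be reproduced exactly (a sketch, not the actual client code; `chunk_schedule` is a hypothetical helper): 100 MiB (104857600 byte) requests over a 935067648 byte image end with a 96206848 byte write at offset 838860800, matching the last successful request.

```python
def chunk_schedule(total, chunk=104857600):
    """Yield (offset, size) pairs for a chunked upload of `total` bytes."""
    offset = 0
    while offset < total:
        size = min(chunk, total - offset)
        yield offset, size
        offset += size

schedule = list(chunk_schedule(935067648))
print(schedule[0])    # (0, 104857600)
print(schedule[-1])   # (838860800, 96206848) - matches the last successful WRITE
```

A correct schedule contains 9 writes and stops at the image size; there is no valid request beyond offset 838860800 + 96206848.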
But we see:

2018-09-10 11:55:41,631 INFO (Thread-17) [images] [10.37.136.71] WRITE size=0 offset=943718400 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b

This is a write of 0 bytes at offset 943718400 - this request does not make sense. It looks like the uploading program is trying to upload 8 GiB of data instead of 891.75 MiB, and it does not check correctly for EOF: when the program gets an empty read ("") signalling end of file, it continues to send the "read" data instead of stopping.

Then we see more 0 byte writes:

2018-09-10 11:55:41,954 INFO (Thread-18) [images] [10.37.136.71] WRITE size=0 offset=1048576000 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:42,327 INFO (Thread-19) [images] [10.37.136.71] WRITE size=0 offset=1153433600 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b

The last write fails trying to seek to offset 1153433600, which is past the end of the logical volume:

2018-09-10 11:55:42,328 ERROR (Thread-19) [web] ERROR [10.37.136.71] PUT /images/9eabe166-374c-4100-ad3a-a9e1f314c07b [500] [Errno 22] Invalid argument [request=0.000938, operation=0.000242, sync=0.000014]
...
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/directio.py", line 445, in seek
    return self._fio.seek(pos, how)
IOError: [Errno 22] Invalid argument

So on the daemon side, everything looks as expected. We can improve error handling by rejecting zero-byte writes; we can consider this for 4.3.

## Proxy log

In the proxy, we don't have enough info for this transfer since the proxy is missing proper info logs in 1.4.4. This is fixed in https://gerrit.ovirt.org/c/93961/ and will be available in 4.2.7.
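The suspected client bug can be illustrated with a minimal read loop (a sketch under the assumptions above; `upload` and `send_chunk` are hypothetical, not the actual client code). A correct loop stops on the first empty read, rather than iterating up to the virtual size:

```python
import io

def upload(src, send_chunk, chunk_size=104857600):
    """Read src in chunks and send each one; stop on the first empty read."""
    offset = 0
    while True:
        data = src.read(chunk_size)
        if not data:        # "" / b"" signals end of file - stop here,
            break           # never send a zero-byte write
        send_chunk(offset, data)
        offset += len(data)
    return offset

# Toy run: a 250 byte "image" uploaded in 100 byte chunks.
sent = []
total = upload(io.BytesIO(b"x" * 250),
               lambda off, data: sent.append((off, len(data))),
               chunk_size=100)
print(sent)   # [(0, 100), (100, 100), (200, 50)] - no zero-byte writes
```

The buggy behavior in the log corresponds to a loop that keeps iterating after EOF, sending whatever `read()` returned - including the empty string.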
However, since you tried the upload multiple times and changed the log level to DEBUG (thanks!), we see the same pattern in the next upload:

$ grep 'Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6' var/log/ovirt-imageio-proxy/image-proxy.log
(Thread-4 ) DEBUG 2018-09-10 12:14:29,202 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-5 ) DEBUG 2018-09-10 12:14:31,980 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-6 ) DEBUG 2018-09-10 12:14:35,167 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-7 ) DEBUG 2018-09-10 12:14:38,054 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-8 ) DEBUG 2018-09-10 12:14:41,183 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-9 ) DEBUG 2018-09-10 12:14:44,147 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-10 ) DEBUG 2018-09-10 12:14:47,539 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-11 ) DEBUG 2018-09-10 12:14:50,396 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-12 ) DEBUG 2018-09-10 12:14:53,477 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 96206848 bytes to host
(Thread-13 ) DEBUG 2018-09-10 12:14:56,149 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host
(Thread-14 ) DEBUG 2018-09-10 12:14:56,496 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host
(Thread-15 ) DEBUG 2018-09-10 12:14:56,911 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host
(Thread-16 ) DEBUG 2018-09-10 12:15:00,406 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host
(Thread-17 ) DEBUG 2018-09-10 12:15:03,932 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host
(Thread-18 ) DEBUG 2018-09-10 12:15:07,863 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host

The proxy is just delivering the bad requests from the client. We can improve error handling by rejecting zero-byte writes; we can consider this for 4.3.

## engine log

1. Creating disk

2018-09-10 12:14:18,925+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (default task-38) [fc0c40e6-37cc-487f-811f-4179928c8909] START, CreateImageVDSCommand( CreateImageVDSCommandParameters:{storagePoolId='7030d3e7-38de-43ae-adad-358190e4e99d', ignoreFailoverLimit='false', storageDomainId='94884ba9-9f37-4009-8b70-2bc11a97afbe', imageGroupId='b0c4ac42-15be-434d-a27f-c9469ed7068e', imageSizeInBytes='8589934592', volumeFormat='COW', newImageId='cf35eac7-495f-4736-9390-bde53642bc1b', imageType='Sparse', newImageDescription='{"DiskAlias":"CentOS-7-x86_64-GenericCloud-1804_02.qcow2","DiskDescription":"CentOS-7-x86_64-GenericCloud-1804_02.qcow2"}', imageInitialSizeInBytes='935067648'}), log id: 205a083d

The parameters look correct, matching what we see in the vdsm log.

2. Creating a transfer for this volume

2018-09-10 12:14:25,551+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-100) [fc0c40e6-37cc-487f-811f-4179928c8909] Successfully added Upload disk 'CentOS-7-x86_64-GenericCloud-1804_02.qcow2' (disk id: 'b0c4ac42-15be-434d-a27f-c9469ed7068e', image id: 'cf35eac7-495f-4736-9390-bde53642bc1b') for image transfer command '3a33c16c-f812-4100-8f04-958625d4d705'
3. Adding a ticket for this transfer

2018-09-10 12:14:26,288+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-100) [fc0c40e6-37cc-487f-811f-4179928c8909] START, AddImageTicketVDSCommand(HostName = slot-6c, AddImageTicketVDSCommandParameters:{hostId='b91e1530-b1f5-4b7f-a8fc-260da478ebf2', ticketId='3286e69d-f25d-49e8-b288-f22a3701a2d6', timeout='300', operations='[write]', size='8589934592', url='file:///rhev/data-center/mnt/blockSD/94884ba9-9f37-4009-8b70-2bc11a97afbe/images/b0c4ac42-15be-434d-a27f-c9469ed7068e/cf35eac7-495f-4736-9390-bde53642bc1b', filename='null', sparse='false'}), log id: 2c490127

Here we have the first error:

size='8589934592'

The size should be the actual size of the disk (1073741824). But this is probably not the root cause.

## Summary

- There is no bug in the daemon or proxy.
- Engine created the wrong size in the ticket, but it is probably not the root cause.

If you performed this upload from the UI, please move the bug to ovirt-engine. If you performed this upload using the SDK, your upload script is incorrect and you can close this bug as NOTABUG.

You can open other bugs for the wrong ticket size, and for rejecting 0 byte writes in the daemon and proxy.
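The suggested hardening - rejecting zero-byte writes in the daemon and proxy - could be a simple request check (a sketch of the idea only, not the actual ovirt-imageio code; `InvalidRequest` and `validate_write` are hypothetical names). With the correct ticket size (1073741824), the same check would also have caught the out-of-range seek with a 400 instead of a 500:

```python
class InvalidRequest(Exception):
    """Would map to an HTTP 400 response."""

def validate_write(size, offset, ticket_size):
    """Reject writes that cannot be valid: zero-length or past the ticket size."""
    if size <= 0:
        raise InvalidRequest("Invalid size: %d" % size)
    if offset + size > ticket_size:
        raise InvalidRequest(
            "Write of %d bytes at offset %d exceeds ticket size %d"
            % (size, offset, ticket_size))

# The failing requests from the log would be rejected up front:
for offset in (943718400, 1048576000, 1153433600):
    try:
        validate_write(0, offset, 1073741824)
    except InvalidRequest as e:
        print(e)  # Invalid size: 0
```

A valid write, e.g. `validate_write(104857600, 0, 1073741824)`, passes the check and proceeds as before.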
Moving to ovirt-engine; this was tried via the web UI.
Jiri,

- Can you add the basic info about the engine version tested?
- Which was the last version that passed this test?
(In reply to Nir Soffer from comment #5)
> Jiri,
>
> - Can you add the basic info about engine version tested?

ovirt-engine-webadmin-portal-4.2.6.4-0.1.el7ev.noarch
ovirt-engine-4.2.6.4-0.1.el7ev.noarch

> - Which was the last version that passed this test?

No idea, I just tried it after a long time.
I hit that too. Uploading an image file is paused in the middle of the process. The logs contain the same errors as reported here. rhv 4.2.6.4-0.1.el7ev
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Verified on upstream build 4.2.7.3-0.0.master.20181007094930.git02c1da0.el7
QE verification bot: the bug was verified upstream
This bugzilla is included in oVirt 4.2.7 release, published on November 2nd 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.