Bug 1627032 - Upload disk fail trying to write 0 bytes after the end of the LV
Summary: Upload disk fail trying to write 0 bytes after the end of the LV
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: ---
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ovirt-4.2.7
: ---
Assignee: Daniel Erez
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks: 1635830
TreeView+ depends on / blocked
 
Reported: 2018-09-10 10:21 UTC by Jiri Belka
Modified: 2018-11-02 14:37 UTC (History)
11 users (show)

Fixed In Version: ovirt-engine-4.2.7.3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1635830 (view as bug list)
Environment:
Last Closed: 2018-11-02 14:37:21 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3125531 0 None None None 2018-10-02 14:42:38 UTC
oVirt gerrit 94636 0 'None' MERGED webadmin: upload image - use actual size for transfer size 2020-02-17 18:32:56 UTC
oVirt gerrit 94664 0 'None' MERGED webadmin: upload image - use actual size for transfer size 2020-02-17 18:32:56 UTC

Description Jiri Belka 2018-09-10 10:21:07 UTC
Description of problem:

i cannot upload a qcow2 disk, test in upload disk dialog works but the upload is changed to Paused by System after some time.

- daemon.log

...
2018-09-10 11:55:41,631 INFO    (Thread-17) [images] [10.37.136.71] WRITE size=0 offset=943718400 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:41,954 INFO    (Thread-18) [images] [10.37.136.71] WRITE size=0 offset=1048576000 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:42,327 INFO    (Thread-19) [images] [10.37.136.71] WRITE size=0 offset=1153433600 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:42,328 ERROR   (Thread-19) [web] ERROR [10.37.136.71] PUT /images/9eabe166-374c-4100-ad3a-a9e1f314c07b [500] [Errno 22] Invalid argument [request=0.000938, operation=0.000242, sync=0.000014]
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/web.py", line 99, in __call__
    resp = self.dispatch(request, clock)
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/web.py", line 128, in dispatch
    return method(*match.groups())
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 258, in put
    ticket.run(op)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/tickets.py", line 107, in run
    operation.run()
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/directio.py", line 81, in run
    self._run()
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/directio.py", line 176, in _run
    remaining = self._seek_before_first_block(dst)
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/directio.py", line 195, in _seek_before_first_block
    dst.seek(self._offset)
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/directio.py", line 445, in seek
    return self._fio.seek(pos, how)
IOError: [Errno 22] Invalid argument
2018-09-10 11:55:46,267 INFO    (Thread-21) [images] [10.37.136.71] WRITE size=0 offset=1153433600 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
...

- proxy log

(Thread-15 ) DEBUG 2018-09-10 12:14:56,922 connectionpool:383:requests.packages.urllib3.connectionpool:(_make_request) "PUT /images/3
286e69d-f25d-49e8-b288-f22a3701a2d6 HTTP/1.1" 500 189
(Thread-15 ) DEBUG 2018-09-10 12:14:56,923 images:234:root:(make_imaged_request) Incoming headers from host:
  content-length: 189
  content-type: application/json; charset=UTF-8
  date: Mon, 10 Sep 2018 10:14:56 GMT
  server: WSGIServer/0.1 Python/2.7.5
(Thread-15 ) ERROR 2018-09-10 12:14:56,923 web:136:web:(log_error) ERROR [10.34.130.244] PUT /images/3286e69d-f25d-49e8-b288-f22a3701
a2d6 [500] Failed response from host: 500 {"explanation": "The server has either erred or is incapable of performing the requested op
eration.", "code": 500, "detail": "[Errno 22] Invalid argument", "title": "Internal Server Error"} [request=0.012144]
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/web.py", line 99, in __call__
    resp = self.dispatch(request, clock)
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/web.py", line 128, in dispatch
    return method(*match.groups())
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/http_helper.py", line 86, in wrapper
    return func(self, *args)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/http_helper.py", line 59, in wrapper
    ret = func(self, *args)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/images.py", line 153, in put
    read_timeout=self.config.imaged_read_timeout_sec)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/images.py", line 239, in make_imaged_request
    raise exc.status_map[imaged_resp.status_code](s)
HTTPInternalServerError: Failed response from host: 500 {"explanation": "The server has either erred or is incapable of performing th
e requested operation.", "code": 500, "detail": "[Errno 22] Invalid argument", "title": "Internal Server Error"}

Version-Release number of selected component (if applicable):
ovirt-imageio-daemon-1.4.4-0.el7ev.noarch

How reproducible:
100% 

Steps to Reproduce:
1. try to upload qcow2 disk
2.
3.

Actual results:
after some time the upload status is switched to Paused by System

Expected results:
should work

Additional info:

Comment 2 Jiri Belka 2018-09-10 10:35:40 UTC
I was using uncompressed image from  https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1804_02.qcow2.xz

Comment 3 Nir Soffer 2018-09-10 12:15:25 UTC
## The image

$ ls -l CentOS-7-x86_64-GenericCloud-1804_02.qcow2 
-rw-rw-r--. 1 nsoffer nsoffer 935067648 May 19 04:35 CentOS-7-x86_64-GenericCloud-1804_02.qcow2

$ qemu-img info CentOS-7-x86_64-GenericCloud-1804_02.qcow2 
image: CentOS-7-x86_64-GenericCloud-1804_02.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 882M
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16

(Note that disk size is the allocated size, not the file size, which is the value
relevant for upload)


## vdsm log

1. Creating volume of size 8589934592 (8 GiB) (expected)

2018-09-10 11:54:52,493+0200 INFO  (jsonrpc/7) [vdsm.api] START createVolume(sdUUID=u'94884ba9-9f37-4009-8b70-2bc11a97afbe', spUUID=u'7030d3e7-38de-43ae-adad-358190e4e99d', imgUUID=u'b1e28231-47a3-4ee3-aabb-6d9bbadfa995', size=u'8589934592', volFormat=4, preallocate=2, diskType=u'DATA', volUUID=u'ebaaa479-0ac0-42d8-8324-b1833fca0b17', desc=u'{"DiskAlias":"CentOS-7-x86_64-GenericCloud-1804_02.qcow2","DiskDescription":"CentOS-7-x86_64-GenericCloud-1804_02.qcow2"}', srcImgUUID=u'00000000-0000-0000-0000-000000000000', srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=u'935067648') from=::ffff:10.37.136.71,58508, flow_id=b3cf0397-3c89-41d1-9c7b-760b0511a897, task_id=d0190393-3971-41fa-a7e3-08920739e8bf (api:46)

The disk is using volFormat=4, preallocate=2. These cryptic numbers mean:

COW_FORMAT = 4
SPARSE_VOL = 2

So the disk was created as thin provisioned disk using qcow2 format (expected).

2. Creating LV for the volume

2018-09-10 11:54:52,811+0200 INFO  (tasks/2) [storage.LVM] Creating LV (vg=94884ba9-9f37-4009-8b70-2bc11a97afbe, lv=ebaaa479-0ac0-42d8-8324-b1833fca0b17, size=981m, activate=True, contiguous=False, initialTags=('OVIRT_VOL_INITIALIZING',)) (lvm:1139)
2018-09-10 11:54:53,016+0200 WARN  (tasks/2) [root] File: /rhev/data-center/mnt/blockSD/94884ba9-9f37-4009-8b70-2bc11a97afbe/images/b1e28231-47a3-4ee3-aabb-6d9bbadfa995/ebaaa479-0ac0-42d8-8324-b1833fca0b17 already removed (fileutils:51)

LVM rounds up the volume size to the next extent 1024 MiB (extent size = 128 MiB).

So we have 1G LV (expected).


## daemon log

1. Adding ticket

2018-09-10 11:55:08,178 INFO    (Thread-1) [tickets] [local] ADD ticket={u'uuid': u'9eabe166-374c-4100-ad3a-a9e1f314c07b', u'ops': [u'write'], u'url': u'file:///rhev/data-center/mnt/blockSD/
94884ba9-9f37-4009-8b70-2bc11a97afbe/images/b1e28231-47a3-4ee3-aabb-6d9bbadfa995/ebaaa479-0ac0-42d8-8324-b1833fca0b17', u'sparse': False, u'timeout': 300, u'size': 8589934592}

First issue - ticket size 8589934592 (8 GiB) is wrong. Since we upload 891.75 MiB
image, we expect to have size = 1073741824 (lv size).

2. Then we see normal upload flow

2018-09-10 11:55:10,797 INFO    (Thread-3) [images] [10.37.136.71] OPTIONS ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:11,190 INFO    (Thread-4) [images] [10.37.136.71] WRITE size=104857600 offset=0 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:13,706 INFO    (Thread-6) [images] [10.37.136.71] WRITE size=104857600 offset=104857600 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:17,511 INFO    (Thread-8) [images] [10.37.136.71] WRITE size=104857600 offset=209715200 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:20,931 INFO    (Thread-9) [images] [10.37.136.71] WRITE size=104857600 offset=314572800 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:23,865 INFO    (Thread-11) [images] [10.37.136.71] WRITE size=104857600 offset=419430400 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:27,290 INFO    (Thread-12) [images] [10.37.136.71] WRITE size=104857600 offset=524288000 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:31,204 INFO    (Thread-13) [images] [10.37.136.71] WRITE size=104857600 offset=629145600 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:34,548 INFO    (Thread-15) [images] [10.37.136.71] WRITE size=104857600 offset=734003200 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b

Last successful request - write 96206848 at offset 838860800:

2018-09-10 11:55:37,623 INFO    (Thread-16) [images] [10.37.136.71] WRITE size=96206848 offset=838860800 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b

Total transfer size is now 935067648 - all data was uploaded successfully
(image size 935067648).

At this point the uploading program should have stopped, since all data was
uploaded. But we see:

2018-09-10 11:55:41,631 INFO    (Thread-17) [images] [10.37.136.71] WRITE size=0 offset=943718400 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b

This is write of 0 bytes to offset 943718400, this request does not make sense.
Looks like the uploading program is trying to upload 8 GiB of data, instead 
of 891.75 MiB of data, and it does not check correctly for EOF, so when the
program get "" empty read singnalling end of the file, it continues to send
the "read" data instead of stopping.

Then we see more 0 bytes writes:

2018-09-10 11:55:41,954 INFO    (Thread-18) [images] [10.37.136.71] WRITE size=0 offset=1048576000 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b
2018-09-10 11:55:42,327 INFO    (Thread-19) [images] [10.37.136.71] WRITE size=0 offset=1153433600 flush=True ticket=9eabe166-374c-4100-ad3a-a9e1f314c07b

The last write seems to fail trying to seek to offset 1153433600 which
is after the end of the logical volume.

2018-09-10 11:55:42,328 ERROR   (Thread-19) [web] ERROR [10.37.136.71] PUT /images/9eabe166-374c-4100-ad3a-a9e1f314c07b [500] [Errno 22] Invalid argument [request=0.000938, operation=0.000242, sync=0.000014]
...
  File "/usr/lib64/python2.7/site-packages/ovirt_imageio_common/directio.py", line 445, in seek
    return self._fio.seek(pos, how)
IOError: [Errno 22] Invalid argument

So at the daemon, everything looks expected.

We can improve error handling by rejecting zero writes, we can consider this for
4.3.


## Proxy log

In the proxy, we do't have enough info for this tranfer since proxy is missing
proper info logs in 1.4.4. This is fixed in
https://gerrit.ovirt.org/c/93961/ and will be available 4.2.7.

However since you tried the upload multiple times and change log level to DEBUG
(thanks!), we see the same pattern in the next upload:

$ grep 'Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6' var/log/ovirt-imageio-proxy/image-proxy.log 
(Thread-4  ) DEBUG 2018-09-10 12:14:29,202 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-5  ) DEBUG 2018-09-10 12:14:31,980 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-6  ) DEBUG 2018-09-10 12:14:35,167 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-7  ) DEBUG 2018-09-10 12:14:38,054 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-8  ) DEBUG 2018-09-10 12:14:41,183 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-9  ) DEBUG 2018-09-10 12:14:44,147 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-10 ) DEBUG 2018-09-10 12:14:47,539 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-11 ) DEBUG 2018-09-10 12:14:50,396 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 104857600 bytes to host
(Thread-12 ) DEBUG 2018-09-10 12:14:53,477 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 96206848 bytes to host
(Thread-13 ) DEBUG 2018-09-10 12:14:56,149 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host
(Thread-14 ) DEBUG 2018-09-10 12:14:56,496 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host
(Thread-15 ) DEBUG 2018-09-10 12:14:56,911 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host
(Thread-16 ) DEBUG 2018-09-10 12:15:00,406 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host
(Thread-17 ) DEBUG 2018-09-10 12:15:03,932 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host
(Thread-18 ) DEBUG 2018-09-10 12:15:07,863 images:149:root:(put) Resource 3286e69d-f25d-49e8-b288-f22a3701a2d6: transferring 0 bytes to host

The proxy is just delivering the bad requests from the client.

We can improve error handling by rejecting zero writes, we can consider this for
4.3.


## engine log

1. Creating disk

2018-09-10 12:14:18,925+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (default task-38) [fc0c40e6-37cc-487f-811f-4179928c8909] START, CreateImageVDSCommand( CreateImageVDSCommandParameters:{storagePoolId='7030d3e7-38de-43ae-adad-358190e4e99d', ignoreFailoverLimit='false', storageDomainId='94884ba9-9f37-4009-8b70-2bc11a97afbe', imageGroupId='b0c4ac42-15be-434d-a27f-c9469ed7068e', imageSizeInBytes='8589934592', volumeFormat='COW', newImageId='cf35eac7-495f-4736-9390-bde53642bc1b', imageType='Sparse', newImageDescription='{"DiskAlias":"CentOS-7-x86_64-GenericCloud-1804_02.qcow2","DiskDescription":"CentOS-7-x86_64-GenericCloud-1804_02.qcow2"}', imageInitialSizeInBytes='935067648'}), log id: 205a083d

Parameters looks correct, matching what we see in vdsm log.

2. Creating transfer for this volume

2018-09-10 12:14:25,551+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-100) [fc0c40e6-37cc-487f-811f-4179928c8909] Successfully added Upload disk 'CentOS-7-x86_64-GenericCloud-1804_02.qcow2' (disk id: 'b0c4ac42-15be-434d-a27f-c9469ed7068e', image id: 'cf35eac7-495f-4736-9390-bde53642bc1b') for image transfer command '3a33c16c-f812-4100-8f04-958625d4d705'

3. Adding ticket for this transfer

2018-09-10 12:14:26,288+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-100) [fc0c40e6-37cc-487f-811f-4179928c8909] START, AddImageTicketVDSCommand(HostName = slot-6c, AddImageTicketVDSCommandParameters:{hostId='b91e1530-b1f5-4b7f-a8fc-260da478ebf2', ticketId='3286e69d-f25d-49e8-b288-f22a3701a2d6', timeout='300', operations='[write]', size='8589934592', url='file:///rhev/data-center/mnt/blockSD/94884ba9-9f37-4009-8b70-2bc11a97afbe/images/b0c4ac42-15be-434d-a27f-c9469ed7068e/cf35eac7-495f-4736-9390-bde53642bc1b', filename='null', sparse='false'}), log id: 2c490127

Here we have the first error:

    size='8589934592'

The size should be the actual size of the disk (1073741824).
But this is probably not the root cause.


## Summary

- There is no bug in daemon or proxy.
- Engine created wrong size in ticket, but it is probably not the root cause.

If you performed this upload from he UI, please move the bug to ovirt-engine.

If you performed this upload using the SDK, your upload script is incorrect
and you can close this bug as NOTABUG.

You can open other bugs for the wrong ticket size, and for rejecting 0 bytes writes
in the daemon and proxy.

Comment 4 Jiri Belka 2018-09-10 12:37:23 UTC
moving to ovirt-engine, this was tried via web UI.

Comment 5 Nir Soffer 2018-09-10 12:52:37 UTC
Jiri,

- Can you add the basic info about engine version tested?
- Which was the last version that passed this test?

Comment 6 Jiri Belka 2018-09-10 13:42:38 UTC
(In reply to Nir Soffer from comment #5)
> Jiri,
> 
> - Can you add the basic info about engine version tested?

ovirt-engine-webadmin-portal-4.2.6.4-0.1.el7ev.noarch
ovirt-engine-4.2.6.4-0.1.el7ev.noarch

> - Which was the last version that passed this test?

no idea, i just tried it after long time

Comment 7 Radek Duda 2018-09-13 12:23:43 UTC
I hit that too. Uploading image file is Paused in the middle of process. In logs are the same errors as reported here. 

rhv 4.2.6.4-0.1.el7ev

Comment 8 Red Hat Bugzilla Rules Engine 2018-09-13 13:07:45 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 15 Avihai 2018-10-08 12:40:02 UTC
Verified on upstream build 4.2.7.3-0.0.master.20181007094930.git02c1da0.el7

Comment 16 Raz Tamir 2018-10-18 11:49:38 UTC
QE verification bot: the bug was verified upstream

Comment 17 Sandro Bonazzola 2018-11-02 14:37:21 UTC
This bugzilla is included in oVirt 4.2.7 release, published on November 2nd 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.