Bug 2053103 - oVirt engine takes 16 seconds to create a disk
Summary: oVirt engine takes 16 seconds to create a disk
Status: CLOSED DEFERRED
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: future
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Benny Zlotnik
QA Contact: Avihai
Blocks: 2039255
 
Reported: 2022-02-10 14:32 UTC by Richard W.M. Jones
Modified: 2022-05-24 15:20 UTC
CC List: 6 users

Last Closed: 2022-05-24 15:20:09 UTC
oVirt Team: Storage
Flags: pm-rhel: ovirt-4.5?


Attachments
Vdsm debug logs (16.86 KB, text/plain), 2022-02-10 16:36 UTC, Nir Soffer


Links
Red Hat Issue Tracker RHV-44679, last updated 2022-02-10 14:39:27 UTC

Description Richard W.M. Jones 2022-02-10 14:32:37 UTC
Description of problem:

One thing that causes virt-v2v to take a long time to do a conversion
is waiting for oVirt to create the target disk.  On my quite fast
local machine this seems to take around 16 seconds (per disk).

From the virt-v2v full log:

  00:00:41 python3 '/tmp/v2v.fvLq7Z/rhv-upload-transfer.py' '/tmp/v2v.fvLq7Z/params4.json'
  00:00:41 cannot read /etc/vdsm/vdsm.id, using any host: [Errno 2] No such file or directory: '/etc/vdsm/vdsm.id'
* 00:00:41 disk.id = 'eb30f110-75bb-4a6b-9f21-272392abeb60'
* 00:00:57 transfer.id = 'cde22cf4-7ab7-4e2c-86c8-b6d704987e4a'
  00:00:58 transfer output parsed as: {
  00:00:58   "": {
  00:00:58     "is_ovirt_host": false,
  00:00:58     "destination_url": "https://ovirt4410-host.home.annexia.org:54322/images/3c29aafb-1a90-4c6b-9372-117bdbea288e",
  00:00:58     "transfer_id": "cde22cf4-7ab7-4e2c-86c8-b6d704987e4a"
  00:00:58   }
  00:00:58 }

This corresponds to the following Python code:

https://github.com/libguestfs/virt-v2v/blob/18b11018d2c5411693d120add445031758eee5ce/output/rhv-upload-transfer.py#L275
https://github.com/libguestfs/virt-v2v/blob/18b11018d2c5411693d120add445031758eee5ce/output/rhv-upload-transfer.py#L132

More background here:

https://listman.redhat.com/archives/libguestfs/2022-February/msg00130.html

Version-Release number of selected component (if applicable):

Version 4.4.10.6-1.el8 

How reproducible:

100%

Steps to Reproduce:

It's tricky to reproduce this on its own, but I guess you could
write a small Python program using the API which would create a
disk and then poll until it reaches the DiskStatus.OK state.
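
A minimal sketch of such a reproducer, based on the python-ovirt-engine-sdk4
examples (the engine URL, credentials and storage domain name below are
placeholders, not taken from this report):

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Connection details are placeholders - adjust for your environment.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)

disks_service = connection.system_service().disks_service()

start = time.monotonic()

# Create a 6 GiB sparse raw disk on an NFS storage domain.
disk = disks_service.add(
    disk=types.Disk(
        name='timing-test',
        format=types.DiskFormat.RAW,
        provisioned_size=6 * 1024**3,
        sparse=True,
        storage_domains=[types.StorageDomain(name='alpine-nfs-01')],
    )
)

# Poll until the disk leaves the LOCKED state and becomes OK.
disk_service = disks_service.disk_service(disk.id)
while disk_service.get().status != types.DiskStatus.OK:
    time.sleep(1)

print('Disk %s ready after %.1f seconds' % (disk.id, time.monotonic() - start))

connection.close()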

Comment 1 Nir Soffer 2022-02-10 16:33:33 UTC
Yes, this is a known issue, and it is very easy to reproduce.

To reproduce, upload a disk using the upload_disk.py example script:
https://github.com/oVirt/python-ovirt-engine-sdk4/blob/main/examples/upload_disk.py

I filtered out the unimportant events from the output of the following
command so we can focus on the important ones:
- Creating disk
- Disk was created
- Create transfer
- Transfer ready
- Uploading image
- Finalizing transfer

$ for i in $(seq 10); do python3 /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py -c engine --sd-name alpine-nfs-01 --disk-format raw --disk-sparse /var/tmp/fedora-35.raw; done
...
[   0.0 ] Creating disk...
[   8.4 ] Disk ID: 6c49fce5-36a3-4cf4-9fac-1c70f32a8d82
[   8.4 ] Creating image transfer...
[  10.8 ] Transfer ID: 7881b9eb-1173-4e3d-8f84-68a246a44f34
[  10.8 ] Transfer host name: host4
[  10.8 ] Uploading image...
[ 100.00% ] 6.00 GiB, 1.57 seconds, 3.82 GiB/s                                 
[  12.3 ] Finalizing image transfer...
[  18.4 ] Upload completed successfully

...
[   0.0 ] Creating disk...
[   8.4 ] Disk ID: 1c255b43-8bef-4a02-b4a2-b8d9d0118a4a
[   8.4 ] Creating image transfer...
[  10.7 ] Transfer ID: 40780064-b9e3-4913-b1f7-d5f6083421a3
[  10.7 ] Transfer host name: host4
[  10.7 ] Uploading image...
[ 100.00% ] 6.00 GiB, 1.48 seconds, 4.06 GiB/s                                 
[  12.2 ] Finalizing image transfer...
[  18.3 ] Upload completed successfully

...
[   0.0 ] Creating disk...
[  16.6 ] Disk ID: 8e52232e-c833-4883-a897-7ab01fff3986
[  16.6 ] Creating image transfer...
[  18.9 ] Transfer ID: 725ac8a2-8f06-4bdc-8268-809b8d39f7a0
[  18.9 ] Transfer host name: host4
[  18.9 ] Uploading image...
[ 100.00% ] 6.00 GiB, 1.61 seconds, 3.72 GiB/s                                 
[  20.6 ] Finalizing image transfer...
[  25.6 ] Upload completed successfully

...
[   0.0 ] Creating disk...
[   4.4 ] Disk ID: 352648bf-2c64-436a-9151-8dcc3ce5dfcc
[   4.4 ] Creating image transfer...
[   6.7 ] Transfer ID: 8a2b8380-fda4-4159-8f26-eb37a6f25c87
[   6.7 ] Transfer host name: host4
[   6.7 ] Uploading image...
[ 100.00% ] 6.00 GiB, 1.36 seconds, 4.40 GiB/s                                 
[   8.0 ] Finalizing image transfer...
[  13.1 ] Upload completed successfully

...
[   0.0 ] Creating disk...
[  16.6 ] Disk ID: dc3751ba-568f-4c4d-9846-d528c3b00449
[  16.6 ] Creating image transfer...
[  18.9 ] Transfer ID: cc27587c-45bd-4404-a933-22469c6d43af
[  18.9 ] Transfer host name: host4
[  18.9 ] Uploading image...
[ 100.00% ] 6.00 GiB, 1.60 seconds, 3.74 GiB/s                                 
[  20.5 ] Finalizing image transfer...
[  25.5 ] Upload completed successfully

...
[   0.0 ] Creating disk...
[   8.5 ] Disk ID: 771dbd4c-247c-44de-9182-f9b01affc95b
[   8.5 ] Creating image transfer...
[  10.8 ] Transfer ID: 6e6f6143-87df-4944-8faf-ed310c103851
[  10.8 ] Transfer host name: host4
[  10.8 ] Uploading image...
[ 100.00% ] 6.00 GiB, 1.44 seconds, 4.17 GiB/s                                 
[  12.3 ] Finalizing image transfer...
[  17.3 ] Upload completed successfully

...
[   0.0 ] Creating disk...
[   8.4 ] Disk ID: 1f82c81e-0e69-4c58-abe1-c9a866a952e8
[   8.4 ] Creating image transfer...
[  10.7 ] Transfer ID: 7f1d6651-bad9-4c05-b7d8-ddb2a00930fe
[  10.7 ] Transfer host name: host4
[  10.7 ] Uploading image...
[ 100.00% ] 6.00 GiB, 1.36 seconds, 4.42 GiB/s                                 
[  12.0 ] Finalizing image transfer...
[  17.1 ] Upload completed successfully

...
[   0.0 ] Creating disk...
[  16.6 ] Disk ID: 0d048ceb-f6c5-4ec2-a996-19e2769ceb64
[  16.6 ] Creating image transfer...
[  18.9 ] Transfer ID: c3954d72-ffb9-48c2-8f83-e1e13a69121d
[  18.9 ] Transfer host name: host4
[  18.9 ] Uploading image...
[ 100.00% ] 6.00 GiB, 1.64 seconds, 3.67 GiB/s                                 
[  20.5 ] Finalizing image transfer...
[  25.6 ] Upload completed successfully

...
[   0.0 ] Creating disk...
[   7.5 ] Disk ID: 648d8a3c-44f8-44ce-9529-dae7c7a8144d
[   7.5 ] Creating image transfer...
[   9.8 ] Transfer ID: 9bbc8f5c-91cd-4101-a8ba-50a8702e2d85
[   9.8 ] Transfer host name: host4
[   9.8 ] Uploading image...
[ 100.00% ] 6.00 GiB, 1.54 seconds, 3.89 GiB/s                                 
[  11.3 ] Finalizing image transfer...
[  13.3 ] Upload completed successfully

...
[   0.0 ] Creating disk...
[  16.6 ] Disk ID: 3bc9a4f5-9b65-4890-9963-a6a6ca91351a
[  16.6 ] Creating image transfer...
[  19.0 ] Transfer ID: 53f1119a-d063-4a33-9491-146619c5a270
[  19.0 ] Transfer host name: host4
[  19.0 ] Uploading image...
[ 100.00% ] 6.00 GiB, 1.35 seconds, 4.43 GiB/s                                 
[  20.3 ] Finalizing image transfer...
[  22.4 ] Upload completed successfully

We can see that the longest operation is creating a disk, which takes
4.4 - 16.6 seconds.


Testing the same flow with block storage:

$ for i in $(seq 10); do python3 /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py -c engine --sd-name alpine-iscsi-01 --disk-format qcow2 --disk-sparse /var/tmp/fedora-35.raw; done
...
[   0.0 ] Creating disk...
[   8.6 ] Disk ID: d877984b-3388-4a25-a589-94e73f723c92
[   8.6 ] Creating image transfer...
[  11.2 ] Transfer ID: 9af12c60-f4e3-4468-b647-c8254c02066a
[  11.2 ] Transfer host name: host4
[  11.2 ] Uploading image...
[ 100.00% ] 6.00 GiB, 2.63 seconds, 2.28 GiB/s                                 
[  13.8 ] Finalizing image transfer...
[  18.8 ] Upload completed successfully
...
[   0.0 ] Creating disk...
[   8.5 ] Disk ID: 22b2c1cf-45cd-4a26-88e7-9235b16ecc03
[   8.5 ] Creating image transfer...
[  11.1 ] Transfer ID: fba7aff1-178d-489d-990d-79351b4bcd1c
[  11.1 ] Transfer host name: host4
[  11.1 ] Uploading image...
[ 100.00% ] 6.00 GiB, 2.38 seconds, 2.52 GiB/s                                 
[  13.5 ] Finalizing image transfer...
[  18.5 ] Upload completed successfully
...
[   0.0 ] Creating disk...
[   8.4 ] Disk ID: a709f044-0357-45db-a515-e96afebd8adc
[   8.4 ] Creating image transfer...
[  11.1 ] Transfer ID: c68cf405-f4f1-44fc-b707-870f95d45cf9
[  11.1 ] Transfer host name: host4
[  11.1 ] Uploading image...
[ 100.00% ] 6.00 GiB, 2.37 seconds, 2.53 GiB/s                                 
[  13.5 ] Finalizing image transfer...
[  18.5 ] Upload completed successfully
...
[   0.0 ] Creating disk...
[  16.6 ] Disk ID: fc235199-dbd5-435f-812a-a264689ada3c
[  16.6 ] Creating image transfer...
[  19.1 ] Transfer ID: e935af69-0b96-471c-ad64-644cd35e32ff
[  19.1 ] Transfer host name: host4
[  19.1 ] Uploading image...
[ 100.00% ] 6.00 GiB, 2.34 seconds, 2.56 GiB/s                                 
[  21.4 ] Finalizing image transfer...
[  26.5 ] Upload completed successfully
...
[   0.0 ] Creating disk...
[  16.5 ] Disk ID: 373ddbd2-cc9f-4829-bbad-dd14335a3058
[  16.5 ] Creating image transfer...
[  19.2 ] Transfer ID: a510198d-727e-4017-970e-dbd50c77f68f
[  19.2 ] Transfer host name: host4
[  19.2 ] Uploading image...
[ 100.00% ] 6.00 GiB, 2.44 seconds, 2.46 GiB/s                                 
[  21.6 ] Finalizing image transfer...
[  26.7 ] Upload completed successfully
...
[   0.0 ] Creating disk...
[   8.4 ] Disk ID: 335c9a35-f1a8-4ebc-83b1-0ab9aa3515de
[   8.4 ] Creating image transfer...
[  11.0 ] Transfer ID: f1b716b9-4810-4f8a-84f3-feed9ecf5588
[  11.0 ] Transfer host name: host4
[  11.0 ] Uploading image...
[ 100.00% ] 6.00 GiB, 2.53 seconds, 2.37 GiB/s                                 
[  13.5 ] Finalizing image transfer...
[  18.6 ] Upload completed successfully
...
[   0.0 ] Creating disk...
[   8.5 ] Disk ID: 85d9ea60-b696-44e5-a34f-2512dea4dd22
[   8.5 ] Creating image transfer...
[  11.1 ] Transfer ID: 7e4f2383-dd70-4815-a607-e54b4200d847
[  11.1 ] Transfer host name: host4
[  11.1 ] Uploading image...
[ 100.00% ] 6.00 GiB, 2.52 seconds, 2.38 GiB/s                                 
[  13.6 ] Finalizing image transfer...
[  18.7 ] Upload completed successfully
...
[   0.0 ] Creating disk...
[   9.5 ] Disk ID: a0be1c04-d203-4f66-bc7e-5c9c34726297
[   9.5 ] Creating image transfer...
[  12.1 ] Transfer ID: e7ed5f15-c1e9-46b5-9f25-3a13dd408908
[  12.1 ] Transfer host name: host4
[  12.1 ] Uploading image...
[ 100.00% ] 6.00 GiB, 2.41 seconds, 2.49 GiB/s                                 
[  14.5 ] Finalizing image transfer...
[  19.5 ] Upload completed successfully
...
[   0.0 ] Creating disk...
[   8.4 ] Disk ID: ca19bf6d-65d4-47bb-8875-32145d56bd22
[   8.4 ] Creating image transfer...
[  11.1 ] Transfer ID: 459dc6f9-9727-4184-ab82-97dba009a71b
[  11.1 ] Transfer host name: host4
[  11.1 ] Uploading image...
[ 100.00% ] 6.00 GiB, 2.49 seconds, 2.41 GiB/s                                 
[  13.6 ] Finalizing image transfer...
[  18.7 ] Upload completed successfully
...
[   0.0 ] Creating disk...
[  16.6 ] Disk ID: c23486fd-0bad-4dc4-8709-713839416d49
[  16.6 ] Creating image transfer...
[  19.2 ] Transfer ID: 44628ed1-3a85-4282-8a2a-88b2281b1e1a
[  19.2 ] Transfer host name: host4
[  19.2 ] Uploading image...
[ 100.00% ] 6.00 GiB, 2.54 seconds, 2.36 GiB/s                                 
[  21.8 ] Finalizing image transfer...
[  25.8 ] Upload completed successfully

Creating a disk took 8.4-16.6 seconds.


If we look at the vdsm logs for the NFS disk creation that took 16.6 seconds:

1. API call

2022-02-10 17:22:03,055+0200 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer] Calling 'Volume.create' in bridge with {'volumeID': '9ca89fa6-ec7d-4adc-a0c8-28cd41014b60', 'storagepoolID': '3eb2b49f-dfdd-4807-90b9-4984ade85b3a', 'storagedomainID': 'e9467633-ee31-4e15-b3f8-3812b374c764', 'imageID': '0d048ceb-f6c5-4ec2-a996-19e2769ceb64', 'size': '6442450944', 'volFormat': 5, 'preallocate': 2, 'diskType': 'DATA', 'desc': '{"DiskAlias":"fedora-35.raw","DiskDescription":"Uploaded disk"}', 'srcImgUUID': '00000000-0000-0000-0000-000000000000', 'srcVolUUID': '00000000-0000-0000-0000-000000000000', 'addBitmaps': False} (__init__:333)

2. Work scheduled to task d4b7f25c-135f-43cc-b352-5115cbeab975

2022-02-10 17:22:03,056+0200 INFO  (jsonrpc/7) [vdsm.api] START createVolume(sdUUID='e9467633-ee31-4e15-b3f8-3812b374c764', spUUID='3eb2b49f-dfdd-4807-90b9-4984ade85b3a', imgUUID='0d048ceb-f6c5-4ec2-a996-19e2769ceb64', size='6442450944', volFormat=5, preallocate=2, diskType='DATA', volUUID='9ca89fa6-ec7d-4adc-a0c8-28cd41014b60', desc='{"DiskAlias":"fedora-35.raw","DiskDescription":"Uploaded disk"}', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize=None, addBitmaps=False, legal=True, sequence=0) from=::ffff:192.168.122.10,48140, flow_id=55c8d258-0f5d-4441-a214-67e680869990, task_id=d4b7f25c-135f-43cc-b352-5115cbeab975 (api:48)

3. Task d4b7f25c-135f-43cc-b352-5115cbeab975 started to run on worker tasks/4

2022-02-10 17:22:03,187+0200 DEBUG (tasks/7) [storage.taskmanager.task] (Task='d4b7f25c-135f-43cc-b352-5115cbeab975') Task.run: running job 0: createVolume: <bound method StoragePool.createVolume of <vdsm.storage.sp.StoragePool object at 0x7f9d006cfcf8>> (args: ('e9467633-ee31-4e15-b3f8-3812b374c764', '0d048ceb-f6c5-4ec2-a996-19e2769ceb64', 6442450944, 5, 2, 'DATA', '9ca89fa6-ec7d-4adc-a0c8-28cd41014b60', '{"DiskAlias":"fedora-35.raw","DiskDescription":"Uploaded disk"}', '00000000-0000-0000-0000-000000000000', '00000000-0000-0000-0000-000000000000', None, False, True, 0) kwargs: {}) (task:916)

4. Task d4b7f25c-135f-43cc-b352-5115cbeab975 finished

2022-02-10 17:22:03,490+0200 DEBUG (tasks/7) [storage.taskmanager.task] (Task='d4b7f25c-135f-43cc-b352-5115cbeab975') Task.run: exit - success: result {'uuid': '9ca89fa6-ec7d-4adc-a0c8-28cd41014b60'} (task:929)

5. Engine polls task status

2022-02-10 17:22:11,090+0200 INFO  (jsonrpc/4) [vdsm.api] FINISH getAllTasksStatuses return={'allTasksStatus': {'d4b7f25c-135f-43cc-b352-5115cbeab975': {'taskID': 'd4b7f25c-135f-43cc-b352-5115cbeab975', 'taskState': 'finished', 'taskResult': 'success', 'code': 0, 'message': '1 jobs completed successfully'}}} from=::ffff:192.168.122.10,48130, task_id=f5259469-6b11-4685-93d1-93fdb4c3c0cf (api:54)

6. Engine queries volume info

2022-02-10 17:22:17,483+0200 INFO  (jsonrpc/7) [vdsm.api] START getVolumeInfo(sdUUID='e9467633-ee31-4e15-b3f8-3812b374c764', spUUID='3eb2b49f-dfdd-4807-90b9-4984ade85b3a', imgUUID='0d048ceb-f6c5-4ec2-a996-19e2769ceb64', volUUID='9ca89fa6-ec7d-4adc-a0c8-28cd41014b60') from=::ffff:192.168.122.10,48130, flow_id=55c8d258-0f5d-4441-a214-67e680869990, task_id=cb93dd91-6fa6-4d56-aa1c-7550625bbca4 (api:48)

7. Volume info complete

2022-02-10 17:22:17,549+0200 DEBUG (jsonrpc/7) [storage.taskmanager.task] (Task='cb93dd91-6fa6-4d56-aa1c-7550625bbca4') finished: {'info': {'uuid': '9ca89fa6-ec7d-4adc-a0c8-28cd41014b60', 'type': 'SPARSE', 'format': 'RAW', 'disktype': 'DATA', 'voltype': 'LEAF', 'capacity': '6442450944', 'parent': '00000000-0000-0000-0000-000000000000', 'description': '{"DiskAlias":"fedora-35.raw","DiskDescription":"Uploaded disk"}', 'pool': '', 'domain': 'e9467633-ee31-4e15-b3f8-3812b374c764', 'image': '0d048ceb-f6c5-4ec2-a996-19e2769ceb64', 'ctime': '1644506523', 'mtime': '0', 'legality': 'LEGAL', 'generation': 0, 'sequence': 0, 'apparentsize': '6442450944', 'truesize': '4096', 'status': 'OK', 'lease': {'path': '/rhev/data-center/mnt/alpine:_01/e9467633-ee31-4e15-b3f8-3812b374c764/images/0d048ceb-f6c5-4ec2-a996-19e2769ceb64/9ca89fa6-ec7d-4adc-a0c8-28cd41014b60.lease', 'offset': 0, 'owners': [], 'version': None}, 'children': []}} (task:1202)

8. Engine queries volume info again (bug?)

2022-02-10 17:22:18,595+0200 INFO  (jsonrpc/3) [vdsm.api] START getVolumeInfo(sdUUID='e9467633-ee31-4e15-b3f8-3812b374c764', spUUID='3eb2b49f-dfdd-4807-90b9-4984ade85b3a', imgUUID='0d048ceb-f6c5-4ec2-a996-19e2769ceb64', volUUID='9ca89fa6-ec7d-4adc-a0c8-28cd41014b60') from=::ffff:192.168.122.10,48130, flow_id=55c8d258-0f5d-4441-a214-67e680869990, task_id=773dc71d-43f4-453a-bea0-20c0ab571c6d (api:48)

9. Second volume info completed

2022-02-10 17:22:18,626+0200 DEBUG (jsonrpc/3) [storage.taskmanager.task] (Task='773dc71d-43f4-453a-bea0-20c0ab571c6d') finished: {'info': {'uuid': '9ca89fa6-ec7d-4adc-a0c8-28cd41014b60', 'type': 'SPARSE', 'format': 'RAW', 'disktype': 'DATA', 'voltype': 'LEAF', 'capacity': '6442450944', 'parent': '00000000-0000-0000-0000-000000000000', 'description': '{"DiskAlias":"fedora-35.raw","DiskDescription":"Uploaded disk"}', 'pool': '', 'domain': 'e9467633-ee31-4e15-b3f8-3812b374c764', 'image': '0d048ceb-f6c5-4ec2-a996-19e2769ceb64', 'ctime': '1644506523', 'mtime': '0', 'legality': 'LEGAL', 'generation': 0, 'sequence': 0, 'apparentsize': '6442450944', 'truesize': '4096', 'status': 'OK', 'lease': {'path': '/rhev/data-center/mnt/alpine:_01/e9467633-ee31-4e15-b3f8-3812b374c764/images/0d048ceb-f6c5-4ec2-a996-19e2769ceb64/9ca89fa6-ec7d-4adc-a0c8-28cd41014b60.lease', 'offset': 0, 'owners': [], 'version': None}, 'children': []}} (task:1202)


The events with relative time since the API call:

[  0.000 ] API call
[  0.001 ] Work scheduled to task d4b7f25c-135f-43cc-b352-5115cbeab975
[  0.132 ] Task d4b7f25c-135f-43cc-b352-5115cbeab975 started to run on worker tasks/4
[  0.435 ] Task d4b7f25c-135f-43cc-b352-5115cbeab975 finished
[  8.450 ] Engine polls task status
[ 14.428 ] Engine queries volume info
[ 14.494 ] Volume info complete
[ 15.540 ] Engine queries volume info again (bug?)
[ 15.571 ] Second volume info completed


Checking creating a disk on block storage:

1. API call

2022-02-10 18:20:14,917+0200 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] Calling 'Volume.create' in bridge with {'volumeID': 'f0ebf314-f71c-465e-8f9b-d860dc986c95', 'storagepoolID': '3eb2b49f-dfdd-4807-90b9-4984ade85b3a', 'storagedomainID': 'aecec81f-d464-4a35-9a91-6acf2ca4938c', 'imageID': 'c23486fd-0bad-4dc4-8709-713839416d49', 'size': '6442450944', 'volFormat': 4, 'preallocate': 2, 'diskType': 'DATA', 'desc': '{"DiskAlias":"fedora-35.qcow2","DiskDescription":"Uploaded disk"}', 'srcImgUUID': '00000000-0000-0000-0000-000000000000', 'srcVolUUID': '00000000-0000-0000-0000-000000000000', 'initialSize': '918945792', 'addBitmaps': False} (__init__:333)

2. Task completed

2022-02-10 18:20:15,810+0200 INFO  (tasks/9) [storage.threadpool.workerthread] FINISH task d32df63b-263f-44bc-be6e-d65d558758e4 (threadPool:148)

Disk was created in 0.893 seconds.


So we have 3 issues:
- Engine polled the disk status only after 8.45 seconds; it should poll
  much more often.
- After engine found that the disk was created, it waited 6 seconds
  before querying the volume info.
- Engine queried the volume info twice (bug).

With better polling in engine, we could discover that the disk was
created after one second and query the volume info once, so the user
could get a response in 1-2 seconds.

Polling the SPM every second for disk creation may be too much.
Ideally, vdsm would post events when tasks are completed, so engine does
not have to poll every second. The jobs module in vdsm
(lib/vdsm/jobs.py) already posts events when storage jobs are completed.

Note that even if we can improve this flow to complete in 1 second,
engine clients have no way to get events using the API. To make
this really efficient, we need to add a mechanism for getting events.
This can be done with a websocket or using long polling, for example:
https://etcd.io/docs/v2.3/api/#waiting-for-a-change
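
For reference, this is what the etcd v2 "wait" mechanism linked above looks
like from a client using the requests library: the GET blocks until the
watched key changes, so the client learns about the change without tight
polling. This only illustrates the long-polling pattern; the oVirt API has
no such endpoint today, and the URL and key are etcd examples:

import requests

# Blocks until the key changes (long polling); timeout=None disables the
# client-side timeout so the request can wait as long as needed.
response = requests.get(
    'http://127.0.0.1:2379/v2/keys/tasks/example-task',
    params={'wait': 'true'},
    timeout=None,
)
print(response.json())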

Comment 2 Nir Soffer 2022-02-10 16:36:56 UTC
Created attachment 1860375 [details]
Vdsm debug logs

Attached vdsm debug log from the uploads mentioned in comment 1.

See the disk-*.log and task-*.log files for detailed info.

Comment 3 Arik 2022-02-14 15:23:29 UTC
8-16 seconds is not that much compared to the typical time it takes to upload disks in the real world.
That said, changing the frequency of async-task polling should be possible, but it is a bit risky, so we need to put more thought into such a change.

About getting the volume info twice - this really shouldn't happen; we need to take a closer look at the logs.

Comment 4 Benny Zlotnik 2022-02-14 17:18:10 UTC
Regarding the redundant getVolumeInfo calls, it looks like there is one call coming from AddDiskCallback, which updates the size, and AddImageFromScratchCommand#endSuccessfully also makes this call. I'll see if the logic from the callback can be moved there instead.

As for the polling, while it is not an optimal solution, we have the AsyncTaskPollingRate config value, which controls the polling rate of vdsm tasks and defaults to 10 seconds; it can be tweaked if needed until/unless we have a better mechanism implemented.
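
For completeness, a config value like this would normally be inspected and
changed on the engine host with the engine-config tool, followed by an
engine restart. The value 3 below is only an illustration, not a
recommendation from this bug:

$ engine-config -g AsyncTaskPollingRate
$ engine-config -s AsyncTaskPollingRate=3
$ systemctl restart ovirt-engine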

Comment 5 Nir Soffer 2022-02-14 19:31:36 UTC
This is not only the polling - we have 3 issues:
1. Engine detects that the task completed late (slow polling)
2. Engine sends the first getVolumeInfo 6 seconds after it found
   that the task was completed - instead of immediately
3. The second getVolumeInfo call

We cannot change polling to 1 second, but we can use adaptive polling,
for example, wait 1, 1.44, 2, 2.88, 4, ...
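
A minimal sketch of such an adaptive schedule (illustrative only - this is
not how the engine's polling loop is actually implemented; the helper and
its usage below are hypothetical):

import time

def wait_with_backoff(check_done, first=1.0, factor=2 ** 0.5, maximum=10.0):
    # Call check_done() with increasing delays of 1, ~1.4, 2, ~2.8, 4, ...
    # seconds, capped at `maximum`, until it reports the task finished.
    delay = first
    while not check_done():
        time.sleep(delay)
        delay = min(delay * factor, maximum)

# Hypothetical usage - task_finished() would wrap a task status query:
# wait_with_backoff(lambda: task_finished(task_id))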

The real solution is to use events on the vdsm side - like we do in 
storage jobs, and consume the events for detecting task completion 
instead of polling. I'm not sure such a change is possible now, but 
this is how the system should work if we want efficient async tasks.

Comment 6 Arik 2022-02-15 08:47:06 UTC
(In reply to Nir Soffer from comment #5)
> We cannot change polling to 1 second, but we can use adaptive polling,
> for example, wait 1, 1.44, 2, 2.88, 4, ...

As far as I remember, there's no polling thread per task/job but a global one that goes over a collection of tasks/jobs so it's a bit more complicated to change its frequency based on the duration of a task/job
 
> The real solution is to sue events on the vdsm side - like we do in 
> storage jobs, and consume the events for detecting task completion 
> instead of polling. I'm not sure such change is possible now, but 
> this how the system should work if we want efficient async tasks.

Yeah, but events came in later on and the problem with events in general is that you need to support "late joiners" - in this case, if the engine is down when the task/job is completed (and thus wasn't able to retrieve the event), you need it to be able to figure out the state of the task/job when it starts

Comment 7 Arik 2022-02-15 08:54:03 UTC
(In reply to Benny Zlotnik from comment #4)
> Regarding the redundant getVolumeInfo calls, look like there is one call
> coming from AddDiskCallback which updates the size, and
> AddImageFromScratchCommand#endSuccessfully also makes this call. I'll see if
> the logic from the callback can be moved there instead.

+1
That would be nice but I guess the impact won't be that significant so let's keep it in the backlog and get back to it after 4.5

Comment 9 Arik 2022-05-24 15:20:09 UTC
Moved to GitHub: https://github.com/oVirt/ovirt-engine/issues/391

