Bug 1527050 - [v2v] - ImageIO - Performance - upload disk is slow on a pure 10G environment
Summary: [v2v] - ImageIO - Performance - upload disk is slow on a pure 10G environment
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-imageio
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.2.5
Assignee: Nir Soffer
QA Contact: guy chen
URL:
Whiteboard:
Depends On: 1529293
Blocks:
 
Reported: 2017-12-18 12:41 UTC by Avihai
Modified: 2018-07-31 15:30 UTC (History)
9 users

Fixed In Version: ovirt-imageio-daemon-1.4.0
Clone Of:
: 1591439 (view as bug list)
Environment:
Last Closed: 2018-07-31 15:30:18 UTC
oVirt Team: Scale
Embargoed:
rule-engine: ovirt-4.2+
ylavi: exception+


Attachments
engine, vdsm, image_proxy, image_demon logs (1.09 MB, application/x-gzip)
2017-12-18 12:41 UTC, Avihai


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1591439 0 high CLOSED [RFE] [v2v] - imageio performance - concurrent I/O 2021-02-22 00:41:40 UTC
oVirt gerrit 85567 0 master MERGED directio: Log time we waiting for I/O 2020-06-25 09:19:28 UTC
oVirt gerrit 85720 0 master MERGED Add util.Clock helper 2020-06-25 09:19:27 UTC
oVirt gerrit 92126 0 master MERGED config: Increase buffer size to 8 MiB 2020-06-25 09:19:27 UTC

Internal Links: 1591439

Description Avihai 2017-12-18 12:41:23 UTC
Created attachment 1369430 [details]
engine, vdsm, image_proxy, image_demon logs

Description of problem:
Uploading a disk on a pure 10G environment (all ports, including management, are 10GbE) is much slower (~88 MBps) than non-upload write I/O (e.g. with dd), which reaches ~280-320 MBps.

Version-Release number of selected component (if applicable):
4.2.0.2-0.1.el7

How reproducible:
100%

Setup details:
The VDSM host and the script client are the same machine:
b01-h19-r620.rhev.openstack.engineering.redhat.com

Engine = b01-h21-r620.rhev.openstack.engineering.redhat.com

Storage = NetApp exposing an iSCSI LUN at 172.16.249.62:3260

Steps to Reproduce:
First, use an all-10G environment, or you will not see this issue!

1. Start the upload script (using direct or indirect mode).
2. Measure the time to completion - it is shown in the script output.


Actual results:
- With the indirect script, I get ~50 MBps.
- With the direct mode script, I get ~88 MBps.

Expected results:
Utilize a more reasonable rate for 10GbE, closer to at least ~280 MBps.

Additional info:

- Running sg_dd from the VDSM host to storage yields 397 MBps.
(root@b01-h19-r620) - (09:30) - (/dev/shm)
-=>>sg_dd if=/dev/shm/10G_file_test of=/dev/dm-57 verbose=1 time=1 bs=1M count=10240 conv=notrunc oflag=direct
 >> Input file type: other (perhaps ordinary file) 
        open input, flags=0x0
 >> Output file type: block device 
        create output, flags=0x4041
time to transfer data: 27.044832 secs at 397.02 MB/sec
10240+0 records in
10240+0 records out

- Running the direct script yields 88 MBps:
-=>>./upload_script_direct_new.py --direct /dev/shm/10G_file_test
Checking image...
Connecting...
Creating disk...
Creating transfer session...
Uploading image...
Uploaded 8.57%
Uploaded 17.32%
Uploaded 25.95%
Uploaded 34.67%
Uploaded 43.39%
Uploaded 51.98%
Uploaded 60.73%
Uploaded 69.33%
Uploaded 77.86%
Uploaded 86.53%
Uploaded 95.19%
Uploaded 10.00g in 115.74 seconds (88.47m/s)
Finalizing transfer session...
Upload completed successfully

- Running the indirect script yields ~50 MBps:
-=>> ./newer_upload_disk.py
Connecting...
Creating disk...
Creating transfer session...
Uploading image...
Uploaded 48.97%
Uploaded 95.65%
Uploaded 1.00g in 20.99 seconds (48.79m/s)
Finalizing transfer session...
Upload completed successfully

Engine shows some WARNING errors; this is probably not related to the performance issue, but here goes:
2017-12-18 12:30:14,735Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, RemoveImageTicketVDSCommand, return: StatusOnlyReturn [status=Status [code=0, message=Done]], log id: 15305769
2017-12-18 12:30:14,735Z INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Successfully stopped image transfer session for ticket 'fcf98359-0372-4669-8787-310272d3b928'
2017-12-18 12:30:14,785Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VerifyUntrustedVolumeVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, VerifyUntrustedVolumeVDSCommand(HostName = h1, ImageActionsVDSCommandParameters:{hostId='e2d2f21d-54dc-4d35-970f-1ecaa3db633c'}), log id: 422de811
2017-12-18 12:30:15,069Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VerifyUntrustedVolumeVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, VerifyUntrustedVolumeVDSCommand, return: StatusReturn:{status='Status [code=0, message=Done]'}, log id: 422de811
2017-12-18 12:30:15,104Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeLegalityVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, SetVolumeLegalityVDSCommand( SetVolumeLegalityVDSCommandParameters:{storagePoolId='a418b8ac-77bf-4e3d-9377-d993c9af2c25', ignoreFailoverLimit='false', storageDomainId='1304bd0d-8089-4420-9dc2-8da5b6bae1be', imageGroupId='553f8108-917c-484d-b7ab-61e7445ac37d', imageId='607a1f3c-9a27-4e64-ae01-68372e62017a'}), log id: 23a89456
2017-12-18 12:30:15,401Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeLegalityVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, SetVolumeLegalityVDSCommand, log id: 23a89456
2017-12-18 12:30:15,437Z INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Updating image transfer d162f24f-eb06-4aa6-9037-96420b877037 (image 553f8108-917c-484d-b7ab-61e7445ac37d) phase to Finished Success
2017-12-18 12:30:15,474Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.TeardownImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, TeardownImageVDSCommand(HostName = h1, ImageActionsVDSCommandParameters:{hostId='e2d2f21d-54dc-4d35-970f-1ecaa3db633c'}), log id: 7a59344d
2017-12-18 12:30:15,939Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.TeardownImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, TeardownImageVDSCommand, return: StatusReturn:{status='Status [code=0, message=Done]'}, log id: 7a59344d
2017-12-18 12:30:25,980Z INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-51) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Transfer was successful. Upload disk '10G_file_test' (id '00000000-0000-0000-0000-000000000000')
2017-12-18 12:30:26,989Z INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Updating image transfer d162f24f-eb06-4aa6-9037-96420b877037 (image 553f8108-917c-484d-b7ab-61e7445ac37d) phase to Finished Success
2017-12-18 12:30:27,005Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, GetImageInfoVDSCommand( GetImageInfoVDSCommandParameters:{storagePoolId='a418b8ac-77bf-4e3d-9377-d993c9af2c25', ignoreFailoverLimit='false', storageDomainId='1304bd0d-8089-4420-9dc2-8da5b6bae1be', imageGroupId='553f8108-917c-484d-b7ab-61e7445ac37d', imageId='607a1f3c-9a27-4e64-ae01-68372e62017a'}), log id: 5710a60c
2017-12-18 12:30:27,006Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, GetVolumeInfoVDSCommand(HostName = h1, GetVolumeInfoVDSCommandParameters:{hostId='e2d2f21d-54dc-4d35-970f-1ecaa3db633c', storagePoolId='a418b8ac-77bf-4e3d-9377-d993c9af2c25', storageDomainId='1304bd0d-8089-4420-9dc2-8da5b6bae1be', imageGroupId='553f8108-917c-484d-b7ab-61e7445ac37d', imageId='607a1f3c-9a27-4e64-ae01-68372e62017a'}), log id: d891b67
2017-12-18 12:30:27,517Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, GetVolumeInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@b6d7cb56, log id: d891b67
2017-12-18 12:30:27,517Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@b6d7cb56, log id: 5710a60c
2017-12-18 12:30:27,536Z INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Successfully transferred disk '00000000-0000-0000-0000-000000000000' (command id 'd162f24f-eb06-4aa6-9037-96420b877037')
2017-12-18 12:30:27,540Z INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand' successfully.
2017-12-18 12:30:27,544Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, GetImageInfoVDSCommand( GetImageInfoVDSCommandParameters:{storagePoolId='a418b8ac-77bf-4e3d-9377-d993c9af2c25', ignoreFailoverLimit='false', storageDomainId='1304bd0d-8089-4420-9dc2-8da5b6bae1be', imageGroupId='553f8108-917c-484d-b7ab-61e7445ac37d', imageId='607a1f3c-9a27-4e64-ae01-68372e62017a'}), log id: a7c61fd
2017-12-18 12:30:27,545Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, GetVolumeInfoVDSCommand(HostName = h1, GetVolumeInfoVDSCommandParameters:{hostId='e2d2f21d-54dc-4d35-970f-1ecaa3db633c', storagePoolId='a418b8ac-77bf-4e3d-9377-d993c9af2c25', storageDomainId='1304bd0d-8089-4420-9dc2-8da5b6bae1be', imageGroupId='553f8108-917c-484d-b7ab-61e7445ac37d', imageId='607a1f3c-9a27-4e64-ae01-68372e62017a'}), log id: 9cddad9
2017-12-18 12:30:27,828Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, GetVolumeInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@b6d7cb56, log id: 9cddad9
2017-12-18 12:30:27,828Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@b6d7cb56, log id: a7c61fd
2017-12-18 12:30:27,847Z WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Trying to release a shared lock for key: '553f8108-917c-484d-b7ab-61e7445ac37dDISK' , but lock does not exist
2017-12-18 12:30:27,848Z INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[553f8108-917c-484d-b7ab-61e7445ac37d=DISK]'}'
2017-12-18 12:30:27,871Z INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] EVENT_ID: TRANSFER_IMAGE_SUCCEEDED(1,032), Image Upload with disk 10G_file_test succeeded.


The VDSM log shows some errors (not sure if they are related):

2017-12-18 12:27:03,528+0000 ERROR (periodic/0) [virt.periodic.Operation] <vdsm.virt.sampling.HostMonitor object at 0x3167490> operation failed (periodic:215)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 213, in __call__
    self._func()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 578, in __call__
    stats = hostapi.get_stats(self._cif, self._samples.stats())
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 72, in get_stats
    ret.update(cif.mom.getKsmStats())
  File "/usr/lib/python2.7/site-packages/vdsm/momIF.py", line 75, in getKsmStats
    stats = self._mom.getStatistics()['host']
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1306, in single_request
    return self.parse_response(response)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1482, in parse_response
    return u.close()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 794, in close
    raise Fault(**self._stack[0])
Fault: <Fault 1: "<type 'exceptions.AttributeError'>:'NoneType' object has no attribute 'statistics'">

2017-12-18 12:28:19,569+0000 WARN  (tasks/8) [root] File: /rhev/data-center/mnt/blockSD/1304bd0d-8089-4420-9dc2-8da5b6bae1be/images/553f8108-917c-484d-b7ab-61e7445ac37d/607a1f3c-9a27-4e64-ae01-68372e62017a already removed (fileutils:51)
2017-12-18 12:28:19,571+0000 INFO  (tasks/8) [storage.Volume] Request to create RAW volume /rhev/data-center/mnt/blockSD/1304bd0d-8089-4420-9dc2-8da5b6bae1be/images/553f8108-917c-484d-b7ab-61e7445ac37d/607a1f3c-9a27-4e64-ae01-68372e62017a with size = 20971520 sectors (blockVolume:509)
2017-12-18 12:28:19,933+0000 WARN  (tasks/8) [storage.StorageDomainManifest] Could not find mapping for lv 1304bd0d-8089-4420-9dc2-8da5b6bae1be/607a1f3c-9a27-4e64-ae01-68372e62017a (blockSD:864)

Also, CPU utilization was at ~70% while running the direct mode script.

Comment 1 Nir Soffer 2017-12-18 16:42:32 UTC
I did some tests on Avihai's host:

1. Network bandwidth

I tested using http-bench:
https://github.com/nirs/http-bench

./upload-httplib -b 1024 /dev/shm/10G_file_test https://localhost:8000/
Uploaded 10240.00 MiB in 20.05 seconds (510.78 MiB/s)

2. Storage bandwidth

./bench recv --size 10240 -b $((1024*1024)) /dev/1304bd0d-8089-4420-9dc2-8da5b6bae1be/e4bcf31b-a1b3-4846-9e29-7751a4fa252b < /dev/shm/10G_file_test 
[   done  ] 227.70 MiB/s 44.91s

Using 8 MiB buffer, I got 320 MiB/s.

3. On the daemon, even with 8 MiB buffer, we get only 100 MiB/s.

I think the issue is the basic design of the daemon. We have a single thread per
upload, doing a read/write loop - reading a chunk from the socket, and writing
the chunk to storage using direct I/O.

So for every chunk (1MiB by default), we have 2 waits:
- wait until we get 1 MiB from the SSL socket
- wait until we write 1 MiB to storage using direct I/O

I think we should have 2 threads:
- a reader thread reads from the SSL socket and puts chunks on a queue
- a writer thread gets chunks from the queue and writes them to storage

A very short queue (2 chunks?) should eliminate the unwanted latency.
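
To make the idea concrete, here is a minimal sketch of that reader/writer split
(not the daemon's actual code; sock_reader and storage_writer are hypothetical
stand-ins for the existing SSL socket reader and direct I/O writer):

    import threading
    import Queue  # "queue" on Python 3

    CHUNK_SIZE = 1024 * 1024   # current default buffer size (1 MiB)
    QUEUE_DEPTH = 2            # very short queue, enough to overlap read and write

    def upload(sock_reader, storage_writer, size):
        chunks = Queue.Queue(maxsize=QUEUE_DEPTH)

        def read_loop():
            todo = size
            while todo > 0:
                chunk = sock_reader.read(min(CHUNK_SIZE, todo))
                if not chunk:
                    break              # premature end of stream
                todo -= len(chunk)
                chunks.put(chunk)      # blocks if the writer falls behind
            chunks.put(None)           # sentinel: no more data

        reader = threading.Thread(target=read_loop)
        reader.start()

        # While we wait on direct I/O here, the reader thread is already
        # pulling the next chunk from the SSL socket.
        while True:
            chunk = chunks.get()
            if chunk is None:
                break
            storage_writer.write(chunk)

        reader.join()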

Next steps:
- Collect data on the current implementation: how much we wait for storage and
  network for each chunk.
- If the data confirms my theory, create a prototype with read/write threads
  to test the possible improvement.
- If we get significant improvement, implement it.

Moving to 4.2.2 since this looks like a worthy improvement for 4.2.

Comment 2 Nir Soffer 2017-12-18 19:25:17 UTC
The posted patch is the first step, logging the time spent waiting for storage or
the socket.

Comment 3 Nir Soffer 2017-12-18 21:59:28 UTC
Here are results with my poor iSCSI storage (LIO).

Using 2 nested hosts on the same physical machine, and my poor LIO-based iSCSI
server.

Using http-bench (https://github.com/nirs/http-bench) we get 10g-like network
throughput:

# ./upload-httplib -b 1024 -s 1024 /dev/zero https://voodoo6.tlv.redhat.com:8000/
Uploaded 1024.00 MiB in 1.90 seconds (539.66 MiB/s)


Writing zeros to this storage:


# dd if=/dev/zero of=/dev/6ffbc483-0031-403a-819b-3bb2f0f8de0a/239c6f1e-8d0e-4b06-81a4-ce860b30c896 bs=1M count=1024 oflag=direct status=progress
1030750208 bytes (1.0 GB) copied, 12.433053 s, 82.9 MB/s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 12.8741 s, 83.4 MB/s


# dd if=/dev/zero of=/dev/6ffbc483-0031-403a-819b-3bb2f0f8de0a/239c6f1e-8d0e-4b06-81a4-ce860b30c896 bs=8M count=128 oflag=direct status=progress
1048576000 bytes (1.0 GB) copied, 11.694136 s, 89.7 MB/s
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 11.9519 s, 89.8 MB/s


# ./bench recv -b $((1024*1024)) -s 1024 /dev/6ffbc483-0031-403a-819b-3bb2f0f8de0a/239c6f1e-8d0e-4b06-81a4-ce860b30c896 </dev/zero
[   done  ] 68.18 MiB/s 15.02s
...
2017-12-18 23:49:03,975 Read 1048576 bytes from socket in 0.000 seconds
2017-12-18 23:49:03,986 Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 23:49:03,986 Read 1048576 bytes from socket in 0.000 seconds
2017-12-18 23:49:03,997 Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 23:49:03,997 Read 1048576 bytes from socket in 0.000 seconds
2017-12-18 23:49:04,008 Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 23:49:04,008 Read 1048576 bytes from socket in 0.000 seconds
2017-12-18 23:49:04,019 Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 23:49:04,020 Read 1048576 bytes from socket in 0.000 seconds
2017-12-18 23:49:04,031 Wrote 1048576 bytes to storage in 0.011 seconds


# ./bench recv -b $((8*1024*1024)) -s 1024 /dev/6ffbc483-0031-403a-819b-3bb2f0f8de0a/239c6f1e-8d0e-4b06-81a4-ce860b30c896 </dev/zero
[   done  ] 75.68 MiB/s 13.53s
...
2017-12-18 23:47:24,526 Read 8388608 bytes from socket in 0.001 seconds
2017-12-18 23:47:24,616 Wrote 8388608 bytes to storage in 0.089 seconds
2017-12-18 23:47:24,617 Read 8388608 bytes from socket in 0.001 seconds
2017-12-18 23:47:24,707 Wrote 8388608 bytes to storage in 0.088 seconds
2017-12-18 23:47:24,708 Read 8388608 bytes from socket in 0.001 seconds
2017-12-18 23:47:24,796 Wrote 8388608 bytes to storage in 0.087 seconds
2017-12-18 23:47:24,796 Read 8388608 bytes from socket in 0.001 seconds
2017-12-18 23:47:24,887 Wrote 8388608 bytes to storage in 0.088 seconds
2017-12-18 23:47:24,888 Read 8388608 bytes from socket in 0.001 seconds
2017-12-18 23:47:24,978 Wrote 8388608 bytes to storage in 0.088 seconds


Testing real upload:

# cat /etc/ovirt-imageio-daemon/daemon.conf
[daemon]
buffer_size = 1048576


Client:

Uploaded 1.00g in 17.43 seconds (58.76m/s)

Server:

2017-12-18 21:43:33,581 DEBUG   (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,593 DEBUG   (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.012 seconds
2017-12-18 21:43:33,597 DEBUG   (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,609 DEBUG   (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.012 seconds
2017-12-18 21:43:33,613 DEBUG   (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,625 DEBUG   (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.012 seconds
2017-12-18 21:43:33,629 DEBUG   (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,640 DEBUG   (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 21:43:33,644 DEBUG   (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,655 DEBUG   (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 21:43:33,658 DEBUG   (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,670 DEBUG   (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 21:43:33,673 DEBUG   (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,685 DEBUG   (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 21:43:33,688 DEBUG   (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,700 DEBUG   (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.012 seconds

So we read from the socket at 333 MiB/s, and write to storage at 83 MiB/s, but
the upload rate is only 59 MiB/s.
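
For reference, the per-chunk waits above roughly add up to the observed rate:
~0.003 s on the socket plus ~0.012 s on storage is ~0.015 s per 1 MiB chunk,
i.e. about 67 MiB/s before per-request HTTP and Python overhead, in line with
the measured 59 MiB/s. If the two waits overlapped, the cost per chunk would
approach the write time alone (~0.012 s), about 83 MiB/s - the raw storage rate
measured above with dd.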


Using 8 MiB buffer size:

# cat /etc/ovirt-imageio-daemon/daemon.conf
[daemon]
buffer_size = 8388608

Client:

Uploaded 1.00g in 17.13 seconds (59.77m/s)

Server:

2017-12-18 21:58:48,308 DEBUG   (Thread-1) [directio] Read 8388608 bytes from socket in 0.023 seconds
2017-12-18 21:58:48,397 DEBUG   (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.088 seconds
2017-12-18 21:58:48,421 DEBUG   (Thread-1) [directio] Read 8388608 bytes from socket in 0.023 seconds
2017-12-18 21:58:48,511 DEBUG   (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.089 seconds
2017-12-18 21:58:48,534 DEBUG   (Thread-1) [directio] Read 8388608 bytes from socket in 0.023 seconds
2017-12-18 21:58:48,624 DEBUG   (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.089 seconds
2017-12-18 21:58:48,646 DEBUG   (Thread-1) [directio] Read 8388608 bytes from socket in 0.021 seconds
2017-12-18 21:58:48,742 DEBUG   (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.095 seconds
2017-12-18 21:58:48,763 DEBUG   (Thread-1) [directio] Read 8388608 bytes from socket in 0.020 seconds
2017-12-18 21:58:48,857 DEBUG   (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.092 seconds
2017-12-18 21:58:48,879 DEBUG   (Thread-1) [directio] Read 8388608 bytes from socket in 0.021 seconds
2017-12-18 21:58:48,998 DEBUG   (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.118 seconds
2017-12-18 21:58:49,018 DEBUG   (Thread-1) [directio] Read 8388608 bytes from socket in 0.019 seconds
2017-12-18 21:58:49,121 DEBUG   (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.102 seconds
2017-12-18 21:58:49,143 DEBUG   (Thread-1) [directio] Read 8388608 bytes from socket in 0.021 seconds
2017-12-18 21:58:49,230 DEBUG   (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.086 seconds
2017-12-18 21:58:49,257 DEBUG   (Thread-1) [directio] Read 8388608 bytes from socket in 0.026 seconds
2017-12-18 21:58:49,344 DEBUG   (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.086 seconds

We read from the socket at 380 MiB/s, and write to storage at 93 MiB/s, but the
upload rate is only 60 MiB/s.

The delay reading data from the socket is 27% of the time writing to storage.
With fast storage, we would spend only half of the time writing.
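
Same arithmetic here: ~0.021 s + ~0.088 s is ~0.109 s per 8 MiB chunk, about
73 MiB/s serialized, in the same ballpark as the measured 60 MiB/s (the rest is
HTTP and Python overhead); overlapping the waits would bring the cost per chunk
close to the write time alone (~0.088 s, about 91 MiB/s).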


Need to run the same test with fast storage on a physical host.

Comment 4 Yaniv Kaul 2017-12-19 07:58:51 UTC
(In reply to Nir Soffer from comment #1)
> I did some tests on Avihai's host:
> 
> 1. Network bandwidth
> 
> I tested using http-bench:
> https://github.com/nirs/http-bench
> 
> ./upload-httplib -b 1024 /dev/shm/10G_file_test https://localhost:8000/
> Uploaded 10240.00 MiB in 20.05 seconds (510.78 MiB/s)
> 
> 2. Storage bandwidth
> 
> ./bench recv --size 10240 -b $((1024*1024))
> /dev/1304bd0d-8089-4420-9dc2-8da5b6bae1be/e4bcf31b-a1b3-4846-9e29-
> 7751a4fa252b < /dev/shm/10G_file_test 
> [   done  ] 227.70 MiB/s 44.91s
> 
> Using 8 MiB buffer, I got 320 MiB/s.

That's low for XtremIO with large blocks. It should be 1000-1400 MBps. Probably the I/O is serialized and synchronous, but that's still quite low; it's worth looking at what the written block size is on the XtremIO side (it should be 512K).

> 
> 3. On the daemon, even with 8 MiB buffer, we get only 100 MiB/s.

On my nested environment, I get ~200MBps.

> 
> I think the issue is the basic design of the daemon. We have a single thread
> per upload, doing a read/write loop - reading a chunk from the socket, and
> writing the chunk to storage using direct I/O.
> 
> So for every chunk (1MiB by default), we have 2 waits:
> - wait until we get 1 MiB from the SSL socket
> - wait until we write 1 MiB to storage using direct I/O

Assuming storage is usually (always?) the slowest link, it doesn't matter whether it's the source, the destination, or both:
1. You have to wait for the data from the SSL socket anyway.
2. Not using direct I/O would result in faster writes to the destination storage - they'll probably be flushed while we switch back to reading from the SSL socket.

So I'd first try not using direct I/O on the destination.
I'm not sure why we are so concerned with thrashing the host cache - it's there to be used, not bypassed (unless you really need to because you do your own caching, as databases and virtual machines do), or unless we worry about multiple readers, etc.

> 
> I think we should have 2 threads:
> - a reader thread reads from the SSL socket and puts chunks on a queue
> - a writer thread gets chunks from the queue and writes them to storage
> 
> A very short queue (2 chunks?) should eliminate the unwanted latency.
> 
> Next steps:
> - Collect data on the current implementation: how much we wait for storage and
>   network for each chunk.
> - If the data confirms my theory, create a prototype with read/write threads
>   to test the possible improvement.
> - If we get significant improvement, implement it.
> 
> Moving to 4.2.2 since this looks like a worthy improvement for 4.2.

Comment 5 Nir Soffer 2017-12-25 23:41:44 UTC
Raz, can you test again with current master?
http://jenkins.ovirt.org/job/ovirt-imageio_master_build-artifacts-el7-x86_64/173/artifact/exported-artifacts/

We have this log now, showing how we spent time during upload:

INFO   (Thread-1) [directio] Operation stats: <Clock(total=24.44, read=6.66, write=16.24, sync=1.23)>
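
In this example that reads roughly as: out of 24.44 seconds total, ~6.66 s were
spent reading from the socket, ~16.24 s writing to storage and ~1.23 s syncing,
so the log shows directly where an upload spends its time.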

After you test again with the same file you uploaded before, we would also like to run
some tests using http-bench (https://github.com/nirs/http-bench).

Here is how to run the tests:

0. Install requirements

    yum install python-six golang git

1. Get the code

    git clone https://github.com/nirs/http-bench

2. Build the server

    cd http-bench
    make

3. Run the server with the same logical volume used for the upload:

    lvchange -ay vgname/lvname

    ./serve --output /dev/vgname/lvname --blocksize-kb 8192 --stats --direct

Please try also without the --direct flag.

4. From another shell, upload the same file to the test server

    ./upload-httplib -b 128 /dev/shm/10G_file_test https://localhost:8000/

We expect to see much better throughput with the test server.

Note 1: about the sg_dd command in comment 0:

> -=>>sg_dd if=/dev/shm/10G_file_test of=/dev/dm-57 verbose=1 time=1 bs=1M count=10240 conv=notrunc oflag=direct

This command does not sync when the write is complete. This is incorrect and may lead
to data loss if the server fails after the command has completed. If you configure
sg_dd to perform fsync after the write, you will see much lower throughput. I could
not find how to configure sg_dd to fsync.

Note 2: testing with faster storage

Yaniv suggests in comment 4 that we should see 1000-1400 MiB/s with XtremIO, but
you tested with NetApp. Can we test again with a LUN from XtremIO?

Note 3: testing with faster host

Uploading from /dev/zero to /dev/null on this host, we get only 500 MiB/s. This is
a very low rate. I can get 1500 MiB/s on my 4-year-old mini Dell (i7-4770). Do we have
a faster host that we can test with?

Comment 6 Avihai 2017-12-26 07:26:47 UTC
Eyal,

1) Please see Nir's queries above and reply.
- Note 2: testing with faster storage - do you have any faster storage?
- Note 3: Nir needs a faster host - do we have one?

2) As you can see from the comments above, I will need the test stand again (or a better one with a faster host/storage - see the comments above); please tell me when it's ready for testing.

Comment 7 Nir Soffer 2018-06-14 13:09:30 UTC
Testing on a decent host and fast storage shows that we have several issues.


1. The default buffer size is not efficient

Here are example results using an XtremIO server in the scale lab:

    --------------------------------
    size (MiB)    throughput (MiB/s)
                     iSCSI        FC
    --------------------------------
             1      137.66    211.65
             8      202.93    292.03
            16      246.43    284.98
            24      258.18    314.55
            32      142.84    123.90

With a larger buffer size we get decent performance with fast storage.
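
For reference, the merged change linked above (gerrit 92126, "config: Increase
buffer size to 8 MiB") corresponds to a daemon.conf setting along these lines
(a sketch, using the option shown in comment 3):

    [daemon]
    buffer_size = 8388608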


2. Reads and writes are not concurrent.

Here are results from current release (1.3.0):

Client
------
$ ./upload_disk.py --direct /dev/shm/10G.raw
Uploaded 10.00g in 37.32 seconds (274.40m/s)

Server
------
Operation stats: <Clock(total=37.32, read=20.55, write=14.41, sync=0.00)>

write:  710 MiB/s
read:   498 MiB/s
total:  274 MiB/s

Upload throughput is limited by the naive read-write loop - when we read from
the socket, we don't write to storage, and when we write to storage we don't read
data from the socket.
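
To put numbers on that: for the 10 GiB upload above, read (20.55 s) plus write
(14.41 s) is ~35 s when serialized, which matches the ~37 s total
(10240 MiB / 37.32 s = 274 MiB/s). If reads and writes fully overlapped, the
total would approach the slower of the two, ~20.6 s, i.e. close to 500 MiB/s -
roughly what the patched run below achieves.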

We have patches in review showing good improvement:
https://gerrit.ovirt.org/#/q/topic:cio+project:ovirt-imageio+is:open

Client
------
$ ./upload_disk.py --direct /dev/shm/10G.raw
Uploaded 10.00g in 23.29 seconds (439.73m/s)

Server
------
Operation stats: <Clock(total=23.28, read=23.21, write=14.56, sync=0.00)>

write:  703 MiB/s
read:   441 MiB/s
total:  439 MiB/s

Upload throughput is now limited by reading from the SSL socket.


3. Reading from SSL socket is a bottleneck

ovirt-imageio 1.4.0 now supports upload via a UNIX socket. Testing shows that
this gives a significant increase in throughput and decreases CPU usage:

Here is an example upload using HTTPS:
    
    Client:
    $ ./upload_disk.py --direct /dev/shm/10G.raw
    Uploaded 10.00g in 37.38 seconds (273.95m/s)

    Server operation stats:
    <Clock(total=37.38, read=20.67, write=14.48, sync=0.00)>
    
    Server CPU usage: 57%
    
Same upload using unix socket:
    
    Client:
    $ ./upload_disk_unix.py --direct /dev/shm/10GB.raw
    Uploaded 10.00g in 21.18 seconds (483.42m/s)

    Server operation stats:
    <Clock(total=21.18, read=4.99, write=14.69, sync=0.00)>
    
    Server CPU usage: 35%
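
For reference, a Python client can speak HTTP over a UNIX socket simply by
overriding HTTPConnection.connect(). This is only a minimal sketch of the idea,
not the actual upload_disk_unix.py; the socket path and request in the usage
comment are placeholders:

    import socket
    import httplib  # http.client on Python 3

    class UnixHTTPConnection(httplib.HTTPConnection):
        """HTTP connection over a UNIX socket instead of TCP."""

        def __init__(self, path):
            # The host name is only used for the Host header; connect() below
            # ignores it and uses the UNIX socket path instead.
            httplib.HTTPConnection.__init__(self, "localhost")
            self.path = path

        def connect(self):
            self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            self.sock.connect(self.path)

    # Hypothetical usage (path and ticket id are placeholders):
    # con = UnixHTTPConnection("/path/to/ovirt-imageio-daemon.sock")
    # con.request("PUT", "/images/<ticket-id>", body=chunk)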
 

Since the patches are big and it is too late to include them in 4.2.4, we will
limit this bug to the first issue, and file another bug for the concurrency work.

Comment 8 Sandro Bonazzola 2018-06-15 09:33:57 UTC
Moving back to POST for visibility since the referenced patch https://gerrit.ovirt.org/#/c/91645/ is still open.
If the referenced patch is not relevant, please remove it from the external trackers.

I understand this will be split, fixing only part of the issue here.
Can you please update the Fixed In Version field with the version that includes the fix for this bug?
Can you also add a reference to the split-off bug # in the See Also field?

Comment 9 Nir Soffer 2018-06-15 13:39:33 UTC
(In reply to Sandro Bonazzola from comment #8)
Thanks for alerting.

> Moving back to POST for visibility since the referenced patch
> https://gerrit.ovirt.org/#/c/91645/ is still open.
> If the referenced patch is not relevant, please remove it from the external trackers.

Removed.

> I understand this will be split, fixing only part of the issue here.
> Can you please update the Fixed In Version field with the version that
> includes the fix for this bug?

Added.

> Can you also add a reference to the split-off bug # in the See Also field?

Sure, bug 1591439 is tracking the concurrency improvement.

Comment 11 Sandro Bonazzola 2018-07-31 15:30:18 UTC
This bugzilla is included in the oVirt 4.2.5 release, published on July 30th 2018.

Since the problem described in this bug report should be
resolved in the oVirt 4.2.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

