Created attachment 1369430 [details]
engine, vdsm, image_proxy, image_daemon logs

Description of problem:
Uploading a disk on a pure 10G environment (all ports, including management, are 10GbE) is much slower (~88 MBps) than the non-upload (dd-style) write IO speed of ~280-320 MBps.

Version-Release number of selected component (if applicable):
4.2.0.2-0.1.el7

How reproducible:
100%

Setup details:
VDSM host & script client are the same machine: b01-h19-r620.rhev.openstack.engineering.redhat.com
Engine = b01-h21-r620.rhev.openstack.engineering.redhat.com
Storage = NetApp exposing an iSCSI LUN, 172.16.249.62:3260

Steps to Reproduce:
First, work with an all-10G environment or you will not see this issue.
1. Start an upload (using the direct or indirect mode script).
2. Measure the time to completion - it is printed in the script output.

Actual results:
- With the indirect script, I get ~50 MBps.
- With the direct mode script, I get ~88 MBps.

Expected results:
Utilize the 10GbE link at a more reasonable rate, closer to at least ~280 MBps.

Additional info:
- Running sg_dd from the VDSM host to storage yields 397 MBps:

(root@b01-h19-r620) - (09:30) - (/dev/shm)
-=>>sg_dd if=/dev/shm/10G_file_test of=/dev/dm-57 verbose=1 time=1 bs=1M count=10240 conv=notrunc oflag=direct
>> Input file type: other (perhaps ordinary file)
open input, flags=0x0
>> Output file type: block device
create output, flags=0x4041
time to transfer data: 27.044832 secs at 397.02 MB/sec
10240+0 records in
10240+0 records out

- Running the direct script yields 88 MBps:

-=>>./upload_script_direct_new.py --direct /dev/shm/10G_file_test
Checking image...
Connecting...
Creating disk...
Creating transfer session...
Uploading image...
Uploaded 8.57%
Uploaded 17.32%
Uploaded 25.95%
Uploaded 34.67%
Uploaded 43.39%
Uploaded 51.98%
Uploaded 60.73%
Uploaded 69.33%
Uploaded 77.86%
Uploaded 86.53%
Uploaded 95.19%
Uploaded 10.00g in 115.74 seconds (88.47m/s)
Finalizing transfer session...
Upload completed successfully

- Running the indirect script yields ~50 MBps:

-=>> ./newer_upload_disk.py
Connecting...
Creating disk...
Creating transfer session...
Uploading image...
Uploaded 48.97%
Uploaded 95.65%
Uploaded 1.00g in 20.99 seconds (48.79m/s)
Finalizing transfer session...
Upload completed successfully

The engine shows some WARNING messages; this is probably not related to the performance issue, but here they are:

2017-12-18 12:30:14,735Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, RemoveImageTicketVDSCommand, return: StatusOnlyReturn [status=Status [code=0, message=Done]], log id: 15305769
2017-12-18 12:30:14,735Z INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Successfully stopped image transfer session for ticket 'fcf98359-0372-4669-8787-310272d3b928'
2017-12-18 12:30:14,785Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VerifyUntrustedVolumeVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, VerifyUntrustedVolumeVDSCommand(HostName = h1, ImageActionsVDSCommandParameters:{hostId='e2d2f21d-54dc-4d35-970f-1ecaa3db633c'}), log id: 422de811
2017-12-18 12:30:15,069Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VerifyUntrustedVolumeVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, VerifyUntrustedVolumeVDSCommand, return: StatusReturn:{status='Status [code=0, message=Done]'}, log id: 422de811
2017-12-18 12:30:15,104Z INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeLegalityVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, SetVolumeLegalityVDSCommand( SetVolumeLegalityVDSCommandParameters:{storagePoolId='a418b8ac-77bf-4e3d-9377-d993c9af2c25', ignoreFailoverLimit='false', storageDomainId='1304bd0d-8089-4420-9dc2-8da5b6bae1be', imageGroupId='553f8108-917c-484d-b7ab-61e7445ac37d', imageId='607a1f3c-9a27-4e64-ae01-68372e62017a'}), log id: 23a89456
2017-12-18 12:30:15,401Z INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeLegalityVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, SetVolumeLegalityVDSCommand, log id: 23a89456
2017-12-18 12:30:15,437Z INFO [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Updating image transfer d162f24f-eb06-4aa6-9037-96420b877037 (image 553f8108-917c-484d-b7ab-61e7445ac37d) phase to Finished Success
2017-12-18 12:30:15,474Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.TeardownImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, TeardownImageVDSCommand(HostName = h1, ImageActionsVDSCommandParameters:{hostId='e2d2f21d-54dc-4d35-970f-1ecaa3db633c'}), log id: 7a59344d
2017-12-18 12:30:15,939Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.TeardownImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, TeardownImageVDSCommand, return: StatusReturn:{status='Status [code=0, message=Done]'}, log id: 7a59344d
2017-12-18 12:30:25,980Z INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-51) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Transfer was successful. Upload disk '10G_file_test' (id '00000000-0000-0000-0000-000000000000')
2017-12-18 12:30:26,989Z INFO [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Updating image transfer d162f24f-eb06-4aa6-9037-96420b877037 (image 553f8108-917c-484d-b7ab-61e7445ac37d) phase to Finished Success
2017-12-18 12:30:27,005Z INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, GetImageInfoVDSCommand( GetImageInfoVDSCommandParameters:{storagePoolId='a418b8ac-77bf-4e3d-9377-d993c9af2c25', ignoreFailoverLimit='false', storageDomainId='1304bd0d-8089-4420-9dc2-8da5b6bae1be', imageGroupId='553f8108-917c-484d-b7ab-61e7445ac37d', imageId='607a1f3c-9a27-4e64-ae01-68372e62017a'}), log id: 5710a60c
2017-12-18 12:30:27,006Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, GetVolumeInfoVDSCommand(HostName = h1, GetVolumeInfoVDSCommandParameters:{hostId='e2d2f21d-54dc-4d35-970f-1ecaa3db633c', storagePoolId='a418b8ac-77bf-4e3d-9377-d993c9af2c25', storageDomainId='1304bd0d-8089-4420-9dc2-8da5b6bae1be', imageGroupId='553f8108-917c-484d-b7ab-61e7445ac37d', imageId='607a1f3c-9a27-4e64-ae01-68372e62017a'}), log id: d891b67
2017-12-18 12:30:27,517Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, GetVolumeInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@b6d7cb56, log id: d891b67
2017-12-18 12:30:27,517Z INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@b6d7cb56, log id: 5710a60c
2017-12-18 12:30:27,536Z INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Successfully transferred disk '00000000-0000-0000-0000-000000000000' (command id 'd162f24f-eb06-4aa6-9037-96420b877037')
2017-12-18 12:30:27,540Z INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand' successfully.
2017-12-18 12:30:27,544Z INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, GetImageInfoVDSCommand( GetImageInfoVDSCommandParameters:{storagePoolId='a418b8ac-77bf-4e3d-9377-d993c9af2c25', ignoreFailoverLimit='false', storageDomainId='1304bd0d-8089-4420-9dc2-8da5b6bae1be', imageGroupId='553f8108-917c-484d-b7ab-61e7445ac37d', imageId='607a1f3c-9a27-4e64-ae01-68372e62017a'}), log id: a7c61fd
2017-12-18 12:30:27,545Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] START, GetVolumeInfoVDSCommand(HostName = h1, GetVolumeInfoVDSCommandParameters:{hostId='e2d2f21d-54dc-4d35-970f-1ecaa3db633c', storagePoolId='a418b8ac-77bf-4e3d-9377-d993c9af2c25', storageDomainId='1304bd0d-8089-4420-9dc2-8da5b6bae1be', imageGroupId='553f8108-917c-484d-b7ab-61e7445ac37d', imageId='607a1f3c-9a27-4e64-ae01-68372e62017a'}), log id: 9cddad9
2017-12-18 12:30:27,828Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, GetVolumeInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@b6d7cb56, log id: 9cddad9
2017-12-18 12:30:27,828Z INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@b6d7cb56, log id: a7c61fd
2017-12-18 12:30:27,847Z WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Trying to release a shared lock for key: '553f8108-917c-484d-b7ab-61e7445ac37dDISK' , but lock does not exist
2017-12-18 12:30:27,848Z INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[553f8108-917c-484d-b7ab-61e7445ac37d=DISK]'}'
2017-12-18 12:30:27,871Z INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [45108bb5-eb8e-4fe0-a6b7-eddf5a79b312] EVENT_ID: TRANSFER_IMAGE_SUCCEEDED(1,032), Image Upload with disk 10G_file_test succeeded.
The VDSM log shows some errors (not sure if they are related):

2017-12-18 12:27:03,528+0000 ERROR (periodic/0) [virt.periodic.Operation] <vdsm.virt.sampling.HostMonitor object at 0x3167490> operation failed (periodic:215)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 213, in __call__
    self._func()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 578, in __call__
    stats = hostapi.get_stats(self._cif, self._samples.stats())
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 72, in get_stats
    ret.update(cif.mom.getKsmStats())
  File "/usr/lib/python2.7/site-packages/vdsm/momIF.py", line 75, in getKsmStats
    stats = self._mom.getStatistics()['host']
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1306, in single_request
    return self.parse_response(response)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1482, in parse_response
    return u.close()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 794, in close
    raise Fault(**self._stack[0])
Fault: <Fault 1: "<type 'exceptions.AttributeError'>:'NoneType' object has no attribute 'statistics'">

2017-12-18 12:28:19,569+0000 WARN (tasks/8) [root] File: /rhev/data-center/mnt/blockSD/1304bd0d-8089-4420-9dc2-8da5b6bae1be/images/553f8108-917c-484d-b7ab-61e7445ac37d/607a1f3c-9a27-4e64-ae01-68372e62017a already removed (fileutils:51)
2017-12-18 12:28:19,571+0000 INFO (tasks/8) [storage.Volume] Request to create RAW volume /rhev/data-center/mnt/blockSD/1304bd0d-8089-4420-9dc2-8da5b6bae1be/images/553f8108-917c-484d-b7ab-61e7445ac37d/607a1f3c-9a27-4e64-ae01-68372e62017a with size = 20971520 sectors (blockVolume:509)
2017-12-18 12:28:19,933+0000 WARN (tasks/8) [storage.StorageDomainManifest] Could not find mapping for lv 1304bd0d-8089-4420-9dc2-8da5b6bae1be/607a1f3c-9a27-4e64-ae01-68372e62017a (blockSD:864)

Also, CPU utilization was observed at ~70% while running the direct mode script.
I did some tests on Avihi's host:

1. Network bandwidth

I tested using http-bench: https://github.com/nirs/http-bench

./upload-httplib -b 1024 /dev/shm/10G_file_test https://localhost:8000/
Uploaded 10240.00 MiB in 20.05 seconds (510.78 MiB/s)

2. Storage bandwidth

./bench recv --size 10240 -b $((1024*1024)) /dev/1304bd0d-8089-4420-9dc2-8da5b6bae1be/e4bcf31b-a1b3-4846-9e29-7751a4fa252b < /dev/shm/10G_file_test
[ done ] 227.70 MiB/s 44.91s

Using an 8 MiB buffer, I got 320 MiB/s.

3. On the daemon, even with an 8 MiB buffer, we get only 100 MiB/s.

I think the issue is the basic design of the daemon. We have a single thread per upload, doing a read/write loop - reading a chunk from the socket, and writing the chunk to storage using direct I/O.

So for every chunk (1 MiB by default), we have 2 waits:
- wait until we get 1 MiB from the SSL socket
- wait until we write 1 MiB to storage using direct I/O

I think we should have 2 threads (see the sketch below):
- a reader thread reads from the SSL socket and puts chunks on a queue
- a writer thread gets chunks from the queue and writes them to storage

A very short queue (2 chunks?) should eliminate the unwanted latency.

Next steps:
- Collect data on the current code: how long we wait for storage and for the network for each chunk.
- If the data confirms my theory, create a prototype with read/write threads to test the possible improvement.
- If we get a significant improvement, implement it.

Moving to 4.2.2 since this looks like a worthy improvement for 4.2.
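To make the proposed pipeline concrete, here is a minimal sketch of the reader/writer split with a bounded queue. This is illustrative only, not the daemon's actual code; read_chunk and write_chunk are placeholders for the SSL socket read and the direct I/O write, and error handling is omitted for brevity.

try:
    import queue          # Python 3
except ImportError:
    import Queue as queue  # Python 2

import threading


def pipeline(read_chunk, write_chunk, chunk_size=1024 * 1024, depth=2):
    # Overlap socket reads and storage writes using a small bounded queue.
    q = queue.Queue(maxsize=depth)

    def reader():
        while True:
            chunk = read_chunk(chunk_size)  # e.g. read from the SSL socket
            q.put(chunk)                    # blocks if the writer falls behind
            if not chunk:
                return                      # empty chunk = end of stream

    reader_thread = threading.Thread(target=reader)
    reader_thread.start()

    while True:
        chunk = q.get()                     # write in the current thread
        if not chunk:
            break
        write_chunk(chunk)                  # e.g. direct I/O write to the LV

    reader_thread.join()

With depth=2 the reader can already be fetching the next chunk from the socket while the writer is blocked in the direct I/O write, which is exactly the overlap missing from the current loop.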
The posted patch is the first step: it logs the time spent waiting for storage and for the socket.
Here are results with my slow LIO-based iSCSI storage, using 2 nested hosts on the same physical machine.

Using http-bench (https://github.com/nirs/http-bench) we get 10g-like network throughput:

# ./upload-httplib -b 1024 -s 1024 /dev/zero https://voodoo6.tlv.redhat.com:8000/
Uploaded 1024.00 MiB in 1.90 seconds (539.66 MiB/s)

Writing zeros to this storage:

# dd if=/dev/zero of=/dev/6ffbc483-0031-403a-819b-3bb2f0f8de0a/239c6f1e-8d0e-4b06-81a4-ce860b30c896 bs=1M count=1024 oflag=direct status=progress
1030750208 bytes (1.0 GB) copied, 12.433053 s, 82.9 MB/s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 12.8741 s, 83.4 MB/s

# dd if=/dev/zero of=/dev/6ffbc483-0031-403a-819b-3bb2f0f8de0a/239c6f1e-8d0e-4b06-81a4-ce860b30c896 bs=8M count=128 oflag=direct status=progress
1048576000 bytes (1.0 GB) copied, 11.694136 s, 89.7 MB/s
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 11.9519 s, 89.8 MB/s

# ./bench recv -b $((1024*1024)) -s 1024 /dev/6ffbc483-0031-403a-819b-3bb2f0f8de0a/239c6f1e-8d0e-4b06-81a4-ce860b30c896 </dev/zero
[ done ] 68.18 MiB/s 15.02s
...
2017-12-18 23:49:03,975 Read 1048576 bytes from socket in 0.000 seconds
2017-12-18 23:49:03,986 Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 23:49:03,986 Read 1048576 bytes from socket in 0.000 seconds
2017-12-18 23:49:03,997 Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 23:49:03,997 Read 1048576 bytes from socket in 0.000 seconds
2017-12-18 23:49:04,008 Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 23:49:04,008 Read 1048576 bytes from socket in 0.000 seconds
2017-12-18 23:49:04,019 Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 23:49:04,020 Read 1048576 bytes from socket in 0.000 seconds
2017-12-18 23:49:04,031 Wrote 1048576 bytes to storage in 0.011 seconds

# ./bench recv -b $((8*1024*1024)) -s 1024 /dev/6ffbc483-0031-403a-819b-3bb2f0f8de0a/239c6f1e-8d0e-4b06-81a4-ce860b30c896 </dev/zero
[ done ] 75.68 MiB/s 13.53s
...
2017-12-18 23:47:24,526 Read 8388608 bytes from socket in 0.001 seconds
2017-12-18 23:47:24,616 Wrote 8388608 bytes to storage in 0.089 seconds
2017-12-18 23:47:24,617 Read 8388608 bytes from socket in 0.001 seconds
2017-12-18 23:47:24,707 Wrote 8388608 bytes to storage in 0.088 seconds
2017-12-18 23:47:24,708 Read 8388608 bytes from socket in 0.001 seconds
2017-12-18 23:47:24,796 Wrote 8388608 bytes to storage in 0.087 seconds
2017-12-18 23:47:24,796 Read 8388608 bytes from socket in 0.001 seconds
2017-12-18 23:47:24,887 Wrote 8388608 bytes to storage in 0.088 seconds
2017-12-18 23:47:24,888 Read 8388608 bytes from socket in 0.001 seconds
2017-12-18 23:47:24,978 Wrote 8388608 bytes to storage in 0.088 seconds

Testing a real upload:

# cat /etc/ovirt-imageio-daemon/daemon.conf
[daemon]
buffer_size = 1048576

Client:
Uploaded 1.00g in 17.43 seconds (58.76m/s)

Server:
2017-12-18 21:43:33,581 DEBUG (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,593 DEBUG (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.012 seconds
2017-12-18 21:43:33,597 DEBUG (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,609 DEBUG (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.012 seconds
2017-12-18 21:43:33,613 DEBUG (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,625 DEBUG (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.012 seconds
2017-12-18 21:43:33,629 DEBUG (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,640 DEBUG (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 21:43:33,644 DEBUG (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,655 DEBUG (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 21:43:33,658 DEBUG (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,670 DEBUG (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 21:43:33,673 DEBUG (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,685 DEBUG (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.011 seconds
2017-12-18 21:43:33,688 DEBUG (Thread-1) [directio] Read 1048576 bytes from socket in 0.003 seconds
2017-12-18 21:43:33,700 DEBUG (Thread-1) [directio] Wrote 1048576 bytes to storage in 0.012 seconds

So we read from the socket at 333 MiB/s, and write to storage at 83 MiB/s, but the upload rate is only 59 MiB/s.
Using an 8 MiB buffer size:

# cat /etc/ovirt-imageio-daemon/daemon.conf
[daemon]
buffer_size = 8388608

Client:
Uploaded 1.00g in 17.13 seconds (59.77m/s)

Server:
2017-12-18 21:58:48,308 DEBUG (Thread-1) [directio] Read 8388608 bytes from socket in 0.023 seconds
2017-12-18 21:58:48,397 DEBUG (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.088 seconds
2017-12-18 21:58:48,421 DEBUG (Thread-1) [directio] Read 8388608 bytes from socket in 0.023 seconds
2017-12-18 21:58:48,511 DEBUG (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.089 seconds
2017-12-18 21:58:48,534 DEBUG (Thread-1) [directio] Read 8388608 bytes from socket in 0.023 seconds
2017-12-18 21:58:48,624 DEBUG (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.089 seconds
2017-12-18 21:58:48,646 DEBUG (Thread-1) [directio] Read 8388608 bytes from socket in 0.021 seconds
2017-12-18 21:58:48,742 DEBUG (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.095 seconds
2017-12-18 21:58:48,763 DEBUG (Thread-1) [directio] Read 8388608 bytes from socket in 0.020 seconds
2017-12-18 21:58:48,857 DEBUG (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.092 seconds
2017-12-18 21:58:48,879 DEBUG (Thread-1) [directio] Read 8388608 bytes from socket in 0.021 seconds
2017-12-18 21:58:48,998 DEBUG (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.118 seconds
2017-12-18 21:58:49,018 DEBUG (Thread-1) [directio] Read 8388608 bytes from socket in 0.019 seconds
2017-12-18 21:58:49,121 DEBUG (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.102 seconds
2017-12-18 21:58:49,143 DEBUG (Thread-1) [directio] Read 8388608 bytes from socket in 0.021 seconds
2017-12-18 21:58:49,230 DEBUG (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.086 seconds
2017-12-18 21:58:49,257 DEBUG (Thread-1) [directio] Read 8388608 bytes from socket in 0.026 seconds
2017-12-18 21:58:49,344 DEBUG (Thread-1) [directio] Wrote 8388608 bytes to storage in 0.086 seconds

We read from the socket at 380 MiB/s, and write to storage at 93 MiB/s, but the upload rate is only 60 MiB/s. The delay reading data from the socket is about 27% of the time spent writing to storage; with fast storage, we would spend only about half of the time writing.

We need to run the same test with fast storage on a physical host.
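As a rough sanity check on the serialized read/write loop theory: a loop that alternates between two blocking operations is limited to the harmonic combination of the two rates. A quick back-of-the-envelope calculation, using the per-chunk timings logged above (illustrative only):

def serial_rate(read_rate, write_rate):
    # Effective MiB/s when each chunk is read, then written, sequentially.
    return 1.0 / (1.0 / read_rate + 1.0 / write_rate)

# 1 MiB buffer: ~333 MiB/s reads, ~83 MiB/s writes
print(serial_rate(333, 83))   # ~66 MiB/s, close to the measured 59 MiB/s
# 8 MiB buffer: ~380 MiB/s reads, ~93 MiB/s writes
print(serial_rate(380, 93))   # ~75 MiB/s, close to the measured 60 MiB/s

The remaining gap is presumably per-request and TLS overhead. With concurrent reads and writes, the loop would instead be limited only by the slower of the two rates (~83-93 MiB/s here).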
(In reply to Nir Soffer from comment #1)
> I did some tests on Avihi's host:
>
> 1. Network bandwidth
>
> I tested using http-bench: https://github.com/nirs/http-bench
>
> ./upload-httplib -b 1024 /dev/shm/10G_file_test https://localhost:8000/
> Uploaded 10240.00 MiB in 20.05 seconds (510.78 MiB/s)
>
> 2. Storage bandwidth
>
> ./bench recv --size 10240 -b $((1024*1024))
> /dev/1304bd0d-8089-4420-9dc2-8da5b6bae1be/e4bcf31b-a1b3-4846-9e29-7751a4fa252b < /dev/shm/10G_file_test
> [ done ] 227.70 MiB/s 44.91s
>
> Using an 8 MiB buffer, I got 320 MiB/s.

That's low for XtremIO with large blocks - it should be 1000-1400 MBps. The IO is probably serialized and synchronous, but that is still quite low; it is worth checking what block size is actually written on the XtremIO side (it should be 512K).

> 3. On the daemon, even with an 8 MiB buffer, we get only 100 MiB/s.

On my nested environment, I get ~200 MBps.

> I think the issue is the basic design of the daemon. We have a single thread
> per upload, doing a read/write loop - reading a chunk from the socket, and
> writing the chunk to storage using direct I/O.
>
> So for every chunk (1 MiB by default), we have 2 waits:
> - wait until we get 1 MiB from the SSL socket
> - wait until we write 1 MiB to storage using direct I/O

Assuming storage is usually (always?) the slowest link - it doesn't matter if it's the source, the destination, or both:

1. You have to wait for the data from the SSL socket anyway.
2. Not using direct IO would result in faster writes to the destination storage - they'll probably be flushed while we switch back to reading from the SSL socket.

So I'd first try not using direct IO on the destination. I'm not sure why we are so concerned with thrashing the host cache - it's there to be used, not to be bypassed (unless you really need to, because you do your own caching, like databases and such, or virtual machines do, or you worry about multiple readers, etc.).

> I think we should have 2 threads:
> - a reader thread reads from the SSL socket and puts chunks on a queue
> - a writer thread gets chunks from the queue and writes them to storage
>
> A very short queue (2 chunks?) should eliminate the unwanted latency.
>
> Next steps:
> - Collect data on the current code: how long we wait for storage and for the
>   network for each chunk.
> - If the data confirms my theory, create a prototype with read/write threads
>   to test the possible improvement.
> - If we get a significant improvement, implement it.
>
> Moving to 4.2.2 since this looks like a worthy improvement for 4.2.
Raz, can you test again with current master?
http://jenkins.ovirt.org/job/ovirt-imageio_master_build-artifacts-el7-x86_64/173/artifact/exported-artifacts/

We have this log now, showing how the time is spent during an upload:

INFO (Thread-1) [directio] Operation stats: <Clock(total=24.44, read=6.66, write=16.24, sync=1.23)>

After you test again with the same file you uploaded before, we would also like to run some tests using http-bench (https://github.com/nirs/http-bench). Here is how to run the tests:

1. Install the requirements:

yum install python-six golang git

2. Get the code:

git clone https://github.com/nirs/http-bench

3. Build the server:

cd http-bench
make

4. Run the server with the same logical volume used for the upload:

lvchange -ay vgname/lvname
./serve --output /dev/vgname/lvname --blocksize-kb 8192 --stats --direct

Please try also without the --direct flag.

5. From another shell, upload the same file to the test server:

./upload-httplib -b 128 /dev/shm/10G_file_test https://localhost:8000/

We expect to see much better throughput with the test server.

Note 1: about the sg_dd command in comment 0:

> -=>>sg_dd if=/dev/shm/10G_file_test of=/dev/dm-57 verbose=1 time=1 bs=1M count=10240 conv=notrunc oflag=direct

This command does not sync when the write completes. That is incorrect and may lead to data loss if the server fails after the command has completed. If you configure sg_dd to perform an fsync after the write, you will see much lower throughput. I could not find how to configure sg_dd to fsync (a minimal sketch of such a measurement follows at the end of this comment).

Note 2: testing with faster storage

Yaniv suggests in comment 4 that we should see 1000-1400 MiB/s with XtremIO, but you tested with NetApp. Can we test again with a LUN from XtremIO?

Note 3: testing with a faster host

Uploading from /dev/zero to /dev/null on this host, we get only 500 MiB/s. This is a very low rate; I can get 1500 MiB/s on my 4-year-old minidel (i7 4770). Do we have a faster host that we can test with?
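Regarding Note 1, a quick way to measure "direct writes plus a final sync" without sg_dd is a few lines of Python 3. This is only a sketch; the device path, buffer size and count are placeholders, and error handling and partial-write handling are omitted:

import mmap
import os
import time

DEV = "/dev/vgname/lvname"    # placeholder - use the LV from the tests
BUF_SIZE = 8 * 1024 * 1024    # 8 MiB, a multiple of the 512-byte sector size
COUNT = 128                   # 1 GiB total

# O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned and zero-filled.
buf = mmap.mmap(-1, BUF_SIZE)

fd = os.open(DEV, os.O_WRONLY | os.O_DIRECT)
start = time.time()
for _ in range(COUNT):
    os.write(fd, buf)
os.fsync(fd)                  # the sync step sg_dd skips - include it in the timing
elapsed = time.time() - start
os.close(fd)

print("%.2f MiB/s" % (BUF_SIZE * COUNT / (1024.0 ** 2) / elapsed))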
Eyal,

1) Please see Nir's queries above and reply:
- Note 2: testing with faster storage - do you have any faster storage?
- Note 3: Nir needs a faster host (Note 3) - do we have any?

2) As you can see from the comments above, I will need the setup again (or a better one with a faster host/storage - see the comments above). Please tell me when it is ready for testing.
Testing on a decent host with fast storage shows that we have several issues.

1. The default buffer size is not efficient

Here are example results using an XtremIO server in the scale lab:

------------------------------------------
buffer size (MiB)   throughput (MiB/s)
                    iSCSI       FC
------------------------------------------
1                   137.66      211.65
8                   202.93      292.03
16                  246.43      284.98
24                  258.18      314.55
32                  142.84      123.90

With a larger buffer size we get decent performance with fast storage.

2. Reads and writes are not concurrent

Here are results from the current release (1.3.0):

Client
------
$ ./upload_disk.py --direct /dev/shm/10G.raw
Uploaded 10.00g in 37.32 seconds (274.40m/s)

Server
------
Operation stats: <Clock(total=37.32, read=20.55, write=14.41, sync=0.00)>
write: 710 MiB/s
read: 498 MiB/s
total: 274 MiB/s

Upload throughput is limited by the naive read/write loop - while we read from the socket, we don't write to storage, and while we write to storage, we don't read data from the socket.

We have patches in review showing a good improvement:
https://gerrit.ovirt.org/#/q/topic:cio+project:ovirt-imageio+is:open

Client
------
$ ./upload_disk.py --direct /dev/shm/10G.raw
Uploaded 10.00g in 23.29 seconds (439.73m/s)

Server
------
Operation stats: <Clock(total=23.28, read=23.21, write=14.56, sync=0.00)>
write: 703 MiB/s
read: 441 MiB/s
total: 439 MiB/s

Upload throughput is now limited by reading from the SSL socket.

3. Reading from the SSL socket is a bottleneck

ovirt-imageio 1.4.0 now supports upload via a unix socket. Testing shows that this gives a significant increase in throughput and a decrease in CPU usage (a minimal client sketch follows this comment).

Here is an example upload using HTTPS:

Client:
$ ./upload_disk.py --direct /dev/shm/10G.raw
Uploaded 10.00g in 37.38 seconds (273.95m/s)

Server operation stats: <Clock(total=37.38, read=20.67, write=14.48, sync=0.00)>
Server CPU usage: 57%

The same upload using a unix socket:

Client:
$ ./upload_disk_unix.py --direct /dev/shm/10GB.raw
Uploaded 10.00g in 21.18 seconds (483.42m/s)

Server operation stats: <Clock(total=21.18, read=4.99, write=14.69, sync=0.00)>
Server CPU usage: 35%

Since the patches are big and it is too late to include them in 4.2.4, we will limit this bug to the first issue and file another bug for the concurrency work.
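For reference, a client can speak HTTP over a unix socket with nothing more than a small http.client subclass (Python 3). This is only a sketch of the idea behind upload_disk_unix.py, not its actual code; the socket path and the ticket UUID in the URL are placeholders that would come from the daemon and engine.

import http.client
import os
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    # HTTPConnection that connects to a unix socket instead of a TCP port.

    def __init__(self, path):
        super().__init__("localhost")
        self.unix_path = path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.unix_path)


SOCKET_PATH = "/run/ovirt-imageio-daemon.sock"  # placeholder
IMAGE = "/dev/shm/10G.raw"
TICKET = "TICKET-UUID"                          # placeholder

conn = UnixHTTPConnection(SOCKET_PATH)
with open(IMAGE, "rb") as f:
    conn.request("PUT", "/images/" + TICKET, body=f,
                 headers={"Content-Length": str(os.path.getsize(IMAGE))})
resp = conn.getresponse()
print(resp.status, resp.reason)
conn.close()

Skipping the TLS layer is what removes most of the read-side wait and CPU cost seen in the HTTPS numbers above.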
Moving back to POST for visibility, since the referenced patch https://gerrit.ovirt.org/#/c/91645/ is still open.
If the referenced patch is not relevant, please remove it from the external trackers.

I understand this will be split, fixing only part of the issue here. Can you please update the Fixed In Version field with the version that includes the fix for this bug?
Can you also add a reference to the split-out bug # in the See Also field?
(In reply to Sandro Bonazzola from comment #8)

Thanks for the alert.

> Moving back to POST for visibility, since the referenced patch
> https://gerrit.ovirt.org/#/c/91645/ is still open.
> If the referenced patch is not relevant, please remove it from the external
> trackers.

Removed.

> I understand this will be split, fixing only part of the issue here.
> Can you please update the Fixed In Version field with the version that
> includes the fix for this bug?

Added.

> Can you also add a reference to the split-out bug # in the See Also field?

Sure, bug 1591439 is tracking the concurrency improvement.
This bug is included in the oVirt 4.2.5 release, published on July 30th 2018.

Since the problem described in this bug report should be resolved in oVirt 4.2.5, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.