Bug 1591534
| Summary: | [RFE] [v2v] imageio performance - support keep alive connections | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-imageio | Reporter: | Nir Soffer <nsoffer> | ||||
| Component: | Common | Assignee: | Nir Soffer <nsoffer> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | guy chen <guchen> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 1.2.2 | CC: | bugs, dagur, dfediuck, mtessun, nsoffer, ratamir, tnisan, ylavi | ||||
| Target Milestone: | ovirt-4.2.5 | Keywords: | FutureFeature, Performance | ||||
| Target Release: | --- | Flags: | rule-engine:
ovirt-4.2+
mtessun: planning_ack+ rule-engine: devel_ack+ ratamir: testing_ack+ |
||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2018-08-10 12:34:52 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
|
Description
Nir Soffer
2018-06-15 00:32:47 UTC
I got keep alive connections working with python builtin WSGI server. The patch is not ready yet, but the results are very interesting. ## https Without keep alive connections: # time ./upload RHEL7_1-001.raw https://host:54322/images/test real 1m2.492s user 0m16.232s sys 0m5.619s With keep alive connections: # time -p ./upload RHEL7_1-001.raw https://host:54322/images/test real 29.40 user 7.24 sys 1.58 ## unix socket Without keep alive connections: # time ./upload RHEL7_1-001.raw https://host:54322/images/test real 0m29.474s user 0m4.796s sys 0m2.280s With keep alive connections # time -p ./upload RHEL7_1-001.raw https://host:54322/images/test real 25.82 user 4.40 sys 1.65 ## Profiles with keep alive connections # profile-stats -c small-requests/upload-keep-alive-https-cpu.prof Sun Jun 17 21:18:26 2018 small-requests/upload-keep-alive-https-cpu.prof 9542164 function calls (9557434 primitive calls) in 27.672 seconds Ordered by: internal time List reduced from 378 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 105279 1.621 0.000 1.621 0.000 __builtin__:0(<method 'read' of '_ssl._SSLSocket' objects>) 165283 0.891 0.000 0.891 0.000 __builtin__:0(<method 'write' of 'cStringIO.StringO' objects>) 5139 0.735 0.000 0.735 0.000 __builtin__:0(<method 'write' of 'mmap.mmap' objects>) 10477 0.712 0.000 0.712 0.000 __builtin__:0(<method 'write' of '_io.FileIO' objects>) 30509 0.569 0.000 1.183 0.000 __init__.py:242(LogRecord.__init__) 66103 0.564 0.000 1.214 0.000 socket.py:406(_fileobject.readline) 4133 0.543 0.000 0.543 0.000 __builtin__:0(<method 'getvalue' of 'cStringIO.StringO' objects>) 40677 0.493 0.000 0.493 0.000 __builtin__:0(<method 'write' of '_ssl._SSLSocket' objects>) 61018 0.461 0.000 2.475 0.000 __init__.py:451(Formatter.format) 61018 0.444 0.000 0.444 0.000 time:0(localtime) 10170 0.420 0.000 1.567 0.000 rfc822.py:122(Message.readheaders) 10223 0.396 0.000 3.858 0.000 socket.py:336(_fileobject.read) 10167 0.368 0.000 0.368 0.000 __builtin__:0(<method 'close' of 'mmap.mmap' objects>) 61018 0.362 0.000 0.537 0.000 threading.py:186(_RLock.release) 30509 0.354 0.000 0.354 0.000 __builtin__:0(<method 'flush' of 'file' objects>) 5084 0.338 0.000 5.938 0.001 directio.py:168(Receive._run) 61018 0.330 0.000 0.941 0.000 __init__.py:311(LogRecord.getMessage) 61018 0.304 0.000 0.920 0.000 __init__.py:404(Formatter.formatTime) 218635 0.299 0.000 0.420 0.000 descriptors.py:47(fget) 473237 0.276 0.000 0.276 0.000 __builtin__:0(<method 'get' of 'dict' objects>) We can see that the SSL hanshake is gone (looking at the bottom of the profile we can find only 2 calls). $ profile-stats -c -s cumtime small-requests/upload-keep-alive-https-cpu.prof Sun Jun 17 21:18:26 2018 small-requests/upload-keep-alive-https-cpu.prof 9542164 function calls (9557434 primitive calls) in 27.672 seconds Ordered by: cumulative time List reduced from 378 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 2/3 0.000 0.000 27.658 9.219 threading.py:754(Thread.run) 2/3 0.000 0.000 27.658 9.219 SocketServer.py:586(WSGIServer.process_request_thread) 2/3 0.000 0.000 27.658 9.219 SocketServer.py:332(WSGIServer.finish_request) 2/3 0.000 0.000 27.658 9.219 SocketServer.py:643(WSGIRequestHandler.__init__) 2/3 0.028 0.014 27.657 9.219 wsgi.py:97(WSGIRequestHandler.handle) 10172/10173 0.150 0.000 27.629 0.003 wsgi.py:48(UnixWSGIRequestHandler.handle_one_request) 10169/10170 0.062 0.000 24.212 0.002 handlers.py:76(ServerHandler.run) 10169/10170 0.171 0.000 18.208 0.002 web.py:92(Application.__call__) 10169/10170 0.166 0.000 14.182 0.001 web.py:111(Application.dispatch) 5084 0.126 0.000 8.651 0.002 server.py:229(Images.put) 30509 0.179 0.000 7.489 0.000 __init__.py:1139(Logger.info) 10168 0.043 0.000 7.466 0.001 tickets.py:92(Ticket.run) 10168 0.048 0.000 7.300 0.001 directio.py:76(Receive.run) 30509 0.143 0.000 7.216 0.000 __init__.py:1249(Logger._log) 5084 0.338 0.000 5.938 0.001 directio.py:168(Receive._run) 10170 0.069 0.000 5.726 0.001 handlers.py:117(ServerHandler.finish_response) 30509 0.092 0.000 5.536 0.000 __init__.py:1270(Logger.handle) 30509 0.109 0.000 5.410 0.000 __init__.py:1302(Logger.callHandlers) 30509 0.148 0.000 5.301 0.000 __init__.py:736(RotatingFileHandler.handle) 5139 0.124 0.000 5.095 0.001 directio.py:198(Receive._receive_chunk) We can see that the next target for optimization is logging. Calling log.info() took 7.4 seconds of of 27.6 seconds (26.8%). # profile-stats -c small-requests/upload-keep-alive-unix-cpu.prof Sun Jun 17 10:23:40 2018 small-requests/upload-keep-alive-unix-cpu.prof 8671042 function calls (8686312 primitive calls) in 22.018 seconds Ordered by: internal time List reduced from 378 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 5139 0.725 0.000 0.725 0.000 __builtin__:0(<method 'write' of 'mmap.mmap' objects>) 10477 0.687 0.000 0.687 0.000 __builtin__:0(<method 'write' of '_io.FileIO' objects>) 30509 0.556 0.000 1.135 0.000 __init__.py:242(LogRecord.__init__) 66104 0.512 0.000 0.964 0.000 socket.py:406(_fileobject.readline) 61018 0.442 0.000 2.368 0.000 __init__.py:451(Formatter.format) 61018 0.410 0.000 0.410 0.000 time:0(localtime) 10170 0.369 0.000 1.397 0.000 rfc822.py:122(Message.readheaders) 10167 0.348 0.000 0.348 0.000 __builtin__:0(<method 'close' of 'mmap.mmap' objects>) 82469 0.331 0.000 0.331 0.000 __builtin__:0(<method 'write' of 'cStringIO.StringO' objects>) 61018 0.322 0.000 0.925 0.000 __init__.py:311(LogRecord.getMessage) 24014 0.298 0.000 0.298 0.000 __builtin__:0(<method 'recv' of '_socket.socket' objects>) 218635 0.285 0.000 0.398 0.000 descriptors.py:47(fget) 61018 0.281 0.000 0.855 0.000 __init__.py:404(Formatter.formatTime) 10170 0.266 0.000 0.529 0.000 simple_server.py:73(WSGIRequestHandler.get_environ) 473237 0.257 0.000 0.257 0.000 __builtin__:0(<method 'get' of 'dict' objects>) 50860 0.241 0.000 0.451 0.000 socket.py:292(_fileobject.flush) 10167 0.240 0.000 0.240 0.000 directio.py:378(aligned_buffer) 5084 0.235 0.000 2.754 0.001 directio.py:168(Receive._run) 30509 0.232 0.000 0.232 0.000 __builtin__:0(<method 'flush' of 'file' objects>) 61018 0.217 0.000 0.317 0.000 threading.py:146(_RLock.acquire) # profile-stats -c -s cumtime small-requests/upload-keep-alive-unix-cpu.prof Sun Jun 17 10:23:40 2018 small-requests/upload-keep-alive-unix-cpu.prof 8671042 function calls (8686312 primitive calls) in 22.018 seconds Ordered by: cumulative time List reduced from 378 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 2/3 0.000 0.000 22.005 7.335 threading.py:754(Thread.run) 2/3 0.000 0.000 22.005 7.335 SocketServer.py:586(WSGIServer.process_request_thread) 2/3 0.000 0.000 22.005 7.335 SocketServer.py:332(WSGIServer.finish_request) 2/3 0.000 0.000 22.005 7.335 SocketServer.py:643(WSGIRequestHandler.__init__) 2/3 0.029 0.015 22.004 7.335 wsgi.py:97(WSGIRequestHandler.handle) 10172/10173 0.151 0.000 21.975 0.002 wsgi.py:48(UnixWSGIRequestHandler.handle_one_request) 10169/10170 0.062 0.000 18.968 0.002 handlers.py:76(ServerHandler.run) 10169/10170 0.159 0.000 14.121 0.001 web.py:92(Application.__call__) 10169/10170 0.159 0.000 10.356 0.001 web.py:111(Application.dispatch) 30509 0.138 0.000 6.665 0.000 __init__.py:1139(Logger.info) 30509 0.128 0.000 6.437 0.000 __init__.py:1249(Logger._log) 5084 0.116 0.000 5.157 0.001 server.py:229(Images.put) 30509 0.086 0.000 4.839 0.000 __init__.py:1270(Logger.handle) 30509 0.101 0.000 4.721 0.000 __init__.py:1302(Logger.callHandlers) 30509 0.137 0.000 4.619 0.000 __init__.py:736(RotatingFileHandler.handle) 10170 0.068 0.000 4.572 0.000 handlers.py:117(ServerHandler.finish_response) 10168 0.041 0.000 4.249 0.000 tickets.py:92(Ticket.run) 5084 0.024 0.000 4.163 0.001 server.py:306(Images.patch) 10168 0.043 0.000 4.092 0.000 directio.py:76(Receive.run) 30509 0.115 0.000 4.013 0.000 handlers.py:68(RotatingFileHandler.emit) Again, logging took 6.6 seconds of 22.8 seconds (28.9%). Created attachment 1452430 [details]
Logs and profiles
virt-v2v supports now keep alive connections: https://github.com/libguestfs/libguestfs/commit/f4e0a8342dbeb2c779c76e1807a37b24a0c96feb ovirt-imageio 1.4.2 will support keep alive connections for direct uploads. Supporting proxy uploads requires more work since the proxy duplicated the daemon web server code. There is no change in behavior expect improved performance. The best way to test this it to use imageio example upload script here: https://github.com/oVirt/ovirt-imageio/blob/master/examples/upload with imageio 1.4.1 (does not support keep alive) and imageio 1.4.2. This script support both unix socket and HTTPS, but it will use unix socket automatically. To test HTTPS, disable unix socket support by changing: if "unix_socket" in server_options: con = UnixHTTPConnection(server_options["unix_socket"]) To: if False and "unix_socket" in server_options: con = UnixHTTPConnection(server_options["unix_socket"]) (We may add a flag to make this easier). To test: 1. Create a sparse image using virt-builder: virt-builder fedora-27 2. Create a 6g volume in a storage domain in engine. 3. If the volume is on FC/iSCSI storage, find the volume and activate it: lvs -o vg_name,lv_name,tags | grep disk-uuid lvchange -ay vg_name,lv_name 4. Create this ticket: $ cat ticket.json { "uuid": "test", "size": 6442450944, "url": "file:///dev/vg_name/lv_name", "timeout": 3000, "ops": ["write"] } If the volume is on file storage, use the path to the volume in /rhev/data-center/... 5. Add the ticket to the imageio daemon: $ curl --unix-socket /run/vdsm/ovirt-imageio-daemon.sock \ -X PUT \ --upload-file ticket.json \ http://localhost/tickets/test 6. Upload the image: time examples/upload fedora-27.img https://host:54322/images/test Expected behavior: - sparse upload using HTTPS should be about 2X faster compared with 1.4.1. - sparse upload using unix socket should be about 1.15X faster compared with 1.4.1. Same improvement is expected with virt-v2v -o rhv-upload import, but it is hard to change virt-v2v to use HTTPS instead of unix socket, so you are likely to test only the unix socket, and will not see the improvement in HTTPS upload. Uploading using oVirt SDK examples will not be improved since they are not using multiple requests or support sparse uploads. We will update the examples to use sparse uploads in the future. This bug is referenced in ovirt-imageio release to be shipped in oVirt 4.2.5. Can you please check Target Milestone for this bug? I also see that the single patch attached to this BZ has been merged, can you check if this should be in MODIFIED state? (In reply to Sandro Bonazzola from comment #5) Thanks Sandro, I changed the milestone to 4.2.5, and moved to modified. This feature is included in ovirt-imageio-*-1.4.2, which is part of 4.2.5.
Verified, checked during performance the log and the threads stay's the same through the upload and not closing and opening :
2018-07-30 14:19:32,834 INFO (Thread-10) [web] START: [10.35.68.26] OPTIONS /images/1ce0f599-0458-48e5-8529-4ca009e80962
31 2018-07-30 14:19:32,835 INFO (Thread-10) [web] FINISH [10.35.68.26] OPTIONS /images/1ce0f599-0458-48e5-8529-4ca009e80962: [200] 74 [request=0.000693]
32 2018-07-30 14:19:32,873 INFO (Thread-11) [web] START: [^@] PUT /images/1ce0f599-0458-48e5-8529-4ca009e80962
33 2018-07-30 14:19:32,874 INFO (Thread-11) [images] Writing 54784 bytes at offset 0 flush False to /rhev/data-center/mnt/blockSD/bee83d01-aab7-4581-921f-d0145235c757/images/d19e887a-182d-4723 -912c-2dafa588ab77/27d58f77-5fae-4262-9d04-dc70adea51c2 for ticket 1ce0f599-0458-48e5-8529-4ca009e80962
34 2018-07-30 14:19:32,878 INFO (Thread-11) [web] FINISH [^@] PUT /images/1ce0f599-0458-48e5-8529-4ca009e80962: [200] 0 [request=0.004747, operation=0.004061, read=0.000096, write=0.003067]
35 2018-07-30 14:19:32,879 INFO (Thread-11) [web] START: [^@] PATCH /images/1ce0f599-0458-48e5-8529-4ca009e80962
36 2018-07-30 14:19:32,879 INFO (Thread-11) [images] Zeroing 10752 bytes at offset 54784 flush False to /rhev/data-center/mnt/blockSD/bee83d01-aab7-4581-921f-d0145235c757/images/d19e887a-182d- 4723-912c-2dafa588ab77/27d58f77-5fae-4262-9d04-dc70adea51c2 for ticket 1ce0f599-0458-48e5-8529-4ca009e80962
This bugzilla is included in oVirt 4.2.5 release, published on July 30th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report. |