Bug 1158029

Summary: [RFE] Spice: Improve performance over high latency WAN links
Product: Red Hat Enterprise Linux 7
Reporter: Evgheni Dereveanchin <ederevea>
Component: spice
Assignee: Frediano Ziglio <fziglio>
Status: CLOSED WONTFIX
QA Contact: SPICE QE bug list <spice-qe-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.4
CC: amarchuk, bdoran, cfergeau, dblechte, ederevea, fziglio, inetkach, jbuchta, jpullen, lpeer, lsurette, marcandre.lureau, meverett, mtessun, obockows, pzhukov, rbalakri, srevivo, tpelka
Target Milestone: rc
Keywords: FutureFeature
Target Release: 7.4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-05-24 10:59:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Spice
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Evgheni Dereveanchin 2014-10-28 11:17:11 UTC
Description of problem:
SPICE performs poorly over high-latency WAN/VPN links: the mouse cursor lags, and so does text input.

Version-Release number of selected component (if applicable):
virt-viewer 0.6.0
rhev 3.4

How reproducible:
Always

Steps to Reproduce:
1. Set up RHEV with a separate display network (to isolate the problem)
2. Simulate a high-latency WAN link on the display network (replace eth0 with the device name used for the display network):
# tc qdisc add dev eth0 root netem delay 200ms 10ms 25% loss 0.5% 5% corrupt 0.1%  reorder 1% 2%

This adds a delay of 200 ms with ±10 ms jitter (25% correlation), 0.5% packet loss (5% correlation), 0.1% corruption and 1% reordering (2% correlation) to all outgoing packets. The shaper does not limit network throughput.

3. Initiate a SPICE session to a VM running on this host
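
The shaper from step 2 stays active until it is removed, so it helps to be able to inspect and delete it after testing. A minimal sketch, assuming the `tc` tool from iproute2 and root privileges; the `DEV` and `DRY_RUN` variables and the `shaper_*` helper names are illustrative and not part of the original report:

```shell
#!/bin/sh
# Sketch: apply, inspect and remove the netem shaper from step 2.
# DEV, DRY_RUN and the shaper_* names are hypothetical helpers.
DEV="${DEV:-eth0}"   # device used for the display network

# Print the command when DRY_RUN=1, otherwise execute it (needs root).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same netem parameters as in step 2 of the reproducer.
shaper_add() {
    run tc qdisc add dev "$DEV" root netem \
        delay 200ms 10ms 25% loss 0.5% 5% corrupt 0.1% reorder 1% 2%
}

# Show the queueing discipline currently attached to the device.
shaper_show() { run tc qdisc show dev "$DEV"; }

# Remove the root qdisc, restoring the default behaviour of the link.
shaper_del() { run tc qdisc del dev "$DEV" root; }
```

Setting `DRY_RUN=1` before calling `shaper_add` only prints the command, which is a safe way to check the device name first. Calling `shaper_del` after the test session restores the link.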

Actual results:
All interfaces of the VM respond slowly: the mouse lags, text typed into an editor appears with a delay, and a screen redraw takes more than 2 seconds on a 1024x768 display (for example, when opening a black window over a white desktop).

Expected results:
The interface is not affected by latency and remains usable.

Additional info:

Admittedly, a 200 ms delay is very high; however, the RDP protocol seems to handle it much better. At the very least, it draws a local cursor instead of the remote one, which massively improves the user experience and removes perceived lag.

Comment 1 Evgheni Dereveanchin 2014-10-29 07:30:12 UTC
I want to add that the shaper (reproducer step 2) can be applied on the client side as well, so no changes on the RHEV side are required to test this.

Comment 9 Frediano Ziglio 2016-12-02 15:31:59 UTC
I did some investigation.
I instrumented spice-server by adding a message counter.
I tested over a connection with 100 ms latency and 20 Mbit/s of bandwidth.
As expected, the number of commands per second is much higher than on Windows (70 on Windows versus 200 on RHEL 7).
I applied the network improvement patches I sent 18 months ago and saw clear improvements: the command rate goes up to almost 500 during continuous scrolling, and responsiveness is better.
The reason for so many commands compared to Windows is that on RHEL 7 the Mesa implementation splits images into chunks of roughly 64K pixels. This increases the number of commands needed. It also makes video detection and streaming in general much worse, basically causing the spice-server code to create multiple streams.
Besides the possible network optimizations, I think it would be worth collapsing these chunks (the reverse of what Mesa does) to get bigger images and fewer commands.
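
To get a feel for the effect of the chunking, the number of chunks per full-screen update can be estimated with simple arithmetic. This is only a rough sketch: it assumes a chunk size of exactly 65536 pixels (standing in for "64K pixels, more or less") and one draw command per chunk, neither of which is confirmed here; `CHUNK_PIXELS` and `chunks_for` are illustrative names.

```shell
#!/bin/sh
# Estimate how many image chunks one full-screen update is split into.
# CHUNK_PIXELS=65536 is an assumption, not a value taken from Mesa.
CHUNK_PIXELS=65536

# Usage: chunks_for WIDTH HEIGHT
chunks_for() {
    pixels=$(( $1 * $2 ))
    # Round up: a partial chunk still needs its own command.
    echo $(( (pixels + CHUNK_PIXELS - 1) / CHUNK_PIXELS ))
}

chunks_for 1024 768    # prints 12
chunks_for 1920 1080   # prints 32
```

Under these assumptions, a single 1024x768 redraw becomes 12 separate images instead of one, which is the kind of multiplication that collapsing the chunks would undo.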

Comment 10 Frediano Ziglio 2016-12-05 12:22:42 UTC
I have a couple of promising patches specifically for RHEL 7.
I am also trying to get some better statistics.

Comment 11 Marc-Andre Lureau 2016-12-06 14:46:47 UTC
Worth pointing out the upstream series to improve Mesa (it also reduces memcpy and traffic in the guest): drisw/glx: use XShm if possible

Comment 12 Frediano Ziglio 2017-05-24 10:59:57 UTC
Current status:
- the Mesa patch does not fix the issue (https://bugzilla.redhat.com/show_bug.cgi?id=1030024#c19);
- my workaround patch, which works and has been tested, was rejected for 7.4;
- for 7.5 we are going to support Virgl instead of QXL, so the problem will disappear.