Bug 1298590
| Summary: | [RFE] Implement a SPICE protocol level keepalive mechanism for all channels | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Tim Speetjens <tspeetje> |
| Component: | spice | Assignee: | Default Assignee for SPICE Bugs <rh-spice-bugs> |
| Status: | CLOSED ERRATA | QA Contact: | SPICE QE bug list <spice-qe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.3 | CC: | arne.gogala, cfergeau, dblechte, mburgerh, rduda, sigbjorn, tpelka, tspeetje, ylavi |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | spice | | |
| Fixed In Version: | spice-0.12.4-17.el7 | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Clones: | 1298944 1298945 1298950 1298982 1299373 | Environment: | |
| Last Closed: | 2016-11-04 03:44:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description Tim Speetjens 2016-01-14 13:54:12 UTC
The solution should be in rhevm, libvirt, qemu and spice. The patches for spice are already approved upstream, and work on the qemu ones is in progress. It is not clear which OS the hosts are running: RHEL 6.x, RHEL 7.x, or both? That will determine the qemu and spice server components. I suggest using this bug as the tracking one.

(In reply to David Blechter from comment #2)
> The solution should be in rhevm, libvirt, qemu and spice.

I'm probably missing the reason why anyone but SPICE needs to be involved - why wouldn't the SPICE client send a 'ping' at the SPICE protocol level every X minutes (say, 15 minutes) by default if there has been no activity (or just send it anyway and use it to measure latency)? What is the downside? Clearly, from an implementation point of view, it makes everyone's life easier.

(In reply to Yaniv Kaul from comment #5)
> I'm probably missing the reason why anyone but SPICE needs to be involved -
> why wouldn't the SPICE client send a 'ping' at the SPICE protocol level
> every X minutes (say, 15 minutes) by default if there has been no activity
> (or just send it anyway and use it to measure latency)? What is the
> downside?

There are already some regular ping probes at the SPICE protocol level, but only on the main and display channels. This results in the input (and other) channels sometimes getting closed (i.e. no longer usable) while the display is still functional. We could extend the pings to all channels, but it seems better to me to use the equivalent TCP mechanism. Pings at the SPICE level would not necessarily be easier to implement, as we'd need to make the timeout configurable in RHEV, so this would imply changes in at least spice-server/spice-client and some RHEV components.

In my opinion, this needs to be configurable by admins on the RHEV instance, as it will sometimes be more convenient for that setting to be centrally managed rather than set on each client separately (but I would not be surprised if we need some client-side keepalive setting too at a later point).
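The "equivalent TCP mechanism" mentioned here is the kernel's TCP keepalive which, on Linux, can be tuned per socket rather than only system-wide (this also bears on the per-connection question raised in the next comment). A minimal sketch; the 600/10/3 values are illustrative, not the ones spice-server ships:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn on TCP keepalive for one socket and override the system-wide
 * net.ipv4.tcp_keepalive_* defaults for this connection only.
 * The concrete values here are illustrative. */
static int enable_keepalive(int fd)
{
    int on = 1;
    int idle = 600;  /* seconds of silence before the first probe */
    int intvl = 10;  /* seconds between unanswered probes */
    int cnt = 3;     /* unanswered probes before the connection is dropped */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}
```

Because the probes are generated by the kernel, they flow on every channel's TCP connection regardless of SPICE protocol state, which is what distinguishes this approach from the SPICE-level pings that exist only on the main and display channels.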
(In reply to Christophe Fergeau from comment #6)
> We could extend the pings to all channels, but it seems better to me to use
> the equivalent TCP mechanism.

I'm not sure - I didn't find a way it can be tuned per connection (only system wide). Perhaps I've missed something.

> Pings at the SPICE level would not necessarily be easier to implement, as
> we'd need to make the timeout configurable in RHEV, so this would imply
> changes in at least spice-server/spice-client and some RHEV components.

Why? Just set it to a fixed 10 minutes if there's no activity. Or 1 minute, or any sensible number. The overhead of a ping over all channels every minute (if there's no traffic) is negligible. The effort to implement its configurability across SPICE and RHEV isn't.

(In reply to Yaniv Kaul from comment #7)
> Why? Just set it to a fixed 10 minutes if there's no activity. Or 1 minute,
> or any sensible number. The overhead of a ping over all channels every
> minute (if there's no traffic) is negligible. The effort to implement its
> configurability across SPICE and RHEV isn't.

I suspect 10 minutes isn't going to be low enough for some specific setups. 1 minute should hopefully be good (?), but I wonder if people would complain that it's too much when they don't need it.

(In reply to Christophe Fergeau from comment #8)
> I suspect 10 minutes isn't going to be low enough for some specific setups.
> 1 minute should hopefully be good (?), but I wonder if people would
> complain that it's too much when they don't need it.

Assuming 100 bytes per message and 7 channels (I've forgotten a bit about the SPICE protocol...), we are talking about 1KB/minute when idle... RFC1149 can probably handle such low-bandwidth traffic.

(In reply to Christophe Fergeau from comment #8)
> I suspect 10 minutes isn't going to be low enough for some specific setups.

I think this should be on by default even if we do not support exposing the option in RHEV yet. 1KB/minute sounds very reasonable.

The info was provided by Christophe in https://bugzilla.redhat.com/show_bug.cgi?id=1298590#c6, clearing the needinfo.

I'll define the scope: we should be good with a keepalive from the client which is on by default, every 10 minutes, on all channels, if the session is idle. We don't need this to be configurable from the manager for now.

(In reply to Yaniv Dary from comment #12)
> I'll define the scope: we should be good with a keepalive from the client
> which is on by default, every 10 minutes, on all channels, if the session
> is idle. We don't need this to be configurable from the manager for now.

Correction: on the spice-server, and it will not be configurable. Moving to spice in 7.3.
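Spelling out the 1KB/minute estimate above (the 100-byte probe size, 7-channel count, and 1-minute interval are the commenters' assumptions, not measured values):

```latex
7\ \text{channels} \times 100\ \text{B/min}
  = 700\ \text{B/min}
  \approx 0.7\ \text{KiB/min}
  \approx 93\ \text{bit/s}
```

Even rounded up to 1KB/minute, this is several hundred times below a 56 kbit/s dial-up link.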
No need for tracking.

*** Bug 1298944 has been marked as a duplicate of this bug. ***

Some discussion which happened on rhbz#1298944:

Comment 3 Tim Speetjens 2016-01-20 03:22:59 EST

(In reply to Christophe Fergeau from comment #1)
> This is solved on the SPICE side by
> http://cgit.freedesktop.org/spice/spice/commit/?h=0.12&id=98417d8309893e14efadfe5d2f9002b94d6c56a6

From my understanding, this activates the TCP level keepalive, which is not what this RFE was filed for. From comment #1: TCP connection keepalive is already activated in spice-gtk and spice-gtk3; however, the default TCP parameter (2h on Windows and Linux) is likely higher than the firewall timeout. Having it on the hypervisor side simplifies activating it, though, but having it implemented in the protocol would make the standard use case work out of the box.

Comment 4 Christophe Fergeau 2016-01-20 03:29:12 EST

The description is saying that the default value (2h) is too high; this commit allows keepalive to be enabled server-side and lets the interval with which keepalive probes are sent be changed (i.e. it's possible to send them more often than once every 2 hours). The description also says "Customer has implemented a workaround that seems to be sufficient (TCP keepalive)", so my understanding is that TCP keepalive and SPICE-level pings are going to work equally well as long as they are sent often enough. How long can a connection be idle before it's closed in your customer's setup, by the way?

Comment 5 Tim Speetjens 2016-01-20 03:47:28 EST

(In reply to Christophe Fergeau from comment #4)
> The description is saying that the default value (2h) is too high; this
> commit allows keepalive to be enabled server-side and lets the interval
> with which keepalive probes are sent be changed (i.e. it's possible to
> send them more often than once every 2 hours).

I missed that the timeout is configurable, indeed.

> The description also says "Customer has implemented a workaround that seems
> to be sufficient (TCP keepalive)", so my understanding is that TCP
> keepalive and SPICE-level pings are going to work equally well as long as
> they are sent often enough.

Correct.

> How long can a connection be idle before it's closed in your customer's
> setup, by the way?

In this case, the firewall is configured with a 1h timeout.

As a patch has only limited context: does 98417d8309893e14efadfe5d2f9002b94d6c56a6 activate the TCP keepalive on all channels, or only on the red/main channel?

Flags: needinfo?(cfergeau)

Comment 6 Christophe Fergeau 2016-01-20 04:35:21 EST

(In reply to Tim Speetjens from comment #5)
> As a patch has only limited context: does
> 98417d8309893e14efadfe5d2f9002b94d6c56a6 activate the TCP keepalive on all
> channels, or only on the red/main channel?

Yes, it enables TCP keepalive with a configurable rate for all SPICE channels. However, we won't go with this patch, as RHEV/PM do not want the interval to be configurable, so the value will be hardcoded in spice-server.

Tim, let's keep the ongoing discussion here.

David, I don't understand what you mean, can you clarify?

I don't think http://cgit.freedesktop.org/spice/spice/commit/?id=b9b0590bc053b8fdd9b8ac04256d01a35f0cbcf7 is good enough - shouldn't it set TCP_KEEPINTVL as well?

(In reply to Tim Speetjens from comment #18)
> David, I don't understand what you mean, can you clarify?

Tim, I just wanted to be sure you are aware that there is only one bug remaining for keepalive, and that Christophe copied the discussions from bug 1298944 here.

Re-setting needinfo for comment 19.
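For context on the 2h figure above: on Linux the defaults live in the net.ipv4.tcp_keepalive_* sysctls (stock kernels ship tcp_keepalive_time = 7200 seconds, i.e. the 2 hours mentioned), and the per-socket options shown in the earlier sketch override them for an individual connection. A small sketch that prints the system-wide values via procfs:

```c
#include <stdio.h>

/* Print the system-wide TCP keepalive defaults; these apply to any
 * socket that does not override them with setsockopt(). */
int main(void)
{
    static const char *files[] = {
        "/proc/sys/net/ipv4/tcp_keepalive_time",   /* idle time, seconds */
        "/proc/sys/net/ipv4/tcp_keepalive_intvl",  /* probe interval, seconds */
        "/proc/sys/net/ipv4/tcp_keepalive_probes", /* probe count */
    };
    for (int i = 0; i < 3; i++) {
        FILE *f = fopen(files[i], "r");
        int v;
        if (f == NULL)
            continue;
        if (fscanf(f, "%d", &v) == 1)
            printf("%s = %d\n", files[i], v);
        fclose(f);
    }
    return 0;
}
```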
The patch in rhbz#1298944 is not doing what has been requested in this bug; a different one will be used, which is why this bug is not in POST.

(In reply to Christophe Fergeau from comment #16)
> Yes, it enables TCP keepalive with a configurable rate for all SPICE
> channels. However, we won't go with this patch, as RHEV/PM do not want the
> interval to be configurable, so the value will be hardcoded in
> spice-server.

One comment: if it doesn't complicate the solution, I would want this value configurable in a local conf file and not hardcoded.

(In reply to Yaniv Dary from comment #23)
> One comment: if it doesn't complicate the solution, I would want this value
> configurable in a local conf file and not hardcoded.

There are a lot of reasons not to do it this way.

(In reply to David Blechter from comment #24)
> There are a lot of reasons not to do it this way.

Please share them.

(In reply to Yaniv Dary from comment #23)
> One comment: if it doesn't complicate the solution, I would want this value
> configurable in a local conf file and not hardcoded.

The usual way of doing this kind of configuration is through the libvirt XML definition of the VM, which is what the initial set of patches was doing. Once this is configurable in the libvirt XML, the parameter can be exposed in some local VDSM configuration file if it's deemed important enough, or it can most likely be changed through hooks.

(In reply to Christophe Fergeau from comment #26)
> The usual way of doing this kind of configuration is through the libvirt
> XML definition of the VM, which is what the initial set of patches was
> doing. Once this is configurable in the libvirt XML, the parameter can be
> exposed in some local VDSM configuration file if it's deemed important
> enough, or it can most likely be changed through hooks.

Ok, exposing this in the libvirt XML sounds like a good path as well.

Patches are now in git master.
This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.

(In reply to Christophe Fergeau from comment #28)
> Patches are now in git master.

How was this issue finally resolved? By implementing a fixed keepalive timeout of 10 minutes? I need some hints on how to test it.

(In reply to Radek Duda from comment #31)
> How was this issue finally resolved? By implementing a fixed keepalive
> timeout of 10 minutes?

Yes.

> I need some hints on how to test it.

I tested using wireshark, which will show you the keepalive probes being sent. Another way of testing would be setting up "something" (a proxy?) which cuts TCP connections after 15 minutes of inactivity, and testing the behaviour with an older/newer spice-server when SPICE traffic goes through it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2324.html
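As a concrete starting point for the wireshark test described above, the sketch below opens a TCP connection to a given host and port and then stays completely silent, so any server-side keepalive probes on that connection can be captured (wireshark flags them under the tcp.analysis.keep_alive display filter). It assumes spice-server enables keepalive on sockets as soon as they are accepted and does not reap connections that never complete the SPICE handshake; if that assumption does not hold, capture alongside a real idle client session instead.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to <ip> <port> and idle forever; watch the connection in
 * wireshark for keepalive probes (expected after ~10 minutes with a
 * patched spice-server, per the resolution above). */
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <ip> <port>\n", argv[0]);
        return 1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(atoi(argv[2]));
    if (inet_pton(AF_INET, argv[1], &sa.sin_addr) != 1) {
        fprintf(stderr, "bad address\n");
        return 1;
    }
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("connect");
        return 1;
    }
    pause();  /* never send anything; keepalive probes are kernel-generated */
    return 0;
}
```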