Bug 1298590
| Summary: | [RFE] Implement a SPICE protocol level keepalive mechanism for all channels | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Tim Speetjens <tspeetje> |
| Component: | spice | Assignee: | Default Assignee for SPICE Bugs <rh-spice-bugs> |
| Status: | CLOSED ERRATA | QA Contact: | SPICE QE bug list <spice-qe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.3 | CC: | arne.gogala, cfergeau, dblechte, mburgerh, rduda, sigbjorn, tpelka, tspeetje, ylavi |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | spice | | |
| Fixed In Version: | spice-0.12.4-17.el7 | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Clones: | 1298944 1298945 1298950 1298982 1299373 | Environment: | |
| Last Closed: | 2016-11-04 03:44:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description Tim Speetjens 2016-01-14 13:54:12 UTC
The solution should be in rhevm, libvirt, qemu and spice. The patches for spice are already approved upstream, and work on the qemu ones is in progress. It is not clear which OS the hosts are running: RHEL 6.x, RHEL 7.x, or both? That will determine the qemu and spice server components. I suggest using this bug as the tracking one.

(In reply to David Blechter from comment #2)
> The solution should be in rhevm, libvirt, qemu and spice.

I'm probably missing the reason why anyone but SPICE needs to be involved - why wouldn't the SPICE client send a 'ping' at the SPICE protocol level every X minutes (say, 15 minutes) by default if there has been no activity (or just send it anyway and use it to measure latency)? What is the downside? Clearly, from an implementation point of view, it makes everyone's life easier.

(In reply to Yaniv Kaul from comment #5)
> I'm probably missing the reason why anyone but SPICE needs to be involved -
> why wouldn't the SPICE client send a 'ping' at the SPICE protocol level
> every X minutes (say, 15 minutes) by default if there has been no activity
> (or just send it anyway and use it to measure latency)? What is the
> downside?

There are already some regular ping probes at the SPICE protocol level, but only on the main and display channels. This results in the input (and other) channels sometimes getting closed (i.e. no longer usable) while the display is still functional. We could extend the pings to all channels, but it seems better to me to use the equivalent TCP mechanism. Pings at the SPICE level would not necessarily be easier to implement, as we'd need to make the timeout configurable in RHEV, so this would imply changes in at least spice-server/spice-client and some RHEV components.

In my opinion, this needs to be configurable by admins on the RHEV instance, as it will sometimes be more convenient for that setting to be centrally managed rather than set on each client separately (but I would not be surprised if we need some client-side keepalive setting too at a later point).
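The "equivalent TCP mechanism" mentioned here is the kernel's TCP keepalive which, on Linux, can be tuned per socket rather than only system-wide (this also bears on the per-connection question raised in the next comment). A minimal sketch; the 600/10/3 values are illustrative, not the ones spice-server ships:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn on TCP keepalive for one socket and override the system-wide
 * net.ipv4.tcp_keepalive_* defaults for this connection only.
 * The concrete values here are illustrative. */
static int enable_keepalive(int fd)
{
    int on = 1;
    int idle = 600;  /* seconds of silence before the first probe */
    int intvl = 10;  /* seconds between unanswered probes */
    int cnt = 3;     /* unanswered probes before the connection is dropped */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}
```

Because the probes are generated by the kernel, they flow on every channel's TCP connection regardless of SPICE protocol state, which is what distinguishes this approach from the SPICE-level pings that exist only on the main and display channels.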
(In reply to Christophe Fergeau from comment #6)
> We could extend the pings to all channels, but it seems better to me to use
> the equivalent TCP mechanism.

I'm not sure - I didn't find a way it can be tuned per connection (only system wide). Perhaps I've missed something.

> Pings at the SPICE level would not necessarily be easier to implement, as
> we'd need to make the timeout configurable in RHEV, so this would imply
> changes in at least spice-server/spice-client and some RHEV components.

Why? Just set it to a fixed 10 minutes if there's no activity. Or 1 minute, or any sensible number. The overhead of a ping over all channels every minute (if there's no traffic) is negligible. The effort to implement its configurability across SPICE and RHEV isn't.

(In reply to Yaniv Kaul from comment #7)
> Why? Just set it to a fixed 10 minutes if there's no activity. Or 1 minute,
> or any sensible number. The overhead of a ping over all channels every
> minute (if there's no traffic) is negligible. The effort to implement its
> configurability across SPICE and RHEV isn't.

I suspect 10 minutes isn't going to be low enough for some specific setups. 1 minute should hopefully be good (?), but I wonder if people would complain that it's too much when they don't need it.

(In reply to Christophe Fergeau from comment #8)
> I suspect 10 minutes isn't going to be low enough for some specific setups.
> 1 minute should hopefully be good (?), but I wonder if people would
> complain that it's too much when they don't need it.

Assuming 100 bytes per message and 7 channels (I've forgotten a bit about the SPICE protocol...), we are talking about 1KB/minute when idle... RFC1149 can probably handle such low-bandwidth traffic.

(In reply to Christophe Fergeau from comment #8)
> I suspect 10 minutes isn't going to be low enough for some specific setups.

I think this should be on by default even if we do not support exposing the option in RHEV yet. 1KB/minute sounds very reasonable.

The info was provided by Christophe in https://bugzilla.redhat.com/show_bug.cgi?id=1298590#c6, clearing the needinfo.

I'll define the scope: we should be good with a keepalive from the client which is on by default, every 10 minutes, on all channels, if the session is idle. We don't need this to be configurable from the manager for now.

(In reply to Yaniv Dary from comment #12)
> I'll define the scope: we should be good with a keepalive from the client
> which is on by default, every 10 minutes, on all channels, if the session
> is idle. We don't need this to be configurable from the manager for now.

Correction: on the spice-server, and it will not be configurable. Moving to spice in 7.3.
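Spelling out the 1KB/minute estimate above (the 100-byte probe size, 7-channel count, and 1-minute interval are the commenters' assumptions, not measured values):

```latex
7\ \text{channels} \times 100\ \text{B/min}
  = 700\ \text{B/min}
  \approx 0.7\ \text{KiB/min}
  \approx 93\ \text{bit/s}
```

Even rounded up to 1KB/minute, this is several hundred times below a 56 kbit/s dial-up link.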
No need for tracking.

*** Bug 1298944 has been marked as a duplicate of this bug. ***

Some discussion which happened on rhbz#1298944:

Comment 3 Tim Speetjens 2016-01-20 03:22:59 EST

(In reply to Christophe Fergeau from comment #1)
> This is solved on the SPICE side by
> http://cgit.freedesktop.org/spice/spice/commit/?h=0.12&id=98417d8309893e14efadfe5d2f9002b94d6c56a6

From my understanding, this activates the TCP level keepalive, which is not what this RFE was filed for. From comment #1: TCP connection keepalive is already activated in spice-gtk and spice-gtk3; however, the default TCP parameter (2h on Windows and Linux) is likely higher than the firewall timeout. Having it on the hypervisor side simplifies activating it, though, but having it implemented in the protocol would make the standard use case work out of the box.

Comment 4 Christophe Fergeau 2016-01-20 03:29:12 EST

The description is saying that the default value (2h) is too high; this commit allows keepalive to be enabled server-side and lets the interval with which keepalive probes are sent be changed (i.e. it's possible to send them more often than once every 2 hours). The description also says "Customer has implemented a workaround that seems to be sufficient (TCP keepalive)", so my understanding is that TCP keepalive and SPICE-level pings are going to work equally well as long as they are sent often enough. How long can a connection be idle before it's closed in your customer's setup, by the way?

Comment 5 Tim Speetjens 2016-01-20 03:47:28 EST

(In reply to Christophe Fergeau from comment #4)
> The description is saying that the default value (2h) is too high; this
> commit allows keepalive to be enabled server-side and lets the interval
> with which keepalive probes are sent be changed (i.e. it's possible to
> send them more often than once every 2 hours).

I missed that the timeout is configurable, indeed.

> The description also says "Customer has implemented a workaround that seems
> to be sufficient (TCP keepalive)", so my understanding is that TCP
> keepalive and SPICE-level pings are going to work equally well as long as
> they are sent often enough.

Correct.

> How long can a connection be idle before it's closed in your customer's
> setup, by the way?

In this case, the firewall is configured with a 1h timeout.

As a patch has only limited context: does 98417d8309893e14efadfe5d2f9002b94d6c56a6 activate the TCP keepalive on all channels, or only on the red/main channel?

Flags: needinfo?(cfergeau)

Comment 6 Christophe Fergeau 2016-01-20 04:35:21 EST

(In reply to Tim Speetjens from comment #5)
> As a patch has only limited context: does
> 98417d8309893e14efadfe5d2f9002b94d6c56a6 activate the TCP keepalive on all
> channels, or only on the red/main channel?

Yes, it enables TCP keepalive with a configurable rate for all SPICE channels. However, we won't go with this patch, as RHEV/PM do not want the interval to be configurable, so the value will be hardcoded in spice-server.

Tim, let's keep the ongoing discussion here.

David, I don't understand what you mean, can you clarify?

I don't think http://cgit.freedesktop.org/spice/spice/commit/?id=b9b0590bc053b8fdd9b8ac04256d01a35f0cbcf7 is good enough - shouldn't it set TCP_KEEPINTVL as well?

(In reply to Tim Speetjens from comment #18)
> David, I don't understand what you mean, can you clarify?

Tim, I just wanted to be sure you are aware that there is only one bug remaining for keepalive, and that Christophe copied the discussions from bug 1298944 here.

Re-setting needinfo for comment 19.
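For context on the 2h figure above: on Linux the defaults live in the net.ipv4.tcp_keepalive_* sysctls (stock kernels ship tcp_keepalive_time = 7200 seconds, i.e. the 2 hours mentioned), and the per-socket options shown in the earlier sketch override them for an individual connection. A small sketch that prints the system-wide values via procfs:

```c
#include <stdio.h>

/* Print the system-wide TCP keepalive defaults; these apply to any
 * socket that does not override them with setsockopt(). */
int main(void)
{
    static const char *files[] = {
        "/proc/sys/net/ipv4/tcp_keepalive_time",   /* idle time, seconds */
        "/proc/sys/net/ipv4/tcp_keepalive_intvl",  /* probe interval, seconds */
        "/proc/sys/net/ipv4/tcp_keepalive_probes", /* probe count */
    };
    for (int i = 0; i < 3; i++) {
        FILE *f = fopen(files[i], "r");
        int v;
        if (f == NULL)
            continue;
        if (fscanf(f, "%d", &v) == 1)
            printf("%s = %d\n", files[i], v);
        fclose(f);
    }
    return 0;
}
```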
The patch in rhbz#1298944 is not doing what has been requested in this bug; a different one will be used, which is why this bug is not in POST.

(In reply to Christophe Fergeau from comment #16)
> Yes, it enables TCP keepalive with a configurable rate for all SPICE
> channels. However, we won't go with this patch, as RHEV/PM do not want the
> interval to be configurable, so the value will be hardcoded in
> spice-server.

One comment: if it doesn't complicate the solution, I would want this value configurable in a local conf file and not hardcoded.

(In reply to Yaniv Dary from comment #23)
> One comment: if it doesn't complicate the solution, I would want this value
> configurable in a local conf file and not hardcoded.

There are a lot of reasons not to do it this way.

(In reply to David Blechter from comment #24)
> There are a lot of reasons not to do it this way.

Please share them.

(In reply to Yaniv Dary from comment #23)
> One comment: if it doesn't complicate the solution, I would want this value
> configurable in a local conf file and not hardcoded.

The usual way of doing this kind of configuration is through the libvirt XML definition of the VM, which is what the initial set of patches was doing. Once this is configurable in the libvirt XML, the parameter can be exposed in some local VDSM configuration file if it's deemed important enough, or it can most likely be changed through hooks.

(In reply to Christophe Fergeau from comment #26)
> The usual way of doing this kind of configuration is through the libvirt
> XML definition of the VM, which is what the initial set of patches was
> doing. Once this is configurable in the libvirt XML, the parameter can be
> exposed in some local VDSM configuration file if it's deemed important
> enough, or it can most likely be changed through hooks.

Ok, exposing this in the libvirt XML sounds like a good path as well.

Patches are now in git master.
This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.

(In reply to Christophe Fergeau from comment #28)
> Patches are now in git master.

How was this issue finally resolved? By implementing a fixed keepalive timeout of 10 minutes? I need some hints on how to test it.

(In reply to Radek Duda from comment #31)
> How was this issue finally resolved? By implementing a fixed keepalive
> timeout of 10 minutes?

Yes.

> I need some hints on how to test it.

I tested using wireshark, which will show you the keepalive probes being sent. Another way of testing would be setting up "something" (a proxy?) which cuts TCP connections after 15 minutes of inactivity, and testing the behaviour with an older/newer spice-server when SPICE traffic goes through it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2324.html
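As a concrete starting point for the wireshark test described above, the sketch below opens a TCP connection to a given host and port and then stays completely silent, so any server-side keepalive probes on that connection can be captured (wireshark flags them under the tcp.analysis.keep_alive display filter). It assumes spice-server enables keepalive on sockets as soon as they are accepted and does not reap connections that never complete the SPICE handshake; if that assumption does not hold, capture alongside a real idle client session instead.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to <ip> <port> and idle forever; watch the connection in
 * wireshark for keepalive probes (expected after ~10 minutes with a
 * patched spice-server, per the resolution above). */
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <ip> <port>\n", argv[0]);
        return 1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(atoi(argv[2]));
    if (inet_pton(AF_INET, argv[1], &sa.sin_addr) != 1) {
        fprintf(stderr, "bad address\n");
        return 1;
    }
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("connect");
        return 1;
    }
    pause();  /* never send anything; keepalive probes are kernel-generated */
    return 0;
}
```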