Bug 1719736 - remote-viewer hangs after ~1 hour of client inactivity when SPICE proxy enabled
Summary: remote-viewer hangs after ~1 hour of client inactivity when SPICE proxy enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: spice-gtk
Version: 7.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Frediano Ziglio
QA Contact: SPICE QE bug list
URL:
Whiteboard:
Depends On:
Blocks: 1428804 1436589
 
Reported: 2019-06-12 12:45 UTC by Frediano Ziglio
Modified: 2020-03-31 20:09 UTC

Fixed In Version: spice-0.14.0-9.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1428804
Environment:
Last Closed: 2020-03-31 20:09:14 UTC
Target Upstream Version:
Embargoed:


Attachments
Proposed backported patch (4.67 KB, application/mbox)
2019-06-12 12:45 UTC, Frediano Ziglio


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2020:1170 | 0 | None | None | None | 2020-03-31 20:09:33 UTC

Description Frediano Ziglio 2019-06-12 12:45:45 UTC
Created attachment 1579827
Proposed backported patch

+++ This bug was initially created as a clone of Bug #1428804 +++

There was a RFE: "Implement a SPICE protocol level keepalive mechanism for all channels" https://bugzilla.redhat.com/show_bug.cgi?id=1298590

It was believed that the above RFE would help solve the RV hangs, but it does not: RV still hangs after a period of client inactivity.

Server:
spice-server-0.12.4-20.el7_3.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64

Client RHEL73:
rpm -qa | grep -E 'spice|viewer'
virt-viewer-2.0-12.el7.x86_64
gnome-font-viewer-3.14.1-4.el7.x86_64
spice-server-0.12.4-20.el7_3.x86_64
spice-gtk3-0.31-6.el7_3.2.x86_64
spice-vdagent-0.14.0-14.el7.x86_64
spice-glib-0.31-6.el7_3.2.x86_64
spice-protocol-0.12.11-1.el7.noarch

or client Windows 10 with spice-client-msi-x86-4.1-6.el7ev.noarch

Steps to Reproduce:
1. Connect to VM with remote-viewer
2. Keep remote-viewer running.
3. Do not interact with VM in any way.

Actual results: After ~30-60 minutes RV hangs. It doesn't respond to any key presses or mouse clicks.

This bug is very tricky in terms of reproduction. It would be good to know what logs are necessary in advance.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-03-03 11:53:07 UTC ---

Since this bug report was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Radek Duda on 2017-03-06 13:03:54 UTC ---

Not reproducible with spice-0.12.4-17.el7 installed on the host. A keepalive packet is sent approximately every 10 minutes.

--- Additional comment from Radek Duda on 2017-03-06 17:21:59 UTC ---

When the SPICE proxy is disabled, keepalive packets are sent and the guest in remote-viewer does not hang. So this seems to be a SPICE proxy problem.

--- Additional comment from Christophe Fergeau on 2017-03-06 17:29:14 UTC ---

What is the exact problem description now?

When spice proxy is enabled,
1) keepalive packets are not sent, and the guest hangs?
2) keepalive packets are sent, but the guest hangs?

--- Additional comment from Frediano Ziglio on 2017-03-06 18:14:56 UTC ---

(In reply to Radek Duda from comment #3)
> When SPICE proxy is disabled, keepalive packets are sent and guest in
> remote-viewer does not hang. So this seems to be a SPICE proxy problem

When the client is connected through a proxy, it is the proxy that should send the keepalive packets; the server cannot, because the proxy operates at L7.
If the proxy (which can actually be any HTTP proxy) is not configured for keepalive packets, this disconnection is expected.

There are some possible solutions:
- configure the proxy for keepalive;
- add keepalive support to the client, so that the client also sends keepalive packets to the proxy to keep the client <-> proxy connection alive;
- implement keepalive at the SPICE protocol level (I would avoid this);
- do not close connections on inactivity (usually not possible, as this is done for security reasons in the network infrastructure).
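
The second option above (keepalive support in the client) can be sketched roughly as follows. This is a minimal Python illustration, not the spice-gtk implementation: the function name `enable_keepalive` and the default timings are assumptions made for the example, and the `TCP_KEEP*` tuning options are applied only where the platform exposes them (as Linux does).

```python
import socket


def enable_keepalive(sock, idle=600, interval=75, count=9):
    """Turn on TCP keepalive probes for an existing socket.

    idle:     seconds of silence before the first probe (hypothetical default)
    interval: seconds between unanswered probes
    count:    unanswered probes before the kernel drops the connection
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning knobs are platform specific; skip missing ones.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Here `idle=600` simply mirrors the roughly 10-minute keepalive cadence reported earlier in this bug. The probes only exercise the TCP connection the socket belongs to, which with a proxy in the path is the client <-> proxy leg.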

--- Additional comment from Radek Duda on 2017-03-07 09:31:17 UTC ---

(In reply to Christophe Fergeau from comment #4)
> What is the exact problem description now?
> 
> When spice proxy is enabled,
> 1) keepalive packets are not sent, and the guest hangs?
> 2) keepalive packets are sent, but the guest hangs?

When the proxy is enabled, keepalive packets are not sent to the client at all.

--- Additional comment from Frediano Ziglio on 2017-03-07 12:38:46 UTC ---

(In reply to Radek Duda from comment #6)
> (In reply to Christophe Fergeau from comment #4)
> > What is the exact problem description now?
> > 
> > When spice proxy is enabled,
> > 1) keepalive packets are not sent, and the guest hangs?
> > 2) keepalive packets are sent, but the guest hangs?
> 
> When proxy is enabled, keepalive packets are not sent to client at all

This is based on proxy configuration as explained above.
We are currently discussing implementing keepalive on the client to keep the client <-> proxy connection alive.

--- Additional comment from Christophe Fergeau on 2017-03-10 16:56:41 UTC ---

This spice-gtk scratch build should send keepalives every 10 minutes, can you test it Radek and see if it helps with the hang you are seeing?
http://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12738921

--- Additional comment from Frediano Ziglio on 2017-03-13 09:19:13 UTC ---

> 
> Actual results: After ~30-60 minutes RV hangs. It doesn't respond to any
> key presses or mouse clicks.
> 

I was just reading this line. If there is no activity on the connection and there is some connection tracking in the middle, it is possible that some hop silently closes the connection without sending any data. However, clicks and key presses are supposed to generate some traffic on the network. At that point the packets (TCP in this case) would simply be dropped. But TCP waits for each packet to be acknowledged, so after some timeout the kernel should close the connection and notify user space, and RV should then detect that the connection has closed.
So why does this not happen when sending key/mouse events? Is the timeout too long? Or does RV just silently close the affected connection (the input channel)?

--- Additional comment from Radek Duda on 2017-03-13 13:02:30 UTC ---

(In reply to Christophe Fergeau from comment #8)
> This spice-gtk scratch build should send keepalives every 10 minutes, can
> you test it Radek and see if it helps with the hang you are seeing?
> http://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12738921

I installed this spice-gtk package on the client (rhel7.3-z) and, with the proxy enabled, detected keepalive packets being sent from the client to the proxy after 10 minutes of inactivity. But that is all: after another 10 minutes no further keepalive packet is sent, and the guest in remote-viewer becomes unresponsive to user input.

Before, I didn't detect any keepalive at all.

--- Additional comment from Radek Duda on 2017-03-13 13:04 UTC ---

Screenshot from Wireshark - a keepalive packet is sent after ~10 minutes
10.34.130.192 - client
13.34.73.1 - proxy

--- Additional comment from Frediano Ziglio on 2017-03-13 14:23:01 UTC ---

(In reply to Radek Duda from comment #10)
> (In reply to Christophe Fergeau from comment #8)
> > This spice-gtk scratch build should send keepalives every 10 minutes, can
> > you test it Radek and see if it helps with the hang you are seeing?
> > http://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12738921
> 
> I installed this spice-gtk package on the client (rhel7.3-z) and, with the
> proxy enabled, detected keepalive packets being sent from the client to the
> proxy after 10 minutes of inactivity. But that is all: after another 10
> minutes no further keepalive packet is sent, and the guest in remote-viewer
> becomes unresponsive to user input.
> 
> Before, I didn't detect any keepalive at all.

There are some odd things here. No patch is attached, so some details may be missing; however:
- the keepalives are only for a single connection (port 49734; the first 2 packets are other traffic). Where are the other connections? I have to suppose that either keepalive is enabled for a single connection only, or the client is closing the other connections, but in that case why does it hang?
- the interval appears to be 10 seconds, but the default for RHEL 7.3 is 75, and the count is 5 (there are 5 keepalive packets), but the default for RHEL 7.3 is 9. Did the patch change these settings?

Surely the client is not properly detecting the disconnection from the server, or is not propagating a channel close to the other channels. I am not really knowledgeable about the spice-gtk code.

--- Additional comment from Frediano Ziglio on 2017-03-13 17:20:07 UTC ---

(In reply to Frediano Ziglio from comment #12)
> (In reply to Radek Duda from comment #10)
> > (In reply to Christophe Fergeau from comment #8)
> > > This spice-gtk scratch build should send keepalives every 10 minutes, can
> > > you test it Radek and see if it helps with the hang you are seeing?
> > > http://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12738921
> > 
> > I installed this spice-gtk package on the client (rhel7.3-z) and, with the
> > proxy enabled, detected keepalive packets being sent from the client to the
> > proxy after 10 minutes of inactivity. But that is all: after another 10
> > minutes no further keepalive packet is sent, and the guest in remote-viewer
> > becomes unresponsive to user input.
> > 
> > Before, I didn't detect any keepalive at all.
> 
> There are some odd things here. No patch is attached, so some details may be
> missing; however:
> - the keepalives are only for a single connection (port 49734; the first 2
> packets are other traffic). Where are the other connections? I have to
> suppose that either keepalive is enabled for a single connection only, or
> the client is closing the other connections, but in that case why does it
> hang?
> - the interval appears to be 10 seconds, but the default for RHEL 7.3 is 75,
> and the count is 5 (there are 5 keepalive packets), but the default for
> RHEL 7.3 is 9. Did the patch change these settings?
> 
> Surely the client is not properly detecting the disconnection from the
> server, or is not propagating a channel close to the other channels. I am
> not really knowledgeable about the spice-gtk code.

Ignore...
Our keepalive packets are the first 2, the ones sent to 13.34.73.1 (the proxy): one for each of 2 different connections (ports 49942 and 49944).
But this seems to indicate that the client closed the connections to the proxy, as otherwise we should see further keepalives. That should be visible to the user too.

--- Additional comment from Christophe Fergeau on 2017-03-24 17:20:32 UTC ---

Most likely not solvable with a client-side TCP keepalive change as I reproduced this on a system with
# cat /proc/sys/net/ipv4/tcp_keepalive_time 
120
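
On Linux, the system-wide keepalive settings live in three files under `/proc/sys/net/ipv4`, of which `tcp_keepalive_time` is the one checked above. A small sketch for collecting all three when gathering data for a report like this one; the helper name `read_keepalive_sysctls` and its `base` parameter are made up for illustration.

```python
from pathlib import Path

KEEPALIVE_SYSCTLS = (
    "tcp_keepalive_time",    # idle seconds before the first probe
    "tcp_keepalive_intvl",   # seconds between probes
    "tcp_keepalive_probes",  # unanswered probes before giving up
)


def read_keepalive_sysctls(base="/proc/sys/net/ipv4"):
    """Return the Linux keepalive sysctls found under `base` as ints.

    Entries that cannot be read (for example on a non-Linux host) are omitted.
    """
    values = {}
    for name in KEEPALIVE_SYSCTLS:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            continue
    return values


if __name__ == "__main__":
    for name, value in read_keepalive_sysctls().items():
        print(f"{name} = {value}")
```

These values only affect sockets on which `SO_KEEPALIVE` has been enabled, and a socket may override them individually.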

--- Additional comment from Christophe Fergeau on 2017-03-24 17:23:44 UTC ---

I reproduced using the rhel74-CLW VM from http://rhv41.spice.brq.redhat.com/ with a proxy set for the console.

--- Additional comment from Christophe Fergeau on 2018-12-17 10:59:19 UTC ---

Adding TCP keepalive was unfortunately not enough, but I'd fix this both in el7 and el8

--- Additional comment from David Blechter on 2019-06-11 14:50:27 UTC ---

moving to 7.8. Too late for 7.7. We need to solve it in rhel 8 as well

--- Additional comment from Frediano Ziglio on 2019-06-11 18:57:34 UTC ---

Both client and server have implemented keepalives. It is just a matter of backporting the needed changesets.

--- Additional comment from Frediano Ziglio on 2019-06-12 11:35:16 UTC ---

spice-server in RHEL 7.7 already has support for keepalive at the TCP level.
Note that with a proxy in the middle, both server and client must support keepalive in order to keep the proxy tunnel alive.
The patch for the client is:

commit 677782fb6aa471d5e6d007744a5c6564b1f3021f
Author: Jeremy White <jwhite>
Date:   Tue Apr 30 17:04:59 2019 -0500

    Detect timeout conditions more aggressively on Linux
    
    This mitigates a fairly rare problem we see with our kiosk mode clients.
    That is, normally if something goes wrong with a client connection
    (e.g. the session is killed, or the server is restarted ), the kiosk will
    exit on disconnect, and we get a chance to retry the connection, or
    present the user with a 'server down' style message.
    
    But in the case of a serious network problem or a server hard power
    cycle (i.e. no TCP FIN packets can flow), our end user behavior is not
    ideal - the kiosk appears to hang solid, requiring a power cycle.
    
    That's because we've got the stock keepalive timeouts, or about 2 hours
    and 11 minutes, before the client sees the disconnect.
    
    This change will cause the client to recognize the server has vanished
    without a TCP FIN after 75 seconds.
    
    See this thread:
      https://lists.freedesktop.org/archives/spice-devel/2017-March/036553.html
    
    As well as this bug:
      https://bugzilla.redhat.com/show_bug.cgi?id=1436589
    
    Signed-off-by: Jeremy White <jwhite>


which is included in spice-gtk version 0.37. The version distributed with RHEL 7.7 is 0.35, so the patch needs to be backported on the client.
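
The "about 2 hours and 11 minutes" in the commit message is consistent with the stock Linux keepalive settings: 7200 seconds of idle time, then 9 probes 75 seconds apart (the 75 and 9 match the RHEL 7.3 defaults quoted earlier in this bug). The arithmetic, as a small sketch; the 30/15/3 combination is only one hypothetical way to reach a 75-second budget, since the values the patch actually uses are not quoted here.

```python
def detection_time(idle, interval, count):
    """Worst-case seconds before keepalive declares a silent peer dead."""
    return idle + interval * count


def as_hms(seconds):
    """Format a number of seconds as hours, minutes and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    # Stock Linux defaults: tcp_keepalive_time, _intvl, _probes.
    print(as_hms(detection_time(7200, 75, 9)))
    # Hypothetical aggressive setting that adds up to 75 seconds.
    print(as_hms(detection_time(30, 15, 3)))
```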

--- Additional comment from David Blechter on 2019-06-12 11:48:02 UTC ---

(In reply to Frediano Ziglio from comment #19)
> spice-server in RHEL 7.7 already has support for keepalive at the TCP level.
> Note that with a proxy in the middle, both server and client must support
> keepalive in order to keep the proxy tunnel alive.
> The patch for the client is:
> 
> commit 677782fb6aa471d5e6d007744a5c6564b1f3021f
> Author: Jeremy White <jwhite>
> Date:   Tue Apr 30 17:04:59 2019 -0500
> 
>     Detect timeout conditions more aggressively on Linux
>     
>     This mitigates a fairly rare problem we see with our kiosk mode clients.
>     That is, normally if something goes wrong with a client connection
>     (e.g. the session is killed, or the server is restarted ), the kiosk will
>     exit on disconnect, and we get a chance to retry the connection, or
>     present the user with a 'server down' style message.
>     
>     But in the case of a serious network problem or a server hard power
>     cycle (i.e. no TCP FIN packets can flow), our end user behavior is not
>     ideal - the kiosk appears to hang solid, requiring a power cycle.
>     
>     That's because we've got the stock keepalive timeouts, or about 2 hours
>     and 11 minutes, before the client sees the disconnect.
>     
>     This change will cause the client to recognize the server has vanished
>     without a TCP FIN after 75 seconds.
>     
>     See this thread:
>      
> https://lists.freedesktop.org/archives/spice-devel/2017-March/036553.html
>     
>     As well as this bug:
>       https://bugzilla.redhat.com/show_bug.cgi?id=1436589
>     
>     Signed-off-by: Jeremy White <jwhite>
> 
> 
> which is included in spice-gtk 0.37 version. The version distributed with
> RHEL 7.7 is 0.35 so the patch needs to be backported on the client.

Does this mean that spice-server is up to date, and this BZ should be closed as Current release (7.7)? A new BZ for spice-gtk should be filed for 7.8 to include the upstream patch. It is too late for 7.7, and it is not a blocker for 7.7.

--- Additional comment from Frediano Ziglio on 2019-06-12 11:55:39 UTC ---

> 
> Does it mean that spice server is up to date, and this Bz should be closed
> as Current release ( 7.7 )? New BZ for spice-gtk should be filed for 7.8 to
> include the upstream patch. It is too late for 7.7, and it is not a blocker
> for 7.7

Yes, it seems reasonable.

--- Additional comment from Frediano Ziglio on 2019-06-12 12:00 UTC ---

Attached patch for spice-gtk (rhel 7.7)

Comment 2 Frediano Ziglio 2019-07-18 09:33:37 UTC
I already backported the required patch.
The flag for 7.8 and the branch on rhpkg are still missing.
Note that RHEL 8.1 is using spice-gtk 0.37, which already contains the required patch.

Comment 3 Victor Toso 2019-09-30 13:48:04 UTC
Based on comment #2

Comment 8 Frediano Ziglio 2019-11-15 15:31:34 UTC
Posted a proposed fix at https://gitlab.freedesktop.org/spice/spice/merge_requests/11.

Comment 15 errata-xmlrpc 2020-03-31 20:09:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1170

