Bug 1064495

Summary: spice-vdagent-win: don't terminate and re-start vdagent when client disconnects
Product: Red Hat Enterprise Linux 8
Reporter: Jonathon Jongsma <jjongsma>
Component: spice-vdagent-win
Assignee: Jonathon Jongsma <jjongsma>
Status: CLOSED ERRATA
QA Contact: SPICE QE bug list <spice-qe-bugs>
Severity: medium
Priority: medium
CC: apinnick, astepano, bsanford, cfergeau, dblechte, djasa, fziglio, jjongsma, jraju, mkrcmari, rbalakri, rh-spice-bugs, srevivo, tpelka, uril
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: vdagent-win-4.1.5 mingw-spice-vdagent-0.9.0-1
Doc Type: Bug Fix
Clone Of: 1018145
Last Closed: 2018-05-15 18:02:06 UTC
Type: Bug
oVirt Team: Spice
Bug Blocks: 1009648, 1431074, 1431091

Description Jonathon Jongsma 2014-02-12 17:29:07 UTC
The vdagent for Linux is tied to the lifetime of the guest session, but the Windows vdagent is terminated when a client disconnects and is then restarted. This causes a race condition when a client gets disconnected by another client connecting to the guest. The Windows vdagent should probably be changed to match the Linux vdagent behavior.



+++ This bug was initially created as a clone of Bug #1018145 +++

Description of problem:
The monitor auto-configuration doesn't work reliably. The reproducers are not 100% reliable, yielding different results in an otherwise unchanged environment. When another client is connected and the new client takes over its session, the problems are more likely to occur.
What always works:
1) fullscreening window with existing monitors
2) enabling additional guest monitors (when guest monitor count < client monitor count)

What sometimes does work and sometimes doesn't:
3) disabling a guest monitor (when active guest monitor count > client monitor count)
4) when 2 or more guest monitors get enabled at the same time, they are enabled one by one and in a seemingly random layout that sometimes gets fixed afterwards. This causes a lot of flicker before things settle down.
5) this is not actually the subject of this bug, but it may shed a bit more light: when connecting without full screen using a client that was previously run with --full-screen=auto-conf, screens 2-4 are displayed in tiny windows (my guesstimate: 50x150)

Based on 4) and the greater odds of hitting the problems when taking over an existing session, I suspect that --full-screen=auto-conf is somehow racy: it performs interdependent actions in parallel instead of doing them in one go (or, if that is not possible, in an order where a later action won't override an earlier one).

Version-Release number of selected component (if applicable):
mingw-virt-viewer-0.5.6-6.el6 @ Windows 7 32b
vdagent-win-3.3-1 @ Windows 7 64b

How reproducible:
Frequently, but not always

Steps to Reproduce:
1. connect with --full-screen=auto-conf to a guest:
    * with more enabled displays than the client has
    * with some client already connected (taking over its session)

Actual results:
  * "redundant" guest monitors don't get disabled
  * it takes quite some time before monitors settle

Expected results:
  * guest display count always matches client monitor count
  * display geometry is reconfigured just once: guest monitors flash once and that's it

Additional info:
part of this bug was already described in bug 973664 comment 7.

--- Additional comment from Marc-Andre Lureau on 2013-11-07 08:52:59 EST ---

I have not been able to reproduce this issue locally, but I can imagine some of the reasons we still have configuration races over WAN. I would need a remote VM to conduct more testing (that can be arranged easily, I guess). This is probably not mingw-specific, and although it's quite annoying that it doesn't work all the time, I think we can lower the relative importance of this bug, since the user should be able to adjust the configuration manually. So I'd move it to 3.4, where we are still working on other multi-monitor bugs.

--- Additional comment from David Blechter on 2013-12-08 15:47:25 EST ---

Moving to 3.4: no blocker flag was proposed, it never had a PM ack, and we are in the RC phase.

--- Additional comment from Jonathon Jongsma on 2013-12-10 15:36:40 EST ---

I can reproduce this here.

--- Additional comment from Jonathon Jongsma on 2013-12-10 16:40:00 EST ---

From my initial investigation, it seems that the root problem is probably that the Windows vdagent gets restarted whenever a client disconnects. So when client B connects to the guest, it disconnects client A, which triggers the vdagent to restart itself.

[log excerpt from windows vdagent.log]
2940::INFO::2013-12-10 15:05:26,549::set_displays::Set display mode 0x0
2940::INFO::2013-12-10 15:10:05,893::dispatch_message::Client disconnected, agent to be restarted
2940::INFO::2013-12-10 15:10:05,893::handle_control_event::Control command 0
2940::INFO::2013-12-10 15:10:05,893::run::Agent stopped
288::INFO::2013-12-10 15:10:08,909::run::***Agent started in session 1***
288::INFO::2013-12-10 15:10:08,909::log_version::0.5.1.0

However, according to the virt-viewer debug logs, client B does initially see the vdagent as connected, but then gets an 'agent disconnected' message soon thereafter:

[log excerpt from client]
(virt-viewer:8673): virt-viewer-DEBUG: main channel: opened
(virt-viewer:8673): GSpice-DEBUG: spice-channel.c:1104 main-1:0: channel up, state 2
(virt-viewer:8673): GSpice-DEBUG: spice-session.c:1972 set mm time: 616807192
(virt-viewer:8673): GSpice-DEBUG: spice-session.c:1975 spice_session_set_mm_time: mm-time-reset, old 954421725, new 616807192
(virt-viewer:8673): GSpice-DEBUG: channel-main.c:1375 agent connected: yes
(virt-viewer:8673): GSpice-DEBUG: spice-gtk-session.c:464 clipboard_get_targets:
(virt-viewer:8673): GSpice-DEBUG: spice-gtk-session.c:464 clipboard_get_targets:
(virt-viewer:8673): GSpice-DEBUG: channel-main.c:1375 agent connected: no
(virt-viewer:8673): GSpice-DEBUG: spice-channel.c:127 usbredir-9:0: spice_channel_constructed
(virt-viewer:8673): virt-viewer-DEBUG: New spice channel 0x263ac00 SpiceUsbredirChannel 0
(virt-viewer:8673): GSpice-DEBUG: spice-channel.c:127 record-6:0: spice_channel_constructed
(virt-viewer:8673): virt-viewer-DEBUG: New spice channel 0x2633180 SpiceRecordChannel 0
(virt-viewer:8673): GSpice-DEBUG: spice-channel.c:127 playback-5:0: spice_channel_constructed
(virt-viewer:8673): virt-viewer-DEBUG: New spice channel 0x26c2000 SpicePlaybackChannel 0
(virt-viewer:8673): virt-viewer-DEBUG: new audio channel
(virt-viewer:8673): GSpice-DEBUG: spice-channel.c:127 cursor-4:3: spice_channel_constructed
(virt-viewer:8673): virt-viewer-DEBUG: New spice channel 0x26c9c90 SpiceCursorChannel 3
(virt-viewer:8673): GSpice-DEBUG: spice-channel.c:127 display-2:3: spice_channel_constructed
(virt-viewer:8673): virt-viewer-DEBUG: New spice channel 0x26cc7c0 SpiceDisplayChannel 3
...

Some time later, we get a notification that the agent is connected again. Perhaps virt-viewer attempts to do auto-conf in the ~3 seconds between the agent disappearing and reappearing. That would probably explain some of the raciness. Investigation continues.

--- Additional comment from Jonathon Jongsma on 2013-12-11 16:17:53 EST ---

After initially reproducing this bug, I had quite a bit of trouble re-reproducing it.  But I finally managed to. I can reproduce the racy behavior with e.g. the remote-viewer version from rhel6.  But I cannot reproduce it with virt-viewer from git master.  The way that fullscreen auto-conf is done has changed quite a lot upstream, so I'm quite sure that virt-viewer is no longer susceptible to this particular issue. Unfortunately the changes to this particular functionality touch quite a bit of code, so backporting a patch for this issue may not be trivial.

However, it seems to me that it's not necessarily desirable for the Windows vdagent to restart itself upon client disconnection. Especially when the disconnection is caused by another client connecting, this will inevitably be a bit racy. The Linux vdagent does not behave this way: it simply cleans up things such as the clipboard and file transfers when a client disconnects. So perhaps it would be useful to reassign this to the Windows vdagent to change this behavior?

FYI, here's the commit that caused the Windows agent to restart itself on client disconnection:

http://cgit.freedesktop.org/spice/win32/vd_agent/commit/?id=e55103589a62460a1128625740ec8c9126adeb64

--- Additional comment from Jonathon Jongsma on 2013-12-12 12:36:13 EST ---

Since upstream virt-viewer isn't susceptible to this issue, I'm moving this to POST, and we can re-test it after we re-base.

Comment 3 Sandro Bonazzola 2015-10-26 12:39:49 UTC
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and if not a blocker, please postpone to a later release.
All bugs not postponed on GA release will be automatically re-targeted to

- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 4 Yaniv Lavi 2016-05-09 11:07:19 UTC
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.

Comment 10 Jonathon Jongsma 2017-03-16 15:40:53 UTC
potential patch posted: https://lists.freedesktop.org/archives/spice-devel/2017-March/036653.html

Comment 11 Jonathon Jongsma 2017-03-17 21:55:43 UTC
*** Bug 1431074 has been marked as a duplicate of this bug. ***

Comment 12 Jonathon Jongsma 2017-03-17 22:04:34 UTC
*** Bug 1431091 has been marked as a duplicate of this bug. ***

Comment 13 Jonathon Jongsma 2017-04-05 17:12:23 UTC
pushed upstream as a2b4da90a1206c2e9d89fb17037321b73a057075

Comment 15 Bill Sanford 2018-03-21 12:35:18 UTC
Was on wrong bug to verify

Comment 17 Bill Sanford 2018-03-21 12:40:12 UTC
Before the bugfix there was:
1852::INFO::2018-03-21 12:29:03,448::run::Agent stopped
2324::INFO::2018-03-21 12:30:50,089::run::***Agent started in session 1***

Now there is:
dispatch_message::Client disconnected, resetting agent state

Comment 21 errata-xmlrpc 2018-05-15 18:02:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1544

Comment 22 Franta Kust 2019-05-16 13:04:29 UTC
BZ<2>Jira Resync