Bug 732073 - Vdagent responds very slowly
Summary: Vdagent responds very slowly
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: spice-vdagent-win
Version: ---
Hardware: Unspecified
OS: Windows
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Arnon Gilboa
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On: 572483 714908
Blocks:
Reported: 2011-08-19 17:23 UTC by Marian Krcmarik
Modified: 2019-10-10 14:19 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-09-21 16:59:35 UTC
Type: ---
Target Upstream Version:


Attachments
vdservice.log (13.46 KB, text/plain), 2011-08-19 17:24 UTC, Marian Krcmarik
/vdagent.log (55.84 KB, text/plain), 2011-08-19 17:25 UTC, Marian Krcmarik
vdservice (2).log (12.16 KB, text/plain), 2011-08-19 17:26 UTC, Marian Krcmarik
vdagent (2).log (74.29 KB, text/plain), 2011-08-19 17:26 UTC, Marian Krcmarik
vdservice (3).log (2.25 KB, text/plain), 2011-08-19 17:27 UTC, Marian Krcmarik

Description Marian Krcmarik 2011-08-19 17:23:08 UTC
Description of problem:
Vdagent becomes slow to respond after the VM has been left untouched (spice client disconnected) for a while, possibly for hours. This is visible when invoking features that vdagent is responsible for:
- reaction to mouse clicks is very slow
- changing the resolution to the client's native one when connecting to the VM is very slow; the connection is very slow but successful
- copy-paste is very slow; even copying several characters between guest and client can take 10 seconds.

* Restarting the service does not help.
* Stopping the service helps -> server mode works
* I can reproduce that on Windows7 (both 32/64bit) - so far not on WinXP. 

I am attaching vdservice/vdagent logs of 3 different VMs:
vdservice.log/vdagent.log - behaviour occurs, stop service, start service, behaviour still occurs, stop service, mouse works -> server mode.
vdservice (2).log/vdagent (2).log - behaviour occurs (I inserted an empty line in the log), disconnecting spice client, connecting spice client, behaviour still occurs.
vdservice (3).log - behaviour occurs

Version-Release number of selected component (if applicable):
rhev-guest-tools-iso-3.0-14.noarch, which contains vdagent-win-0.1-8.
RHEVM3.0 (ic136)
Guests: Windows7 with latest updates
Host: RHEL6.1 with hybrid repo

How reproducible:
Often

Steps to Reproduce:
1. Start Windows7 guest with installed RHEV tools.
2. Connect to the guest with spice client.
3. Disconnect.
4. Connect again after some time (hours).
  
Actual results:
vdagent responds slowly: the features it provides (copy-paste, automatic resolution adjustment, client mouse) behave slowly.

Additional info:
This is only a guess, but it seems to occur when using the "2.2" based driver qxl-win-0.1-9. I updated to the one from spice space (0.1-10) on the guest where it occurred permanently and was not able to reproduce.
I could provide such a VM with a broken vdagent.

Comment 1 Marian Krcmarik 2011-08-19 17:24:52 UTC
Created attachment 519071 [details]
vdservice.log

Comment 2 Marian Krcmarik 2011-08-19 17:25:34 UTC
Created attachment 519072 [details]
/vdagent.log

Comment 3 Marian Krcmarik 2011-08-19 17:26:06 UTC
Created attachment 519073 [details]
vdservice (2).log

Comment 4 Marian Krcmarik 2011-08-19 17:26:34 UTC
Created attachment 519074 [details]
vdagent (2).log

Comment 5 Marian Krcmarik 2011-08-19 17:27:08 UTC
Created attachment 519075 [details]
vdservice (3).log

Comment 6 Andrew Cathrow 2011-08-21 00:10:19 UTC
Might this relate to the virtio-serial/S3 issue?

Comment 7 Marian Krcmarik 2011-08-22 08:21:46 UTC
(In reply to comment #0)

> Additional info:
> This is only a shot but It seems that It occurs when using the "2.2" based
> driver qxl-win-0.1-9. I updated to the one from spice space (0.1-10) on the
> guest It occurs permanently and was not able to reproduce.

Scratch this observation. I am able to reproduce with qxl-win 0.1-10; it seems to be related more to the virtio driver + qxl driver combination.

> I could provide such VM with broken vdagent.

Comment 8 Arnon Gilboa 2011-08-29 12:29:04 UTC
(In reply to comment #7)
> (In reply to comment #0)
> 
> > Additional info:
> > This is only a shot but It seems that It occurs when using the "2.2" based
> > driver qxl-win-0.1-9. I updated to the one from spice space (0.1-10) on the
> > guest It occurs permanently and was not able to reproduce.
> 
> Scratch this observation. I m able to reproduce with 01.-10 qxl-win, It seems
> more to be related to virtio driver + qxl driver combination.
> 
> > I could provide such VM with broken vdagent.

Since "Restarting of service does not help", it seems this is not a vdagent issue. I have a win7x64 guest with the latest vdagent, up for more than a week now, with no responsiveness issues. I guess my virtio-serial driver is much older than yours (6.1.7600.16385 2/21/2011). Please try to reproduce with an older virtio-serial driver to see if the regression is there.

Comment 9 Marian Krcmarik 2011-08-29 17:46:46 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #0)
> > 
> > > Additional info:
> > > This is only a shot but It seems that It occurs when using the "2.2" based
> > > driver qxl-win-0.1-9. I updated to the one from spice space (0.1-10) on the
> > > guest It occurs permanently and was not able to reproduce.
> > 
> > Scratch this observation. I m able to reproduce with 01.-10 qxl-win, It seems
> > more to be related to virtio driver + qxl driver combination.
> > 
> > > I could provide such VM with broken vdagent.
> 
> "Restarting of service does not help", seems like it's not a vdagent issue. I
> have a win7x64 guest with latest vdagent, up for more than a week now, with no
> responsiveness issues. I guess the virtio-serial driver is much older than
> yours (6.1.7600.16385 2/21/2011). Please try to repro with older virtio-serial
> driver to see if the regression is there.

I am not able to reproduce with an older virtio-serial driver (virtio-win-prewhql-0.1-10), so it seems to be a virtio-serial issue with vdagent.

Comment 10 Arnon Gilboa 2011-09-04 14:45:08 UTC
Vadim, please take a look. Do you suspect recent changes in the driver? Any insights?

Comment 11 Vadim Rozenfeld 2011-09-04 15:35:26 UTC
(In reply to comment #10)
> Vadim, please give it a look. Suspected recent changes in the driver? Any
> insights?

I cannot say for sure, but it should be easy to check.
If increasing the \HKLM\System\CurrentControlSet\Control\WaitToKillServiceTimeout
registry value makes vdagent even less responsive, then the problem is in the read request completion logic.

Comment 12 Marian Krcmarik 2011-09-05 13:06:20 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > Vadim, please give it a look. Suspected recent changes in the driver? Any
> > insights?
> 
> Cannot say it for sure, but it should be easy to check.
> If increasing \HKLM\System\CurrentControlSet\Control\WaitToKillServiceTimeout
> registry value will make vdagent even less responsive - then it is a problem in
> the read request completion logic.

I've reproduced with virtio-win-prewhql-0.1-15. I also changed the registry item WaitToKillServiceTimeout from 12000 to 48000, and the responsiveness of vdagent seems to be much lower. It's hard to measure, but it's visible from the user's perspective.
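
One way to see why this result points at read completion is a toy model, sketched below in Python. It assumes (this is not confirmed anywhere in this report) that a pending read returns data only when its timeout expires, and that reads are reissued back to back starting at t=0. The function name and the 5001 ms arrival time are made up for illustration; 200, 12000 and 48000 are registry values tried in this bug.

```python
def delivery_delay_ms(arrival_ms, timeout_ms, completes_on_data):
    """Delay between data arriving and a pending read handing it to the agent.

    Hypothetical model: reads are issued back to back from t=0, each with
    the same timeout, so timeouts expire at multiples of timeout_ms.
    """
    if completes_on_data:
        # Healthy case: the pending read completes as soon as data arrives.
        return 0
    # Suspected faulty case: the read returns only when its timeout expires,
    # so the data waits until the next multiple of timeout_ms.
    return (-arrival_ms) % timeout_ms


for timeout in (200, 12000, 48000):
    print(timeout, delivery_delay_ms(5001, timeout, completes_on_data=False))
```

Under this model the delay for the same message grows from 199 ms to 6999 ms and then 42999 ms as the timeout is raised, which matches the direction of the change observed above.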

Comment 13 Vadim Rozenfeld 2011-09-05 13:30:06 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > Vadim, please give it a look. Suspected recent changes in the driver? Any
> > > insights?
> > 
> > Cannot say it for sure, but it should be easy to check.
> > If increasing \HKLM\System\CurrentControlSet\Control\WaitToKillServiceTimeout
> > registry value will make vdagent even less responsive - then it is a problem in
> > the read request completion logic.
> 
> I've reproduced with virtio-win-prewhql-0.1-15 and as well I adjusted the
> registry item WaitToKillServiceTimeo from 12000 to 48000 and it seems that
> responsiveness of vdagent is much lower. It's hard to measure but It's visible
> from user's persepctive.

Thank you, Marian.

Arnon, is it urgent? 
If yes, we can try to fix it in 6.2.
If not, I would like to fix the problem in 6.2.z/6.3.

btw, is it OK if I assign this bug to myself, or would you like
to create a new bug against virtio-win?


Best,
Vadim.

Comment 14 Arnon Gilboa 2011-09-05 13:42:11 UTC
(In reply to comment #13)

> 
> Thank you, Marian.
> 
> Arnon, is it urgent? 
> If yes, we can try to fix it in 6.2.
> If not, I would like to fix the problem in 6.2.z./6.3.
> 
> btw, is it OK if I'll assign this bug to myself, or would you like 
> to create a new bug against virtio-win?
> 
> 
> Best,
> Vadim.

Vadim, to me it looks like an urgent issue, because the guest becomes almost unusable after a while (when using the agent). I guess acathrow can give us a better answer.

BTW, what about the RHEV agent? Isn't it reproducible there?

What do you mean by "problem in the read request completion logic"? Can we patch something on the agent side because of the driver changes (in addition to the re-read on timeout which we fixed in the past), or does it seem like a driver bug?

Comment 15 Vadim Rozenfeld 2011-09-05 14:13:42 UTC
(In reply to comment #14)
> (In reply to comment #13)
> 
> > 
> > Thank you, Marian.
> > 
> > Arnon, is it urgent? 
> > If yes, we can try to fix it in 6.2.
> > If not, I would like to fix the problem in 6.2.z./6.3.
> > 
> > btw, is it OK if I'll assign this bug to myself, or would you like 
> > to create a new bug against virtio-win?
> > 
> > 
> > Best,
> > Vadim.
> 
> Vadim, to me it looks like an urgent issue, because the guest becomes almost

I totally agree with you. I would prefer to fix it right now, even if it costs us a delay of several days before starting the WHQL submission.

> unusable after a while (when using the agent). I guess acathrow can give us a
> better answer.
> 
> BTW, what about rhev agent? isn't it reproduced there?
I don't know, but it should be.
> 
> What do you mean by "problem in the read request completion logic"? can we
> patch something in the agent side due to driver changes (in addition to re-read
> on timeout which we fixed in the past) or it seems like a driver bug?

It smells like a driver bug, and it must be fixed in the driver.
But you can try reading the port with a zero or very short (60...100 millisecond) wait period on the first iteration of the port reading loop, right after (re-)opening the port, and then switch back to the normal waiting period.
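
The agent-side mitigation suggested above could look roughly like the following Python sketch. It is not vdagent code: `FakePort`, `wait_periods_ms` and `read_loop` are hypothetical names, the fake port stands in for the virtio-serial port, and the 12000 ms "normal" period is only a placeholder, since this report does not state the agent's real value.

```python
def wait_periods_ms(short_ms=100, normal_ms=12000):
    """Yield the wait period to use for each read after a port is (re)opened."""
    yield short_ms          # first iteration: zero or very short wait
    while True:
        yield normal_ms     # every later iteration: normal waiting period


class FakePort:
    """Hypothetical stand-in for the port; records the timeout of each read."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.timeouts = []

    def read(self, timeout_ms):
        self.timeouts.append(timeout_ms)
        # Return the next queued message, or None to signal a timeout.
        return self.messages.pop(0) if self.messages else None


def read_loop(port, max_reads, short_ms=100, normal_ms=12000):
    """Read up to max_reads times from a freshly opened port."""
    received = []
    periods = wait_periods_ms(short_ms, normal_ms)
    for _ in range(max_reads):
        data = port.read(next(periods))
        if data is not None:
            received.append(data)
    return received


port = FakePort([b"a", b"b"])
print(read_loop(port, 3), port.timeouts)
```

A fresh `wait_periods_ms()` generator would be created each time the port is reopened, so that the short wait applies again to the first read after every reopen.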

Comment 16 Marian Krcmarik 2011-09-06 15:17:06 UTC
It seems that this issue is triggered by the sleep state. I have no idea why I missed that, but it may be because the behaviour on entering sleep differs between setups:

1. I have a guest with the qxl driver 0.1-9 ("2.2" based) or 0.1-10 ("3.0" based) and the virtio-serial driver (virtio-win-prewhql-0.1-13 based or virtio-win-prewhql-0.1-15). Automatic sleep is set to (let's say) 5 minutes. I close the spice session and leave the guest alone for (let's say) 10 minutes, then I open the spice session again. The guest is alive and functional; only the responsiveness of vdagent is very low.

2. I have a guest with the qxl driver 0.1-9 ("2.2" based) and the virtio-serial driver (virtio-win-prewhql-0.1-13 based or virtio-win-prewhql-0.1-15). I enter the sleep state directly from the guest's Start -> Sleep button and the guest hangs.

3. I have a guest with the qxl driver 0.1-10 ("3.0" based) and the virtio-serial driver (virtio-win-prewhql-0.1-13 based or virtio-win-prewhql-0.1-15). I enter the sleep state directly from the guest's Start -> Sleep button; the guest does not hang and after a while "wakes up". The guest is functional; only vdagent responds very slowly.

I am sorry I did not notice that. My templates, in which power management was disabled, must have been changed somehow (no idea how or by whom).

Anyway, I am trying to reproduce with power management turned off, just in case, since I believed it was disabled and I had checked it earlier.

Comment 17 Marian Krcmarik 2011-09-07 16:18:31 UTC
(In reply to comment #16)
> It seems that this issue is triggered by sleep state which I have no idea why I
> missed but maybe because the behaviour of sleep state change:
> 
> 1. I have a guest with qxl driver - 0.1-9 ("2.2" based) or 0.1-10 ("3.0" based)
> and virtio-serial driver (virtio-win-prewhql-0.1-13 based or
> virtio-win-prewhql-0.1-15), Automatic sleep state is set to (let's say) 5 mins,
> I close spice session and leave guest alone for (let's say) 10 minutes, then I
> open Spice session again, Guest is alive and functional, only responsiveness of
> vdagent is very low.
> 
> 2. I have a guest with qxl driver - 0.1-9 ("2.2" based) and virtio-serial
> driver (virtio-win-prewhql-0.1-13 based or virtio-win-prewhql-0.1-15). I enter
> sleep state directly from guest Start -> Sleep button and Guest hangs.
> 
> 3. I have a guest with qxl driver - 0.1-10 ("3.0" based) and virtio-serial
> driver (virtio-win-prewhql-0.1-13 based or virtio-win-prewhql-0.1-15). I enter
> sleep state directly from guest Start -> Sleep button and guest does not hang
> and after a while "wakes up", guest is functional, only vdagent responds very
> slowly.
> 
> I am sorry I did not notice that, My templates where power management was
> disabled had to be changed somehow (no idea how and who).
> 
> Anyway I am trying to reproduce with power management turned off, just in case,
> since I believed It was disabled and I checked it earlier.

Setting the registry item \HKLM\System\CurrentControlSet\Control\WaitToKillServiceTimeout (mentioned in comment #11) to a very low value (from 12000 to 200) solves the very low responsiveness of vdagent in cases 1. and 3. (see above).
Once the WaitToKillServiceTimeout item has (for example) the value 200, vdagent responds normally after waking up from the sleep state.
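
This workaround can be scripted. The Python sketch below only builds the command line for the stock Windows `reg add` tool rather than running it. The helper name is hypothetical, and REG_SZ is assumed to be the type of this value (it is normally stored as a string), which is worth confirming on the guest before applying. Writing under HKLM requires an elevated prompt.

```python
CONTROL_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control"
VALUE_NAME = "WaitToKillServiceTimeout"


def reg_add_command(timeout_ms):
    """Return the argv list for `reg add` that sets the timeout (milliseconds)."""
    if not isinstance(timeout_ms, int) or timeout_ms <= 0:
        raise ValueError("timeout_ms must be a positive integer")
    return [
        "reg", "add", CONTROL_KEY,
        "/v", VALUE_NAME,
        "/t", "REG_SZ",          # assumed type: the value is stored as a string
        "/d", str(timeout_ms),
        "/f",                    # overwrite the existing value without prompting
    ]


print(" ".join(reg_add_command(200)))
```

On the guest, the returned list could be passed to `subprocess.run(..., check=True)`; restoring the original behaviour is the same call with 12000.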

Comment 18 Arnon Gilboa 2011-09-08 07:47:51 UTC
1. As Marian mentioned in comment 17, setting the timeout to a lower value (200) seems to be a reasonable workaround for keeping responsiveness close to normal.
2. Sleep is currently not supported by the virtio-serial driver (714908) or by vdagent (572483), and until both support sleep, this BZ seems irrelevant.
3. Before the recent changes to the virtio-serial driver (adding a timeout to the read operations, see 725734#c5), a read call by vdagent after sleep/wakeup never completed (it hung, see the BZs in 2.), leaving vdagent dysfunctional (mouse unresponsiveness etc.). According to Vadim & Ronen, they are going to revert the driver patches for read timeouts (they were a hack covering a RHEV agent design issue, which has already been fixed). This will cause the read hang to reappear and make this BZ unreproducible.
 
I guess we should close this bug (resolution?), or move it to virtio-serial.

Comment 19 Cameron Meadors 2011-09-08 12:29:12 UTC
I think we should wait to verify that Arnon's third statement in comment 18 is true before we close this bug.  Moving it to virtio-serial doesn't change whether or not it is fixed.

Comment 20 Arnon Gilboa 2011-09-08 12:55:50 UTC
(In reply to comment #19)
> I think we should wait to verify that Arnon's third statement in comment 18 is
> true before we close this bug.  Moving it to virtio-serial doesn't change
> whether or not it is fixed or not.

As comment 18 says, it won't be fixed, but will become irrelevant, due to the known sleep bugs. I already tested a scratch build of the reverted virtio-serial driver and it behaves exactly as described in comment 18.3.

Comment 21 Arnon Gilboa 2011-09-08 13:00:55 UTC
Moved to Windows Guest Tools. Should be closed once the virtio-serial driver is reverted, after verifying that this BZ becomes unreproducible (due to the vdagent hang on sleep).

Comment 22 Arnon Gilboa 2011-09-12 13:04:18 UTC
Taking it back to spice-vdagent-win, in order to verify it won't re-appear when S3 is supported.

The bug depends on:
714908 - virtio-serial is disfunctional after S3
572483 - vdagent become not functional after guest enters S3 (deps on 714908)

Set rhevm-future? instead of rhevm-3.0.

The BZ will not be relevant before rhevm-3.1, since 714908 is scheduled for RHEL 6.3.

Comment 23 Marian Krcmarik 2011-09-21 16:31:48 UTC
> Anyway I am trying to reproduce with power management turned off, just in case,
> since I believed It was disabled and I checked it earlier.

No "luck" reproducing this bug with power management disabled over the last 2 weeks in my case. According to my testing, the bug is triggered by power management.

Comment 24 David Blechter 2011-09-21 16:59:35 UTC
Closing as not a bug, according to the last comment.
Thanks, Marian, for the verification.
It is a reflection of the virtio S3 bug.

