| Summary: | Vdagent responds very slowly | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Marian Krcmarik <mkrcmari> | ||||||||||||
| Component: | spice-vdagent-win | Assignee: | Arnon Gilboa <agilboa> | ||||||||||||
| Status: | CLOSED NOTABUG | QA Contact: | Desktop QE <desktop-qa-list> | ||||||||||||
| Severity: | high | Docs Contact: | |||||||||||||
| Priority: | unspecified | ||||||||||||||
| Version: | --- | CC: | acathrow, afrenkel, agilboa, bazulay, cmeadors, dblechte, iheim, Rhev-m-bugs, rhod, vrozenfe, ykaul | ||||||||||||
| Target Milestone: | rc | Keywords: | Regression | ||||||||||||
| Target Release: | --- | ||||||||||||||
| Hardware: | Unspecified | ||||||||||||||
| OS: | Windows | ||||||||||||||
| Whiteboard: | |||||||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||||||
| Doc Text: | Story Points: | --- | |||||||||||||
| Clone Of: | Environment: | ||||||||||||||
| Last Closed: | 2011-09-21 16:59:35 UTC | Type: | --- | ||||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||||
| Documentation: | --- | CRM: | |||||||||||||
| Verified Versions: | Category: | --- | |||||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
| Bug Depends On: | 572483, 714908 | ||||||||||||||
| Bug Blocks: | |||||||||||||||
| Attachments: |
Description
Marian Krcmarik
2011-08-19 17:23:08 UTC
Created attachment 519071 [details]
vdservice.log
Created attachment 519072 [details]
/vdagent.log
Created attachment 519073 [details]
vdservice (2).log
Created attachment 519074 [details]
vdagent (2).log
Created attachment 519075 [details]
vdservice (3).log

Might this relate to the virtio-serial/S3 issue?

(In reply to comment #0)
> Additional info:
> This is only a shot, but it seems that it occurs when using the "2.2" based
> driver qxl-win-0.1-9. I updated to the one from spice-space (0.1-10) on the
> guest where it occurs permanently and was not able to reproduce.

Scratch this observation. I am able to reproduce with qxl-win 0.1-10; it seems to be related more to the combination of the virtio driver and the qxl driver.

> I could provide such VM with broken vdagent.

(In reply to comment #7)
> I could provide such VM with broken vdagent.

"Restarting of service does not help", so it seems it's not a vdagent issue. I have a win7x64 guest with the latest vdagent, up for more than a week now, with no responsiveness issues. I guess my guest's virtio-serial driver is much older than yours (6.1.7600.16385, 2/21/2011). Please try to reproduce with an older virtio-serial driver to see if the regression is there.

(In reply to comment #8)
> Please try to repro with older virtio-serial driver to see if the regression
> is there.

I am not able to reproduce with the older virtio-serial driver (virtio-win-prewhql-0.1-10), so it seems to be a virtio-serial issue with vdagent.

Vadim, please give it a look. Suspected recent changes in the driver? Any insights?

(In reply to comment #10)
> Vadim, please give it a look. Suspected recent changes in the driver? Any
> insights?

I cannot say for sure, but it should be easy to check. If increasing the \HKLM\System\CurrentControlSet\Control\WaitToKillServiceTimeout registry value makes vdagent even less responsive, then the problem is in the read request completion logic.

(In reply to comment #11)
> If increasing \HKLM\System\CurrentControlSet\Control\WaitToKillServiceTimeout
> registry value will make vdagent even less responsive - then it is a problem in
> the read request completion logic.

I've reproduced with virtio-win-prewhql-0.1-15. I also adjusted the registry item WaitToKillServiceTimeout from 12000 to 48000, and the responsiveness of vdagent seems much lower. It's hard to measure, but it's visible from the user's perspective.

(In reply to comment #12)
> I've reproduced with virtio-win-prewhql-0.1-15 and as well I adjusted the
> registry item WaitToKillServiceTimeout from 12000 to 48000 and it seems that
> responsiveness of vdagent is much lower.

Thank you, Marian.

Arnon, is it urgent? If yes, we can try to fix it in 6.2. If not, I would like to fix the problem in 6.2.z/6.3.

BTW, is it OK if I assign this bug to myself, or would you like to create a new bug against virtio-win?

Best,
Vadim.

(In reply to comment #13)
> Arnon, is it urgent?
> If yes, we can try to fix it in 6.2.
> If not, I would like to fix the problem in 6.2.z/6.3.

Vadim, to me it looks like an urgent issue, because the guest becomes almost unusable after a while (when using the agent). I guess acathrow can give us a better answer.

BTW, what about the rhev agent? Isn't it reproduced there?

What do you mean by "problem in the read request completion logic"? Can we patch something on the agent side due to the driver changes (in addition to the re-read on timeout, which we fixed in the past), or does it seem like a driver bug?

(In reply to comment #14)
> Vadim, to me it looks like an urgent issue, because the guest becomes almost
> unusable after a while (when using the agent).

I totally agree with you. I would prefer to fix it right now, even if it costs us several days of delay before starting the WHQL submission.

> I guess acathrow can give us a better answer.
>
> BTW, what about rhev agent? isn't it reproduced there?

I don't know, but it should be.

> What do you mean by "problem in the read request completion logic"? can we
> patch something in the agent side due to driver changes (in addition to re-read
> on timeout which we fixed in the past) or it seems like a driver bug?

It smells like a driver bug, and it must be fixed in the driver. But you can try reading the port with a zero or very short (60...100 millisecond) wait period on the first iteration of the port-reading loop, right after (re-)opening the port, and then switch back to the normal waiting period.

It seems that this issue is triggered by the sleep state. I have no idea why I missed it, but maybe it is because the behaviour on a sleep state change differs between setups:
1. I have a guest with qxl driver 0.1-9 ("2.2" based) or 0.1-10 ("3.0" based) and a virtio-serial driver (virtio-win-prewhql-0.1-13 or virtio-win-prewhql-0.1-15). Automatic sleep is set to (let's say) 5 minutes. I close the Spice session and leave the guest alone for (let's say) 10 minutes, then I open a Spice session again. The guest is alive and functional; only the responsiveness of vdagent is very low.
2. I have a guest with qxl driver 0.1-9 ("2.2" based) and a virtio-serial driver (virtio-win-prewhql-0.1-13 or virtio-win-prewhql-0.1-15). I enter the sleep state directly from the guest (Start -> Sleep button) and the guest hangs.
3. I have a guest with qxl driver 0.1-10 ("3.0" based) and a virtio-serial driver (virtio-win-prewhql-0.1-13 or virtio-win-prewhql-0.1-15). I enter the sleep state directly from the guest (Start -> Sleep button); the guest does not hang and "wakes up" after a while. The guest is functional; only vdagent responds very slowly.

I am sorry I did not notice that. My templates, in which power management was disabled, must have been changed somehow (no idea how or by whom). Anyway, I am trying to reproduce with power management turned off, just in case, since I believed it was disabled and had checked it earlier.
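
The agent-side workaround suggested in comment #15 (a zero or very short wait on the first read right after (re-)opening the port, then a switch back to the normal wait) can be sketched as a read loop. This is a minimal Python sketch under stated assumptions: `SerialPort`, `read_loop`, both timeout constants, and treating `OSError` as the signal to re-open the port are all hypothetical and are not taken from vdagent or the virtio-serial driver.

```python
from typing import Callable, Optional, Protocol

# Placeholder wait periods. The short value follows the "zero or very short
# (60...100 millisecond)" suggestion; the normal value is an assumption, since
# the agent's real wait period is not stated in this bug.
FIRST_READ_TIMEOUT_MS = 100
NORMAL_READ_TIMEOUT_MS = 1000


class SerialPort(Protocol):
    """Hypothetical stand-in for the virtio-serial port handle (not a real API)."""

    def open(self) -> None:
        ...

    def read(self, timeout_ms: int) -> Optional[bytes]:
        """Return data, or None if the wait period expired without data."""
        ...


def read_loop(port: SerialPort, handle: Callable[[bytes], None],
              max_iterations: int) -> None:
    """Read from the port, waiting only briefly on the first read after each
    (re-)open and using the normal wait period afterwards."""
    port.open()
    timeout_ms = FIRST_READ_TIMEOUT_MS
    for _ in range(max_iterations):
        try:
            data = port.read(timeout_ms)
        except OSError:
            # Assumed recovery path: re-open the port and start again
            # with the short first-read wait.
            port.open()
            timeout_ms = FIRST_READ_TIMEOUT_MS
            continue
        # Whatever the first read returned, switch back to the normal wait.
        timeout_ms = NORMAL_READ_TIMEOUT_MS
        if data is None:
            continue  # Timed out without data: just re-read.
        handle(data)
```

The iteration bound exists only so the sketch terminates; a real agent would keep looping until shutdown. The short first wait keeps a stale read request left over from before the (re-)open from blocking the loop for a full normal wait period.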

(In reply to comment #16)
> It seems that this issue is triggered by sleep state which I have no idea why I
> missed but maybe because the behaviour of sleep state change:

Adjusting the registry item \HKLM\System\CurrentControlSet\Control\WaitToKillServiceTimeout (as mentioned in comment #11) to a very low value (from 12000 to 200) solves the very low responsiveness of vdagent in cases 1 and 3 (see above). Once the WaitToKillServiceTimeout item has a value of (for example) 200, vdagent responds normally after waking up from the sleep state.

1. As Marian mentioned in comment 17, setting the timeout to a lower value (200) seems to be a reasonable workaround for keeping responsiveness close to normal.
2. Sleep is currently not supported by the virtio-serial driver (714908) or by vdagent (572483), and until they both support sleep, this BZ seems irrelevant.
3. Before the recent changes to the virtio-serial driver (adding a timeout to read operations; see 725734#c5), a read call by vdagent after sleep/wakeup never completed (a hang; see the BZs in 2.), causing vdagent dysfunction (mouse unresponsiveness etc.). According to Vadim & Ronen, they are going to revert the driver patches for read timeouts (they were a hack covering a RHEV agent design issue, which has already been fixed), causing the read hang to reappear and making this BZ unreproducible.

I guess we should close this bug (resolution?), or move it to virtio-serial.

I think we should wait to verify that Arnon's third statement in comment 18 is true before we close this bug. Moving it to virtio-serial doesn't change whether or not it is fixed.

(In reply to comment #19)
> I think we should wait to verify that Arnon's third statement in comment 18 is
> true before we close this bug. Moving it to virtio-serial doesn't change
> whether or not it is fixed.

As comment 18 says, it won't be fixed, but it will become irrelevant due to the known sleep bugs. I already tested a scratch build of the reverted virtio-serial driver, and it behaves exactly as described in comment 18.3.

Moved to Windows Guest Tools. Should be closed when the virtio-serial driver is reverted, after verifying that this BZ becomes unreproducible (due to the vdagent hang on sleep).

Taking it back to spice-vdagent-win, in order to verify it won't reappear when S3 is supported. The bug depends on:
714908 - virtio-serial is dysfunctional after S3
572483 - vdagent becomes non-functional after the guest enters S3 (depends on 714908)

Set rhevm-future? instead of rhevm-3.0. This BZ will not be relevant before rhevm-3.1, since 714908 is scheduled for RHEL 6.3.

> Anyway I am trying to reproduce with power management turned off, just in case,
> since I believed It was disabled and I checked it earlier.

No "luck" reproducing this bug with power management disabled over the last 2 weeks in my case. According to my testing, the bug is triggered by power management.

Closing as NOTABUG according to the last comment. Thanks, Marian, for the verification. It is a reflection of the virtio S3 bug.