Bug 1549425
| Summary: | Getting response from guest-fsfreeze-thaw need about 90s sometimes | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | xiagao |
| Component: | virtio-win | Assignee: | Basil Salman <bsalman> |
| virtio-win sub component: | qemu-ga-win | QA Contact: | dehanmeng <demeng> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | medium | CC: | ailan, bsalman, demeng, jinzhao, lijin, vrozenfe, xiagao, yvugenfi |
| Version: | 8.0 | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 101.2.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-11-04 04:17:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1682882 | ||
| Bug Blocks: | |||
Can this be reproduced with older builds of qemu-ga? What about other OSes?

(In reply to Sameeh Jubran from comment #2)
> Can this be reproduced with older builds of qemu-ga? What about other OSes?
Tested build qemu-ga-win-7.4.5-1 on a win10-64 guest and still hit this issue. Tested on a win8.1-64 guest and did not hit it.
mingw-qemu-ga-win-7.5.0-2; win10-64; how reproducible: 3/5
qemu-ga-win-7.4.5-1; win10-64; how reproducible: 2/5
mingw-qemu-ga-win-7.5.0-2; win8.1-64; how reproducible: 0/5

(In reply to xiagao from comment #3)
Is this still reproducible with the latest release of qemu-ga?

Tested version 7.6.2 on a newly installed win10-64 guest.
# nc -U /tmp/helloworld2
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-fsfreeze-freeze"}
{"return": 2}
{"execute":"guest-fsfreeze-thaw"}
{"return": 2}
{"execute":"guest-fsfreeze-freeze"}
{"return": 2}
{"execute":"guest-fsfreeze-thaw"}
{"return": 2} //get response quickly for the first two times.
{"execute":"guest-fsfreeze-freeze"}
{"return": 2}
{"execute":"guest-fsfreeze-thaw"} //get response after more than 60s
{"return": 2}
(In reply to xiagao from comment #5)
Still hit this issue.

(In reply to xiagao from comment #6)
Hi,
We cannot reproduce this in any way. Can you please give us access to the server and VM that you are reproducing on? I need access to the host and the VM, preferably with instructions on how you are using the setup.
Thanks!

Reproduced it in mingw-qemu-ga-win-100.0.0.0-3.el7ev.
pkg info:
kernel-4.18.0-80.el8.x86_64
qemu-kvm-3.1.0-20.module+el8+2888+cdc893a8.x86_64
steps:
1. Boot up a win2019 guest.
2. Issue the fsfreeze command via the qga channel for the first time.
{"execute":"guest-fsfreeze-freeze" }
{"return": 2}
3. Issue the fsthaw command in **5s**.
{"execute":"guest-fsfreeze-thaw" }
{"return": 2}
==========The response arrives quickly, and the VSS provider service is still running.
4. Issue the fsfreeze command via the qga channel a second time.
{"execute":"guest-fsfreeze-freeze" }
{"return": 2}
5. Issue the fsthaw command in **5s**.
==========The response takes **90s**, and the VSS provider service is still running.
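The steps above can be sketched as a small host-side script that times each thaw. This is only a rough sketch, not the tooling used in this report: it assumes the guest agent channel is exposed as a Unix socket on the host (the path passed to `run` is hypothetical), that each reply arrives as one newline-terminated JSON object, and that "in 5s" means a pause of about five seconds between freeze and thaw.

```python
import json
import socket
import time


def timed_command(sock, name, arguments=None):
    """Send one guest agent command and return (reply, seconds_waited)."""
    request = {"execute": name}
    if arguments is not None:
        request["arguments"] = arguments
    start = time.monotonic()
    sock.sendall(json.dumps(request).encode() + b"\n")
    buf = b""
    # Assumption: the reply is a single newline-terminated JSON object.
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("agent channel closed before a reply arrived")
        buf += chunk
    return json.loads(buf), time.monotonic() - start


def freeze_thaw_cycle(sock, hold_seconds=5.0):
    """Freeze, pause, thaw; return both replies and the thaw latency."""
    freeze_reply, _ = timed_command(sock, "guest-fsfreeze-freeze")
    time.sleep(hold_seconds)
    thaw_reply, thaw_seconds = timed_command(sock, "guest-fsfreeze-thaw")
    return freeze_reply, thaw_reply, thaw_seconds


def run(socket_path, cycles=3):
    """Repeat the cycle and print how long each thaw reply took."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        for i in range(1, cycles + 1):
            _, thaw_reply, seconds = freeze_thaw_cycle(sock)
            print(f"cycle {i}: thaw reply {thaw_reply} after {seconds:.1f}s")
```

On the host, something like `run("/tmp/qga.sock")` would repeat the cycle three times; a thaw latency near 90s on the second or later cycle would match the behaviour described here.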
Reproduced bug on Windows Server 2019.
Could NOT reproduce on Windows 10 x64 1803.
Log output when running "qemu-ga.exe -v" (verbose mode):
1. Called {"execute":"guest-fsfreeze-freeze"}
1555336917.914973: debug: thread: overlapped result, count_read: 36
1555336917.930482: debug: dispatch
1555336917.930482: debug: read data, count: 36, data: {"execute":"guest-fsfreeze-freeze"}
1555336917.930482: debug: process_event: called
1555336917.946016: debug: processing command
1555336917.946016: info: guest-fsfreeze called
1555336917.946016: debug: disabling command: guest-get-time
1555336917.946016: debug: disabling command: guest-set-time
1555336917.961627: debug: disabling command: guest-shutdown
1555336917.961627: debug: disabling command: guest-file-open
1555336917.961627: debug: disabling command: guest-file-close
1555336917.977243: debug: disabling command: guest-file-read
1555336917.977243: debug: disabling command: guest-file-write
1555336917.977243: debug: disabling command: guest-file-seek
1555336917.992895: debug: disabling command: guest-file-flush
1555336917.992895: debug: disabling command: guest-fsfreeze-freeze
1555336917.992895: debug: disabling command: guest-fsfreeze-freeze-list
1555336918.8538: debug: disabling command: guest-fstrim
1555336918.8538: debug: disabling command: guest-suspend-disk
1555336918.8538: debug: disabling command: guest-suspend-ram
1555336918.24129: debug: disabling command: guest-suspend-hybrid
1555336918.24129: debug: disabling command: guest-network-get-interfaces
1555336918.24129: debug: disabling command: guest-get-vcpus
1555336918.39745: debug: disabling command: guest-set-vcpus
1555336918.39745: debug: disabling command: guest-get-fsinfo
1555336918.39745: debug: disabling command: guest-set-user-password
1555336918.39745: debug: disabling command: guest-get-memory-blocks
1555336918.55385: debug: disabling command: guest-set-memory-blocks
1555336918.55385: debug: disabling command: guest-get-memory-block-info
1555336918.71013: debug: disabling command: guest-exec-status
1555336918.71013: debug: disabling command: guest-exec
1555336918.71013: debug: disabling command: guest-get-host-name
1555336918.71013: debug: disabling command: guest-get-users
1555336918.86623: debug: disabling command: guest-get-timezone
1555336918.86623: debug: disabling command: guest-get-osinfo
1555336918.86623: warning: disabling logging due to filesystem freeze
2. Then called {"execute":"guest-fsfreeze-thaw"}; the response arrived after roughly 90 seconds:
Failed to pCatalog->GetCollection. (Error: 8000402a) The server started, but did not finish initializing in a timely fashion.
Failed to QGAProviderFind. (Error: 8000402a) The server started, but did not finish initializing in a timely fashion.
1555337011.415550: warning: logging re-enabled due to filesystem unfreeze
1555337011.415550: debug: enabling command: guest-get-time
1555337011.431221: debug: enabling command: guest-set-time
1555337011.431221: debug: enabling command: guest-shutdown
1555337011.446780: debug: enabling command: guest-file-close
1555337011.446780: debug: enabling command: guest-file-read
1555337011.446780: debug: enabling command: guest-file-write
1555337011.446780: debug: enabling command: guest-file-seek
1555337011.462420: debug: enabling command: guest-file-flush
1555337011.462420: debug: enabling command: guest-fsfreeze-freeze
1555337011.462420: debug: enabling command: guest-fsfreeze-freeze-list
1555337011.478067: debug: enabling command: guest-fstrim
1555337011.478067: debug: enabling command: guest-suspend-ram
1555337011.493690: debug: enabling command: guest-network-get-interfaces
1555337011.493690: debug: enabling command: guest-get-vcpus
1555337011.493690: debug: enabling command: guest-get-fsinfo
1555337011.493690: debug: enabling command: guest-set-user-password
1555337011.509292: debug: enabling command: guest-get-memory-block-info
1555337011.509292: debug: enabling command: guest-exec-status
1555337011.509292: debug: enabling command: guest-exec
1555337011.524941: debug: enabling command: guest-get-users
1555337011.524941: debug: enabling command: guest-get-timezone
1555337011.540561: debug: enabling command: guest-get-osinfo
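To put a number on the gap in the log above, the two "logging" markers can be compared. This is a sketch under one stated assumption: the field after the dot looks like unpadded microseconds rather than a decimal fraction (otherwise 1555336918.8538 followed by 1555336918.24129 would run backwards), so the helper below converts both fields to integer microseconds. The function names are illustrative only.

```python
import re

# "<seconds>.<microseconds>: <level>: <message>", as seen in the log above.
LOG_LINE = re.compile(r"^(\d+)\.(\d+): (\w+): (.*)$")


def parse_timestamp_us(line):
    """Return the line's timestamp in microseconds, or None if it has none."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    # Assumption: the second field is unpadded microseconds.
    return int(match.group(1)) * 1_000_000 + int(match.group(2))


def freeze_window_us(lines):
    """Microseconds between 'disabling logging' and 'logging re-enabled'."""
    start = end = None
    for line in lines:
        stamp = parse_timestamp_us(line)
        if stamp is None:
            continue
        if "disabling logging due to filesystem freeze" in line:
            start = stamp
        elif "logging re-enabled due to filesystem unfreeze" in line:
            end = stamp
    if start is None or end is None:
        return None
    return end - start


sample = [
    "1555336918.86623: warning: disabling logging due to filesystem freeze",
    "Failed to QGAProviderFind. (Error: 8000402a) The server started, "
    "but did not finish initializing in a timely fashion.",
    "1555337011.415550: warning: logging re-enabled due to filesystem unfreeze",
]
print(freeze_window_us(sample))  # 93328927 microseconds, about 93.3 s
```

This window runs from the end of the freeze to the end of the thaw, so it also includes the pause before the thaw command was sent; it is an upper bound on the thaw delay, consistent with the roughly 90 seconds observed.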
It seems that the COM+ Application Server is malfunctioning on Windows Server 2019 for some reason.
After testing, I found that the current qemu-ga implementation uses the COM+ System Application service to manage the QEMU GA VSS Provider service. My first assessment was that invoking "fsfreeze-thaw" tries to stop the QEMU GA VSS Provider service through COM+ System Application and fails, which causes the 90-second delay and the failure to stop the QEMU GA VSS Provider service. After further inspection, I found that invoking "fsfreeze-freeze" stops the COM+ System Application service, and this is what causes the failure to stop the QEMU GA VSS Provider service on the subsequent "fsfreeze-thaw". The failure doesn't happen on every "fsfreeze-thaw" invocation, which suggests a race condition. Why invoking "fsfreeze-freeze" stops the COM+ System Application service still needs further inspection.

Hit a similar issue on mingw-qemu-ga-win-101.1.0-1.el7ev with "guest-fsfreeze-freeze-list".
Only hit this issue on Win10-32/64 and Win2019.
Didn't hit it on Win2016, Win2012, or Win8.1-64 (other guests were not tested).
100% reproducible with automation.
steps:
1. issue "guest-fsfreeze-freeze-list" without parameters
2020-04-10 02:46:51: {"execute": "guest-fsfreeze-freeze-list"}
2020-04-10 02:46:56: {"return": 3}
2020-04-10 02:46:56: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:46:56: {"return": "frozen"}
2020-04-10 02:47:04: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:04: {"return": "frozen"}
2. issue "guest-fsfreeze-thaw" and get the response in 1s.
2020-04-10 02:47:04: {"execute": "guest-fsfreeze-thaw"}
2020-04-10 02:47:05: {"return": 3}
2020-04-10 02:47:05: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:05: {"return": "thawed"}
2020-04-10 02:47:06: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:06: {"return": "thawed"}
2020-04-10 02:47:07: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:07: {"return": "thawed"}
3. issue "guest-fsfreeze-freeze-list" with parameters.
2020-04-10 02:47:07: {"execute": "guest-fsfreeze-freeze-list", "arguments": {"mountpoints": ["C:\\", "F:\\"]}}
2020-04-10 02:47:07: {"return": 2}
2020-04-10 02:47:07: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:07: {"return": "frozen"}
2020-04-10 02:47:15: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:15: {"return": "frozen"}
4. issue "guest-fsfreeze-thaw" and get the response in 91s.
2020-04-10 02:47:15: {"execute": "guest-fsfreeze-thaw"}
-------------------------------------------> get response takes 91s
2020-04-10 02:48:46: {"return": 2}
2020-04-10 02:48:46: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:48:46: {"return": "thawed"}
2020-04-10 02:48:46: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:48:46: {"return": "thawed"}
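The latency of each command in a timestamped transcript like the one above can be worked out mechanically. The following is a minimal sketch, assuming every relevant line has the form `YYYY-MM-DD HH:MM:SS: <json>` and that each request is answered before the next one is sent; `command_latencies` is an illustrative name, not part of any test framework mentioned here.

```python
import json
from datetime import datetime


def command_latencies(lines):
    """Pair each "execute" line with the next reply; return (name, seconds)."""
    results = []
    pending = None
    for line in lines:
        stamp, _, payload = line.partition(": ")
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            message = json.loads(payload)
        except ValueError:
            continue  # step descriptions, separators, and other non-log lines
        if not isinstance(message, dict):
            continue
        if "execute" in message:
            pending = (message["execute"], when)
        elif pending is not None:
            name, sent = pending
            results.append((name, (when - sent).total_seconds()))
            pending = None
    return results


transcript = [
    '2020-04-10 02:47:04: {"execute": "guest-fsfreeze-thaw"}',
    '2020-04-10 02:47:05: {"return": 3}',
    '2020-04-10 02:47:15: {"execute": "guest-fsfreeze-thaw"}',
    '-------------------------------------------> get response takes 91s',
    '2020-04-10 02:48:46: {"return": 2}',
]
for name, seconds in command_latencies(transcript):
    print(name, seconds)
```

For the two thaw commands excerpted here, this gives 1 second for the first and 91 seconds for the second, matching the annotation in step 4.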
It works with this build.

Tested mingw-qemu-ga-win-101.1.0-1.el7ev 10 times and reproduced this issue twice. Tested the build from comment #16 20 times; all runs passed.

Build with the fix: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1275815

Reproduced with version mingw-qemu-ga-win-101.1.0-1.el7ev
Result is:
Version-Release number of selected component (if applicable):
mingw-qemu-ga-win-101.1.0-1.el7ev
Steps to Reproduce:
1. Boot up a win10-64 guest with the virtio serial driver and qemu-ga-win installed.
2. Issue the fsfreeze command from the host.
[root@dell-per440-06 Bug_1746667]# nc -U /tmp/qga.sock
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-fsfreeze-freeze"}
{"return": 3}
--->after 10s, issue thaw cmd
{"execute":"guest-fsfreeze-thaw"} ###//Waited a long time, about 90s.
{"return": 3}
Actual results:
Getting a response from fsfreeze-thaw takes 90s.
Expected results:
The response should not take this long.
Verified with version mingw-qemu-ga-win-101.2.0-1.el7ev
Result is:
Version-Release number of selected component (if applicable):
mingw-qemu-ga-win-101.2.0-1.el7ev
Steps to Reproduce:
1. Boot up a win10-64 guest with the virtio serial driver and qemu-ga-win installed.
2. Issue the fsfreeze command from the host.
[root@dell-per440-06 Bug_1746667]# nc -U /tmp/qga.sock
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-fsfreeze-freeze"}
{"return": 3}
{"execute":"guest-fsfreeze-thaw"}
{"error": {"class": "GenericError", "desc": "couldn't hold writes: fsfreeze is limited up to 10 seconds: "}}
{"execute":"guest-fsfreeze-thaw"}
{"return": 0} ###//Did not wait long.
Actual results:
The response from fsfreeze-thaw arrives quickly.
Expected results:
The response from fsfreeze-thaw arrives quickly.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virtio-win bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2020:4840
Description of problem:
Testing the fsfreeze commands in a win10-64 guest. After issuing guest-fsfreeze-freeze and getting its response, waiting 10s, and then issuing guest-fsfreeze-thaw, the response from guest-fsfreeze-thaw takes 90s.
Version-Release number of selected component (if applicable):
mingw-qemu-ga-win-7.5.0-2.el7ev
virtio-win-1.9.4-2.el7
How reproducible:
6/10
Steps to Reproduce:
1. Boot up a win10-64 guest with the virtio serial driver and qemu-ga-win installed.
2. Issue the fsfreeze command from the host.
# nc -U /var/tmp/qga.sock
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-fsfreeze-freeze"}
{"return": 2}
--->after 10s, issue thaw cmd
{"execute":"guest-fsfreeze-thaw"}
//////////[xiagao@redhat ~]$ date
//////////Fri Feb 23 14:36:25 CST 2018
{"return": 2}
//////////[xiagao@redhat ~]$ date
//////////Fri Feb 23 14:37:55 CST 2018
Actual results:
Getting a response from fsfreeze-thaw takes 90s.
Expected results:
The response should not take this long.
Additional info: