Bug 1549425
| Summary: | Getting response from guest-fsfreeze-thaw need about 90s sometimes | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | xiagao |
| Component: | virtio-win | Assignee: | Basil Salman <bsalman> |
| virtio-win sub component: | qemu-ga-win | QA Contact: | dehanmeng <demeng> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | medium | CC: | ailan, bsalman, demeng, jinzhao, lijin, vrozenfe, xiagao, yvugenfi |
| Version: | 8.0 | Keywords: | Triaged |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 101.2.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-04 04:17:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1682882 | | |
| Bug Blocks: | | | |
Description
xiagao
2018-02-27 04:31:25 UTC
Can this be reproduced with older builds of qemu-ga? What about other OSes?

(In reply to Sameeh Jubran from comment #2)

Tested build qemu-ga-win-7.4.5-1 on a win10-64 guest and still hit this issue. Tested on a win8.1-64 guest and did not hit it.

mingw-qemu-ga-win-7.5.0-2; win10-64; how reproducible: 3/5
qemu-ga-win-7.4.5-1; win10-64; how reproducible: 2/5
mingw-qemu-ga-win-7.5.0-2; win8.1-64; how reproducible: 0/5

(In reply to xiagao from comment #3)

Is this still reproducible with the latest release of qemu-ga?

Tested version 7.6.2 on a newly installed win10-64 guest.

# nc -U /tmp/helloworld2
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-fsfreeze-freeze"}
{"return": 2}
{"execute":"guest-fsfreeze-thaw"}
{"return": 2}
{"execute":"guest-fsfreeze-freeze"}
{"return": 2}
{"execute":"guest-fsfreeze-thaw"}
{"return": 2} //got a response quickly the first two times.
{"execute":"guest-fsfreeze-freeze"}
{"return": 2}
{"execute":"guest-fsfreeze-thaw"} //got a response after more than 60s
{"return": 2}

(In reply to xiagao from comment #5)

Still hit this issue.

(In reply to xiagao from comment #6)

Hi,
We cannot reproduce this in any way. Can you please give us access to the server and VM that you are reproducing on? I need access to the host and the VM, preferably with instructions on how you are using the setup.
Thanks!

Reproduced it in mingw-qemu-ga-win-100.0.0.0-3.el7ev.

pkg info:
kernel-4.18.0-80.el8.x86_64
qemu-kvm-3.1.0-20.module+el8+2888+cdc893a8.x86_64

steps:
1. Boot up a win2019 guest.
2. Issue the fsfreeze cmd via the qga channel for the first time.
{"execute":"guest-fsfreeze-freeze" }
{"return": 2}
3. Issue the fsthaw cmd within **5s**.
{"execute":"guest-fsfreeze-thaw" }
{"return": 2}
==========got a response quickly, and checked that the VSS provider service is still running
4. Issue the fsfreeze cmd via the qga channel a second time.
{"execute":"guest-fsfreeze-freeze" }
{"return": 2}
5. Issue the fsthaw cmd within **5s**.
==========the response took **90s**, and checked that the VSS provider service is still running

Reproduced the bug on Windows Server 2019. Could NOT reproduce on Windows 10 x64 1803.

Error logging when running "qemu-ga.exe -v" (verbose mode):

1. Called {"execute":"guest-fsfreeze-freeze"}
1555336917.914973: debug: thread: overlapped result, count_read: 36
1555336917.930482: debug: dispatch
1555336917.930482: debug: read data, count: 36, data: {"execute":"guest-fsfreeze-freeze"}
1555336917.930482: debug: process_event: called
1555336917.946016: debug: processing command
1555336917.946016: info: guest-fsfreeze called
1555336917.946016: debug: disabling command: guest-get-time
1555336917.946016: debug: disabling command: guest-set-time
1555336917.961627: debug: disabling command: guest-shutdown
1555336917.961627: debug: disabling command: guest-file-open
1555336917.961627: debug: disabling command: guest-file-close
1555336917.977243: debug: disabling command: guest-file-read
1555336917.977243: debug: disabling command: guest-file-write
1555336917.977243: debug: disabling command: guest-file-seek
1555336917.992895: debug: disabling command: guest-file-flush
1555336917.992895: debug: disabling command: guest-fsfreeze-freeze
1555336917.992895: debug: disabling command: guest-fsfreeze-freeze-list
1555336918.8538: debug: disabling command: guest-fstrim
1555336918.8538: debug: disabling command: guest-suspend-disk
1555336918.8538: debug: disabling command: guest-suspend-ram
1555336918.24129: debug: disabling command: guest-suspend-hybrid
1555336918.24129: debug: disabling command: guest-network-get-interfaces
1555336918.24129: debug: disabling command: guest-get-vcpus
1555336918.39745: debug: disabling command: guest-set-vcpus
1555336918.39745: debug: disabling command: guest-get-fsinfo
1555336918.39745: debug: disabling command: guest-set-user-password
1555336918.39745: debug: disabling command: guest-get-memory-blocks
1555336918.55385: debug: disabling command: guest-set-memory-blocks
1555336918.55385: debug: disabling command: guest-get-memory-block-info
1555336918.71013: debug: disabling command: guest-exec-status
1555336918.71013: debug: disabling command: guest-exec
1555336918.71013: debug: disabling command: guest-get-host-name
1555336918.71013: debug: disabling command: guest-get-users
1555336918.86623: debug: disabling command: guest-get-timezone
1555336918.86623: debug: disabling command: guest-get-osinfo
1555336918.86623: warning: disabling logging due to filesystem freeze

2. Then called {"execute":"guest-fsfreeze-thaw"}; the following appeared after something like 90 sec
Failed to pCatalog->GetCollection. (Error: 8000402a) The server started, but did not finish initializing in a timely fashion.
Failed to QGAProviderFind. (Error: 8000402a) The server started, but did not finish initializing in a timely fashion.
1555337011.415550: warning: logging re-enabled due to filesystem unfreeze
1555337011.415550: debug: enabling command: guest-get-time
1555337011.431221: debug: enabling command: guest-set-time
1555337011.431221: debug: enabling command: guest-shutdown
1555337011.446780: debug: enabling command: guest-file-close
1555337011.446780: debug: enabling command: guest-file-read
1555337011.446780: debug: enabling command: guest-file-write
1555337011.446780: debug: enabling command: guest-file-seek
1555337011.462420: debug: enabling command: guest-file-flush
1555337011.462420: debug: enabling command: guest-fsfreeze-freeze
1555337011.462420: debug: enabling command: guest-fsfreeze-freeze-list
1555337011.478067: debug: enabling command: guest-fstrim
1555337011.478067: debug: enabling command: guest-suspend-ram
1555337011.493690: debug: enabling command: guest-network-get-interfaces
1555337011.493690: debug: enabling command: guest-get-vcpus
1555337011.493690: debug: enabling command: guest-get-fsinfo
1555337011.493690: debug: enabling command: guest-set-user-password
1555337011.509292: debug: enabling command: guest-get-memory-block-info
1555337011.509292: debug: enabling command: guest-exec-status
1555337011.509292: debug: enabling command: guest-exec
1555337011.524941: debug: enabling command: guest-get-users
1555337011.524941: debug: enabling command: guest-get-timezone
1555337011.540561: debug: enabling command: guest-get-osinfo

It seems that the COM+ Application Server is malfunctioning on Windows Server 2019 for some reason.

After testing, I found that the current qemu-ga implementation uses the COM+ System Application service to manage the QEMU GA VSS Provider service.

My first assessment was that invoking "fsfreeze-thaw" tries to stop the QEMU GA VSS Provider service through COM+ System Application and fails, which causes the 90 sec delay and the failure to stop the QEMU GA VSS Provider service.

After further inspection, I found that invoking "fsfreeze-freeze" stops the COM+ System Application service, and this causes the failure to stop the QEMU GA VSS Provider service on the "fsfreeze-thaw" invocation. The failure does not happen on 100% of "fsfreeze-thaw" invocations, which suggests a possible race condition.

Why does invoking "fsfreeze-freeze" stop the COM+ System Application service? That needs further inspection.

Hit a similar issue on mingw-qemu-ga-win-101.1.0-1.el7ev with "guest-fsfreeze-freeze-list".
Only hit this issue on Win10-32/64 and Win2019.
Didn't hit it on Win2016, Win2012, or Win8.1-64 (other guests not tested).

100% reproducible with automation.

steps:
1. Issue "guest-fsfreeze-freeze-list" without parameters.
2020-04-10 02:46:51: {"execute": "guest-fsfreeze-freeze-list"}
2020-04-10 02:46:56: {"return": 3}
2020-04-10 02:46:56: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:46:56: {"return": "frozen"}
2020-04-10 02:47:04: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:04: {"return": "frozen"}

2. Issue "guest-fsfreeze-thaw" and get the response in 1s.
2020-04-10 02:47:04: {"execute": "guest-fsfreeze-thaw"}
2020-04-10 02:47:05: {"return": 3}
2020-04-10 02:47:05: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:05: {"return": "thawed"}
2020-04-10 02:47:06: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:06: {"return": "thawed"}
2020-04-10 02:47:07: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:07: {"return": "thawed"}

3. Issue "guest-fsfreeze-freeze-list" with parameters.
2020-04-10 02:47:07: {"execute": "guest-fsfreeze-freeze-list", "arguments": {"mountpoints": ["C:\\", "F:\\"]}}
2020-04-10 02:47:07: {"return": 2}
2020-04-10 02:47:07: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:07: {"return": "frozen"}
2020-04-10 02:47:15: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:15: {"return": "frozen"}

4. Issue "guest-fsfreeze-thaw" and get the response in 91s.
2020-04-10 02:47:15: {"execute": "guest-fsfreeze-thaw"}
-------------------------------------------> the response takes 91s
2020-04-10 02:48:46: {"return": 2}
2020-04-10 02:48:46: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:48:46: {"return": "thawed"}
2020-04-10 02:48:46: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:48:46: {"return": "thawed"}

It works with this build.
Tested mingw-qemu-ga-win-101.1.0-1.el7ev 10 times and reproduced this issue twice.
Tested the build from comment #16 20 times; all runs passed.

build with the fix:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1275815

Reproduced with version mingw-qemu-ga-win-101.1.0-1.el7ev.
Result is:

Version-Release number of selected component (if applicable):
mingw-qemu-ga-win-101.1.0-1.el7ev

Steps to Reproduce:
1. Boot up a win10-64 guest with the virtio serial driver and qemu-ga-win installed.
2. Issue the fsfreeze cmd from the host.
[root@dell-per440-06 Bug_1746667]# nc -U /tmp/qga.sock
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-fsfreeze-freeze"}
{"return": 3}
--->after 10s, issue the thaw cmd
{"execute":"guest-fsfreeze-thaw"} ###//Waited a long time, about 90s.
{"return": 3}

Actual results:
Getting a response from fsfreeze-thaw takes 90s.

Expected results:
The response should not take this long.

Verified with version mingw-qemu-ga-win-101.2.0-1.el7ev.
Result is:

Version-Release number of selected component (if applicable):
mingw-qemu-ga-win-101.2.0-1.el7ev

Steps to Reproduce:
1. Boot up a win10-64 guest with the virtio serial driver and qemu-ga-win installed.
2. Issue the fsfreeze cmd from the host.
[root@dell-per440-06 Bug_1746667]# nc -U /tmp/qga.sock
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-fsfreeze-freeze"}
{"return": 3}
{"execute":"guest-fsfreeze-thaw"}
{"error": {"class": "GenericError", "desc": "couldn't hold writes: fsfreeze is limited up to 10 seconds: "}}
{"execute":"guest-fsfreeze-thaw"}
{"return": 0} ###//Did not wait long.

Actual results:
The response from fsfreeze-thaw arrives quickly.

Expected results:
The response from fsfreeze-thaw arrives quickly.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virtio-win bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4840
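
For anyone re-checking this on other builds, the manual "freeze, wait, thaw, watch the clock" steps used throughout this report can be scripted. The following is only a rough sketch, not the automation QE used: the helper names `qga_command` and `freeze_thaw_cycle` are made up for illustration, and it assumes the agent answers each command with a single JSON object terminated by a newline on the same Unix socket that `nc -U` was pointed at.

```python
import json
import socket
import time


def qga_command(chan, name, arguments=None):
    """Send one guest agent command on a binary stream and time the reply.

    `chan` is a buffered read/write stream, e.g. sock.makefile("rwb").
    Returns (decoded_reply, seconds_waited).
    Assumption: each reply is one JSON object on its own line.
    """
    request = {"execute": name}
    if arguments is not None:
        request["arguments"] = arguments
    started = time.monotonic()
    chan.write(json.dumps(request).encode() + b"\n")
    chan.flush()
    line = chan.readline()
    if not line:
        raise ConnectionError("guest agent channel closed before replying")
    return json.loads(line), time.monotonic() - started


def freeze_thaw_cycle(chan, hold_seconds=5.0):
    """Freeze, hold for `hold_seconds`, thaw, and report the thaw latency."""
    freeze_reply, _ = qga_command(chan, "guest-fsfreeze-freeze")
    time.sleep(hold_seconds)
    thaw_reply, thaw_seconds = qga_command(chan, "guest-fsfreeze-thaw")
    return {
        # Number of frozen filesystems, or None if the agent sent an error.
        "frozen": freeze_reply.get("return"),
        # Kept whole because a thaw can come back as an "error" object.
        "thaw_reply": thaw_reply,
        "thaw_seconds": thaw_seconds,
    }


# Usage against a real guest (socket path taken from the steps above):
#   sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
#   sock.connect("/tmp/qga.sock")
#   print(freeze_thaw_cycle(sock.makefile("rwb")))
```

Running the cycle several times in a row and comparing `thaw_seconds` would show the same pattern reported above: a reply within about a second on good runs versus roughly 90 seconds on affected builds.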