Bug 1549425

Summary: Response from guest-fsfreeze-thaw sometimes takes about 90s
Product: Red Hat Enterprise Linux 8 Reporter: xiagao
Component: virtio-win Assignee: Basil Salman <bsalman>
virtio-win sub component: qemu-ga-win QA Contact: dehanmeng <demeng>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: medium CC: ailan, bsalman, demeng, jinzhao, lijin, vrozenfe, xiagao, yvugenfi
Version: 8.0 Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 101.2.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-04 04:17:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1682882    
Bug Blocks:    

Description xiagao 2018-02-27 04:31:25 UTC
Description of problem:
Tested the fsfreeze commands in a win10-64 guest.
Issued guest-fsfreeze-freeze and got the response, waited 10s, then issued guest-fsfreeze-thaw, but the response to guest-fsfreeze-thaw took 90s to arrive.

Version-Release number of selected component (if applicable):
mingw-qemu-ga-win-7.5.0-2.el7ev
virtio-win-1.9.4-2.el7

How reproducible:
6/10

Steps to Reproduce:
1. Boot up a win10-64 guest with the virtio serial driver and qemu-ga-win installed.

2. Issue the fsfreeze commands from the host:
#nc -U /var/tmp/qga.sock
{"execute":"guest-ping"}
{"return": {}}

{"execute":"guest-fsfreeze-freeze"}
{"return": 2}

---> after 10s, issue the thaw command

{"execute":"guest-fsfreeze-thaw"}  //////////[xiagao@redhat ~]$ date
                                   //////////Fri Feb  23 14:36:25 CST 2018

{"return": 2}                      //////////[xiagao@redhat ~]$ date
                                   //////////Fri Feb  23 14:37:55 CST 2018
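
The manual timing above (a "date" call before and after the thaw) can be scripted. The following is a minimal sketch and not part of the original report. It assumes the agent channel is a UNIX socket as in step 2 and that the agent answers each command with one newline-terminated JSON object. The helper names "qga_execute" and "time_thaw" are hypothetical.

```python
import json
import socket
import time


def qga_execute(sock, command, timeout=120.0):
    """Send one guest agent command; return (response, elapsed_seconds)."""
    sock.settimeout(timeout)
    start = time.monotonic()
    sock.sendall(json.dumps({"execute": command}).encode() + b"\n")
    buf = b""
    # Assumption: each response is a single JSON object ending in a newline.
    while b"\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("agent closed the channel")
        buf += chunk
    elapsed = time.monotonic() - start
    return json.loads(buf.split(b"\n", 1)[0]), elapsed


def time_thaw(path="/var/tmp/qga.sock", hold_seconds=10.0):
    """Freeze, wait, thaw, and return the thaw latency in seconds."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        qga_execute(sock, "guest-fsfreeze-freeze")
        time.sleep(hold_seconds)  # the report waits 10s before thawing
        _response, elapsed = qga_execute(sock, "guest-fsfreeze-thaw")
        return elapsed
```

Calling time_thaw() on the host against the socket path from step 2 would return the number of seconds the thaw took; in the failing case described here that value would be close to 90.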


Actual results:
The response to guest-fsfreeze-thaw takes 90s.

Expected results:
The response to guest-fsfreeze-thaw should arrive without a long delay.


Additional info:

Comment 2 Sameeh Jubran 2018-06-24 15:47:40 UTC
Can this be reproduced with older builds of qemu-ga? What about other OSes?

Comment 3 xiagao 2018-06-26 03:23:09 UTC
(In reply to Sameeh Jubran from comment #2)
> Can this be reproduced with older builds of qemu-ga? What about other OSes?

Tested build qemu-ga-win-7.4.5-1 on a win10-64 guest and still hit this issue.
Also tested on a win8.1-64 guest and did not hit this issue.


mingw-qemu-ga-win-7.5.0-2;   win10-64;   how reproducible:3/5

qemu-ga-win-7.4.5-1;         win10-64;   how reproducible:2/5

mingw-qemu-ga-win-7.5.0-2;   win8.1-64;  how reproducible:0/5

Comment 4 Sameeh Jubran 2018-09-26 10:31:09 UTC
(In reply to xiagao from comment #3)

Is this still reproducible with the latest release of qemu-ga?

Comment 5 xiagao 2018-09-27 08:33:54 UTC
Tested version 7.6.2 on a newly installed win10-64 guest.

# nc -U /tmp/helloworld2
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-fsfreeze-freeze"}
{"return": 2}
{"execute":"guest-fsfreeze-thaw"}
{"return": 2}
{"execute":"guest-fsfreeze-freeze"}
{"return": 2}
{"execute":"guest-fsfreeze-thaw"}    
{"return": 2}                       //got the response quickly the first two times.

{"execute":"guest-fsfreeze-freeze"}  
{"return": 2}
{"execute":"guest-fsfreeze-thaw"}   //got the response after more than 60s



{"return": 2}

Comment 6 xiagao 2018-09-27 08:34:30 UTC
(In reply to xiagao from comment #5)

Still hit this issue.

Comment 7 Sameeh Jubran 2018-11-27 12:20:53 UTC
(In reply to xiagao from comment #6)
> Still hit this issue.

Hi,

We have not been able to reproduce this at all. Could you please give us access to the server and VM on which you are reproducing it?

I need access to both the host and the VM, preferably with instructions on how you are using the setup.

Thanks!

Comment 10 xiagao 2019-04-15 09:29:54 UTC
Reproduced it with mingw-qemu-ga-win-100.0.0.0-3.el7ev.

pkg info: 
kernel-4.18.0-80.el8.x86_64
qemu-kvm-3.1.0-20.module+el8+2888+cdc893a8.x86_64

steps:
1. Boot up a win2019 guest.
2. Issue the fsfreeze command via the qga channel for the first time:
{"execute":"guest-fsfreeze-freeze" }
{"return": 2}
3. Issue the thaw command in **5s**:
{"execute":"guest-fsfreeze-thaw" }
{"return": 2}
========== The response arrives quickly, and the VSS provider service is still running.

4. Issue the fsfreeze command via the qga channel a second time:
{"execute":"guest-fsfreeze-freeze" }
{"return": 2}
5. Issue the thaw command in **5s**.

========== The response takes **90s**, and the VSS provider service is still running.
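
Since the delay only shows up on some cycles (here on the second one), a repeat loop makes the reproduction rate easier to measure. The following is a minimal sketch under stated assumptions: "freeze" and "thaw" are hypothetical callables that send the corresponding guest agent command and block until the response arrives, and the 30s threshold for a slow thaw is an arbitrary choice, not a value from this report.

```python
import time


def measure_cycles(freeze, thaw, runs, hold_seconds=5.0, slow_threshold=30.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Run freeze/thaw cycles; return (thaw_latencies, slow_count)."""
    latencies = []
    for _ in range(runs):
        freeze()
        sleep(hold_seconds)          # hold the freeze briefly, as in the steps
        start = clock()
        thaw()                       # blocks until the agent responds
        latencies.append(clock() - start)
    slow_count = sum(1 for seconds in latencies if seconds >= slow_threshold)
    return latencies, slow_count
```

The sleep and clock parameters exist only so the loop can be exercised without a guest; with the defaults it measures real wall-clock latency.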

Comment 11 Bishara AbuHattoum 2019-04-16 14:40:50 UTC
Reproduced bug on Windows Server 2019.
Could NOT reproduce on Windows 10 x64 1803.

Error logging when running "qemu-ga.exe -v", verbose mode:
1. Called {"execute":"guest-fsfreeze-freeze"}

1555336917.914973: debug: thread: overlapped result, count_read: 36
1555336917.930482: debug: dispatch
1555336917.930482: debug: read data, count: 36, data: {"execute":"guest-fsfreeze-freeze"}

1555336917.930482: debug: process_event: called
1555336917.946016: debug: processing command
1555336917.946016: info: guest-fsfreeze called
1555336917.946016: debug: disabling command: guest-get-time
1555336917.946016: debug: disabling command: guest-set-time
1555336917.961627: debug: disabling command: guest-shutdown
1555336917.961627: debug: disabling command: guest-file-open
1555336917.961627: debug: disabling command: guest-file-close
1555336917.977243: debug: disabling command: guest-file-read
1555336917.977243: debug: disabling command: guest-file-write
1555336917.977243: debug: disabling command: guest-file-seek
1555336917.992895: debug: disabling command: guest-file-flush
1555336917.992895: debug: disabling command: guest-fsfreeze-freeze
1555336917.992895: debug: disabling command: guest-fsfreeze-freeze-list
1555336918.8538: debug: disabling command: guest-fstrim
1555336918.8538: debug: disabling command: guest-suspend-disk
1555336918.8538: debug: disabling command: guest-suspend-ram
1555336918.24129: debug: disabling command: guest-suspend-hybrid
1555336918.24129: debug: disabling command: guest-network-get-interfaces
1555336918.24129: debug: disabling command: guest-get-vcpus
1555336918.39745: debug: disabling command: guest-set-vcpus
1555336918.39745: debug: disabling command: guest-get-fsinfo
1555336918.39745: debug: disabling command: guest-set-user-password
1555336918.39745: debug: disabling command: guest-get-memory-blocks
1555336918.55385: debug: disabling command: guest-set-memory-blocks
1555336918.55385: debug: disabling command: guest-get-memory-block-info
1555336918.71013: debug: disabling command: guest-exec-status
1555336918.71013: debug: disabling command: guest-exec
1555336918.71013: debug: disabling command: guest-get-host-name
1555336918.71013: debug: disabling command: guest-get-users
1555336918.86623: debug: disabling command: guest-get-timezone
1555336918.86623: debug: disabling command: guest-get-osinfo
1555336918.86623: warning: disabling logging due to filesystem freeze


2. Then called {"execute":"guest-fsfreeze-thaw"}; after roughly 90 seconds the following was logged:

Failed to pCatalog->GetCollection. (Error: 8000402a) The server started, but did not finish initializing in a timely fashion.

Failed to QGAProviderFind. (Error: 8000402a) The server started, but did not finish initializing in a timely fashion.

1555337011.415550: warning: logging re-enabled due to filesystem unfreeze
1555337011.415550: debug: enabling command: guest-get-time
1555337011.431221: debug: enabling command: guest-set-time
1555337011.431221: debug: enabling command: guest-shutdown
1555337011.446780: debug: enabling command: guest-file-close
1555337011.446780: debug: enabling command: guest-file-read
1555337011.446780: debug: enabling command: guest-file-write
1555337011.446780: debug: enabling command: guest-file-seek
1555337011.462420: debug: enabling command: guest-file-flush
1555337011.462420: debug: enabling command: guest-fsfreeze-freeze
1555337011.462420: debug: enabling command: guest-fsfreeze-freeze-list
1555337011.478067: debug: enabling command: guest-fstrim
1555337011.478067: debug: enabling command: guest-suspend-ram
1555337011.493690: debug: enabling command: guest-network-get-interfaces
1555337011.493690: debug: enabling command: guest-get-vcpus
1555337011.493690: debug: enabling command: guest-get-fsinfo
1555337011.493690: debug: enabling command: guest-set-user-password
1555337011.509292: debug: enabling command: guest-get-memory-block-info
1555337011.509292: debug: enabling command: guest-exec-status
1555337011.509292: debug: enabling command: guest-exec
1555337011.524941: debug: enabling command: guest-get-users
1555337011.524941: debug: enabling command: guest-get-timezone
1555337011.540561: debug: enabling command: guest-get-osinfo

It seems that the COM+ Application Server is malfunctioning on Windows Server 2019 for some reason.
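
The length of the frozen window can be read off the two log excerpts above. The following is a minimal sketch of doing that programmatically. It assumes the part after the dot in each timestamp is an unpadded microsecond count rather than a decimal fraction; this is inferred from the log itself (1555336917.992895 is followed by 1555336918.8538) and is not confirmed anywhere in this report. The function names are hypothetical.

```python
import re

# "<seconds>.<microseconds>: <level>: <message>"
LINE_RE = re.compile(r"^(\d+)\.(\d+): (\w+): (.*)$")


def parse_line(line):
    """Return (timestamp_in_microseconds, level, message) or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. the "Failed to pCatalog->GetCollection" lines
    seconds, micros, level, message = match.groups()
    # Assumption: the second field is microseconds without zero padding.
    return int(seconds) * 1_000_000 + int(micros), level, message


def frozen_window_usec(lines):
    """Microseconds between 'disabling logging' and 'logging re-enabled'."""
    start = end = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, _level, message = parsed
        if message.startswith("disabling logging due to filesystem freeze"):
            start = stamp
        elif message.startswith("logging re-enabled due to filesystem unfreeze"):
            end = stamp
    if start is None or end is None:
        return None
    return end - start
```

Note that this window spans the whole frozen period, including the time between issuing freeze and issuing thaw, so it is an upper bound on the thaw delay rather than the delay itself.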

Comment 12 Bishara AbuHattoum 2019-05-13 14:30:14 UTC
After testing, I found that the current qemu-ga implementation uses the COM+ System Application service to manage the QEMU GA VSS Provider service.
My first assessment was that invoking "fsfreeze-thaw" tries to stop the QEMU GA VSS Provider service through COM+ System Application and fails, which causes the 90-second delay and the failure to stop the QEMU GA VSS Provider service.
After further inspection, I found that invoking "fsfreeze-freeze" stops the COM+ System Application service, and this causes the failure to stop the QEMU GA VSS Provider service when "fsfreeze-thaw" is invoked. The failure does not happen on every "fsfreeze-thaw" invocation, which suggests a race condition.
Why invoking "fsfreeze-freeze" stops the COM+ System Application service still needs further inspection.
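
One way to observe the service state around freeze and thaw from inside the guest is to poll it with the Windows "sc query" command. The following is a minimal sketch, not the method used in this investigation. It assumes the COM+ System Application service is registered under the name "COMSysApp" and that "sc query" prints a line of the form "STATE : <number> <name>"; the function names are hypothetical.

```python
import re
import subprocess

# Matches e.g. "        STATE              : 4  RUNNING"
STATE_RE = re.compile(r"^\s*STATE\s*:\s*(\d+)\s+(\w+)", re.MULTILINE)


def parse_sc_state(output):
    """Extract the state name (RUNNING, STOPPED, ...) from 'sc query' output."""
    match = STATE_RE.search(output)
    return match.group(2) if match else None


def query_service_state(name="COMSysApp"):
    """Run 'sc query <name>' in the guest and return the reported state."""
    result = subprocess.run(["sc", "query", name],
                            capture_output=True, text=True, check=False)
    return parse_sc_state(result.stdout)
```

Logging query_service_state() before freeze, after freeze, and after thaw would show whether the service is stopped by the freeze, as described above.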

Comment 15 xiagao 2020-04-10 03:06:54 UTC
Hit a similar issue on mingw-qemu-ga-win-101.1.0-1.el7ev with "guest-fsfreeze-freeze-list".

Only hit this issue on Win10-32/64 and Win2019.
Did not hit it on Win2016, Win2012, or Win8.1-64 (other guests were not tested).

100% reproducible with automation.

steps:
1. Issue "guest-fsfreeze-freeze-list" without parameters.
2020-04-10 02:46:51: {"execute": "guest-fsfreeze-freeze-list"}
2020-04-10 02:46:56: {"return": 3}
2020-04-10 02:46:56: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:46:56: {"return": "frozen"}
2020-04-10 02:47:04: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:04: {"return": "frozen"}

2. Issue "guest-fsfreeze-thaw"; the response arrives within 1s.
2020-04-10 02:47:04: {"execute": "guest-fsfreeze-thaw"}
2020-04-10 02:47:05: {"return": 3}
2020-04-10 02:47:05: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:05: {"return": "thawed"}
2020-04-10 02:47:06: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:06: {"return": "thawed"}
2020-04-10 02:47:07: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:07: {"return": "thawed"}

3. Issue "guest-fsfreeze-freeze-list" with parameters.
2020-04-10 02:47:07: {"execute": "guest-fsfreeze-freeze-list", "arguments": {"mountpoints": ["C:\\", "F:\\"]}}
2020-04-10 02:47:07: {"return": 2}
2020-04-10 02:47:07: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:07: {"return": "frozen"}
2020-04-10 02:47:15: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:47:15: {"return": "frozen"}

4. Issue "guest-fsfreeze-thaw"; the response arrives after 91s.
2020-04-10 02:47:15: {"execute": "guest-fsfreeze-thaw"}

-------------------------------------------> the response takes 91s

2020-04-10 02:48:46: {"return": 2}
2020-04-10 02:48:46: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:48:46: {"return": "thawed"}
2020-04-10 02:48:46: {"execute": "guest-fsfreeze-status"}
2020-04-10 02:48:46: {"return": "thawed"}
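
The per-command latency can be extracted from a timestamped log like the one above. The following is a minimal sketch, assuming each line has the form "YYYY-MM-DD HH:MM:SS: <json>" and that every "execute" line is answered by the next "return" or "error" line. The function name "command_latencies" is hypothetical.

```python
import json
from datetime import datetime


def command_latencies(lines):
    """Return [(command, seconds)] for each execute/response pair."""
    results = []
    pending = None  # (command name, time it was sent)
    for line in lines:
        stamp_text, _sep, payload = line.strip().partition(": ")
        try:
            stamp = datetime.strptime(stamp_text, "%Y-%m-%d %H:%M:%S")
            message = json.loads(payload)
        except ValueError:
            continue  # blank lines and free-text annotations
        if "execute" in message:
            pending = (message["execute"], stamp)
        elif pending is not None and ("return" in message or "error" in message):
            command, sent = pending
            results.append((command, (stamp - sent).total_seconds()))
            pending = None
    return results
```

Applied to step 4 above, this yields a latency of 91 seconds for guest-fsfreeze-thaw, compared with 1 second in step 2.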

Comment 18 xiagao 2020-07-02 06:56:48 UTC
It works with this build.

Tested mingw-qemu-ga-win-101.1.0-1.el7ev 10 times and reproduced this issue twice.

Tested the build from comment #16 20 times; all runs passed.

Comment 19 Basil Salman 2020-08-03 14:11:35 UTC
build with the fix:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1275815

Comment 20 dehanmeng 2020-08-10 05:51:24 UTC
Reproduced with version mingw-qemu-ga-win-101.1.0-1.el7ev.
Result:
Version-Release number of selected component (if applicable):
mingw-qemu-ga-win-101.1.0-1.el7ev


Steps to Reproduce:
1. Boot up a win10-64 guest with the virtio serial driver and qemu-ga-win installed.

2. Issue the fsfreeze commands from the host:
[root@dell-per440-06 Bug_1746667]# nc -U /tmp/qga.sock
{"execute":"guest-ping"}
{"return": {}}

{"execute":"guest-fsfreeze-freeze"}
{"return": 3}

---> after 10s, issue the thaw command

{"execute":"guest-fsfreeze-thaw"}  ###//Waited a long time, about 90s.

{"return": 3}      

Actual results:
The response to guest-fsfreeze-thaw takes 90s.

Expected results:
The response to guest-fsfreeze-thaw should arrive without a long delay.




Verified with version mingw-qemu-ga-win-101.2.0-1.el7ev
Result:

Version-Release number of selected component (if applicable):
mingw-qemu-ga-win-101.2.0-1.el7ev


Steps to Reproduce:
1. Boot up a win10-64 guest with the virtio serial driver and qemu-ga-win installed.

2. Issue the fsfreeze commands from the host:
[root@dell-per440-06 Bug_1746667]# nc -U /tmp/qga.sock
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-fsfreeze-freeze"}
{"return": 3}
{"execute":"guest-fsfreeze-thaw"}
{"error": {"class": "GenericError", "desc": "couldn't hold writes: fsfreeze is limited up to 10 seconds: "}}
{"execute":"guest-fsfreeze-thaw"}
{"return": 0}                     ###//No long wait.

Actual results:
The response to guest-fsfreeze-thaw arrives quickly.

Expected results:
The response to guest-fsfreeze-thaw arrives quickly.
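
With the fixed build, the first thaw in the log above answers with an "error" object instead of a "return" value, so a host-side caller has to handle both shapes. The following is a minimal sketch of doing so. The names "QgaError" and "parse_response" are hypothetical, and the reading that the error reports the freeze having exceeded its 10-second limit is taken from the error text alone.

```python
import json


class QgaError(Exception):
    """Raised when the guest agent answers with an error object."""

    def __init__(self, error_class, desc):
        super().__init__(f"{error_class}: {desc}")
        self.error_class = error_class
        self.desc = desc


def parse_response(line):
    """Return the 'return' value of a response line, or raise QgaError."""
    message = json.loads(line)
    if "error" in message:
        error = message["error"]
        raise QgaError(error.get("class", ""), error.get("desc", ""))
    return message["return"]
```

A caller could catch QgaError on thaw, log the description, and then confirm the final state with guest-fsfreeze-status.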

Comment 25 errata-xmlrpc 2020-11-04 04:17:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virtio-win bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4840