Bug 1719548

Summary: guest-agent has no response after read a big file with a large count
Product: Red Hat Enterprise Linux 8 Reporter: xiagao
Component: qemu-kvmAssignee: Marc-Andre Lureau <marcandre.lureau>
Status: CLOSED WONTFIX QA Contact: xiagao
Severity: low Docs Contact:
Priority: low    
Version: 8.1CC: jinzhao, juzhang, rbalakri, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1733018 (view as bug list) Environment:
Last Closed: 2020-02-03 13:31:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1733018    

Description xiagao 2019-06-12 05:38:48 UTC
Description of problem:
Issue {"execute": "guest-file-read", "arguments": {"handle": 1000, "count": 10000000000}} qemu-guest-agent command from host, can't have response and the handle is closed.

Version-Release number of selected component (if applicable):
host(rhel8.1 slow train):
qemu-kvm-2.12.0-74.module+el8.1.0+3227+57d66ad3.x86_64
kernel-4.18.0-100.el8.x86_64
seabios-bin-1.11.1-3.module+el8.1.0+2983+b2ae9c0a.noarch

guest(rhel8.1 slow train):
qemu-guest-agent-2.12.0-74.module+el8.1.0+3227+57d66ad3.x86_64
kernel-4.18.0-103.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1.boot up guest with qemu-guest-agent needed command line.
-device virtio-serial-pci,id=virtio-serial1,max_ports=31,bus=pcie-root-port-4 \
-chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait  -device virtserialport,bus=virtio-serial1.0,chardev=channel2,name=org.qemu.guest_agent.0 \

2.create a 200k file in guest.
# dd if=/dev/urandom of=/tmp/big_file bs=1024 count=200

3.open the big file with default mode(read only), and read it with a large count.
# nc -U /tmp/helloworld2 
{"execute": "guest-file-open", "arguments": {"path": "/tmp/big_file"}}
{"return": 1000}
{"execute": "guest-file-read", "arguments": {"handle": 1000, "count": 10000000000}}

===========wait 20s and no response.

3.read the file again.
{"execute": "guest-file-read", "arguments": {"handle": 1000, "count": 10000000000}}
{"error": {"class": "GenericError", "desc": "handle '1000' has not been found"}}

===========handle can't be found.

Actual results:
as step 2 and step 3

Expected results:
after step 3, can get response,such as error info.
And the handle can be found.


Additional info:
1. Install qemu-guest-agent-4.0.0-3.module+el8.1.0+3265+26c4ed71.x86_64.rpm in guest, can have response.
{"error": {"class": "GenericError", "desc": "value '10000000000' is invalid for argument count"}}
2. Install qemu-guest-agent-3.1.0-27.module+el8.0.1+3253+c5371cb3.x86_64.rpm, also can get the above response.
3. In rhel7.7 with qemu-guest-agent-2.12.0-3.el7,also hit this issue.

Comment 1 Marc-Andre Lureau 2019-06-26 00:00:59 UTC
I am not able to reproduce, but I can imagine g_malloc() would abort in some cases.

We could backport:

commit 141b197408ab398c4f474ac1a728ab316e921f2b
Author: Prasad J Pandit <pjp>
Date:   Wed Jun 13 11:46:57 2018 +0530

    qga: check bytes count read by guest-file-read

That helps.

Please check with this brew build: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=22357644

Comment 2 xiagao 2019-06-26 03:22:45 UTC
(In reply to Marc-Andre Lureau from comment #1)
> I am not able to reproduce, but I can imagine g_malloc() would abort in some
> cases.
> 
> We could backport:
> 
> commit 141b197408ab398c4f474ac1a728ab316e921f2b
> Author: Prasad J Pandit <pjp>
> Date:   Wed Jun 13 11:46:57 2018 +0530
> 
>     qga: check bytes count read by guest-file-read
> 
> That helps.
> 
> Please check with this brew build:
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=22357644

It works,can get a reasonable response with the brew build.

{"execute": "guest-file-open", "arguments": {"path": "/tmp/big_file"}}
{"return": 1000}
{"execute": "guest-file-read", "arguments": {"handle": 1000, "count": 10000000000}}
{"error": {"class": "GenericError", "desc": "value '10000000000' is invalid for argument count"}}
{"execute": "guest-file-read", "arguments": {"handle": 1000, "count": 10000000000}}

Comment 3 xiagao 2019-06-26 03:24:00 UTC
Btw, 
In rhel7.7 with qemu-guest-agent-2.12.0-3.el7,also hit this issue, so do I need clone one bug in RHEL7.7?

Comment 4 Marc-Andre Lureau 2019-06-26 09:56:17 UTC
(In reply to xiagao from comment #3)
> Btw, 
> In rhel7.7 with qemu-guest-agent-2.12.0-3.el7,also hit this issue, so do I
> need clone one bug in RHEL7.7?

The impact is pretty low, and the abort() could still occur, even with that 4G limit. We could have an additional patch to use g_try_malloc() instead on those paths.

I guess we could backport to rhel7.7 in this case.

Comment 5 Marc-Andre Lureau 2020-01-17 12:52:32 UTC
imho, we should close this bug as WONTFIX. In short, this is qemu-ga aborting on OOM, which is expected.

You could argue that this is easily triggerable via QMP client, but there are potentially a lot of ways to make qemu-ga grow memory and get OOM or killed this way.

This can only get marginally better with fixes like 141b197408ab398c4f474ac1a728ab316e921f2b. Not sure that deserves backports.

Comment 6 xiagao 2020-02-02 04:15:32 UTC
(In reply to Marc-Andre Lureau from comment #5)
> imho, we should close this bug as WONTFIX. In short, this is qemu-ga
> aborting on OOM, which is expected.
> 
> You could argue that this is easily triggerable via QMP client, but there
> are potentially a lot of ways to make qemu-ga grow memory and get OOM or
> killed this way.
> 
> This can only get marginally better with fixes like
> 141b197408ab398c4f474ac1a728ab316e921f2b. Not sure that deserves backports.

yes, I know what you mean, but I have a question.
As mentioned in comment 0, only qemu-guest-agent-2.12.0-xx version has this issue, while qemu-guest-agent-3.xx andqemu-guest-agent-4.xx versions have no this issue(141b197408ab398c4f474ac1a728ab316e921f2b applied).

I think we should make them aligned.

Comment 7 Marc-Andre Lureau 2020-02-02 22:07:14 UTC
Aligning different qemu-ga behaviours isn't a strong reason, but I can send the patch, it'll eventually be reviewed..

Comment 8 John Ferlan 2020-02-03 13:31:20 UTC
I agree with Marc-Andre - closing this as WONTFIX

It's not worth generating a RHEL 8.2 based fix especially since any dependent product would be using RHEL AV which should have the fix.