Bug 1210246

Summary: [virtagent]The 'write' content is lost if 'read' it before flush through guest agent
Product: Red Hat Enterprise Linux 6 Reporter: Gu Nini <ngu>
Component: qemu-kvmAssignee: Marc-Andre Lureau <marcandre.lureau>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.7CC: ailan, areis, ashankar, chayang, codonell, hhuang, juzhang, michen, mkenneth, mnewsome, ngu, pfrankli, qzhang, rbalakri, rpacheco, virt-maint, weliao, xuhan, ypu, zhengtli, zhguo
Target Milestone: rc   
Target Release: 6.7   
Hardware: All   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.484.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-10 20:57:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gu Nini 2015-04-09 09:32:00 UTC
Description of problem:
For rhel6.7 guest, use guest agent to open then write(w+ or a+) to a file, try to read it before flush it; continue to flush/seek/re-read it, it's a failure to read the content, i.e. still no above new written content.

Version-Release number of selected component (if applicable):
Host kernel: 3.10.0-234.el7.ppc64
Qemu-kvm: qemu-kvm-rhev-2.2.0-8.el7.ppc64
Guest kernel: 2.6.32-548.el6.ppc64
Guest agent: 0.12.1.2-2.465.el6.ppc64

How reproducible:
100%

Steps to Reproduce:
1. Start a rhel6.7 guest with virtio serial guest agent:

usr/libexec/qemu-kvm -name virtioblkraw-6.7 -m 129707 ... -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/virtioblkraw-6.7.serial,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,id=channel0,name=virtioblkraw-6.7.serial ...

2. After the guest boots up, inside the guest, install guest agent and start the guest agent service with cmd 'qemu-ga -m virtio-serial -p /dev/virtio-ports/virtioblkraw-6.7.serial &'; comment out the 'BLACKLIST_RPC...' line in file '/etc/sysconfig/qemu-ga' to enable the file related opeartion
3. In the host, connect to the virtio serial guest agent channel: nc -U /var/lib/libvirt/qemu/virtioblkraw-6.7.serial
4. In above channel, issue guest agent open/write/read/flush/seek/read/close command to file /root/test.txt sequentially; and check the file content with cmd 'cat' inside the guest after each file operation :

{"execute":"guest-file-open", "arguments":{"path":"/root/test.txt","mode":"a+"}}
{"return": 1000}
{"execute":"guest-file-write", "arguments":{"handle":1000,"buf-b64":"aGVsbG8gd29ybGQhCg=="}}
{"return": {"count": 13, "eof": false}}
{"execute":"guest-file-read", "arguments":{"handle":1000}}
{"return": {"count": 0, "buf-b64": "", "eof": true}}
{"execute":"guest-file-flush","arguments":{"handle":1000}}
{"return": {}}
{"execute":"guest-file-seek", "arguments":{"handle":1000,"offset":0,"whence":0}}
{"return": {"eof": false, "position": 0}}
{"execute":"guest-file-read", "arguments":{"handle":1000}}
{"return": {"count": 0, "buf-b64": "", "eof": true}}
{"execute":"guest-file-close","arguments":{"handle":1000}}
{"return": {}}


Actual results:
In step4, for the 2nd time read, it's a failure to get the new written content; and from cmd 'cat /root/test.txt' output inside the guest, there is indeed no the new content in the file

Expected results:
It should be a success to save the written content even with an extra 'read' operation before 'flush'

Additional info:
1. If no the 'read' before 'flush', the write would be a success.
2. For rhel7.2 guest on the same host, there is no the bug problem, the guest-kernel/guest-agent version is: 3.10.0-229.el7.ppc64/2.1.0-4.el7.ppc64
3. For rhel7.0 guest on my rhel7.0 x86_64 host, there is the same problem as the bug, the sw version is:
Host kernel: 3.10.0-188.el7.x86_64
Qemu-kvm: qemu-kvm-rhev-2.1.2-7.el7.x86_64
Guest kernel: 3.10.0-123.el7.x86_64
Guest agent: 2.1.0-4.el7.x86_64

Additional questions:
Since file related operations for guest agent is in BLACKLIST of its configure file, and there is no corresponding command in 'virsh', do we still need to test them?

Comment 3 David Gibson 2015-06-17 02:37:31 UTC
Does the same problem occur on x86 or not?

Comment 4 Gu Nini 2015-06-19 05:58:45 UTC
(In reply to David Gibson from comment #3)
> Does the same problem occur on x86 or not?

Met the same problem on rhel6.7 guest x86_64 host, the detailed software versions are as follows:

Host kernel: 3.10.0-254.el7.x86_64
Guest kernel: 2.6.32-570.el6.x86_64
Qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-2.el7.x86_64
Qemu-guest-agent: 0.12.1.2-2.478.el6.x86_64

Comment 5 David Gibson 2015-06-22 01:54:13 UTC
Re-assigning to default owner, since it doesn't appear to be Power specific.

Comment 6 Marc-Andre Lureau 2015-07-07 15:27:52 UTC
Interesting bug, you can reproduce the behaviour with this test:

#include <stdio.h>
#include <assert.h>

void main()
{
    FILE *f;
    char foo[4096];
    int n;

    f = fopen("/tmp/test", "a+"); /* qemu does fdopen */
    assert(f);
    fwrite("Hello World!\n", 1, 13, f);

    n = fread(foo, 1, READ_SIZE, f);
    fprintf(stderr, "eof?%d %d", feof(f), n);

    fseek(f, 0, SEEK_SET);
    n = fread(foo, 1, sizeof(foo), f);
    fprintf(stderr, "eof?%d %d", feof(f), n);
}

1) gcc test.c -DREAD_SIZE=4096 -o test (qemu-ga tries to read QGA_READ_COUNT_DEFAULT=4096 by default)
./test
first read: 0
read after seek 0: 0

2) gcc test.c -DREAD_SIZE=13 -o test
./test
first read: 0
read after seek 0: 13

glibc fileops.c has the following test during fread():

/* If we now want less than a buffer, underflow and repeat
 the copy.  Otherwise, _IO_SYSREAD directly to
 the user buffer. */
if (fp->_IO_buf_base 
     && want < (size_t) (fp->_IO_buf_end - fp->_IO_buf_base))
{
   if (__underflow (fp) == EOF)
       break;    


The default FILE buffer size is also 4096, so it does not go to overflow case and write to file.

I am not sure glibc behaviour is really correct, I wonder why it doesn't flush in all write/read switch cases. It would be worth asking glibc maintainers

Comment 7 Carlos O'Donell 2015-07-07 19:11:45 UTC
(In reply to Marc-Andre Lureau from comment #6)
> I am not sure glibc behaviour is really correct, I wonder why it doesn't
> flush in all write/read switch cases. It would be worth asking glibc
> maintainers

Siddhesh, 

Could you please have a look at this?

After the first fwrite the file handle is certainly active. The default initial read position for glibc in a+ streams is the start of the file (implementation defined). The first read should read 0 bytes before EOF. However, the subsequent fseek is adjusting the stream position so it should read 13 from the buffer, but with a large read it doesn't. Thus the behaviour in upstream master looks wrong to me. Thoughts? Is this again another case of a+ behaviour being wrong (we've already fixed 2 of them)? I haven't verified if we have a test case for this specific case of large vs. small buffer size.

Comment 8 Siddhesh Poyarekar 2015-07-11 07:32:22 UTC
The problem is not specific to a+ and is in fact unrelated to anything we have done recently.  Trouble is that when we do fread directly after fwrite (where the former has not flushed), we don't bother to switch to get mode right away and we depend on __underflow to do that for us.  Fix is to explicitly switch to get mode if we're currently putting.

I'm surprised that this bug went unnoticed for so long.  Surely qemu-ga is doing something wrong ;)

Comment 9 Siddhesh Poyarekar 2015-07-11 07:57:23 UTC
Patch posted upstream:

https://sourceware.org/ml/libc-alpha/2015-07/msg00355.html

Comment 10 Siddhesh Poyarekar 2015-07-11 08:27:49 UTC
(In reply to Siddhesh Poyarekar from comment #8)
> I'm surprised that this bug went unnoticed for so long.  Surely qemu-ga is
> doing something wrong ;)

I was kidding then, but it looks like qemu-ga is in fact doing it wrong:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/fopen.html

    However, the application shall ensure that output is not directly followed by 
    input without an intervening call to fflush() or to a file positioning
    function (fseek(), fsetpos(), or rewind()), and input is not directly 
    followed by output without an intervening call to a file positioning
    function, unless the input operation encounters end-of-file.

Reassigning to qemu-kvm, the fix is to explicitly flush when you're switching fro write to read.

Comment 11 Carlos O'Donell 2015-07-11 12:01:32 UTC
(In reply to Siddhesh Poyarekar from comment #10)
> (In reply to Siddhesh Poyarekar from comment #8)
> > I'm surprised that this bug went unnoticed for so long.  Surely qemu-ga is
> > doing something wrong ;)
> 
> I was kidding then, but it looks like qemu-ga is in fact doing it wrong:
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/fopen.html
> 
>     However, the application shall ensure that output is not directly
> followed by 
>     input without an intervening call to fflush() or to a file positioning
>     function (fseek(), fsetpos(), or rewind()), and input is not directly 
>     followed by output without an intervening call to a file positioning
>     function, unless the input operation encounters end-of-file.
> 
> Reassigning to qemu-kvm, the fix is to explicitly flush when you're
> switching fro write to read.

Agreed. Thanks Siddhesh. I had not noticed there was a missing flush between read/write.

Comment 12 Marc-Andre Lureau 2015-11-24 16:38:07 UTC
oops, I didn't noticed the bug was reassign to us for so long,

sent fixes to the ML

Comment 13 Marc-Andre Lureau 2015-12-01 16:35:24 UTC
back for rhel6 ready, devel+

Comment 14 Jeff Nelson 2016-01-18 21:50:42 UTC
Fix included in qemu-kvm-0.12.1.2-2.484.el6

Comment 16 weliao 2016-01-28 02:28:11 UTC
Reproduce in below version:
Host:
2.6.32-607.el6.x86_64
qemu-kvm-0.12.1.2-2.485.el6.x86_64
Guest:
2.6.32-595.el6.x86_64
qemu-guest-agent-0.12.1.2-2.483.el6.x86_64

Step:
1.Launch guest with virtio serial guest agent:

/usr/libexec/qemu-kvm -name rhel6.8 -machine pc  -drive id=drive_image1,if=none,cache=none,snapshot=off,format=qcow2,file=/mnt/RHEL-Server-6.7-64-virtio.qcow2 -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,bootindex=0 -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,mac=52:54:25:93:79:67,id=net0  -m 2048 -smp 4,maxcpus=8,cores=4,threads=1,sockets=1 -cpu SandyBridge -boot menu=on -enable-kvm -qmp tcp:0:5556,nowait,server -monitor stdio -spice port=5901,disable-ticketing -global qxl-vga.vram_size=67108864 -vga qxl -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0.

2. After the guest boots up, inside the guest, install guest agent and start the guest agent service; comment out the 'BLACKLIST_RPC...' line in file '/etc/sysconfig/qemu-ga' to enable the file related opeartion; setenforce 0 in guest to disable selinux.
3. In the host, connect to the virtio serial guest agent channel:
 nc -U /tmp/qga.sock
4. In above channel, issue guest agent open/write/read/flush/seek/read/close command to file /root/test.txt sequentially; and check the file content with cmd 'cat' inside the guest after each file operation :

{"execute":"guest-file-open", "arguments":{"path":"/root/test.txt","mode":"a+"}}
{"return": 1000}
{"execute":"guest-file-write", "arguments":{"handle":1000,"buf-b64":"aGVsbG8gd29ybGQhCg=="}}
{"return": {"count": 13, "eof": false}}
{"execute":"guest-file-read", "arguments":{"handle":1000}}
{"return": {"count": 0, "buf-b64": "", "eof": true}}
{"execute":"guest-file-flush","arguments":{"handle":1000}}
{"return": {}}
{"execute":"guest-file-seek", "arguments":{"handle":1000,"offset":0,"whence":0}}
{"return": {"eof": false, "position": 0}}
{"execute":"guest-file-read", "arguments":{"handle":1000}}
{"return": {"count": 0, "buf-b64": "", "eof": true}}
{"execute":"guest-file-close","arguments":{"handle":1000}}
{"return": {}}

So can reproduced this bug on qemu-guest-agent-0.12.1.2-2.483.el6.x86_64 version.

Verify this bug in below version:
Host:
2.6.32-607.el6.x86_64
qemu-kvm-0.12.1.2-2.485.el6.x86_64
Guest:
2.6.32-595.el6.x86_64
qemu-guest-agent-0.12.1.2-2.485.el6.x86_64

The same test steps.
step4 result:
{"execute":"guest-file-open", "arguments":{"path":"/root/test.txt","mode":"a+"}}
{"return": 1000}

{"execute":"guest-file-write", "arguments":{"handle":1000,"buf-b64":"aGVsbG8gd29ybGQhCg=="}}
{"return": {"count": 13, "eof": false}}

{"execute":"guest-file-read", "arguments":{"handle":1000}}
{"return": {"count": 0, "buf-b64": "", "eof": true}}

{"execute":"guest-file-flush","arguments":{"handle":1000}}
{"return": {}}
{"execute":"guest-file-seek", "arguments":{"handle":1000,"offset":0,"whence":0}}
{"return": {"eof": false, "position": 0}}

{"execute":"guest-file-read", "arguments":{"handle":1000}}
{"return": {"count": 13, "buf-b64": "aGVsbG8gd29ybGQhCg==", "eof": true}}

{"execute":"guest-file-close","arguments":{"handle":1000}}
{"return": {}}

# cat test.txt 
hello world!

According above test,this bug fixed.

Comment 18 Guo, Zhiyi 2016-03-22 08:54:22 UTC
Verify this bug in below version:
Host:
2.6.32-633.el6.x86_64
qemu-kvm-0.12.1.2-2.490.el6.x86_64
Guest:
2.6.32-633.el6.x86_64
qemu-guest-agent-0.12.1.2-2.490.el6.x86_64

The same test steps.
step4 result:
{"execute":"guest-file-open", "arguments":{"path":"/root/test.txt","mode":"a+"}}
{"return": 1001}
{"execute":"guest-file-write", "arguments":{"handle":1001,"buf-b64":"aGVsbG8gd29ybGQhCg=="}}    
{"return": {"count": 13, "eof": false}}
{"execute":"guest-file-read", "arguments":{"handle":1001}}
{"return": {"count": 0, "buf-b64": "", "eof": true}}
{"execute":"guest-file-flush","arguments":{"handle":1001}}
{"return": {}}
{"execute":"guest-file-seek", "arguments":{"handle":1001,"offset":0,"whence":0}}
{"return": {"eof": false, "position": 0}}
{"execute":"guest-file-read", "arguments":{"handle":1001}}
{"return": {"count": 13, "buf-b64": "aGVsbG8gd29ybGQhCg==", "eof": true}}
{"execute":"guest-file-close","arguments":{"handle":1001}}
{"return": {}}


# cat test.txt 
hello world!

Comment 20 errata-xmlrpc 2016-05-10 20:57:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0815.html