Bug 1243225
Summary: | Transfer data through chardev while do migration cause data broken | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Zhengtong <zhengtli> |
Component: | qemu-kvm-rhev | Assignee: | Laurent Vivier <lvivier> |
Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 7.2 | CC: | amit.shah, hannsj_uhl, huding, juzhang, knoel, michen, ngu, pbonzini, qzhang, thuth, virt-maint, xuhan, zhengtli |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-07-04 16:21:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1308609, 1359843 |
When you open the host side on the src, the host side will cache data that is sent by the guest. That means you'll have to read out all the data on the src, then read out all the data on the dest, concatenate the outputs, and then compare the md5sum.

Please do that and let us know if that worked.

(In reply to Amit Shah from comment #6)
> When you open the host side on the src, the host side will cache data that
> is sent by the guest. That means you'll have to read out all the data on
> the src, then read out all the data on the dest, concatenate the outputs,
> and then compare the md5sum.
>
> Please do that and let us know if that worked.

There is no environment to do this test now, I will do this as soon as I got one.

And still there is a problem, I don't know how to get the cached data on the source host, do you have any idea about this? I just connect the socket with the python module:

>>> import socket
>>>
>>> sock = socket.socket(socket.AF_UNIX)
>>> sock.connect("/tmp/foo")
>>>

thanks

(In reply to Zhengtong from comment #7)
> And still there is a problem, I don't know how to got the cache data on
> source host, do you have any idea about this?

After migration is complete, actually read from the src host and put contents into a file, e.g. dump1.

You should already have contents from dest host in a file, e.g. dump2.

Then, do something like:

cat dump1 dump2 > dump3

and then check the md5 of dump3 and guest file.

(In reply to Amit Shah from comment #8)
> (In reply to Zhengtong from comment #7)
> > And still there is a problem, I don't know how to got the cache data on
> > source host, do you have any idea about this?
>
> After migration is complete, actually read from the src host and put
> contents into a file, e.g. dump1.
>
> You should already have contents from dest host in a file, e.g. dump2.
>
> Then, do something like:
>
> cat dump1 dump2 > dump3
>
> and then check the md5 of dump3 and guest file.

Hi, Amit. Actual I want to know how to get dump1 file.
Anyway, I have tried with the method below:

1. On src host, recv the channel data after migration finished.
>>> f=open("dump1",'wb')
>>> f.write(sock.recv())
>>> f.close()

2. Copy dump1 file to host2, concatenate the data with dump2 as you said.
# cat dump1 dump2 > dump3

and the result is that the md5sum is not the same; even the size is not the same.

Details:

Received data:
[root@ibm-p8-kvm-02-qe dump]# ls -ls
total 4080168
     16 -rw-r--r--. 1 root root      16384 Aug 14 08:06 dump1
2040068 -rw-r--r--. 1 root root 2089025536 Aug 14 07:15 dump2
2040084 -rw-r--r--. 1 root root 2089041920 Aug 14 08:27 dump3

Sent data:
[root@dhcp71-167 ~]# ls -s 2g
2048000 2g

The dump1 is only 16K in size, but the gap between 2g (sent data in guest) and dump2 (received on dst host) is 7.9M (2048000-2040084).

(In reply to Zhengtong from comment #9)
> Hi, Amit. Actual I want to know how to get dump1 file.
> Anyway , I have tried with method as below:
> 1. On src host. recv the channel data. after migration finished.

Yes, that's what I meant.

> >>> f=open("dump1",'wb')
> >>> f.write(sock.recv())

What's the bufsize you give to recv()? Keep receiving till you get EOF.

I didn't modified the bufsize during my test.
Check again in my host for default value.
>>> import socket
>>> sock=socket.socket(socket.AF_UNIX)
>>> sock.getsockopt(socket.SOL_SOCKET,socket.SO_RCVBUF)
229376
So. the bufsize is 229376.
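
What "keep receiving till you get EOF" amounts to in practice is a loop around sock.recv(): a single recv() call only returns whatever happens to be buffered at that moment, regardless of the bufsize passed. A minimal sketch (the socket path and output filename follow the examples above; the chunk size is arbitrary):

import socket

sock = socket.socket(socket.AF_UNIX)
sock.connect("/tmp/foo")

with open("dump1", "wb") as f:
    while True:
        data = sock.recv(65536)   # chunk size is not critical
        if not data:              # empty result = peer closed the socket (EOF)
            break
        f.write(data)

Note that recv() only returns an empty result once QEMU closes the socket (for example when the source VM is quit), which is the same point made with the flush.py loop later in this report.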
(In reply to Zhengtong from comment #5)
> I tried on X86 host. But there is a different issue. Here is the step and
> result test on X86.
> And the migration failed with the error msg in destination as below:
>
> (qemu) red_dispatcher_loadvm_commands:
> KVM internal error. Suberror: 1
> emulation failure
> Then the guest freeze.
>
>
> I tried without transferring data b/w host and guest. the migration could
> finish successfully.

That's strange; can you try this again? This could be due to a mismatch in the two host cpu types. Transferring data shouldn't trigger such an emulation failure.

(In reply to Zhengtong from comment #12)
> I didn't modified the bufsize during my test.
> Check again in my host for default value.
>
> >>> import socket
> >>> sock=socket.socket(socket.AF_UNIX)
> >>> sock.getsockopt(socket.SOL_SOCKET,socket.SO_RCVBUF)
> 229376
>
>
> So. the bufsize is 229376.

I mean from here:

(In reply to Amit Shah from comment #11)
> (In reply to Zhengtong from comment #9)
>
> > Hi, Amit. Actual I want to know how to get dump1 file.
> > Anyway , I have tried with method as below:
> > 1. On src host. recv the channel data. after migration finished.
>
> Yes, that's what I meant.
>
> > >>> f=open("dump1",'wb')
> > >>> f.write(sock.recv())

sock.recv() needs to be passed a bufsize parameter. What param do you use here?

In any case:

> Keep receiving till you get EOF.

just keep reading from sock.recv() till there's no more data left in the recv queue. Only that will ensure all data has been received and written to the file on disk.

Ok, I see. In my test before, I didn't set sock.recv() with bufsize; the command you saw here is exactly the command I ran in the test ("f.write(sock.recv())").

I tested again with bufsize 65536000, and the md5sum is still not right.

Steps:
1. Boot guest on source host and connect with python tool:
>>> import socket
>>> sock = socket.socket(socket.AF_UNIX)
>>> sock.connect("/root/test_home/liuzt/manually_test/serial-socket")
2. In guest, run #cat 2G > /dev/vport1p1
3. do migration
4. In destination host, get the file by #nc -U $socket > dump2
5. In source host, get the cached data by sock.recv(65536000)
>>> f=open("/tmp/dump1",'wb')
>>> f.write(sock.recv(65536000))
wait 5 seconds (just to ensure the data is received)
>>> f.close()
# ls -l /tmp/dump1
-rw-r--r--. 1 root root 131072 Nov 30 05:44 /tmp/dump1
6. copy /tmp/dump1 file to destination host, and do #cat dump1 dump2 > dump3

Compare the md5sum of dump3 in dest host and the 2G file in guest; the md5sum is still different.

[root@ibm-p8-kvm-01-qe ~]# md5sum dump3
4375ef49dafdb13f977259ce3807c440 dump3
[root@dhcp71-100 ~]# md5sum 2G
f3444be3893c1b9b6fa9dd1acde671a9 2G

I can't reproduce the issue in comment #5 any more. Not sure which part I did was wrong then. It seems like the problem is that I emulated a Sandybridge cpu on my AMD host. I switched the cpu type to "Opteron_G3" and tested; there are no problems any more.

In my test on X86 the result is good, I mean the md5sum of the file in the guest is the same as that in the Host. So the problem only occurs on Powerpc.

That's strange - maybe it's something in the virtio layer acting up on power? But still, that's a stretch.

Assigning to David -- he might have more ideas.

I think I've been able to reproduce the problem using only one host for simplicity:

- start guest A with:
...
-device virtio-serial-pci,id=serial0 \
-chardev socket,id=channel0,path=/tmp/serial_socket_1,server,nowait \
-device virtserialport,bus=serial0.0,nr=1,chardev=channel0,name=org.linux-kvm.port.1,id=port1 \

- start guest B with:
...
-device virtio-serial-pci,id=serial0 \
-chardev socket,id=channel0,path=/tmp/serial_socket_2,server,nowait \
-device virtserialport,bus=serial0.0,nr=1,chardev=channel0,name=org.linux-kvm.port.1,id=port1 \
-incoming tcp:0:4444

-> we use two different unix sockets.

- on guest A:
1- dd if=/dev/zero of=2g bs=1M count=2000
2- cat 2g > /dev/vport1p1
3- in QEMU monitor: migrate tcp:localhost:4444

once migration is terminated, on host:
- sudo nc -U /tmp/serial_socket_2 > file2

Wait until the end of the "cat" on guest B (the shell is displayed again).

The 2 files, 2g (guest) and file2 (host), have the same size, 2097152000, but the file received on the host is filled with 0 at the beginning of the file, exactly 0x40000 bytes of 0:

00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00040000 e0 ff 81 fb 78 2b b9 7c 26 80 90 7d e8 ff a1 fb |....x+.|&..}....|
00040010 f0 ff c1 fb 79 33 de 7c 78 23 9d 7c f8 ff e1 fb |....y3.|x#.|....|
00040020 c0 ff 01 fb 78 1b 7f 7c 78 3b fc 7c d0 ff 41 fb |....x..|x;.|..A.|

I've tested this with qemu-kvm-rhev-2.3.0-31.el7_2.8 and with upstream QEMU (331ac65) on POWER and the result is the same.

I didn't use the python script to open the chardev to have this result.

The behavior is exactly the same with QEMU x86_64 upstream (331ac65): the beginning of the file is overwritten by 0x40000 bytes of 0.

Forget my two previous comments (#18 and #19): in fact my file is corrupted during the migration (it should be filled entirely with 0), because I'm using "snapshot=on" so some disk data are not transferred from a guest to another...

With snapshot=off I'm not able to reproduce the bug on POWER or x86_64 (but I'm not using the python script to open the first serial port, I'll do more tests with that).

Re-reading the BZ description, it seems you are also on the same host, but for POWER you use two different socket names (serial-socket2 and serial-socket1), whereas on x86 you are using only one socket name (serial_socket).

Could you recheck this BZ on POWER and X86?

Thanks

I've tested again, this time using the python script, and I've been able to reproduce it, but I think the problem is with the python script.

Amit was right in comment 14: we must keep reading on sock.recv(); playing with the bufsize parameter is not enough to flush all the buffers:

- we must loop on the sock.recv() and f.write()
- we must quit the first guest once the migration is done (to exit the loop and flush all intermediate buffers).

I'm using now this python script (flush.py):

-------------------8<------------------
import socket

sock = socket.socket(socket.AF_UNIX)
sock.connect("/tmp/serial_socket_1")

raw_input("Press Enter to continue...")

f=open("file1",'wb')
while 1:
    data = sock.recv(4096)
    if not data:
        break
    f.write(data)
f.close()
-------------------8<------------------

the test is:

0- clean all files (to be sure we don't re-use an existing unix socket):
sudo rm -f file* /tmp/serial_socket_*

1- start guest A with:
...
-device virtio-serial-pci,id=serial0 \
-chardev socket,id=channel0,path=/tmp/serial_socket_1,server,nowait \
-device virtserialport,bus=serial0.0,nr=1,chardev=channel0,name=org.linux-kvm.port.1,id=port1 \

2- start guest B with:
...
-device virtio-serial-pci,id=serial0 \
-chardev socket,id=channel0,path=/tmp/serial_socket_2,server,nowait \
-device virtserialport,bus=serial0.0,nr=1,chardev=channel0,name=org.linux-kvm.port.1,id=port1 \
-incoming tcp:0:4444

3- on the host, start the python script "flush.py":
sudo python flush.py
You should have the following message: "Press Enter to continue..." but don't press enter! This opens the socket but doesn't read any data until enter is pressed.

4- on guest A: log in as root
dd if=/dev/zero of=2g bs=1M count=2000
cat 2g >/dev/vport1p1

5- on guest A: switch to monitor
migrate tcp:localhost:4444

6- wait for the end of the migration

7- on host:
sudo nc -U /tmp/serial_socket_2 > file2

8- on guest B: wait for the end of "cat" (shell prompt is displayed)

9- in the terminal of the python script flush.py, press enter (to flush the data)

10- in guest A monitor, type "quit" (this allows the python script to exit from its loop)

11- cat file1 file2 > file3

12- compare "md5sum 2g" and "md5sum file3"

f3444be3893c1b9b6fa9dd1acde671a9 file3
f3444be3893c1b9b6fa9dd1acde671a9 2g

So, I think this is not a bug, we just had some unflushed data (as guessed by Amit).

Note: without the loop in the python script, my file1 size is 131072 (as in comment 15) whereas with the loop the size is 4325376.

(In reply to Laurent Vivier from comment #20)
> Forget my two previous comments (#18 and #19): in fact my file is corrupted
> during the migration (it should be filled entirely with 0), because I'm
> using "snapshot=on" so some disk datas are not transferred from a guest to
> another...
>
> With snapshot=off I'm not able to reproduce the bug on POWER or x86_64 (but
> I'm not using the python script to open the first serial port, I'll do more
> tests with that).
>
> Re-reading BZ description, it seems you are also on the same host, but for
> POWER you use two different socket names (serial-socket2 and
> serial-socket1), but on x86 you are using only one socket name
> (serial_socket).
>
> Could you recheck this BZ on POWER and X86?
>
> Thanks

Hi Laurent,
For POWERPC, I only use the serial-socket1 port in this case test, although I attached two sockets during guest boot. Maybe this made you confused; I use this suite of commands to test not only this case but also other test cases which need multiple serial sockets, and I just pasted all command lines here. For this case, I tested between two hosts and only used serial-socket1.

Regarding comment #21: I tested on my side following your steps, but I do migration between two physical hosts. I didn't get the right result. I keep trying more times.

On my side, the process:

1. Boot guest in hostA and hostB
A:
qemu-kvm ...
-chardev socket,path=/root/test_home/liuzt/Manuall_test/virtio-serail/serial-socket1,id=channel0,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=channel0,name=org.linux-kvm.port.1,id=port1 \
...
B:
qemu-kvm ..
-chardev socket,path=/root/test_home/liuzt/Manuall_test/virtio-serail/serial-socket1,id=channel0,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=channel0,name=org.linux-kvm.port.1,id=port1 \
...
-incoming tcp:0:5980

2. Run the python script provided by you on host A:
[root@ibm-p8-firestone-01 script]# python test.py
Press Enter to continue...
(I didn't press Enter)

3. In guest
# dd if=/dev/zero of=./2G bs=1M count=2000
# cat 2G > /dev/vport1p1

4. Do migrate on Host A (hmp)
(qemu) migrate -d tcp:10.19.112.10:5980

5. While migration process finished,
press "Enter" to trigger the receive action in python scripts. And wait for a long time when the file1's size doesn't increase, then terminate post-migration vm in Host A with "quit" command in hmp. 6. in Host B, get the data through serial socket. # nc -U /root/test_home/liuzt/Manuall_test/virtio-serail/serial-socket1 > file2 till the process(cat 2G>/dev/vport1p1) in guest got finished. 7." Ctrl + c " to terminate nc process in Host B. 8. scp file1 from Host A to Host B, and do data connection. cat file1 file2 > file3 9. compare md5sum [root@dhcp71-153 ~]# md5sum 2G f3444be3893c1b9b6fa9dd1acde671a9 2G [root@ibm-p8-rhevm-18 ~]# md5sum file3 268542feafe56109e1167d747b15fa66 file3 I am not sure where should the problem should be. Any ideas about this ? the file size: [root@ibm-p8-rhevm-18 ~]# ls -sl file* 4224 -rw-r--r--. 1 root root 4325376 Mar 18 02:47 file1 2040064 -rw-r--r--. 1 root root 2089025536 Mar 18 02:46 file2 2044288 -rw-r--r--. 1 root root 2093350912 Mar 18 02:47 file3 [root@dhcp71-153 ~]# ls -sl 2G 2048000 -rw-r--r--. 1 root root 2097152000 Mar 18 14:25 2G (In reply to Zhengtong from comment #22) > the file size: > [root@ibm-p8-rhevm-18 ~]# ls -sl file* > 4224 -rw-r--r--. 1 root root 4325376 Mar 18 02:47 file1 > 2040064 -rw-r--r--. 1 root root 2089025536 Mar 18 02:46 file2 > 2044288 -rw-r--r--. 1 root root 2093350912 Mar 18 02:47 file3 I have: -rw-r--r--. 1 root root 4325376 18 mars 06:41 file1 -rw-r--r--. 1 lvivier users 2092826624 18 mars 06:41 file2 -rw-r--r--. 1 lvivier users 2097152000 18 mars 06:41 file3 So the problem seems to be with file2, the file generated by the "nc" process. I've been able to reproduce [1] this on POWER _and_ on x86_64 with upstream QEMU (2538039 + [2]) , so change "architecture" flag to "all". [1] as described in comment 21 and comment 22 [2] https://patchwork.ozlabs.org/patch/599628/ I've added some traces in hw/char/virtio-console.c:flush_buf(). What happens is: - on guest A side, data are consumed only when the flush.py is connected (not reading), - on guest B side, just after the end of migration and while nothing is connected ("nc" is not started), data are consumed by qemu_chr_fe_write() and lost. Exactly 4194304 bytes (4 MiB). Then, when we start "nc" remaining data are read until the end and written in the file. The data are lost in tcp_chr_write(). If the socket is not connected, the data are trashed: static int tcp_chr_write(CharDriverState *chr, const uint8_t *buf, int len) ... if (s->connected) { ... } else { /* XXX: indicate an error ? */ return len; } } On migration, the guest B should check if data can be sent to the front-end before trying to write them (qemu_chr_fe_write()). Amit, any idea how to do that? (In reply to Laurent Vivier from comment #26) > The data are lost in tcp_chr_write(). > > If the socket is not connected, the data are trashed: > > static int tcp_chr_write(CharDriverState *chr, const uint8_t *buf, int len) > ... > if (s->connected) { > ... > } else { > /* XXX: indicate an error ? */ > return len; > } > } > > On migration, the guest B should check if data can be sent to the front-end > before trying to write them (qemu_chr_fe_write()). > > Amit, any idea how to do that? To test this, how about dropping 'nowait' from the chardev attributes? That way, qemu will only accept incoming connection after we connect the socket. Then, if the guest does a write, we will have all the data the guest sent. 
What could also be happening is this:

If the dest host socket is not connected, upon migration end, the 'host_connected' property of the virtio-serial port becomes false (hw/char/virtio-serial-bus.c:668). And so, any data that is sent by the guest is discarded (hw/char/virtio-serial-bus.c:478). I think the only way to mitigate this would be to ensure we don't use 'nowait' on the dest.

Another interesting experiment would be to try this using libvirt. libvirt will do all the setup before accepting migration, so we won't actually see this there.

(In reply to Amit Shah from comment #27)
> (In reply to Laurent Vivier from comment #26)
> > The data are lost in tcp_chr_write().
> >
> > If the socket is not connected, the data are trashed:
> >
> > static int tcp_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
> > ...
> >     if (s->connected) {
> >         ...
> >     } else {
> >         /* XXX: indicate an error ? */
> >         return len;
> >     }
> > }
> >
> > On migration, the guest B should check if data can be sent to the front-end
> > before trying to write them (qemu_chr_fe_write()).
> >
> > Amit, any idea how to do that?
>
> To test this, how about dropping 'nowait' from the chardev attributes? That
> way, qemu will only accept incoming connection after we connect the socket.
> Then, if the guest does a write, we will have all the data the guest sent.

If we remove the 'nowait', we can't migrate as the guest B is waiting for the connection and is not ready to receive migration data.

And in fact, data are lost even if there is a "reader" on the socket of the guest B (with or without "nowait") while the migration is being done. The only case we don't lose data during migration is if the socket is not opened by a reader on guest A.

(In reply to Laurent Vivier from comment #28)
> (In reply to Amit Shah from comment #27)
> > (In reply to Laurent Vivier from comment #26)
> > > The data are lost in tcp_chr_write().
> > >
> > > If the socket is not connected, the data are trashed:
> > >
> > > static int tcp_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
> > > ...
> > >     if (s->connected) {
> > >         ...
> > >     } else {
> > >         /* XXX: indicate an error ? */
> > >         return len;
> > >     }
> > > }
> > >
> > > On migration, the guest B should check if data can be sent to the front-end
> > > before trying to write them (qemu_chr_fe_write()).
> > >
> > > Amit, any idea how to do that?
> >
> > To test this, how about dropping 'nowait' from the chardev attributes? That
> > way, qemu will only accept incoming connection after we connect the socket.
> > Then, if the guest does a write, we will have all the data the guest sent.
>
> If we remove the 'nowait', we can't migrate as the guest B is waiting the
> connection and is not ready to receive migration data.
>
> And in fact, data are lost even if there is a "reader" on the socket of the
> guest B (with or without "nowait") while the migration is being done. The
> only case we don't loose data during migration is if socket is not opened by
> a reader on guest A.

A hint can be given by what happens with qemu upstream when a reader is already connected on the guest B socket: while we lose some data on migration with qemu-kvm-rhev, we have a crash of guest B in the case of upstream qemu (de1d099):

virtio: unexpected memory split

Paolo, as this error is raised by code you have added [1], do you understand what happens?
[1] commit 3b3b0628217e2726069990ff9942a5d6d9816bd7
Author: Paolo Bonzini <pbonzini>
Date: Sun Jan 31 11:29:01 2016 +0100

    virtio: slim down allocation of VirtQueueElements

    Build the addresses and s/g lists on the stack, and then copy them to
    a VirtQueueElement that is just as big as required to contain this
    particular s/g list. The cost of the copy is minimal compared to that
    of a large malloc.

    When virtqueue_map is used on the destination side of migration or on
    loadvm, the iovecs have already been split at memory region boundary,
    so we can just reuse the out_num/in_num we find in the file.

Wow, so I have actually improved the situation! :-O

The "memory split" error means that a descriptor spans multiple non-adjacent RAM regions. It means that the memory map is somehow different between the source and destination.

Here is how VirtQueueElements are built and migrated.

The source uses virtqueue_map_desc to convert a descriptor (which is a s/g entry, i.e. address+length) into a series of "struct iov". The iov_base field is a host address, thus it differs between source and destination. But we cannot just rebuild the array of "struct iov" because the device might be storing indices in the source's array. In fact this is true of virtio-serial, which migrates an (iov_idx, iov_offset) pair together with each VirtQueueElement!

So the destination uses virtqueue_map_iovec to fill in that field. virtqueue_map_iovec takes the base address from the descriptors and the length from the "struct iov"s. It then uses cpu_physical_memory_map to retrieve the iov_base. If cpu_physical_memory_map returns a different length than the iov_len, it means that it behaved differently between virtqueue_map_desc and virtqueue_map_iovec. That's a "memory split" and, well, it's unexpected. :)

Before I introduced this new algorithm, virtio would rebuild the entire iovec. I introduced the check almost as an assertion failure, to simplify the loading process, and it would be in principle possible to rewrite qemu_get_virtqueue_element to deal with a memory split. But then we'd go back to the situation where iov_idx and iov_offset now point in the woods: the failure of this check can definitely explain the older corruption and I wonder if there are weird security consequences.

The new algorithm is not _necessary_. We can remove the error and fix virtio-serial: use the saved VirtQueueElement to map (iov_idx,iov_offset) to a byte offset, and reconstruct a new index/offset pair as it is loaded. But this would not be too easy and we might have other bugs looming in other virtio devices. So I would keep the check and appreciate the unexpected checking it provides.

This is also because, while really weird, the failure should be easy to debug. Just call "info mtree" on the source; on the destination, invoke "call mtree_info(fprintf, stdout)" from the debugger when it errors out. You can then correlate.

It might happen because qemu_get_virtqueue_element is called before the memory map has been transferred completely (it is split between multiple devices). One way to fix it would be to move the virtqueue_map call from qemu_get_virtqueue_element to virtio_vmstate_change.

(In reply to Paolo Bonzini from comment #30)
> Wow, so I have actually improved the situation! :-O

Thank you for your help and this detailed explanation.

Sadly, I'm not able to reproduce the crash anymore (on x86_64 or ppc64le), even with the same commit number...

The problem looks like a flow control problem on the virtio-serial port, rather than a migration one.
It appears with migration because migration induces latency in the flow.

I've written a little C program to see which data are lost:

/* cc -o gen gen.c */
#include <unistd.h>

static unsigned buf[1048576]; /* 1 MiB */

int main(void)
{
    unsigned i, j;

    for (i = 0; i < 2000; i++) { /* 2000 MiB */
        for (j = 0; j < 1048576 / sizeof(unsigned); j++) {
            buf[j] = i * 1048576 / sizeof(unsigned) + j;
        }
        write(STDOUT_FILENO, buf, 1048576);
    }
    return 0;
}

But as this program runs much faster than the "cat 2g" (x6), we have corrupted data after the first block (a block has the size of the I/O buffer), i.e. after the first 32768 bytes.

on the host:
$ sudo nc -U /tmp/serial_socket_1 > file1

on the guest:
$ cc -o gen gen.c
$ time ./gen >/dev/vport1p1
real 0m1.353s
user 0m1.343s
sys 0m0.009s

Compare result on the host:
$ ./gen > 2g
$ ls -l 2g file1
... 2097152000 ... 2g
... 65536000 ... file1
$ cmp 2g file1
2g file1 differ: byte 32770, line 289

FYI:
time cat 2g > /dev/vport1p1
real 0m8.007s
user 0m0.015s
sys 0m1.281s

Amit, do you know if this behavior is normal? Should we expect to have flow control?

That's very interesting.

My first reaction to this was that it's a guest bug. Trying it out, it's suspicious that my file size too is 65536000.

In the guest:

# cat /sys/kernel/debug/virtio-ports/vport0p5
name:
guest_connected: 0
host_connected: 1
outvq_full: 0
bytes_sent: 65536000
bytes_received: 6564
bytes_discarded: 0
is_console: no
console_vtermno: 1

which shows that indeed the guest has only written out those many bytes to the port, and qemu has written them all out to the socket. So I think we can, for the purpose of this test, say qemu is not at fault.

Now on to the guest: there's no reason for the guest to drop any buffers, so I was quite sure that write() returns an error when the vq is full, and we aren't retrying those writes. So I added a do..while loop to retry on error, but the result was the same. I then checked the return value from write(), and found the culprit: we only write 32768 bytes per write(), and this program drops the remaining bytes. 32768 * 2000 = 65536000, so this explains it.

It's legal for the kernel to do short writes, and the userspace app has to ensure it writes everything else, so we're within our rights to do that in the guest kernel. If we didn't do that, we'd have to allocate huge buffers in the kernel to do our writes, which isn't always a good idea. I think 32K per call is fine, but if someone thinks we should do better, I'm all ears.

However, that's a different problem.

Thank you Amit.

To be sure we manage the "write()" stuff correctly, I dump the result of "./gen" into a file, and then send it to /dev/vport1p1 with cat.

I test the case where the "nc" is already attached to guest 2 before the migration.

The result is now:

... 4423680 ... file1
... 2096922624 ... file2
... 2101346304 ... file3

$ cmp 2g file1
cmp: EOF on file1

So file1 has been transferred correctly. We can check the end of the file:

$ hexdump -e '"%07.7_ax " 4/4 "%08x " "\n"' file1|tail -2
0437fe0 0010dff8 0010dff9 0010dffa 0010dffb
0437ff0 0010dffc 0010dffd 0010dffe 0010dfff

and 10dffc * 4 = 437ff0.
But if we check the start of file2, the result is very strange:

$ hexdump -e '"%07.7_ax " 4/4 "%08x " "\n"' file2|head -2
0000000 0000e000 0000e001 0000e002 0000e003
0000010 0000e004 0000e005 0000e006 0000e007

we should have, instead of e000: (437ff0 + 10) / 4 = 10e000

But the end of the file is OK:

$ hexdump -e '"%07.7_ax " 4/4 "%08x " "\n"' 2g|tail -2
7cffffe0 1f3ffff8 1f3ffff9 1f3ffffa 1f3ffffb
7cfffff0 1f3ffffc 1f3ffffd 1f3ffffe 1f3fffff

$ hexdump -e '"%07.7_ax " 4/4 "%08x " "\n"' file2|tail -2
7cfc7fe0 1f3ffff8 1f3ffff9 1f3ffffa 1f3ffffb
7cfc7ff0 1f3ffffc 1f3ffffd 1f3ffffe 1f3fffff

and:

$ ls -l 2g file3
... 2097152000 ... 2g
... 2101346304 ... file3

and 2101346304 - 2097152000 = 0x400000, 0x400000 / 4 = 0x100000, and so it looks like the second guest re-emits the 0x40000 last bytes of guest 1.

Hi Zhengtong,

I've tried to reproduce this issue with latest qemu [1] on ppc64le, but the problem seems fixed. Could you try to reproduce it on your own, please?

[1] qemu-kvm-rhev-2.6.0-8.el7.ppc64le
    kernel-3.10.0-445.el7.ppc64le

Hi, Laurent, I will give the result next week.

hi Laurent, I tried in my test environment here. The result seems no different: the problem is still there. I tested following the steps in comment #22. Still, the md5sum is not the same.

Guest:
[root@dhcp113-129 ~]# md5sum 2G
f3444be3893c1b9b6fa9dd1acde671a9 2G

Host2:
[root@dhcp113-127 home]# md5sum file3
268542feafe56109e1167d747b15fa66 file3
[root@dhcp113-127 home]# ls -lS
total 4088580
-rw-r--r--. 1 root root 2093350912 Jun 27 02:05 file3
-rw-r--r--. 1 root root 2089025536 Jun 27 02:04 file2
-rw-r--r--. 1 root root    4325376 Jun 27 02:05 file1

Both hosts kernel: 3.10.0-445.el7.ppc64le
qemu-kvm-rhev-2.6.0-8.el7

Have you tried with the steps in comment #22 . Is the steps in comment #22 is correct , laurent ?

(In reply to Zhengtong from comment #37)
> Have you tried with the steps in comment #22 . Is the steps in comment #22
> is correct , laurent ?

Yes / Yes

But it seems the probability of hitting the problem is not 100%.

I'm able to reproduce it, thanks.

Data are lost when the transfer is resumed on the destination guest.

Data seem to be lost in hw/char/virtio-serial-bus.c:handle_output() because port->host_connected is false and virtqueue elements are dropped.

Data are dropped since this commit:

commit 3ecb45f893d09a97b8f24399b5e40808a708261f
Author: Amit Shah <amit.shah>
Date: Tue Apr 27 18:04:04 2010 +0530

    virtio-serial: Send out guest data to ports only if port is opened

    Data should be written only when ports are open.

    Signed-off-by: Amit Shah <amit.shah>
    Signed-off-by: Anthony Liguori <aliguori.com>

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 3a09f0d..6befd4d 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -335,6 +335,11 @@ static void handle_output(VirtIODevice *vdev, VirtQueue *vq)
             goto next_buf;
         }

+        if (!port->host_connected) {
+            ret = 0;
+            goto next_buf;
+        }
+
         /*
          * A port may not have any handler registered for consuming the
          * data that the guest sends or it may not have a chardev associated

And cleaned up by:

a69c760 virtio-serial: Discard data that guest sends us when ports aren't connected

Amit, do you think we should not drop data, or is this behavior normal (we can lose data if the host is not ready)?
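
As an aside to the short-write behaviour discussed a few comments above (each write() to the virtio-serial port may accept only 32768 bytes, and the sender must retry the remainder), the userspace fix is simply to loop until the whole buffer has been written. A sketch in Python, assuming the /dev/vport1p1 node and the 2g file used in these tests:

import os

def write_all(fd, data):
    # os.write() may perform a short write; keep writing the rest
    # until the whole buffer has been handed to the port.
    while data:
        written = os.write(fd, data)
        data = data[written:]

fd = os.open("/dev/vport1p1", os.O_WRONLY)
with open("2g", "rb") as src:
    while True:
        chunk = src.read(1 << 20)  # 1 MiB at a time
        if not chunk:
            break
        write_all(fd, chunk)
os.close(fd)

This is effectively what "cat" does, which is why dumping the generated data to a file and sending it with cat (as done above) avoids the truncation.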
I think what happens is:

1- the kernel of the source guest fills the virtqueue with data to transfer (because host_connected == true)
2- the guest is migrated to the destination guest
3- qemu reads data from the virtqueue, and as the port is not connected to anything (host_connected == false), the data are discarded.

I don't think there is any solution to this bug:

1- the source guest virtio-serial port is hanging because the port is connected but data are not read,

2- the destination guest virtio-serial port cannot flush its data because there is nothing connected to the port,

I think this configuration is not a viable one because the guests' configuration is not symmetric: if there is a consumer connected to the source guest port, we should have a consumer connected to the destination guest port.

[It's like migrating from a source guest with a disk backend to a destination guest without a disk backend. Is it possible?]

I don't know if we can apply this to the virtio-serial port, but normally a serial port is not reliable: data can be lost... and this is what happens in this case.

I propose to close this BZ as WONTFIX.

Any comments?

(In reply to Laurent Vivier from comment #42)
> I don't think there is any solution to this bug:
>
> 1- the source guest virtio-serial port is hanging because the port is
> connected
> but data are not read,
>
> 2- the destination guest virtio-serial port cannot flush its data because
> there
> is nothing connected to the port,
>
> I think this configuration is a not a viable one because the guests
> configuration is not symmetric: if there is a consumer connected to the
> source guest port, we should have a consumer connected to the destination
> guest port.

Right. If there were a consumer on the host on the dest, any data it reads would not be consistent unless the consumer on the src had read everything out from its buffers (i.e. any buffering the src Linux does -- not the virtqueue buffers!).

We could let the data remain in the vq, and we do do that in case the consumer isn't reading -- but as you note, this is an inconsistent configuration on the dest host during migration, and we really can't provide guarantees this way.

> [It's like migrating from a source guest with a disk backend to a
> destination guest without a disk backend. Is it possible?]
>
> I don't know if we can apply this to the virtio-serial port, but normally a
> serial port is not reliable: data can be lost... and this is what happens in
> this case.

It is supposed to be reliable, however since this is an inconsistent config, I agree we could close it as WONTFIX.

Thank you Amit.

So I close this BZ.
Description of problem:
Transferring data between host and guest while doing migration causes the data to be corrupted.

Version-Release number of selected component (if applicable):
Host kernel: 3.10.0-292.el7.ppc64le
qemu: qemu-kvm-rhev-2.3.0-9.el7

How reproducible:
3/3

Steps to Reproduce:
1. Boot up guest with chardev attached

/usr/libexec/qemu-kvm -name liuzt-RHEL-7.1-20150219.1_LE -machine pseries,accel=kvm,usb=off -m 32768 -realtime mlock=off -smp 64,sockets=1,cores=16,threads=4 \
-monitor stdio \
-monitor unix:tt,server,nowait \
-rtc base=localtime,clock=host \
-no-shutdown \
-boot strict=on \
-device usb-ehci,id=usb,bus=pci.0,addr=0x2 \
-device pci-ohci,id=usb1,bus=pci.0,addr=0x1 \
-device spapr-vscsi,id=scsi0,reg=0x1000 \
-drive file=/root/test_home/liuzt/vdisk/rhel_le.img,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
-serial pty \
-device usb-kbd,id=input0 \
-device usb-mouse,id=input1 \
-device usb-tablet,id=input2 \
-vnc 0:16 -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x4 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
-msg timestamp=on \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device spapr-vlan,netdev=hostnet0,id=net0,mac=52:54:00:c4:e7:83,reg=0x2000 \
-device virtio-serial-pci,id=virtio-serial0 \
-chardev socket,path=/root/test_home/liuzt/Manuall_test/virtio-serail/serial-socket1,id=channel0,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=channel0,name=org.linux-kvm.port.1,id=port1 \
-chardev socket,path=/root/test_home/liuzt/Manuall_test/virtio-serail/serial-socket2,id=channel1,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=2,chardev=channel1,name=org.linux-kvm.port.2,id=port2 \
-qmp tcp:0:4444,server,nowait \

2. On the host, open the chardev but don't read from it.

python
Python 2.7.3 (default, Aug 9 2012, 17:23:57)
[GCC 4.7.1 20120720 (Red Hat 4.7.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>>
>>> sock = socket.socket(socket.AF_UNIX)
>>> sock.connect("/tmp/foo")
>>>

3. Create a 2G file in the guest
#dd if=/dev/zero of=2g bs=1M count=2000
send data from the guest
#cat 2g >/dev/vport0p1

4. Boot up the destination guest B

/usr/libexec/qemu-kvm -name liuzt-RHEL-7.1-20150219.1_LE -machine pseries,accel=kvm,usb=off -m 32768 -realtime mlock=off -smp 64,sockets=1,cores=16,threads=4 \
-monitor stdio \
-rtc base=localtime,clock=host \
-no-shutdown \
-boot strict=on \
-device usb-ehci,id=usb,bus=pci.0,addr=0x2 \
-device pci-ohci,id=usb1,bus=pci.0,addr=0x1 \
-device spapr-vscsi,id=scsi0,reg=0x1000 \
-drive file=/root/test_home/liuzt/vdisk/rhel_le.img,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none \
-drive file=/root/test_home/liuzt/iso/RHEL-LE-7.1-20150219.1-Server-ppc64le-dvd1.iso,if=none,id=drive-scsi0-0-1-0,readonly=on,format=raw,cache=none \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
-device scsi-cd,bus=scsi0.0,channel=0,scsi-id=1,lun=0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0,bootindex=2 \
-serial pty \
-device usb-kbd,id=input0 \
-device usb-mouse,id=input1 \
-device usb-tablet,id=input2 \
-vnc 0:17 -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x4 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
-msg timestamp=on \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device spapr-vlan,netdev=hostnet0,id=net0,mac=52:54:00:c4:e7:83,reg=0x2000 \
-device virtio-serial-pci,id=virtio-serial0 \
-chardev socket,path=/root/test_home/liuzt/Manuall_test/virtio-serail/serial-socket1,id=channel0,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=channel0,name=org.linux-kvm.port.1,id=port1 \
-chardev socket,path=/root/test_home/liuzt/Manuall_test/virtio-serail/serial-socket2,id=channel1,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=2,chardev=channel1,name=org.linux-kvm.port.2,id=port2 \
-qmp tcp:0:4454,server,nowait \
-incoming tcp:0:5980

5. Migrate from A to B
(qemu) migrate -d tcp:10.16.67.19:5980

6. Start a listener on B
#nc -U $host_socket > 2g-host

7. Do md5sum for 2g and 2g-host.

Actual results:
md5sum is not the same

Expected results:
md5sum should be the same

Additional info:
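
For reference, the read-out in steps 2 and 6 and the checksum in step 7 can be combined into one small script. This is an illustrative sketch (the socket path and output filename are placeholders, not part of the original report), and as discussed in the comments above it only sees EOF once QEMU closes the socket:

import hashlib
import socket

SOCKET_PATH = "/tmp/serial_socket"   # placeholder: the chardev socket path of the guest
OUT_FILE = "2g-host"

sock = socket.socket(socket.AF_UNIX)
sock.connect(SOCKET_PATH)

md5 = hashlib.md5()
with open(OUT_FILE, "wb") as out:
    while True:
        data = sock.recv(65536)
        if not data:                 # QEMU closed the socket
            break
        out.write(data)
        md5.update(data)

print(md5.hexdigest())               # compare with "md5sum 2g" in the guest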