Bug 1262866

Summary: [RHEL6] Package is 100% lost when ping from host to Win2012r2 guest with 64000 size
Product: Red Hat Enterprise Linux 6 Reporter: Vlad Yasevich <vyasevic>
Component: qemu-kvmAssignee: Vlad Yasevich <vyasevic>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.8CC: ailan, chayang, huding, juzhang, knoel, meyang, mkenneth, pmatouse, rbalakri, shuang, virt-bugs, virt-maint, vyasevic, weliao, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.483.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1251379 Environment:
Last Closed: 2016-05-10 21:00:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1251379    
Bug Blocks: 1252757    

Comment 1 Chao Yang 2015-12-16 03:28:43 UTC
Wei,

Would please try to reproduce this issue?

Comment 2 weliao 2015-12-16 05:18:09 UTC
Tested with rtl8139 NIC, seems can't reproduce this bug,
<1> host info
2.6.32-592.el6.x86_64
qemu-kvm-0.12.1.2-2.481.el6.x86_64

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 27
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips	: 6784.67
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

<2> qemu-kvm command:
/usr/libexec/qemu-kvm     -S      \
-name 'virt-tests-vm1'       \
-machine pc      \
-nodefaults      -vga std      \
-chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20150807-143123-2OZwwnr3,server,nowait     \
-mon chardev=qmp_id_qmpmonitor1,mode=control      \
-chardev socket,id=qmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20150807-143123-2OZwwnr3,server,nowait     \
-mon chardev=qmp_id_catch_monitor,mode=control      \
-chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150807-143123-2OZwwnr3,server,nowait     \
-device isa-serial,chardev=serial_id_serial0      \
-chardev socket,id=seabioslog_id_20150807-143123-2OZwwnr3,path=/tmp/seabios-20150807-143123-2OZwwnr3,server,nowait     \
-device isa-debugcon,chardev=seabioslog_id_20150807-143123-2OZwwnr3,iobase=0x402   \
-device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03     \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=04     \
-drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=/home/win2012-64r2-virtio-scsi.qcow2     \
-device scsi-hd,id=image1,drive=drive_image1     \
-device rtl8139,mac=9a:79:7a:7b:7c:7d,id=idCwN3Ey,netdev=idqrOgQm,bus=pci.0,addr=05\
-netdev tap,id=idqrOgQm     \
-m 8192      \
-smp 8,maxcpus=8,cores=4,threads=1,sockets=2      \
-cpu 'SandyBridge'     \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1      \
-vnc :0      -rtc base=localtime,clock=host,driftfix=slew      \
-boot order=cdn,once=c,menu=off,strict=off     \
-enable-kvm     \
-monitor stdio

<3> step
launch a win2012R2 guest with rtl8139 NIC, begin send packet from host to guest

[root@dhcp-8-139 mnt]# ping 10.66.10.3 -c 1 -s 65000
PING 10.66.10.3 (10.66.10.3) 65000(65028) bytes of data.
65008 bytes from 10.66.10.3: icmp_seq=1 ttl=128 time=16.3 ms

--- 10.66.10.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 16ms
rtt min/avg/max/mdev = 16.313/16.313/16.313/0.000 ms
[root@dhcp-8-139 mnt]# ping 10.66.10.3 -c 1 -s 64000
PING 10.66.10.3 (10.66.10.3) 64000(64028) bytes of data.
64008 bytes from 10.66.10.3: icmp_seq=1 ttl=128 time=7.40 ms

--- 10.66.10.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 7ms
rtt min/avg/max/mdev = 7.405/7.405/7.405/0.000 ms
[root@dhcp-8-139 mnt]# ping 10.66.10.3 -c 1 -s 63000
PING 10.66.10.3 (10.66.10.3) 63000(63028) bytes of data.
63008 bytes from 10.66.10.3: icmp_seq=1 ttl=128 time=4.36 ms

--- 10.66.10.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 4ms
rtt min/avg/max/mdev = 4.361/4.361/4.361/0.000 ms

send packet from guest to host still OK

Comment 3 Chao Yang 2015-12-17 05:21:57 UTC
Hi Vlad,

Could you please double confirm this issue cause QE is not able to reproduce the original bug on latest RHEL 6.8 builds.

Comment 4 Vlad Yasevich 2015-12-21 19:04:45 UTC
It does appear like the issue doesn't happen on RHEL6.8 builds.  I am still trying to understand why, but from observations it appears that windows is
handling the interrupt earlier when running on rhel6.8.  As a result, the
receiver buffer starts getting drained before the last fragment is received,
and we never enter the overflow condition that actually triggers the is bug.

Here are snippets from the 6.8 and 7.2 logs that show the difference.

RHEL6.8:
>>> RTL8139: received len=1514
>>> RTL8139: physical address matching packet received
RTL8139: in ring Rx mode ================
   received: rx buffer length 65536 head 0x285c read 0x2860
RTL8139: Set IRQ to 1 (0001 405f)
RTL8139: IntrStatus read(w) val=0x0001   <<< VXY: OS processing interrupt and
RTL8139: IntrMask write(w) val=0x0000    <<< VXY: starts reading data.
RTL8139: Set IRQ to 0 (0001 0000)
RTL8139: IntrStatus write(w) val=0x0011
RTL8139: Set IRQ to 0 (0000 0000)
RTL8139: Set IRQ to 0 (0000 0000)
RTL8139: receiver buffer data available 0xfffc
RTL8139: ChipCmd read val=0x000c
RTL8139: RxBufPtr write val=0x2e44
 CAPR write: rx buffer length 65536 head 0x285c read 0x2e54
>>> RTL8139: received len=1402            <<< VXY: Last fragment received.
>>> RTL8139: physical address matching packet received
RTL8139: in ring Rx mode ================
   received: rx buffer length 65536 head 0x2de0 read 0x2e54
RTL8139: Set IRQ to 0 (0001 0000)
RTL8139: receiver buffer data available 0xff8c

RHEL7.2:
RTL8139: >>> received len=1514
RTL8139: >>> physical address matching packet received
RTL8139: in ring Rx mode ================
RTL8139: received: rx buffer length 65536 head 0x08ec read 0x08f0
RTL8139: Set IRQ to 1 (0001 405f)
RTL8139: >>> received len=1402          <<< VXY: last fragment received
RTL8139: >>> physical address matching packet received
RTL8139: in ring Rx mode ================
RTL8139: rx overflow: rx buffer length 65536 head 0x08ec read 0x08f0 === available 0x0004 need 0x0582            <<< VXY: fragment dropped
RTL8139: Set IRQ to 1 (0011 405f)
RTL8139: entered rtl8139_set_next_tctr_time
RTL8139: IntrStatus read(w) val=0x0011   <<< VXY: OS handles interrupt.
RTL8139: IntrMask write(w) val=0x0000
RTL8139: entered rtl8139_set_next_tctr_time
RTL8139: Set IRQ to 0 (0011 0000)
RTL8139: IntrStatus write(w) val=0x0011
RTL8139: Set IRQ to 0 (0000 0000)
RTL8139: entered rtl8139_set_next_tctr_time
RTL8139: Set IRQ to 0 (0000 0000)
RTL8139: receiver buffer data available 0xfffc
RTL8139: ChipCmd read val=0x000c
RTL8139: RxBufPtr write val=0x0ed4
RTL8139:  CAPR write: rx buffer length 65536 head 0x08ec read 0x0ee4
RTL8139: receiver buffer data available 0xfa08

I potential for the overflow condition in the qemu-kvm-0.12.1.2-2.482 is
still there, it just appears to be a lot harder to trigger.

-vlad

Comment 7 Jeff Nelson 2016-01-05 20:08:20 UTC
Fix included in qemu-kvm-0.12.1.2-2.483.el6

Comment 12 errata-xmlrpc 2016-05-10 21:00:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0815.html