Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1072800

Summary: The 1st icmp_seq lost for the first ping with big size after boot up
Product: Red Hat Enterprise Linux 7
Reporter: langfang <flang>
Component: kernel
Assignee: Jiri Pirko <jpirko>
Status: CLOSED NOTABUG
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: low
Docs Contact:
Priority: medium
Version: 7.0
CC: acathrow, flang, hhuang, jasowang, jpirko, juzhang, mst, qiguo, qzhang, rhod, rkhan, virt-maint, vyasevic, xfu
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-10 07:43:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  capture the icmp wire when ping with 65507 size (flags: none)
  tcpdump of working ping (<~40000 size) (flags: none)
  tcpdump of not working ping - first request frags cut (>~40000 size) (flags: none)

Description langfang 2014-03-05 09:06:06 UTC
Description of problem:

20% packet loss on the first ping after "set_link vnet0 off" --> reboot guest --> "set_link vnet0 on"

Version-Release number of selected component (if applicable):
Host:
# uname -r
3.10.0-99.el7.x86_64
# rpm -q qemu-kvm-rhev
qemu-kvm-rhev-1.5.3-50.el7.x86_64
# rpm -q seabios
seabios-1.7.2.2-11.el7.x86_64
ipxe-roms-qemu-20130517-4.gitc4bce43.el7.noarch

Guest: RHEL7
3.10.0-99.el7.x86_64

Preparation:

Host network infra: bridge

How reproducible:

100%

Steps to Reproduce:
1.Boot guest
 /usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -cpu SandyBridge -enable-kvm -m 2048 -smp 4,sockets=2,cores=2,threads=1 -no-kvm-pit-reinjection -usb -device usb-tablet,id=input0 -name RHEL7.0 -uuid bf76b557-17a7-4fc8-aa14-1b1890bd6937 -rtc base=localtime,clock=host,driftfix=slew -device ahci,id=ahci0 -drive if=none,file=/home/rhel7base.raw,format=raw,id=drive-sata0-0-0 -device ide-drive,bus=ahci0.0,drive=drive-sata0-0-0,id=sata0-0-0 -spice port=5900,disable-ticketing,ipv6 -global qxl-vga.vram_size=67108864 -vga std -netdev tap,id=hostdev0,vhost=on,script=/etc/qemu-ifup,queues=4 -device virtio-net-pci,netdev=hostdev0,vectors=10,mq=on,mac=00:1a:4a:42:0b:00,id=vnet0,status=on -device virtio-balloon-pci,bus=pci.0,id=balloon0 -monitor stdio

2. After the guest boots up, try the guest network by pinging the host; it works well:
# ping6 2001::7646:a0ff:fe8e:81d9
PING 2001::7646:a0ff:fe8e:81d9(2001::7646:a0ff:fe8e:81d9) 56 data bytes
64 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=1 ttl=64 time=0.162 ms
64 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=2 ttl=64 time=0.182 ms
64 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=3 ttl=64 time=0.183 ms
64 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=4 ttl=64 time=0.188 ms
64 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=5 ttl=64 time=0.172 ms
^C
--- 2001::7646:a0ff:fe8e:81d9 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.162/0.177/0.188/0.015 ms

3.(qemu)set_link vnet0 off

4.In guest
# cat /sys/class/net/eth0/operstate 
down

5.reboot guest

6. After the guest boots up, check the guest network:
# cat /sys/class/net/eth0/operstate 
down

7.(qemu)set_link vnet0 on

# cat /sys/class/net/eth0/operstate 
up

8.ping host from guest

# ping6 2001::7646:a0ff:fe8e:81d9 -c 5 -s 65507
PING 2001::7646:a0ff:fe8e:81d9(2001::7646:a0ff:fe8e:81d9) 65507 data bytes
65515 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=2 ttl=64 time=0.452 ms
65515 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=3 ttl=64 time=0.462 ms
65515 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=4 ttl=64 time=0.533 ms
65515 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=5 ttl=64 time=0.492 ms

--- 2001::7646:a0ff:fe8e:81d9 ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.452/0.484/0.533/0.041 ms


9. Ping a second time:

# ping6 2001::7646:a0ff:fe8e:81d9 -c 5 -s 65507
PING 2001::7646:a0ff:fe8e:81d9(2001::7646:a0ff:fe8e:81d9) 65507 data bytes
65515 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=1 ttl=64 time=0.417 ms
65515 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=2 ttl=64 time=0.466 ms
65515 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=3 ttl=64 time=0.437 ms
65515 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=4 ttl=64 time=0.541 ms
65515 bytes from 2001::7646:a0ff:fe8e:81d9: icmp_seq=5 ttl=64 time=0.462 ms

--- 2001::7646:a0ff:fe8e:81d9 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.417/0.464/0.541/0.048 ms


Actual results:

After step 8, 20% packet loss on the first ping: the first icmp_seq is lost.

Expected results:

No packet loss; the first ping behaves the same as subsequent pings.

Additional info:

Comment 3 Vlad Yasevich 2014-03-05 14:11:07 UTC
A few questions:
 1) Is the guest manually assigning IPv6 addresses or using autoconfig?
 2) Do you see the same problem with IPv4?

My first suspect right now is multicast snooping on the bridge.

-vlad
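To follow up on the multicast-snooping suspicion above, the setting can be inspected from sysfs on the host. A minimal sketch; the bridge name br0 is hypothetical (use the actual bridge from /etc/qemu-ifup):

```shell
# Hypothetical bridge name "br0" -- check whether IGMP/MLD snooping is on
cat /sys/class/net/br0/bridge/multicast_snooping    # 1 = snooping enabled

# To rule snooping out as the culprit, it can be disabled temporarily:
echo 0 > /sys/class/net/br0/bridge/multicast_snooping
```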

Comment 6 Ronen Hod 2014-03-11 16:00:48 UTC
Unfortunately, this didn't make it into 7.0.0.

QE,
MacVTap should work as of today's kernel build, so it might be interesting to test it.

Comment 7 Qian Guo 2014-03-13 07:24:41 UTC
This is easily reproduced by just booting the guest and pinging the host for the first time: there is 20% packet loss, with the first icmp_seq lost. So it is not related to set_link.

Comment 8 Qian Guo 2014-03-13 07:32:14 UTC
(In reply to Ronen Hod from comment #6)
> Unfortunately, didn't make it into 7.0.0.
> 
> QE,
> MacVTap should work as of today's kernel build, so it might be interesting
> to test it.

Tested with macvtap and hit the same issue:

host# ip -d link show v1
84: v1@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 500
    link/ether 16:4a:71:96:37:d9 brd ff:ff:ff:ff:ff:ff promiscuity 0 
    macvtap  mode vepa 

After guest bootup, ping from the guest to an external host on the same subnet as the hypervisor:

# ping 10.66.4.218 -s 65507 -c 5
PING 10.66.4.218 (10.66.4.218) 65507(65535) bytes of data.
65515 bytes from 10.66.4.218: icmp_seq=2 ttl=64 time=1.81 ms
65515 bytes from 10.66.4.218: icmp_seq=3 ttl=64 time=1.69 ms
65515 bytes from 10.66.4.218: icmp_seq=4 ttl=64 time=1.68 ms
65515 bytes from 10.66.4.218: icmp_seq=5 ttl=64 time=1.69 ms

--- 10.66.4.218 ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 4002ms
rtt min/avg/max/mdev = 1.687/1.720/1.810/0.072 ms

Comment 9 Qian Guo 2014-04-18 08:49:24 UTC
Tested this case on a bare-metal system with RHEL7 installed and hit the same issue, so this is not a qemu-kvm bug.

Also found that only the 1st icmp_seq is lost; whatever count I set, only the first one is lost.

Components:
# uname -r
3.10.0-121.el7.x86_64

Steps:
1.Boot host

2. Ping any system with size 65507 and any count:

# ping 10.66.10.230 -s 65507 -c 

PING 10.66.10.230 (10.66.10.230) 65507(65535) bytes of data.
65515 bytes from 10.66.10.230: icmp_seq=2 ttl=64 time=1.65 ms
65515 bytes from 10.66.10.230: icmp_seq=3 ttl=64 time=1.59 ms
65515 bytes from 10.66.10.230: icmp_seq=4 ttl=64 time=1.58 ms
65515 bytes from 10.66.10.230: icmp_seq=5 ttl=64 time=1.55 ms
65515 bytes from 10.66.10.230: icmp_seq=6 ttl=64 time=1.58 ms
65515 bytes from 10.66.10.230: icmp_seq=7 ttl=64 time=1.71 ms
65515 bytes from 10.66.10.230: icmp_seq=8 ttl=64 time=1.37 ms
65515 bytes from 10.66.10.230: icmp_seq=9 ttl=64 time=1.72 ms
65515 bytes from 10.66.10.230: icmp_seq=10 ttl=64 time=1.54 ms
65515 bytes from 10.66.10.230: icmp_seq=11 ttl=64 time=1.71 ms
65515 bytes from 10.66.10.230: icmp_seq=12 ttl=64 time=1.51 ms
65515 bytes from 10.66.10.230: icmp_seq=13 ttl=64 time=1.71 ms
65515 bytes from 10.66.10.230: icmp_seq=14 ttl=64 time=1.55 ms
65515 bytes from 10.66.10.230: icmp_seq=15 ttl=64 time=1.68 ms
65515 bytes from 10.66.10.230: icmp_seq=16 ttl=64 time=1.50 ms
65515 bytes from 10.66.10.230: icmp_seq=17 ttl=64 time=1.36 ms
65515 bytes from 10.66.10.230: icmp_seq=18 ttl=64 time=1.75 ms
65515 bytes from 10.66.10.230: icmp_seq=19 ttl=64 time=1.65 ms
65515 bytes from 10.66.10.230: icmp_seq=20 ttl=64 time=1.50 ms

--- 10.66.10.230 ping statistics ---
20 packets transmitted, 19 received, 5% packet loss, time 19032ms
rtt min/avg/max/mdev = 1.369/1.594/1.755/0.117 ms

So according to the above, this is not a qemu bug; I will change the component to kernel. Feel free to correct me if I got anything wrong.

Thanks,

Comment 11 Jiri Pirko 2014-05-14 07:20:41 UTC
langfang, Qian Guo,

Would you be able to record what is on the wire using tcpdump? I believe that would show us what is going wrong here.

Thanks.

Comment 13 Qian Guo 2014-05-14 07:46:01 UTC
Created attachment 895393 [details]
capture the icmp wire when ping with 65507 size

Comment 14 Jiri Pirko 2014-05-14 12:40:15 UTC
The same problem exists in rhel6. Upstream fixes this somewhere between v3.13 and v3.14. Continuing the investigation.

Comment 15 Jiri Pirko 2014-05-14 13:05:00 UTC
Created attachment 895480 [details]
tcpdump of working ping (<~40000 size)

Comment 16 Jiri Pirko 2014-05-14 13:08:08 UTC
Created attachment 895481 [details]
tcpdump of not working ping - first request frags cut (>~40000 size)

As you can see, after the ARP exchange completes, the first ICMP fragment captured has offset 25160. So it looks like the first batch of fragments was lost.
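For reference, a capture like the attached ones needs a fragment-aware filter, since a plain "icmp" expression only matches the first fragment (later fragments carry no ICMP header). A sketch; the interface name eth0 is hypothetical:

```shell
# Hypothetical interface eth0; capture ICMP plus every non-first IP fragment
# (ip[6:2] & 0x1fff is the fragment offset field; nonzero = later fragment)
tcpdump -ni eth0 -w ping-frags.pcap 'icmp or (ip[6:2] & 0x1fff != 0)'
```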

Comment 17 Jiri Pirko 2014-05-14 14:21:35 UTC
This is fixed somewhere between v3.13 and v3.14.

Comment 18 Jiri Pirko 2014-05-29 05:59:52 UTC
Please ignore comment 17. I just reproduced this on the latest net-next kernel.

Comment 19 Jiri Pirko 2014-06-10 07:43:11 UTC
I dug into this and found the cause. Since the neighbour entry is unresolved at the beginning, the ping fragments are put into neigh->arp_queue. However, the length of this queue is limited by arp_queue_len_bytes (/proc/sys/net/ipv4/neigh/*/unres_qlen_bytes); for more info see __neigh_event_send(). Since the fragment truesize with a 1500 MTU is 2304 bytes, only 28 fragments fit in and the rest are dropped. That is consistent with what we see in the tcpdump.
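The numbers above line up with the capture in comment 16. A quick sanity-check sketch; the 65536-byte default for unres_qlen_bytes is an assumption (the actual default is not stated in this bug), while the 2304-byte truesize and the 25160 offset come from the comments:

```python
# Sanity-check the fragment arithmetic from the analysis above.
# Assumption (not stated in the bug): unres_qlen_bytes defaults to 65536.
UNRES_QLEN_BYTES = 65536
FRAG_TRUESIZE = 2304      # truesize of one fragment at 1500 MTU (from comment 19)
FRAG_PAYLOAD = 1480       # IP payload bytes per fragment at 1500 MTU
ICMP_DATA = 65507         # ping -s 65507

total_payload = ICMP_DATA + 8                      # + ICMP header = 65515 bytes
total_frags = -(-total_payload // FRAG_PAYLOAD)    # ceil division
queued = UNRES_QLEN_BYTES // FRAG_TRUESIZE         # fragments that fit in arp_queue
dropped = total_frags - queued
first_seen_offset = dropped * FRAG_PAYLOAD         # offset of first surviving frag

print(total_frags, queued, dropped, first_seen_offset)  # -> 45 28 17 25160
```

The 25160 result matches the first fragment offset seen in the tcpdump of comment 16, and 28 matches the queue capacity derived in the analysis.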

Long story short, this is not a bug. Feel free to adjust the unres_qlen_bytes value, which will allow even long pings to come through.
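A sketch of that workaround; the chosen value is illustrative, not a recommendation from this bug:

```shell
# Current per-device byte limit for packets queued on unresolved neighbours
cat /proc/sys/net/ipv4/neigh/default/unres_qlen_bytes

# A 65507-byte ping fragments into ~45 pieces of truesize 2304 bytes each
# (~103680 bytes total), so raise the limit with some headroom, e.g.:
sysctl -w net.ipv4.neigh.default.unres_qlen_bytes=212992
# The ping6 case in this bug has an equivalent IPv6 knob:
sysctl -w net.ipv6.neigh.default.unres_qlen_bytes=212992
```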