| Summary: | Missing ipv6 route mtu information |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Component: | iproute |
| Version: | 7.2 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED WORKSFORME |
| Severity: | urgent |
| Priority: | urgent |
| Reporter: | Angelo <angelo> |
| Assignee: | Phil Sutter <psutter> |
| QA Contact: | BaseOS QE Security Team <qe-baseos-security> |
| CC: | aloughla, angelo, atragler |
| Target Milestone: | rc |
| Keywords: | Reopened |
| Type: | Bug |
| Last Closed: | 2017-02-13 10:21:08 UTC |
Hi,

(In reply to Angelo from comment #0)
> I see output like this:
>
> [root@nmo-els-201 ~]# ip route get 2001:910:9ed:30::2
> 2001:910:9ed:30::2 via 2001:889:2095:303::1 dev eth0 src 2001:889:2095:303::187 metric 0
>     cache
>
> The cache entry seems empty, there's no information regarding mtu's, expiry, etc.

The 'cache' keyword being printed here seems to be a bug in RHEL7 kernels; I'll investigate this later. It is unrelated to the issue you are having, though: with an upstream kernel, no cache entry is printed per se, either.

> I would expect output like this, with a cache entry with mtu information.
> This is from a CentOS6 machine:
>
> [root@nmo-tic-01 html]# ip route get 2001:910:9ed:30::2
> 2001:910:9ed:30::2 via 2001:889:2095:303::1 dev eth0 src 2001:889:2095:303::198 metric 0
>     cache expires 595sec mtu 1280 advmss 1440 hoplimit 4294967295

Like upstream, RHEL7 creates a cache entry only after a PMTU update has occurred for that destination. Testing with kernel-3.10.0-349.el7, this seems to work fine. Here's what I did:

My local RHEL7 VM is connected via a tap interface enslaved in virbr0; inside the VM the interface is called eth0. The following commands are prefixed by either 'host' for commands run locally or 'rhel' for commands run on the VM.
First the plain IPv6 connection:

host # ip addr add feed:babe::1/64 dev virbr0
rhel # ip addr add feed:babe::2/64 dev eth0

Next a network namespace on the host with smaller MTU:

host # ip netns add test
host # ip link add veth0 mtu 1300 type veth peer name veth1 netns test mtu 1300
host # ip -net test link set veth1 up
host # ip -net test addr add feed:babe:2::2/64 dev veth1
host # ip -net test -6 route add default via feed:babe:2::1
host # ip link set veth0 up
host # ip addr add feed:babe:2::1/64 dev veth0

Make the network known in the VM:

rhel # ip -6 route add feed:babe:2::/64 via feed:babe::1

Testing base connectivity first:

rhel # ping6 feed:babe:2::2
PING feed:babe:2::2(feed:babe:2::2) 56 data bytes
64 bytes from feed:babe:2::2: icmp_seq=1 ttl=63 time=0.165 ms
64 bytes from feed:babe:2::2: icmp_seq=2 ttl=63 time=0.368 ms
^C
--- feed:babe:2::2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1010ms
rtt min/avg/max/mdev = 0.165/0.266/0.368/0.102 ms

All looks normal:

rhel # ip -6 r g feed:babe:2::2
feed:babe:2::2 via feed:babe::1 dev eth0 src feed:babe::7 metric 0
    cache

Now trigger the PMTU update:

rhel # ping6 -s 1400 feed:babe:2::2
PING feed:babe:2::2(feed:babe:2::2) 1400 data bytes
From feed:babe::1 icmp_seq=1 Packet too big: mtu=1300
1408 bytes from feed:babe:2::2: icmp_seq=2 ttl=63 time=0.449 ms
1408 bytes from feed:babe:2::2: icmp_seq=3 ttl=63 time=0.406 ms
^C
--- feed:babe:2::2 ping statistics ---
3 packets transmitted, 2 received, +1 errors, 33% packet loss, time 2085ms
rtt min/avg/max/mdev = 0.406/0.427/0.449/0.029 ms

Note the PMTU update before the first pong above. The routing cache then shows as expected:

rhel # ip -6 r g feed:babe:2::2
feed:babe:2::2 via feed:babe::1 dev eth0 src feed:babe::7 metric 0
    cache expires 592sec mtu 1300

> This problem I am encountering is part of me trying to find out why pmtu
> seems broken on our RHEL72/CentOS72 machines. The kernel does not seem to
> process 'packet too big' packets.
> See https://bugs.centos.org/view.php?id=10490

I verified the above using the same kernel you reported the issue for, namely 3.10.0-327.10.1.el7.x86_64; it works as well. Can you please verify the above steps on your system?

Thanks, Phil

(In reply to Angelo from comment #0)
> This problem I am encountering is part of me trying to find out why pmtu
> seems broken on our RHEL72/CentOS72 machines. The kernel does not seem to
> process 'packet too big' packets. See
> https://bugs.centos.org/view.php?id=10490

I just had a look at the CentOS ticket and noticed that in the working case, PMTU updates are sent back to 2001:889:2095:303::198, while in the non-working case they are sent to 2001:889:2095:303::187. Is this intentional?

Cheers, Phil

Hey Phil, thanks for your reply!
In short: I can reproduce the correct behaviour on my RHEL7 machines. So this is good to know! However, the next step is investigating why my machines do respond in this case and not in the other. (Perhaps the 'packet too big' packets that our Cisco firewalls or our provider's Juniper routers send are different.) I hope to get to that in the upcoming days.
Longer info on what I did (mostly for myself as well).
I don't understand much about the RHEL virtualisation technology you're using, and some of the commands are new to me, but I tried to rebuild a simpler setup that I do understand and that looks about the same as what you are doing, with some routing in between.
I've got three servers (in the same L2 network and ipv4 range, for easy access).
I gave the middle server 2 nics and enabled ipv6 forwarding. I configured feed:babe:1::/64 on the left side, and feed:babe:2::/64 on the right side:
+----------------+ +------------------------------------+ +----------------+
| server1 | | server2 | | server3 |
| | | | | |
| ens32 +-----+ ens32 ens192 +--+ ens32 |
| feed:babe:1::2 | | feed:babe:1::1 feed:babe:2::1 | | feed:babe:2::2 |
+----------------+ +------------------------------------+ +----------------+
This is about as simple as it gets without tunnels, taps and virtualisation; I thought this would be easy. On the left side I set the MTU to 1300, on server1 and on server2's left interface.
Output of the interfaces:
[root@ipv6test-server1 ~]# ip -6 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qlen 1000
inet6 feed:babe:1::2/64 scope site
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe87:5188/64 scope link
valid_lft forever preferred_lft forever
[root@ipv6test-server2 ~]# ip -6 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
inet6 feed:babe:2::1/64 scope site
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe87:741e/64 scope link
valid_lft forever preferred_lft forever
3: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qlen 1000
inet6 feed:babe:1::1/64 scope site
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe87:5671/64 scope link
valid_lft forever preferred_lft forever
[root@ipv6test-server3 ~]# ip -6 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
inet6 feed:babe:2::2/64 scope site
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe87:227f/64 scope link
valid_lft forever preferred_lft forever
I added routes, enabled forwarding, disabled firewalls, etc. Basic ping works fine.
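The routes, forwarding and MTU settings summarized above would look roughly like the sketch below on server2 (my reconstruction, not taken from the ticket; interface names and addresses follow the diagram):

```shell
# Sketch of the server2 (router) setup. Adjust interface names to your hosts.

# Enable IPv6 forwarding so server2 routes between the two /64s:
sysctl -w net.ipv6.conf.all.forwarding=1

# Left side, towards server1, with the reduced MTU:
ip link set ens32 mtu 1300
ip addr add feed:babe:1::1/64 dev ens32

# Right side, towards server3, standard MTU:
ip addr add feed:babe:2::1/64 dev ens192

# On the end hosts, route the remote /64 via server2:
#   server1: ip -6 route add feed:babe:2::/64 via feed:babe:1::1
#   server3: ip -6 route add feed:babe:1::/64 via feed:babe:2::1
```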
On server3 I get the route:
[root@ipv6test-server3 ~]# ip -6 r g feed:babe:1::2
feed:babe:1::2 via feed:babe:2::1 dev ens32 src feed:babe:2::2 metric 0
cache
Then on server1 I ping server3:
[root@ipv6test-server1 ~]# ping6 -s 1400 feed:babe:2::2
PING feed:babe:2::2(feed:babe:2::2) 1400 data bytes
1408 bytes from feed:babe:2::2: icmp_seq=2 ttl=63 time=1.06 ms
1408 bytes from feed:babe:2::2: icmp_seq=3 ttl=63 time=0.920 ms
In tcpdump I see a packet too big packet being sent:
22:54:14.640973 IP6 feed:babe:2::2 > feed:babe:1::2: ICMP6, echo reply, seq 1, length 1408
22:54:14.641111 IP6 feed:babe:2::1 > feed:babe:2::2: ICMP6, packet too big, mtu 1300, length 1240
22:54:15.638906 IP6 feed:babe:1::2 > feed:babe:2::2: frag (0|1248) ICMP6, echo request, seq 2, length 1248
22:54:15.638941 IP6 feed:babe:1::2 > feed:babe:2::2: frag (1248|160)
22:54:15.639493 IP6 feed:babe:2::2 > feed:babe:1::2: frag (0|1248) ICMP6, echo reply, seq 2, length 1248
22:54:15.639520 IP6 feed:babe:2::2 > feed:babe:1::2: frag (1248|160)
And then I see that server3 knows that the path to this destination should use an MTU of 1300:
[root@ipv6test-server3 ~]# ip -6 r g feed:babe:1::2
feed:babe:1::2 via feed:babe:2::1 dev ens32 src feed:babe:2::2 metric 0
cache expires 565sec mtu 1300
So that's cool.
Now on to more debugging and tracing..
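As a cross-check (this snippet is mine, not from the ticket), the fragment sizes in the tcpdump above follow directly from IPv6 fragmentation arithmetic: every fragment carries the 40-byte IPv6 header plus an 8-byte fragment extension header, and all fragments but the last must carry a multiple of 8 payload bytes.

```python
IPV6_HEADER = 40   # fixed IPv6 header, present in every fragment
FRAG_HEADER = 8    # IPv6 fragment extension header
ICMPV6_HEADER = 8  # echo request/reply header

def ipv6_fragments(payload_len, mtu):
    """Split an IPv6 payload into (offset, length) fragments for a given MTU."""
    # Usable payload per fragment, rounded down to a multiple of 8:
    per_frag = (mtu - IPV6_HEADER - FRAG_HEADER) // 8 * 8
    frags, offset = [], 0
    while offset < payload_len:
        length = min(per_frag, payload_len - offset)
        frags.append((offset, length))
        offset += length
    return frags

# ping6 -s 1400 yields a 1408-byte ICMPv6 message (1400 data + 8 header),
# which a 1300-byte MTU splits exactly as tcpdump shows above:
print(ipv6_fragments(1400 + ICMPV6_HEADER, 1300))  # [(0, 1248), (1248, 160)]
```

This reproduces the `frag (0|1248)` and `frag (1248|160)` lines in the trace.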
Hi Angelo,

Glad to see things *should* be working. Your test description seems a bit odd, though: I would expect you to ping from server3 to server1, because the latter sits behind the low-MTU link, which server3 is not supposed to know about. Maybe just a typo?

As said, please check the destination IP address of the packet too big messages: if your logs in that CentOS ticket are correct, they are sent to the wrong destination, so no PMTU update will occur.

Cheers, Phil

> Glad to see things *should* be working. Your test description seems a
> bit odd, though: I would expect you to ping from server3 to server1,
> because the latter sits behind the low-MTU link, which server3 is not
> supposed to know about. Maybe just a typo?

You're right. The direction should not really matter though. I ping from server1 to server3, and the echo response from server3 to server1 should trigger the pmtu reply, right?

The big issue I'm trying to solve here is that clients from all over the world that have lower MTUs than 1500 are trying to connect to my servers. My servers and network infrastructure are configured with the standard MTU of 1500. For example, at home I have a SixXS tunnel with an MTU of 1280. If I go to one of the hosted sites, http://new.asap-foundation.org, I can't connect to that website because of MTU issues. The request arrives at the server, but the response never comes back.

> As said, please check the destination IP address of the packet too big
> messages: if your logs in that CentOS ticket are correct, they are
> sent to the wrong destination, so no PMTU update will occur.

That CentOS ticket involves two different servers. The CentOS6 server with IP 2001:889:2095:303::198 works fine, the CentOS7 server with IP 2001:889:2095:303::187 does not.

My search continues..

(In reply to Angelo from comment #6)
> You're right. The direction should not really matter though. I ping from
> server1 to server3, and the echo response from server3 to server1 should
> trigger the pmtu reply, right?

Uhm, still not there yet. :) If you ping from a network with a smaller MTU into one with a bigger MTU, no PMTU update is expected, since the packets are fragmented locally. PMTU is relevant when you ping into a remote network with a smaller MTU (and your packet exceeds the maximum size, of course). In that case the boundary router (i.e. the node designated to forward into the small-MTU network) is supposed to send that notification back, so your local box knows to use a smaller MTU than the local network would allow.

> The big issue I'm trying to solve here is that clients from all over the
> world that have lower MTUs than 1500 are trying to connect to my servers.
> For example, at home I have a SixXS tunnel with an MTU of 1280. If I go
> to one of the hosted sites, http://new.asap-foundation.org, I can't
> connect to that website because of MTU issues. The request arrives at
> the server, but the response never comes back.

Uhm, for TCP things work differently, as the MSS is negotiated during the handshake. Can you please provide dumps of the failing communication from both client and server side?

> That CentOS ticket involves two different servers. The CentOS6 server
> with IP 2001:889:2095:303::198 works fine, the CentOS7 server with IP
> 2001:889:2095:303::187 does not.

Ah, I see.
Cheers, Phil

Angelo, may I ask what the current status of this issue is? From my point of view there is no issue with RHEL7, backed by my own tests and by yours on dedicated test equipment. Can you please validate?

Thanks, Phil

Correct functionality of PMTU using the reporter's kernel version has been verified; the cause of the issue is expected to lie in the networking infrastructure and/or an incorrect test setup. Therefore I am closing this ticket; feel free to reopen if the above assumptions prove wrong.

Created attachment 1193446 [details]
client tcpdump
Created attachment 1193447 [details]
server tcpdump
Here you see a tcpdump of the server in question. When I open it in wireshark I see a lot of 'packet too big' messages and the server stubbornly retransmitting.
Hi Angelo,

(In reply to Angelo from comment #11)
> Created attachment 1193447 [details]
> server tcpdump
>
> Here you see a tcpdump of the server in question. When I open it in
> wireshark I see a lot of 'packet too big' messages and the server
> stubbornly retransmitting.

Looking at the dumps, I notice a few oddities:

- During the handshake, in each dump the sender submits an MSS of 1440 and receives an MSS of 1380. So it looks like there is a transparent proxy in between which mangles the MSS value. This is also backed by the fact that the initial sequence numbers don't match, either.
- The server submits larger packets than the MSS value it itself proposed. I wouldn't expect it to send 2834-byte frames at all, given the proposed MSS value of 1440.
- Despite the Packet Too Big messages stating a maximum MTU of 1280 bytes, the server tries to transmit packets with 1380 bytes of TCP payload.
- The server's HTTP header contains 'Server: Apache/2.4.6 (CentOS)', although this ticket addresses RHEL7.

The conclusions I draw from this are:

- This looks like a server issue; it doesn't properly react to the ICMP messages it receives.
- The server seems to be a CentOS machine, so it's out of scope for this ticket. This might also explain why I couldn't reproduce the issue you're seeing.

Can you repeat the test and paste the output of 'ip -6 r g 2001:610:600:8319:709d:264:4f0e:f6b9' on the server side? What is the exact kernel version running on the server? Is it a CentOS or a RHEL7 machine?

Created attachment 1199729 [details]
20160910-client.cap
Created attachment 1199730 [details]
20160910-server-rhel.cap
Phil,
Ah, my bad, I had the same problem on both RHEL and CentOS (we're a CentOS shop mostly). I created a new RHEL machine to show the problem there.
In this example, the client has ip 2001:610:600:8319:655e:b0b6:233e:1c06 and server has ip 2001:888:2085:11::66, and I attached new captures. I still see the same behaviour.
[root@nmt-web-107test ~]# ip -6 r g 2001:610:600:8319:655e:b0b6:233e:1c06
2001:610:600:8319:655e:b0b6:233e:1c06 via 2001:888:2085:11::1 dev eth0 src 2001:888:2085:11::66 metric 0
cache
[root@nmt-web-107test ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@nmt-web-107test ~]# uname -a
Linux nmt-web-107test.netmatchcolo1.local 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Thanks for your time again.
Angelo.
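The MSS values and segment sizes discussed in the dump analysis above can be checked with simple header arithmetic. A sketch of my own (constants are the standard IPv6/TCP header sizes, not taken from the ticket):

```python
IPV6_HEADER = 40
TCP_HEADER = 20
TCP_TS_OPTION = 12  # nop + nop + timestamp, carried in every data segment here

def ipv6_tcp_mss(mtu):
    """MSS a host advertises for a given IPv6 link MTU."""
    return mtu - IPV6_HEADER - TCP_HEADER

def segment_payload(mss):
    """Actual TCP payload per segment when timestamps are in use."""
    return mss - TCP_TS_OPTION

print(ipv6_tcp_mss(1500))     # 1440, the MSS each sender submits
print(ipv6_tcp_mss(1440))     # 1380, the (proxy-mangled) MSS each side receives
print(segment_payload(1380))  # 1368, matching the retransmitted data segments
print(ipv6_tcp_mss(1280))     # 1220, what a 1280-byte Packet Too Big should enforce
```

The 1368-byte payloads in the captures thus match the clamped MSS of 1380, not the 1280-byte path MTU the ICMPv6 messages advertise.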
And I created a 7.3 beta machine, because it uses a newer kernel and I would expect more output from the MTU cache table. Same issue; it looks like the MTU table is not updated (I did not create new captures):

06:26:25.680569 IP6 2001:610:600:319::1 > 2001:888:2085:11::67: ICMP6, packet too big, mtu 1280, length 1240
06:26:32.606439 IP6 2001:888:2085:11::67.http > 2001:610:600:8319:655e:b0b6:233e:1c06.56541: Flags [.], seq 1:1369, ack 89, win 224, options [nop,nop,TS val 330976 ecr 771752564], length 1368
06:26:32.608260 IP6 2001:610:600:319::1 > 2001:888:2085:11::67: ICMP6, packet too big, mtu 1280, length 1240
06:26:33.406444 IP6 2001:888:2085:11::67.http > 2001:610:600:8319:655e:b0b6:233e:1c06.56533: Flags [.], seq 1:1369, ack 1, win 224, options [nop,nop,TS val 331776 ecr 771749844], length 1368
06:26:33.408063 IP6 2001:610:600:319::1 > 2001:888:2085:11::67: ICMP6, packet too big, mtu 1280, length 1240
06:26:36.542472 IP6 2001:610:600:8319:655e:b0b6:233e:1c06.56532 > 2001:888:2085:11::67.http: Flags [.], ack 1116832299, win 4104, length 0
06:26:36.542521 IP6 2001:888:2085:11::67.http > 2001:610:600:8319:655e:b0b6:233e:1c06.56532: Flags [.], ack 1, win 224, options [nop,nop,TS val 334912 ecr 771710036], length 0
^C
45 packets captured
45 packets received by filter
0 packets dropped by kernel

[root@localhost asap-foundation.org]# ip -6 r g 2001:610:600:8319:655e:b0b6:233e:1c06
2001:610:600:8319:655e:b0b6:233e:1c06 via 2001:888:2085:11::1 dev eth0 src 2001:888:2085:11::67 metric 1
[root@localhost asap-foundation.org]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 Beta (Maipo)
[root@localhost asap-foundation.org]# uname -a
Linux localhost.localdomain 3.10.0-493.el7.x86_64 #1 SMP Tue Aug 16 11:45:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Hi Angelo,

In your new dumps it still looks like there is a TCP proxy in between; not sure how much influence that has on the PMTU mechanism, though.
I'm not quite convinced the server completely ignores the PMTU updates, though. In my own tests I got fooled by TSO/GSO once, so could you try turning those off on the server side? Just to make sure we really see the packet sizes as they leave the interface. This should do the trick:

# ethtool -K ethX tso off
# ethtool -K ethX gso off

Thanks, Phil

I'm closing this ticket due to lack of feedback from the reporter. Please feel free to reopen in case you can provide further information.

Thanks, Phil

I haven't found the time in the past few years to dive into this further, and I won't have the time anytime soon. Replying to fix the needinfo tag.
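To confirm that the offloads actually ended up disabled, the current settings can also be listed (a sketch of mine; ethX again stands for the server's interface):

```shell
# Show current offload state; both lines should read "off" after the
# ethtool -K commands above have been run.
ethtool -k ethX | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'
```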
Description of problem:

As part of debugging an ipv6 mtu issue, I did a clean install of RHEL72. I'm running into a bug/feature: I cannot find cached MTUs for routes on RHEL7x machines or CentOS7 machines. CentOS6x works fine.

Version-Release number of selected component (if applicable):

[root@ipv6test-server2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@ipv6test-server2 ~]# uname -a
Linux ipv6test-server2.zoovercolo.local 3.10.0-327.10.1.el7.x86_64 #1 SMP Sat Jan 23 04:54:55 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@ipv6test-server2 ~]# rpm -qa | grep iproute
iproute-3.10.0-54.el7.x86_64

How reproducible:

Make sure you can communicate with an ipv6-enabled target, then show the route info.

Steps to Reproduce:
1. ping6 www.kame.net
2. ip route get <ipv6 address from above>

Actual results:

I see output like this:

[root@nmo-els-201 ~]# ip route get 2001:910:9ed:30::2
2001:910:9ed:30::2 via 2001:889:2095:303::1 dev eth0 src 2001:889:2095:303::187 metric 0
    cache

The cache entry seems empty; there's no information regarding MTUs, expiry, etc.

Expected results:

I would expect output like this, with a cache entry with MTU information. This is from a CentOS6 machine:

[root@nmo-tic-01 html]# ip route get 2001:910:9ed:30::2
2001:910:9ed:30::2 via 2001:889:2095:303::1 dev eth0 src 2001:889:2095:303::198 metric 0
    cache expires 595sec mtu 1280 advmss 1440 hoplimit 4294967295

Additional info:

This problem I am encountering is part of me trying to find out why PMTU seems broken on our RHEL72/CentOS72 machines. The kernel does not seem to process 'packet too big' packets. See https://bugs.centos.org/view.php?id=10490
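A side note on the odd-looking 'hoplimit 4294967295' in the expected output above: this appears to be -1 stored in an unsigned 32-bit route metric, i.e. "no hop limit set", printed raw by iproute2 rather than being suppressed. A quick check of that reinterpretation (the helper name is mine, not from any tool):

```python
import struct

def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned, the way the raw metric is printed."""
    return struct.unpack('<I', struct.pack('<i', value))[0]

print(as_unsigned32(-1))  # 4294967295, the "hoplimit" shown in the route cache
```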