Bug 1321134

Summary: Missing ipv6 route mtu information
Product: Red Hat Enterprise Linux 7
Reporter: Angelo <angelo>
Component: iproute
Assignee: Phil Sutter <psutter>
Status: CLOSED WORKSFORME
QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: urgent
Priority: urgent
Docs Contact:
Version: 7.2
CC: aloughla, angelo, atragler
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-13 10:21:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
  client tcpdump (flags: none)
  server tcpdump (flags: none)
  20160910-client.cap (flags: none)
  20160910-server-rhel.cap (flags: none)

Description Angelo 2016-03-24 18:23:34 UTC
Description of problem:

As part of debugging an IPv6 MTU issue, I did a clean install of RHEL 7.2. I'm running into a bug/feature: I cannot find cached MTUs for routes on RHEL 7.x or CentOS 7 machines. CentOS 6.x works fine.

Version-Release number of selected component (if applicable):

[root@ipv6test-server2 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@ipv6test-server2 ~]# uname -a
Linux ipv6test-server2.zoovercolo.local 3.10.0-327.10.1.el7.x86_64 #1 SMP Sat Jan 23 04:54:55 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@ipv6test-server2 ~]# rpm -qa | grep iproute
iproute-3.10.0-54.el7.x86_64

How reproducible:

Make sure you can communicate with an ipv6-enabled target, then show the route info

Steps to Reproduce:
1. ping6 www.kame.net
2. ip route get <ipv6 address from above>

Actual results:

I see output like this:

[root@nmo-els-201 ~]# ip route get 2001:910:9ed:30::2
2001:910:9ed:30::2 via 2001:889:2095:303::1 dev eth0 src 2001:889:2095:303::187 metric 0
cache 

The cache entry seems empty; there's no information regarding MTU, expiry, etc.


Expected results:

I would expect output like this, with a cache entry with mtu information. This is from a CentOS6 machine:

[root@nmo-tic-01 html]# ip route get 2001:910:9ed:30::2
2001:910:9ed:30::2 via 2001:889:2095:303::1 dev eth0 src 2001:889:2095:303::198 metric 0 
    cache expires 595sec mtu 1280 advmss 1440 hoplimit 4294967295
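
For scripting, the cached MTU (when present) can be pulled out of that `ip route get` output. A small sketch, fed a canned copy of the sample above so it runs without network access:

```shell
# Sample output of `ip -6 route get` with a populated cache entry.
sample='2001:910:9ed:30::2 via 2001:889:2095:303::1 dev eth0 src 2001:889:2095:303::198 metric 0
    cache expires 595sec mtu 1280 advmss 1440 hoplimit 4294967295'

# Scan the fields of each line; the value follows the "mtu" keyword.
echo "$sample" | awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") print $(i + 1) }'
# prints: 1280
```

On a machine exhibiting this bug, the same awk program applied to the live command output would print nothing, since the cache line carries no attributes.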

Additional info:

This problem is part of my attempt to find out why PMTU discovery seems broken on our RHEL72/CentOS72 machines. The kernel does not seem to process 'packet too big' packets. See https://bugs.centos.org/view.php?id=10490

Comment 2 Phil Sutter 2016-03-31 12:00:08 UTC
Hi,

(In reply to Angelo from comment #0)
> I see output like this:
> 
> [root@nmo-els-201 ~]# ip route get 2001:910:9ed:30::2
> 2001:910:9ed:30::2 via 2001:889:2095:303::1 dev eth0 src
> 2001:889:2095:303::187 metric 0
> cache 
> 
> The cache entry seems empty; there's no information regarding MTU, expiry,
> etc.

The 'cache' keyword being printed here seems to be a bug in RHEL7 kernels; I'll investigate this later. It is unrelated to the issue you are having, though: with an upstream kernel, no cache entry is printed per se, either.

> I would expect output like this, with a cache entry with mtu information.
> This is from a CentOS6 machine:
> 
> [root@nmo-tic-01 html]# ip route get 2001:910:9ed:30::2
> 2001:910:9ed:30::2 via 2001:889:2095:303::1 dev eth0 src
> 2001:889:2095:303::198 metric 0 
>     cache expires 595sec mtu 1280 advmss 1440 hoplimit 4294967295

Like upstream, RHEL7 creates a cache entry only after a PMTU update has occurred for that destination. Testing with kernel-3.10.0-349.el7 this seems to work fine. Here's what I did:

My local RHEL7 VM is connected via a tap interface enslaved in virbr0, inside the VM the interface is called eth0. The following commands are prefixed by either 'host' for commands run locally or 'rhel' for commands run on the VM.

First the plain IPv6 connection:

host # ip addr add feed:babe::1/64 dev virbr0
rhel # ip addr add feed:babe::2/64 dev eth0

Next a network namespace on the host with smaller MTU:

host # ip netns add test
host # ip link add veth0 mtu 1300 type veth peer name veth1 netns test mtu 1300
host # ip -net test link set veth1 up
host # ip -net test addr add feed:babe:2::2/64 dev veth1
host # ip -net test -6 route add default via feed:babe:2::1
host # ip link set veth0 up
host # ip addr add feed:babe:2::1/64 dev veth0

Make the network known in the VM:

rhel # ip -6 route add feed:babe:2::/64 via feed:babe::1

Testing base connectivity first:

rhel # ping6 feed:babe:2::2
PING feed:babe:2::2(feed:babe:2::2) 56 data bytes
64 bytes from feed:babe:2::2: icmp_seq=1 ttl=63 time=0.165 ms
64 bytes from feed:babe:2::2: icmp_seq=2 ttl=63 time=0.368 ms
^C
--- feed:babe:2::2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1010ms
rtt min/avg/max/mdev = 0.165/0.266/0.368/0.102 ms

All looks normal:

rhel # ip -6 r g feed:babe:2::2
feed:babe:2::2 via feed:babe::1 dev eth0  src feed:babe::7  metric 0 
    cache 

Now trigger the PMTU update:

rhel # ping6 -s 1400 feed:babe:2::2
PING feed:babe:2::2(feed:babe:2::2) 1400 data bytes
From feed:babe::1 icmp_seq=1 Packet too big: mtu=1300
1408 bytes from feed:babe:2::2: icmp_seq=2 ttl=63 time=0.449 ms
1408 bytes from feed:babe:2::2: icmp_seq=3 ttl=63 time=0.406 ms
^C
--- feed:babe:2::2 ping statistics ---
3 packets transmitted, 2 received, +1 errors, 33% packet loss, time 2085ms
rtt min/avg/max/mdev = 0.406/0.427/0.449/0.029 ms

Note the PMTU update before the first PONG above. The routing cache then shows as expected:

rhel # ip -6 r g feed:babe:2::2
feed:babe:2::2 via feed:babe::1 dev eth0  src feed:babe::7  metric 0 
    cache  expires 592sec mtu 1300


> This problem is part of my attempt to find out why PMTU discovery seems
> broken on our RHEL72/CentOS72 machines. The kernel does not seem to
> process 'packet too big' packets. See
> https://bugs.centos.org/view.php?id=10490

I verified the above using the same kernel you reported the issue for, namely 3.10.0-327.10.1.el7.x86_64, works as well. Can you please verify above steps on your system?

Thanks, Phil

Comment 3 Phil Sutter 2016-03-31 12:12:19 UTC
(In reply to Angelo from comment #0)
> This problem is part of my attempt to find out why PMTU discovery seems
> broken on our RHEL72/CentOS72 machines. The kernel does not seem to
> process 'packet too big' packets. See
> https://bugs.centos.org/view.php?id=10490

I just had a look at the centos ticket and noticed that in the working case, PMTU updates are sent back to 2001:889:2095:303::198 while in the non-working case they are sent to 2001:889:2095:303::187. Is this intentional?

Cheers, Phil

Comment 4 Angelo 2016-04-16 20:58:29 UTC
Hey Phil, thanks for your reply!

In short: I can reproduce the correct behaviour on my RHEL7 machines, which is good to know! The next step, however, is investigating why my machines respond in this case and not in the other. (Perhaps the 'packet too big' packets that our Cisco firewalls or our provider's Juniper routers send are different.) I hope to get to that in the coming days.


Some longer notes on what I did (mostly for my own reference).

I don't understand much about the RHEL virtualisation technology you're using, and some of the commands are new to me, so I rebuilt a simpler setup that I do understand and that looks about the same as yours, with some routing in between.

I've got three servers (in the same L2 network and IPv4 range, for easy access).

I gave the middle server two NICs and enabled IPv6 forwarding. I configured feed:babe:1::/64 on the left side and feed:babe:2::/64 on the right side:

+----------------+     +------------------------------------+  +----------------+
|     server1    |     |            server2                 |  |    server3     |
|                |     |                                    |  |                |
| ens32          +-----+ ens32               ens192         +--+ ens32          |
| feed:babe:1::2 |     | feed:babe:1::1      feed:babe:2::1 |  | feed:babe:2::2 |
+----------------+     +------------------------------------+  +----------------+

This is about as simple as it gets without tunnels, taps and virtualisation, so I thought it would be easy. On the left side I set the MTU to 1300, both on server1 and on server2's left interface.

Output of the interfaces:

[root@ipv6test-server1 ~]# ip -6 addr 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qlen 1000
    inet6 feed:babe:1::2/64 scope site 
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe87:5188/64 scope link 
       valid_lft forever preferred_lft forever

[root@ipv6test-server2 ~]# ip -6 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 feed:babe:2::1/64 scope site 
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe87:741e/64 scope link 
       valid_lft forever preferred_lft forever
3: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qlen 1000
    inet6 feed:babe:1::1/64 scope site 
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe87:5671/64 scope link 
       valid_lft forever preferred_lft forever

[root@ipv6test-server3 ~]# ip -6 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 feed:babe:2::2/64 scope site 
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe87:227f/64 scope link 
       valid_lft forever preferred_lft forever

I added routes, enabled forwarding, disabled firewalls, etc. Basic ping works fine.
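
For completeness, the glue steps summarised above amount to roughly the following (a sketch only: addresses and prefixes follow the diagram, firewall handling is simplified to stopping firewalld, and each command must run as root on the indicated machine):

```shell
# server2 (the router): allow IPv6 forwarding
sysctl -w net.ipv6.conf.all.forwarding=1

# server1: reach the right-hand network via server2's left interface
ip -6 route add feed:babe:2::/64 via feed:babe:1::1

# server3: reach the left-hand network via server2's right interface
ip -6 route add feed:babe:1::/64 via feed:babe:2::1

# all three: keep the firewall out of the test
systemctl stop firewalld
```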

On server3 I get the route:

[root@ipv6test-server3 ~]# ip -6 r g feed:babe:1::2
feed:babe:1::2 via feed:babe:2::1 dev ens32  src feed:babe:2::2  metric 0 
    cache 

Then on server1 I ping server3:

[root@ipv6test-server1 ~]# ping6 -s 1400 feed:babe:2::2
PING feed:babe:2::2(feed:babe:2::2) 1400 data bytes
1408 bytes from feed:babe:2::2: icmp_seq=2 ttl=63 time=1.06 ms
1408 bytes from feed:babe:2::2: icmp_seq=3 ttl=63 time=0.920 ms

In tcpdump I see a packet too big packet being sent:

22:54:14.640973 IP6 feed:babe:2::2 > feed:babe:1::2: ICMP6, echo reply, seq 1, length 1408
22:54:14.641111 IP6 feed:babe:2::1 > feed:babe:2::2: ICMP6, packet too big, mtu 1300, length 1240
22:54:15.638906 IP6 feed:babe:1::2 > feed:babe:2::2: frag (0|1248) ICMP6, echo request, seq 2, length 1248
22:54:15.638941 IP6 feed:babe:1::2 > feed:babe:2::2: frag (1248|160)
22:54:15.639493 IP6 feed:babe:2::2 > feed:babe:1::2: frag (0|1248) ICMP6, echo reply, seq 2, length 1248
22:54:15.639520 IP6 feed:babe:2::2 > feed:babe:1::2: frag (1248|160)

And then server3 knows that this destination should use an MTU of 1300:

[root@ipv6test-server3 ~]# ip -6 r g feed:babe:1::2
feed:babe:1::2 via feed:babe:2::1 dev ens32  src feed:babe:2::2  metric 0 
    cache  expires 565sec mtu 1300

So that's cool.

Now on to more debugging and tracing...

Comment 5 Phil Sutter 2016-04-18 11:08:44 UTC
Hi Angelo,

Glad to see things *should* be working. Your test description seems a bit odd, though: I would expect you to ping from server3 to server1, because the latter sits behind the low-MTU link, which server3 is not supposed to know about. Maybe just a typo?

As said, please check the destination IP address of the packet too big messages: if your logs in that CentOS ticket are correct, they are sent to the wrong destination, so the PMTU update won't occur.

Cheers, Phil

Comment 6 Angelo 2016-04-18 11:27:25 UTC
> Glad to see things *should* be working. Your test description seems a
> bit odd, though: I would expect you to ping from server3 to server1,
> because the latter sits behind the low-MTU link, which server3 is not
> supposed to know about. Maybe just a typo?

You're right. The direction should not really matter, though: I ping from server1 to server3, and the echo reply from server3 to server1 should trigger the PMTU reply, right?

The big issue I'm trying to solve here is that clients from all over the world with MTUs lower than 1500 are trying to connect to my servers. My servers and network infrastructure are configured with the standard MTU of 1500. For example, at home I have a SIXXS tunnel with an MTU of 1280. If I go to one of the sites hosted there, http://new.asap-foundation.org, I can't connect to it because of MTU issues: the request arrives at the server, but the response never comes back.

> As said, please check the destination IP address of the packet too big
> messages: if your logs in that CentOS ticket are correct, they are sent
> to the wrong destination, so the PMTU update won't occur.

That CentOS ticket involves two different servers: the CentOS 6 server with IP 2001:889:2095:303::198 works fine; the CentOS 7 server with IP 2001:889:2095:303::187 does not.


My search continues...

Comment 7 Phil Sutter 2016-04-18 12:14:14 UTC
(In reply to Angelo from comment #6)
> > Glad to see things *should* be working. Your test description seems a
> > bit odd, though: I would expect you to ping from server3 to server1,
> > because the latter sits behind the low-MTU link, which server3 is not
> > supposed to know about. Maybe just a typo?
> 
> You're right. The direction should not really matter, though: I ping from
> server1 to server3, and the echo reply from server3 to server1 should
> trigger the PMTU reply, right?

Uhm, still not there yet. :)

If you ping from a network with a smaller MTU into one with a bigger one, no PMTU update is expected, since the packets are fragmented locally.

PMTU is relevant when you ping into a remote network with a smaller MTU (and your packet exceeds the maximum size, of course). In that case the boundary router (i.e. the node designated to forward into the small-MTU network) is supposed to send a notification back, so your local box knows to use a smaller MTU than is possible in the local network.
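
The asymmetry can be sketched as a toy decision function (names and the three-MTU model are illustrative, not kernel behaviour; 1448 is the on-wire size of a `ping6 -s 1400`: 1400 bytes payload + 8 ICMPv6 + 40 IPv6 header):

```shell
# usage: needs_ptb <packet_size> <sender_link_mtu> <narrowest_onward_mtu>
needs_ptb() {
    if [ "$1" -gt "$2" ]; then
        echo "fragmented locally"   # sender splits the packet itself: no PTB, no cache entry
    elif [ "$1" -gt "$3" ]; then
        echo "packet too big"       # boundary router sends a PTB back: cache entry appears
    else
        echo "delivered"
    fi
}

needs_ptb 1448 1300 1500   # pinging out of the small-MTU network -> fragmented locally
needs_ptb 1448 1500 1300   # pinging into the small-MTU network  -> packet too big
```

This is why pinging from server1 (behind the 1300-byte link) toward server3 exercises a different code path than pinging the other way around.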

> The big issue I'm trying to solve here is that clients from all over the
> world with MTUs lower than 1500 are trying to connect to my servers. My
> servers and network infrastructure are configured with the standard MTU
> of 1500. For example, at home I have a SIXXS tunnel with an MTU of 1280.
> If I go to one of the sites hosted there, http://new.asap-foundation.org,
> I can't connect to it because of MTU issues: the request arrives at the
> server, but the response never comes back.

Uhm, for TCP things work differently as the MSS is negotiated during the handshake. Can you please provide dumps of the failing communication both from client and server side?

> > As said, please check the destination IP address of the packet too big
> > messages: if your logs in that CentOS ticket are correct, they are sent
> > to the wrong destination, so the PMTU update won't occur.
> 
> That CentOS ticket involves two different servers: the CentOS 6 server
> with IP 2001:889:2095:303::198 works fine; the CentOS 7 server with IP
> 2001:889:2095:303::187 does not.

Ah, I see.

Cheers, Phil

Comment 8 Phil Sutter 2016-04-26 16:10:33 UTC
Angelo,

May I ask what the current status of this issue is? From my point of view there is no issue with RHEL7, backed by my own tests and yours on dedicated test equipment. Can you please validate?

Thanks, Phil

Comment 9 Phil Sutter 2016-05-10 12:18:33 UTC
Correct functionality of PMTU using the reporter's kernel version has been verified; the cause of the issue is expected to lie in the networking infrastructure and/or an incorrect test setup. I am therefore closing this ticket; feel free to reopen if the above assumptions prove wrong.

Comment 10 Angelo 2016-08-23 23:29:45 UTC
Created attachment 1193446 [details]
client tcpdump

Comment 11 Angelo 2016-08-23 23:32:30 UTC
Created attachment 1193447 [details]
server tcpdump

Here you see a tcpdump of the server in question. When I open it in Wireshark I see a lot of 'packet too big' messages and the server stubbornly retransmitting.

Comment 12 Phil Sutter 2016-09-02 16:52:24 UTC
Hi Angelo,

(In reply to Angelo from comment #11)
> Created attachment 1193447 [details]
> server tcpdump
> 
> Here you see a tcpdump of the server in question. When I open it in
> Wireshark I see a lot of 'packet too big' messages and the server
> stubbornly retransmitting.

Looking at the dumps, I notice a few oddities:

- During the handshake, in each dump the sender submits an MSS of 1440 and receives an MSS of 1380. So it looks like there is a transparent proxy in between which mangles the MSS value. This is also backed by the fact that the initial sequence numbers don't match.

- The server sends packets larger than the MSS value it proposed itself: I wouldn't expect it to send 2834-byte frames at all, given the proposed MSS of 1440.

- Despite the Packet Too Big messages stating a maximum MTU of 1280 bytes, the server tries to transmit packets with 1380 bytes TCP payload.

- The server's HTTP header contains 'Server: Apache/2.4.6 (CentOS)', although this ticket addresses RHEL7.
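
The numbers above are consistent with standard IPv6/TCP header overhead; a quick arithmetic check of the MSS and payload observations (a sketch: header sizes assume a plain 40-byte IPv6 header and a 20-byte TCP header without options):

```shell
ip6_hdr=40
tcp_hdr=20

# An advertised MSS of 1440 corresponds to a 1500-byte link MTU:
echo $(( 1500 - ip6_hdr - tcp_hdr ))   # 1440

# A Packet Too Big of 1280 should cap the TCP payload at 1220 bytes,
# yet the dumps show 1380-byte payloads -- well above the limit:
echo $(( 1280 - ip6_hdr - tcp_hdr ))   # 1220
```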


The conclusions I draw from this are:

- This looks like a server issue: it doesn't properly react to the ICMP messages it receives.

- The server seems to be a CentOS machine, so it's out of scope for this ticket. This might also explain why I couldn't reproduce the issue you're seeing.

Can you repeat the test and paste the output of 'ip -6 r g 2001:610:600:8319:709d:264:4f0e:f6b9' on server side?

What is the exact kernel version running on the server? Is it a CentOS or RHEL7 machine?

Comment 13 Angelo 2016-09-10 09:32:59 UTC
Created attachment 1199729 [details]
20160910-client.cap

Comment 14 Angelo 2016-09-10 09:33:32 UTC
Created attachment 1199730 [details]
20160910-server-rhel.cap

Comment 15 Angelo 2016-09-10 09:34:37 UTC
Phil,

Ah, my bad, I had the same problem on both RHEL and CentOS (we're mostly a CentOS shop). I created a new RHEL machine to show the problem there.

In this example, the client has ip 2001:610:600:8319:655e:b0b6:233e:1c06 and server has ip 2001:888:2085:11::66, and I attached new captures. I still see the same behaviour. 

[root@nmt-web-107test ~]# ip -6 r g 2001:610:600:8319:655e:b0b6:233e:1c06
2001:610:600:8319:655e:b0b6:233e:1c06 via 2001:888:2085:11::1 dev eth0  src 2001:888:2085:11::66  metric 0
    cache
[root@nmt-web-107test ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@nmt-web-107test ~]# uname -a
Linux nmt-web-107test.netmatchcolo1.local 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Thanks for your time again.

Angelo.

Comment 16 Angelo 2016-09-10 10:29:24 UTC
I also created a 7.3 beta machine, because it runs a newer kernel and I expected more output from the MTU cache table. Same issue; it looks like the MTU table is not updated (I did not create new captures):

06:26:25.680569 IP6 2001:610:600:319::1 > 2001:888:2085:11::67: ICMP6, packet too big, mtu 1280, length 1240
06:26:32.606439 IP6 2001:888:2085:11::67.http > 2001:610:600:8319:655e:b0b6:233e:1c06.56541: Flags [.], seq 1:1369, ack 89, win 224, options [nop,nop,TS val 330976 ecr 771752564], length 1368
06:26:32.608260 IP6 2001:610:600:319::1 > 2001:888:2085:11::67: ICMP6, packet too big, mtu 1280, length 1240
06:26:33.406444 IP6 2001:888:2085:11::67.http > 2001:610:600:8319:655e:b0b6:233e:1c06.56533: Flags [.], seq 1:1369, ack 1, win 224, options [nop,nop,TS val 331776 ecr 771749844], length 1368
06:26:33.408063 IP6 2001:610:600:319::1 > 2001:888:2085:11::67: ICMP6, packet too big, mtu 1280, length 1240
06:26:36.542472 IP6 2001:610:600:8319:655e:b0b6:233e:1c06.56532 > 2001:888:2085:11::67.http: Flags [.], ack 1116832299, win 4104, length 0
06:26:36.542521 IP6 2001:888:2085:11::67.http > 2001:610:600:8319:655e:b0b6:233e:1c06.56532: Flags [.], ack 1, win 224, options [nop,nop,TS val 334912 ecr 771710036], length 0
^C
45 packets captured
45 packets received by filter
0 packets dropped by kernel
[root@localhost asap-foundation.org]# ip -6 r g 2001:610:600:8319:655e:b0b6:233e:1c06
2001:610:600:8319:655e:b0b6:233e:1c06 via 2001:888:2085:11::1 dev eth0  src 2001:888:2085:11::67  metric 1 
[root@localhost asap-foundation.org]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.3 Beta (Maipo)
[root@localhost asap-foundation.org]# uname -a
Linux localhost.localdomain 3.10.0-493.el7.x86_64 #1 SMP Tue Aug 16 11:45:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost asap-foundation.org]#

Comment 17 Phil Sutter 2016-09-20 20:26:13 UTC
Hi Angelo,

In your new dumps it still looks like there is a TCP proxy in between; I'm not sure how much influence that has on the PMTU mechanism, though.

I'm not quite convinced the server completely ignores the PMTU updates, though. In my own tests I was fooled by TSO/GSO once, so could you try turning those off on the server side, just to make sure we really see the packet sizes as they leave the interface? This should do the trick:

# ethtool -K ethX tso off
# ethtool -K ethX gso off

Thanks, Phil

Comment 18 Phil Sutter 2017-02-13 10:21:08 UTC
I'm closing this ticket due to lack of feedback from reporter. Please feel free to reopen in case you can provide further information.

Thanks, Phil

Comment 19 Angelo 2018-08-11 09:44:20 UTC
I haven't found the time over the past few years to dive into this further, and I won't have it anytime soon. Replying to clear the needinfo tag.