Bug 1624656
| Summary: | ip -6 route crashes when adding 37 nexthop routes | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jonathan Maxwell <jmaxwell> | |
| Component: | iproute | Assignee: | Phil Sutter <psutter> | |
| Status: | CLOSED ERRATA | QA Contact: | Jaroslav Aster <jaster> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 7.5 | CC: | atragler, jaster, jmaxwell, psutter, salmy, stalexan, sukulkar | |
| Target Milestone: | rc | Keywords: | ZStream | |
| Target Release: | --- | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | iproute-4.11.0-16.el7 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1625358 1679911 1679996 (view as bug list) | Environment: | ||
| Last Closed: | 2019-08-06 12:54:26 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1679911, 1679996 | |||
|
Description
Jonathan Maxwell
2018-09-02 22:17:38 UTC
Whether a segmentation fault occurs varies from machine to machine, depending on how many nexthop entries are added. The example in the description is from a machine with 4GB RAM. To generate a segfault on a lab machine:

    for i in `seq 600`; do
        nhs="nexthop via 1111::$i "$nhs
    done
    ip -6 route add 3333::/64 $nhs
    rta_addattr_l: Error! max allowed bound 4096 exceeded
    rta_addattr_l: Error! max allowed bound 4096 exceeded
    rta_addattr_l: Error! max allowed bound 4096 exceeded
    rta_addattr_l: Error! max allowed bound 4096 exceeded
    rta_addattr_l: Error! max allowed bound 4096 exceeded
    rta_addattr_l: Error! max allowed bound 4096 exceeded
    Segmentation fault
    # echo $?
    139

The "ip -6 route" command should have a bounds check for the buffer, I think.

Fix sent upstream:

https://marc.info/?l=linux-netdev&m=153608136132035&w=2

Note that this merely prevents the segfault from happening and establishes a clean error path in that case - it is still not possible to use more than 36 nexthop statements. Please let me know if that is a requirement; I'll then follow up to have the respective buffers increased.

Ah, since we didn't update the iproute package in RHEL7.6, aiming at RHEL7.7 for this one as well. Z-stream backporting shouldn't be a problem, though.

I had to respin the patch anyway, but while doing so increased the buffers as well. In a simple case, I could add over 140 nexthops. Please note, though, that any other parameter reduces the buffer space left for nexthops, e.g. 'src <addr>'. Since those shouldn't appear in such large numbers, I guess the customer should be able to add up to 128 nexthops in about any case.

https://marc.info/?l=linux-netdev&m=153624072816713&w=2

(In reply to Phil Sutter from comment #7)
> I had to respin the patch anyway, but while doing so increased the buffers
> as well. In a simple case, I could add over 140 nexthops. Though please note
> that any other parameter reduces the buffer space left for nexthops, e.g.
> 'src <addr>'. Though since those shouldn't appear in such large numbers, I
> guess the customer should be able to add up to 128 nexthops in about any
> case.
>
> https://marc.info/?l=linux-netdev&m=153624072816713&w=2

That is awesome, thanks Phil. If anyone wants to add more than ~140 nexthop entries they can use "ip -6 route append".

jaroslav, please consider this ticket for qa_ack+, thanks!

Patches to backport:
commit bd59e5b1517b09b6f26d59f38fe6077d953c2396
Author: Phil Sutter <phil>
Date: Thu Sep 6 15:31:51 2018 +0200
ip-route: Fix segfault with many nexthops
It was possible to crash ip-route by adding an IPv6 route with 37
nexthop statements. A simple reproducer is:
| for i in `seq 37`; do
| nhs="nexthop via 1111::$i "$nhs
| done
| ip -6 route add 3333::/64 $nhs
The related code was broken in multiple ways:
* parse_one_nh() assumed that rta points to 4kB of storage but caller
provided just 1kB. Fixed by passing 'len' parameter with the correct
value.
* Error checking of rta_addattr*() calls in parse_one_nh() and called
functions was completely absent, so with above fix in place output
flood would occur due to parser looping forever.
While being at it, increase message buffer sizes to 4k. This allows for
at most 144 nexthops.
Signed-off-by: Phil Sutter <phil>
Signed-off-by: Stephen Hemminger <stephen>
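The boundary between 36 and 37 nexthops can be sanity-checked with a little arithmetic. The sketch below is only an estimate under stated assumptions: each IPv6 nexthop is taken to occupy an 8-byte struct rtnexthop plus a gateway attribute (4-byte header and 16-byte address), nested inside an attribute with its own 4-byte header. The function name max_nexthops is made up for illustration, and other attributes sharing the buffer are ignored.

```shell
#!/bin/sh
# Rough upper bound on "nexthop via <ipv6>" entries that fit in a buffer.
# Assumed sizes (not taken from the bug report):
#   struct rtnexthop        8 bytes
#   attribute header        4 bytes
#   IPv6 gateway address   16 bytes
max_nexthops() {
    buf_size=$1
    per_nexthop=$((8 + 4 + 16))   # rtnexthop + gateway attribute
    outer_header=4                # header of the enclosing attribute
    echo $(( (buf_size - outer_header) / per_nexthop ))
}

max_nexthops 1024   # old 1kB buffer
max_nexthops 4096   # enlarged 4kB buffer
```

Under these assumptions the 1kB buffer holds 36 entries, which is consistent with the 37th nexthop triggering the crash. For 4kB the estimate comes out a little above the 144 quoted in the commit message, presumably because other parts of the message also consume buffer space.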
commit e5da392ff8e3979b86cad04b238ffbbc8076e005
Author: Phil Sutter <phil>
Date: Thu Oct 18 14:30:31 2018 +0200
ip-route: Fix for memleak in error path
If call to rta_addattr_l() failed, parse_encap_seg6() would leak memory.
Fix this by making sure calls to free() are not skipped.
Fixes: bd59e5b1517b0 ("ip-route: Fix segfault with many nexthops")
Signed-off-by: Phil Sutter <phil>
Signed-off-by: Stephen Hemminger <stephen>
commit 6cd959bb125c50a04ab6671645fa38c5b07426f4
Author: Phil Sutter <phil>
Date: Tue Nov 13 16:55:13 2018 +0100
man: ip-route.8: Document nexthop limit
Add a note to 'nexthop' description stating the maximum number of
nexthops per command and pointing at 'append' command as a workaround.
Signed-off-by: Phil Sutter <phil>
Signed-off-by: Stephen Hemminger <stephen>
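The 'append' workaround mentioned above could be scripted roughly as follows. This is a dry-run sketch that only prints commands and does not execute them; the function name print_batched_routes, the batch size, and the 1111::N gateway numbering (borrowed from the reproducer) are illustrative assumptions, not something prescribed by the man page.

```shell
#!/bin/sh
# print_batched_routes PREFIX TOTAL BATCH
# Dry run: prints one "ip -6 route" command per batch of nexthops.
# The first batch uses "add"; later batches use "append".
print_batched_routes() {
    prefix=$1 total=$2 batch=$3
    verb=add
    nhs="" count=0 i=1
    while [ "$i" -le "$total" ]; do
        nhs="$nhs nexthop via 1111::$i"
        count=$((count + 1))
        # Flush when the batch is full or the last nexthop is reached.
        if [ "$count" -eq "$batch" ] || [ "$i" -eq "$total" ]; then
            echo "ip -6 route $verb $prefix$nhs"
            verb=append
            nhs="" count=0
        fi
        i=$((i + 1))
    done
}

print_batched_routes 3333::/64 5 2
```

Keeping the batch size well below the per-command limit leaves headroom for additional parameters, which also consume buffer space.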
commit 05d978e0850a6a3bae1e6c5392d82f7b1496f86a
Author: Phil Sutter <phil>
Date: Tue Nov 13 13:39:04 2018 +0100
ip-route: Fix nexthop encap parsing
When parsing nexthop parameters, a buffer of 4k bytes is provided. Yet,
in lwt_parse_encap() and some functions called by it, buffer size was
assumed to be 1k despite the actual size was provided. This led to
spurious buffer size errors if the buffer was filled by previous nexthop
parameters to exceed that 1k boundary.
Fixes: 1e5293056a02c ("lwtunnel: Add encapsulation support to ip route")
Fixes: 5866bddd9aa9e ("ila: Add support for ILA lwtunnels")
Fixes: ed67f83806538 ("ila: Support for checksum neutral translation")
Fixes: 86905c8f057c0 ("ila: support for configuring identifier and hook types")
Fixes: b15f440e78373 ("lwt: BPF support for LWT")
Signed-off-by: Phil Sutter <phil>
Signed-off-by: Stephen Hemminger <stephen>
Hi Phil,
sometimes I see a 'netlink allocation error' when I try to add a route with a lot of nexthops. Do you have any idea where the problem could be? My test uses a dummy interface.
{ ip -6 route add 3004::/64 nexthop via 1111::142 nexthop via 1111::141 nexthop via 1111::140 nexthop via 1111::139 nexthop via 1111::138 nexthop via 1111::137 nexthop via 1111::136 nexthop via 1111::135 nexthop via 1111::134 nexthop via 1111::133 nexthop via 1111::132 nexthop via 1111::131 nexthop via 1111::130 nexthop via 1111::129 nexthop via 1111::128 nexthop via 1111::127 nexthop via 1111::126 nexthop via 1111::125 nexthop via 1111::124 nexthop via 1111::123 nexthop via 1111::122 nexthop via 1111::121 nexthop via 1111::120 nexthop via 1111::119 nexthop via 1111::118 nexthop via 1111::117 nexthop via 1111::116 nexthop via 1111::115 nexthop via 1111::114 nexthop via 1111::113 nexthop via 1111::112 nexthop via 1111::111 nexthop via 1111::110 nexthop via 1111::109 nexthop via 1111::108 nexthop via 1111::107 nexthop via 1111::106 nexthop via 1111::105 nexthop via 1111::104 nexthop via 1111::103 nexthop via 1111::102 nexthop via 1111::101 nexthop via 1111::100 nexthop via 1111::99 nexthop via 1111::98 nexthop via 1111::97 nexthop via 1111::96 nexthop via 1111::95 nexthop via 1111::94 nexthop via 1111::93 nexthop via 1111::92 nexthop via 1111::91 nexthop via 1111::90 nexthop via 1111::89 nexthop via 1111::88 nexthop via 1111::87 nexthop via 1111::86 nexthop via 1111::85 nexthop via 1111::84 nexthop via 1111::83 nexthop via 1111::82 nexthop via 1111::81 nexthop via 1111::80 nexthop via 1111::79 nexthop via 1111::78 nexthop via 1111::77 nexthop via 1111::76 nexthop via 1111::75 nexthop via 1111::74 nexthop via 1111::73 nexthop via 1111::72 nexthop via 1111::71 nexthop via 1111::70 nexthop via 1111::69 nexthop via 1111::68 nexthop via 1111::67 nexthop via 1111::66 nexthop via 1111::65 nexthop via 1111::64 nexthop via 1111::63 nexthop via 1111::62 nexthop via 1111::61 nexthop via 1111::60 nexthop via 1111::59 nexthop via 1111::58 nexthop via 1111::57 nexthop via 1111::56 nexthop via 1111::55 nexthop via 1111::54 nexthop via 1111::53 nexthop via 1111::52 nexthop via 
1111::51 nexthop via 1111::50 nexthop via 1111::49 nexthop via 1111::48 nexthop via 1111::47 nexthop via 1111::46 nexthop via 1111::45 nexthop via 1111::44 nexthop via 1111::43 nexthop via 1111::42 nexthop via 1111::41 nexthop via 1111::40 nexthop via 1111::39 nexthop via 1111::38 nexthop via 1111::37 nexthop via 1111::36 nexthop via 1111::35 nexthop via 1111::34 nexthop via 1111::33 nexthop via 1111::32 nexthop via 1111::31 nexthop via 1111::30 nexthop via 1111::29 nexthop via 1111::28 nexthop via 1111::27 nexthop via 1111::26 nexthop via 1111::25 nexthop via 1111::24 nexthop via 1111::23 nexthop via 1111::22 nexthop via 1111::21 nexthop via 1111::20 nexthop via 1111::19 nexthop via 1111::18 nexthop via 1111::17 nexthop via 1111::16 nexthop via 1111::15 nexthop via 1111::14 nexthop via 1111::13 nexthop via 1111::12 nexthop via 1111::11 nexthop via 1111::10 nexthop via 1111::9 nexthop via 1111::8 nexthop via 1111::7 nexthop via 1111::6 nexthop via 1111::5 nexthop via 1111::4 nexthop via 1111::3 nexthop via 1111::2 nexthop via 1111::1 nexthop via 1111::143 encap ip id 42 dst 10.0.0.2; } |& tee /tmp/tmp.GW3v7e64lv'
RTNETLINK answers: Cannot allocate memory
I saw it on rhel-7.5 and 7.6 and all architectures, but not always.
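One way to put numbers on "not always" is to repeat a command and count how often it fails. The helper below is a generic sketch, not part of the actual test suite; count_failures is a hypothetical name, and true/false stand in for the real route command, which would need root and a prepared interface.

```shell
#!/bin/sh
# count_failures RUNS CMD [ARGS...]
# Runs CMD RUNS times and prints how many invocations exited non-zero.
count_failures() {
    runs=$1
    shift
    failures=0 n=0
    while [ "$n" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        n=$((n + 1))
    done
    echo "$failures"
}

# Stand-in commands; a real run might pass a wrapper script that adds
# the route and then deletes it again.
count_failures 5 false
count_failures 5 true
```

Running the same loop with different nexthop counts would show at which count the results become unstable.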
Hi Jaroslav,

(In reply to Jaroslav Aster from comment #26)
> Hi Phil,
>
> sometimes I see 'netlink allocation error', when I try to add route with a
> lot of nexthops. Do you have any idea where could be a problem? My test uses
> dummy interface.
>
> { ip -6 route add 3004::/64 nexthop via 1111::142 nexthop via 1111::141
> [...] nexthop via 1111::1 nexthop via 1111::143 encap ip id 42 dst
> 10.0.0.2; } |& tee /tmp/tmp.GW3v7e64lv'
> RTNETLINK answers: Cannot allocate memory

These "RTNETLINK answers:" messages report an error status from the kernel. I don't know where this one comes from, but it does not seem directly related to iproute.

> I saw it on rhel-7.5 and 7.6 and all architectures, but not always.

Do you have any statistics on how often it happens? Does it only happen with a certain number of nexthop arguments? If so, what is the minimum nexthop count at which results become unstable?

Thanks, Phil

It happens every time I run the test, though not always in the same command, and always in the IPv6 part of the test. I do not know where the limit is, because for this bug I test the maximum possible number of nexthops. It happens only in Beaker; I'm not able to reproduce it in 1minutetip.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2131 |