Bug 1083153
| Summary: | Configuring network device that has been dhcp activated in initramfs in anaconda text mode tracebacks. |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Component: | NetworkManager |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | unspecified |
| Priority: | unspecified |
| Version: | 7.0 |
| Target Milestone: | rc |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | Radek Vykydal <rvykydal> |
| Assignee: | Dan Williams <dcbw> |
| QA Contact: | Desktop QE <desktop-qa-list> |
| CC: | danw, dcbw, dracut-maint-list, harald, jklimes, jstodola, kdube, ljozsa, lsmid, rvykydal, thaller, tpelka, trondham, vbenes |
| Fixed In Version: | dracut-033-160.el7 / NetworkManager-0.9.9.1-13.git20140326.4dba720.el7 |
| Doc Type: | Bug Fix |
| Last Closed: | 2014-06-13 09:57:22 UTC |
| Type: | Bug |
| Bug Blocks: | 782468, 1086237, 1086812 |

The cause is that NM does not match the ifcfg file created in initramfs with the connection from initramfs that it is trying to take over. The connection from initramfs is ipv6 autoconfigured (ipv6 method "auto" is detected by NM), while the ifcfg file does not contain IPV6INIT=yes (which means ipv6 method "ignore").

Harald, is ipv6 autoconfiguration of iface when using ip=<iface>:dhcp expected?

Created attachment 881385 [details]
proposed patch

This patch should fix the issue (if ipv6 autoconfiguration in initramfs is expected). Hopefully it won't break other cases.
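
The mismatch described in the first comment can be illustrated with a minimal Python sketch. It assumes only the simplified rule stated there (IPV6INIT=yes means ipv6 method "auto", anything else means "ignore"). The helper names are hypothetical, and the real ifcfg-rh plugin in NetworkManager considers more keys than this.

```python
def ipv6_method_from_ifcfg(text):
    """Return the ipv6 method implied by ifcfg content.

    Simplified assumption taken from this thread: IPV6INIT=yes -> "auto",
    otherwise "ignore". Not the full NetworkManager logic.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return "auto" if values.get("IPV6INIT", "").lower() == "yes" else "ignore"


def methods_match(ifcfg_text, detected_method):
    """True if the ifcfg file and the method detected on the device agree."""
    return ipv6_method_from_ifcfg(ifcfg_text) == detected_method


# ifcfg file as written by initramfs before the dracut change (no IPV6INIT)
ifcfg_without_ipv6 = 'DEVICE="eth0"\nONBOOT=yes\nBOOTPROTO=dhcp\n'
# the same file with IPV6INIT=yes added
ifcfg_with_ipv6 = ifcfg_without_ipv6 + "IPV6INIT=yes\n"

print(methods_match(ifcfg_without_ipv6, "auto"))  # mismatch: "ignore" vs "auto"
print(methods_match(ifcfg_with_ipv6, "auto"))     # both sides say "auto"
```

With the device autoconfigured for ipv6, only the second file agrees with what NM detects, which is the motivation for writing IPV6INIT=yes from initramfs.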
(In reply to Radek Vykydal from comment #1)
> Harald, is ipv6 autoconfiguration of iface when using ip=<iface>:dhcp
> expected?

no it isn't.

(In reply to Radek Vykydal from comment #1)

What is the auto configuration?

# ip -6 addr | fgrep dynamic -q && echo 'ipv6 autoconfiguration was turned on' && ip -6 addr

Created attachment 881662 [details]
journalctl -a with rd.debug on

Output of ip -6 addr in pre-pivot:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp63s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2620:52:0:2266:f6ce:46ff:fe2c:447a/64 scope global dynamic
       valid_lft 2591860sec preferred_lft 604660sec
    inet6 fe80::f6ce:46ff:fe2c:447a/64 scope link
       valid_lft forever preferred_lft forever
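
The `fgrep dynamic` check requested earlier in the thread can also be done per interface. The following Python sketch is an illustration only: `dynamic_ipv6_addresses` is a hypothetical helper, and the sample text is the pre-pivot output shown above.

```python
def dynamic_ipv6_addresses(ip_output):
    """Map interface name -> list of inet6 addresses flagged "dynamic".

    Parses the plain-text output of `ip -6 addr`; indentation is ignored.
    """
    result = {}
    iface = None
    for line in ip_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Interface header, e.g. "2: enp63s0: <BROADCAST,...> mtu 1500"
        if parts[0].endswith(":") and parts[0][:-1].isdigit() and len(parts) > 1:
            iface = parts[1].rstrip(":")
        # Address line, e.g. "inet6 <addr>/<plen> scope global dynamic"
        elif parts[0] == "inet6" and iface and "dynamic" in parts[2:]:
            result.setdefault(iface, []).append(parts[1])
    return result


SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp63s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2620:52:0:2266:f6ce:46ff:fe2c:447a/64 scope global dynamic
       valid_lft 2591860sec preferred_lft 604660sec
    inet6 fe80::f6ce:46ff:fe2c:447a/64 scope link
       valid_lft forever preferred_lft forever
"""

print(dynamic_ipv6_addresses(SAMPLE))
```

For the sample, only the global address on enp63s0 is reported; the loopback and link-local addresses carry no "dynamic" flag.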
(In reply to Radek Vykydal from comment #6)
> Output of ip -6 addr in pre-pivot:

dracut did not touch any ipv6 configuration. systemd has loaded the ipv6 kernel module and sysctl ran with the system defaults. So, it seems auto6 is enabled by default.

Proposal:
strstr "$(ip -6 addr show dev $netif)" 'inet6' && echo "IPV6INIT=yes"

(In reply to Harald Hoyer from comment #7)
> Proposal:
> strstr "$(ip -6 addr show dev $netif)" 'inet6' && echo "IPV6INIT=yes"

Sounds good, thanks!

dracut-033-160.el7

Retested with dracut-033-160.el7, but anaconda is still failing with the traceback. There are two connections reported by NetworkManager:

[anaconda root@localhost ~]# nmcli c
NAME  UUID                                  TYPE            DEVICE
eth0  154057ad-74b0-4912-9df8-099e784aa6d2  802-3-ethernet  eth0
eth0  00b372d0-c4fb-4f0e-8d17-5588fc745df7  802-3-ethernet  --
[anaconda root@localhost ~]#

Details of the connections will be attached.

Moving back to ASSIGNED.

Created attachment 886092 [details]
nmcli_c_show_00b372d0

Created attachment 886093 [details]
nmcli_c_show_154057ad
nmcli c show uuid 154057ad-74b0-4912-9df8-099e784aa6d2

Jirka, any idea why the connections don't match in this case? (ipv6.routes?) See comment #13, comment #14, I'm adding also output of ifcfg.log:

10:06:38,358 DEBUG ifcfg: content of files (network initialization):
10:06:38,358 DEBUG ifcfg: /etc/sysconfig/network-scripts/ifcfg-eth0:
10:06:38,358 DEBUG ifcfg: # Generated by dracut initrd
10:06:38,359 DEBUG ifcfg: DEVICE="eth0"
10:06:38,359 DEBUG ifcfg: ONBOOT=yes
10:06:38,359 DEBUG ifcfg: NETBOOT=yes
10:06:38,359 DEBUG ifcfg: UUID="00b372d0-c4fb-4f0e-8d17-5588fc745df7"
10:06:38,359 DEBUG ifcfg: IPV6INIT=yes
10:06:38,359 DEBUG ifcfg: BOOTPROTO=dhcp
10:06:38,359 DEBUG ifcfg: HWADDR="52:54:00:1e:82:55"
10:06:38,359 DEBUG ifcfg: TYPE=Ethernet
10:06:38,359 DEBUG ifcfg: NAME="eth0"
10:06:38,373 DEBUG ifcfg: all settings: [{'802-3-ethernet': {'s390-options': {}, 'mac-address': [82, 84, 0, 30, 130, 85]}, 'connection': {'interface-name': 'eth0', 'type': '802-3-ethernet', 'id': 'eth0', 'uuid': '00b372d0-c4fb-4f0e-8d17-5588fc745df7'}, 'ipv4': {'routes': [], 'addresses': [], 'dns': [], 'method': 'auto'}, 'ipv6': {'routes': [], 'addresses': [], 'dns': [], 'method': 'auto'}}, {'802-3-ethernet': {'s390-options': {}, 'mac-address': [82, 84, 0, 30, 130, 85]}, 'connection': {'autoconnect': False, 'interface-name': 'eth0', 'timestamp': 1397469979L, 'type': '802-3-ethernet', 'id': 'eth0', 'uuid': '154057ad-74b0-4912-9df8-099e784aa6d2'}, 'ipv4': {'routes': [], 'addresses': [], 'dns': [23374016L], 'method': 'auto'}, 'ipv6': {'routes': [([38, 32, 0, 82, 0, 0, 34, 48, 0, 0, 0, 0, 10, 34, 48, 241], 128L, [254, 128, 0, 0, 0, 0, 0, 0, 80, 84, 0, 255, 254, 188, 177, 87], 0L)], 'addresses': [], 'dns': [], 'method': 'auto'}}]

Jirka, here are some additional outputs you asked for:

# ip -6 r
2620:52:0:2230::a22:30f1 via fe80::5054:ff:febc:b157 dev eth0 proto static metric 1
fc00::/64 dev eth0 proto kernel metric 256 expires 3477sec
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::5054:ff:febc:b157 dev eth0 proto static metric 1024

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:1e:82:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.145/24 brd 192.168.100.255 scope global dynamic eth0
       valid_lft 3493sec preferred_lft 3493sec
    inet6 fc00::5054:ff:fe1e:8255/64 scope global dynamic
       valid_lft 3474sec preferred_lft 3474sec
    inet6 fe80::5054:ff:fe1e:8255/64 scope link
       valid_lft forever preferred_lft forever
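
The raw values in the "all settings" dump from ifcfg.log are easier to compare with the `ip` output once decoded. The sketch below is an illustration with hypothetical helper names. It assumes the byte lists are big-endian address bytes and that the IPv4 DNS integer is a network-order address stored in a little-endian host integer; the `L` suffixes in the log are Python 2 long literals and are dropped here.

```python
import ipaddress
import struct


def mac_from_bytes(octets):
    """[82, 84, 0, 30, 130, 85] -> '52:54:00:1e:82:55'"""
    return ":".join("%02x" % b for b in octets)


def ipv6_from_bytes(octets):
    """16 address bytes -> compressed IPv6 text form."""
    return str(ipaddress.IPv6Address(bytes(octets)))


def ipv4_from_uint32(value):
    """Network-order address held in a little-endian host integer (assumption)."""
    return str(ipaddress.IPv4Address(struct.pack("<I", value)))


def decode_ipv6_route(route):
    """(dest bytes, prefix, next-hop bytes, metric) -> readable string."""
    dest, prefix, next_hop, metric = route
    return "%s/%d via %s metric %d" % (
        ipv6_from_bytes(dest), prefix, ipv6_from_bytes(next_hop), metric)


# Values copied from the second connection in the ifcfg.log dump
mac = [82, 84, 0, 30, 130, 85]
dns = 23374016
route = ([38, 32, 0, 82, 0, 0, 34, 48, 0, 0, 0, 0, 10, 34, 48, 241], 128,
         [254, 128, 0, 0, 0, 0, 0, 0, 80, 84, 0, 255, 254, 188, 177, 87], 0)

print(mac_from_bytes(mac))       # 52:54:00:1e:82:55
print(ipv4_from_uint32(dns))     # 192.168.100.1
print(decode_ipv6_route(route))  # 2620:52:0:2230::a22:30f1/128 via fe80::5054:ff:febc:b157 metric 0
```

The decoded route is the same `2620:52:0:2230::a22:30f1 via fe80::5054:ff:febc:b157` entry that appears in the `ip -6 r` output, and it is present only in the connection generated from the running device.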
(In reply to Radek Vykydal from comment #15)
> Jirka, any idea why the connections don't match in this case? (ipv6.routes?)

Yes, the problem is the comparison of IPv6 routes:
ipv6.routes: { dst = 2620:52:0:2230::a22:30f1/128, nh = fe80::5054:ff:febc:b157, mt = 0 }

As you can see in comment #16, there is a static route entry for 2620:52:0:2230::a22:30f1 via a link-local address. But there is no global IPv6 address 2620:52... So the question is who configured this, because in the case of auto-configuration there should not be the "static" keyword, but rather "kernel", "ra" or the like.

The following bugs seem like the same or a similar issue: bug 1086237 and bug 1086812. Ladislav helped me to debug and get some output in bug 1086812. The strange thing is that 'ip -6 route' differs in initramfs and after switchroot.

I guess the comparison of IPv6 routes might be enhanced in NetworkManager. But first we need to understand the interaction between dracut and NM and what changed from previous snapshots.

So it seems to me that the route is added by NetworkManager itself as it handles NDP.
In my case I have this route:
2620:52:0:2200::/64 dev enp0s25 proto static metric 1
and it is added by this NM code path:
#0 ip6_route_add (platform=0x7fe51c55a820, ifindex=2, network=..., plen=64, gateway=..., metric=1, mss=0)
at platform/nm-linux-platform.c:3056
#1 0x00007fe51a3240ca in nm_platform_ip6_route_add (ifindex=2, network=..., plen=64, gateway=..., metric=1, mss=0)
at platform/nm-platform.c:1576
#2 0x00007fe51a324371 in nm_platform_ip6_route_sync (ifindex=2, known_routes=0x7fe508007660) at platform/nm-platform.c:1778
#3 0x00007fe51a343594 in nm_ip6_config_commit (config=0x7fe51c6205f0, ifindex=2, priority=<optimized out>) at nm-ip6-config.c:291
#4 0x00007fe51a2fdeb2 in nm_device_set_ip6_config (self=0x7fe51c600020, new_config=0x7fe51c6205f0, commit=1, reason=0x7fffb3b23e34)
at devices/nm-device.c:5242
#5 0x00007fe51a2ff3ba in ip6_config_merge_and_apply (self=0x7fe51c600020, commit=1, out_reason=0x7fffb3b23e34)
at devices/nm-device.c:3066
#6 0x00007fe51a308ed9 in nm_device_activate_ip6_config_commit (user_data=<optimized out>) at devices/nm-device.c:4570
#7 0x00007fe516479ac6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#8 0x00007fe516479e48 in g_main_context_iterate.isra.22 () from /lib64/libglib-2.0.so.0
#9 0x00007fe51647a25a in g_main_loop_run () from /lib64/libglib-2.0.so.0
#10 0x00007fe51a2f67fa in main (argc=1, argv=0x7fffb3b24458) at main.c:644
So I think we should track in NM the origin of routes/addresses and add routes with "proto ra" if they are from autoconfiguration (using rtnl_route_set_protocol()).
Jirka, should we clone the bz for NM or reassign to NM?

Is 2620:52:0:2230::a22:30f1 a DNS server provided by either Router Advertisements or by a DHCPv6 server on the link?

The /128 via <router> routes are added automatically by the kernel whenever any process tries to talk to that IP address. These routes are kept around for a short period of time. So, if some process (dracut) configured IPv6 and then some other process tries to do anything on the network before NM starts, you'll see these cache routes in the kernel.

It appears that NetworkManager picks those "cache" routes up when generating the initial connection. They cause the initial connection matching to fail as shown here. These routes are marked as RTM_F_CLONED by the kernel, and NetworkManager should likely be filtering these routes out when reading the interface's current configuration at startup. I think this will take care of the current problem of connection matching.

-----

The next enhancement should be, as Jirka suggests, to tag the routes that NM gets from DHCP or RA with RTPROT_RA or RTPROT_DHCP and *additionally* ignore these routes when reading the interface's current configuration. That will make connection matching work better on NM restart since NM will not load these "automatic" routes into the current interface configuration.

-----

I will prepare testing packages with a candidate fix for this issue, which will probably also fix bug 1086812.

Brew build here with fix to ignore cloned routes:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=7380134

With this build we should no longer see NM tripping over the "X/128 via Y" routes on startup *unless* they are legitimate static routes added by DHCP or from the RA. For those routes, we'll have to handle them as Jirka suggests in a separate patch, but those are much less common.

Please let me know if this fixes the problem!

I have created new installation images with NM from comment 21, and I'm no longer able to reproduce this issue.

[anaconda root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:1e:82:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.145/24 brd 192.168.100.255 scope global dynamic eth0
       valid_lft 3373sec preferred_lft 3373sec
    inet6 fc00::5054:ff:fe1e:8255/64 scope global dynamic
       valid_lft 3448sec preferred_lft 3448sec
    inet6 fe80::5054:ff:fe1e:8255/64 scope link
       valid_lft forever preferred_lft forever

[anaconda root@localhost ~]# ip -6 r
fc00::/64 dev eth0 proto static metric 1
fc00::/64 dev eth0 proto kernel metric 256 expires 3346sec
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::5054:ff:febc:b157 dev eth0 proto static metric 1024
[anaconda root@localhost ~]#

[anaconda root@localhost ~]# nmcli c
NAME  UUID                                  TYPE            DEVICE
eth0  578c636c-665c-4798-9ad3-cd3683376567  802-3-ethernet  eth0
[anaconda root@localhost ~]#

(In reply to Jan Stodola from comment #22)
> I have created new installation images with NM from comment 21, and I'm no
> longer able to reproduce this issue.

Jan, could you QE ack this bug then so we can proceed with getting the fix either into RC or 0day? Thanks!

FWIW, the patch is just:

diff --git a/src/platform/nm-linux-platform.c b/src/platform/nm-linux-platform.c
index 7ec3da6..9d90c28 100644
--- a/src/platform/nm-linux-platform.c
+++ b/src/platform/nm-linux-platform.c
@@ -3141,14 +3141,16 @@ ip_route_mark_all (NMPlatform *platform, int family, int ifindex)
 			continue;
 		if (rtnl_route_get_protocol (rtnlroute) == RTPROT_KERNEL)
 			continue;
 		if (rtnl_route_get_family (rtnlroute) != family)
 			continue;
 		if (rtnl_route_get_nnexthops (rtnlroute) != 1)
 			continue;
+		if (rtnl_route_get_flags (rtnlroute) & RTM_F_CLONED)
+			continue;
 		nexthop = rtnl_route_nexthop_n (rtnlroute, 0);
 		if (rtnl_route_nh_get_ifindex (nexthop) != ifindex)
 			continue;
 		nl_object_mark (object);
 		count++;
 	}

*** Bug 1086812 has been marked as a duplicate of this bug. ***

The patch in comment #24 looks fine to me.

looks good to me too

Updated patch; danw pointed out on IRC that the changes applied to both AF_INET and AF_INET6, while we only want to ignore AF_INET6 cloned routes.

diff --git a/src/platform/nm-linux-platform.c b/src/platform/nm-linux-platform.c
index 7ec3da6..9d90c28 100644
--- a/src/platform/nm-linux-platform.c
+++ b/src/platform/nm-linux-platform.c
@@ -3141,14 +3141,16 @@ ip_route_mark_all (NMPlatform *platform, int family, int ifindex)
 			continue;
 		if (rtnl_route_get_protocol (rtnlroute) == RTPROT_KERNEL)
 			continue;
 		if (rtnl_route_get_family (rtnlroute) != family)
 			continue;
 		if (rtnl_route_get_nnexthops (rtnlroute) != 1)
 			continue;
+		if ((family == AF_INET6) && (rtnl_route_get_flags (rtnlroute) & RTM_F_CLONED))
+			continue;
 		nexthop = rtnl_route_nexthop_n (rtnlroute, 0);
 		if (rtnl_route_nh_get_ifindex (nexthop) != ifindex)
 			continue;
 		nl_object_mark (object);
 		count++;
 	}

Updated candidate builds here:
http://people.redhat.com/dcbw/NetworkManager/7.0/

reproducer:
1. configure network in /etc/libvirt/qemu/networks/ipv4_6.xml
<network>
  <name>ipv4_6</name>
  <uuid>3d473d9a-97c3-479c-90b8-bbfb910ce2cf</uuid>
  <forward mode='nat'/>
  <bridge name='virbr2' stp='on' delay='0' />
  <mac address='52:54:00:23:a6:3d'/>
  <domain name='ipv4_6'/>
  <ip address='192.168.199.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.199.128' end='192.168.199.254' />
    </dhcp>
  </ip>
  <ip family='ipv6' address='fc00::1' prefix='64'>
  </ip>
  <route family='ipv6' address='fc00::' prefix='64' gateway='fc00::1'/>
</network>
2. start the network in virt-manager
3. create a new VM via http installation using a tree and in URL options add text
4. select ipv4_6 network under Advanced options
5. Click Finish
6. in anaconda select 2 as text mode
7. select 5 to configure network
8. select 2 to configure eth0
The crash occurs here if unfixed; otherwise the device configuration is shown.

reproduced with:
NetworkManager-0.9.9.1-12.git20140326.4dba720.el7.x86_64.rpm
fixed in:
NetworkManager-0.9.9.1-13.git20140326.4dba720.el7.x86_64
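
The filtering done by the updated patch can be modelled outside of NetworkManager. The following Python sketch is an illustration under stated assumptions: routes are plain dicts (a hypothetical representation, not the libnl objects used in `ip_route_mark_all`), the constants are the values from linux/rtnetlink.h, and the first check in the original loop, which is not visible in the diff context, is omitted.

```python
import socket

RTPROT_KERNEL = 2      # linux/rtnetlink.h
RTPROT_STATIC = 4      # linux/rtnetlink.h
RTM_F_CLONED = 0x200   # linux/rtnetlink.h


def should_mark(route, family, ifindex):
    """Mirror the visible checks of the patched loop for one route dict."""
    if route["protocol"] == RTPROT_KERNEL:
        return False
    if route["family"] != family:
        return False
    if len(route["nexthops"]) != 1:
        return False
    # The fix: skip kernel "cache" routes, but only for IPv6
    if family == socket.AF_INET6 and route["flags"] & RTM_F_CLONED:
        return False
    if route["nexthops"][0]["ifindex"] != ifindex:
        return False
    return True


def make_route(family, protocol, flags, ifindex=2):
    """Hypothetical helper building a single-nexthop route dict."""
    return {"family": family, "protocol": protocol, "flags": flags,
            "nexthops": [{"ifindex": ifindex}]}


cloned_v6 = make_route(socket.AF_INET6, RTPROT_STATIC, RTM_F_CLONED)
static_v6 = make_route(socket.AF_INET6, RTPROT_STATIC, 0)
cloned_v4 = make_route(socket.AF_INET, RTPROT_STATIC, RTM_F_CLONED)

print(should_mark(cloned_v6, socket.AF_INET6, 2))  # False: cloned IPv6 route is ignored
print(should_mark(static_v6, socket.AF_INET6, 2))  # True: ordinary static route is kept
print(should_mark(cloned_v4, socket.AF_INET, 2))   # True: IPv4 handling is unchanged
```

The third case is the reason for the second revision of the patch: the flag check is restricted to AF_INET6 so that IPv4 route handling stays as it was.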
(In reply to Dan Williams from comment #30)
> Updated patch; danw pointed out on IRC that the changes applied to both
> AF_INET and AF_INET6, while we only want to ignore AF_INET6 cloned routes.

LGTM, ACK

*** Bug 1086237 has been marked as a duplicate of this bug. ***

*** Bug 1102101 has been marked as a duplicate of this bug. ***

*** Bug 1102112 has been marked as a duplicate of this bug. ***

This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Created attachment 881383 [details]
the traceback

If you try to configure, in text mode, a network device that has been activated in initramfs using dhcp, the attached traceback is hit.

Snapshot 13