Bug 1578668
| Summary: | [NMCI] nmcli_general_correct_profile_activated_after_restart test failure | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Vladimir Benes <vbenes> | ||||||
| Component: | NetworkManager | Assignee: | Beniamino Galvani <bgalvani> | ||||||
| Status: | CLOSED NEXTRELEASE | QA Contact: | Desktop QE <desktop-qa-list> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 7.5 | CC: | aokuliar, atragler, bgalvani, fgiudici, lrintel, rkhan, sukulkar, thaller, vbenes | ||||||
| Target Milestone: | rc | Keywords: | Regression | ||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2018-09-17 12:14:34 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
|
Description
Vladimir Benes
2018-05-16 06:32:10 UTC
Created attachment 1441448 [details]
NM logfile from test failure (colorized, watch with `less -R`)
It seems, after restart of NM, assuming the connection on device "testG" fails.
There are several issues.
Issue 1.):
# start assuming connection
[1527230126.9200] manager: (testG): assume: will attempt to assume matching connection 'con_general' (2f22bc2f-ef70-4b5e-9f77-14bf5678d1ea) (indicated)
[1527230127.1760] device (testG): state change: config -> ip-config (reason 'none', sys-iface-state: 'assume')
# start slaac/DHCPv6
[1527230128.5580] ndisc-lndp[0x5569a2ead200,"testG"]: received router advertisement at 3
[1527230128.5581] ndisc[0x5569a2ead200,"testG"]: dhcp-level managed
[1527230128.6013] dhcp6 (testG): dhclient started with pid 2072
# DHCPv6 fails
[1527230128.6169] dhcp6 (testG): client pid 2072 exited with status 1
[1527230128.6170] device[0x5569a2e727d0] (testG): DHCPv6 failed
# at this point, no more DHCPv6 is tried, until we eventually fail with:
[1527230608.9838] device (testG): DHCPv6: grace period expired
[1527230608.9839] device (testG): state change: ip-config -> failed (reason 'ip-config-expired', sys-iface-state: 'assume')
Bug: having the grace period, only if DHCP is still running.
Issue 2.):
in the logfile, we receive very often router advertisements, for example:
[1527230598.5162] ndisc-lndp[0x5569a2ead200,"testG"]: received router advertisement at 473
whenever that happens, we remove and add IPv6 addresses
[1527230598.5164] platform: address: deleting IPv6 address fe80::780c:1459:5c81:4527/64, ifindex 38 dev testG
this seems wrong, because repeated RAs should not result in any changes (unless, of course, a router goes away or appears or change config, which does not seem the case here). Needs investigation.
Issue 3.)
Figure out why DHCPv6 fails in the first place at
[1527230128.6169] dhcp6 (testG): client pid 2072 exited with status 1
> Issue 3.) > > Figure out why DHCPv6 fails in the first place at > > [1527230128.6169] dhcp6 (testG): client pid 2072 exited with status 1 It seems the reason is that a conflicting dhclient instance is running. But I couldn't figure out why that is happening, because we should kill dhclient via the PID file... I think the problem is: [1527665185.7301] device (testG): ---- IPv6 config merge and apply [1527665185.7302] ---- composite after merging AC [1527665185.7302] a: 2620:dead:beaf:0:34fa:fb:a0a6:730d/64 lft 1800sec pref 1800sec lifetime 6-6[1800,1800] dev 35 flags noprefixroute src ndisc [1527665185.7303] ---- composite after merging EXT [1527665185.7303] a: 2620:dead:beaf:0:34fa:fb:a0a6:730d/64 lft 1800sec pref 1800sec lifetime 6-6[1800,1800] dev 35 flags noprefixroute src ndisc [1527665185.7303] a: fe80::3bf3:e4ef:bbd:3b43/64 lft forever pref forever lifetime 6-0[4294967295,4294967295] dev 35 flags permanent,noprefixroute src kernel [1527665185.7303] a: 2620:dead:beaf::18b/128 lft 23sec pref 23sec lifetime 6-4[25,25] dev 35 flags noprefixroute,tentative src kernel [1527665185.7303] ---- composite after merging CON [1527665185.7304] a: 2620:dead:beaf:0:34fa:fb:a0a6:730d/64 lft 1800sec pref 1800sec lifetime 6-6[1800,1800] dev 35 flags noprefixroute src ndisc [1527665185.7304] a: fe80::3bf3:e4ef:bbd:3b43/64 lft forever pref forever lifetime 6-0[4294967295,4294967295] dev 35 flags permanent,noprefixroute src kernel [1527665185.7304] a: 2620:dead:beaf::18b/128 lft 23sec pref 23sec lifetime 6-4[25,25] dev 35 flags noprefixroute,tentative src kernel [1527665185.7304] device[0x23c74d0] (testG): ip6-config: update (commit=1, new-config=0x23d3e50) [1527665185.7304] platform: ---- ip6 address sync, full 0 ---- [1527665185.7304] platform: (known) 2620:dead:beaf:0:34fa:fb:a0a6:730d/64 lft 1800sec pref 1800sec lifetime 6-6[1800,1800] dev 35 flags noprefixroute src ndisc [1527665185.7304] platform: (known) fe80::3bf3:e4ef:bbd:3b43/64 lft forever pref forever lifetime 6-0[4294967295,4294967295] dev 35 flags permanent,noprefixroute src kernel [1527665185.7305] platform: (known) 2620:dead:beaf::18b/128 lft 23sec pref 23sec lifetime 6-4[25,25] dev 35 flags noprefixroute,tentative src kernel [1527665185.7305] platform: - [1527665185.7305] platform: (plat) 2620:dead:beaf:0:34fa:fb:a0a6:730d/64 lft 1766sec pref 1766sec lifetime 6-4[1768,1768] dev 35 flags noprefixroute src kernel [1527665185.7305] platform: (plat) 2620:dead:beaf::18b/128 lft 23sec pref 23sec lifetime 6-4[25,25] dev 35 flags noprefixroute,tentative src kernel [1527665185.7305] platform: (plat) fe80::3bf3:e4ef:bbd:3b43/64 lft forever pref forever lifetime 6-0[4294967295,4294967295] dev 35 flags permanent,noprefixroute src kernel [1527665185.7305] platform: ### deleting address because doesn't match: [1527665185.7306] platform: ### plat : fe80::3bf3:e4ef:bbd:3b43/64 lft forever pref forever lifetime 6-0[4294967295,4294967295] dev 35 flags permanent,noprefixroute src kernel [1527665185.7306] platform: ### known: 2620:dead:beaf:0:34fa:fb:a0a6:730d/64 lft 1800sec pref 1800sec lifetime 6-6[1800,1800] dev 35 flags noprefixroute src ndisc After restart the IPv6 LL address is added to the composite config we want to apply in the wrong order because it is added as part of the captured external IPv6 config and so at each commit NM removes and adds again addresses. Therefore, the link-local address stays in tentative state, which causes problems with DHCP and NDP. Created attachment 1445908 [details]
[PATCH] platform: sort known IPv6 addresses by scope before a sync
+ NMPlatformIP6Address *x = NMP_OBJECT_CAST_IP6_ADDRESS (*(void **) a); "const NMPlatformIP6Address *x" (In reply to Thomas Haller from comment #8) > + NMPlatformIP6Address *x = NMP_OBJECT_CAST_IP6_ADDRESS (*(void **) a); > > "const NMPlatformIP6Address *x" otherwise, patch from comment 7 lgtm. I assuem, this fixes issue #2 (from comment 4). Can you confirm, that the tentative IPv6 addresss caused dhclient to exit immediately? (issue #3 from comment 4) It seems, issue #1 still needs fixing (although, it doesn't happen now anymore). (In reply to Thomas Haller from comment #9) > (In reply to Thomas Haller from comment #8) > > + NMPlatformIP6Address *x = NMP_OBJECT_CAST_IP6_ADDRESS (*(void **) a); > > > > "const NMPlatformIP6Address *x" Ok. > otherwise, patch from comment 7 lgtm. > > I assuem, this fixes issue #2 (from comment 4). Yes. > Can you confirm, that the tentative IPv6 addresss caused dhclient to exit > immediately? (issue #3 from comment 4) Yes: root@fedora ~ # echo 40 > /proc/sys/net/ipv6/conf/ens4/dad_transmits root@fedora ~ # ip l set ens4 down root@fedora ~ # ip l set ens4 up root@fedora ~ # ip a show dev ens4 3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000 link/ether 52:54:00:a2:8a:9a brd ff:ff:ff:ff:ff:ff inet6 fe80::5054:ff:fea2:8a9a/64 scope link tentative valid_lft forever preferred_lft forever root@fedora ~ # dhclient -v -6 ens4 Internet Systems Consortium DHCP Client 4.3.6 Copyright 2004-2017 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Can't bind to dhcp address: Cannot assign requested address ... exiting. Applied the patch to master branch: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=851d89dc6ab59cd19de03fc842893fabe8c33036 > It seems, issue #1 still needs fixing (although, it doesn't happen now > anymore). Correct, I'll investigate this other issue in the next days. *** Bug 1592830 has been marked as a duplicate of this bug. *** Hi Vladimir, the test is now passing on all branches. Can this be closed now? I'll file a separate bug for the issue #1 suggested by Thomas. Filed bug 1625901 for issue #1. |