Bug 1292613 - Race condition between NetworkManager and anaconda on IPv6-only hosts
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: anaconda
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Jiri Konecny
QA Contact: Release Test Team
Depends On:
Reported: 2015-12-17 22:39 UTC by Bryan Wann
Modified: 2019-12-25 02:13 UTC (History)

Fixed In Version: anaconda-
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2016-11-03 23:20:56 UTC
Target Upstream Version:
bwann: needinfo-

Attachments
Anaconda .treeinfo retry patch (2.79 KB, patch)
2015-12-17 22:39 UTC, Bryan Wann
Log output after patch applied (16.29 KB, text/plain)
2015-12-17 22:40 UTC, Bryan Wann

System: Red Hat Product Errata
ID: RHEA-2016:2158
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: anaconda bug fix and enhancement update
Last Updated: 2016-11-03 13:13:55 UTC

Description Bryan Wann 2015-12-17 22:39:27 UTC
Created attachment 1106893 [details]
Anaconda .treeinfo retry patch

Description of problem:
During a non-interactive kickstart installation on an IPv6-only host, package source and software selection setup fails in Anaconda due to a race condition with NetworkManager, which is still bringing up the NIC and learning a v6 gateway via router solicitation.

In our network we rely on learning the v6 default gateway from layer 3 rack switches via ICMPv6 router solicitation/router advertisements. For installing hosts we assign a static IPv6 address (for DNS mapping) and disable SLAAC.

Anaconda starts, which then starts NetworkManager. Anaconda immediately tries to fetch treeinfo from the defined repos. However, NetworkManager has not yet finished bringing the NIC up to a CONNECTED_GLOBAL state. Thus anaconda's treeinfo download fails with a "Network is unreachable" error, and anaconda removes the repo(s) from consideration.

Anaconda fails 1-2 seconds before NetworkManager has finished and installed a v6 default gateway.

Version-Release number of selected component (if applicable):
anaconda 19.31.123-1
NetworkManager 1.0.0-14.git20150121.b4ea599c.el7

How reproducible:
Very reproducible; affects virtually all v6-only kickstart installations

Steps to Reproduce:
1. Build a kickstart configuration with v6-only base/repo urls, and v6-only network configuration:
  url --url http://[2401:db00:11:df:face:b00c:0:134]/yum/centos/7.x/os/x86_64
  network --noipv4 --hostname=aux1.prn1.facebook.com --bootproto=static --ipv6=2401:db00:19:5a:face:0:31:0 --device=eth0 --nameserver=2401:db00:f0:a53::,2401:db00:f0:b53::

2. Kickstart the host on an IPv6-only network (e.g. via iPXE or UEFI), specifying options for a static v6 address, disabling SLAAC, and specifying no gateway on the kernel/dracut command-line:

  ip=[2401:db00:11:815a:face:0:31:0]:::64:::none noipv4

3. Watch /tmp/packaging.log vs /tmp/syslog for NetworkManager progress on the host being installed
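
For reference, the ip= argument from step 2 can be split into its colon-separated fields to confirm that the gateway field really is empty. This is only a minimal sketch in Python, assuming the dracut field order client-IP, peer, gateway, netmask, hostname, interface, autoconf; the names parse_dracut_ip and FIELDS are hypothetical and not part of dracut or anaconda.

```python
# Hypothetical helper: split a dracut-style ip= argument into named fields.
# Assumed field order: client-IP:peer:gateway:netmask:hostname:interface:autoconf
FIELDS = ("client_ip", "peer", "gateway", "netmask",
          "hostname", "interface", "autoconf")


def parse_dracut_ip(arg):
    """Return a dict of fields; bracketed IPv6 literals are kept whole."""
    value = arg[len("ip="):] if arg.startswith("ip=") else arg
    parts, current, depth = [], [], 0
    for ch in value:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == ":" and depth == 0:
            # A colon outside brackets ends the current field.
            parts.append("".join(current))
            current = []
            continue
        current.append(ch)
    parts.append("".join(current))
    # Strip the brackets around IPv6 literals and pad any missing fields.
    parts = [p.strip("[]") for p in parts]
    parts += [""] * (len(FIELDS) - len(parts))
    return dict(zip(FIELDS, parts))


parsed = parse_dracut_ip("ip=[2401:db00:11:815a:face:0:31:0]:::64:::none")
print(parsed)
```

With the value from step 2, the gateway field comes back empty and autoconf is "none", so the default route can only arrive later via router advertisements.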

Actual results:
Anaconda starts, fails to fetch repo tree data, considers the repos unusable. On console this results in:

3) [!] Software selection (Installation source not set up)
4) [!] Installation source (Error setting up software source)

Expected results:
No software selection/source errors, anaconda finishes the installation.

Additional info:
I've been able to fix this problem by adding a retry mechanism to the .treeinfo download function in pyanaconda/packaging/__init__.py.  This is attached as fb-anaconda-package-treeinfo.patch.
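
The general shape of such a retry can be sketched as follows. This is not the attached patch or anaconda's actual code: fetch_with_retry, its parameters, and the retry count and delay are all assumptions chosen for illustration, and the download itself is passed in as a callable.

```python
import time


def fetch_with_retry(fetch, retries=5, delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the retry budget is used up.

    fetch   -- hypothetical callable that downloads .treeinfo and returns it
    retries -- total number of attempts (assumed value)
    delay   -- seconds to wait between attempts (assumed value)
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except OSError as err:
            # "Network is unreachable" surfaces as an OSError;
            # urllib.error.URLError is also an OSError subclass.
            last_error = err
            if attempt < retries:
                sleep(delay)
    raise last_error
```

A few attempts one second apart would cover the 1-2 second window described above; only after the last attempt fails would the repo be treated as unusable.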

There's an upstream Anaconda patch that retries package repo metadata downloads. I basically did exactly this in my treeinfo fix:

Logs of failures:

Logs of success after workaround attached as fixed-anaconda-packaging.log


Comment 1 Bryan Wann 2015-12-17 22:40:10 UTC
Created attachment 1106894 [details]
Log output after patch applied

Comment 3 Bryan Wann 2015-12-17 23:20:12 UTC
Issue originally reported on Anaconda's github page, but my workaround there was incorrect:


Comment 4 Martin Banas 2016-03-01 08:23:43 UTC
Hi Bryan,
would you be able to help with testing of this bug once the fix is available?


Comment 5 Bryan Wann 2016-03-01 20:32:28 UTC
Sure thing

Comment 6 Jiri Konecny 2016-03-17 13:53:43 UTC
Hello Bryan,

from your first comment I understand that your issue should be fixed now by the commit you have mentioned.


If I am correct, could you please test your issue in RHEL 7.2, where this patch should be included?

Thank you

Comment 7 Bryan Wann 2016-03-17 18:59:44 UTC
No, it's not the same thing. The code that's already in Anaconda handles retries for package repo metadata. My issue happens earlier when Anaconda is fetching .treeinfo since it's the first download operation that happens during the install. My fix replicated the same code from that commit and applied it to packaging/__init__.py so we retry fetching there.

This gives us enough time for v6 to have gone through things like duplicate address detection, RS/RA and have a usable gateway. This all could take 1-4 seconds to complete. Otherwise we will likely fail downloading .treeinfo and mark the repo as unusable.

I was looking at the NetworkManager code in Anaconda yesterday. It looks like the root cause is that we wait for NM to signal any sort of 'connected' state (i.e. local, site, or global) before allowing Anaconda to continue. This seems kind of broken because if we have to go outside our local network for package repos etc. but proceed on a connected_local state, we could fail to reach them. (The FIXME comment in the code alludes to this.)
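
The difference between the two checks can be sketched like this. It is an illustration only, not anaconda's code: the numeric values are the NMState constants from NetworkManager's D-Bus API, while is_connected_any, wait_for_global, get_state, and the timeout values are hypothetical names and assumptions.

```python
import time

# NMState values from NetworkManager's D-Bus API.
NM_STATE_CONNECTED_LOCAL = 50
NM_STATE_CONNECTED_SITE = 60
NM_STATE_CONNECTED_GLOBAL = 70


def is_connected_any(state):
    """The permissive check: any 'connected' state counts."""
    return state in (NM_STATE_CONNECTED_LOCAL,
                     NM_STATE_CONNECTED_SITE,
                     NM_STATE_CONNECTED_GLOBAL)


def wait_for_global(get_state, timeout=10.0, interval=0.5, sleep=time.sleep):
    """Poll get_state() until it reports CONNECTED_GLOBAL or timeout expires.

    get_state is a hypothetical callable returning the current NMState.
    """
    waited = 0.0
    while True:
        if get_state() == NM_STATE_CONNECTED_GLOBAL:
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

A host in state 50 (connected_local) passes is_connected_any but would still be held back by wait_for_global, which is the gap described above.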

NM code where this happens:

Unfortunately a more rigorous fix for this seems pretty thorny: we'd have to figure out which repos/resources used during installation are remote. Perhaps this retry mechanism for .treeinfo is the best compromise.

Comment 8 Jiri Konecny 2016-03-18 09:00:46 UTC
Sorry for misunderstanding your issue, and thank you for your patch and the explanation.

I'll look into this soon.

Comment 9 Jiri Konecny 2016-03-22 10:00:43 UTC
PR: https://github.com/rhinstaller/anaconda/pull/561

I've created a patch based on your patch, Bryan. Thank you for your work on it.

The final solution (NetworkManager state) seems too invasive for RHEL to me, but I'm going to create that fix for the master branch later.

Comment 10 Mike McCune 2016-03-28 22:46:32 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.

Comment 12 Martin Banas 2016-08-31 11:22:48 UTC
Hi Bryan,
this issue should be fixed in the RHEL-7.3 Beta compose. Could you please retest and confirm that the issue is fixed for you?


Comment 13 Martin Banas 2016-09-21 07:55:48 UTC
Any update?


Comment 16 errata-xmlrpc 2016-11-03 23:20:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

