Bug 873468

Summary: F18 Beta TC7: sometimes when booting netinst, hub comes up showing 'Nothing selected' for Installation Source and Software Selection
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: anacondaAssignee: Radek Vykydal <rvykydal>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 18CC: akostadi, g.kaviyarasu, jonathan, mfabian, nonamedotc, pschindl, robatino, rvykydal, vanmeeuwen+fedora
Target Milestone: ---Keywords: CommonBugs, Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AcceptedNTH RejectedBlocker https://fedoraproject.org/wiki/Common_F18_bugs#install-source-nothing
Fixed In Version: anaconda-18.32-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-02-05 17:51:51 EST Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Bug Depends On:    
Bug Blocks: 752665    
Attachments:
Description Flags
pre-hub network connection screen showing connected.
none
anaconda.log
none
program.log
none
syslog
none
X.log
none
ifcfg.log
none
packaging.log
none
anaconda-yum.conf
none
anaconda.log from 18.29 fail
none
ifcfg.log from 18.29 fail
none
packaging.log from 18.29 fail
none
program.log from 18.29 fail
none
storage.log from 18.29 fail
none
X.log from 18.29 fail
none
a patch for issues from comment #24
none
syslog from fail with 18.29 (TC9)
none
Output from 'journalctl -a'
none
anaconda.log
none
ifcfg.log
none
program.log none

Description Adam Williamson 2012-11-05 18:16:50 EST
There seems to be some kind of race condition with network start and remote repo connection in F18 Beta TC7.

Three people - me, nonamedotc and smr54 at least - have seen this. When booting netinst with a wired, DHCP-able connection, sometimes when you first land on the hub, Installation Source and Software Selection both show up as 'Nothing selected' with the warning triangle. They ought to show 'Closest mirror' and 'GNOME Desktop'. Sometimes, they do (it works correctly) - suggesting that this is a race of some kind.

If you hit the bug, it's annoying to work around: you have to go into Installation Source and change it somehow - just changing it to 'http://foo' is enough - and hit Done, wait for the hub to process, then go back into Installation Source and change back to 'Closest mirror' and hit Done again. At that point, things work normally.

For me this happens only fairly rarely - one in five tries so far. I think smr54 said he saw it 2 times in 6 tries or something like that.
Comment 1 Adam Williamson 2012-11-05 18:18:40 EST
I guessed this might  be somehow related to the fix for 871129. http://jkeating.fedorapeople.org/updates.img has that fix reverted, so it'd be interesting for someone who can reproduce this more than I can to try some boots with that updates.img and see if the problem still occurs. Of course, the updates.img loading process itself may affect the bug :(
Comment 2 Adam Williamson 2012-11-05 18:58:35 EST
Tom Lane reckons this may be related to https://bugzilla.redhat.com/show_bug.cgi?id=870570 ...I'm not so sure. That bug is why *working around* this problem is a pain, but I'm not sure it causes this bug to happen in the first place.
Comment 3 Mukundan Ragavan 2012-11-05 22:02:05 EST
Log files for this bug pulled from /tmp are being attached. One screenshot is also attached which shows the pre-hub network connection screen although it shows that the network is already connected.

Additional details:

Wired Connection, DHCP, anaconda 18.24 on TC7 image written to a USB drive using dd.

No additional parameters at boot time.
Comment 4 Mukundan Ragavan 2012-11-05 22:02:57 EST
Created attachment 639062 [details]
pre-hub network connection screen showing connected.
Comment 5 Mukundan Ragavan 2012-11-05 22:03:53 EST
Created attachment 639063 [details]
anaconda.log
Comment 6 Mukundan Ragavan 2012-11-05 22:04:24 EST
Created attachment 639064 [details]
program.log
Comment 7 Mukundan Ragavan 2012-11-05 22:04:52 EST
Created attachment 639065 [details]
syslog
Comment 8 Mukundan Ragavan 2012-11-05 22:05:37 EST
Created attachment 639066 [details]
X.log
Comment 9 Mukundan Ragavan 2012-11-05 22:06:02 EST
Created attachment 639067 [details]
ifcfg.log
Comment 10 Mukundan Ragavan 2012-11-05 22:07:20 EST
Created attachment 639068 [details]
packaging.log
Comment 11 Mukundan Ragavan 2012-11-05 22:09:37 EST
Created attachment 639069 [details]
anaconda-yum.conf
Comment 12 Radek Vykydal 2012-11-06 11:07:28 EST
This is a race caused by slower DHCP and our poorer event logic.

DHCP default auto connection is run by NM.

1) connection is checked to decide if pre-hub network spoke should be used
2) DHCP may succeed now (e.g. while in language pre-hub spoke, or while initializing network pre-hub spoke)
3) pre-hub network spoke is displayed (may be in "Connected" as in comment #4 because DHCP succeeded in 2))
4) connection is checked
5) DHCP may succeed now (or a connection can be activated now in pre-hub network spoke by user)
6) Source spoke is updated if now there is connection and there was not connection in 4)

There are at least two issues rising from this logic:
A) pre-hub network spoke can be displayed in "Connected" state (see 3) above)
B) Source spoke is not updated because in 4) there was a connection. The workaround described in 3rd paragraph of description is needed.

A) is something that I'd leave to a separate bug. We'd need to check connection when the spoke is actually displayed (not just when all spokes are gathered) and skip the spoke in this case. This might also require better logic or connected event handling for Source spoke refresh.
B) can be fixed in some cases by moving the check from 4) (very close) to 1). 

Sending a patch for B) to anaconda-patches.
Comment 13 Adam Williamson 2012-11-07 20:21:15 EST
For a), I actually filed a bug on that (well, what's probably that) already:

https://bugzilla.redhat.com/show_bug.cgi?id=874279

I doubt this changes your diagnosis, but I just noticed, when this bug happens, in packaging.log, I see three messages of the form:

'failed to grab repo metadata for (repo): Cannot retrieve metalink for repository: (repo)/18/x86_64. Please verify its path and try again'

for fedora, updates, and updates-testing.
Comment 14 Fedora Update System 2012-11-07 22:34:13 EST
anaconda-18.27-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.27-1.fc18
Comment 15 Fedora Update System 2012-11-08 22:21:28 EST
Package anaconda-18.27-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-18.27-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-17823/anaconda-18.27-1.fc18
then log in and leave karma (feedback).
Comment 16 Fedora Update System 2012-11-10 14:36:26 EST
Package anaconda-18.28-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-18.28-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-17823/anaconda-18.28-1.fc18
then log in and leave karma (feedback).
Comment 17 Adam Williamson 2012-11-15 15:32:10 EST
I just hit this bug, somehow, with 18.29. I think it may have something to do with sitting at the language screen for a long time, as I've seen it happen in that case before. This was with an image built for me by tflink with 18.29 - http://dl.fedoraproject.org/pub/alt/qa/18/20121115_f18b-systemd.noudevkill.x86_64.boot.iso .
Comment 18 Adam Williamson 2012-11-15 15:36:16 EST
Created attachment 645935 [details]
anaconda.log from 18.29 fail
Comment 19 Adam Williamson 2012-11-15 15:36:43 EST
Created attachment 645938 [details]
ifcfg.log from 18.29 fail
Comment 20 Adam Williamson 2012-11-15 15:37:11 EST
Created attachment 645944 [details]
packaging.log from 18.29 fail
Comment 21 Adam Williamson 2012-11-15 15:37:41 EST
Created attachment 645950 [details]
program.log from 18.29 fail
Comment 22 Adam Williamson 2012-11-15 15:38:05 EST
Created attachment 645952 [details]
storage.log from 18.29 fail
Comment 23 Adam Williamson 2012-11-15 15:38:31 EST
Created attachment 645953 [details]
X.log from 18.29 fail
Comment 24 Radek Vykydal 2012-11-16 10:50:21 EST
Adam, would you remember if you went through pre-hub network spoke?

In TC9, with (1) pre-hub network spoke displayed *and* (2) waiting for connection to succed before continuing, it works for me.

Also I'd like to see syslog or anaconda.syslog byt I suppose it is gone.

ad (2): Leaving the spoke without waiting for connection really leads to Nothing selected, but this belongs to a wider issue of update of source spoke after connecting to network in hub. I'd prefer not to do this automatically. It's too much of guessing (i.e. you might want to connect because of ntp server, network storage, ...)

ad (1): If the spoke was *not* displayed you are probably hitting another race: dhcp is too slow to be available when payload (source) is fetched in payloadInitialize run from anaconda script before UI, but it is fast enough to succeed before checking network status for the pre-hub spoke - so it wouldn't be displayed (as in 1) from comment #12).

This doesn't seem easy, CCing Chris, he might have ideas to solve the issue in more generic way. I'll attach a patch for one approach I was poking around.

[1]
commit 605f7999294267e1d7c7dbfd90bf5ba4007ecf12
commit 2d39ad12dbf4e388da5576bf61ed2e00634e9166
Comment 25 Radek Vykydal 2012-11-16 10:54:56 EST
Created attachment 646430 [details]
a patch for issues from comment #24
Comment 26 Radek Vykydal 2012-11-16 10:59:00 EST
(In reply to comment #25)

> Created attachment 646430 [details]
> a patch for issues from comment #24

updates image with the patch for TC9 (anaconda 18.29)
http://rvykydal.fedorapeople.org/updates.payloadrace.img
Comment 27 Adam Williamson 2012-11-16 15:40:46 EST
radek: the last few times I've reproduced this, no, I did not go through pre-hub network spoke. it's been happening when I boot an image to test something and then switch context for a bit, so it sits at the language selection screen for a while. I hit it like five times in a row yesterday, so I may well be able to reproduce it today. if I do hit it I'll grab the syslog.
Comment 28 Adam Williamson 2012-11-16 18:12:21 EST
Created attachment 646651 [details]
syslog from fail with 18.29 (TC9)

Hit the bug first boot of TC9 on my RAID test config today. Here's syslog. I'll try it with the updates.img a few times later and see how it goes.
Comment 29 Radek Vykydal 2012-11-19 09:29:40 EST
Adam thanks for your update and log. If you hit the bug again, please attach both anaconda.log and syslog from the installation so that I can compare timestamps to see if it is really the race condition I am suspecting. I am still unable to hit the contidion on my side (except for hacking anaconda with a sleep).
Comment 30 Adam Williamson 2012-11-19 15:17:40 EST
Will do, sorry, didn't realize you needed both from a single install.
Comment 31 Adam Williamson 2012-11-23 01:25:10 EST
Sorry, forgot to attach logs for this, but I've still been seeing it. I'll try and remember the logs next time.
Comment 32 Radek Vykydal 2012-11-27 08:31:20 EST
Should be fixed in anaconda-18.32-1 (post Beta).
Comment 33 Fedora Update System 2012-11-28 20:21:13 EST
anaconda-18.32-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.32-1.fc18
Comment 34 Fedora Update System 2012-11-30 01:39:05 EST
Package anaconda-18.32-1.fc18, pykickstart-1.99.22-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-18.32-1.fc18 pykickstart-1.99.22-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-19380/pykickstart-1.99.22-1.fc18,anaconda-18.32-1.fc18
then log in and leave karma (feedback).
Comment 35 Fedora Update System 2012-12-06 02:26:25 EST
pykickstart-1.99.22-1.fc18, anaconda-18.35-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 36 Petr Schindler 2012-12-20 09:31:36 EST
I have reproduced this bug with anaconda 18.37.5. It happens on every try. Because the workaround isn't easy to find (for general user this is show stopper) I propose this bug as blocker.
Comment 37 Petr Schindler 2012-12-20 09:32:22 EST
Created attachment 666694 [details]
Output from 'journalctl -a'
Comment 38 Petr Schindler 2012-12-20 09:32:54 EST
Created attachment 666695 [details]
anaconda.log
Comment 39 Petr Schindler 2012-12-20 09:33:22 EST
Created attachment 666696 [details]
ifcfg.log
Comment 40 Petr Schindler 2012-12-20 09:33:54 EST
Created attachment 666697 [details]
program.log
Comment 41 Adam Williamson 2012-12-20 20:22:28 EST
I have not seen this at all in recent tries...
Comment 42 Radek Vykydal 2012-12-21 07:41:09 EST
This concerns waiting (or not doing so in this case) for slow dhcp. I am suspecting that fix bug 830434 is causing that NM considers connection waiting for dhcp as connected (via ipv6).
This updates image contains patch (for 18.37-5 but should work with later) that could confirm my assessment:
http://rvykydal.fedorapeople.org/updates.ipv6_873468.img
Comment 43 Adam Williamson 2012-12-21 13:56:42 EST
Discussed at 2012-12-21 blocker review meeting: http://meetbot.fedoraproject.org/fedora-bugzappers/2012-12-21/f18final-blocker-review-7.2012-12-21-18.33.log.txt . We think this case isn't likely to be very common, so for now, rejected as blocker but accepted as NTH: if further testing shows this happening frequently, we can re-consider.
Comment 44 Mukundan Ragavan 2013-01-03 17:26:52 EST
I have not seen this in recent installation attempts using TC3. I think this could be considered fixed (?).
Comment 45 Adam Williamson 2013-01-03 23:58:18 EST
yeah, I haven't seen it either. Petr?
Comment 46 Petr Schindler 2013-01-10 06:01:54 EST
I can still reproduce it with 18.37.11. I run netinst written with dd on usb on UEFI.
Comment 47 Aleksandar Kostadinov 2013-01-14 03:47:26 EST
this perhaps related to Bug 888946 ?
Comment 48 Fedora End Of Life 2013-12-21 10:10:59 EST
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 49 Fedora End Of Life 2014-02-05 17:51:51 EST
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.