Red Hat Bugzilla – Bug 679486
Liveinst doesn't start if hostname changes
Last modified: 2013-09-13 08:15:06 EDT
+++ This bug was initially created as a clone of Bug #663294 +++
--- Additional comment from email@example.com on 2011-02-22 09:35:04 EST ---
> Can't confirm fix with Alpha RC1 KDE (anaconda-15.20.1-1.fc15.x86_64).
> On a freshly booted system:
> $ liveinst
> No protocol specified
> xhost: unable to open display ":0"
> 14:28:44 Starting graphical installation.
> No protocol specified
> 14:28:44 Exception starting GUI installer: could not open display
> 14:28:44 GUI installer startup failed, falling back to text mode.
> So the patch actually made things worse :)
> No idea why liveinst fails to perform the xhost command though, as I can run it manually:
> $ xhost
> access control enabled, only authorized clients can connect
> $ xhost +
> access control disabled, clients can connect from any host
> I haven't tried any other Live iso yet.
--- Additional comment from firstname.lastname@example.org on 2011-02-22 10:30:56 EST ---
Confirmed Sandro's behavior on the KDE RC1 live images. On 2 of 6 boot attempts, running liveinst starts the graphical installer. The 4 other attempts, running liveinst fails to start the graphical installer, and launches the text-mode installer. If you are clicking on the desktop icon to start liveinst, you'll never see the text-mode installer.
Running the following before starting liveinst works:
# xhost +
In the 4 failure attempts, it seems that the XAUTHORITY ENV variable isn't being passed to the root user. I think this means the root user won't be able to connect to the liveinst X session. In the 2 successful attempts, when I `su` to root, I see an XAUTHORITY ENV variable.
So far, I've not managed to reproduce this failure on the GNOME live image. If this doesn't impact the GNOME image, we may want to track this as a different bug report, and leave this issue in VERIFIED (as the original problem has been addressed).
Discussed out-of-meeting-cycle with jlaska. Our call for now is acceptedblocker, on the basis of our historical practice of taking KDE bugs as blockers. If we take KDE bugs as blockers, this obviously hits the criterion "The installer must boot (if appropriate) and run on all primary architectures from default live image, DVD, and boot.iso install media".
I just retested and discovered that 10 of 10 KDE live image boots were problem free. There must be something related to timing, or network connectivity. Either way, I can't reproduce it at all today.
Unless someone else is seeing this problem on a regular basis ... it's time to CommonBugs this issue, imo.
I'm removing F15Alpha and whiteboard:AcceptedBlocker from this issue. Both bcl and I have tested the heck out of this issue. I definitely did encounter this failure yesterday, but have yet to hit it again today. If this issue is this hard to reproduce, I don't think that's enough to qualify as a blocker for the Alpha release. Should we know more about this issue later on, it's fairly straightforward to document the workaround, and we'll have updated nightly live images to fall back to as well.
Removing the blocker because there's a workaround (xhost +) is fine with me...but closing notabug without even asking the original reporter (me) because he's afk for a mere 5-6h is not.
I've reproduced this a couple of times and I'm pretty sure I can do so again. I also think it's a race condition (and I said so in IRC before) and it's well-known those can be very tricky to reproduce and debug. I'll try to provide more of what I see soon.
I got this bug with KDE 20110223 LiveCD image yesterday.
Hardware: Acer 6935G notebook
Smolt profile: http://www.smolts.org/client/show/pub_55b08fdf-dd18-48d6-90b6-577bcb8da55e
Was able to boot with x86-64 version of GNOME LiveCD 20110224 on same hardware (Acer 6935G).
But Install to HDD also doesn't work - similar to KDE i686 LiveCD 20110223.
The root cause and the reason why this is mostly seen on the kde but not on the gnome spin is as easy as: kdm is quicker to start than gdm - or rather the part that does the xauth stuff is quicker in kdm than in gdm.
Therefore, xauth will use localhost.localdomain with kdm while the dhcp hostname is used with gdm. Later on, liveinst will use the dhcp hostname in either case but once (kdm) failing because of the wrongish xauth entry and once (gdm) successful because the entry is correct.
That said, there are certain additional points:
- On a fast system, gdm might be 'too quick' as well
- On a slow system, kdm might be 'slow enough' as well
- If there's no dhcp hostname to set, it will work no matter how slow or quick kdm/gdm is
In case the difference has not been clear, here are the two boot sequences, first the working one, then the failing one.
1) hostname = localhost.localdomain
2) hostname = dhcp hostname
3) kdm/gdm is started: xauth uses dhcp hostname
4) liveinst is started on dhcp hostname and allowed by xauth
1) hostname = localhost.localdomain
2) kdm/gdm is started: xauth uses localhost.localdomain
3) hostname = dhcp hostname
4) liveinst is started on dhcp hostname and denied by xauth
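The two sequences above can be replayed in a toy model. This is only a sketch of the ordering problem, not real xauth or display manager code: the event names, the `boot` function, and the example DHCP hostname are all made up for illustration, and the model assumes the cookie is simply keyed on whatever hostname is set when the display manager starts.

```python
# Toy model of the ordering race (hypothetical names, not real xauth code).

def boot(events, dhcp_hostname="host.example.com"):
    """Replay boot events in order; return True if liveinst would find a cookie."""
    hostname = "localhost.localdomain"  # step 1 in both sequences
    cookie_host = None                  # hostname the cookie gets registered under

    for event in events:
        if event == "dhcp_sets_hostname":
            hostname = dhcp_hostname
        elif event == "display_manager_starts":
            # Assumption: the cookie is keyed on the hostname at this moment.
            cookie_host = hostname
        else:
            raise ValueError(f"unknown event: {event}")

    # liveinst looks up the cookie under the hostname that is current when it runs.
    return cookie_host == hostname


working = boot(["dhcp_sets_hostname", "display_manager_starts"])
failing = boot(["display_manager_starts", "dhcp_sets_hostname"])
no_dhcp_name = boot(["display_manager_starts"])

print("hostname set before DM:", "allowed" if working else "denied")
print("hostname set after DM: ", "allowed" if failing else "denied")
print("no DHCP hostname sent: ", "allowed" if no_dhcp_name else "denied")
```

The third call corresponds to the case where there is no dhcp hostname to set: the order of events no longer matters.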
To verify my findings, I made two different approaches:
1) boot to runlevel 3 (login prompt will say localhost), login as root, verify that hostname is now set to the dhcp hostname, init 5, verify that xauth is set up correctly -> liveinst works
2) boot and edit /etc/X11/prefdm to add a "sleep 5" before the dm is started, reboot to runlevel 5, verify that xauth is set up correctly -> liveinst works
So the solution that has to be found here is to make sure setting the hostname to dhcp is finished before prefdm is executed.
As this issue is still reproducible and could affect every spin - probably depending on hardware and network (i.e. fast vs slow) - I'm again proposing this as an alpha blocker to have it at least discussed at Friday's blocker bug meeting.
well, I executed # xhost +
and launched #liveinst
I was able to enter root password.
But later, when I tried to make disk partitioning, I got a bug similar to Bug 679125.
- AttributeError: 'NoneType' object has no attribute 'format'
So I was not able to proceed with install further.
Any idea how to overcome this?
Or is current Rawhide (20110224) "not installable"?
I was booting using Desktop-x86_64 version.
v.plessky, please note that this is not a support forum. If you find further bugs, please report them separately and if you need support make use of the appropriate support channels. Thank you.
Sandro Mathys, there is a bug with current ISO image, and I reported it.
The installer works in neither the GNOME nor the KDE version of the LiveCD.
And yes, I can wait until it's fixed.
one bug per report, please!
nice detective job, hellmuth^H^H^H^H^H^H^H^H sandro :)
so this becomes a systemd-units issue, I believe, as that's where prefdm.service lives. I agree with re-considering it as a blocker. Fedora can boot very fast on SSDs in particular; I wouldn't be surprised if this affects quite a few SSD-based systems.
(In reply to comment #11)
> one bug per report, please!
> nice detective job, hellmuth^H^H^H^H^H^H^H^H sandro :)
> so this becomes a systemd-units issue, I believe, as that's where
> prefdm.service lives. I agree with re-considering it as a blocker. Fedora can
> boot very fast on SSDs in particular; I wouldn't be surprised if this affects
> quite a few SSD-based systems.
Remember that this is limited to DHCP hostname updates, I'm not sure how many actually use that feature. That's why I wasn't able to reproduce it.
I believe it'll happen with most systems directly connected to a consumer ISP (rather than going through a router), right?
Yes, and I'd say systems in a corporate network as well. At least that's my use case here :)
wassup? Can somebody please explain what this is about? And what I am supposed to fix in systemd?
(In reply to comment #15)
> And what I am supposed to fix in systemd?
Comment #7 has the detailed analysis. The crux of the problem is an un-anticipated dependency that seems to be difficult to accomplish:
> So the solution that has to be found here is to make sure setting the hostname
> to [the one determined by] dhcp is finished before prefdm is executed.
Please look at that, and give an opinion about how it should be done, whether it is the right thing to do, and ideally suggest+explain an explicit fix [real code.]
You seem to be way off track here.
This is a bug in anaconda/liveinst.
And this seems to be 2 separate bugs, one regarding X and another regarding dhcp hostnames.
We had similar issues with firstboot when it got started with deprecated X options.
And if I recall correctly, there is a dhcp hostname bug in anaconda which I can't find at the moment but which we discussed during a blocker bug meeting. If memory serves me correctly, we accepted localhost.localdomain as the valid workaround for that one (as in, not using the supplied domain name).
Anyway, it looks like systemd has exposed a potential network-dependent bug, and I don't think it has to do with systemd itself.
Doesn't running xhost + indicate that one of the following is true?
A) the localhost/localhost.localdomain does not have access
B) the dhcp supplied domain name does not have access
C) the running user ( live user or whatever user liveinst runs as ) is not allowed to connect to the X server
So basically the problem is that xauth will be set up for localhost and when the hostname is then changed to the value obtained from dhcp $USERS won't be allowed to connect to the X server because xauth doesn't have a rule for the "new hostname"
Which is basically, what I wrote in comment #7.
So I did some further debugging, too. I've found in the syslog that NetworkManager is the service that's re-setting the hostname to the value obtained from dhcp. So I tried to add NetworkManager.service to the After= line in prefdm.service but that didn't change anything. So it looks like NM is returning to systemd before it tries to obtain a hostname from dhcp. Probably no bug but a feature (i.e. normally there's no reason to wait for this).
That's where I lack the detailed knowledge to debug this further. To me it looks like all components do what they're supposed to do but that just doesn't fit together.
And the only logical resolution of this problem to me is that NetworkManager or network, basically whatever application controls the networking, runs "xhost +my-dhcp-obtained-domain-name" after it has received it, right?
Or the application always tries to first authenticate towards X as user@localhost
I just want to say that requiring a domain name for local users just makes absolutely no sense to me, and I'm pretty sure xhost +$domain was designed to be used for external domains only, not localdomain/localhost nor a domain either set by dhcp or manually by the end user, so there is something wrong with the actual liveuser.
What does xhost say on a logged in user on the livecd?
Should be something like..
access control enabled, only authorized clients can connect
And what does xauth list say with and without networking enabled?
[liveuser@mjolnir ~]$ xauth list
localhost.localdomain/unix:0 MIT-MAGIC-COOKIE-1 d216855dcf353d8552cd351c7feac327
[liveuser@mjolnir ~]$ xauth list
mjolnir.ethz.ch:0 MIT-MAGIC-COOKIE-1 4aa76f169d293e1ca2b88cbf2132c92a
[fe80::21a:a0ff:feea:7dda]:0 MIT-MAGIC-COOKIE-1 4aa76f169d293e1ca2b88cbf2132c92a
mjolnir.ethz.ch/unix:0 MIT-MAGIC-COOKIE-1 4aa76f169d293e1ca2b88cbf2132c92a
So the first example is correct:
- if networking is disabled OR
- if no hostname is obtained from dhcp
But in the case this bug is about, we have networking enabled AND get a hostname back from dhcp, and therefore xauth is configured incorrectly.
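To make the comparison between the two listings concrete, here is a small sketch that extracts the host part of each `xauth list` entry and checks which hostnames have a cookie for display 0. The helper name `hosts_with_cookie` is made up for illustration, and the parsing only assumes the three-column layout shown in the listings above.

```python
# Sketch: which hostnames have a cookie registered for a given display?
# The sample text is copied from the two `xauth list` outputs above.

FIRST_LISTING = """\
localhost.localdomain/unix:0 MIT-MAGIC-COOKIE-1 d216855dcf353d8552cd351c7feac327
"""

SECOND_LISTING = """\
mjolnir.ethz.ch:0 MIT-MAGIC-COOKIE-1 4aa76f169d293e1ca2b88cbf2132c92a
[fe80::21a:a0ff:feea:7dda]:0 MIT-MAGIC-COOKIE-1 4aa76f169d293e1ca2b88cbf2132c92a
mjolnir.ethz.ch/unix:0 MIT-MAGIC-COOKIE-1 4aa76f169d293e1ca2b88cbf2132c92a
"""


def hosts_with_cookie(listing, display="0"):
    """Return the set of host names that have an entry for the given display."""
    hosts = set()
    for line in listing.strip().splitlines():
        name, _protocol, _cookie = line.split()
        # Entry names look like "host/unix:0", "host:0" or "[ipv6]:0".
        address, _, number = name.rpartition(":")
        if address.endswith("/unix"):
            address = address[: -len("/unix")]
        if number == display:
            hosts.add(address)
    return hosts


first = hosts_with_cookie(FIRST_LISTING)
second = hosts_with_cookie(SECOND_LISTING)

print("first listing knows mjolnir.ethz.ch: ", "mjolnir.ethz.ch" in first)
print("second listing knows mjolnir.ethz.ch:", "mjolnir.ethz.ch" in second)
```

With the first listing, a lookup under the dhcp-assigned name finds nothing, because the only entry is keyed on localhost.localdomain.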
What happens if you manually set the hostname, restart the network service and then log in?
It doesn't matter who or what sets the hostname or when he/it does so - as long as Xorg is (re)started after it happens so xauth is set up for the correct hostname.
Actually, I'm not exactly sure whether the hostname being used depends on:
- the set hostname when xorg is started OR
- the set hostname when liveuser logs in
On the kde spin, the time difference between both is much smaller than with gdm. This is because kdm doesn't even show up, it just logs in liveuser, while gdm first shows up and waits for the user to do something for quite a long time before logging him in automatically.
Does adding network.target to the "After=" line in /lib/systemd/system/prefdm.service work around this broken behaviour?
The problem here is the hostname is being tied into the authentication for local system
I do believe the side effect of doing this is that the wait time == the response time of the dhcp server or the timeout value of the networking application waiting for a response from the dhcp server...
Lennart what's your take on doing this?
(In reply to comment #27)
> Does adding network.target to the "After=" line in
> Workaround this broken behaviour?
I tried NetworkManager.service, network.service and network.target - none helped.
(In reply to comment #29)
> I do believe the side effect of doing this is that the wait time == the response
> time of the dhcp server or the timeout value of the networking application
> waiting for a response from the dhcp server...
In my case that'd take something between 2 and 3 seconds according to my logs.
That's a working server right?
If we add any kind of "time" related option we are effectively slowing down the boot for everybody, right? Not just for those using dhcp+hostnames?
My understanding is that we normally just fix X to not die in this situation.
(Sorry, 'die' isn't the right word, but fix auth such that it doesn't start denying things if the hostname happens to change.)
ajax said something like that won't happen because there's no notification whenever there's a hostname change. I don't know details, though.
So, if I understand this correctly, then dhcp (started by NM) is changing the hostname and xauth gets confused by that since X compares hostnames literally?
And the request is to fix that race by adding ordering deps to systemd's ruleset? Uh. I am not convinced that makes sense. Networks are dynamic these days, they come and go. Relying on being synchronized to the point where "network is up" is just backwards.
Why for heaven's sake does dhcp even change the local host name? Sounds like a really bad idea.
So there are two possible fixes here: fix dhcp so that it doesn't change the local host name.
Or fix X not to rely on the literal and dynamically changing hostname for authentication. If it wants to do host based auth then it should use a static identifier for the host, such as /var/lib/dbus/machine-id. But really not the hostname. I mean, does this mean I can forge a DHCP packet to your machine with a bogus hostname in it and make all your local X clients fail to auth against your X server? This is just wrong.
I think trying to work around this brokenness by adding ordering deps between NM and X is the worst solution. It would be an ugly work-around.
Johann, for God's sake, PLEASE STOP - you're filling the report with spam and making it much harder for anyone to look at. We don't need a running commentary on your thoughts as you try to figure this bug out. If you come up with a sensible, working, and tested fix please attach it.
lennart: there wasn't actually a request, unless you count johann's stream-of-consciousness ramblings. I only assigned the bug to you as we'd pretty much ascertained it was due to an init race of some kind; we didn't yet make any determination what the best fix would be, but it seemed a good idea to have you involved in the discussion.
Sorry for my "stream-of-consciousness ramblings." We were going back and forth digging into this..
Anyway, move this under the relevant X component.
given bill's and lennart's comments, re-assigning to xauth for now. adding dcbw, as networkmanager maintainer, to cc.
so the moving parts we've identified here:
with nice, fast, async startup thanks to systemd, some systems can reach a desktop - especially on livecds where autologin is used for login, which is kde, xfce and lxde at least - before NetworkManager completes DHCP negotiation. If the system hostname is changed by the DHCP negotiation, liveinst (and probably other apps we haven't found yet) will refuse to run as xauth is not satisfied thanks to the changed hostname.
#1 make xauth happy even if the hostname is changed by DHCP shortly after desktop login
#2 don't change the system hostname via DHCP at least by default
#2 has the obvious shortcoming that it won't help in cases where you really want that to happen (Sandro cites some corporate networks). I don't know the pros and cons to #1.
I probably don't understand all of this, but it seems that you should be able to use the loopback interface for local X connections instead of the public IP address?
From the alpha blocker meeting:
#agreed 679486 agreed not an alpha blocker due to ease of workaround and quite limited impact, agreed beta blocker and alpha NTH
Discussed at 2011-03-11 blocker review meeting. No real change here; we're just waiting on the developers of the components involved to agree on a fix.
Discussed at 2011-03-18 blocker review meeting. We're still waiting for development movement on this one. Can someone please decide on and implement a fix? Thanks. ajax, you're on the sharp end of this one currently ;)
This test build might help:
In which basically we have a "search harder" heuristic for finding auth cookies when making a local connection, that ignores the hostname.
Short of building an ISO with that package included, one could probably test this by booting a live image, installing the new package from a USB disk, then restarting the X session and plugging in the network and etc.
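For readers unfamiliar with the idea, a "search harder" lookup could look roughly like the sketch below. This is not the code from the test build; it is only an illustration of the concept of falling back to a hostname-agnostic match for local connections. The `AuthEntry` type, the `find_cookie` function, and the simplified "local"/"inet" family labels are all hypothetical.

```python
# Hypothetical sketch of a two-pass cookie lookup; not the actual patch.
from typing import List, NamedTuple, Optional


class AuthEntry(NamedTuple):
    family: str   # simplified: "local" for unix-socket entries, "inet" otherwise
    address: str  # hostname or IP the entry was registered under
    display: str  # display number, e.g. "0"
    cookie: str


def find_cookie(entries: List[AuthEntry], hostname: str, display: str,
                local_connection: bool) -> Optional[str]:
    # Pass 1: the strict match on hostname and display number.
    for entry in entries:
        if entry.address == hostname and entry.display == display:
            return entry.cookie
    # Pass 2: for local connections only, ignore the hostname and accept
    # any local-family entry for the same display.
    if local_connection:
        for entry in entries:
            if entry.family == "local" and entry.display == display:
                return entry.cookie
    return None


# Cookie registered while the hostname was still localhost.localdomain.
entries = [AuthEntry("local", "localhost.localdomain", "0",
                     "d216855dcf353d8552cd351c7feac327")]

# After DHCP renamed the host, the strict pass fails but the fallback succeeds.
print(find_cookie(entries, "mjolnir.ethz.ch", "0", local_connection=True))
print(find_cookie(entries, "mjolnir.ethz.ch", "0", local_connection=False))
```

Restricting the fallback to local connections is the design point: a remote client still needs an entry that matches its address.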
Discussed at the 2011-03-25 blocker review meeting. This remains a blocker, can reporters please test the build ajax provided in comment #43? Thanks.
Can't reproduce because the (KDE) LiveCD (Nightly from 2-3 days ago) no longer sets the hostname to the dhcp value, i.e. it sticks to localhost.localdomain and with that this bug is not reproducible.
Well, you'll have to try a new nightly with NetworkManager 0.9 (0.8.997, which just got pushed to stable) to see what happens there.
Discussed at the 2011-04-01 blocker review meeting. It appears that this issue may have gone away, but more testing is needed. Please re-test with either the beta TC1 images or the nightly images.
The update Kevin mentioned in comment 46 didn't change what I said in comment 45 but I can try with TC1 or the latest nightly once again on Monday.
The same as in comment 45 still applies to Beta TC1.
Can anybody else confirm this?
I propose dropping this to NTH for Beta; impact still seems uncertain and we didn't have a flood of people hitting it with Alpha. We never really justified it as a Beta blocker, just kicked it from Alpha to Beta.
(In reply to comment #51)
> I propose dropping this to NTH for Beta; impact still seems uncertain and we
> didn't have a flood of people hitting it with Alpha. We never really justified
> it as a Beta blocker, just kicked it from Alpha to Beta.
No objections. Should more recent information come back, we can certainly re-evaluate, but with the knowledge we have now, I'm comfortable with that approach. Additionally, the previously identified workaround is still "documentable" if needed.
seems like we don't have enough info here really, and its impact doesn't seem to be huge as there have not been lots of reports. I'm ok with dropping it as a blocker and evaluating the real issue further.
so that's three +1s for NTH, demoting. Kevin, I'm dropping it from the KDE blocker too: that causes it to block release, plus it's nothing KDE specific and not a KDE bug at all really.
Sorry, but I don't think demoting this to NTH makes any sense whatsoever. It is a release criterion for our spins to be installable. This bug was making the F15 Alpha KDE spin uninstallable for several people.
If this bug is really fixed as claimed by the original reporter, it should be closed, otherwise IMHO it should remain a blocker.
the criteria don't require it to be installable for absolutely everyone in all cases. if it's a corner case which hits only a few people, that does not break criteria. right now we have precisely one person who claims to still be hitting this, and we did not get a flood of people hitting it with the Alpha when it came out.
(In reply to comment #55)
> If this bug is really fixed as claimed by the original reporter, it should be
> closed, otherwise IMHO it should remain a blocker.
I have not claimed it to be fixed, I said it can't be reproduced because the circumstances changed. I actually wonder whether that change is a bug or intended...or rather whether it's a feature or a regression.
It's as simple as: the hostname is no longer set to the value obtained by dhcp (which it was before). As you could only hit this bug in a network where the dhcp server actually sends a hostname the bug can no longer be observed now.
This message is a notice that Fedora 15 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 15. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. At this time, all open bugs with a Fedora 'version'
of '15' have been closed as WONTFIX.
(Please note: Our normal process is to give advanced warning of this
occurring, but we forgot to do that. A thousand apologies.)
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen
this bug and simply change the 'version' to a later Fedora version.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we were unable to fix it before Fedora 15 reached end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" (top right of this page) and open it against that
version of Fedora.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Reopening. This is causing us bug 928279 (GNOME affected, KDE not yet tested). We might dupe that bug to this one, or vice versa, or just keep them separate for better readability.
So, to summarize comment #56 and comment #59, if KDE fails to install for some people, it's not a blocker, if GNOME does the same, it is? WTF?!
Blocker decisions are not set in stone. Now that we know the root cause, the workarounds and can estimate the number of affected people, the blocker status can be re-evaluated.
> Now that we know the root cause
What do we know now that we didn't know back on F15 (comment #38)?
kevin: #59 did not state that this bug is a blocker, and it hasn't been accepted as one. Kamil just re-opened it as we found it was (still/again) happening.
*** Bug 928279 has been marked as a duplicate of this bug. ***
Discussed at 2013-04-08 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-04-08/f19alpha-blocker-review-5.2013-04-08-16.01.log.txt , now we found out that 928279 was a dupe of this bug. We agreed that our decision from F15 timeframe still stands and this is not a blocker bug, as it only affects a fairly small range of users (cases where DHCP hostnames are used), and it's relatively easy to work around.
Note Dan Williams' comment from the dupe:
"Pavel's workarounds are correct; the real fix for this is to ensure that local X connections are *always* allowed, and to never rely on hostname-based authentication, because hostnames can change, and there are legitimate reasons for allowing the DHCP server to determine your hostname.
So the next step is to figure out what broke with the X local authentication workarounds that Fedora has had in place for years to always allow local connections regardless of hostname.
I don't think this is an Anaconda bug, but an xauth one."
Please propose this as a freeze exception for 19 Alpha if an xauth fix is found soon and it is not too invasive. By request of kparal, proposing as a Final blocker.
*** Bug 893218 has been marked as a duplicate of this bug. ***
I have built a custom LiveCD where I have pre-created /etc/hostname containing "localhost". This work-around fixes the problem, anaconda starts correctly on such a system. If the xauth fixes prove to be difficult and long-term, something like this could be a way to go.
If this is what I think it is, then maybe I _finally_ have willing testers for a patch I wrote eighteen months ago and have been begging for testers ever since.
Try this build:
(In reply to Adam Jackson from comment #68)
I have tested the new build, created a new LiveCD with it, and anaconda still complains that it can't open the display :0 when started (provided that the dhcp hostname is changed after X starts).
 which is now available at https://koji.fedoraproject.org/koji/taskinfo?taskID=5400115
Discussed at 2013-05-29 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-05-29/f19final-blocker-review-1.2013-05-29-16.02.log.txt .
We ultimately didn't want to change from our existing position on this bug (see c#65), though it seems like a few people have been hitting this one lately. So it's still rejected as a blocker, accepted as a freeze exception issue.
Thought: running liveinst via sudo would work around this, right? We may need to do that anyway for F20: see https://bugzilla.redhat.com/show_bug.cgi?id=967385 , there is something of a push to make the root account on lives locked and have people use 'sudo' for root actions anyway. We could possibly change liveinst to run via sudo for f19 and that would 'solve' this, right? But I suppose it comes with the risk of breaking other things.
*** Bug 969676 has been marked as a duplicate of this bug. ***
Sorry to be a messenger of bad news. I've found out that this bug does not affect only users of networks with dhcp hostnames. It seems to affect _anybody_ who disables a network connection before starting anaconda. (No idea why, the hostname doesn't seem to change when a network connection is disabled).
1. Boot F19 TC5 Live (bare metal or VM) with network cable attached.
2. Once in GNOME, either disable network connection in NetworkManager or unplug the cable.
3. Start liveinst.
4. Anaconda doesn't appear, it's stuck in an endless loop.
5. In terminal you see:
> No protocol specified
> xhost: unable to open display ":0"
All these use cases are broken, I tested them with bare metal:
A) boot with cable -> unplug cable -> start Anaconda
B) boot without cable -> plug cable -> unplug cable -> start Anaconda
C) boot without cable -> plug cable -> unplug cable -> plug cable -> start Anaconda
D) boot without cable -> connect wifi -> disconnect wifi -> start Anaconda
E) boot without cable -> connect wifi -> disconnect wifi -> plug cable -> start Anaconda
Please note C) and E) in particular. Once you disconnect any device, it doesn't matter if you connect the same or another device, Anaconda won't start.
The following use cases work:
F) boot without cable -> start Anaconda
G) boot without cable -> plug cable -> start Anaconda
H) boot without cable -> connect wifi -> start Anaconda
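The rule stated above ("once you disconnect any device, Anaconda won't start") can be checked against the reported cases A) to H) with a few lines. This is just a consistency check of the reported observations, not an explanation of the cause; the event strings and the `predicted_to_start` helper are made up for illustration.

```python
# Reported observations: case -> (events before starting Anaconda, did it start?)
OBSERVED = {
    "A": (["boot with cable", "unplug cable"], False),
    "B": (["boot without cable", "plug cable", "unplug cable"], False),
    "C": (["boot without cable", "plug cable", "unplug cable", "plug cable"], False),
    "D": (["boot without cable", "connect wifi", "disconnect wifi"], False),
    "E": (["boot without cable", "connect wifi", "disconnect wifi", "plug cable"], False),
    "F": (["boot without cable"], True),
    "G": (["boot without cable", "plug cable"], True),
    "H": (["boot without cable", "connect wifi"], True),
}

DISCONNECT_EVENTS = {"unplug cable", "disconnect wifi"}


def predicted_to_start(events):
    """Hypothesis: Anaconda starts only if no device was ever disconnected."""
    return not any(event in DISCONNECT_EVENTS for event in events)


mismatches = [case for case, (events, started) in OBSERVED.items()
              if predicted_to_start(events) != started]
print("cases contradicting the rule:", mismatches)
```

An empty list means the single rule "any disconnect breaks it" is consistent with all eight reported cases.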
I've played with the scripts a bit. I'm a layman in this area and I wasn't able to fix this using userhelper. However, either of these two approaches worked:
Fix1) Edit /sbin/liveinst and replace "xhost +si:localuser:root" with "sudo -u liveuser xhost +si:localuser:root".
Fix2) Replace /usr/bin/liveinst->consolehelper symlink with a script that calls "sudo /sbin/liveinst".
Since the number of affected users seems to be larger now, I'm re-proposing for yet another blocker discussion. The most obvious model situations are:
S1) Boot LiveCD with a network connection, then decide that you don't want the installer to download any files (updates) during the installation, so disable your network connection. (LiveCD doesn't download anything, but a lot of our users don't know that).
S2) Boot LiveCD and enable wifi, play with the system a bit. Then decide that cable is better for installation (faster download of updates), so disconnect wifi and connect cable.
S3) Boot LiveCD and connect wifi. Unfortunately your signal is weak and your wifi sometimes disconnects, so you need to reconnect every now and then. Experience this before running the installer.
Confirming that A, B and C do not work and that F works. However, according to my testing, G does not work.
(In reply to Martin Krizek from comment #73)
> However, according to my testing, G does not work.
Martin, I believe that is caused by the fact that you're on our corporate network, which uses dhcp hostnames. In that case this is also broken, correct. I was referring to standard home networks of general users, without dhcp hostnames.
If you can, try again at home, thanks. Josef also promised to try to verify this.
At this point, I'm +1 blocker, I think. liveinst failing to run sucks, and the impact is now sufficiently broad. Thanks for the diligent testing, Kamil.
I'm going to give it a +1 blocker vote as well. Seems sufficiently impactful and unfriendly that it warrants blocker status.
The core problem seems to be that when pam tries to transfer the xauth cookie from liveuser it fails when the network is off. After adding debug to the pam.d modules I see this in /var/log/secure
Jun 18 14:41:00 localhost userhelper: pam_xauth(liveinst:session): requesting user 1000/1000, target user 0/0
Jun 18 14:41:00 localhost userhelper: pam_xauth(liveinst:session): /home/liveuser/.xauth/export does not exist, ignoring
Jun 18 14:41:00 localhost userhelper: pam_xauth(liveinst:session): /root/.xauth/import does not exist, ignoring
Jun 18 14:41:00 localhost userhelper: pam_xauth(liveinst:session): reading keys from `/run/gdm/auth-for-liveuser-JRYMRD/database'
Jun 18 14:41:00 localhost userhelper: pam_xauth(liveinst:session): running "/usr/bin/xauth -f /run/gdm/auth-for-liveuser-JRYMRD/database nlist :0" as 1000/1000
Jun 18 14:41:00 localhost userhelper: pam_xauth(liveinst:session): no key
When running xauth from the cmdline:
[root@localhost ~]# xauth -f /run/gdm/auth-for-liveuser-sFq8bu/database nlist
0100 0009 6c6f63616c686f7374 0001 30 0012 4d49542d4d414749432d434f4f4b49452d31 0010 73ffbc144bfcb3de5384228b45d8493f
ffff 0009 6c6f63616c686f7374 0001 30 0012 4d49542d4d414749432d434f4f4b49452d31 0010 73ffbc144bfcb3de5384228b45d8493f
But if you add :0 to that it returns nothing.
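To see what those two `nlist` entries actually contain, the hex fields can be decoded. The sketch below assumes the nine-field layout visible in the output above (family, then length/value pairs for address, display number, protocol name and cookie data) and that all fields are non-empty. The family names are taken from the constants in Xauth.h as far as I know them (FamilyLocal = 256, FamilyWild = 65535); treat them as an assumption. The function name `decode_nlist_line` is made up.

```python
# Sketch: decode one line of `xauth nlist` output into readable fields.

# Assumed family constants (from Xauth.h, as far as I know).
FAMILY_NAMES = {0x0100: "FamilyLocal", 0xFFFF: "FamilyWild"}


def decode_nlist_line(line):
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected 9 whitespace-separated fields")
    family = int(fields[0], 16)
    values = []
    # Remaining fields come in (length, hex data) pairs.
    for length_hex, data_hex in zip(fields[1::2], fields[2::2]):
        data = bytes.fromhex(data_hex)
        if len(data) != int(length_hex, 16):
            raise ValueError("length prefix does not match data")
        values.append(data)
    address, number, name, cookie = values
    return {
        "family": FAMILY_NAMES.get(family, hex(family)),
        "address": address.decode("ascii"),
        "display": number.decode("ascii"),
        "protocol": name.decode("ascii"),
        "cookie": cookie.hex(),
    }


SAMPLE = [
    "0100 0009 6c6f63616c686f7374 0001 30 0012 "
    "4d49542d4d414749432d434f4f4b49452d31 0010 73ffbc144bfcb3de5384228b45d8493f",
    "ffff 0009 6c6f63616c686f7374 0001 30 0012 "
    "4d49542d4d414749432d434f4f4b49452d31 0010 73ffbc144bfcb3de5384228b45d8493f",
]

decoded = [decode_nlist_line(line) for line in SAMPLE]
for entry in decoded:
    print(entry["family"], entry["address"], entry["display"], entry["protocol"])
```

Both entries decode to address "localhost", display "0" and protocol MIT-MAGIC-COOKIE-1, differing only in the family field.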
bcl mentioned on IRC last night that one of kparal's proposed fixes does not work for him:
18-06-2013 10:03:11 < email@example.com: sudo -u liveuser xhost +si:localuser:root would be fine with me
18-06-2013 10:05:58 < firstname.lastname@example.org: I'll give it a quick test and send up a patch
18-06-2013 10:20:17 < email@example.com: well that doesn't seem to work so well.
18-06-2013 10:20:43 > adamw: oh dear.
18-06-2013 10:24:35 < firstname.lastname@example.org: at least it's easy to reproduce.
that's the point at which he dived into a deeper examination of the issue, the results of which are in c#77.
Brian, I tried again and it works for me. What exactly doesn't work for you?
All I did:
1. boot Live
2. disable network
3. run terminal, login as root, edit /sbin/liveinst
4. edit line 120 and add "sudo -u liveuser" before xhost command
5. save, logout root
6. run liveinst as liveuser, see anaconda, profit
Do the same without editing the file and anaconda doesn't appear.
Discussed at 2013-06-19 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-06-19/f19final-blocker-review-7.2013-06-19-16.01.log.txt . Given that kparal identified more cases in which this bug happens, and we seem to be seeing more and more people hitting it lately (I can recall a half dozen that I've just casually noticed in IRC and mailing list posts during F19 cycle), we decided the impact of this now appears to be broad enough to accept it as a release blocker for F19.
(In reply to Kamil Páral from comment #79)
> Brian, I tried again and it works for me. What exactly doesn't work for you?
> All I did:
> 1. boot Live
> 2. disable network
> 3. run terminal, login as root, edit /sbin/liveinst
> 4. edit line 120 and add "sudo -u liveuser" before xhost command
> 5. save, logout root
> 6. run liveinst as liveuser, see anaconda, profit
> Do the same without editing the file and anaconda doesn't appear.
Don't run it from the terminal, use the welcome screen or the desktop icon.
*** Bug 956242 has been marked as a duplicate of this bug. ***
(In reply to Brian C. Lane from comment #81)
> Don't run it from the terminal, use the welcome screen or the desktop icon.
Ah, right. The following line added to /etc/sudoers fixes it:
> Defaults!/usr/bin/xhost !requiretty
Disclaimer: I'm not a security expert, I don't know the implications. But this can be easily changed just on the LiveCD overlay and not in the installed system.
Working around the xauth problem starts to feel a lot hackish, that's true.
Ajax, any chance of providing another fix for xauth?
(In reply to Kamil Páral from comment #83)
> (In reply to Brian C. Lane from comment #81)
> > Don't run it from the terminal, use the welcome screen or the desktop icon.
> Ah, right. The following line added to /etc/sudoers fixes it:
> > Defaults!/usr/bin/xhost !requiretty
> Disclaimer: I'm not a security expert, I don't know the implications. But
> this can be easily changed just on the LiveCD overlay and not in the
> installed system.
> Working around the xauth problem starts to feel a lot hackish, that's true.
Right. The real problem here is that we shouldn't need sudo -- the permissions are supposed to be handled by consolehelper, pam and xauth -- none of which should depend on the network being up or down.
F19 remix of the desktop spin with gnome-classic-session and @sugar-desktop added to the .ks.
Logged in as the live user in a gnome-classic session (VirtualBox with bridged networking):
The "Install to Hard Drive" icon gets no response. In a terminal, running "liveinst" as the live user asks for a password and fails when an empty password is entered; su behaves the same way. "sudo su" gives a root terminal, and "liveinst" run from there starts anaconda. Is the liveuser password (none) not available?
I now know why disconnecting the network breaks xauth. "hostnamectl" returns the same data ("localhost") in both cases, but the "hostname" command returns "localhost" after boot and "localhost.localdomain" after the network device is disconnected. So it is essentially the same problem as with DHCP hostnames: the hostname changes after the session has started.
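To make the mismatch concrete, here is a rough Python sketch of the comparison. It assumes (this is not verified against the xauth source) that a lookup for ":0" only finds entries whose stored address equals the current hostname. The helper name and the inputs are illustrative only.

```python
# Rough model, NOT xauth's actual matching code: assume a ":0" lookup only
# finds cookie entries whose stored address equals the current hostname.
# entries_for_hostname is a hypothetical helper.


def entries_for_hostname(entries, current_hostname):
    """Return the (address, display) pairs that would still match."""
    return [(addr, disp) for addr, disp in entries if addr == current_hostname]


# Cookie written at session start, when `hostname` printed "localhost".
stored = [("localhost", "0")]

after_boot = entries_for_hostname(stored, "localhost")
after_disconnect = entries_for_hostname(stored, "localhost.localdomain")

print(after_boot)        # the entry is still found
print(after_disconnect)  # nothing found, comparable to the "no key" case
```

On a running live system the second argument could come from socket.gethostname() instead of a hard-coded string, to check whether the hostname has drifted away from the one stored in the cookie file.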
(In reply to Kamil Páral from comment #67)
> I have built a custom LiveCD where I have pre-created /etc/hostname
> containing "localhost". This work-around fixes the problem, anaconda starts
> correctly on a such system. If the xauth fixes prove to be difficult and
> long-term, something like this could be a way to go.
Have we forgotten about this fix? I re-verified it, everything works properly if it is included. Here's a patch to spin-kickstarts:
diff --git a/fedora-live-base.ks b/fedora-live-base.ks
index f87373a..7227308 100644
@@ -219,6 +219,10 @@ FOE
chmod +x /sbin/halt.local
+# add static hostname to work around xauth bug
+echo "localhost" > /etc/hostname
# bah, hal starts way too late
I think this is the simplest fix we can have right now.
It seems pam_xauth is just not working (tested on Fedora-Live-Desktop-x86_64-19-TC6-1.iso: booted in a VM, then switched the wired interface off on the panel applet).
Unprivileged user's session; note the "localhost/unix:0" entry which is presumably hostname-independent. All three variables are set.
> [liveuser@localhost ~]$ echo $HOME $XAUTHORITY $DISPLAY; xauth list
> /home/liveuser /run/gdm/auth-for-liveuser-duYjGM/database :0
> localhost/unix:0 MIT-MAGIC-COOKIE-1 82d155f153d57ddb2876fed0f78f488f
> #ffff#6c6f63616c686f7374#:0 MIT-MAGIC-COOKIE-1 82d155f153d57ddb2876fed0f78f488f
(su -): New HOME, XAUTHORITY and DISPLAY not set at all (!?)
> [liveuser@localhost ~]$ su -
> [root@localhost ~]# echo $HOME $XAUTHORITY $DISPLAY; xauth list
> xauth: file /root/.Xauthority does not exist
(sudo -i); all variables preserved, point to the original database; does that make sense? (AFAICS (sudo) and (sudo -i) don't use pam_xauth at all).
> $ sudo -i sh
> sh-4.2# echo $HOME $XAUTHORITY $DISPLAY; xauth list
> /root /run/gdm/auth-for-liveuser-duYjGM/database :0
> localhost/unix:0 MIT-MAGIC-COOKIE-1 82d155f153d57ddb2876fed0f78f488f
> #ffff#6c6f63616c686f7374#:0 MIT-MAGIC-COOKIE-1 82d155f153d57ddb2876fed0f78f488f
consolehelper: modified /usr/sbin/liveinst to dump the same information:
> $ head -n 3 /usr/sbin/liveinst
> echo $HOME $XAUTHORITY $DISPLAY; xauth list
result: DISPLAY set, but not XAUTHORITY
> $ liveinst
> /root :0
> xauth: file /root/.Xauthority does not exist
With both (su -) and consolehelper, the debug output of pam_xauth includes the "no key" message that Brian quoted in comment #77; the fact that "xauth ... :0" does not list any matching key seems to be the proximate cause of the problem here.
Can we please have thoughts from everyone about whether https://bugzilla.redhat.com/show_bug.cgi?id=679486#c86 would be a sufficient fix for the biggest practical bug here - liveinst failing to run in various circumstances - for F19 Final, for which the go/no-go meeting is *in 3 days*? Thanks.
From my conversation with ajax today:
[16:32] <jreznik> ping, do you think you can do more with https://bugzilla.redhat.com/show_bug.cgi?id=679486 now or we should go with workaround kamil mentioned in #86?
[18:02] <ajax> i'm probably not going to have time to look into it today
[18:02] <ajax> the workaround seems reasonable, though i'd hope there's a better way
(In reply to Kamil Páral from comment #86)
> (In reply to Kamil Páral from comment #67)
> > ... I have pre-created /etc/hostname
> > containing "localhost". This work-around fixes the problem, anaconda starts
> > correctly on a such system. If the xauth fixes prove to be difficult and
> > long-term, something like this could be a way to go.
> Have we forgotten about this fix? I re-verified it, everything works
> properly if it is included. Here's a patch to spin-kickstarts:
> +# add static hostname to work around xauth bug
> +# https://bugzilla.redhat.com/show_bug.cgi?id=679486
> +echo "localhost" > /etc/hostname
> I think this is the simplest fix we can have right now.
Thanks for pointing that out, Kamil.
livecd-tools used to create /etc/hostname, but this commit appears to have changed that:
write hostname to /etc/hostname (#870805)
author Brian C. Lane <email@example.com> 2012-12-04 18:52:19 (GMT)
Bug 870805 - imgcreate needs update for sysconfig changes (systemd >= 195)
Verified fixes with RC1.
For the record, I'm waiting till we have a spin-kickstarts package build to close all these bugs fixed in spin-kickstarts; seems like a good way to make sure we get a spin-kickstarts build for final.
spin-kickstarts-0.19.7-1.fc19 has been submitted as an update for Fedora 19.
spin-kickstarts-0.19.8-1.fc19 has been submitted as an update for Fedora 19.
spin-kickstarts-0.19.8-1.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
I'm reopening this bug because the master and f20 branches don't contain the patch from comment 91, so the bug is present again in F20 Alpha TC1.
Adjusting the fields, and proposing for Alpha blocker discussion. See bug summary in comment 72. It was accepted as F19 Final Blocker, because we found out about additional use cases very late in the cycle. We might want to block earlier this time.
I have verified that the /etc/hostname fix still works. It just needs to be included in spin-kickstarts, master and f20.
Discussed at 2013-08-28 blocker review meeting. This is rejected as an Alpha blocker, but accepted as an Alpha Freeze Exception. It doesn't violate any F20 Alpha release criteria, but a tested fix would be considered after freeze.
Committed to f20 and master:
I have verified that the fix solves the problem on my machine.
This bug is fixed in spin-kickstarts-0.20.18-1.fc20. Closing bug.