679486 – Liveinst doesn't start if hostname changes

Summary: Liveinst doesn't start if hostname changes

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	xorg-x11-xauth
Sub Component:
Version:	19
Hardware:	Unspecified
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Adam Jackson
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	https://fedoraproject.org/wiki/Common...
Duplicates (4):	893218 928279 956242 969676
Depends On:
Blocks:	F20AlphaFreezeException F20BetaBlocker

Reported:	2011-02-22 17:30 UTC by James Laska
Modified:	2016-08-31 10:56 UTC
CC List:	41 users
Fixed In Version:	spin-kickstarts-0.20.18-1.fc20
Clone Of:	663294
Environment:
Last Closed:	2013-09-13 12:15:06 UTC
Type:	---
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	928279	0	unspecified	CLOSED	anaconda doesn't start on LiveCD (in Gnome)	2021-02-22 00:41:40 UTC
Red Hat Bugzilla	1370222	0	unspecified	CLOSED	session/apps fail to start if hostname changes during boot due to network (infamous xauth issue)	2021-02-22 00:41:40 UTC

Internal Links: 928279 1370222

Description James Laska 2011-02-22 17:30:21 UTC

+++ This bug was initially created as a clone of Bug #663294 +++

--- Additional comment from sm on 2011-02-22 09:35:04 EST ---
> Can't confirm fix with Alpha RC1 KDE (anaconda-15.20.1-1.fc15.x86_64).
>
> On a freshly booted system:
>
> $ liveinst
> No protocol specified
> xhost:  unable to open display ":0"
> 14:28:44 Starting graphical installation.
> No protocol specified
> 14:28:44 Exception starting GUI installer: could not open display
> 14:28:44 GUI installer startup failed, falling back to text mode.
>
> So the patch actually made things worse :)
>
> No idea why liveinst fails to perform the xhost command though, as I can run it manually:
>
> $ xhost
> access control enabled, only authorized clients can connect
> SI:localuser:liveuser
> $ xhost +
> access control disabled, clients can connect from any host
>
> I haven't tried any other Live iso yet.

--- Additional comment from jlaska on 2011-02-22 10:30:56 EST ---

Confirmed Sandro's behavior on the KDE RC1 live images.  On 2 of 6 boot attempts, running liveinst starts the graphical installer.  The 4 other attempts, running liveinst fails to start the graphical installer, and launches the text-mode installer.  If you are clicking on the desktop icon to start liveinst, you'll never see the text-mode installer.

Running the following before starting liveinst works
# xhost +

In the 4 failure attempts, it seems that the XAUTHORITY ENV variable isn't being passed to the root user.  I think this means the root user won't be able to connect to the liveinst X session.  In the 2 successful attempts, when I `su` to root, I see an XAUTHORITY ENV variable.
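
The XAUTHORITY observation above can be checked from inside the affected shell. The following is a minimal sketch, not part of liveinst; the function name `xauthority_status` is hypothetical. It relies only on the documented Xlib behaviour that the authority file is taken from `XAUTHORITY` when set and from `~/.Xauthority` otherwise.

```python
import os


def xauthority_status(env):
    """Report which X authority file a process with this environment would use."""
    path = env.get("XAUTHORITY")
    if path:
        return "explicit", path
    # Xlib falls back to ~/.Xauthority when XAUTHORITY is unset.
    home = env.get("HOME", "")
    return "default", os.path.join(home, ".Xauthority")


if __name__ == "__main__":
    kind, path = xauthority_status(dict(os.environ))
    print(f"{kind}: {path} (exists: {os.path.exists(path)})")
```

Run once as liveuser and once after `su`: if the root shell reports the default path, it is not looking at the file that holds the session's cookie.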

So far, I've not managed to reproduce this failure on the GNOME live image.  If this doesn't impact the GNOME image, we may want to track this as a different bug report, and leave this issue in VERIFIED (as the original problem has been addressed).

Comment 1 Adam Williamson 2011-02-22 19:00:20 UTC

Discussed out-of-meeting-cycle with jlaska. Our call for now is acceptedblocker, on the basis of our historical practice of taking KDE bugs as blockers. If we take KDE bugs as blockers, this obviously hits the criterion "The installer must boot (if appropriate) and run on all primary architectures from default live image, DVD, and boot.iso install media".

Comment 2 James Laska 2011-02-23 19:15:54 UTC

I just retested and discovered that 10 of 10 KDE live image boots were problem free.  There must be something related to timing, or network connectivity.  Either way, I can't reproduce it at all today.

Unless someone else is seeing this problem on a regular basis ... it's time to CommonBugs this issue, imo.

Comment 3 James Laska 2011-02-23 20:11:59 UTC

I'm removing F15Alpha and whiteboard:AcceptedBlocker from this issue.  Both bcl and I have tested the heck out of this issue.  I definitely did encounter this failure yesterday, but have yet to hit it again today.  If this issue is this hard to reproduce, I don't think that's enough to qualify as a blocker for the Alpha release.  Should we know more about this issue later on, it's fairly straightforward to document the workaround, and we'll have updated nightly live images to fall back to as well.

Comment 4 Sandro Mathys 2011-02-23 22:51:24 UTC

Removing the blocker because there's a workaround (xhost +) is fine with me...but closing notabug without even asking the original reporter (me) because he's afk for a mere 5-6h is not.

I've reproduced this a couple of times and I'm pretty sure I can do so again. I also think it's a race condition (and I said so in IRC before) and it's well-known those can be very tricky to reproduce and debug. I'll try to provide more of what I see soon.

Comment 5 v.plessky 2011-02-24 07:21:12 UTC

I got this bug with KDE 20110223 LiveCD image yesterday.
Hardware: Acer 6935G notebook
Smolt profile: http://www.smolts.org/client/show/pub_55b08fdf-dd18-48d6-90b6-577bcb8da55e

Comment 6 v.plessky 2011-02-24 12:39:56 UTC

Was able to boot with x86-64 version of GNOME LiveCD 20110224 on same hardware (Acer 6935G).
Smolt profile: 
http://www.smolts.org/client/show/pub_5593f24c-06dd-4111-b994-1944b03218cd

But Install to HDD also doesn't work - similar to KDE i686 LiveCD 20110223.

Comment 7 Sandro Mathys 2011-02-24 13:59:15 UTC

The root cause and the reason why this is mostly seen on the kde but not on the gnome spin is as easy as: kdm is quicker to start than gdm - or rather the part that does the xauth stuff is quicker in kdm than in gdm.

Therefore, xauth will use localhost.localdomain with kdm, while the dhcp hostname is used with gdm. Later on, liveinst will use the dhcp hostname in either case: with kdm it fails because the xauth entry still names the old hostname, and with gdm it succeeds because the entry is correct.

That said, there's certain additional points:
- On a fast system, gdm might be 'too quick' as well
- On a slow system, kdm might be 'slow enough' as well
- If there's no dhcp hostname to set, it will work no matter how slow or quick kdm/gdm is

In case the difference has not been clear, here is how the two boot sequences differ.

WORKS:
1) hostname = localhost.localdomain
2) hostname = dhcp hostname
3) kdm/gdm is started: xauth uses dhcp hostname
4) liveinst is started on dhcp hostname and allowed by xauth

FAILS:
1) hostname = localhost.localdomain
2) kdm/gdm is started: xauth uses localhost.localdomain
3) hostname = dhcp hostname
4) liveinst is started on dhcp hostname and denied by xauth

To verify my findings, I made two different approaches:
1) boot to runlevel 3 (login prompt will say localhost), login as root, verify that hostname is now set to the dhcp hostname, init 5, verify that xauth is set up correctly -> liveinst works 
2) boot and edit /etc/X11/prefdm to add a "sleep 5" before the dm is started, reboot to runlevel 5, verify that xauth is set up correctly -> liveinst works

So the solution that has to be found here is to make sure setting the hostname to dhcp is finished before prefdm is executed.
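
The two orderings above can be replayed as a small model. This is a sketch under stated assumptions, not the actual kdm/gdm or Xlib logic: it assumes the xauth entry is written once when the display manager starts, and that liveinst is accepted only if the hostname at launch is literally equal to the one in that entry. The event names and the `boot` function are hypothetical.

```python
def boot(events):
    """Replay boot events in order and report whether liveinst can authenticate.

    events: list of ("hostname", name), ("start_dm",) or ("liveinst",) tuples.
    """
    hostname = "localhost.localdomain"  # initial hostname on the live image
    cookie_host = None                  # hostname recorded in the xauth entry
    for event in events:
        if event[0] == "hostname":
            hostname = event[1]
        elif event[0] == "start_dm":
            cookie_host = hostname      # entry is written once, at DM start
        elif event[0] == "liveinst":
            return cookie_host == hostname  # literal hostname comparison
    raise ValueError("no liveinst event in sequence")


works = [("hostname", "dhcp-host"), ("start_dm",), ("liveinst",)]
fails = [("start_dm",), ("hostname", "dhcp-host"), ("liveinst",)]
no_dhcp = [("start_dm",), ("liveinst",)]
print(boot(works), boot(fails), boot(no_dhcp))  # True False True
```

The third sequence corresponds to the case where no dhcp hostname is set at all, which works regardless of how fast the display manager starts.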

As this issue is still reproducible and could affect every spin - probably depending on hardware and network (i.e. fast vs slow) - I'm again proposing this as an alpha blocker to have it at least discussed at Friday's blocker bug meeting.

Comment 8 v.plessky 2011-02-24 14:25:56 UTC

well, I executed # xhost + 
and launched #liveinst

I was able to enter root password.

But later, when I tried to make disk partitioning, I got a bug similar to Bug 679125.
- AttributeError: 'NoneType' object has no attribute 'format'
https://bugzilla.redhat.com/show_bug.cgi?id=679125

So I was not able to proceed with install further.

Any idea how to overcome this?
Or is current Rawhide (20110224) "not installable"?
I was booting using Desktop-x86_64 version.

Comment 9 Sandro Mathys 2011-02-24 15:04:28 UTC

v.plessky, please note that this is not a support forum. If you find further bugs, please report them separately and if you need support make use of the appropriate support channels. Thank you.

Comment 10 v.plessky 2011-02-24 15:26:49 UTC

Sandro Mathys, there is a bug with current ISO image, and I reported it.
Installer works neither in GNOME nor in KDE versions of LiveCD.

And yes, I can wait until it's fixed.

Comment 11 Adam Williamson 2011-02-24 17:21:24 UTC

one bug per report, please!

nice detective job, hellmuth^H^H^H^H^H^H^H^H sandro :) 

so this becomes a systemd-units issue, I believe, as that's where prefdm.service lives. I agree with re-considering it as a blocker. Fedora can boot very fast on SSDs in particular; I wouldn't be surprised if this affects quite a few SSD-based systems.

Comment 12 Brian Lane 2011-02-24 17:45:26 UTC

(In reply to comment #11)
> one bug per report, please!
> 
> nice detective job, hellmuth^H^H^H^H^H^H^H^H sandro :) 
> 
> so this becomes a systemd-units issue, I believe, as that's where
> prefdm.service lives. I agree with re-considering it as a blocker. Fedora can
> boot very fast on SSDs in particular; I wouldn't be surprised if this affects
> quite a few SSD-based systems.

Remember that this is limited to DHCP hostname updates, I'm not sure how many actually use that feature. That's why I wasn't able to reproduce it.

Comment 13 Adam Williamson 2011-02-24 21:47:00 UTC

I believe it'll happen with most systems directly connected to a consumer ISP (rather than going through a router), right?

Comment 14 Sandro Mathys 2011-02-24 21:52:49 UTC

Yes, and I'd say systems in a corporate network as well. At least that's my use case here :)

Comment 15 Lennart Poettering 2011-02-25 00:58:57 UTC

wassup? Can somebody please explain what this is about? And what I am supposed to fix in systemd?

Comment 16 John Reiser 2011-02-25 01:30:21 UTC

(In reply to comment #15)
>  And what I am supposed to fix in systemd?

Comment #7 has the detailed analysis.  The crux of the problem is an un-anticipated dependency that seems to be difficult to accomplish:
> So the solution that has to be found here is to make sure setting the hostname
> to [the one determined by] dhcp is finished before prefdm is executed.

Please look at that, and give an opinion about how it should be done, whether it is the right thing to do, and ideally suggest+explain an explicit fix [real code.]

Comment 17 Jóhann B. Guðmundsson 2011-02-25 11:00:34 UTC

You seem to be way off track here.

This is a bug in anaconda/liveinst.

And this seems to be 2 separate bugs, one regarding X and another regarding dhcp hostnames.

We had similar issues with firstboot when it got started with deprecated X options

http://git.fedorahosted.org/git/?p=firstboot.git;a=commitdiff;h=caf15bec53fc846829a5e3d448b53ec7718dbf6d

And if I recall correctly, there is a dhcp hostname bug in anaconda, which I can't find at the moment, but which we discussed during a blocker bug meeting. It had to do with dhcp, and if memory serves me correctly we accepted localhost.localdomain as the valid workaround for that one (as in, not using the supplied domain name).

Anyway, it looks like systemd has exposed a potential network-dependent bug, and I don't think it has to do with systemd itself.

Comment 18 Jóhann B. Guðmundsson 2011-02-25 11:39:20 UTC

Doesn't having to run xhost + indicate one of the following?

A) the localhost/localhost.localdomain does not have access
B) the dhcp supplied domain name does not have access 
C) the running user ( live user or whatever user liveinst runs as ) is not allowed to connect to the X server

Comment 19 Jóhann B. Guðmundsson 2011-02-25 11:58:31 UTC

So basically the problem is that xauth will be set up for localhost and when the hostname is then changed to the value obtained from dhcp $USERS won't be allowed to connect to the X server because xauth doesn't have a rule for the "new hostname"

Comment 20 Sandro Mathys 2011-02-25 12:02:15 UTC

Which is basically, what I wrote in comment #7.

So I did some further debugging, too. I've found in the syslog that NetworkManager is the service that's re-setting the hostname to the value obtained from dhcp. So I tried to add NetworkManager.service to the After= line in prefdm.service but that didn't change anything. So it looks like NM is returning to systemd before it tries to obtain a hostname from dhcp. Probably no bug but a feature (i.e. normally there's no reason to wait for this).

That's where I lack the detailed knowledge to debug this further. To me it looks like all components do what they're supposed to do but they just don't fit together.

Comment 21 Jóhann B. Guðmundsson 2011-02-25 12:08:30 UTC

And the only logical resolution of this problem to me is that NetworkManager, or network, or basically whatever application controls the networking, runs "xhost +my-dhcp-obtained-domain-name" after it has received the name, right?

Or the application always tries to first authenticate towards X as user@localhost

Comment 22 Jóhann B. Guðmundsson 2011-02-25 13:10:57 UTC

I just want to say that requiring a domain name for local users just makes absolutely no sense to me, and I'm pretty sure xhost +$domain was designed to be used for external domains only, not localdomain/localhost nor a domain set either by dhcp or manually by the end user, so there is something wrong with the actual liveuser.

What does xhost say on a logged in user on the livecd?

Should be something like.. 

access control enabled, only authorized clients can connect
SI:localuser:liveuser

Comment 23 Jóhann B. Guðmundsson 2011-02-25 13:27:23 UTC

And what does xauth list say with networking enabled and with it disabled?

Comment 24 Sandro Mathys 2011-02-25 13:46:48 UTC

[liveuser@mjolnir ~]$ xauth list
localhost.localdomain/unix:0  MIT-MAGIC-COOKIE-1  d216855dcf353d8552cd351c7feac327

instead of

[liveuser@mjolnir ~]$ xauth list
mjolnir.ethz.ch:0  MIT-MAGIC-COOKIE-1  4aa76f169d293e1ca2b88cbf2132c92a
[fe80::21a:a0ff:feea:7dda]:0  MIT-MAGIC-COOKIE-1  4aa76f169d293e1ca2b88cbf2132c92a
mjolnir.ethz.ch/unix:0  MIT-MAGIC-COOKIE-1  4aa76f169d293e1ca2b88cbf2132c92a

So the first example is correct:
- if networking is disabled OR
- if no hostname is obtained from dhcp

But in the case this bug is about, we have networking enabled AND get a hostname back from dhcp, and therefore xauth is configured incorrectly.
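
The comparison between the two listings can be automated. The sketch below is illustrative only: the helper names are hypothetical, and it assumes each line of `xauth list` output has the three whitespace-separated fields shown above (display name, protocol, cookie) and that local entries have the form `<hostname>/unix:<display>`.

```python
def parse_xauth_list(text):
    """Parse `xauth list` output into (display_name, protocol, cookie) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # skip blank or malformed lines
            entries.append(tuple(parts))
    return entries


def has_local_entry(entries, hostname, display="0"):
    """True if an entry of the form '<hostname>/unix:<display>' exists."""
    wanted = f"{hostname}/unix:{display}"
    return any(name == wanted for name, _proto, _cookie in entries)


stale = "localhost.localdomain/unix:0  MIT-MAGIC-COOKIE-1  d216855dcf353d8552cd351c7feac327"
entries = parse_xauth_list(stale)
print(has_local_entry(entries, "localhost.localdomain"))  # True
print(has_local_entry(entries, "mjolnir.ethz.ch"))        # False
```

On a live system the text could come from `subprocess.run(["xauth", "list"], capture_output=True, text=True).stdout` and the hostname from `socket.gethostname()`; a `False` result for the current hostname matches the failing case described here.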

Comment 25 Jóhann B. Guðmundsson 2011-02-25 14:00:36 UTC

What happens if you manually set the hostname, restart the network service, and then log in?

Comment 26 Sandro Mathys 2011-02-25 14:21:52 UTC

It doesn't matter who or what sets the hostname or when he/it does so - as long as Xorg is (re)started after it happens so xauth is set up for the correct hostname.

Actually, I'm not exactly sure whether the hostname being used depends on:
- the set hostname when xorg is started OR
- the set hostname when liveuser logs in

On the kde spin, the time difference between both is much smaller than with gdm. This is because kdm doesn't even show up, it just logs in liveuser, while gdm first shows up and waits for the user to do something for quite a long time before logging him in automatically.

Comment 27 Jóhann B. Guðmundsson 2011-02-25 14:25:59 UTC

Does adding network.target to the "After=" line in /lib/systemd/system/prefdm.service work around this broken behaviour?

Comment 28 Jóhann B. Guðmundsson 2011-02-25 14:27:59 UTC

The problem here is that the hostname is being tied into the authentication for the local system.

Comment 29 Jóhann B. Guðmundsson 2011-02-25 14:47:48 UTC

I do believe the side effect of doing this is that the wait time == the response time of the dhcp server or the timeout value of the networking application waiting for a response from the dhcp server...

Lennart what's your take on doing this?

Comment 30 Sandro Mathys 2011-02-25 15:11:24 UTC

(In reply to comment #27)
> Does adding network.target to the "After=" line in
> /lib/systemd/system/prefdm.service
> 
> Workaround this broken behaviour?

I tried NetworkManager.service, network.service and network.target - none helped.

(In reply to comment #29)
> I do believe the side effect of doing this is that the wait time == the response
> time of the dhcp server or the timeout value of the networking application
> waiting for a response from the dhcp server...

In my case that'd take something between 2 and 3 seconds according to my logs.

Comment 31 Jóhann B. Guðmundsson 2011-02-25 15:38:24 UTC

That's a working server right?

If we add any kind of "time" related option we are effectively slowing down the boot for everybody, right? Not just for those using dhcp+hostnames?

Comment 32 Bill Nottingham 2011-02-25 15:54:54 UTC

My understanding is that we normally just fix X to not die in this situation.

Comment 33 Bill Nottingham 2011-02-25 16:01:30 UTC

(Sorry, 'die' isn't the right word, but fix auth such that it doesn't start denying things if the hostname happens to change.)

Comment 34 Sandro Mathys 2011-02-25 16:17:25 UTC

ajax said something like that won't happen because there's no notification whenever there's a hostname change. I don't know details, though.

Comment 35 Lennart Poettering 2011-02-25 16:18:19 UTC

So, if I understand this correctly, then dhcp (started by NM) is changing the hostname and xauth gets confused by that since X compares hostnames literally?

And the request is to fix that race by adding ordering deps to systemd's ruleset? Uh. I am not convinced that makes sense. Networks are dynamic these days, they come and go. Relying on being synchronized to the point where "network is up" is just backwards.

Why for heaven's sake does dhcp even change the local host name? Sounds like a really bad idea.

So there are two possible fixes here: fix dhcp so that it doesn't change the local host name.

Or fix X not to rely on the literal and dynamically changing hostname for authentication. If it wants to do host based auth then it should use a static identifier for the host, such as /var/lib/dbus/machine-id. But really not the hostname. I mean, does this mean I can forge a DHCP packet to your machine with a bogus hostname in it and make all your local X clients fail to auth against your X server? This is just wrong.

I think trying to work around this brokenness by adding ordering deps between NM and X is the worst solution. It would be an ugly work-around.

Comment 36 Adam Williamson 2011-02-25 16:24:36 UTC

Johann, for God's sake, PLEASE STOP - you're filling the report with spam and making it much harder for anyone to look at. We don't need a running commentary on your thoughts as you try to figure this bug out. If you come up with a sensible, working, and tested fix please attach it.

lennart: there wasn't actually a request, unless you count johann's stream-of-consciousness ramblings. I only assigned the bug to you as we'd pretty much ascertained it was due to an init race of some kind; we didn't yet make any determination what the best fix would be, but it seemed a good idea to have you involved in the discussion.

Comment 37 Jóhann B. Guðmundsson 2011-02-25 16:37:10 UTC

Sorry for my "stream-of-consciousness ramblings." We were going back and forth digging into this..

Anyway move this under relevant X component.

Comment 38 Adam Williamson 2011-02-25 16:45:24 UTC

given bill's and lennart's comments, re-assigning to xauth for now. adding dcbw, as networkmanager maintainer, to cc.

so the moving parts we've identified here:

with nice, fast, async startup thanks to systemd, some systems can reach a desktop - especially on livecds where autologin is used for login, which is kde, xfce and lxde at least - before NetworkManager completes DHCP negotiation. If the system hostname is changed by the DHCP negotiation, liveinst (and probably other apps we haven't found yet) will refuse to run as xauth is not satisfied thanks to the changed hostname.

Suggested fixes:

#1 make xauth happy even if the hostname is changed by DHCP shortly after desktop login

#2 don't change the system hostname via DHCP at least by default

#2 has the obvious shortcoming that it won't help in cases where you really want that to happen (Sandro cites some corporate networks). I don't know the pros and cons to #1.

Comment 39 Bruno Wolff III 2011-02-25 17:58:34 UTC

I probably don't understand all of this, but it seems that you should be able to use the loopback interface for local X connections instead of the public IP address?

Comment 40 Bruno Wolff III 2011-02-25 18:00:45 UTC

From the alpha blocker meeting:
#agreed 679486 agreed not an alpha blocker due to ease of workaround and quite limited impact, agreed beta blocker and alpha NTH

Comment 41 Adam Williamson 2011-03-11 18:57:50 UTC

Discussed at 2011-03-11 blocker review meeting. No real change here; we're just waiting on the developers of the components involved to agree on a fix.

Comment 42 Adam Williamson 2011-03-18 18:37:58 UTC

Discussed at 2011-03-18 blocker review meeting. We're still waiting for development movement on this one. Can someone please decide on and implement a fix? Thanks. ajax, you're on the sharp end of this one currently ;)

Comment 43 Adam Jackson 2011-03-22 03:42:53 UTC

This test build might help:

http://kojipkgs.fedoraproject.org/scratch/ajax/task_2930432/

In which basically we have a "search harder" heuristic for finding auth cookies when making a local connection, that ignores the hostname.
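
The idea of a "search harder" lookup can be illustrated as follows. This is not the code from the test build, whose contents are not shown in this report; it is a hypothetical sketch that assumes authority records for local connections can be represented as (hostname, display number, cookie) tuples. The function name `find_cookie` is an assumption.

```python
def find_cookie(entries, hostname, display):
    """Look up an auth cookie for a local connection.

    entries: list of (host, display, cookie) tuples for local-family records.
    """
    # First pass: the usual exact match on hostname and display number.
    for host, disp, cookie in entries:
        if host == hostname and disp == display:
            return cookie
    # Fallback for local connections: ignore the hostname, match display only.
    for _host, disp, cookie in entries:
        if disp == display:
            return cookie
    return None


entries = [("localhost.localdomain", "0", "d216855dcf353d8552cd351c7feac327")]
# The hostname changed after the entry was written; the fallback still finds it.
print(find_cookie(entries, "mjolnir.ethz.ch", "0") is not None)  # True
```

The exact-match pass is kept first so that behaviour is unchanged whenever the hostname has not moved; only the stale-hostname case reaches the fallback.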

Short of building an ISO with that package included, one could probably test this by booting a live image, installing the new package from a USB disk, then restarting the X session, plugging in the network, and so on.

Comment 44 Adam Williamson 2011-03-25 17:51:45 UTC

Discussed at the 2011-03-25 blocker review meeting. This remains a blocker, can reporters please test the build ajax provided in comment #43? Thanks.

Comment 45 Sandro Mathys 2011-03-28 07:14:15 UTC

Can't reproduce because the (KDE) LiveCD (Nightly from 2-3 days ago) no longer sets the hostname to the dhcp value, i.e. it sticks to localhost.localdomain and with that this bug is not reproducible.

Comment 46 Kevin Kofler 2011-03-28 21:55:19 UTC

Well, you'll have to try a new nightly with NetworkManager 0.9 (0.8.997, which just got pushed to stable) to see what happens there.

Comment 47 Tim Flink 2011-04-01 18:30:13 UTC

Discussed at the 2011-04-01 blocker review meeting. It appears that this issue may have gone away, but more testing is needed. Please re-test with either the beta TC1 images or the nightly images.

Comment 48 Sandro Mathys 2011-04-01 21:03:43 UTC

The update Kevin mentioned in comment 46 didn't change what I said in comment 45 but I can try with TC1 or the latest nightly once again on Monday.

Comment 49 Sandro Mathys 2011-04-04 08:11:58 UTC

The same as in comment 45 still applies to Beta TC1.

Comment 50 Kevin Kofler 2011-04-04 08:18:50 UTC

Can anybody else confirm this?

Comment 51 Adam Williamson 2011-04-07 18:55:37 UTC

I propose dropping this to NTH for Beta; impact still seems uncertain and we didn't have a flood of people hitting it with Alpha. We never really justified it as a Beta blocker, just kicked it from Alpha to Beta.

Comment 52 James Laska 2011-04-07 18:59:50 UTC

(In reply to comment #51)
> I propose dropping this to NTH for Beta; impact still seems uncertain and we
> didn't have a flood of people hitting it with Alpha. We never really justified
> it as a Beta blocker, just kicked it from Alpha to Beta.

No objections.  Should more recent information come back, we can certainly re-evaluate, but with the knowledge we have now, I'm comfortable with that approach.  Additionally, the previously identified workaround is still "documentable" if needed.

Comment 53 Dennis Gilmore 2011-04-07 19:13:28 UTC

seems like we don't have enough info here really. and its impact doesn't seem to be huge as there have not been lots of reports.  I'm ok with dropping as a blocker and evaluating further the real issue.

Comment 54 Adam Williamson 2011-04-07 19:17:47 UTC

so that's three +1s for NTH, demoting. Kevin, I'm dropping it from the KDE blocker too: that causes it to block release, plus it's nothing KDE specific and not a KDE bug at all really.

Comment 55 Kevin Kofler 2011-04-07 19:31:48 UTC

Sorry, but I don't think demoting this to NTH makes any sense whatsoever. It is a release criterion for our spins to be installable. This bug was making the F15 Alpha KDE spin uninstallable for several people.

If this bug is really fixed as claimed by the original reporter, it should be closed, otherwise IMHO it should remain a blocker.

Comment 56 Adam Williamson 2011-04-07 19:42:41 UTC

the criteria don't require it to be installable for absolutely everyone in all cases. if it's a corner case which hits only a few people, that does not break criteria. right now we have precisely one person who claims to still be hitting this, and we did not get a flood of people hitting it with the Alpha when it came out.

Comment 57 Sandro Mathys 2011-04-07 20:06:17 UTC

(In reply to comment #55)
> If this bug is really fixed as claimed by the original reporter, it should be
> closed, otherwise IMHO it should remain a blocker.

I have not claimed it to be fixed, I said it can't be reproduced because the circumstances changed. I actually wonder whether that change is a bug or intended...or rather whether it's a feature or a regression.

It's as simple as: the hostname is no longer set to the value obtained by dhcp (which it was before). As you could only hit this bug in a network where the dhcp server actually sends a hostname the bug can no longer be observed now.

Comment 58 Fedora End Of Life 2012-08-07 20:10:29 UTC

This message is a notice that Fedora 15 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 15. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. At this time, all open bugs with a Fedora 'version'
of '15' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we were unable to fix it before Fedora 15 reached end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" (top right of this page) and open it against that
version of Fedora.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 59 Kamil Páral 2013-04-08 13:54:10 UTC

Reopening. This is causing us bug 928279 (GNOME affected, KDE not yet tested). We might dupe that bug to this one, or vice versa, or just keep them separate for better readability.

Comment 60 Kevin Kofler 2013-04-08 15:50:33 UTC

So, to summarize comment #56 and comment #59, if KDE fails to install for some people, it's not a blocker, if GNOME does the same, it is? WTF?!

Comment 61 Kamil Páral 2013-04-08 16:24:21 UTC

Blocker decisions are not set in stone. Now that we know the root cause, the workarounds and can estimate the number of affected people, the blocker status can be re-evaluated.

Comment 62 Kevin Kofler 2013-04-08 16:46:18 UTC

> Now that we know the root cause

What do we know now that we didn't know back on F15 (comment #38)?

Comment 63 Adam Williamson 2013-04-08 17:25:18 UTC

kevin: #59 did not state that this bug is a blocker, and it hasn't been accepted as one. Kamil just re-opened it as we found it was (still/again) happening.

Comment 64 Adam Williamson 2013-04-08 17:31:05 UTC

*** Bug 928279 has been marked as a duplicate of this bug. ***

Comment 65 Adam Williamson 2013-04-08 17:33:15 UTC

Discussed at 2013-04-08 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-04-08/f19alpha-blocker-review-5.2013-04-08-16.01.log.txt , now we found out that 928279 was a dupe of this bug. We agreed that our decision from F15 timeframe still stands and this is not a blocker bug, as it only affects a fairly small range of users (cases where DHCP hostnames are used), and it's relatively easy to work around.

Note Dan Williams' comment from the dupe:

"Pavel's workarounds are correct; the real fix for this is to ensure that local X connections are *always* allowed, and to never rely on hostname-based authentication, because hostnames can change, and there are legitimate reasons for allowing the DHCP server to determine your hostname.

So the next step is to figure out what broke with the X local authentication workarounds that Fedora has had in place for years to always allow local connections regardless of hostname.

I don't think this is an Anaconda bug, but an xauth one."

Please propose this as a freeze exception for 19 Alpha if an xauth fix is found soon and it is not too invasive. By request of kparal, proposing as a Final blocker.

Comment 66 Jaroslav Reznik 2013-04-08 17:33:16 UTC

*** Bug 893218 has been marked as a duplicate of this bug. ***

Comment 67 Kamil Páral 2013-04-11 08:27:32 UTC

I have built a custom LiveCD where I have pre-created /etc/hostname containing "localhost". This work-around fixes the problem, anaconda starts correctly on such a system. If the xauth fixes prove to be difficult and long-term, something like this could be a way to go.

Comment 68 Adam Jackson 2013-04-11 17:06:23 UTC

If this is what I think it is, then maybe I _finally_ have willing testers for a patch I wrote eighteen months ago and have been begging for testers ever since.

Try this build:

http://koji.fedoraproject.org/koji/taskinfo?taskID=5243120

Comment 69 Kamil Páral 2013-05-21 07:45:11 UTC

(In reply to Adam Jackson from comment #68)
I have tested the new build [1], created a new LiveCD with it and anaconda still complains that it can't open the display :0 when started (provided that the dhcp hostname is changed after X starts).

[1] which is now available at https://koji.fedoraproject.org/koji/taskinfo?taskID=5400115

Comment 70 Adam Williamson 2013-05-29 19:16:58 UTC

Discussed at 2013-05-29 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-05-29/f19final-blocker-review-1.2013-05-29-16.02.log.txt .

We ultimately didn't want to change from our existing position on this bug (see c#65), though it seems like a few people have been hitting this one lately. So it's still rejected as a blocker, accepted as a freeze exception issue.

Thought: running liveinst via sudo would work around this, right? We may need to do that anyway for F20: see https://bugzilla.redhat.com/show_bug.cgi?id=967385 , there is something of a push to make the root account on lives locked and have people use 'sudo' for root actions anyway. We could possibly change liveinst to run via sudo for f19 and that would 'solve' this, right? But I suppose it comes with the risk of breaking other things.

Comment 71 Michael Catanzaro 2013-06-05 04:43:33 UTC

*** Bug 969676 has been marked as a duplicate of this bug. ***

Comment 72 Kamil Páral 2013-06-18 13:02:49 UTC

Sorry to be a messenger of bad news. I've found out that this bug does not affect only users of networks with dhcp hostnames. It seems to affect _anybody_ who disables a network connection before starting anaconda. (No idea why, the hostname doesn't seem to change when a network connection is disabled).

Reproducer:
1. Boot F19 TC5 Live (bare metal or VM) with network cable attached.
2. Once in GNOME, either disable network connection in NetworkManager or unplug the cable.
3. Start liveinst.
4. Anaconda doesn't appear, it's stuck in an endless loop.
5. In terminal you see:
> No protocol specified
> xhost:  unable to open display ":0"

All these use cases are broken, I tested them with bare metal:
A) boot with cable -> unplug cable -> start Anaconda
B) boot without cable -> plug cable -> unplug cable -> start Anaconda
C) boot without cable -> plug cable -> unplug cable -> plug cable -> start Anaconda
D) boot without cable -> connect wifi -> disconnect wifi -> start Anaconda
E) boot without cable -> connect wifi -> disconnect wifi -> plug cable -> start Anaconda

Please properly note C) and E). Once you disconnect any device, it doesn't matter if you connect the same or other device, Anaconda won't start.

The following use cases work:
F) boot without cable -> start Anaconda
G) boot without cable -> plug cable -> start Anaconda
H) boot without cable -> connect wifi -> start Anaconda


I've played with the scripts a bit. I'm a layman in this area and I wasn't able to fix this using userhelper. However, either of these two approaches worked:

Fix1) Edit /sbin/liveinst and replace "xhost +si:localuser:root" with "sudo -u liveuser xhost +si:localuser:root".

Fix2) Replace /usr/bin/liveinst->consolehelper symlink with a script that calls "sudo /sbin/liveinst".
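Fix1 can be sketched as a one-off edit. This is only an illustration under stated assumptions: `apply_fix1` is a hypothetical helper name, GNU sed (`-i`) is assumed, and the xhost line is assumed to read exactly as quoted above.

```shell
#!/bin/sh
# Hypothetical helper: prefix the xhost call in the given script with
# "sudo -u liveuser" (Fix1). Lines that already carry the prefix are
# skipped, so running it twice does not stack prefixes.
apply_fix1() {
    sed -i '/sudo -u liveuser xhost/!s/xhost +si:localuser:root/sudo -u liveuser &/' "$1"
}

# Intended use on the live image, as root:
#   apply_fix1 /sbin/liveinst
```

The `&` in the replacement stands for the matched text, so the original xhost arguments are kept unchanged.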


Since the number of affected users seems to be larger now, I'm re-proposing for yet another blocker discussion. The most obvious model situations are:
S1) Boot LiveCD with a network connection, then decide that you don't want the installer to download any files (updates) during the installation, so disable your network connection. (LiveCD doesn't download anything, but a lot of our users don't know that).
S2) Boot LiveCD and enable wifi, play with the system a bit. Then decide that cable is better for installation (faster download of updates), so disconnect wifi and connect cable.
S3) Boot LiveCD and connect wifi. Unfortunately your signal is weak and your wifi sometimes disconnects, so you need to reconnect every now and then. Experience this before running the installer.

Comment 73 Martin Krizek 2013-06-18 15:07:09 UTC

Confirming that A, B and C do not work and that F works. However, according to my testing, G does not work.

Comment 74 Kamil Páral 2013-06-18 15:14:55 UTC

(In reply to Martin Krizek from comment #73)
> However, according to my testing, G does not work.

Martin, I believe that is caused by the fact that you're on our corporate network, which uses dhcp hostnames. In that case this is also broken, correct. I was referring to the standard home networks of typical users, without dhcp hostnames.

If you can, try again at home, thanks. Josef also promised to try to verify this.

Comment 75 Adam Williamson 2013-06-18 17:04:48 UTC

At this point, I'm +1 blocker, I think. liveinst failing to run sucks, and the impact is now sufficiently broad. Thanks for the diligent testing, Kamil.

Comment 76 Jared Smith 2013-06-18 18:21:07 UTC

I'm going to give it a +1 blocker vote as well.  Seems sufficiently impactful and unfriendly that it warrants blocker status.

Comment 77 Brian Lane 2013-06-18 19:10:57 UTC

The core problem seems to be that when pam tries to transfer the xauth cookie from liveuser it fails when the network is off. After adding debug to the pam.d modules I see this in /var/log/secure

Jun 18 14:41:00 localhost userhelper[1862]: pam_xauth(liveinst:session): requesting user 1000/1000, target user 0/0
Jun 18 14:41:00 localhost userhelper[1862]: pam_xauth(liveinst:session): /home/liveuser/.xauth/export does not exist, ignoring
Jun 18 14:41:00 localhost userhelper[1862]: pam_xauth(liveinst:session): /root/.xauth/import does not exist, ignoring
Jun 18 14:41:00 localhost userhelper[1862]: pam_xauth(liveinst:session): reading keys from `/run/gdm/auth-for-liveuser-JRYMRD/database'
Jun 18 14:41:00 localhost userhelper[1862]: pam_xauth(liveinst:session): running "/usr/bin/xauth -f /run/gdm/auth-for-liveuser-JRYMRD/database nlist :0" as 1000/1000
Jun 18 14:41:00 localhost userhelper[1862]: pam_xauth(liveinst:session): no key

When running xauth from the cmdline:

[root@localhost ~]# xauth -f /run/gdm/auth-for-liveuser-sFq8bu/database nlist
0100 0009 6c6f63616c686f7374 0001 30 0012 4d49542d4d414749432d434f4f4b49452d31 0010 73ffbc144bfcb3de5384228b45d8493f
ffff 0009 6c6f63616c686f7374 0001 30 0012 4d49542d4d414749432d434f4f4b49452d31 0010 73ffbc144bfcb3de5384228b45d8493f

But if you add :0 to that it returns nothing.
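For readers of the nlist output above: each line is a 4-digit hex family followed by (length, hex data) pairs for the address, display number, auth name and key. The sketch below decodes the readable fields in POSIX sh; `hex_to_text` and `describe_entry` are made-up helper names for illustration and are not part of xauth.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of xauth).

# Turn a hex string such as 6c6f63616c into the bytes it encodes.
hex_to_text() {
    for byte in $(printf '%s' "$1" | sed 's/../& /g'); do
        # Convert the hex byte to octal, then emit it via an octal escape.
        printf "\\$(printf '%03o' "0x$byte")"
    done
}

# Print the readable parts of one `xauth nlist` line.
# Field layout: family, then (length, hex data) pairs for
# address, display number, auth name and key.
describe_entry() {
    set -- $1   # split the line into its nine whitespace-separated fields
    printf 'family=%s host=%s display=%s proto=%s\n' \
        "$1" "$(hex_to_text "$3")" "$(hex_to_text "$5")" "$(hex_to_text "$7")"
}

# First entry from the output quoted above.
describe_entry "0100 0009 6c6f63616c686f7374 0001 30 0012 4d49542d4d414749432d434f4f4b49452d31 0010 73ffbc144bfcb3de5384228b45d8493f"
```

For the first quoted entry this prints `family=0100 host=localhost display=0 proto=MIT-MAGIC-COOKIE-1`. Family 0100 is FamilyLocal and ffff is FamilyWild in Xauth.h, so both cookies are stored under the name "localhost".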

Comment 78 Adam Williamson 2013-06-19 17:31:55 UTC

bcl mentioned on IRC last night that one of kparal's proposed fixes does not work for him:

18-06-2013 10:03:11 < bcl!~bcl.com: sudo -u liveuser xhost +si:localuser:root would be fine with me
...
18-06-2013 10:05:58 < bcl!~bcl.com: I'll give it a quick test and send up a patch
...
18-06-2013 10:20:17 < bcl!~bcl.com: well that doesn't seem to work so well.
18-06-2013 10:20:43 > adamw: oh dear.
18-06-2013 10:24:35 < bcl!~bcl.com: at least it's easy to reproduce.

that's the point at which he dived into a deeper examination of the issue, the results of which are in c#77.

Comment 79 Kamil Páral 2013-06-19 17:40:19 UTC

Brian, I tried again and it works for me. What exactly doesn't work for you?

All I did:
1. boot Live
2. disable network
3. run terminal, login as root, edit /sbin/liveinst
4. edit line 120 and add "sudo -u liveuser" before xhost command
5. save, logout root
6. run liveinst as liveuser, see anaconda, profit

Do the same without editing the file and anaconda doesn't appear.

Comment 80 Adam Williamson 2013-06-19 17:52:52 UTC

Discussed at 2013-06-19 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-06-19/f19final-blocker-review-7.2013-06-19-16.01.log.txt . Given that kparal identified more cases in which this bug happens, and we seem to be seeing more and more people hitting it lately (I can recall a half dozen that I've just casually noticed in IRC and mailing list posts during F19 cycle), we decided the impact of this now appears to be broad enough to accept it as a release blocker for F19.

Comment 81 Brian Lane 2013-06-20 22:37:11 UTC

(In reply to Kamil Páral from comment #79)
> Brian, I tried again and it works for me. What exactly doesn't work for you?
> 
> All I did:
> 1. boot Live
> 2. disable network
> 3. run terminal, login as root, edit /sbin/liveinst
> 4. edit line 120 and add "sudo -u liveuser" before xhost command
> 5. save, logout root
> 6. run liveinst as liveuser, see anaconda, profit
> 
> Do the same without editing the file and anaconda doesn't appear.

Don't run it from the terminal, use the welcome screen or the desktop icon.

Comment 82 Brian Lane 2013-06-21 01:41:03 UTC

*** Bug 956242 has been marked as a duplicate of this bug. ***

Comment 83 Kamil Páral 2013-06-21 09:54:20 UTC

(In reply to Brian C. Lane from comment #81)
> Don't run it from the terminal, use the welcome screen or the desktop icon.

Ah, right. The following line added to /etc/sudoers fixes it:

> Defaults!/usr/bin/xhost !requiretty

Disclaimer: I'm not a security expert, I don't know the implications. But this can be easily changed just on the LiveCD overlay and not in the installed system.

Working around the xauth problem is starting to feel rather hackish, that's true.

Ajax, any chance of providing another fix for xauth?

Comment 84 Brian Lane 2013-06-21 13:31:23 UTC

(In reply to Kamil Páral from comment #83)
> (In reply to Brian C. Lane from comment #81)
> > Don't run it from the terminal, use the welcome screen or the desktop icon.
> 
> Ah, right. The following line added to /etc/sudoers fixes it:
> 
> > Defaults!/usr/bin/xhost !requiretty
> 
> Disclaimer: I'm not a security expert, I don't know the implications. But
> this can be easily changed just on the LiveCD overlay and not in the
> installed system.
> 
> Working around the xauth problem starts to feel a lot hackish, that's true.

Right. The real problem here is that we shouldn't need sudo -- the permissions are supposed to be handled by consolehelper, pam and xauth -- none of which should depend on the network being up or down.

Comment 85 satellitgo 2013-06-24 13:44:23 UTC

F19 remix of Desktop with gnome-classic-session and @sugar-desktop added to the .ks, logged in as the live user in a gnome-classic session (VirtualBox with bridged networking):
The "Install to Hard Drive" icon gets no response. In a terminal, "liveinst" run as the user asks for a password and fails when an empty password is entered; "su" behaves the same way. "sudo su" gives a root terminal, and "liveinst" run from there starts anaconda. Is the liveuser password (none) not available?

Comment 86 Kamil Páral 2013-06-24 14:20:23 UTC

I know now why disconnecting a network connection breaks xauth. "hostnamectl" returns the same data ("localhost") in both cases, but "hostname" command returns "localhost" after boot, but "localhost.localdomain" after network device is disconnected. So it is essentially the same problem as with dhcp hostnames - the hostname is changed after session start.
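A rough way to check for this mismatch, assuming (as this thread suggests) that xauth looks up the cookie for a local display under the current hostname: test whether the `xauth list` output has a local-socket entry for the name that `hostname` returns now. `has_local_cookie` is a hypothetical helper, and the sample data is the list output format seen on these live images.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `xauth list` text in $1 contains a
# local-socket entry ("<host>/unix:<n>") for the host name given in $2.
has_local_cookie() {
    printf '%s\n' "$1" |
        awk -v h="$2" 'index($0, h "/unix:") == 1 { found = 1 } END { exit !found }'
}

# Sample entry in the format printed by `xauth list` on a live image.
sample='localhost/unix:0  MIT-MAGIC-COOKIE-1  82d155f153d57ddb2876fed0f78f488f'

for name in localhost localhost.localdomain; do
    if has_local_cookie "$sample" "$name"; then
        echo "$name: cookie found"
    else
        echo "$name: no cookie"
    fi
done
```

On a running live session the same check could be called as `has_local_cookie "$(xauth list)" "$(hostname)"`.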

(In reply to Kamil Páral from comment #67)
> I have built a custom LiveCD where I have pre-created /etc/hostname
> containing "localhost". This work-around fixes the problem, anaconda starts
> correctly on a such system. If the xauth fixes prove to be difficult and
> long-term, something like this could be a way to go.

Have we forgotten about this fix? I re-verified it, everything works properly if it is included. Here's a patch to spin-kickstarts:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/fedora-live-base.ks b/fedora-live-base.ks
index f87373a..7227308 100644
--- a/fedora-live-base.ks
+++ b/fedora-live-base.ks
@@ -219,6 +219,10 @@ FOE
 chmod +x /sbin/halt.local
 fi
 
+# add static hostname to work around xauth bug
+# https://bugzilla.redhat.com/show_bug.cgi?id=679486
+echo "localhost" > /etc/hostname
+
 EOF
 
 # bah, hal starts way too late
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I think this is the simplest fix we can have right now.

Comment 87 Miloslav Trmač 2013-06-24 18:54:29 UTC

It seems pam_xauth is just not working (tested on Fedora-Live-Desktop-x86_64-19-TC6-1.iso: booted in a VM, then switched the wired interface off on the panel applet).


Unprivileged user's session; note the "localhost/unix:0" entry which is presumably hostname-independent.  All three variables are set.
> [liveuser@localhost ~]$ echo $HOME $XAUTHORITY $DISPLAY; xauth list
> /home/liveuser /run/gdm/auth-for-liveuser-duYjGM/database :0
> localhost/unix:0  MIT-MAGIC-COOKIE-1  82d155f153d57ddb2876fed0f78f488f
> #ffff#6c6f63616c686f7374#:0  MIT-MAGIC-COOKIE-1  82d155f153d57ddb2876fed0f78f488f


(su -): New HOME, XAUTHORITY and DISPLAY not set at all (!?)
> [liveuser@localhost ~]$ su -
> [root@localhost ~]# echo $HOME $XAUTHORITY $DISPLAY; xauth list
> /root
> xauth:  file /root/.Xauthority does not exist


(sudo -i); all variables preserved, point to the original database; does that make sense?  (AFAICS (sudo) and (sudo -i) don't use pam_xauth at all).
> $ sudo -i sh
> sh-4.2# echo $HOME $XAUTHORITY $DISPLAY; xauth list
> /root /run/gdm/auth-for-liveuser-duYjGM/database :0
> localhost/unix:0  MIT-MAGIC-COOKIE-1  82d155f153d57ddb2876fed0f78f488f
> #ffff#6c6f63616c686f7374#:0  MIT-MAGIC-COOKIE-1  82d155f153d57ddb2876fed0f78f488f


consolehelper: modified /usr/sbin/liveinst to dump the same information:
> $ head -n 3 /usr/sbin/liveinst 
> #!/bin/bash
> echo $HOME $XAUTHORITY $DISPLAY; xauth list
> exit
result: DISPLAY set, but not XAUTHORITY
> $ liveinst
> /root :0
> xauth:  file /root/.Xauthority does not exist


With both (su -) and consolehelper, the debug output of pam_xauth includes a "no key" message that Brian quoted in comment #77; the fact that "xauth ... :0" does not list any matching key seems to be the approximate root cause of the problem here.

Comment 88 Adam Williamson 2013-06-24 19:03:00 UTC

Can we please have thoughts from everyone about whether https://bugzilla.redhat.com/show_bug.cgi?id=679486#c86 would be a sufficient fix for the biggest practical bug here - liveinst failing to run in various circumstances - for F19 Final, for which the go/no-go meeting is *in 3 days*? Thanks.

Comment 89 Jaroslav Reznik 2013-06-24 19:46:15 UTC

From my today's communication with ajax:
[16:32] <jreznik> ping, do you think you can do more with https://bugzilla.redhat.com/show_bug.cgi?id=679486 now or we should go with workaround kamil mentioned in #86?
[18:02] <ajax> i'm probably not going to have time to look into it today
[18:02] <ajax> the workaround seems reasonable, though i'd hope there's a better way

Comment 90 Steve Tyler 2013-06-24 19:56:50 UTC

(In reply to Kamil Páral from comment #86)
...
> (In reply to Kamil Páral from comment #67)
> > ... I have pre-created /etc/hostname
> > containing "localhost". This work-around fixes the problem, anaconda starts
> > correctly on a such system. If the xauth fixes prove to be difficult and
> > long-term, something like this could be a way to go.
> 
> Have we forgotten about this fix? I re-verified it, everything works
> properly if it is included. Here's a patch to spin-kickstarts:
...
> +# add static hostname to work around xauth bug
> +# https://bugzilla.redhat.com/show_bug.cgi?id=679486
> +echo "localhost" > /etc/hostname
...
> I think this is the simplest fix we can have right now.

Thanks for pointing that out, Kamil.

livecd-tools used to create /etc/hostname, but this commit appears to have changed that:

write hostname to /etc/hostname (#870805)
author	Brian C. Lane <bcl>	2012-12-04 18:52:19 (GMT)
https://git.fedorahosted.org/cgit/livecd/commit/imgcreate/kickstart.py?id=2f58f519a2693d4eecac9adb968061c503c18ab1

See also:
Bug 870805 - imgcreate needs update for sysconfig changes (systemd >= 195)

Comment 91 Adam Williamson 2013-06-24 19:58:32 UTC

https://git.fedorahosted.org/cgit/spin-kickstarts.git/commit/?h=f19&id=bc4e104add2b06680c057735c2083559b1090ae5

Comment 92 Kamil Páral 2013-06-25 13:05:15 UTC

Verified fixes with RC1.

Comment 93 Adam Williamson 2013-06-26 04:37:02 UTC

For the record, I'm waiting till we have a spin-kickstarts package build to close all these bugs fixed in spin-kickstarts; seems like a good way to make sure we get a spin-kickstarts build for final.

Comment 94 Fedora Update System 2013-06-26 15:17:35 UTC

spin-kickstarts-0.19.7-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/spin-kickstarts-0.19.7-1.fc19

Comment 95 Fedora Update System 2013-06-28 04:46:20 UTC

spin-kickstarts-0.19.8-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/spin-kickstarts-0.19.8-1.fc19

Comment 96 Fedora Update System 2013-06-28 07:16:44 UTC

spin-kickstarts-0.19.8-1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 97 Petr Schindler 2013-08-26 08:23:15 UTC

I'm reopening this bug because the master and f20 branches don't contain the patch from comment 91, so the bug is present again in F20 Alpha TC1.

Comment 98 Kamil Páral 2013-08-26 10:31:42 UTC

Adjusting the fields, and proposing for Alpha blocker discussion. See bug summary in comment 72. It was accepted as F19 Final Blocker, because we found out about additional use cases very late in the cycle. We might want to block earlier this time.

Comment 99 Kamil Páral 2013-08-26 12:21:53 UTC

I have verified that the /etc/hostname fix still works. It just needs to be included in spin-kickstarts, master and f20.

Comment 100 Kamil Páral 2013-08-28 17:23:02 UTC

Discussed at 2013-08-28 blocker review meeting [1]. This is rejected as an Alpha blocker, but accepted as an Alpha Freeze Exception. It doesn't violate any F20 alpha release criteria but a tested fix would be considered after freeze.

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-08-28/

Comment 101 Kamil Páral 2013-08-28 17:35:18 UTC

Committed to f20 and master:
https://git.fedorahosted.org/cgit/spin-kickstarts.git/commit/?h=f20&id=802966ff92ef408d269d4efaf6601c2e8e640098

Comment 102 Petr Schindler 2013-08-29 12:59:30 UTC

I can verify that the fix solves the problem on my machine.

Comment 105 Petr Schindler 2013-09-13 12:15:06 UTC

This bug is fixed in spin-kickstarts-0.20.18-1.fc20. Closing bug.
