Bug 1370222 - session/apps fail to start if hostname changes during boot due to network (infamous xauth issue)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: sddm
Version: 25
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Martin Bříza
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker https://fedoraproject...
Duplicates: 1161967 1350107 1351149 1383757 1426698 1426707 1431872 1441845 1442749 1442750 1452140 1452143
Depends On:
Blocks: 1364278 F27BetaBlocker 1467316 1483185 1483187 1483189 1483197 1487476
Reported: 2016-08-25 14:59 UTC by Petr Schindler
Modified: 2017-09-02 22:23 UTC
CC List: 50 users

Fixed In Version: sddm-0.14.0-14.fc26
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-02 22:23:29 UTC
Type: Bug
Embargoed:


Attachments
photo of screen when the boot ends (1.77 MB, image/jpeg), 2016-08-25 14:59 UTC, Petr Schindler
boot.log (5.30 KB, text/plain), 2016-08-30 12:14 UTC, Kamil Páral
journal (164.70 KB, text/plain), 2016-08-30 12:14 UTC, Kamil Páral
rpm-qa (55.07 KB, text/plain), 2016-08-30 12:14 UTC, Kamil Páral
systemctl-status (160.30 KB, text/plain), 2016-08-30 12:14 UTC, Kamil Páral
Xorg.0.log (19.19 KB, text/plain), 2016-08-30 12:14 UTC, Kamil Páral
File: backtrace (19.14 KB, text/plain), 2017-07-03 09:03 UTC, Jan Sedlák


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 679486 0 low CLOSED Liveinst doesn't start if hostname changes 2021-02-22 00:41:40 UTC

Internal Links: 679486

Description Petr Schindler 2016-08-25 14:59:05 UTC
Description of problem:
On one of our computers, KDE Live fails to boot in most cases. It hangs after a while and waits for something; a photo of the final screen is attached. The system sometimes boots, but mostly not. When I boot it with systemd.log_level=debug systemd.log_target=console, it boots every time.

When I disconnect the computer from the network (pull out the cable) the image boots.

Last rows of output suggest that ipv6 is somehow connected to the problem. We have ipv6 connectivity just in our office but not outside of our office.

Version-Release number of selected component (if applicable):
NetworkManager-1.4.0-0.5.git20160621.072358da.fc24.x86_64
systemd-231-3.fc25.x86_64
Fedora-KDE-Live-x86_64-25_Alpha-2.iso

How reproducible:
Most of the time (90% of cases is my guess). It happens only on one computer where I've tried it. All are connected to the same network.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
I propose this as an alpha blocker as it violates the criterion: "Release-blocking live images must boot to the expected boot menu, and then to a desktop or to a login prompt where it is clear how to log in to a desktop."
This seems to be really some race condition which doesn't appear on all computers and not always. And there is quite easy workaround (run with cable pulled out and connect it after boot is complete). I'm -1 blocker and +1 FE

Comment 1 Petr Schindler 2016-08-25 14:59:59 UTC
Created attachment 1194053 [details]
photo of screen when the boot ends

Comment 2 Kamil Páral 2016-08-25 17:20:52 UTC
Discussed today at Go/NoGo meeting. Voted as RejectedBlocker - as so far this only appears to affect a single system, and a couple of workarounds were already discovered, this doesn't seem to merit blocker status

Comment 3 Thomas Haller 2016-08-26 10:34:30 UTC
Sorry, the screenshot doesn't reveal to me what's wrong. Is there any way to gather log files?

Comment 4 Kamil Páral 2016-08-26 11:01:25 UTC
That's the problem, we're not sure how. We can supply a log when the Live system boots properly, but everything will probably look OK in there. If we try to debug the issue and add "systemd.log_level=debug systemd.log_target=console" to kernel cmdline, it also boots OK. We also see it just on one bare metal machine, so no serial console etc. Any idea how to get useful info about this race condition?

There's a possibility that this is not directly related to NetworkManager, but can be something in systemd, or plymouth, or somewhere else. But unplugging the cable makes the problem reliably go away, so it has to be at least somehow related to network.

Comment 5 Kamil Páral 2016-08-30 12:13:56 UTC
It turns out there's a working login prompt on VT2, even though the system seems stuck booting on VT1. I gathered the logs and attach them. It seems this is more likely a sddm issue, it simply doesn't start. 

If I restart sddm.service, it starts OK.

The network cable "hack" probably just affects the race condition slightly so that we no longer hit it.

Comment 6 Kamil Páral 2016-08-30 12:14:28 UTC
Created attachment 1195872 [details]
boot.log

Comment 7 Kamil Páral 2016-08-30 12:14:36 UTC
Created attachment 1195873 [details]
journal

Comment 8 Kamil Páral 2016-08-30 12:14:42 UTC
Created attachment 1195874 [details]
rpm-qa

Comment 9 Kamil Páral 2016-08-30 12:14:49 UTC
Created attachment 1195875 [details]
systemctl-status

Comment 10 Kamil Páral 2016-08-30 12:14:55 UTC
Created attachment 1195876 [details]
Xorg.0.log

Comment 11 Rex Dieter 2016-08-30 15:50:32 UTC
I've occasionally seen similar symptoms in previous releases, and it was often caused by xauth failures (asynchronous startup where dhcp sets/changes hostname, and now xauth is invalid).  Of course, that shouldn't happen

Comment 12 Kamil Páral 2016-08-31 10:56:57 UTC
Bingo! It's related to both X11 and network, and matches our testing (no issue when network disconnected). We even fixed this in the past, see bug 679486#c101.

The patch is still there, but the behavior changed. This is when I boot KDE/Workstation Live with network cable plugged in:

$ hostname
dhcp-28-109.brq.redhat.com

$ hostnamectl
   Static hostname: localhost
Transient hostname: dhcp-28-109.brq.redhat.com
         Icon name: computer-desktop

And this is when I boot with network cable unplugged:

$ hostname
localhost

$ hostnamectl
   Static hostname: localhost
         Icon name: computer-desktop


Whereas if I boot Fedora 20 Live, there's no "Transient hostname" field in hostnamectl output even with network enabled, and `hostname` reports "localhost" *always*. So this is yet another incarnation of bug 679486.
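The difference between the two boots comes down to whether `hostnamectl` reports a "Transient hostname" line. Below is a minimal sketch for pulling that field out of the output, assuming the `Label: value` layout shown above. The function name `hostnamectl_field` is made up for illustration and is not part of any tool.

```shell
# hostnamectl_field LABEL: read `hostnamectl` output on stdin and print the
# value of the line whose label is LABEL (e.g. "Transient hostname").
# Prints nothing if the field is absent.
hostnamectl_field() {
    sed -n "s/^[[:space:]]*$1:[[:space:]]*//p"
}

# Output as captured above with the network cable plugged in.
sample='   Static hostname: localhost
Transient hostname: dhcp-28-109.brq.redhat.com
         Icon name: computer-desktop'

printf '%s\n' "$sample" | hostnamectl_field 'Transient hostname'
```

On a running system the same check would be `hostnamectl | hostnamectl_field 'Transient hostname'`; an empty result means no transient hostname is in effect.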

It also explains why we hit it so often on one particular computer - its network card is slightly faster or slower than in other computers. We don't hit this issue with GNOME at all, but that might be just a race, and it can be affected as well.

It also explains why so few people complain about this, because not that many people have DNS which assigns hostnames to computers.

I'm not sure whether to keep this assigned to sddm, or change it to xauth or systemd or spin-kickstarts. We probably need to investigate why the hostname behavior changed in the first place (and how to prevent it once again).

Comment 13 Kamil Páral 2016-08-31 10:59:08 UTC
(In reply to Kamil Páral from comment #12)
> We don't
> hit this issue with GNOME at all, but that might be just a race, and it can
> be affected as well.

Now that I think of it, the reason also might be the fact that GNOME now uses Wayland by default, not X11.

Comment 14 Rex Dieter 2016-08-31 11:22:49 UTC
Reassigning to xorg-x11-xauth per the prior bug (it's almost certainly not sddm's job to make sure xauth is valid in this case)

If the analysis is true, I think this ought to be reconsidered as a f25 blocker (similar to bug #679486 was).

Comment 15 Kamil Páral 2016-08-31 12:46:26 UTC
There, I fixed it:

> diff --git a/fedora-live-base.ks b/fedora-live-base.ks
> index d1401a4..1465e07 100644
> --- a/fedora-live-base.ks
> +++ b/fedora-live-base.ks
> @@ -214,7 +214,9 @@ touch /.liveimg-configured
>  
>  # add static hostname to work around xauth bug
>  # https://bugzilla.redhat.com/show_bug.cgi?id=679486
> -echo "localhost" > /etc/hostname
> +# the hostname must be something else than 'localhost'
> +# https://bugzilla.redhat.com/show_bug.cgi?id=1370222
> +echo "localhost-live" > /etc/hostname
>  
>  EOF

From `man hostnamectl`:
"If a static hostname is set, and is valid (something other than localhost), 
then the transient hostname is not used."

So, as long as the hostname is "localhost", the transient hostname is retrieved from network and used. I used the kickstart patch above and built a new KDE Live image (using "localhost-live" hostname), and the problem is gone - KDE boots just OK and the hostname is "localhost-live" every time.
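The rule quoted from the man page can be written down as a small predicate. This is only a sketch of the behavior described here, not systemd's actual logic: the function name is hypothetical, and treating an empty name and "localhost.localdomain" the same as "localhost" is an assumption of the sketch.

```shell
# static_hostname_is_placeholder NAME: succeed if NAME is a static hostname
# that would let a transient (DHCP-provided) hostname take over.
# Assumption: '' and localhost.localdomain behave like localhost.
static_hostname_is_placeholder() {
    case "$1" in
        ''|localhost|localhost.localdomain) return 0 ;;
        *) return 1 ;;
    esac
}

for name in localhost localhost-live; do
    if static_hostname_is_placeholder "$name"; then
        echo "$name: transient hostname may replace it"
    else
        echo "$name: static hostname wins"
    fi
done
```

With the patched kickstart the static hostname is "localhost-live", so the second branch applies and the hostname should stay put during boot.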

Since bug 679486 was accepted as a blocker in the past, and this is basically the same bug, I'm proposing this one as well.

Now we just need to decide whether we want to use the fix above, or something else. Bug 679486 is definitely too long to read, but it seems the kickstart hack was accepted as a temporary workaround until xauth can solve this better. From that POV, updating the workaround is probably the simplest thing to do, and we'll continue waiting for a proper fix somewhere else in the stack...

Comment 16 Hans de Goede 2016-08-31 13:51:25 UTC
(In reply to Rex Dieter from comment #14)
> Reassigning to xorg-x11-xauth per the prior bug 

Except that the prior bug was not fixed at the xauth side at all, it was fixed with this workaround:

https://git.fedorahosted.org/cgit/spin-kickstarts.git/commit/?h=f20&id=802966ff92ef408d269d4efaf6601c2e8e640098

> (it's almost certainly not sddm's job to make sure xauth is valid in this case)

Given that sddm is setting up the xauth file, it certainly is sddm's job and no one else's! Also, xauth the cmdline tool is likely not even involved. IIRC gdm creates the file itself using libXau, and I kinda expect sddm to do the same. But either way the garbage in, garbage out principle applies. There really is nothing xauth or libXau can do to fix things here.

Maybe the sddm .service needs to be amended so that it does not start until the network is up ?

Comment 17 Rex Dieter 2016-08-31 14:30:36 UTC
xauth is (most likely) valid at the time of its creation.  The problem is the system's hostname changing, which is apparently invalidating xauth.
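To illustrate the invalidation: a hostname-keyed entry in `xauth list` output only helps clients that look it up under the same hostname. The sketch below is a simplified model of that lookup, not what Xlib or libXau actually do. The function name and the cookie value are made up, and the entry layout (`HOST/unix:N  MIT-MAGIC-COOKIE-1  <hex>`) is assumed.

```shell
# xauth_entry_for_host HOST: read `xauth list` output on stdin and succeed if
# some entry is keyed to HOST, i.e. its first column starts with "HOST/unix:".
xauth_entry_for_host() {
    want_host=$1
    while read -r display _rest; do
        case "$display" in
            "$want_host"/unix:*) return 0 ;;
        esac
    done
    return 1
}

# Entry created while the hostname was still "localhost" (made-up cookie).
entries='localhost/unix:0  MIT-MAGIC-COOKIE-1  0123456789abcdef0123456789abcdef'

for host in localhost dhcp-28-109.brq.redhat.com; do
    if printf '%s\n' "$entries" | xauth_entry_for_host "$host"; then
        echo "$host: matching entry found"
    else
        echo "$host: no matching entry"
    fi
done
```

Once DHCP renames the machine, the lookup happens under the new name and the entry written earlier no longer matches.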

I doubt waiting for network is an acceptable fix or workaround here.

Comment 18 Kamil Páral 2016-09-01 08:30:58 UTC
(In reply to Hans de Goede from comment #16)
> Except that the prior bug was not fixed at the xauth side at all, it was
> fixed with this workaround:
> 
> https://git.fedorahosted.org/cgit/spin-kickstarts.git/commit/
> ?h=f20&id=802966ff92ef408d269d4efaf6601c2e8e640098

That was just a workaround, because nobody fixed it properly. I'm no expert in this, but from bug 679486 discussed I gathered this is really a problem either in X11 (xauth), or lower in the stack (dhcp, systemd). It seems very silly that X11 relies on hostname not changing in runtime and then dhcp changes it in runtime. Lennart comments on that in bug 679486 comment 35, and then Ajax tried to fix it (in xauth) in bug 679486 comment 43 and bug 679486 comment 68. That led me to a conclusion that this is really something we want to have fixed in xauth.

> Maybe the sddm .service needs to be amended so that it does not start until
> the network is up ?

Please don't. We're talking about workstations here, not servers. Network connections come and go, at any time. The system must be able to reflect that.

Comment 19 Hans de Goede 2016-09-01 10:49:14 UTC
(In reply to Kamil Páral from comment #18)
> (In reply to Hans de Goede from comment #16)
> > Except that the prior bug was not fixed at the xauth side at all, it was
> > fixed with this workaround:
> > 
> > https://git.fedorahosted.org/cgit/spin-kickstarts.git/commit/
> > ?h=f20&id=802966ff92ef408d269d4efaf6601c2e8e640098
> 
> That was just a workaround, because nobody fixed it properly. I'm no expert
> in this, but from bug 679486 discussed I gathered this is really a problem
> either in X11 (xauth), or lower in the stack (dhcp, systemd). It seems very
> silly that X11 relies on hostname not changing in runtime and then dhcp
> changes it in runtime. Lennart comments on that in bug 679486 comment 35,
> and then Ajax tried to fix it (in xauth) in bug 679486 comment 43 and bug
> 679486 comment 68. That led me to a conclusion that this is really something
> we want to have fixed in xauth.

libXau actually, but that does not matter. So ajax had a patch for this back when this was an issue the first time, but no one tested it!  If someone can reliably reproduce this it might be a good idea to drop ajax a mail and kindly ask if he can dig out that patch and do a libXau build with it to test.

Comment 20 Kamil Páral 2016-09-01 11:10:30 UTC
I tested it in bug 679486 comment 69 and it did not work for me. I can reliably reproduce the issue and I'll be happy to test any new version of that fix.

Ajax (adding to CC), do you have some new iteration of that patch? Thanks.

Comment 21 Kamil Páral 2016-09-19 15:08:42 UTC
I re-proposed this as a blocker in comment 15, but I forgot to remove an important keyword. Fixing.

Comment 22 Petr Schindler 2016-09-19 18:16:54 UTC
Discussed at 2016-09-19 blocker review meeting: [1]. 

This bug was accepted as Final blocker: This bug violates the final criterion "Release-blocking live images must boot to the expected boot menu, and then to a desktop or to a login prompt where it is clear how to log in to a desktop."

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2016-09-19/

Comment 23 Steven Haigh 2016-10-09 06:53:50 UTC
For some extra data, I also see this on the F25 beta ISO for the KDE Spin live environment.

Restarting SDDM via 'systemctl restart sddm' gives me a live desktop as expected.

Comment 24 Rex Dieter 2016-10-28 14:52:28 UTC
Why did this get reassigned back to sddm?  sddm is setting xauth correctly, as far as I can tell.  The problem is that it is getting invalidated (by hostname changes?).

Bug referenced in comment #20 is against libXau , assigning there (unless anyone has a better idea)

Comment 25 Ray Strode [halfline] 2016-10-28 18:04:34 UTC
sddm should be creating FamilyWild type xauth entries if it isn't already. They're immune to hostname changes.  Also sddm could set up xhost ServerInterpreted entries for the localuser, so the xauth file isn't required.

Here's some code to follow for setting up FamilyWild:

https://git.gnome.org/browse/gdm/tree/daemon/gdm-x-session.c#n107

Note the auth entry is written twice, but on line 145 the family is changed to FamilyWild.

Here's some code for setting up xhost ServerInterpreted entries:

https://git.gnome.org/browse/gdm/tree/daemon/gdm-display.c#n1695
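The gdm code linked above does this in C. For a rough idea of what a ServerInterpreted localuser entry looks like from the command line, here is a sketch using the `si:localuser:NAME` form that xhost(1) accepts. The helper name `localuser_acl` is hypothetical, and the block only prints the command rather than running it, since xhost needs a running X server.

```shell
# localuser_acl USER: print the server-interpreted access entry for USER,
# in the form accepted by xhost(1).
localuser_acl() {
    printf 'si:localuser:%s\n' "$1"
}

entry=$(localuser_acl "$(id -un)")
echo "would run: xhost +$entry"
# With a running X server and DISPLAY set, the actual call would be:
#   xhost "+$entry"
```

Such an entry grants access by local user name rather than by cookie, which is why it does not depend on the hostname.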

Comment 26 Rex Dieter 2016-10-31 12:27:37 UTC
Thanks, looks like sddm currently does not support handling of xauth entries natively, but calls /usr/bin/xauth to do that... So, implementing your suggestion may not be trivial.

Comment 27 Kamil Páral 2016-10-31 12:52:49 UTC
Since I don't expect libXau to fix this in 1 week (before F25 release), I'm reassigning this to spin-kickstarts, and I propose to include the workaround described in comment 15.

Comment 28 Rex Dieter 2016-10-31 17:11:01 UTC
fyi, submitted sddm/RFE upstream 
https://github.com/sddm/sddm/issues/733

Comment 29 Kevin Fenzi 2016-10-31 17:27:44 UTC
Can someone submit a fedora-kickstarts PR then?

Comment 30 Kamil Páral 2016-10-31 18:13:02 UTC
fedora-kickstarts pull requests proposed and merged:
https://pagure.io/fedora-kickstarts/pull-request/89
https://pagure.io/fedora-kickstarts/pull-request/90

We'll see in the next compose. Reassigning back to libXau for a proper fix, if possible.

Comment 31 Adam Williamson 2016-11-04 23:44:05 UTC
Can someone re-test this with a current nightly? We can drop blocker status if it's successfully worked-around.

Comment 32 Petr Schindler 2016-11-07 13:57:34 UTC
I just tested this with Fedora-KDE-Live-x86_64-25-20161106.n.0.iso and it boots fine on affected computers.

Comment 33 Kamil Páral 2016-11-07 14:00:08 UTC
Dropping blocker status, resetting for libXau.

Comment 34 Hans de Goede 2016-11-08 11:33:41 UTC
Hi,

(In reply to Kamil Páral from comment #33)
> Dropping blocker status, resetting for libXau.

As explained in comment 25, the problem is not libXau (*); the problem is sddm creating the wrong type of xauth entry. There is a specific type of xauth entry which is not coupled to a hostname, the FamilyWild type.

Now since sddm is creating normal xauth entries, which are by design coupled to a hostname, it really is not libXau's problem that they do not survive a hostname change. Changing the component back to sddm.

Regards,

Hans


*) According to comment 26 sddm is not even using libXau, which is part of the problem

Comment 35 Ray Strode [halfline] 2016-11-14 15:03:04 UTC
of course, it's arguably a /usr/bin/xauth bug that it can't create FamilyWild entries.  It also can't create -displayfd friendly entries, but that's another matter entirely...

Comment 36 Rex Dieter 2017-01-30 12:57:28 UTC
*** Bug 1350107 has been marked as a duplicate of this bug. ***

Comment 37 Gerald Cox 2017-01-30 15:36:04 UTC
Guys, any idea when this is going to be fixed?  I'm able to work around it and have in the meantime changed to using kdm, but a casual user who experiences this issue isn't going to have a clue as to what to do.  All they get is a black screen.  I realize everybody is busy, but since KDE now has sddm as the default, it should escalate the priority a bit.

Comment 38 Rex Dieter 2017-01-30 15:38:24 UTC
Upstream is tracking this at:
https://github.com/sddm/sddm/issues/733

Comment 39 Gerald Cox 2017-01-30 16:12:39 UTC
(In reply to Rex Dieter from comment #38)
> Upstream is tracking this at:
> https://github.com/sddm/sddm/issues/733

Thanks Rex... I saw that... I've been dealing with this problem since June 2016, and you provided upstream with some code at the end of October 2016.  I might have missed it, but it doesn't look like there's a lot of movement on their end to resolve it.

If upstream can't or won't fix this in a timely manner perhaps Fedora should re-evaluate having SDDM as the default for KDE.  If it works for people, they can continue to use it - but booting up and some people getting a black screen is really not acceptable.  I have experienced this on two separate laptops now and have had to revert to kdm.  

Don't get me wrong... I much prefer SDDM - but if it has a problem like this and they can't resolve it in a timely manner, we shouldn't keep it as the default.

Comment 40 Rex Dieter 2017-02-19 05:17:36 UTC
*** Bug 1161967 has been marked as a duplicate of this bug. ***

Comment 41 Rex Dieter 2017-02-19 05:51:37 UTC
*** Bug 1161967 has been marked as a duplicate of this bug. ***

Comment 42 Gerald Cox 2017-02-19 13:36:39 UTC
When reading through this bug I somehow missed the fact that the SDDM issue has a simple workaround... duh!  Change /etc/hostname from localhost to match the DHCP-issued hostname.  The hard part is going to be communicating this to people; unfortunately most people don't read release notes, and sifting through comments in bugzilla tends to make one's eyes glaze over.  I'll update upstream with this circumvention.  Hopefully, they can fix it soon.

Comment 43 Rex Dieter 2017-02-20 16:41:19 UTC
*** Bug 1383757 has been marked as a duplicate of this bug. ***

Comment 44 Rex Dieter 2017-02-24 16:01:30 UTC
*** Bug 1426698 has been marked as a duplicate of this bug. ***

Comment 45 Rex Dieter 2017-02-24 16:24:22 UTC
*** Bug 1426707 has been marked as a duplicate of this bug. ***

Comment 46 Rex Dieter 2017-03-16 12:54:04 UTC
*** Bug 1351149 has been marked as a duplicate of this bug. ***

Comment 47 Rex Dieter 2017-03-17 11:46:42 UTC
*** Bug 1431872 has been marked as a duplicate of this bug. ***

Comment 48 Andrej Podzimek 2017-03-19 10:20:17 UTC
In my case the "simple workaround" can't be applied, because hostname, hostnamectl and /etc/hostname all show/contain the same hostname already, yet I still have a black screen in sddm.

Comment 49 Bruce Petrie 2017-04-07 03:24:19 UTC
I have been experiencing SDDM failures as described here on F23 and F24. GDM based hosts don't have this issue. I am running an access point with a local LAN domain name and MAC based IP addresses provided by DHCP. 

hostnamectl was showing a transient hostname different than the static hostname. 
systemctl restart sddm.service from alternate console restores SDDM login screen. 

On the SDDM host, hard coded in /etc/hosts:

IPaddr thehostname.localdomainname

rebooted and SDDM login works every time now. hostnamectl no longer shows transient hostname.

Comment 50 Rex Dieter 2017-04-13 02:26:26 UTC
*** Bug 1441845 has been marked as a duplicate of this bug. ***

Comment 51 Rex Dieter 2017-04-17 11:49:19 UTC
*** Bug 1442749 has been marked as a duplicate of this bug. ***

Comment 52 Rex Dieter 2017-04-17 11:49:22 UTC
*** Bug 1442750 has been marked as a duplicate of this bug. ***

Comment 53 Rex Dieter 2017-05-18 13:11:51 UTC
*** Bug 1452143 has been marked as a duplicate of this bug. ***

Comment 54 Rex Dieter 2017-05-18 13:48:04 UTC
*** Bug 1452140 has been marked as a duplicate of this bug. ***

Comment 55 Jan Sedlák 2017-07-03 09:03:07 UTC
Similar problem has been detected:

Fresh install from KDE Live image, first boot worked, sddm stopped working after first reboot.

reporter:       libreport-2.9.1
backtrace_rating: 4
cmdline:        /usr/bin/sddm-greeter --socket /tmp/sddm-:0-vXcsYL --theme /usr/share/sddm/themes/01-breeze-fedora
crash_function: qt_message_fatal
executable:     /usr/bin/sddm-greeter
journald_cursor: s=9b28215428164879a30f27af36ba3700;i=d29;b=f16146e7c50f4a0db1fc6319d2b51ea1;m=54063e;t=55365d6827f16;x=47bb7b5299192d8f
kernel:         4.11.8-300.fc26.x86_64
package:        sddm-0.14.0-10.fc26
reason:         sddm-greeter killed by signal 6
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            991

Comment 56 Jan Sedlák 2017-07-03 09:03:28 UTC
Created attachment 1293780 [details]
File: backtrace

Comment 57 Jan Sedlák 2017-07-03 10:54:03 UTC
I'm experiencing this problem on freshly installed Fedora 26 with KDE in VM. I'm using NAT networking in VM and it looks like libvirt/virt-manager somehow "remembers" that my hostname was "localhost-live" during installation and it assigns me this hostname even on installed system after reboot. If I delete virtual network card and then add it again, it forgets my "localhost-live" hostname, it takes hostname from /etc/hostname and then it works OK.

Comment 58 Kamil Páral 2017-07-06 14:06:12 UTC
Proposing as F26 blocker, as yet another round in the endless saga. KDE system installed from F26 KDE Live doesn't boot on subsequent reboots, sddm always (for me) crashes. IOW, you can reboot after installation (I'm not completely sure why that works, probably because initial-setup delays sddm startup), but on any other subsequent reboot, you only get a black screen. This happens consistently in a libvirt (virt-manager) environment on F26 with NAT networking. It might affect some home or enterprise networking, probably depending on how the DHCP server is configured. (I can't really guess how many users can be affected).

Here's my *theory* how it happens: DHCP server has the ability to provide a hostname for a machine. In (usually) enterprise setups that often broke GNOME (until Wayland) and KDE LiveCD boot functionality, and we fixed that by using temporary "localhost-live" hostname for LiveCDs (comment 15). That hostname is only set in tmpfs overlay, so it doesn't affect the installed system. However, the DHCP servers seem to have a few more tricks up their sleeve, one of them being to remember an existing hostname of a machine, and then send the hostname back to the machine when the machine reconnects. (This is just my guess, I have zero experience with DHCP). Latest libvirt seems to have gained this new feature. So if you have a default libvirt NAT network, and a VM reboots, the integrated DHCP server sends the previous hostname back to the VM. That's why, if you install an F26 Live image (GNOME or KDE), and you reboot, you still see "localhost-live" in your shell prompt - because it's a transient hostname retrieved by systemd from dhcp, even though your static hostname is (by default) "localhost.localdomain". GNOME uses wayland so it doesn't matter, but KDE breaks if hostname changes during boot.

For the record, this is the output after a default F26 KDE Live installation (rebooted into the system) in libvirt NAT:

[kparal@localhost-live ~]$ hostnamectl 
   Static hostname: localhost.localdomain
Transient hostname: localhost-live.default
         Icon name: computer-vm
...

Once your system only boots to a black screen (because it has localhost.localdomain static hostname, but receives localhost-live.default transient hostname from libvirt dhcp), there are the following ways to fix it:

1. Make libvirt forget the old hostnames. I found them to be stored in /var/lib/libvirt/dnsmasq/virbr0.status (in my case), so just removing that file and restarting libvirtd could probably do the trick (not tested). However, this only fixes the issue until your next static hostname change.

2. Remove and re-add the network card in virt-manager, which makes you get a different MAC address, and therefore not get the old hostname from dhcp (tested by Jan above). Or edit the MAC address manually using virsh edit. However, this only fixes the issue until your next static hostname change.

3. Switch to tty2 on a broken-boot KDE system, and use "hostnamectl set-hostname foo" to set a static hostname. Once the hostname is different from localhost(.localdomain), transient hostnames should not be used and you should not be affected (tested). Permanent solution.
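Before applying workaround 3, the affected state can be confirmed from tty2. The sketch below reads `hostnamectl` output and reports whether the static hostname is a localhost placeholder while a transient hostname is in effect. The function name is hypothetical, and the parsing assumes the output layout shown above.

```shell
# needs_static_hostname: read `hostnamectl` output on stdin; succeed when the
# static hostname is a localhost placeholder AND a transient hostname is set.
needs_static_hostname() {
    static='' transient=''
    while IFS= read -r line; do
        case "$line" in
            *'Static hostname:'*)    static=${line#*: } ;;
            *'Transient hostname:'*) transient=${line#*: } ;;
        esac
    done
    case "$static" in
        ''|localhost|localhost.localdomain) [ -n "$transient" ] ;;
        *) return 1 ;;
    esac
}

# Output from the installed F26 KDE system shown above.
sample='   Static hostname: localhost.localdomain
Transient hostname: localhost-live.default
         Icon name: computer-vm'

if printf '%s\n' "$sample" | needs_static_hostname; then
    echo 'affected; on tty2 try: hostnamectl set-hostname foo'
fi
```

On a live system, feed it real data with `hostnamectl | needs_static_hostname`.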


Netinst installations are not initially affected by this problem, because they don't set a static hostname in the installer environment. So you have no old hostname to be returned to you. This is output from a netinst-installed system:

[kparal@localhost ~]$ hostnamectl 
   Static hostname: localhost.localdomain
         Icon name: computer-vm
...

However, you can still be affected by this problem if you change your hostname to foo and then back to localhost (that obviously applies for all the cases mentioned here). 


As a more proper fix from Fedora side, not user workarounds, I don't see any easy or good solution:

1. Fix sddm and/or xauth, obviously.
2. Replace sddm with something else.
3. Make anaconda default to a hostname other than localhost(.localdomain).
4. Configure systemd to not accept dhcp hostnames, or to only accept them if /etc/hostname doesn't exist (i.e. do not make the special exception for localhost(.localdomain)).
5. Discontinue KDE Live and only offer netinst (sorry, running out of ideas).

Comment 59 Hans de Goede 2017-07-06 14:12:07 UTC
As explained in comment 25, more than 8 months ago now, the right way to fix this is to make sddm create xauth entries of the FamilyWild type, which does not care about hostnames.

This really should not be all that hard. It just requires someone with some coding skills (not a lot but some) to sit down and do the work...

Comment 60 Matthew Miller 2017-07-06 15:16:32 UTC
-1 to blocker as "as yet another round in the endless saga". Please propose as prioritized bug instead.

Comment 61 Steven Haigh 2017-07-06 15:21:09 UTC
Also -1 for a blocker. It's been around for almost a year - didn't get fixed in F25, still exists in F26.

It'd be fantastic to see this finally fixed - but blocker status is probably over the top - unless it spurs someone on to fix it in record time! :)

Comment 62 Kamil Páral 2017-07-06 15:39:45 UTC
(In reply to Steven Haigh from comment #61)
> It's been around for almost a year - didn't get fixed
> in F25, still exists in F26.

Please note that the issue described in comment 58 (i.e. the reason for proposing it as a blocker) is quite likely *new* in Fedora 26. I can't check easily right now, but the issue probably didn't exist in Fedora 24/25 (or at least we didn't notice it).

Comment 63 Steven Haigh 2017-07-06 15:43:21 UTC
My understanding (and I'm happy to be corrected) is that the problem is still the same, but now you're triggering it from libvirt for a VM environment instead of on bare metal.

The initial report on this seems to be from Fedora-KDE-Live-x86_64-25_Alpha-2.iso

Comment 64 Jaroslav Reznik 2017-07-06 15:52:13 UTC
Are we able to isolate the DHCP set up that causes this issue to understand how broad this is (aka if the set up is rare or not)? Sddm works for me after subsequent restarts in Virt Manager and I'm able to login.

Comment 65 Kamil Páral 2017-07-06 16:00:23 UTC
(In reply to Steven Haigh from comment #63)
> From my understanding (and I'm happy to be corrected) is that the problem is
> still the same, but now you're triggering it from libvirt for a VM
> environment instead of on bare metal.

No, as I tried to describe in detail, the initial problem was related to LiveCD booting. We fixed that, and now it is related to subsequent installed system booting. The new problem only manifests in networks with dhcp configured to remember and return previous hostname to the machine. Which now happens in libvirt NAT network (didn't use to happen in previous releases, I believe), but can also happen in real bare-metal networks (I have no idea if or how often this is the case).

> 
> The initial report on this seems to be from
> Fedora-KDE-Live-x86_64-25_Alpha-2.iso

See bug 679486. It's quite a read.

(In reply to Jaroslav Reznik from comment #64)
> Are we able to isolate the DHCP set up that causes this issue to understand
> how broad this is (aka if the set up is rare or not)? Sddm works for me
> after subsequent restarts in Virt Manager and I'm able to login.

I was of the opinion that it happens for everyone with libvirt NAT. But it is essentially a race condition, so it might not happen always, depending on how fast your system and network are (for me, it's 100%).

When reproducing, try this:
1. Make sure you're running F26 as a virt host.
2. Make sure you have a default libvirt NAT-type network.
3a. Install F26 KDE Live and reboot twice, sddm should fail to start and you should see the same hostnamectl output as in comment 58 ("transient hostname: localhost-live.default").
-- or, if you already have KDE VM installed --
3b. In KDE, use "hostnamectl set-hostname foo". Reboot.
4b. Use "hostnamectl set-hostname localhost". Reboot.
5b. KDE should fail to boot to sddm, and in tty2 if you run "hostnamectl", you should see "static hostname: localhost" and "transient hostname: foo" (your old hostname).

Comment 66 Adam Williamson 2017-07-06 21:05:11 UTC
Discussed at 2017-07-06 Fedora 26 Final Go/No-Go meeting, acting as a blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-meeting-2/2017-07-06/f26_final_gono-go_meeting.2017-07-06-17.00.html . Along the lines explained in https://bugzilla.redhat.com/show_bug.cgi?id=1468239#c6 , we agreed to accept this as a blocker but waive it to Fedora 27 Beta. We took into consideration again that the bug (or rather, the libvirt complication that's new in F26) was discovered late, and that the impact is conditional (so far we are sure only that F26 libvirt hosts using NAT networking are affected), and that workarounds are available.

Comment 67 Martin Bříza 2017-08-16 10:13:27 UTC
Ok, so I did what Hans proposed, which means I replaced the execs of xauth with calls to libXau, with FamilyLocal and FamilyWild. Seems alright on my system. Please check if it fixes the problem for you too.

https://koji.fedoraproject.org/koji/taskinfo?taskID=21260582

Comment 68 Lukas Brabec 2017-08-17 11:21:59 UTC
(In reply to Martin Bříza from comment #67)
> Ok, so I did what Hans proposed, which means I replaced the execs of xauth
> with calls to libXau, with FamilyLocal and FamilyWild. Seems alright on my
> system. Please check if it fixes the problem for you too.
> 
> https://koji.fedoraproject.org/koji/taskinfo?taskID=21260582

this fixed the problem for me

Comment 69 Fedora Update System 2017-08-17 15:10:54 UTC
sddm-0.14.0-13.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-d119cd8c3f

Comment 70 Fedora Update System 2017-08-19 18:55:20 UTC
sddm-0.14.0-13.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-d119cd8c3f

Comment 71 Steve Storey 2017-08-22 09:30:05 UTC
After applying this update, X11 forwarding no longer works properly at least from a KDE Plasma session:

Before update
```
ssh somemachine -X
Last login: Tue Aug 22 10:16:26 2017 from 10.0.0.1
```

and after update:

```
ssh somemachine -X
Warning: No xauth data; using fake authentication data for X11 forwarding.
Last login: Tue Aug 22 10:16:31 2017 from 10.0.0.1
```

From x2goclient, I get the following log:

```
NXPROXY - Version 3.5.0

Copyright (C) 2001, 2010 NoMachine.
See http://www.nomachine.com/ for more information.
Info: Connection with remote proxy completed.
Warning: Unrecognized session type 'unix-kde-depth_24'. Assuming agent session.
Error: Failed to identify the cookie in string 'mylocalmachine/unix:  MIT-MAGIC-COOKIE-1'.
Error: Cannot read the cookie from the X authorization file.
Error: Error creating the X authorization.
Session: Session terminated at 'Tue Aug 22 10:21:50 2017'.


Info: Proxy running in client mode with pid '13686'.
Session: Starting session at 'Tue Aug 22 10:21:49 2017'.
Info: Using abstract X11 socket in kernel namespace for accessing DISPLAY=:0.
Info: Connecting to remote host 'localhost:41628'.
Info: Connection to remote proxy 'localhost:41628' established.
```
which prevented the x2go session from starting.

Also, after applying the update, the next time I log out and try to log back in, I get a hang (sometimes to the extent that I can't even log in on a text console, though I can still get in over SSH). If I do:

systemctl restart sddm.service

then I can log in properly again. Downgrading the package back to 0.14.0-10 from the updates repo solves all the problems (and I don't need a service restart to be able to log in again immediately).

Comment 72 Rex Dieter 2017-08-22 13:36:00 UTC
I cannot reproduce the ssh/x-forwarding issue locally (at least).

$ ssh -Y localhost
...
$ echo $DISPLAY
localhost:10.0

$ xterm
(launches and displays)

Comment 73 Steve Storey 2017-08-22 20:06:13 UTC
Hmm, well ...

As it happens, yes, xterm will still launch despite the warning as shown above (not quite sure how/why that would still work if there's a lack of valid auth - does it suggest I've not got some security area configured correctly?)

Running ssh -vv -Y indicates that it runs

/usr/bin/auth  list :0

to get the cookie information and that does seem to change with the update. On a pre-update machine I have:

preupdate/unix:0  MIT-MAGIC-COOKIE-1  3c824031a7d4c59610eade797c159a74

and on a post-update machine I have:

postupdate/unix:  MIT-MAGIC-COOKIE-1  881f59f7cdc69695478436e885ec2295
#ffff#73746576652d64656c6c2e6c616e#:  MIT-MAGIC-COOKIE-1  881f59f7cdc69695478436e885ec2295

so there's a second line, and no '0' after the ':'. I'm nowhere near knowledgeable enough to know whether the second one is also valid or not, but it's presumably this which causes at least x2go to barf.
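
A note on reading that second line: for families it has no special display form for, xauth prints the entry as `#<family in hex>#<address in hex>#:`. Here the family is ffff (65535, FamilyWild in <X11/Xauth.h>) and the address is simply a hex-encoded byte string. A minimal sketch for decoding it follows; `hexToBytes` is a hypothetical helper name, not something from xauth or sddm.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Convert one hex digit to its value, rejecting anything else.
static int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical helper: decode pairs of hex digits into raw bytes.
std::string hexToBytes(const std::string& hex) {
    if (hex.size() % 2 != 0) {
        throw std::invalid_argument("odd number of hex digits");
    }
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        out.push_back(static_cast<char>(hexDigit(hex[i]) * 16 + hexDigit(hex[i + 1])));
    }
    assert(out.size() == hex.size() / 2);  // one byte per digit pair
    return out;
}
```

Applied to the address above, `hexToBytes("73746576652d64656c6c2e6c616e")` yields `steve-dell.lan`, so the second line appears to be the same cookie stored again under the wildcard family with the hostname as its address.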

Comment 74 Steve Storey 2017-08-22 20:08:00 UTC
Ugh. Sorry should have been /usr/bin/xauth in my post above ...

Comment 75 Steve Storey 2017-08-24 00:49:13 UTC
I've now done some digging and am rather more knowledgeable (empirically at least!) about xauth formats (though the public docs on detail of the structures seem very sparse !?).

I believe there are 2 issues with the patch as it stands:

1. It's not setting auth.number with the display number, which means the xauth line is then thought to be valid for all displays (owned by that user?). At least for consistency with the old behaviour, I believe we should be setting the display number there - that was the cause of the missing '0' after the ':' in my original report.

2. Much more fatally however, the cookie name is being passed including the nul terminator due to sizeof vs strlen, and so clients which do matches on the binary data don't see a match between 'MIT-MAGIC-COOKIE-1' and 'MIT-MAGIC-COOKIE-1\0' which was the main problem I was experiencing (particularly with x2go). You can only spot this happening if you examine the output of

/usr/bin/xauth nlist

I've created a proposed update to the patch here: https://github.com/stevestorey/fedora-package-sddm/commit/b11bf17eede1ee84fce2c6dd452c6c523eeda01f - very happy for someone to rewrite how I'm getting the display number! The proposed update means the display also gets set as well as fixing the nul termination problem in the cookie name for me.
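
To illustrate issue 2 in isolation, here is a small sketch of the sizeof vs strlen difference, assuming the name is held in a character array initialised from a string literal. `nameMatches` is a hypothetical stand-in for a client that compares the stored name byte for byte; it is not code from x2go, xauth, or sddm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Protocol name held as a character array, as a C string literal would be.
constexpr char kCookieName[] = "MIT-MAGIC-COOKIE-1";

// sizeof counts the terminating '\0' of the array.
constexpr std::size_t kLengthWithNul = sizeof(kCookieName);

// strlen counts only the characters before the '\0'.
inline std::size_t lengthWithoutNul() {
    return std::strlen(kCookieName);
}

// Hypothetical client-side check: exact match on length and bytes.
bool nameMatches(const char* stored, std::size_t storedLength) {
    assert(stored != nullptr);
    const std::size_t expected = std::strlen(kCookieName);
    return storedLength == expected &&
           std::memcmp(stored, kCookieName, expected) == 0;
}
```

With these definitions the length is 19 via sizeof and 18 via strlen, and a name stored with length 19 fails the exact comparison even though it prints identically. In `xauth nlist` output this should show up as a name length of 0013 instead of 0012, with a trailing 00 byte.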

Comment 76 Raphael Groner 2017-08-24 08:20:08 UTC
(In reply to Steve Storey from comment #71)
> After applying this update, X11 forwarding no longer works properly at least
> from a KDE Plasma session:
> …
> From x2goclient, I get the following log:
> …

Well, you should report your issue as a bug to X2Go developers. It seems to be a missing but needed feature there to support libXau.

As this bug was an AcceptedBlocker, my vote goes for inclusion of this fix to stable, finally.

Comment 77 Steve Storey 2017-08-24 09:21:19 UTC
(In reply to Raphael Groner from comment #76)
> (In reply to Steve Storey from comment #71)
> > After applying this update, X11 forwarding no longer works properly at least
> > from a KDE Plasma session:
> …
> > From x2goclient, I get the following log:
> …
> 
> Well, you should report your issue as a bug to X2Go developers. It seems to
> be a missing but needed feature there to support libXau.

See comment #75 - the problem is not upstream, the problem is in the patch to sddm added in the -13 update

Comment 78 Martin Bříza 2017-08-24 10:19:34 UTC
Steve,
thank you for the research and for the patch, especially! I have just tested it and it works the same on my system which, with your reasoning, means I'll include it in the next update. Does it fix the problem with X forwarding for you?

Comment 79 Steve Storey 2017-08-24 11:18:27 UTC
No problem - yes, my update does fix all the X forwarding issues, and indeed (for the FamilyLocal entry) the output of xauth nlist is exactly the same as it was previously, so I think it should work fine for all users and happy to have it included in the next update.

To be explicit - this is my first patch involving this kind of Qt code - so while

char displayNumber[display.size() + 1] = { 0 };
strcpy(displayNumber, qPrintable(display.mid(1))); // Need to skip the ':'

does seem to work for me, I'm sure there's a better and more idiomatic way to convert a QString to a char array (and/or maybe a better way to get the display environment var?). In particular, if the display string were empty (and this mid(1) doesn't make sense), I don't know what the behaviour would be (I'm more a Java & Python developer than C ;) ), so by all means rewrite the patch.
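
On the empty-string concern, one possible way to make the extraction defensive is sketched below in plain standard C++ rather than Qt, assuming the usual `[host]:display[.screen]` form of a display string. `displayNumber` is a hypothetical helper, not the code in the proposed patch; it returns an empty string when no display number can be found instead of relying on what skipping the first character of an empty string does.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: extract the display number from strings such as
// ":0", ":10.0" or "localhost:10.0". Returns "" if there is none.
std::string displayNumber(const std::string& display) {
    const std::size_t colon = display.rfind(':');
    if (colon == std::string::npos) {
        return "";  // empty input or no ':' at all
    }
    const std::size_t begin = colon + 1;
    std::size_t end = begin;
    // Take the run of decimal digits after the ':'; stop at ".screen" if present.
    while (end < display.size() && display[end] >= '0' && display[end] <= '9') {
        ++end;
    }
    assert(end >= begin);
    return display.substr(begin, end - begin);
}
```

A caller could then skip writing the number field, or fall back to a default, when the result is empty.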

Comment 80 Fedora Update System 2017-08-24 22:15:15 UTC
sddm-0.14.0-14.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-d119cd8c3f

Comment 81 Fedora Update System 2017-08-26 22:33:18 UTC
sddm-0.14.0-14.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-d119cd8c3f

Comment 82 Fedora Update System 2017-09-02 22:23:29 UTC
sddm-0.14.0-14.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

