Bug 545267
| Field | Value |
|---|---|
| Summary | gdm-2.28.1-25.fc12+ does not display users: "no seat-id found" in /var/log/gdm/:0-greeter.log |
| Product | [Fedora] Fedora |
| Component | dbus |
| Version | 12 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | low |
| Reporter | mica1884 |
| Assignee | Colin Walters <walters> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | antonio.montagnani, beland, davidz, denis, dwmw2, ejnersan, htl10, james, jmccann, masao-takahashi, michal, mschmidt, opossum1er, petersen, rdieter, rhughes, rstrode, walters, zing |
| Keywords | Reopened, Triaged |
| Fixed In Version | 1.2.16-9.fc12 |
| Doc Type | Bug Fix |
| | 547808 |
| Last Closed | 2010-01-09 20:02:42 UTC |
| Bug Blocks | 549687 |
Description
mica1884
2009-12-08 00:39:59 UTC
It would be easier to debug the problem if we knew what packages were actually updated. You can find this information in /var/log/yum.log, or using the "yum history" feature.

But let me guess anyway...
- Do you have updates-testing repository enabled?
- Do you have hal-0.5.14-1.fc12.x86_64 installed?
- Does "yum downgrade hal hal-libs" fix it?

Same problem here. This was triggered by gdm-2.28.1-25.fc12.i686 from updates-testing.

Same problem here.
Environment:
linux-2.6.31.6-164.fc12.i686
gdm-2.28.1-25.fc12.i686
pango-1.26.1-1.fc12
fontconfig-2.8.0-1.fc12
gnome-session-2.28.0-2.fc12
metacity-2.28.0-11.fc12

(In reply to comment #1)
> It would be easier to debug the problem if we knew what packages were actually
> updated. You can find this information in /var/log/yum.log, or using the "yum
> history" feature.
>
> But let me guess anyway...
> - Do you have updates-testing repository enabled?
> - Do you have hal-0.5.14-1.fc12.x86_64 installed?
> - Does "yum downgrade hal hal-libs" fix it?

I downgraded hal-0.5.14-1.fc12.i686 to hal-0.5.13-10.fc12.i686, then rebooted. gdm displays the user list again.
My env:
linux-2.6.31.6-166.fc12.i686
gdm-2.28.1-24.fc12

I also see this with gdm-2.28.1-25.fc12 in testing.
(Changing to all archs)

(Takahashi-san: guess you mean kernel-2.6.31.6-166.fc12.i686)

I also confirm reverting to -24 fixes the problem.

(In reply to comment #6)
> (Takahashi-san: guess you mean kernel-2.6.31.6-166.fc12.i686)
Yes.
But, kernel-2.6.31.6-166.fc12.i686 is not mandatory.
There is no dependency with kernel version.
>
> I also confirm reverting to -24 fixes the problem.
I tested -24 with hal-0.5.14. But, gdm failed to display user-list.
hal-0.5.13 is mandatory T think.

(In reply to comment #7)
> (In reply to comment #6)
> > (Takahashi-san: guess you mean kernel-2.6.31.6-166.fc12.i686)
> Yes.
> But, kernel-2.6.31.6-166.fc12.i686 is not mandatory.
> There is no dependency with kernel version.
> >
> > I also confirm reverting to -24 fixes the problem.
> I tested -24 with hal-0.5.14. But, gdm failed to display user-list.
> hal-0.5.13 is mandatory T think.
hal-0.5.13 is mandatory, I think.

Michal, I have updates-testing enabled, but neither hal nor hal-libs were among the packages that were upgraded at the time in question. Instead they are: glib2, gtk2, samba-winbind-clients, samba-common, libicu, libtool-ltdl, libmtp, libsmbclient, gnote, mdadm, xorg-x11-drv-dummy, gzip, selinux-policy, selinux-policy-targeted, fontpackages-filesystem, glib2, gtk2, gdm, libtool-ltdl, gdm-plugin-fingerprint, gdm-user-switch-applet, gtk2-immodule-xim, perl-Pod-Escapes, perl-version, perl-libs, perl, perl-Pod-Simple, perl-Module-Pluggable, samba-client

That being said, hal-0.5.14 is installed. I'll see if downgrading fixes the problem for me and get back to you.

It works for me with hal-0.5.14-1.fc12 and gdm-2.28.1-24.fc12. I am testing on x86_64 if it matters.

I tested combinations of gdm and hal. The results are below.

1. Env.
kernel-2.6.31.6-166.fc12.i686 (not x86_64)
gtk2-2.18.5-3.fc12.i686
pango-1.26.1-1.fc12.i686
fontconfig-2.8.0-1.fc12.i686
glib2-2.22.3-1.fc12.i686
glibc-2.11-4.i686
gcc-4.4.2-14.fc12.i686
xorg-x11-server-Xorg-1.7.3-1.fc12.i686
initscripts-9.02.1-1.i686
upstart-0.3.11-3.fc12.i686

2. tests
gdm-2.28.1-24 and hal-0.5.14 ----> NG
gdm-2.28.1-25 and hal-0.5.14 ----> NG
gdm-2.28.1-24 and hal-0.5.13 ----> GO
gdm-2.28.1-25 and hal-0.5.13 ----> GO

Seems to be some sort of ConsoleKit hiccup..

Would someone mind attaching a greeter log file with the "no seat-id found" error message in it?

It looks like the only way the error message:

Unable to lookup session information for process .... org.freedesktop.ConsoleKit.Manager.GeneralError

could show up is if XDG_SESSION_COOKIE is unset. GDM is responsible for setting this environment variable, so it's likely the problem is in GDM.

Honestly, I think hal may be a red herring. It could just be the behavior is sporadic.
Created attachment 377622 [details]
greeter.log with "no seat-id found"

It is perfectly deterministic on my laptop. I always get the same results after boot as in Masao's tests in comment #11.

(In reply to comment #14)
> It is perfectly deterministic on my laptop. I always get the same results after
> boot as in Masao's tests in comment #11.

I should emphasize _after_boot_. Even the broken combination will correctly show the list of users after I logout from a Gnome session.

I have tested a combination of gdm-2.28.1-25 and hal-0.5.14 as follows.

1. get cvs tree of hal
2. edit hal.spec
    %configure \
        --enable-docbook-docs \
        --docdir=%{_docdir}/%{name}-%{version} \
        --with-os-type=redhat \
        --with-udev-prefix=/etc \
        --enable-console-kit \    <------ previously --disable-console-kit
        --disable-policy-kit \
        --disable-acpi-ibm \
        --disable-smbios \
        --enable-umount-helper \
        --without-usb-csr \
        --without-cpufreq \
        --with-eject=%{_sbindir}/eject
3. make i686
4. rpm -Uvh hal-0.5.14.i686.rpm
5. reboot
6. user list is displayed on the gdm screen.

I don't know why.

So I hammered at this a bit this afternoon/evening and I have a theory.

Reading through the dbus activation code it seems like it has a race condition. If an activated service daemonizes (forks and exits its parent) quickly after being started and the daemon processes that SIGCHLD exit notification before the child has taken a name on the bus, then I think the dbus daemon misidentifies the activated service daemonization as a failure in the activation launch helper.

Note this means only the *first* ConsoleKit call will fail, since from then on the consolekit daemon will be running and won't need to be activated.

I added --no-daemon to the Exec line in /usr/share/dbus-1/system-services/org.freedesktop.ConsoleKit.service and rebooted a few times without being able to reproduce the problem anymore (although the problem only sporadically happens for me anyway).
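The theory above is about the order in which the bus daemon sees two events: the exit of the process it launched, and the service taking its name. A minimal Python sketch of that ordering problem follows. It is a toy model and not dbus code; the function name, the event tuples, and the result strings are hypothetical and exist only for illustration.

```python
# Toy model of the suspected activation race. NOT dbus source code:
# the event names and result strings are made up for illustration.

def old_activation_result(events):
    """Outcome of an activation under the behaviour suspected in this bug.

    `events` lists what the bus daemon observes, in order:
      ("child_exit", code)  the process started for activation has exited
      ("name_acquired",)    the service has taken its name on the bus
    """
    for event in events:
        if event[0] == "name_acquired":
            return "activated"
        if event[0] == "child_exit":
            # Any exit seen before the name shows up counts as a failure,
            # even exit code 0 from a parent that merely daemonized.
            return "failed: helper exited with return code %d" % event[1]
    return "pending"


# The name is seen first: activation succeeds.
name_first = [("name_acquired",), ("child_exit", 0)]
# The daemonizing parent's exit is seen first: spurious failure.
exit_first = [("child_exit", 0), ("name_acquired",)]

print(old_activation_result(name_first))
print(old_activation_result(exit_first))
```

Under this model only the ordering differs between the two runs, which would fit a problem that shows up on some boots and not others.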
If this is indeed the problem, then the reason hal plays a role is probably because it previously was making consolekit calls before GDM was started, causing it to trigger the consolekit daemon activation instead of GDM. Its first call would have potentially failed (I don't know what the ramifications of that failure are), but all subsequent calls by GDM would succeed.

I'm going to do a little more investigation to 1) confirm that this theory is actually correct 2) determine what the proper fix is. Possibilities include:

a) tell consolekit not to daemonize when being activated (like my test mentioned above)
b) fix consolekit to not exit its parent until it gets a name on the bus
c) don't exec straight from the activation helper process, but instead fork a child, keep the activation helper alive as a watchdog until the child takes a name on the bus. This way the dbus daemon is never conflating exit codes from dbus-daemon-launch-helper and the activated service.

I don't know if I'll do that investigation this weekend or early next week.

Colin, David, Richard, thoughts?

I always get no users displayed at GDM on a fresh reboot, and have for a while. I upgraded to F12 from F11 about a week after F12's release, and possibly it started to happen a week after that, but I don't reboot that often (mostly suspend/resume, or I just leave it on overnight), so the problem probably started earlier than the 7th, I think.

I found that I can get gdm to display users by switching to a different vt, logging in as root, then running telinit 3 followed by telinit 5 (i.e. shutting down the X server and starting it again). Still the problem is annoying.

Is this relevant?

:0-slave.log
gdm-simple-slave[1471]: WARNING: Unable to open session: Launch helper exited with unknown return code 0
gdm-simple-slave[1471]: WARNING: Unable to close session: no session open

this in :0-greeter.log?
(polkit-gnome-authentication-agent-1:1611): polkit-gnome-1-WARNING **: Unable to determine the session we are in: Remote Exception invoking org.freedesktop.ConsoleKit.Manager.GetSessionForUnixProcess() on /org/freedesktop/ConsoleKit/Manager at name org.freedesktop.ConsoleKit: org.freedesktop.ConsoleKit.Manager.GeneralError: Unable to lookup session information for process '1611'
org.freedesktop.ConsoleKit.Manager.GeneralError Unable%20to%20lookup%20session%20information%20for%20process%20%271611%27
** (process:1612): DEBUG: Greeter session pid=1612 display=:0.0 xauthority=/var/run/gdm/auth-for-gdm-guKZVq/database
socket(): Address family not supported by protocol

ah ha: I mentioned my theory to mclasen and he googled and found this:

http://lists.alioth.debian.org/pipermail/pkg-utopia-maintainers/2008-December/003957.html

so others have hit this issue before. Hin-Tak, sounds like the same bug.

and mclasen found a patch:

http://patch-tracker.debian.org/patch/series/view/consolekit/0.4.1-2/04-defer_daemonizing.patch

It's broken though. Using a dbus connection gotten before a daemon() call after a daemon() call is a really bad idea.

Created attachment 377832 [details]
untested patch
This untested patch may fix the problem.
Now, when an activated process exits, the code examines the exit code
for failure before immediately assuming an error.
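
To make the intended behaviour concrete, here is a minimal Python sketch of how a pending activation could be tracked once a clean exit is no longer treated as an error. It is an illustration only and not the attached patch: the class and method names are hypothetical, and the timeout and name-registration cleanup follow what is said in the discussion in this report.

```python
# Hypothetical model of a pending activation that tolerates a clean exit.
# This is NOT the attached dbus patch, only a sketch of the idea.
from dataclasses import dataclass


@dataclass
class PendingActivation:
    service_name: str
    state: str = "waiting"  # one of "waiting", "activated", "failed"

    def child_exited(self, exit_code: int) -> None:
        """The process started for this activation has exited."""
        if self.state != "waiting":
            return
        if exit_code != 0:
            # A nonzero exit is still a real launch failure.
            self.state = "failed"
        # Exit code 0: the parent may simply have daemonized, so keep
        # waiting for the name to appear on the bus.

    def name_acquired(self) -> None:
        """Someone has taken the service's name on the bus."""
        if self.state == "waiting":
            self.state = "activated"

    def timed_out(self) -> None:
        """The activation timeout fired before the name appeared."""
        if self.state == "waiting":
            self.state = "failed"


pending = PendingActivation("org.freedesktop.ConsoleKit")
pending.child_exited(0)   # daemonizing parent exits cleanly
pending.name_acquired()   # forked child takes the name afterwards
print(pending.state)
```

The timeout matters in this sketch: a service that exits 0 and never takes its name would otherwise leave the activation pending forever.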
Some relevant discussion I had with mclasen:

<mclasen> halfline: the way the activation code is set up makes ignoring exit code 0 somewhat impractical, though
<halfline> mclasen: i wrote a patch
<halfline> didn't test it though
<halfline> https://bugzilla.redhat.com/attachment.cgi?id=377832
<mclasen> if you want to keep the pending activations around to wait for the service after the child dies, how do you clean them up eventually ?
<halfline> they have a timeout
<halfline> or they get cleaned up when a bus name is registered
<mclasen> I guess that might work
<halfline> remember dbus already has to handle the case where the activated service starts but never gets on the bus
<halfline> anyway patch might work or it might not
<halfline> i'll experiment on monday
<mclasen> still feels like a bug in the service, anyway
<halfline> basically just wanted to get something posted in case walters shows up
<halfline> mclasen: it could definitely be defined as a bug in the service
<halfline> i don't think the relationship between the daemon and activated services has really been speced out
<halfline> i mean i don't think there are any documented rules
<mclasen> right
<halfline> given that, it's a bit unsavory to pick the one that leaves a lot of real world existing cases broken
<halfline> had we documented it from the start, i would say you're totally right. the service is in the wrong
<halfline> but since we didn't document it, and all the services are getting it wrong... we should probably try to accommodate them
<mclasen> but then, there's no real reason to daemonize here, in the first place
<halfline> the only reason to daemonize is because these things started out as init scripts
<halfline> and when they were moved to dbus activation, nobody thought about it
<halfline> they just thought, "oh, move it from /etc/init.d/blah shell script to /usr/share/dbus-1/system-services/blah desktop file"
<mclasen> just add --no-daemon and it works
<halfline> sure
<halfline> we could have a flag day and fix all the services
<halfline> and have the repeated pain of fixing services that get it wrong in the future
<mclasen> so, instead of just ignoring the exit 0, you want to log the offending service and keep waiting
<halfline> we could do that.. document daemonizing as illegal, accept it but generate spew
<halfline> or we could just say daemonizing is legal, and not have spew
<mclasen> daemonizing is legal, if you do it right
<halfline> well no one does it right though
<halfline> because glibc ships a broken by design daemon() api

*** Bug 546958 has been marked as a duplicate of this bug. ***

I think we should just allow daemonization; it's ugly, but workable if consumers understand that dbus activation is just "we'll run this binary and wait for a name to show up from someone" (ok, allowing for nonzero exit to be an error).

Patch looks right in broad outline, I'll have a go at writing a test in the dbus code for this and getting it committed.

Awesome, thanks. FWIW, I did a local build of this this morning, fixed my ConsoleKit service file to not do --no-daemon again (i hacked it up earlier, see comment 17) and then rebooted 10 times. It came up properly all 10 times. So in limited testing it seems to check out.

ConsoleKit-0.4.1-2.fc12 whose update description says "This update fixes a race condition in ConsoleKit activation which could lead to gdm not showing a user list after booting." does NOT fix the issue for me.
The list of users is not shown after reboot. /var/log/gdm/:0-greeter.log says:

(polkit-gnome-authentication-agent-1:1787): polkit-gnome-1-WARNING **: Unable to determine the session we are in: Remote Exception invoking org.freedesktop.ConsoleKit.Manager.GetSessionForUnixProcess() on /org/freedesktop/ConsoleKit/Manager at name org.freedesktop.ConsoleKit: org.freedesktop.ConsoleKit.Manager.GeneralError: Unable to lookup session information for process '1787'
org.freedesktop.ConsoleKit.Manager.GeneralError Unable%20to%20lookup%20session%20information%20for%20process%20%271787%27
** (process:1789): DEBUG: Greeter session pid=1789 display=:0.0 xauthority=/var/run/gdm/auth-for-gdm-7n8elp/database
Failed to play sound: File or data not found
Window manager warning: Buggy client sent a _NET_ACTIVE_WINDOW message with a timestamp of 0 for 0x140002b (Login Wind)
Window manager warning: meta_window_activate called by a pager with a 0 timestamp; the pager needs to be fixed.
gdm-simple-greeter[1789]: WARNING: Unable to find users: no seat-id found

rpm -qa hal hal-libs ConsoleKit\* gdm:
ConsoleKit-x11-0.4.1-2.fc12.x86_64
ConsoleKit-libs-0.4.1-2.fc12.x86_64
gdm-2.28.2-1.fc12.x86_64
hal-libs-0.5.14-1.fc12.x86_64
ConsoleKit-0.4.1-2.fc12.x86_64
hal-0.5.14-1.fc12.x86_64

Downgrading hal and hal-libs to 0.5.13-9.fc12 still helps.
This is perfectly reproducible here.

dbus-1.2.16-9.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/dbus-1.2.16-9.fc12

Still reproducible with dbus-{,libs,x11}-1.2.16-9.fc12.x86_64

Still not getting a user list with dbus-1.2.16-9.fc12.i686.
Log snippet:

Dec 18 12:23:48 localhost kernel: [drm:drm_mode_rmfb] *ERROR* tried to remove a fb that we didn't own
Dec 18 12:23:52 localhost gdm-simple-slave[1064]: WARNING: Unable to open session: Method "OpenSessionWithParameters" with signature "a(sv)" on interface "org.freedesktop.ConsoleKit.Manager" doesn't exist#012#012
Dec 18 12:23:58 localhost kernel: type=1307 audit(1261157038.466:17725): cwd="/var/gdm"
Dec 18 12:23:58 localhost kernel: type=1302 audit(1261157038.466:17725): item=0 name="/usr/lib/libgvfscommon.so.0" inode=8787 dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:lib_t:s0
Dec 18 12:23:58 localhost dbus: Rejected send message, 1 matched rules; type="method_call", sender=":1.23" (uid=42 pid=1182 comm="gnome-power-manager) interface="org.freedesktop.Hal.Device.LaptopPanel" member="GetBrightness" error name="(unset)" requested_reply=0 destination=":1.1" (uid=0 pid=884 comm="hald))
Dec 18 12:24:01 localhost gdm-simple-greeter[1186]: WARNING: Unable to find users: no seat-id found
Dec 18 12:24:01 localhost auditd[1222]: Started dispatcher: /sbin/audispd pid: 1224
Dec 18 12:24:01 localhost audispd: audispd initialized with q_depth=80 and 1 active plugins
Dec 18 12:24:01 localhost auditd[1222]: Init complete, auditd 2.0.4 listening for events (startup state enable)
Dec 18 12:24:10 localhost gdm-simple-slave[1064]: WARNING: Unable to close session: no session open#012
Dec 18 12:24:14 localhost kernel: fuse init (API version 7.12)

would you mind adding --debug to the Exec line of /usr/share/dbus-1/system-services/org.freedesktop.ConsoleKit.service and then attaching /var/log/messages?

okay after about 20 reboots I got this to happen on my system with the latest dbus and consolekit.

It looks like we're hitting *another* bug by being the first user to use ConsoleKit.

There's one thing that looks suspicious in the ConsoleKit code. We take a name on the bus before we register a handler for the call GDM is making.
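The suspicion that the name is taken before the method handler is registered can be pictured with a small Python toy model. This is not ConsoleKit or D-Bus code: the ToyBus class, its methods, and the reply strings are hypothetical, and the model assumes that a queued call is dispatched as soon as the name is acquired, which is exactly the part that still needs an explanation in the real code.

```python
# Toy model of "name taken before handlers are registered".
# NOT ConsoleKit or D-Bus code; all names here are illustrative.

class ToyBus:
    def __init__(self):
        self.handlers = {}   # method name -> callable
        self.pending = []    # calls queued while the service is activated
        self.replies = []    # what the caller eventually receives

    def queue_call(self, method):
        self.pending.append(method)

    def register_handler(self, method, func):
        self.handlers[method] = func

    def acquire_name(self):
        # Assumption of this model: queued calls are delivered and
        # dispatched immediately once the name is owned.
        while self.pending:
            method = self.pending.pop(0)
            handler = self.handlers.get(method)
            if handler is None:
                self.replies.append('Method "%s" doesn\'t exist' % method)
            else:
                self.replies.append(handler())


def start_service(bus, register_first):
    """Start the toy service with either ordering of the two steps."""
    if register_first:
        bus.register_handler("OpenSessionWithParameters", lambda: "session opened")
        bus.acquire_name()
    else:
        bus.acquire_name()
        bus.register_handler("OpenSessionWithParameters", lambda: "session opened")


for register_first in (False, True):
    bus = ToyBus()
    bus.queue_call("OpenSessionWithParameters")  # the greeter's first call
    start_service(bus, register_first)
    print(register_first, bus.replies)
```

In this model, registering the handler before taking the name is enough to make the first queued call succeed.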
So the race could be something like:

1) GDM calls OpenSessionWithParameters
2) d-bus notices consolekit isn't running and starts it
3) consolekit starts up and takes a name on the bus
4) consolekit registers handlers for the methods it supports
5) consolekit enters its event loop to process events

When the d-bus daemon notices ConsoleKit has taken a name (step 3) it delivers the pending gdm call. If that happens before step 4 then it could explain this behavior. What I don't get is, I don't think ConsoleKit should be processing any requests from the d-bus daemon until 5.

I'll push a ConsoleKit build that reorders 3 and 4, to close the potential race and also investigate how messages could be getting processed before 5.

So ConsoleKit calls:

polkit_authority_get ()

before step 4 from comment 32. Looking through the eggdbus source code (used by polkit), I see that it does on occasion create a main loop and use it. That would explain this problem. I can't find a direct link from polkit_authority_get to eggdbus's code that does the main loop running from inspection, but there are a few twists and turns in the code, so it's conceivable I'm just missing it.

I would appreciate feedback on the ConsoleKit build:

http://koji.fedoraproject.org/koji/buildinfo?buildID=147736

Looks good. With this build I can't reproduce it anymore.
ConsoleKit-0.4.1-3.fc12.x86_64
dbus-1.2.16-9.fc12.x86_64
gdm-2.28.2-1.fc12.x86_64
hal-0.5.14-1.fc12.x86_64

Confirmed, I'm now seeing a list of users with ConsoleKit-0.4.1-3.fc12.i686 and dbus-1.2.16-9.fc12.i686.

Okay thanks, I'll push to stable.

dbus-1.2.16-9.fc12, ConsoleKit-0.4.1-3.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.

*** Bug 545639 has been marked as a duplicate of this bug. ***

> dbus-1.2.16-9.fc12, ConsoleKit-0.4.1-3.fc12 has been pushed ...
It has been pushed and announced (FEDORA-2009-13408) but the catch is they did not show up in repositories even when packages from the next batch already did. No good to have CLOSED ERRATA when updates are MIA.
Curiously enough update notifications for ConsoleKit-0.4.1-3.fc12 and dbus-1.2.16-9.fc12 have the same FEDORA-2009-13408 marker on two different mailings. Is this like it should be?
They showed up on my mirror(s), fine, updated yesterday, per /var/log/yum.log:

Dec 22 11:04:57 Updated: 1:dbus-1.2.16-9.fc12.x86_64
...
Dec 22 11:05:01 Updated: ConsoleKit-0.4.1-3.fc12.x86_64

> They showed up on my mirror(s), fine, ...
Do you have updates-testing enabled by any chance? Searching directly on five or six mirror sites picked up from a relevant metalink.xml did not reveal any traces of dbus-1.2.16-9.fc12 and ConsoleKit-0.4.1-3.fc12 in 'updates'. OTOH with
'yum --enablerepo=updates-testing ...' all these packages popped out immediately.
I upgraded to versions mentioned in #34 a few days ago and haven't had the problem for the last couple of reboots.

(In reply to comment #41)
> > They showed up on my mirror(s), fine, ...
>
> Do you have updates-testing enabled by any chance? Searching directly on five
> or six mirror sites picked up from a relevant metalink.xml did not reveal any
> traces of dbus-1.2.16-9.fc12 and ConsoleKit-0.4.1-3.fc12 in 'updates'. OTOH
> with
> 'yum --enablerepo=updates-testing ...' all these packages popped out
> immediately.

You can get at early testing rpms from koji, instead of updates-testing, e.g. for hal:
http://koji.fedoraproject.org/koji/packageinfo?packageID=74

oh, yes, updates-testing here. I missed that it had been pushed to stable already, my bad.

> I missed that it had been pushed to stable

Yes, supposedly it was; and announced that way by FEDORA-2009-13408 too.
https://www.redhat.com/archives/fedora-package-announce/2009-December/msg01082.html
https://www.redhat.com/archives/fedora-package-announce/2009-December/msg01083.html
Only these packages are not in updates, and that is what my comment was about.

Comment #37 claims that "dbus-1.2.16-9.fc12, ConsoleKit-0.4.1-3.fc12 has been pushed to the Fedora 12 stable repository". Announcements to this effect were also published (cf. comment #44). A week later these updates still had not arrived. Because of that, for the general user population this is still as broken as it was before.

I filed a rel-eng ticket for the lost update:
https://fedorahosted.org/rel-eng/ticket/3227

(In reply to comment #46)
> I filed a rel-eng ticket for the lost update:
> https://fedorahosted.org/rel-eng/ticket/3227

dbus-1.2.16-9.fc12 and ConsoleKit-0.4.1-3.fc12 at last showed up in updates so I guess that this bug can be closed. Where they were stuck and why is another good question.
(In reply to comment #47)
> (In reply to comment #46)
> > I filed a rel-eng ticket for the lost update:
> > https://fedorahosted.org/rel-eng/ticket/3227
>
> dbus-1.2.16-9.fc12 and ConsoleKit-0.4.1-3.fc12 at last showed up in updates so
> I guess that this bug can be closed. Where they were stuck and why is another
> good question.

It was a bug in Bodhi (the Fedora updates system). Luke Macken fixed it (see the rel-eng ticket for details).

I still have this problem with a fresh install of F12.
ConsoleKit-0.4.1-3.fc12.x86_64
dbus-1.2.16-9.fc12.x86_64
gdm-2.28.2-1.fc12.x86_64
hal-0.5.13-9.fc12.x86_64

Unlike comment #34 the version of HAL is not updated to 0.5.14-1 and enabling the fedora-testing-updates repo does not offer this updated version. A search for the package gives results, but the package itself is nowhere to be found. A version for F13 is the only option.

It's here,
https://admin.fedoraproject.org/updates/F12/FEDORA-2009-12840
But it got pulled due to negative feedback.

Just installed:
hal-0.5.14-1.fc13.x86_64.rpm
hal-libs-0.5.14-1.fc13.x86_64.rpm
Still no users in GDM.

Content of /var/log/gdm/:0-greeter.log showed this error:
** (ck-history:1981): WARNING **: Error opening /var/log/ConsoleKit/history (Permission denied)

Doing the most primitive thing:
chmod 777 /var/log/ConsoleKit/history
Fixed the problem. Users are now visible.

What should the correct permissions be on the above file?

Correction: only 1 user shows up - I jumped to conclusions before, sorry! Logging in as another user and out again does not add that user to the list. Restarting (GDM/machine) does not change this. Only the first user to log in is displayed.