Bug 787222 - Polkit makes gnome-shell unusable
Summary: Polkit makes gnome-shell unusable
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: polkit
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: David Zeuthen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-02-03 15:38 UTC by Zdenek Kabelac
Modified: 2013-03-06 04:08 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-06 18:17:29 UTC
Type: ---
Embargoed:


Attachments
Crash from failed gnome-shell (10.63 KB, text/plain)
2012-02-03 15:38 UTC, Zdenek Kabelac
Crash from pkcheck (5.29 KB, text/plain)
2012-02-03 15:40 UTC, Zdenek Kabelac
Busy-looping mission (4.55 KB, text/plain)
2012-02-03 15:42 UTC, Zdenek Kabelac
Patch (1.89 KB, patch)
2012-02-06 16:45 UTC, David Zeuthen

Description Zdenek Kabelac 2012-02-03 15:38:27 UTC
Created attachment 559309
Crash from failed gnome-shell

Description of problem:

My gnome-shell stopped working almost 2 months ago, and after a long session with Tomas I've finally deciphered the key issue why the whole system doesn't work on my laptop. Meanwhile I've been using openbox-gnome-session, which has somewhat limited functionality, but as long as I didn't want to modify anything in a NetworkManager connection it seemed to work well enough.

So the key issue is the crashing polkit connection (probably a completely untested code path, as all apps crash just when they are about to report such a problem).

I'll attach backtraces from the gnome-shell crash and the pkcheck crash.

Version-Release number of selected component (if applicable):
None of the 104 versions work.

I had to revert to version 103 - this version works:
polkit-devel-0.103-1.fc17.x86_64

How reproducible:
Log in to gnome-shell

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Zdenek Kabelac 2012-02-03 15:40:35 UTC
Created attachment 559310
Crash from pkcheck

This simple tool crashes instead of giving some hint about what could be wrong - oh my...

Is this even tested by anyone??

Comment 2 Zdenek Kabelac 2012-02-03 15:42:38 UTC
Created attachment 559312
Busy-looping mission

Even though I have no idea what this thing is good for, it busy-loops on the CPU after gnome-shell aborts.

Comment 3 David Zeuthen 2012-02-03 16:41:10 UTC
I have no idea why you think this is a polkit problem or why you are posting stack traces from random programs. If anything, it looks like your D-Bus system bus instance is not working as intended.

Please don't post attachments as application/octet-stream - use text/plain so they are viewable in the browser instead of forcing the user to download it.

In general, you need to use 't a a bt' (thread apply all bt) to get stack traces from all threads.

I would try running selinux in permissive mode and see if that helps - more often than not, the SELinux policies are out of date or not covering all kinds of random corner cases that Dan Walsh didn't think of.

Comment 4 Zdenek Kabelac 2012-02-03 22:03:19 UTC
(In reply to comment #3)
> I have no idea why you think this is a polkit problem or why you are posting
> stack traces from random programs. If anything, it looks like your D-Bus system
> bus instance is not working as intended.
> 

Well, simply because the only thing that needs to be downgraded in my up-to-date Rawhide to make everything usable again is the polkit package (back to version 103).

The traces are not from 'random' programs - they are from key GNOME programs, and all of them show one simple thing: a failing connection to the polkit subsystem. The error pointer is not even set to any value, so all the programs crash (I guess this is another API fault).

pkcheck is actually polkit's own utility, so it's definitely not a random binary.


> Please don't post attachments as application/octet-stream - use text/plain so
> they are viewable in the browser instead of forcing the user to download it.

Ahh, sorry - I probably forgot to click it. Anyway, rhbz is poor at auto-detecting plain-text attachments.

> 
> In general, you need to use 't a a bt' to get stack traces from all stacks.

The binaries all crash on the failed connection; there is nothing more to see there, because the 'error' pointer is NULL.


> 
> I would try running selinux in permissive mode and see if that helps - more
> often than not, the SELinux policies are out of date or not covering all kinds
> of random corner cases that Dan Walsh didn't think of.

SELinux is disabled (selinux=0 on the kernel command line), so there is no interaction that could be blamed on SELinux.

Please note version 103 is 100% functional, just version 104 doesn't work.

I'm not saying it must be a bug in polkit; it might just as well be some interaction between systemd and ConsoleKit, since my machine is a continually updated Rawhide (and no, I'm not going to reinstall my whole system).

So I'd expect you to propose a way to debug/trace this issue, since there is really nothing reasonable I could have googled to track down anything polkit-related.

For the moment I consider version 104 unusable on an updated machine. It may work on a fresh install, but there is no documentation about what has changed and what must be reconfigured after version 103 so that version 104 works.

I assume there are a few people on this planet who actually know how GNOME start-up works (I've just been 'amazed' at how many thousands of files are opened these days...)

Comment 5 Zdenek Kabelac 2012-02-05 14:58:31 UTC
So after some more playing around with polkit, it seems that until version 104 a user could log in on the console and start everything just by typing 'startx'.

With version 104 this is no longer true, and the user has to start the environment manually via a ~/.xinitrc file with content like this:

exec ck-launch-session openbox-gnome-session 

(or exec ck-launch-session gnome-session)

Looking at /etc/X11/xinit/xinitrc-common, I suspect someone is playing some games with XDG_SESSION_COOKIE. I'm not really sure what the plans are here, but the whole system definitely must be updated for this to work, especially the startup of the X session.
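A quick way to tell which situation a given X session is in is to check whether XDG_SESSION_COOKIE is present in the environment. The sketch below assumes, as the ~/.xinitrc workaround suggests, that ck-launch-session exports that variable and a bare startx does not. The helper names are hypothetical, and it only inspects ConsoleKit's cookie: it does not tell you whether systemd-logind created a session for the process, which is what comment 10 says the patched polkit looks for.

```c
#define _POSIX_C_SOURCE 200112L /* expose POSIX setenv()/unsetenv() for exercising the helper */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* True when the environment carries a non-empty ConsoleKit cookie.
 * Assumption: ck-launch-session exports XDG_SESSION_COOKIE. */
static bool has_ck_session_cookie(void)
{
    const char *cookie = getenv("XDG_SESSION_COOKIE");
    return cookie != NULL && cookie[0] != '\0';
}

/* Prints a one-line diagnosis for the current process. */
static void report_ck_session(void)
{
    if (has_ck_session_cookie())
        puts("XDG_SESSION_COOKIE set: started via ck-launch-session (or similar)");
    else
        puts("XDG_SESSION_COOKIE unset: no ConsoleKit session, e.g. bare startx");
}
```

Calling report_ck_session() from a program started from ~/.xinitrc, once with and once without the ck-launch-session prefix, should show the difference.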

Comment 6 David Zeuthen 2012-02-06 15:36:03 UTC
(In reply to comment #5)
> So after some more playing around the polkit thing - it seems like  until
> version  104 -  user could login on console - and start everything just by
> typing  'startx' 
> 
> With version  104 this is no longer true - and user has to manually start his
> environment via  ~/.xinitr file with content like this:
> 
> exec ck-launch-session openbox-gnome-session 
> 
> (or exec ck-launch-session gnome-session)
> 
> Since looking at /etc/X11/xinit/xinitrc-common I'm suspecting someone is
> playing  some games with XDG_SESSION_COOKIE - not really sure what are the
> plans here - but definitely whole system must be updated to work - and
> especially startup of X 
> session.

Thanks for investigating this. It looks like there are possibly two bugs here:

 - something is broken if you use the startx(1) path
 - polkit_unix_session_new_for_process_sync() hangs in that case

I'll try to reproduce the latter; I think the problem is that we're not setting the error properly.

Comment 7 Tomáš Bžatek 2012-02-06 15:37:49 UTC
Hi David,

I was debugging the whole problem with Zdenek last week. The first backtrace attached here is the most painful manifestation of the issue. The code in shell-polkit-authentication-agent.c apparently runs in-process, so the NULL dereference takes down the whole gnome-shell. That leads to an automatic respawn, another crash, and after a few iterations the user gets that nasty session-load-failure screen with only the Logout button available.
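The crash loop described here can be modelled as a supervisor that restarts a crashing child until a retry limit is reached. This is a hypothetical sketch: MAX_RESPAWNS, supervise() and the run_fn callback are invented for illustration, and the actual limit and timing used by gnome-session are not stated in this report.

```c
#include <stdbool.h>

/* Hypothetical respawn limit; the real gnome-session policy is not given here. */
#define MAX_RESPAWNS 3

/* A child "run": returns true if the shell exited cleanly, false on crash. */
typedef bool (*run_fn)(void);

/* Respawns the child after each crash and gives up once the crash count
 * exceeds MAX_RESPAWNS. Returns the number of crashes observed; a result
 * of MAX_RESPAWNS + 1 corresponds to showing the session-failure screen. */
static int supervise(run_fn run_shell)
{
    int crashes = 0;
    while (!run_shell()) {
        crashes++;
        if (crashes > MAX_RESPAWNS)
            break;              /* give up: only "Log Out" is left */
    }
    return crashes;
}

/* Example children: one that always crashes (the pre-patch situation)
 * and one that exits cleanly. */
static bool always_crash(void) { return false; }
static bool clean_exit(void) { return true; }
```

With always_crash, the loop runs until the fourth crash and supervise() returns MAX_RESPAWNS + 1; with clean_exit it returns 0 immediately.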

(In reply to comment #3)
> I have no idea why you think this is a polkit problem or why you are posting
> stack traces from random programs. If anything, it looks like your D-Bus system
> bus instance is not working as intended.
I tried calling a random method on the org.freedesktop.PolicyKit1 name on the system bus and got a reply, so I suppose the system bus worked fine. I tested this using d-feet running in a plain xterm X session started via 'startx'.

The problem here is polkit_unix_session_new_for_process_sync() returning NULL and not setting the GError variable passed in. Since this particular function is widely used, it affects other utilities and applications as well (such as pkcheck).

Looking at polkitunixsession-systemd.c sources, could the problem be that polkit_unix_session_initable_init() may return FALSE without setting an error?

Comment 8 David Zeuthen 2012-02-06 15:45:44 UTC
(In reply to comment #7)
> Hi David,
> 
> I was debugging the whole problem with Zdenek last week. The first backtrace
> attached here is the most painful appearance of the issue. The code in
> shell-polkit-authentication-agent.c is apparently running in process and the
> NULL dereference makes whole gnome-shell going away. Leading to automatic
> respawn, another crash and after few iterations user gets that nasty session
> load fail screen with only the Logout button available.

Yes. Also, shell-polkit-authentication-agent.c should be using the async methods whenever possible, because we must never block the compositor (though that doesn't really matter much here, since polkit_unix_session_new_for_process_sync() never blocks when using systemd).

> The problem here is polkit_unix_session_new_for_process_sync() returning NULL
> and not setting the GError variable passed in. Since this particular function
> is widely used, it affects other utilities and applications as well (such as
> pkcheck).
> 
> Looking at polkitunixsession-systemd.c sources, could the problem be that
> polkit_unix_session_initable_init() may return FALSE without setting an error?

Yup, the fix, polkit-wise, is very likely to just set the error.
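The contract at stake is GLib's usual one: a function that reports failure by returning FALSE (or NULL) must also fill in the GError it was handed, and a caller that reads error->message relies on that. The sketch below contrasts the suspected bug with the fix in plain C. MiniError, mini_set_error() and the session_init_* functions are hypothetical stand-ins for GError, g_set_error() and polkit_unix_session_initable_init(), used so the example compiles without GLib; it is not the actual patch attached in comment 9.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for GLib's GError (only the message is modelled). */
typedef struct {
    const char *message;
} MiniError;

/* Modelled on g_set_error(): does nothing if the caller passed NULL,
 * i.e. said it does not care about error details. */
static void mini_set_error(MiniError **error, const char *message)
{
    if (error == NULL)
        return;
    *error = malloc(sizeof **error);
    if (*error == NULL)
        abort();                 /* GLib also aborts on out-of-memory */
    (*error)->message = message;
}

static void mini_error_free(MiniError *error)
{
    free(error);
}

/* Suspected bug: reports failure but leaves *error untouched, so a
 * caller that then reads (*error)->message dereferences NULL. */
static bool session_init_buggy(bool session_found, MiniError **error)
{
    (void) error;
    return session_found;
}

/* Fix: every false return is paired with a set error. */
static bool session_init_fixed(bool session_found, MiniError **error)
{
    if (!session_found) {
        mini_set_error(error, "no session found for process"); /* illustrative text */
        return false;
    }
    return true;
}
```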

Comment 9 David Zeuthen 2012-02-06 16:45:42 UTC
Created attachment 559688
Patch

Untested patch (I'm not near any F17 system to test this) - try to rebuild the polkit rpm with this and see if that fixes the problem. Thanks.

Comment 10 David Zeuthen 2012-02-06 18:17:29 UTC
I just tested the patch, and it works great insofar as gnome-shell doesn't crash if there is no systemd-logind session for its PID. Instead, the shell prints a warning (check .xsession-errors) and continues, but it will not be a polkit authentication agent, so some things will just not work.
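On the caller side, the behaviour described here amounts to treating a failed session lookup as "run without an authentication agent" rather than as fatal. A rough sketch with hypothetical names throughout (lookup_session(), start_auth_agent(), the error text and the session id are invented, and this is not gnome-shell's code). The caller also guards against a NULL error, so even a callee that broke the contract would not crash it.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for GLib's GError. */
typedef struct {
    const char *message;
} MiniError;

static MiniError no_session_error = { "no systemd-logind session for this PID" };

/* Hypothetical lookup following the fixed contract: NULL plus *error
 * on failure, a session id on success. */
static const char *lookup_session(bool have_session, MiniError **error)
{
    if (!have_session) {
        *error = &no_session_error;
        return NULL;
    }
    return "c1";   /* made-up session id */
}

/* Returns true if we became the authentication agent. On failure it
 * warns and carries on instead of taking the whole shell down. */
static bool start_auth_agent(bool have_session)
{
    MiniError *error = NULL;
    const char *session = lookup_session(have_session, &error);

    if (session == NULL) {
        /* Guard against callees that return NULL without setting error. */
        fprintf(stderr, "Warning: not acting as polkit authentication agent: %s\n",
                error != NULL ? error->message : "unknown error");
        return false;
    }
    printf("Registered as authentication agent for session %s\n", session);
    return true;
}
```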

This fix is in polkit-0.104.4.fc17

http://koji.fedoraproject.org/koji/taskinfo?taskID=3766420

Please open a new bug for the root problem, i.e. that no new systemd-logind session is created when using startx(1).

Comment 11 Zdenek Kabelac 2012-02-06 20:18:21 UTC
OK, I've tested my case, and now it works at least to the extent that NetworkManager is able to use gnome-keyring and gnome-shell doesn't crash, though .xsession-errors shows a polkit warning and error, so it surely still needs to be fixed.

