Bug 1384096 - user's login session sometimes fails to start because no permission on DRI
Summary: user's login session sometimes fails to start because no permission on DRI
Keywords:
Status: CLOSED DUPLICATE of bug 1371596
Alias: None
Product: Fedora
Classification: Fedora
Component: gdm
Version: 24
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ray Strode [halfline]
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-12 14:06 UTC by Ian Collier
Modified: 2016-10-13 11:17 UTC
2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-13 11:17:07 UTC
Type: Bug


Attachments
Xorg.0.log of the user while unsuccessfully logging in (14.25 KB, text/plain)
2016-10-12 14:06 UTC, Ian Collier
Xorg.0.log of root attempting to start X on the console (and failing) (7.17 KB, text/plain)
2016-10-12 14:07 UTC, Ian Collier

Description Ian Collier 2016-10-12 14:06:09 UTC
Created attachment 1209622
Xorg.0.log of the user while unsuccessfully logging in

Every so often, and unpredictably, gdm (or some other system process) gets
into a state where users can't log in: when the correct password is entered,
the system tries to start the session, fails, and then returns to the login
screen.

The user's Xorg.0.log file says things like:

 vesa: Ignoring device with a bound kernel driver
 (EE) modeset(0): drmSetMaster failed: Permission denied
 (EE) AddScreen/ScreenInit failed for driver 0

(Note that the driver for this system should be intel(4), not
modesetting(4).)

whereas if one logs on to the console as root and tries to start X,
it says things like:

 (EE) intel(0): [drm] failed to set drm interface version: Permission denied [13].
 (EE) intel(0): Failed to claim DRM device.

The problem seems to be related to a stale DRM master in the DRI clients list:

# cat /sys/kernel/debug/dri/0/clients 
             command   pid dev master a   uid      magic
           <unknown>  1004   0   y    y     0          0
            Xwayland  1520   0   n    y    42          1
            Xwayland  1520   0   n    y    42          2
            Xwayland  1520   0   n    y    42          3

There is no process 1004 running on the system.  However, if one
kills process 1520 then gdm restarts and the ghost of process 1004
disappears:

# cat /sys/kernel/debug/dri/0/clients 
             command   pid dev master a   uid      magic
      systemd-logind  4955   0   n    y     0          0
            Xwayland  6780   0   n    y    42          1
            Xwayland  6780   0   n    y    42          2

At that point, users are again able to log in successfully.
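The manual check above (spotting a client that still holds DRM master although its pid is no longer running) could be scripted. The following is a minimal sketch, assuming the seven-column layout of the `clients` file shown in this report (other kernel versions may print different columns) and treating a pid as alive if `/proc/<pid>` exists. The helper names `parse_clients`, `stale_masters` and `pid_alive` are hypothetical, not part of any existing tool. Reading debugfs normally requires root.

```python
from __future__ import annotations

import os
from typing import Callable, NamedTuple


class DriClient(NamedTuple):
    command: str
    pid: int
    is_master: bool
    uid: int


def parse_clients(text: str) -> list[DriClient]:
    """Parse the seven-column table from the DRI debugfs 'clients' file (assumed layout)."""
    clients = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # first line is the column header
        # Split from the right so a command name containing spaces stays whole.
        fields = line.strip().rsplit(maxsplit=6)
        if len(fields) != 7:
            continue  # unexpected layout, e.g. a different kernel version
        command, pid, _dev, master, _auth, uid, _magic = fields
        clients.append(DriClient(command, int(pid), master == "y", int(uid)))
    return clients


def pid_alive(pid: int) -> bool:
    """Treat a pid as running if /proc/<pid> exists (Linux only)."""
    return os.path.exists(f"/proc/{pid}")


def stale_masters(clients: list[DriClient],
                  alive: Callable[[int], bool] = pid_alive) -> list[DriClient]:
    """Return clients that still hold DRM master but whose process is gone."""
    return [c for c in clients if c.is_master and not alive(c.pid)]


if __name__ == "__main__":
    path = "/sys/kernel/debug/dri/0/clients"
    try:
        with open(path) as f:
            text = f.read()
    except OSError as exc:  # debugfs not mounted, or not running as root
        print(f"cannot read {path}: {exc}")
    else:
        for c in stale_masters(parse_clients(text)):
            print(f"stale DRM master: pid {c.pid} ({c.command}) is not running")
```

On the first listing above, this would be expected to flag pid 1004 (shown as `<unknown>`), the only client marked as master. The liveness check is passed in as a parameter so the parsing logic can be exercised without root or a live debugfs.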

Comment 1 Ian Collier 2016-10-12 14:07:40 UTC
Created attachment 1209623
Xorg.0.log of root attempting to start X on the console (and failing)

Comment 2 Ian Collier 2016-10-12 16:55:32 UTC
Digging further: the logs have a record of pid 1004 crashing, and it
was systemd-logind.

The question is why the pid didn't disappear from the DRI clients list when
the process crashed.

A further mystery: we recently installed Fedora 24 on 85 machines, and
16 of them have recorded a systemd-logind crash at exactly 18:50:00 on
several different dates. Nothing of ours runs at 18:50. (The closest is
a cron job that runs at 18:35 and runs a command that calls
systemd-inhibit, then possibly schedules a shutdown for 19:00 and sleeps
for 1800 seconds. None of these machines actually shut down at 19:00.)

Anyway, if systemd-logind is crashing, then maybe this is in fact a systemd
bug. However, it would be nice if gdm could restart properly when
systemd-logind crashes.

Comment 3 Ian Collier 2016-10-13 11:16:11 UTC
Right, the systemd crash is Bug 1371596, and that has just been fixed,
so hopefully once all our machines have been rebooted to restart their
systemd, this will no longer be an issue.

Comment 4 Ian Collier 2016-10-13 11:17:07 UTC

*** This bug has been marked as a duplicate of bug 1371596 ***

