Bug 1018196

Summary: logind fails to recognize active graphical session on seat0
Product: [Fedora] Fedora Reporter: Joseph Nuzman <jnuzman>
Component: systemd Assignee: systemd-maint
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 19 CC: collura, johannbg, lbsousajr, lnykryn, msekleta, plautrba, rstrode, systemd-maint, vpavlin, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-18 09:29:39 UTC Type: Bug
Attachments:
Description                                                      Flags
Patch to pass "vt63" arg to non-seat0 X servers                  none
GDM patch to explicitly set variable XDG_VTNR for seat0          none
GDM patch to delay non-seat0 display servers startup             none
Xorg patch to keep a non-seat0 X server from touching VTs        none

Description Joseph Nuzman 2013-10-11 12:17:14 UTC
I have a multiseat setup on Fedora 19, with Intel processor graphics attached to seat0, and an external AMD graphics card attached to seat-2.  Both seats come up running GDM.  However, the graphical session on seat0 is not considered "active".  As a side effect of this inactive state, the session on seat0 has issues with device permissions management and policykit privilege management.  The session on seat-2 has no such problems.

I suspect some sort of race condition, as the exact same setup has occasionally come up working perfectly fine, with the session on seat0 considered "active".  Recently, I have been unable to reproduce the good behavior, even after several reboots.

Any hints for how to debug this are appreciated.

Some details:

Session c1 on seat0 is the GDM session.  On seat0, I can switch VTs using the keyboard, and start a text session on VT 2.  Session 1 on seat0 is such a text session.  Currently, Session c1 is in the foreground.

[usernam@puente ~]$ loginctl list-sessions 
   SESSION        UID USER             SEAT            
        c1         42 gdm              seat0           
         1       1000 usernam          seat0           
         2       1000 usernam          seat-2          

Session c1 is listed as "State=online" and "Active=no", even though it is the foreground session for seat0.  VT number is listed as 0, even though I can switch to it using the keyboard as if it were VT 1.

[usernam@puente ~]$ loginctl show-session c1
Id=c1
Timestamp=Fri 2013-10-11 09:48:50 IDT
TimestampMonotonic=7139613
DefaultControlGroup=systemd:/user/42.user/c1.session
VTNr=0
Display=:0
Remote=no
Service=gdm-launch-environment
Leader=794
Audit=0
Type=x11
Class=greeter
Active=no
State=online
KillProcesses=no
IdleHint=no
IdleSinceHint=0
IdleSinceHintMonotonic=0
Name=gdm

Session 1 is correctly listed as online and inactive, since it is not in the foreground.  It is correctly listed as VT number 2.

[usernam@puente ~]$ loginctl show-session 1
Id=1
Timestamp=Fri 2013-10-11 09:49:09 IDT
TimestampMonotonic=26300158
DefaultControlGroup=systemd:/user/1000.user/1.session
VTNr=2
TTY=tty2
Remote=no
Service=login
Leader=1485
Audit=1
Type=tty
Class=user
Active=no
State=online
KillProcesses=no
IdleHint=no
IdleSinceHint=1381474208502032
IdleSinceHintMonotonic=85361402
Name=usernam

There is no ActiveSession property for seat0:

[usernam@puente ~]$ loginctl show-seat seat0
Id=seat0
CanMultiSession=yes
CanTTY=yes
CanGraphical=yes
Sessions=1 c1
IdleHint=no
IdleSinceHint=1381474208502032
IdleSinceHintMonotonic=85361402

[usernam@puente ~]$ loginctl show-seat seat-2
Id=seat-2
ActiveSession=2
CanMultiSession=no
CanTTY=no
CanGraphical=yes
Sessions=2
IdleHint=no
IdleSinceHint=0
IdleSinceHintMonotonic=0

I can activate session 1 using loginctl, but I cannot activate session c1:

[usernam@puente ~]$ sudo loginctl activate 1
[usernam@puente ~]$ loginctl show-seat seat0
Id=seat0
ActiveSession=1
CanMultiSession=yes
CanTTY=yes
CanGraphical=yes
Sessions=1 c1
IdleHint=no
IdleSinceHint=1381474208502032
IdleSinceHintMonotonic=85361401

[usernam@puente ~]$ sudo loginctl activate c1
Failed to issue method call: No such device or address
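
For anyone wanting to check for the same symptom mechanically, here is a rough Python sketch. It only assumes the key=value layout of the "loginctl show-session" and "loginctl show-seat" output pasted above; the function names are made up for illustration and are not part of any systemd tool.

```python
def parse_properties(text):
    """Parse 'Key=Value' lines, as printed by loginctl show-*, into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            props[key.strip()] = value.strip()
    return props


def looks_misdetected(session, seat):
    """Heuristic: a graphical session on a VT-capable seat reporting VTNr=0.

    This mirrors the symptom described in this report; it is an assumption
    that the same combination indicates the same problem elsewhere.
    """
    return (
        seat.get("CanTTY") == "yes"
        and session.get("Type") == "x11"
        and session.get("VTNr", "0") == "0"
    )
```

Fed with the c1 and seat0 output above, the check would flag session c1; session 2 on seat-2 is not flagged because that seat reports CanTTY=no.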

Comment 1 Laercio de Sousa 2013-10-25 17:42:36 UTC
I suspect this is a GDM bug rather than a logind bug. I've seen the same problem in my multiseat setup with both GDM and KDM (for the latter, see bug #975079, already fixed). I'm also using LightDM with multiseat patches (from https://launchpad.net/~ubuntu-multiseat/+archive/ppa) without problems.

As a workaround for this problem with GDM, I do the following:

1. Detach the secondary seat master device (in my case, a USB hub).
2. Reboot the system.
3. Reattach the secondary seat master device once GDM is up and running.

Please also paste the output of the command "ps -FC Xorg" here. I expect you'll see the following:

1. The primary X server is running on no TTY, although it was started with the argument "vt7". This may be why seat0 gets an inactive session.

2. The secondary X server is running on tty7, although no "vt*" argument was passed to it (instead, it's running with the option "-sharevts").

Comment 2 Joseph Nuzman 2013-10-27 20:57:05 UTC
Created attachment 816588 [details]
Patch to pass "vt63" arg to non-seat0 X servers

Comment 3 Joseph Nuzman 2013-10-27 20:57:58 UTC
You are right that the trouble stems from the fact that the seat0 X server appears to have no controlling TTY.  Your GDM workaround (unplugging the secondary master) wouldn't work well for me, as I'm using an internal PCIe card.  I do, however, have a fix that works for me (see below).

My analysis of my specific problem:

The seat0 X server is started with a "vt1" argument.  The seat-2 X server is started with a "-sharevts" argument (only seat0 supports VT switching).  Even though it will not try to own a VT, the seat-2 X server does open a VT TTY (which defaults to VT 1, /dev/tty1).  There seems to be a race between the two X servers as to who opens /dev/tty1 first, with the winner having VT 1 considered its controlling TTY, and the loser having no TTY.  In my case, the seat-2 X server always seems to win the race.

When creating a session on a seat that supports VTs (i.e. seat0), the pam_systemd module (included in the logind source dir) tries to infer the VT number.  If the XDG_VTNR environment variable is set, it uses that value; GDM does not set this variable.  Otherwise, it determines the X server PID from the X display socket and then parses /proc/<pid>/stat for the controlling TTY.  If the X server has no controlling TTY, pam_systemd calls into logind with a VT number of 0, which causes all the trouble.
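
To make the fallback concrete, here is a rough Python sketch of the inference as described above. It is not pam_systemd's actual code (that is C), the function names are invented for illustration, and the step that maps the X display socket to a PID is left out. The tty_nr decoding assumes the usual Linux layout, with the virtual consoles on character major 4, minors 1 to 63.

```python
TTY_MAJOR = 4   # Linux character-device major used by virtual consoles
MAX_VT = 63     # minors 1..63 correspond to /dev/tty1../dev/tty63


def tty_nr_from_stat(stat_text):
    """Return the tty_nr field (7th field) of a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split only after the last ')'.
    fields = stat_text[stat_text.rindex(")") + 1:].split()
    # fields: state, ppid, pgrp, session, tty_nr, ...
    return int(fields[4])


def vt_from_tty_nr(tty_nr):
    """Map an encoded tty_nr to a VT number, or 0 if it is not a VT."""
    major = (tty_nr >> 8) & 0xFFF
    minor = (tty_nr & 0xFF) | ((tty_nr >> 12) & 0xFFF00)
    if major == TTY_MAJOR and 1 <= minor <= MAX_VT:
        return minor
    return 0  # no controlling TTY, or a TTY that is not a virtual console


def infer_vtnr(environ, stat_text):
    """Prefer XDG_VTNR; otherwise fall back to the X server's controlling TTY."""
    value = environ.get("XDG_VTNR", "")
    if value.isdigit():
        return int(value)
    return vt_from_tty_nr(tty_nr_from_stat(stat_text))
```

With XDG_VTNR unset and an X server that lost the race for /dev/tty1, tty_nr is 0 and the sketch yields VT 0, which matches the VTNr=0 seen for session c1.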

Note that in my case, the seat0 X server does respond properly to VT switching.  The kernel tracks one PID per VT that will get sent the switching signals for that VT.  The seat0 X server is properly registered as VT 1 owner, but as far as I can tell the PID registered for a VT is not directly exposed to user space.

There are several potential ways to tackle this problem, including:
* Change GDM to set the XDG_VTNR variable for seat0
* Improve the heuristic pam_systemd uses to infer the VT.  Perhaps parsing /proc/<pid>/cmdline for a "vtX" argument.
* Somehow ensure seat0 X server starts before other seats.
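
As a rough illustration of the second option, the cmdline heuristic could look like the Python sketch below. This is only a sketch of the idea, not existing pam_systemd behaviour; the function name is made up, and it assumes the X server was started with a plain "vtN" argument.

```python
import re

_VT_ARG = re.compile(r"vt(\d+)")


def vt_from_cmdline(raw):
    """Return N from a 'vtN' argument in raw /proc/<pid>/cmdline bytes, else 0."""
    # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
    for arg in raw.split(b"\0"):
        match = _VT_ARG.fullmatch(arg.decode("utf-8", "replace"))
        if match:
            return int(match.group(1))
    return 0
```

The whole argument has to match, so an option such as "-sharevts" is not mistaken for a VT argument.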

I chose to simply patch the systemd-multi-seat-x wrapper (also included in logind source dir) to pass a "vt63" argument to all non-seat0 X servers.  This is the same wrapper that already sets the "-sharevts" option for non-seat0 X servers, among other things.  Since non-seat0 X servers will open VT devices, but can't actually own them, it is best to place them where they are unlikely to interfere with any legitimate VT sessions.
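
The effect of that change on the X server command line can be sketched in Python as follows. The real wrapper is written in C and does more than this; the function name and the argument handling here are illustrative assumptions, and only the "-sharevts" and "vt63" arguments come from the description above.

```python
import re


def x_server_args(seat, args):
    """Return the X server argument list adjusted for the given seat."""
    out = list(args)
    if seat != "seat0":
        # Non-seat0 servers cannot own a VT, so share VTs...
        if "-sharevts" not in out:
            out.append("-sharevts")
        # ...and park the VT device they open on VT 63, away from tty1.
        if not any(re.fullmatch(r"vt\d+", arg) for arg in out):
            out.append("vt63")
    return out
```

For seat0 the arguments pass through unchanged, so its "vt1" argument is left alone.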

I'm attaching the patch I'm using.  (Incidentally, the patch also changes the argv memcpy, which looks like it was being over-aggressive.)

Comment 4 Laercio de Sousa 2013-10-29 13:29:20 UTC
> The seat0 X server is started with a "vt1" argument.  The seat-2 X server is
> started with a "-sharevts" argument (only seat0 supports VT switching). 
> Even though it will not try to own a VT, the seat-2 X server does open a VT
> TTY (which defaults to VT 1 - /dev/tty1).  There seems to be a race between
> the two X servers as to who opens /dev/tty1 first, with the winner having VT1
> considered its controlling TTY, and the loser no TTY.  In my case, the
> seat-2 X server always seems to win the race.
> 
> When creating a session on a seat that supports VTs (i.e. seat0), the
> pam_systemd module (included in the logind source dir) will try to infer the
> VT number.  If the XDG_VTNR environment variable is set, it will assume that
> value.  GDM does not set this variable.  Otherwise, it figures out the X
> server PID from the X display socket, and then parses /proc/<pid>/stat for
> the controlling TTY.  If the X server has no controlling TTY, it will call
> into logind with VT number of 0, which causes all the trouble.
> 
> Note that in my case, the seat0 X server does respond properly to VT
> switching.  The kernel tracks one PID per VT that will get sent the
> switching signals for that VT.  The seat0 X server is properly registered as
> VT 1 owner, but as far as I can tell the PID registered for a VT is not
> directly exposed to user space.

Hmmm... VERY interesting analysis. 

> There are several potential ways to tackle this problem, including:
> * Change GDM to set the XDG_VTNR variable for seat0

This is the approach currently adopted in LightDM by contributors from Ubuntu Multiseat team, Richard Hansen and Alberts Muktupāvels (under consideration for merging into upstream).

This also seems to be the suggested approach in logind documentation (see http://www.freedesktop.org/wiki/Software/systemd/writing-display-managers/).

> * Improve the heuristic pam_systemd uses to infer the VT.  Perhaps parsing
> /proc/<pid>/cmdline for a "vtX" argument.

Good point.

> * Somehow ensure seat0 X server starts before other seats.

I used to think this would be the best approach, but now I'm unsure about it. A similar approach is adopted by Martin Bříza and Stefan Brüns for multiseat support in KDM on both Fedora and openSUSE: it enforces that the seat0 X server grabs the same VT used by Plymouth at boot.

Now I'm considering another approach:

* Somehow ensure non-seat0 X server grabs no VT at all, leaving all VTs free for seat0 X server.

Comment 5 Laercio de Sousa 2013-10-31 13:31:28 UTC
Created attachment 817910 [details]
GDM patch to explicitly set variable XDG_VTNR for seat0

This is my alternative workaround for the problem: a patch for GDM to set the variable XDG_VTNR for seat0. I've tested it in Ubuntu GNOME 13.10 with GDM 3.8.4, and it works.

Comment 6 Ray Strode [halfline] 2013-11-01 16:46:30 UTC
That patch is going to set XDG_VTNR to the wrong VT during user switching.

There's some code on the in-progress wayland branch to set XDG_VTNR, here:

https://git.gnome.org/browse/gdm/commit/?h=wip/wayland&id=0649b892fc02bad6c0fceafbb5c93b7f39e96446

but that commit does more than just set XDG_VTNR.

Comment 7 Laercio de Sousa 2013-11-06 15:53:39 UTC
Created attachment 820504 [details]
GDM patch to delay non-seat0 display servers startup

This alternate approach for GDM introduces a small delay between the startup of the seat0 and non-seat0 display servers (the seat0 one should start first), avoiding any race condition between them. There is no longer any need to set XDG_VTNR explicitly.

Comment 8 Laercio de Sousa 2013-11-07 16:58:53 UTC
Created attachment 821255 [details]
Xorg patch to keep a non-seat0 X server from touching VTs

This is an alternate approach to avoid the race condition between seat0 and non-seat0 X servers. With this patch, an X server shouldn't touch any VTs if the -seat option is passed with a value different from seat0. No other modification to systemd-logind and/or display managers is needed.
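
The rule the patch applies can be stated as a small Python predicate. This is only a restatement of the description above, not the patch itself (which is C code in the X server); the function name is made up, and treating a missing -seat option as seat0 is an assumption.

```python
def may_touch_vts(argv):
    """Return True if an X server started with argv should manage VTs."""
    seat = "seat0"  # assumed default when no -seat option is given
    for index, arg in enumerate(argv):
        if arg == "-seat" and index + 1 < len(argv):
            seat = argv[index + 1]
    return seat == "seat0"
```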

Comment 9 Laercio de Sousa 2014-04-07 15:16:54 UTC
This patch is already merged into upstream:
http://cgit.freedesktop.org/xorg/xserver/commit/?id=46cf2a60934076bf568062eb83121ce90b6ff596

So when xorg-x11-server release 1.16 (or a backport of this patch to 1.15) comes to Rawhide, this bug can be closed.

Comment 10 Lennart Poettering 2014-06-18 09:29:39 UTC
OK, fixed in X11, closing.