Bug 821569

Summary: Reconnection to existing xrdp sessions fails
Product: [Fedora] Fedora Reporter: Bojan Smojver <bojan>
Component: xrdpAssignee: Itamar Reis Peixoto <itamar>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 17CC: itamar, johannbg, lpoetter, metherid, notting, plautrba, systemd-maint
Target Milestone: ---Keywords: Patch
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-17 00:00:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Add a fork to satisfy systemd-logind
none
Add a fork to satisfy systemd-logind none

Description Bojan Smojver 2012-05-15 00:32:55 UTC
Description of problem:
Since upgrading to F-17, I've noticed that if the session has been left alone for some time (overnight), the reconnection to it may not work. The old session is still there (I can see Xvnc running), but a new one is create instead, with new Gnome session inside it etc.

Version-Release number of selected component (if applicable):
0.5.0-0.15.fc17

How reproducible:
Sometimes.

Steps to Reproduce:
1. Connect to an xrdp session. Work with it.
2. Leave the session alone overnight (i.e. disconnect from it).
3. Attempt to reconnect the next morning.
  
Actual results:
Cannot find existing session.

Expected results:
This worked just fine with F-16. Never had this problem before.

Additional info:
I am running a somewhat unusual resolution in the session, 1520x840, if it somehow matters.

Comment 1 Bojan Smojver 2012-05-15 02:35:36 UTC
This may be subtly related to bug #820815. I use ICAClient to connect to Citrix and RDP from there. Because of bug #820815, I have to resize the window by hand. I just noticed that some of the VNC sessions had resolution set to 1524x846.

It seems that when Gnome (or Xvnc underneath it) encounters an "unknown" resolution (i.e. the one that has not been set yet in Display properties), it somehow disconnects, instead of actually attempting to use that.

I have reported something like that here already:

http://lists.fedoraproject.org/pipermail/devel/2012-May/166904.html

I do not really know what is causing this. And because I never had to resize ICAClient window under F-16, this probably never cropped up.

Comment 2 Bojan Smojver 2012-05-15 09:59:24 UTC
I also tried a few times without resizing and that is failing as well. Instead, new sessions, with exactly the same resolutions, are being started. The old sessions are still there when I run ps.

Comment 3 Bojan Smojver 2012-05-15 11:27:46 UTC
If it matters, I have 11 xrdp-chansrv processes running right now and not a single Xvnc as a result of an RDP connection (i.e. everyone's been logged out).

Comment 4 Bojan Smojver 2012-05-16 00:53:40 UTC
This may be somehow related to systemd. When I run systemd-loginctl on my laptop, I can see my session just fine (my laptop is not the machine running xrdp - just using it as an example of what works):
-------------------
[bojan@shrek ~]$ systemd-loginctl 
   SESSION        UID USER             SEAT            
         2       5000 bojan            seat0           

1 sessions listed.
-------------------

However, if I do the same inside my xrdp session (Xvnc, really) on the remote system, I get this:
-------------------
[user@hmachine pam.d]$ systemd-loginctl 
   SESSION        UID USER             SEAT            
       c33          0 root                             

1 sessions listed
-------------------

This is my root login in one of the gnome-terminals, in order to run journalctl. If I close that, I get:
-------------------
[user@machine pam.d]$ systemd-loginctl 
   SESSION        UID USER             SEAT            

0 sessions listed.
-------------------

How can there be no sessions when I'm clearly logged in?

When I run journalctl -a, I see this:
-------------------
May 16 09:47:10 machine systemd-logind[670]: New session c30 of user user.
May 16 09:47:10 machine systemd-logind[670]: Session c30 has display :12 with nonexisting socket /tmp/.X11-unix/X12.
May 16 09:47:10 machine xrdp-sesman[28412]: pam_unix(xrdp-sesman:session): session opened for user user by (uid=0)
May 16 09:47:10 machine systemd-logind[670]: Removed session c30.
-------------------

That socket sure exists and fuser shows that is is being uses by Xvnc process.

This could be something similar to bug #749962, which has been closed as not a bug.

Comment 5 Bojan Smojver 2012-05-16 01:08:08 UTC
I'm thinking now that this problem is not actually intermittent.

I use Citrix ICAClient to get to the machine that then RDPs into my Linux machine, which runs xrdp. If I tell ICAClient to disconnect (meaning, Windows RDP client will keep running for a while), I can then "connect" to my xrdp session again (in fact, the session was never disconnected between Windows RDP client and xrdp). If I say to ICAClient to log off (or if relatively long time passes since disconnection), Window RDP client will terminate, which will end connection between it and xrdp and I will never be able to get to that xrdp session any more.

Comment 6 Bojan Smojver 2012-05-16 07:00:21 UTC
It seems that systemd-logind sends a SIGHUP to xrdp-sessvc, which then dies, which then sends SIGCHLD to xrdp-sesman, which reaps the xrdp session (in its signal handler) as if it never existed. So, naturally, on next connection attempt, xrdp has no idea that there is a PID out there still doing stuff.

Best guess...

Comment 7 Bojan Smojver 2012-05-16 09:28:43 UTC
Workaround: kick pam_systemd.so out of the xrdp-sesman config.

My config now (session bit copied from password-auth, minus systemd stuff):
---------
#%PAM-1.0
auth       include      password-auth
account    include      password-auth
#session    include      password-auth

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so
session     optional      pam_sss.so
---------

Obviously, the above won't work or everyone, as password-auth will be system specific.

The real fix would be to either fix systemd-logind (actually find that X11 socket of Xvnc) or xrdp (ignore HUP from systemd-logind).

Comment 8 Bojan Smojver 2012-05-16 23:47:36 UTC
Just connected to the session that's been left alone overnight. So, the workaround is solid.

Comment 9 Bojan Smojver 2012-05-17 04:57:43 UTC
Going to assign this to systemd folks just to see whether they have any ideas as to what the problem might be. Some kind of race between xrdp session code and pam_systemd.so?

Comment 10 Bojan Smojver 2012-05-18 04:03:53 UTC
It seems that xrdp code in session.c calls auth_start_session() (which underneath calls pam_open_session(), which then calls stuff form pam_systemd.so, which I guess uses systemd-logind to do stuff) _before_ Xvnc gets started. So, of course, pam_systemd.so cannot possibly find the socket - it has not been created yet.

Maybe xrdp should move auth_start_session() to after Xvnc is up and running? Or are we having some sort of chicken and egg here?

PS. Although the warning about Xvnc socket not being there is just that - a warning - I'm guessing it is eventually important enough for the session to be terminated. I haven't followed the code of systemd enough to verify, to be honest.

Comment 11 Bojan Smojver 2012-05-18 04:54:32 UTC
(In reply to comment #10)
 
> Maybe xrdp should move auth_start_session() to after Xvnc is up and running?

Yeah, that would not work at all. It has to be before.

> Or are we having some sort of chicken and egg here?

Maybe Xvnc that gets started by xrdp should invoke PAM session, in order to set all the required stuff there?

Comment 12 Bojan Smojver 2012-05-18 05:47:21 UTC
Maybe something like this would work:
---------------------
diff --git a/sesman/session.c b/sesman/session.c
index 36262ad..6029a91 100644
--- a/sesman/session.c
+++ b/sesman/session.c
@@ -369,7 +369,6 @@ session_start_fork(int width, int height, int bpp, char* use
   }
   else if (pid == 0) /* child sesman */
   {
-    auth_start_session(data, display);
     g_sprintf(geometry, "%dx%d", width, height);
     g_sprintf(depth, "%d", bpp);
     g_sprintf(screen, ":%d", display);
@@ -383,6 +382,7 @@ session_start_fork(int width, int height, int bpp, char* use
       env_set_user(username, 0, display);
       if (x_server_running(display))
       {
+        auth_start_session(data, display);
         auth_set_env(data);
         if (directory != 0)
         {
---------------------

Maybe we could be doing the PAM session stuff once we know we have the X server running?

Comment 13 Bojan Smojver 2012-05-18 07:37:37 UTC
Assigning back to xrdp.

Comment 14 Bojan Smojver 2012-05-18 09:11:16 UTC
With that patch, now I get:
---------------
May 18 18:13:44 machine xrdp-sesman: pam_unix(xrdp-sesman:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=  user=<user>
May 18 18:13:45 machine xrdp-sesman: pam_sss(xrdp-sesman:auth): authentication success; logname= uid=0 euid=0 tty= ruser= rhost= user=<user>
May 18 18:13:45 machine xrdp-sesman: pam_systemd(xrdp-sesman:session): Failed to create session: Access denied
May 18 18:13:45 machine xrdp-sesman: pam_unix(xrdp-sesman:session): session opened for user <user> by (uid=<uid>)
---------------

So, still not quite right, but I can reconnect. Nothing dies due to SIGHUP or anything.

Comment 15 Bojan Smojver 2012-05-18 09:24:12 UTC
D-Bus error causing the problem now:
---------------
May 18 19:19:12 machine dbus-daemon[673]: dbus[673]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.65" (uid=<uid> pid=4183 comm="/usr/sbin/xrdp-sesman ") interface="org.freedesktop.login1.Manager" member="CreateSession" error name="(unset)" requested_reply="0" destination="org.freedesktop.login1" (uid=0 pid=662 comm="/usr/lib/systemd/systemd-logind ")
May 18 19:19:12 machine dbus[673]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.65" (uid=<uid> pid=4183 comm="/usr/sbin/xrdp-sesman ") interface="org.freedesktop.login1.Manager" member="CreateSession" error name="(unset)" requested_reply="0" destination="org.freedesktop.login1" (uid=0 pid=662 comm="/usr/lib/systemd/systemd-logind ")
---------------

Comment 16 Bojan Smojver 2012-05-18 09:45:24 UTC
If I add:
---------------
                <allow send_destination="org.freedesktop.login1"
                       send_interface="org.freedesktop.login1.Manager"
                       send_member="CreateSession"/>
---------------

To /etc/dbus-1/system.d/org.freedesktop.login1.conf, I can get a session via D-Bus.

But it dies once again very shortly:
---------------
May 18 19:40:45 machine systemd-logind[678]: New session c2 of user <user>.
May 18 19:40:45 machine systemd-logind[678]: Linked /tmp/.X11-unix/X10 to /run/user/<user>/X11-display.
May 18 19:40:45 machine systemd-logind[678]: Removed session c2.
---------------

Seriously running out of ideas...

Comment 17 Bojan Smojver 2012-05-20 23:39:30 UTC
Assigning again to systemd after searching for similar problems (am am also seeing "...killed" on su sessions).

It seems that pam_systemd is the root cause of the issue here. The "nonexistent socket" warning is just a red herring and although xrdp may benefit from the patch I attached here, it won't really affect the outcome.

Also, it seems that D-Bus file for systemd is missing CreateSession entry anyway.

Comment 18 Bojan Smojver 2012-05-21 04:12:53 UTC
This is the code of systemd-logind we seem to be running through, when xrdp is firing up:
-----------------
Breakpoint 1, manager_run (m=m@entry=0x8514008) at src/login/logind.c:1204
1204                            if (event.data.u32 >= FD_FIFO_BASE)
Missing separate debuginfos, use: debuginfo-install sssd-client-1.8.3-11.fc17.i686
(gdb) n
1205                                    manager_pipe_notify_eof(m, event.data.u32 - FD_FIFO_BASE);
(gdb) s
manager_pipe_notify_eof (fd=12, m=0x8514008) at src/login/logind.c:858
858             assert_se(fd >= 0);
(gdb) n
860             assert_se(s = hashmap_get(m->fifo_fds, INT_TO_PTR(fd + 1)));
(gdb) n
861             assert(s->fifo_fd == fd);
(gdb) n
862             session_remove_fifo(s);
(gdb) n
864             session_stop(s);
(gdb) n
manager_run (m=m@entry=0x8514008) at src/login/logind.c:1169
1169                    manager_gc(m, true);
(gdb) c
Continuing.
-----------------

Now, this obviously will close the session (the event.u32 is 16, so fd is 12). So, the session is opened and shortly after manager_run() function gets a closure of fd 12 event and kills it. Still trying to find out why...

Comment 19 Bojan Smojver 2012-05-21 04:34:32 UTC
(In reply to comment #18)
> So, the session is opened and shortly after manager_run() function gets
> a closure of fd 12 event and kills it. Still trying to find out why...

All I can see for now is that this happens after startwm.sh gets executed and that particular FD is not part of the original process.

Comment 20 Bojan Smojver 2012-05-21 05:50:30 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > So, the session is opened and shortly after manager_run() function gets
> > a closure of fd 12 event and kills it. Still trying to find out why...
> 
> All I can see for now is that this happens after startwm.sh gets executed
> and that particular FD is not part of the original process.

Hmm, I think I see now what's going on. The session FIFO is being opened with O_CLOEXEC, so naturally, when xrdp session exec()'s startwm.sh, this FD get closed, which then tells systemd-logind to close the session. Interesting...

So, is exec()-ing things from the session process verboten in systemd? How is this supposed to be worked around?

Comment 21 Bojan Smojver 2012-05-21 07:22:26 UTC
There is a call like this in pam-systemd.c:
------------------
        if (!dbus_message_get_args(reply, &error,
                                   DBUS_TYPE_STRING, &id,
                                   DBUS_TYPE_OBJECT_PATH, &object_path,
                                   DBUS_TYPE_STRING, &runtime_path,
                                   DBUS_TYPE_UNIX_FD, &session_fd,
                                   DBUS_TYPE_STRING, &seat,
                                   DBUS_TYPE_UINT32, &vtnr,
                                   DBUS_TYPE_INVALID)) {
                pam_syslog(handle, LOG_ERR, "Failed to parse message: %s", bus_e
                r = PAM_SESSION_ERR;
                goto finish;
        }
------------------

The documentation for dbus_message_get_args() says:
------------------
Unix file descriptors that are read with this function will have the FD_CLOEXEC flag set. If you need them without this flag set, make sure to unset it with fcntl()
------------------

So, maybe that is what would fix the whole thing? Just guessing here...

PS. Combined with the patch to xrdp and CreateSession addition to D-Bus file of systemd.

Comment 22 Bojan Smojver 2012-05-21 07:41:44 UTC
(In reply to comment #21)
> There is a call like this in pam-systemd.c:

Er, pam-module.c.

Comment 23 Lennart Poettering 2012-05-21 12:11:53 UTC
Seems Xvnc is not calling the PAM hooks the right way. It should start the PAM session in the parent and end the session in the parent, too, after the child died. Only the child should drop privileges.

Or in pseudo code:

in parent {
    pam_open_session();
    fork();
    in child {
        setresuid();
        ...
    }
    waitpid(child);
    pam_close_session();
}

The X display name should be passed into pam_open_session().

Comment 24 Bojan Smojver 2012-05-21 21:54:41 UTC
(In reply to comment #23)
> Seems Xvnc is not calling the PAM hooks the right way. It should start the
> PAM session in the parent and end the session in the parent, too, after the
> child died. Only the child should drop privileges.

It is not Xvnc calling the hooks. It is xrdp. That's because xrdp needs to authenticate the user.

> Or in pseudo code:
> 
> in parent {
>     pam_open_session();
>     fork();
>     in child {
>         setresuid();
>         ...
>     }
>     waitpid(child);
>     pam_close_session();
> }
> 
> The X display name should be passed into pam_open_session().

The problem occurs on exec() of startwm.sh, where the session FD (the one to the open FIFO, connected to systemd-logind) gets closed due to O_CLOEXEC flag used on it. Every process that opens a PAM session, calls exec() and uses pam_systemd will fail the same way. No?

Comment 25 Lennart Poettering 2012-05-21 21:58:28 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > Seems Xvnc is not calling the PAM hooks the right way. It should start the
> > PAM session in the parent and end the session in the parent, too, after the
> > child died. Only the child should drop privileges.
> 
> It is not Xvnc calling the hooks. It is xrdp. That's because xrdp needs to
> authenticate the user.

Hmm? in your comment #1 you said xvnc. I assume both have roughly the same functionality though, right?

> > Or in pseudo code:
> > 
> > in parent {
> >     pam_open_session();
> >     fork();
> >     in child {
> >         setresuid();
> >         ...
> >     }
> >     waitpid(child);
> >     pam_close_session();
> > }
> > 
> > The X display name should be passed into pam_open_session().
> 
> The problem occurs on exec() of startwm.sh, where the session FD (the one to
> the open FIFO, connected to systemd-logind) gets closed due to O_CLOEXEC
> flag used on it. Every process that opens a PAM session, calls exec() and
> uses pam_systemd will fail the same way. No?

The fd is not supposed to be retained by the child. It is supposed to be retained in the parent only, and will be opened by pam_open_session() and closed in pam_close_session(). We explicitly want O_CLOEXEC set for it.

If in your case the fd ends up being closed too early this indicates that pam_close_session() is called too early or that the parent goes away too early by some other way.

Comment 26 Bojan Smojver 2012-05-21 22:04:01 UTC
(In reply to comment #25)

> Hmm? in your comment #1 you said xvnc. I assume both have roughly the same
> functionality though, right?

The xrdp code starts Xvnc underneath, but it is xrdp code that calls pam hooks.
 
> The fd is not supposed to be retained by the child. It is supposed to be
> retained in the parent only, and will be opened by pam_open_session() and
> closed in pam_close_session(). We explicitly want O_CLOEXEC set for it.

O_CLOEXEC closes the FD on exec(), not on fork().

> If in your case the fd ends up being closed too early this indicates that
> pam_close_session() is called too early or that the parent goes away too
> early by some other way.

The pam_close_session() never gets called, as far as I can tell. I think the reason for systemd-logind closing the session is the fact that when exec() is executed, FD gets closed on that side, which sends EOF to the other end.

Comment 27 Bojan Smojver 2012-05-21 22:08:02 UTC
(In reply to comment #26)
 
> O_CLOEXEC closes the FD on exec(), not on fork().

Let me simplify. Imagine there is no fork() - just one process that calls pam_open_session() (and is configured to use pam_systemd). So, this tells systemd-logind to open a session. Then this process calls exec(), which will close the FIFO FD, causing the session to close. No?

Comment 28 Lennart Poettering 2012-05-21 23:18:12 UTC
(In reply to comment #26)
> (In reply to comment #25)
> 
> > Hmm? in your comment #1 you said xvnc. I assume both have roughly the same
> > functionality though, right?
> 
> The xrdp code starts Xvnc underneath, but it is xrdp code that calls pam
> hooks.
>  
> > The fd is not supposed to be retained by the child. It is supposed to be
> > retained in the parent only, and will be opened by pam_open_session() and
> > closed in pam_close_session(). We explicitly want O_CLOEXEC set for it.
> 
> O_CLOEXEC closes the FD on exec(), not on fork().

Yes, that is indeed the case. But this only closes the fd of the child. The copy of the parent is retained.

> 
> > If in your case the fd ends up being closed too early this indicates that
> > pam_close_session() is called too early or that the parent goes away too
> > early by some other way.
> 
> The pam_close_session() never gets called, as far as I can tell. I think the
> reason for systemd-logind closing the session is the fact that when exec()
> is executed, FD gets closed on that side, which sends EOF to the other end.

Uh, if pam_close_session() is never called than this is a bug of xrdb/xvnc.

But no, the exec() will only cause the child fd to be closed, not the copy in the parent.

Comment 29 Lennart Poettering 2012-05-21 23:21:40 UTC
(In reply to comment #27)
> (In reply to comment #26)
>  
> > O_CLOEXEC closes the FD on exec(), not on fork().
> 
> Let me simplify. Imagine there is no fork() - just one process that calls
> pam_open_session() (and is configured to use pam_systemd). So, this tells
> systemd-logind to open a session. Then this process calls exec(), which will
> close the FIFO FD, causing the session to close. No?

This is not how PAM's pam_open_session() and pam_close_session() work. (Yes, the documentation of this really sucks, but the algorithm in #23 is how this is supposed to work).

You have to match every pam_open_session() with a pam_close_session() and you have to fork() of the actual session process off it. You have to call both fork() and exec(). exec() alone is not sufficient.

Comment 30 Bojan Smojver 2012-05-21 23:27:36 UTC
(In reply to comment #29)
 
> This is not how PAM's pam_open_session() and pam_close_session() work. (Yes,
> the documentation of this really sucks, but the algorithm in #23 is how this
> is supposed to work).
> 
> You have to match every pam_open_session() with a pam_close_session() and
> you have to fork() of the actual session process off it. You have to call
> both fork() and exec(). exec() alone is not sufficient.

That is what xrdp does, AFAICT. Never mind, I can just take pam_systemd out of the stack then.

Comment 31 Bojan Smojver 2012-05-22 01:38:53 UTC
(In reply to comment #29)

> You have to match every pam_open_session() with a pam_close_session() and
> you have to fork() of the actual session process off it. You have to call
> both fork() and exec(). exec() alone is not sufficient.

Actually, pam_close_session() never gets called. Here is why:

- original process, which calls pam_open_session()
- fork
  - parent exec()'s Xvnc, which closes FIFO FD, but not fatal at this
point
  - child waits for Xvnc to be up and then exec()-s startwm.sh, which
closes FIFO FD and the session is gone

The PAM session is never explicitly closed (because both parent and
child exec()), but that surely cannot be a problem - programs crash all
the time - PAM should be able to handle this. They also exec() all the
time. Surely, the session should end when all processes from the session
are done, not when something gets exec()-ed.

I guess one way to fix this in xrdp is to fork one more time, before the
fork above.

However, I still think that what pam_systemd does by setting O_CLOEXEC
is not correct. Clearly, that process is part of the session, even if it
exec()-s something else.

Comment 32 Bojan Smojver 2012-05-22 02:10:19 UTC
Created attachment 585916 [details]
Add a fork to satisfy systemd-logind

This adds another fork just before startwm.sh is started and it moves PAM session start to after Xvnc is up, so that systemd-logind can link the socket. It also introduces auth_stop_session() function.

Comment 33 Bojan Smojver 2012-05-22 02:35:46 UTC
Created attachment 585917 [details]
Add a fork to satisfy systemd-logind

OOPS! sorry, forgot one exit. This should be better.

Comment 34 Bojan Smojver 2012-05-22 02:45:44 UTC
Assigned back to xrdp. Please include the patch in a new build. I just tested it on my system and it works fine, including session teardown.

I can now see the xrdp session with loginctl. I can also reconnect successfully, after RDP client has been shut down.

Comment 35 Bojan Smojver 2012-05-23 01:06:15 UTC
Patch also submitted upstream:

http://sourceforge.net/tracker/?func=detail&aid=3528954&group_id=112022&atid=665248

Comment 36 Fedora Update System 2012-05-28 00:49:50 UTC
xrdp-0.6.0-0.5.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/xrdp-0.6.0-0.5.fc17

Comment 37 Fedora Update System 2012-05-29 10:29:22 UTC
Package xrdp-0.6.0-0.5.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing xrdp-0.6.0-0.5.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-8538/xrdp-0.6.0-0.5.fc17
then log in and leave karma (feedback).

Comment 38 Fedora Update System 2012-06-17 00:00:10 UTC
xrdp-0.6.0-0.5.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.