Bug 821569
Summary: | Reconnection to existing xrdp sessions fails | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Bojan Smojver <bojan> | ||||||
Component: | xrdp | Assignee: | Itamar Reis Peixoto <itamar> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 17 | CC: | itamar, johannbg, lpoetter, metherid, notting, plautrba, systemd-maint | ||||||
Target Milestone: | --- | Keywords: | Patch | ||||||
Target Release: | --- | ||||||||
Hardware: | i686 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2012-06-17 00:00:10 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Bojan Smojver
2012-05-15 00:32:55 UTC
This may be subtly related to bug #820815. I use ICAClient to connect to Citrix and RDP from there. Because of bug #820815, I have to resize the window by hand. I just noticed that some of the VNC sessions had resolution set to 1524x846. It seems that when Gnome (or Xvnc underneath it) encounters an "unknown" resolution (i.e. the one that has not been set yet in Display properties), it somehow disconnects, instead of actually attempting to use that. I have reported something like that here already: http://lists.fedoraproject.org/pipermail/devel/2012-May/166904.html I do not really know what is causing this. And because I never had to resize ICAClient window under F-16, this probably never cropped up. I also tried a few times without resizing and that is failing as well. Instead, new sessions, with exactly the same resolutions, are being started. The old sessions are still there when I run ps. If it matters, I have 11 xrdp-chansrv processes running right now and not a single Xvnc as a result of an RDP connection (i.e. everyone's been logged out). This may be somehow related to systemd. When I run systemd-loginctl on my laptop, I can see my session just fine (my laptop is not the machine running xrdp - just using it as an example of what works): ------------------- [bojan@shrek ~]$ systemd-loginctl SESSION UID USER SEAT 2 5000 bojan seat0 1 sessions listed. ------------------- However, if I do the same inside my xrdp session (Xvnc, really) on the remote system, I get this: ------------------- [user@hmachine pam.d]$ systemd-loginctl SESSION UID USER SEAT c33 0 root 1 sessions listed ------------------- This is my root login in one of the gnome-terminals, in order to run journalctl. If I close that, I get: ------------------- [user@machine pam.d]$ systemd-loginctl SESSION UID USER SEAT 0 sessions listed. ------------------- How can there be no sessions when I'm clearly logged in? When I run journalctl -a, I see this: ------------------- May 16 09:47:10 machine systemd-logind[670]: New session c30 of user user. May 16 09:47:10 machine systemd-logind[670]: Session c30 has display :12 with nonexisting socket /tmp/.X11-unix/X12. May 16 09:47:10 machine xrdp-sesman[28412]: pam_unix(xrdp-sesman:session): session opened for user user by (uid=0) May 16 09:47:10 machine systemd-logind[670]: Removed session c30. ------------------- That socket sure exists and fuser shows that is is being uses by Xvnc process. This could be something similar to bug #749962, which has been closed as not a bug. I'm thinking now that this problem is not actually intermittent. I use Citrix ICAClient to get to the machine that then RDPs into my Linux machine, which runs xrdp. If I tell ICAClient to disconnect (meaning, Windows RDP client will keep running for a while), I can then "connect" to my xrdp session again (in fact, the session was never disconnected between Windows RDP client and xrdp). If I say to ICAClient to log off (or if relatively long time passes since disconnection), Window RDP client will terminate, which will end connection between it and xrdp and I will never be able to get to that xrdp session any more. It seems that systemd-logind sends a SIGHUP to xrdp-sessvc, which then dies, which then sends SIGCHLD to xrdp-sesman, which reaps the xrdp session (in its signal handler) as if it never existed. So, naturally, on next connection attempt, xrdp has no idea that there is a PID out there still doing stuff. Best guess... Workaround: kick pam_systemd.so out of the xrdp-sesman config. My config now (session bit copied from password-auth, minus systemd stuff): --------- #%PAM-1.0 auth include password-auth account include password-auth #session include password-auth session optional pam_keyinit.so revoke session required pam_limits.so session [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid session required pam_unix.so session optional pam_sss.so --------- Obviously, the above won't work or everyone, as password-auth will be system specific. The real fix would be to either fix systemd-logind (actually find that X11 socket of Xvnc) or xrdp (ignore HUP from systemd-logind). Just connected to the session that's been left alone overnight. So, the workaround is solid. Going to assign this to systemd folks just to see whether they have any ideas as to what the problem might be. Some kind of race between xrdp session code and pam_systemd.so? It seems that xrdp code in session.c calls auth_start_session() (which underneath calls pam_open_session(), which then calls stuff form pam_systemd.so, which I guess uses systemd-logind to do stuff) _before_ Xvnc gets started. So, of course, pam_systemd.so cannot possibly find the socket - it has not been created yet. Maybe xrdp should move auth_start_session() to after Xvnc is up and running? Or are we having some sort of chicken and egg here? PS. Although the warning about Xvnc socket not being there is just that - a warning - I'm guessing it is eventually important enough for the session to be terminated. I haven't followed the code of systemd enough to verify, to be honest. (In reply to comment #10) > Maybe xrdp should move auth_start_session() to after Xvnc is up and running? Yeah, that would not work at all. It has to be before. > Or are we having some sort of chicken and egg here? Maybe Xvnc that gets started by xrdp should invoke PAM session, in order to set all the required stuff there? Maybe something like this would work: --------------------- diff --git a/sesman/session.c b/sesman/session.c index 36262ad..6029a91 100644 --- a/sesman/session.c +++ b/sesman/session.c @@ -369,7 +369,6 @@ session_start_fork(int width, int height, int bpp, char* use } else if (pid == 0) /* child sesman */ { - auth_start_session(data, display); g_sprintf(geometry, "%dx%d", width, height); g_sprintf(depth, "%d", bpp); g_sprintf(screen, ":%d", display); @@ -383,6 +382,7 @@ session_start_fork(int width, int height, int bpp, char* use env_set_user(username, 0, display); if (x_server_running(display)) { + auth_start_session(data, display); auth_set_env(data); if (directory != 0) { --------------------- Maybe we could be doing the PAM session stuff once we know we have the X server running? Assigning back to xrdp. With that patch, now I get: --------------- May 18 18:13:44 machine xrdp-sesman: pam_unix(xrdp-sesman:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=<user> May 18 18:13:45 machine xrdp-sesman: pam_sss(xrdp-sesman:auth): authentication success; logname= uid=0 euid=0 tty= ruser= rhost= user=<user> May 18 18:13:45 machine xrdp-sesman: pam_systemd(xrdp-sesman:session): Failed to create session: Access denied May 18 18:13:45 machine xrdp-sesman: pam_unix(xrdp-sesman:session): session opened for user <user> by (uid=<uid>) --------------- So, still not quite right, but I can reconnect. Nothing dies due to SIGHUP or anything. D-Bus error causing the problem now: --------------- May 18 19:19:12 machine dbus-daemon[673]: dbus[673]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.65" (uid=<uid> pid=4183 comm="/usr/sbin/xrdp-sesman ") interface="org.freedesktop.login1.Manager" member="CreateSession" error name="(unset)" requested_reply="0" destination="org.freedesktop.login1" (uid=0 pid=662 comm="/usr/lib/systemd/systemd-logind ") May 18 19:19:12 machine dbus[673]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.65" (uid=<uid> pid=4183 comm="/usr/sbin/xrdp-sesman ") interface="org.freedesktop.login1.Manager" member="CreateSession" error name="(unset)" requested_reply="0" destination="org.freedesktop.login1" (uid=0 pid=662 comm="/usr/lib/systemd/systemd-logind ") --------------- If I add: --------------- <allow send_destination="org.freedesktop.login1" send_interface="org.freedesktop.login1.Manager" send_member="CreateSession"/> --------------- To /etc/dbus-1/system.d/org.freedesktop.login1.conf, I can get a session via D-Bus. But it dies once again very shortly: --------------- May 18 19:40:45 machine systemd-logind[678]: New session c2 of user <user>. May 18 19:40:45 machine systemd-logind[678]: Linked /tmp/.X11-unix/X10 to /run/user/<user>/X11-display. May 18 19:40:45 machine systemd-logind[678]: Removed session c2. --------------- Seriously running out of ideas... Assigning again to systemd after searching for similar problems (am am also seeing "...killed" on su sessions). It seems that pam_systemd is the root cause of the issue here. The "nonexistent socket" warning is just a red herring and although xrdp may benefit from the patch I attached here, it won't really affect the outcome. Also, it seems that D-Bus file for systemd is missing CreateSession entry anyway. This is the code of systemd-logind we seem to be running through, when xrdp is firing up: ----------------- Breakpoint 1, manager_run (m=m@entry=0x8514008) at src/login/logind.c:1204 1204 if (event.data.u32 >= FD_FIFO_BASE) Missing separate debuginfos, use: debuginfo-install sssd-client-1.8.3-11.fc17.i686 (gdb) n 1205 manager_pipe_notify_eof(m, event.data.u32 - FD_FIFO_BASE); (gdb) s manager_pipe_notify_eof (fd=12, m=0x8514008) at src/login/logind.c:858 858 assert_se(fd >= 0); (gdb) n 860 assert_se(s = hashmap_get(m->fifo_fds, INT_TO_PTR(fd + 1))); (gdb) n 861 assert(s->fifo_fd == fd); (gdb) n 862 session_remove_fifo(s); (gdb) n 864 session_stop(s); (gdb) n manager_run (m=m@entry=0x8514008) at src/login/logind.c:1169 1169 manager_gc(m, true); (gdb) c Continuing. ----------------- Now, this obviously will close the session (the event.u32 is 16, so fd is 12). So, the session is opened and shortly after manager_run() function gets a closure of fd 12 event and kills it. Still trying to find out why... (In reply to comment #18) > So, the session is opened and shortly after manager_run() function gets > a closure of fd 12 event and kills it. Still trying to find out why... All I can see for now is that this happens after startwm.sh gets executed and that particular FD is not part of the original process. (In reply to comment #19) > (In reply to comment #18) > > So, the session is opened and shortly after manager_run() function gets > > a closure of fd 12 event and kills it. Still trying to find out why... > > All I can see for now is that this happens after startwm.sh gets executed > and that particular FD is not part of the original process. Hmm, I think I see now what's going on. The session FIFO is being opened with O_CLOEXEC, so naturally, when xrdp session exec()'s startwm.sh, this FD get closed, which then tells systemd-logind to close the session. Interesting... So, is exec()-ing things from the session process verboten in systemd? How is this supposed to be worked around? There is a call like this in pam-systemd.c: ------------------ if (!dbus_message_get_args(reply, &error, DBUS_TYPE_STRING, &id, DBUS_TYPE_OBJECT_PATH, &object_path, DBUS_TYPE_STRING, &runtime_path, DBUS_TYPE_UNIX_FD, &session_fd, DBUS_TYPE_STRING, &seat, DBUS_TYPE_UINT32, &vtnr, DBUS_TYPE_INVALID)) { pam_syslog(handle, LOG_ERR, "Failed to parse message: %s", bus_e r = PAM_SESSION_ERR; goto finish; } ------------------ The documentation for dbus_message_get_args() says: ------------------ Unix file descriptors that are read with this function will have the FD_CLOEXEC flag set. If you need them without this flag set, make sure to unset it with fcntl() ------------------ So, maybe that is what would fix the whole thing? Just guessing here... PS. Combined with the patch to xrdp and CreateSession addition to D-Bus file of systemd. (In reply to comment #21) > There is a call like this in pam-systemd.c: Er, pam-module.c. Seems Xvnc is not calling the PAM hooks the right way. It should start the PAM session in the parent and end the session in the parent, too, after the child died. Only the child should drop privileges. Or in pseudo code: in parent { pam_open_session(); fork(); in child { setresuid(); ... } waitpid(child); pam_close_session(); } The X display name should be passed into pam_open_session(). (In reply to comment #23) > Seems Xvnc is not calling the PAM hooks the right way. It should start the > PAM session in the parent and end the session in the parent, too, after the > child died. Only the child should drop privileges. It is not Xvnc calling the hooks. It is xrdp. That's because xrdp needs to authenticate the user. > Or in pseudo code: > > in parent { > pam_open_session(); > fork(); > in child { > setresuid(); > ... > } > waitpid(child); > pam_close_session(); > } > > The X display name should be passed into pam_open_session(). The problem occurs on exec() of startwm.sh, where the session FD (the one to the open FIFO, connected to systemd-logind) gets closed due to O_CLOEXEC flag used on it. Every process that opens a PAM session, calls exec() and uses pam_systemd will fail the same way. No? (In reply to comment #24) > (In reply to comment #23) > > Seems Xvnc is not calling the PAM hooks the right way. It should start the > > PAM session in the parent and end the session in the parent, too, after the > > child died. Only the child should drop privileges. > > It is not Xvnc calling the hooks. It is xrdp. That's because xrdp needs to > authenticate the user. Hmm? in your comment #1 you said xvnc. I assume both have roughly the same functionality though, right? > > Or in pseudo code: > > > > in parent { > > pam_open_session(); > > fork(); > > in child { > > setresuid(); > > ... > > } > > waitpid(child); > > pam_close_session(); > > } > > > > The X display name should be passed into pam_open_session(). > > The problem occurs on exec() of startwm.sh, where the session FD (the one to > the open FIFO, connected to systemd-logind) gets closed due to O_CLOEXEC > flag used on it. Every process that opens a PAM session, calls exec() and > uses pam_systemd will fail the same way. No? The fd is not supposed to be retained by the child. It is supposed to be retained in the parent only, and will be opened by pam_open_session() and closed in pam_close_session(). We explicitly want O_CLOEXEC set for it. If in your case the fd ends up being closed too early this indicates that pam_close_session() is called too early or that the parent goes away too early by some other way. (In reply to comment #25) > Hmm? in your comment #1 you said xvnc. I assume both have roughly the same > functionality though, right? The xrdp code starts Xvnc underneath, but it is xrdp code that calls pam hooks. > The fd is not supposed to be retained by the child. It is supposed to be > retained in the parent only, and will be opened by pam_open_session() and > closed in pam_close_session(). We explicitly want O_CLOEXEC set for it. O_CLOEXEC closes the FD on exec(), not on fork(). > If in your case the fd ends up being closed too early this indicates that > pam_close_session() is called too early or that the parent goes away too > early by some other way. The pam_close_session() never gets called, as far as I can tell. I think the reason for systemd-logind closing the session is the fact that when exec() is executed, FD gets closed on that side, which sends EOF to the other end. (In reply to comment #26) > O_CLOEXEC closes the FD on exec(), not on fork(). Let me simplify. Imagine there is no fork() - just one process that calls pam_open_session() (and is configured to use pam_systemd). So, this tells systemd-logind to open a session. Then this process calls exec(), which will close the FIFO FD, causing the session to close. No? (In reply to comment #26) > (In reply to comment #25) > > > Hmm? in your comment #1 you said xvnc. I assume both have roughly the same > > functionality though, right? > > The xrdp code starts Xvnc underneath, but it is xrdp code that calls pam > hooks. > > > The fd is not supposed to be retained by the child. It is supposed to be > > retained in the parent only, and will be opened by pam_open_session() and > > closed in pam_close_session(). We explicitly want O_CLOEXEC set for it. > > O_CLOEXEC closes the FD on exec(), not on fork(). Yes, that is indeed the case. But this only closes the fd of the child. The copy of the parent is retained. > > > If in your case the fd ends up being closed too early this indicates that > > pam_close_session() is called too early or that the parent goes away too > > early by some other way. > > The pam_close_session() never gets called, as far as I can tell. I think the > reason for systemd-logind closing the session is the fact that when exec() > is executed, FD gets closed on that side, which sends EOF to the other end. Uh, if pam_close_session() is never called than this is a bug of xrdb/xvnc. But no, the exec() will only cause the child fd to be closed, not the copy in the parent. (In reply to comment #27) > (In reply to comment #26) > > > O_CLOEXEC closes the FD on exec(), not on fork(). > > Let me simplify. Imagine there is no fork() - just one process that calls > pam_open_session() (and is configured to use pam_systemd). So, this tells > systemd-logind to open a session. Then this process calls exec(), which will > close the FIFO FD, causing the session to close. No? This is not how PAM's pam_open_session() and pam_close_session() work. (Yes, the documentation of this really sucks, but the algorithm in #23 is how this is supposed to work). You have to match every pam_open_session() with a pam_close_session() and you have to fork() of the actual session process off it. You have to call both fork() and exec(). exec() alone is not sufficient. (In reply to comment #29) > This is not how PAM's pam_open_session() and pam_close_session() work. (Yes, > the documentation of this really sucks, but the algorithm in #23 is how this > is supposed to work). > > You have to match every pam_open_session() with a pam_close_session() and > you have to fork() of the actual session process off it. You have to call > both fork() and exec(). exec() alone is not sufficient. That is what xrdp does, AFAICT. Never mind, I can just take pam_systemd out of the stack then. (In reply to comment #29) > You have to match every pam_open_session() with a pam_close_session() and > you have to fork() of the actual session process off it. You have to call > both fork() and exec(). exec() alone is not sufficient. Actually, pam_close_session() never gets called. Here is why: - original process, which calls pam_open_session() - fork - parent exec()'s Xvnc, which closes FIFO FD, but not fatal at this point - child waits for Xvnc to be up and then exec()-s startwm.sh, which closes FIFO FD and the session is gone The PAM session is never explicitly closed (because both parent and child exec()), but that surely cannot be a problem - programs crash all the time - PAM should be able to handle this. They also exec() all the time. Surely, the session should end when all processes from the session are done, not when something gets exec()-ed. I guess one way to fix this in xrdp is to fork one more time, before the fork above. However, I still think that what pam_systemd does by setting O_CLOEXEC is not correct. Clearly, that process is part of the session, even if it exec()-s something else. Created attachment 585916 [details]
Add a fork to satisfy systemd-logind
This adds another fork just before startwm.sh is started and it moves PAM session start to after Xvnc is up, so that systemd-logind can link the socket. It also introduces auth_stop_session() function.
Created attachment 585917 [details]
Add a fork to satisfy systemd-logind
OOPS! sorry, forgot one exit. This should be better.
Assigned back to xrdp. Please include the patch in a new build. I just tested it on my system and it works fine, including session teardown. I can now see the xrdp session with loginctl. I can also reconnect successfully, after RDP client has been shut down. Patch also submitted upstream: http://sourceforge.net/tracker/?func=detail&aid=3528954&group_id=112022&atid=665248 xrdp-0.6.0-0.5.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/xrdp-0.6.0-0.5.fc17 Package xrdp-0.6.0-0.5.fc17: * should fix your issue, * was pushed to the Fedora 17 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing xrdp-0.6.0-0.5.fc17' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-8538/xrdp-0.6.0-0.5.fc17 then log in and leave karma (feedback). xrdp-0.6.0-0.5.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report. |