Created attachment 1218479
journal when closing one of two vnc connections at the gdm login screen, killing both

Description of problem:
We have a system that users can connect to with vnc, each time getting a new session that starts with a gdm login screen through xdmcp. RHEL7.3 and its gdm-3.14.2-19.el7 bring us an improvement and a regression that is extremely similar to #1377987 and probably related to it.

Before, whenever a user logged out, the screen would just go black and hang for a long time. The comment at the beginning of the patch may explain why: "gnome-shell and the session dbus daemon don't automatically exit when gnome-session does. They, instead, wait for the display to exit or regenerate. If the display is remote, that won't happen until the keep alive timeout." Instead of waiting for the timeout, most users would just close their vnc client window to be rid of it, and eventually stopped logging out altogether since it didn't do them much good.

Thanks to the patch, logging out now works much better. Unfortunately, NOT logging out and just closing the vnc client window now kills gdm and everybody's sessions! This even happens if one closes the vnc client window at the gdm login screen, where no option to log out is displayed.

Version-Release number of selected component (if applicable):
gdm-3.14.2-19.el7 from RHEL7.3. The problem does not occur when gdm is built without notify-xdmcp-about-session-end.patch, but the logout behaviour is then no better than it was before.

How reproducible:
For us, very. Unfortunately for a bug report, our setup is quite custom and even uses a custom tigervnc-server. If the description above does not help point to the problem, I can provide more information and try to reproduce it using tigervnc-server from RHEL, but that will take some time.
The log shows:

Nov 07 10:08:14 desktop-test1 gdm[6919]: XIO: fatal IO error 0 (Success) on X server "127.0.0.1:1"
Nov 07 10:08:14 desktop-test1 gdm[6919]: after 272 requests (272 known processed) with 0 events remaining.

An XIO error means the connection to the X server was cut. This could be GDM inadvertently killing itself in this loop:

+        /* Kill every client but ourselves, then close our own connection
+         */
+        for (client = 0;
+             client <= highest_client;
+             client += client_increment) {
+
+                if (client != setup->resource_id_base)
+                        XKillClient (slave->priv->server_display, client);
+        }

But the if (client != setup->resource_id_base) should protect it from that sort of thing.

Another possibility is that the server is resetting for some reason. What options do you pass to your Xvnc instance? Does adding -noreset change the behavior?
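As a rough illustration of why the client != setup->resource_id_base check should protect the caller, here is a minimal, Xlib-free sketch of the same iteration. It only counts which client ID bases the loop would target. The mask value, the size of the ID space, and the way the highest base is derived are assumptions for illustration (typical values, not taken from the patch), and count_kill_targets is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumed, typical values. A real X server reports its own resource ID
 * mask in the connection setup data. */
#define ASSUMED_RESOURCE_ID_MASK 0x001fffffu
/* The top three bits of an X resource ID are always zero. */
#define ASSUMED_ID_SPACE         0x1fffffffu

/* Hypothetical helper: walk every possible client ID base, the way the
 * quoted loop does, and count the ones that would be killed. The caller's
 * own base is skipped, so its connection is never a target. */
static unsigned count_kill_targets(uint32_t own_base, uint32_t id_mask)
{
    uint32_t increment = id_mask + 1;               /* distance between client bases */
    uint32_t highest = ASSUMED_ID_SPACE & ~id_mask; /* assumed highest client base */
    unsigned targets = 0;

    for (uint32_t client = 0; client <= highest; client += increment) {
        if (client != own_base)
            targets++; /* the real loop calls XKillClient() here */
    }
    return targets;
}
```

With the assumed mask there are 256 possible bases, so a caller whose own base is one of them ends up with 255 targets, none of which is itself.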
Adding -noreset does not help and neither does removing the entire loop above.

Tried using tigervnc-server from RHEL7 with a similar setup; a socket looking something like this:

[Unit]
Description=VNC Socket for Per-Connection Servers

[Socket]
ListenStream=5900
Accept=yes
KeepAlive=yes
KeepAliveTimeSec=600

[Install]
WantedBy=sockets.target

Service looking like this:

[Unit]
Description=Remote desktop service (VNC)
After=syslog.target network.target gdm.service

[Service]
Type=simple
StandardInput=socket
ExecStart=-/usr/sbin/Xvnc-wrapper %i
WorkingDirectory=/

[Install]
WantedBy=multi-user.target

And a wrapper script looking like this:

#!/bin/bash
LAUNCHOPTIONS="-inetd -query localhost -geometry 1024x768 -depth 16 -once"
LAUNCHOPTIONS+=" -fp /usr/share/X11/fonts/misc -desktop=desktop-test1"
if [ "$serveraddr" = "::1:5900" -o "$serveraddr" = "127.0.0.1:5900" ]; then
    LAUNCHOPTIONS+=" -localhost"
    LAUNCHOPTIONS+=" -securitytypes=none"
else
    LAUNCHOPTIONS+=" -securitytypes=VeNCrypt,X509None"
    LAUNCHOPTIONS+=" -x509key=/etc/pki/tls/private/vnc.key"
    LAUNCHOPTIONS+=" -x509cert=/etc/pki/tls/certs/vnc.crt"
fi
/usr/bin/Xvnc $LAUNCHOPTIONS
Okay, if removing the loop doesn't fix it, then it must be one of the other two patches in the file. I'll have to try to reproduce.
In case it's helpful, I think it happens at the XCloseDisplay(slave->priv->server_display); line after the loop. If I remove it, closing the vnc client windows doesn't break other sessions. The other consequence is that if I log out I get a black screen for a while, followed by a new gdm login screen. Before it would wait much longer at the black screen, then report end of stream.
That's very strange, XCloseDisplay() shouldn't lead to an XIO error! I'll investigate, there might be an xlib bug here.
So thinking about this more, I guess the real problem is that we have a display connection open at all from the main daemon process. The X server, in theory, can go away at any time, leading to an XIOError and exit. We've sort of dodged the issue in the past by not doing anything with the X connection after opening it (since we don't do anything with the connection, the connection never has a chance to notice it's disconnected). Now that we do something on the connection we're susceptible to XIOErrors.

Two options:
1) we can add a gdm_io_error_trap_push/pop call that sidesteps the XIOError/exit with longjmp, or
2) we can move the X connection to a separate process, so the exit is recoverable.

Neither is ideal, and I'm not sure which is the right way to go yet.
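To make option 1) concrete, here is a minimal sketch of the setjmp/longjmp trap idea using only the C standard library. It is not the GDM code: fake_lib_request is a hypothetical stand-in for a library that, like xlib, calls a fatal-error handler and then exits if the handler returns, and run_trapped is a hypothetical name playing the role of the push/pop pair.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a library that calls a fatal-error handler
 * and exits the process if that handler ever returns. */
typedef void (*fatal_handler_t)(void);
static fatal_handler_t fatal_handler;

static void fake_lib_request(bool connection_alive)
{
    if (connection_alive)
        return;             /* request succeeded */
    if (fatal_handler != NULL)
        fatal_handler();    /* a trapping handler does not return */
    exit(1);                /* reached only if the handler returned */
}

static jmp_buf trap_env;
static bool trap_active;

/* Fatal-error handler: jump back to the trap instead of returning. */
static void trap_handler(void)
{
    if (trap_active)
        longjmp(trap_env, 1);
}

/* Run one request with the trap installed. Returns true if the request
 * completed, false if the fatal error was trapped. */
static bool run_trapped(bool connection_alive)
{
    bool ok;
    fatal_handler_t old_handler = fatal_handler;

    fatal_handler = trap_handler;   /* "push" */
    trap_active = true;

    if (setjmp(trap_env) == 0) {
        fake_lib_request(connection_alive);
        ok = true;
    } else {
        ok = false;                 /* arrived here through longjmp */
    }

    trap_active = false;            /* "pop" */
    fatal_handler = old_handler;
    return ok;
}
```

The sketch also hints at why this approach is uncomfortable: the jump abandons the library call halfway through, so whatever internal state the library held for that connection is left as it was at the moment of the error.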
So at first I started to sketch out the longjmp approach mentioned as 1) in comment 7. I posted that experiment here:

https://git.gnome.org/browse/gdm/commit/?h=wip/longjmp-off-a-short-bridge&id=87448650f98d1677772f73dd8b5796af68650941

I never got around to testing it because it really feels "wrong" to me. The fundamental problem is that xlib doesn't support graceful handling of display server disconnection. It forces an exit() call.

A solution I hadn't considered in comment 7 but later thought about is to stop using xlib entirely. There's an alternative library for talking to X servers called xcb, and that library does allow for gracefully handling display connections disappearing. At first that may seem a little drastic, but GDM doesn't actually need to do much to the displays it creates. So I did that upstream here:

https://bugzilla.gnome.org/show_bug.cgi?id=776059

The downstream patch is a little different, since the codebase has shifted some, but I've built test packages here:

http://people.redhat.com/rstrode/gdm/1392832/

Do you mind giving them a try and seeing if they work for your needs?
I built the new package and tried it. Logging out now works well, the session is closed and other connections are not abruptly cut. However, it seems that I can't log in twice with the same account now; the second session gets stopped before it really starts:

Dec 23 08:48:28 desktop-test1 systemd[1]: Started Session 331 of user testuser.
Dec 23 08:48:28 desktop-test1 systemd-logind[896]: New session 331 of user testuser.
Dec 23 08:48:28 desktop-test1 systemd[1]: Starting Session 331 of user testuser.
Dec 23 08:48:29 desktop-test1 systemd[1]: Stopping Session 331 of user testuser.
Dec 23 08:48:29 desktop-test1 systemd[1]: Stopped Session 331 of user testuser.
Dec 23 08:48:29 desktop-test1 systemd-logind[896]: Removed session 331.

I have not investigated this any further yet.
Can you try this with the 7.4 beta?
Unfortunately the 7.4 beta came at a bad time, but I have now tried this with the 7.5 beta. The original bug seems to remain fixed. As in Comment 9, I still can't have a single user log in twice at the same time. This may be a completely different problem though, and while it would be nice to get a fix for it, the impact is fairly small in comparison. I'll see if I can find some clues as to what is going on.
The remaining issue appears to be gone as of RHEL 7.6. This bug can be closed.