Created attachment 1166105
vncserver config for xinetd; error messages written when disconnect occurs.
Description of problem:
When a VNC connection is made to a port serviced by xinetd with the attached configuration, the X session disconnects and "Fatal IO error 11" messages are written to the .xsession-errors log.
Version-Release number of selected component (if applicable):
How reproducible: Every time a connection is made to the "transient" port spawned by xinetd. Every server that we upgraded to RHEL 6.8 with the newer tigervnc-server reproduces the problem (the servers were built with similar kickstart configurations).
Steps to Reproduce:
1. Connect from UltraVNC (Windows) or tigervnc client (Linux) to port 5901/5902
2. Wait approx. 90 seconds; the xsession disconnects and drops back to the xdm login
3. Monitor the .xsession-errors log and find the "IO error 11" message
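Step 3 can be automated with a small script. This is only a minimal sketch, assuming the log is plain text and that the relevant messages contain the substring "IO error 11" as quoted in this report; the function name find_io_errors is hypothetical and not part of any tool.

```python
from typing import Iterable, List, Tuple


def find_io_errors(lines: Iterable[str], needle: str = "IO error 11") -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that contains `needle`.

    `needle` defaults to the substring quoted in this report; the exact
    wording in a given log may differ, so treat it as an assumption.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if needle in line:
            # Strip the trailing newline so results are easy to print.
            hits.append((number, line.rstrip("\n")))
    return hits
```

To scan a real log, pass an open file object for the user's .xsession-errors file; the function accepts any iterable of lines.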
Actual results: xsession disconnects when IO errors occur.
Expected results: xsession would remain connected until user disconnects
Additional info: To resolve this issue, I can downgrade this package only to tigervnc-server 0:1.1.0-16.el6, and the system is stable.
Persistent vnc connections to a dedicated user do not have this issue, even with the newer tigervnc-server.x86_64 0:1.1.0-18.el6 installed.
The disconnect problem is not tied to a particular desktop environment; it fails with both GNOME and KDE.
Just a "me too".
We are seeing exactly this behaviour and are currently working around it the same way (downgrade to 1.1.0-16).
Looking forward to the fix.
This issue was already fixed in https://rhn.redhat.com/errata/RHBA-2016-0775.html
I don't believe that to be the case: that erratum documents the tigervnc-1.1.0-18 release, which is what introduced the behaviour described here. 1.1.0-16 did not exhibit the problem, and downgrading to that version is the workaround the original reporter and others have applied.
Note the difference between -16 and -18 pretty much seems to be function prototype changes to allow the package to be built against the current X11 development packages, so I'd speculate that the bug itself could be in the X11 libraries rather than the tigervnc package itself.
I would also expect the problem to be in the Xorg server, as I don't think I changed anything in tigervnc that could cause this issue.
Both -16 and -18 seem to link against the same libraries: the output of "ldd" lists the same paths, with the sole exception that -16 lists /lib64/libfreebl3.so where -18 lists /usr/lib64/libfreebl3.so. These are both paths to the same library.
This led me to notice that "readelf -d /usr/bin/Xvnc" reports
0x000000000000000f (RPATH) Library rpath: [/usr/lib64]
for -18, but not for -16.
Is it possible that the addition of RPATH in the build process could somehow be causing the issue? The only other explanation I can think of is if Xvnc were linked against a static library that contained the bug when -18 was built but not when -16 was built.
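For anyone wanting to compare the two builds mechanically, here is a rough sketch that pulls RPATH entries out of "readelf -d" output. It assumes the output uses the same line format as the line quoted above; also matching RUNPATH lines is my own addition, and the function name rpath_entries is hypothetical.

```python
import re
from typing import List

# Matches dynamic-section lines such as the one quoted above:
#   0x000000000000000f (RPATH) Library rpath: [/usr/lib64]
# RUNPATH is matched as well (an assumption beyond what this report shows).
_RPATH_RE = re.compile(r"\((?:RPATH|RUNPATH)\)\s+Library (?:rpath|runpath):\s+\[(.*)\]")


def rpath_entries(readelf_output: str) -> List[str]:
    """Return every directory listed in RPATH/RUNPATH entries, in order."""
    paths = []
    for line in readelf_output.splitlines():
        match = _RPATH_RE.search(line)
        if match:
            # A single entry may hold several colon-separated directories.
            paths.extend(p for p in match.group(1).split(":") if p)
    return paths
```

Feeding it the captured output of "readelf -d /usr/bin/Xvnc" from each package version gives two lists to compare; an empty list means no RPATH entry was found.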
Do you want me to do a scratch build so you can test whether it's just a linking issue?
(In reply to Jan Grulich from comment #7)
> Do you want me to do a scratch build so you can test whether it's just a
> linking issue?
For what it's worth, I just tried removing the rpath from the executable via
chrpath -d /usr/bin/Xvnc
(using chrpath from the chrpath package in rhel-6-workstation-rpms), and this did not fix the symptom, which suggests that RPATH itself is not the issue.
There are other differences between the output of "readelf -d /usr/bin/Xvnc" for -16 vs. -18 (e.g., -18 reports entries for "GNU_LIBLIST," "GNU_LIBLISTSZ," "GNU_CONFLICT," and "GNU_CONFLICTSZ," where -16 does not). I have no reason to think that these entries themselves are responsible for the bug, but it leads me to speculate that -18 might have been compiled with a newer compiler version than -16. Is it possible that this is a manifestation of a compiler bug?
Can you confirm what compiler versions were used to build -16 and -18 respectively? It might be worthwhile trying a scratch build of tigervnc-server 1.1.0-18 with the compiler downgraded to the previous version, if possible, to see whether the symptom persists.
Otherwise, we need to try to figure out what else is different between the build environments for -16 and -18 that could have introduced the problem.
(In reply to Dan Astoorian from comment #8)
> There are other differences between the output of "readelf -d /usr/bin/Xvnc"
> for -16 vs. -18 (e.g., -18 reports entries for "GNU_LIBLIST,"
> "GNU_LIBLISTSZ," "GNU_CONFLICT," and "GNU_CONFLICTSZ," where -16 does not).
> I have no reason to think that these entries themselves are responsible for
> the bug, but it leads me to speculate that -18 might have been compiled with
> a newer compiler version than -16. Is it possible that this is a
> manifestation of a compiler bug?
Those aren't emitted by the compiler. Those are added by prelink; if you have them in one and not the other, it's because one had prelink run on it.
> Can you confirm what compiler versions were used to build -16 and -18
> respectively? It might be worthwhile trying a scratch build of
> tigervnc-server 1.1.0-18 with the compiler downgraded to the previous
> version, if possible, to see whether the symptom persists.
Personally I'd suspect the X server change well before the rest of the toolchain. I suspect Xvnc is crashing. Does xinetd capture stdout/stderr of the Xvnc process in a log file somewhere? Can you attach gdb to the Xvnc process and see if it captures a SIGSEGV or other abnormal termination?
The prelink differences I observed are almost certainly due to my upgrading/downgrading package versions to troubleshoot the issue.
Xvnc is not crashing: if it were, the connection to vncviewer would be terminated and vncviewer would exit. "ps" shows that the Xvnc process is still present with the same process ID after the X session goes away.
What we're seeing instead is that the screen goes black (similar to what you would see if a screensaver had kicked in, except it happens regardless of whether the session is idle); when the mouse is subsequently moved or a key pressed, the GDM login screen appears in the same vncviewer window.
In fact, the same thing happens if one does not log in at the GDM login screen: the screen going black seems to happen exactly 3 minutes after the login window is first presented, regardless of whether or how long the user was actually logged in.
I note that os/xdmcp.c contains changes involving keepalive timeouts, including some references to XDM_DEF_DORMANCY (which is, probably not coincidentally, 3 minutes), so it seems extremely likely that the bug was introduced somewhere in that vicinity. Perhaps keepalive packets are not being sent/processed correctly anymore?
Attaching strace to the Xvnc process from -18 shows that it's reaching
XdmcpDeadSession("Alive response indicates session dead");
in recv_alive_msg() in os/xdmcp.c; presumably this is what is terminating the session.
Further instrumenting seems to show that this is happening because SessionRunning is 0 in recv_alive_msg() (presumably indicating that the Session Running field was 0 in the XDMCP "alive" response packet), even though AliveSessionID seems to be set and is equal to SessionID. (Note that the spec indicates that the Session ID field should be 0 when no session is active, whereas the code in the display manager seems to set the session ID unconditionally and sets SessionRunning to 1 only if gdm_display_get_status() for the display returns GDM_DISPLAY_MANAGED.)
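To make the check described above concrete, here is a simplified model of it. This is a sketch and not the server's actual code: it assumes, from my reading of the XDMCP specification, that the Alive payload is one unsigned byte (Session Running) followed by a 32-bit Session ID in network byte order, and that the session is treated as alive only when the flag is nonzero and the ID matches. The function names are hypothetical.

```python
import struct
from typing import Tuple


def parse_alive_payload(payload: bytes) -> Tuple[int, int]:
    """Split the 5-byte Alive payload into (session_running, session_id)."""
    if len(payload) != 5:
        raise ValueError("Alive payload must be exactly 5 bytes")
    # One unsigned byte followed by one unsigned 32-bit integer, big-endian.
    return struct.unpack(">BI", payload)


def session_considered_alive(payload: bytes, current_session_id: int) -> bool:
    """Model of the decision: both conditions must hold, else the session is dead."""
    session_running, alive_session_id = parse_alive_payload(payload)
    return bool(session_running) and alive_session_id == current_session_id
```

Under this model, a response with a matching session ID but a Session Running field of 0 is treated as a dead session, which is the case observed here.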
The value assigned to timeOutTime in XdmcpOpenDisplay() (XDM_DEF_DORMANCY * 1000, or 3 minutes) seems to determine when the connection dies; changing it to
timeOutTime = GetTimeInMillis() + 15 * 1000;
makes the symptom happen after 15 seconds instead of 3 minutes; this may be helpful to you in reproducing the problem.
It's not obvious to me what changed between -16 and -18 to trigger the symptom, however--whether SessionRunning from GDM was always 0 but the code to process the packet never actually fired for some reason, or whether there's something else at work.
*** Bug 1349169 has been marked as a duplicate of this bug. ***
Please test this build of tigervnc:
Installed tigervnc-server-1.1.0-19.jx1.el6.x86_64 on a RHEL 6.8 server. My connection spawned via xinetd is not crashing after 3 mins. Will do additional testing and report back.
This version is working well when spawned from xinetd. I also verified it with a resumable session and that is also working well.
This build (tigervnc-1.1.0-19.jx1.el6.x86_64) resolves the issue I reported.
Confirmed for me as well: this build is not resetting the connection after 3 minutes like 1.1.0-18 did.
Perfect, I'll do an official build of tigervnc against the fixed xserver.
I can confirm this is also broken on x86.
Is there an ETA on 1.1.0-19 to be generally available?
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.