Bug 1344137 - New version of tigervnc-server disconnects Xsession when initiated from xinetd
Summary: New version of tigervnc-server disconnects Xsession when initiated from xinetd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: xorg-x11-server
Version: 6.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Adam Jackson
QA Contact: Desktop QE
URL:
Whiteboard:
Duplicates: 1349169
Depends On:
Blocks: 1269194 1360926 1390458
Reported: 2016-06-08 20:40 UTC by Chuck Amsler
Modified: 2021-06-10 11:21 UTC (History)

Fixed In Version: xorg-x11-server-1.17.4-13.el6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1390458
Environment:
Last Closed: 2017-03-21 11:16:52 UTC
Target Upstream Version:


Attachments
vncserver config for xinetd; error messages written when disconnect occurs. (1.62 KB, text/plain)
2016-06-08 20:40 UTC, Chuck Amsler


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0729 0 normal SHIPPED_LIVE xorg-x11-server bug fix and enhancement update 2017-03-21 12:43:21 UTC

Description Chuck Amsler 2016-06-08 20:40:16 UTC
Created attachment 1166105
vncserver config for xinetd; error messages written when disconnect occurs.

Description of problem:
When a VNC connection is made to a port serviced by xinetd, using the attached configuration, the client is disconnected from the Xsession and "Fatal IO error 11" messages are written to the .xsession-errors log.

Version-Release number of selected component (if applicable):
tigervnc-server.x86_64 0:1.1.0-18.el6

How reproducible: Every time a connection is made to the "transient" port spawned by xinetd. Every server that we upgraded to RHEL 6.8 with the newer tigervnc-server reproduces the problem (the servers were built with similar kickstart configurations).


Steps to Reproduce:
1. Connect from UltraVNC (Windows) or tigervnc client (Linux) to port 5901/5902
2. Wait approximately 90 seconds; the Xsession disconnects and drops back to the xdm login
3. Monitor the .xsession-errors log and find the "IO error 11" message
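
Step 3 can be automated along the following lines. This is a minimal sketch, not part of the original report: the function names are hypothetical, the log location defaults to ~/.xsession-errors, and the match is a plain case-insensitive substring search for "IO error 11".

```python
from pathlib import Path


def find_io_errors(log_text, needle="io error 11"):
    """Return (line_number, line) pairs for log lines that mention the error."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if needle in line.lower():
            hits.append((number, line.strip()))
    return hits


def scan_xsession_errors(path=None):
    """Scan an .xsession-errors file (default: the current user's)."""
    if path is None:
        path = Path.home() / ".xsession-errors"
    # errors="replace" keeps the scan going if the log contains stray bytes.
    return find_io_errors(Path(path).read_text(errors="replace"))
```

Calling scan_xsession_errors() after the disconnect should return at least one hit on an affected system.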

Actual results: xsession disconnects when IO errors occur.


Expected results: xsession would remain connected until user disconnects


Additional info: To work around this issue, I can downgrade only this package to tigervnc-server 0:1.1.0-16.el6, and the system is stable.
Persistent VNC connections to a dedicated user do not have this issue, even with the newer tigervnc-server.x86_64 0:1.1.0-18.el6 installed.
The disconnect problem is not tied to a particular desktop environment; it fails with both GNOME and KDE.

Comment 2 Greg Matthews 2016-07-25 10:16:13 UTC
just a "me too".

we are seeing exactly this behaviour and are currently working around it the same way (downgrade to 1.1.0-16). 

Looking forward to the fix.

Comment 3 Tomas Pelka 2016-07-25 12:35:40 UTC
This issue was already fixed in https://rhn.redhat.com/errata/RHBA-2016-0775.html

Comment 4 Dan Astoorian 2016-07-25 14:17:36 UTC
I don't believe that to be the case: that erratum documents the tigervnc-1.1.0-18 release, which is what introduced the behaviour described here.  1.1.0-16 did not exhibit the problem, and downgrading to that version is the workaround the original reporter and others have applied.

Note that the difference between -16 and -18 seems to consist mostly of function prototype changes that allow the package to be built against the current X11 development packages, so I'd speculate that the bug could be in the X11 libraries rather than in the tigervnc package itself.

Comment 5 Jan Grulich 2016-08-01 07:54:19 UTC
I would also expect the problem to be in the Xorg server, as I don't think I changed anything in tigervnc that could cause this issue.

Comment 6 Dan Astoorian 2016-08-02 16:45:09 UTC
Both -16 and -18 seem to link against the same libraries: the output of "ldd" lists the same paths, with the sole exception that -16 lists /lib64/libfreebl3.so where -18 lists /usr/lib64/libfreebl3.so.  These are both paths to the same library.

This led me to notice that "readelf -d /usr/bin/Xvnc" reports

 0x000000000000000f (RPATH)              Library rpath: [/usr/lib64]

for -18, but not for -16.

Is it possible that the addition of RPATH in the build process could somehow be causing the issue?  The only other explanation I can think of is if Xvnc were linked against a static library that contained the bug when -18 was built but not when -16 was built.
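
The comparison of dynamic-section entries between the two builds could be scripted roughly as follows. This is only a sketch under stated assumptions: it relies on the "readelf -d" line layout shown above (an address followed by the tag name in parentheses), and the helper names are hypothetical.

```python
import re
import subprocess

# Matches lines such as:
#  0x000000000000000f (RPATH)              Library rpath: [/usr/lib64]
TAG_RE = re.compile(r"^\s*0x[0-9a-fA-F]+\s+\((\w+)\)")


def dynamic_tags(readelf_output):
    """Collect the set of dynamic-section tag names from `readelf -d` text."""
    tags = set()
    for line in readelf_output.splitlines():
        match = TAG_RE.match(line)
        if match:
            tags.add(match.group(1))
    return tags


def readelf_dynamic(path):
    """Run `readelf -d` on a binary and return its text output."""
    result = subprocess.run(
        ["readelf", "-d", path], check=True, capture_output=True, text=True
    )
    return result.stdout


def compare_tags(old_output, new_output):
    """Return (tags only in the old build, tags only in the new build)."""
    old, new = dynamic_tags(old_output), dynamic_tags(new_output)
    return sorted(old - new), sorted(new - old)
```

With a copy of Xvnc saved from each package version, compare_tags(readelf_dynamic(old_path), readelf_dynamic(new_path)) lists the tags that differ; old_path and new_path are placeholders for wherever those copies live.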

Comment 7 Jan Grulich 2016-08-10 11:13:09 UTC
Do you want me to do a scratch build so you can test whether it's just a linking issue?

Comment 8 Dan Astoorian 2016-08-10 21:52:31 UTC
(In reply to Jan Grulich from comment #7)
> Do you want me to do a scratch build so you can test whether it's just a
> linking issue?

For what it's worth, I just tried removing the rpath from the executable via

    chrpath -d /usr/bin/Xvnc

(using chrpath from the chrpath package in rhel-6-workstation-rpms).

and this did not fix the symptom, so this suggests that RPATH itself is not the issue.

There are other differences between the output of "readelf -d /usr/bin/Xvnc" for -16 vs. -18 (e.g., -18 reports entries for "GNU_LIBLIST," "GNU_LIBLISTSZ," "GNU_CONFLICT," and "GNU_CONFLICTSZ," where -16 does not).  I have no reason to think that these entries themselves are responsible for the bug, but it leads me to speculate that -18 might have been compiled with a newer compiler version than -16.  Is it possible that this is a manifestation of a compiler bug?

Can you confirm what compiler versions were used to build -16 and -18 respectively?  It might be worthwhile trying a scratch build of tigervnc-server 1.1.0-18 with the compiler downgraded to the previous version, if possible, to see whether the symptom persists.

Otherwise, we need to try to figure out what else is different between the build environments for -16 and -18 that could have introduced the problem.

Comment 9 Adam Jackson 2016-08-26 17:05:26 UTC
(In reply to Dan Astoorian from comment #8)

> There are other differences between the output of "readelf -d /usr/bin/Xvnc"
> for -16 vs. -18 (e.g., -18 reports entries for "GNU_LIBLIST,"
> "GNU_LIBLISTSZ," "GNU_CONFLICT," and "GNU_CONFLICTSZ," where -16 does not). 
> I have no reason to think that these entries themselves are responsible for
> the bug, but it leads me to speculate that -18 might have been compiled with
> a newer compiler version than -16.  Is it possible that this is a
> manifestation of a compiler bug?

Those aren't emitted by the compiler. Those are added by prelink; if you have them in one and not the other, it's because one had prelink run on it.

> Can you confirm what compiler versions were used to build -16 and -18
> respectively?  It might be worthwhile trying a scratch build of
> tigervnc-server 1.1.0-18 with the compiler downgraded to the previous
> version, if possible, to see whether the symptom persists.

-16 was:

gcc 4.4.7-4.el6
binutils 2.20.51.0.2-5.41.el6
xorg-x11-server-source 1.15.0-12.el6

-18 was:

gcc 4.4.7-16.el6
binutils 2.20.51.0.2-5.43.el6
xorg-x11-server-source 1.17.4-5.el6

Personally I'd suspect the X server change well before the rest of the toolchain. I suspect Xvnc is crashing. Does xinetd capture stdout/stderr of the Xvnc process in a log file somewhere? Can you attach gdb to the Xvnc process and see if it captures a SIGSEGV or other abnormal termination?

Comment 10 Dan Astoorian 2016-08-26 19:34:53 UTC
The prelink differences I observed are almost certainly due to my upgrading/downgrading package versions to troubleshoot the issue.

Xvnc is not crashing: if it were, the connection to vncviewer would be terminated and vncviewer would exit.  "ps" shows that the Xvnc process is still present with the same process ID after the X session goes away.
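
The "same process ID is still present" check can also be done programmatically. A minimal sketch for POSIX systems, with a hypothetical helper name; it uses signal 0, which performs the existence and permission checks without delivering a signal.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

Recording the Xvnc PID before the session drops and calling pid_is_alive() on it afterwards distinguishes a server crash from a session reset.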

What we're seeing instead is that the screen goes black (similar to what you would see if a screensaver had kicked in, except it happens regardless of whether the session is idle); when the mouse is subsequently moved or a key pressed, the GDM login screen appears in the same vncviewer window.

In fact, the same thing happens if one does not log in at the GDM login screen: the screen going black seems to happen exactly 3 minutes after the login window is first presented, regardless of whether or how long the user was actually logged in.

I note that os/xdmcp.c contains changes involving keepalive timeouts, including some references to XDM_DEF_DORMANCY (which is, probably not coincidentally, 3 minutes), so it seems extremely likely that the bug was introduced somewhere in that vicinity.  Perhaps keepalive packets are not being sent/processed correctly anymore?

Comment 11 Dan Astoorian 2016-08-26 22:04:12 UTC
Attaching strace to the Xvnc process from -18 shows that it's reaching
  XdmcpDeadSession("Alive response indicates session dead");
in recv_alive_msg() in os/xdmcp.c; presumably this is what is terminating the session.

Further instrumenting seems to show that this is happening because SessionRunning is 0 in recv_alive_msg() (presumably indicating that the Session Running field was 0 in the XDMCP "alive" response packet), even though AliveSessionID seems to be set and is equal to SessionID.  (Note that the spec indicates that the Session ID field should be 0 when no session is active, but the code in the display manager seems to set the session ID unconditionally, but sets SessionRunning to 1 only if gdm_display_get_status() for the display returns GDM_DISPLAY_MANAGED.)
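
The check described above can be modelled as follows. This is an illustrative sketch, not the X server's code: it assumes the Alive packet layout from the XDMCP specification (a version/opcode/length header followed by a one-byte Session Running field and a four-byte Session ID), and the function names are hypothetical.

```python
import struct

XDMCP_VERSION = 1
OPCODE_ALIVE = 14
# Big-endian: version, opcode, length, Session Running (CARD8), Session ID (CARD32)
ALIVE_FORMAT = ">HHHBI"


def parse_alive(packet):
    """Decode an XDMCP Alive packet into (session_running, session_id)."""
    version, opcode, length, running, session_id = struct.unpack(ALIVE_FORMAT, packet)
    if version != XDMCP_VERSION or opcode != OPCODE_ALIVE or length != 5:
        raise ValueError("not an XDMCP Alive packet")
    return running, session_id


def session_is_alive(packet, expected_session_id):
    """Model of the check: the session counts as alive only if the manager
    reports it running AND echoes back the expected session ID."""
    running, session_id = parse_alive(packet)
    return running != 0 and session_id == expected_session_id
```

Under this model, a reply carrying the correct Session ID but Session Running set to 0 is treated as a dead session, which matches the behaviour observed here.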

The value assigned to timeOutTime in XdmcpOpenDisplay() (XDM_DEF_DORMANCY * 1000, or 3 minutes) seems to determine when the connection dies; changing it to
    timeOutTime = GetTimeInMillis() + 15 * 1000;
makes the symptom happen after 15 seconds instead of 3 minutes; this may be helpful to you in reproducing the problem.
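
For reference, the deadline arithmetic looks like this when written out. A small sketch with hypothetical names; the 3-minute value is the one stated above for XDM_DEF_DORMANCY, and time.monotonic() merely stands in for the server's millisecond clock.

```python
import time

# 3 minutes, as stated above for XDM_DEF_DORMANCY.
XDM_DEF_DORMANCY_SECONDS = 3 * 60


def now_in_millis():
    """Stand-in for the server's millisecond clock (an assumption)."""
    return int(time.monotonic() * 1000)


def next_timeout_ms(now_ms, dormancy_seconds=XDM_DEF_DORMANCY_SECONDS):
    """Time at which the next keepalive check fires, in milliseconds."""
    return now_ms + dormancy_seconds * 1000
```

Passing dormancy_seconds=15 corresponds to the shortened 15-second interval suggested for reproducing the problem.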

It's not obvious to me what changed between -16 and -18 to trigger the symptom, however--whether SessionRunning from GDM was always 0 but the code to process the packet never actually fired for some reason, or whether there's something else at work.

Comment 14 Jan Grulich 2016-10-18 09:41:16 UTC
*** Bug 1349169 has been marked as a duplicate of this bug. ***

Comment 18 Adam Jackson 2016-10-27 16:46:46 UTC
Please test this build of tigervnc:

http://people.redhat.com/~ajackson/1344137/

Comment 19 Chuck Amsler 2016-10-27 18:02:09 UTC
Installed tigervnc-server-1.1.0-19.jx1.el6.x86_64 on a RHEL 6.8 server. My connection spawned via xinetd is not crashing after 3 minutes. Will do additional testing and report back.

Comment 20 Chuck Amsler 2016-10-31 19:50:02 UTC
This version is working well when spawned from xinetd.  I also verified it with a resumable session and that is also working well.
This build (tigervnc-1.1.0-19.jx1.el6.x86_64) resolves the issue I reported.

Comment 21 Dan Astoorian 2016-10-31 20:08:08 UTC
Confirmed for me as well: this build is not resetting the connection after 3 minutes like 1.1.0-18 did.

Comment 22 Jan Grulich 2016-11-01 05:58:25 UTC
Perfect, I'll do an official build of tigervnc against fixed xserver.

Comment 25 jrr 2017-02-23 23:21:37 UTC
I can confirm this is also broken on x86.

Is there an ETA on 1.1.0-19 to be generally available?

Comment 27 errata-xmlrpc 2017-03-21 11:16:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0729.html

