Bug 202480

Summary: screen(1) reliably crashes on third attach
Product: [Fedora] Fedora
Reporter: Adam Jackson <ajax>
Component: ncurses
Assignee: Miroslav Lichvar <mlichvar>
Status: CLOSED RAWHIDE
Severity: high
Priority: medium
Version: rawhide
CC: dcantrell, dickey, mlichvar, notting
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-08-16 11:07:20 EDT
Bug Blocks: 150224
Attachments: Patch fixing the tgetent bug. (flags: none)

Description Adam Jackson 2006-08-14 15:00:46 EDT
Description of problem:

Start two terminals, and:
Term 1: screen -S test
Term 2: screen -rd test
Term 1: screen -rd test
Term 2: screen -rd test

(obviously waiting until the detach/attach completes for each screen -rd).  On
the third screen -rd, the master screen process goes away, the dungeon
collapses, and you die.

Version-Release number of selected component (if applicable):

screen-4.0.2-15.1

How reproducible:
Always.
Comment 1 Jesse Keating 2006-08-15 22:21:31 EDT
I'm able to replicate this after updating multiple systems to rawhide.  This is
a severe problem and makes screen rather less than useful.
Comment 2 Jesse Keating 2006-08-15 22:31:11 EDT
Hrm, installing older releases of screen (4.0.1) isn't helping.  There must be
some other change in the distribution that is causing screen to freak.
Comment 3 Jesse Keating 2006-08-15 23:15:37 EDT
I coaxed a backtrace out of gdb:

#0  0x00002b139fd9388b in free () from /lib64/libc.so.6
#1  0x00002b139f4bbf32 in _nc_free_termtype () from /usr/lib64/libncurses.so.5
#2  0x00002b139f4bc643 in del_curterm () from /usr/lib64/libncurses.so.5
#3  0x00002b139f4beab0 in tgetent () from /usr/lib64/libncurses.so.5
#4  0x0000000000421e15 in getlogin ()
#5  0x00000000004233f8 in getlogin ()
#6  0x0000000000416796 in _nc_timed_wait ()
#7  0x00000000004175db in _nc_timed_wait ()
#8  0x000000000044029e in getlogin ()
#9  0x0000000000407e34 in ?? ()
#10 0x00002b139fd42aa4 in __libc_start_main () from /lib64/libc.so.6
#11 0x0000000000403309 in ?? ()
#12 0x00007ffffff0a0f8 in ?? ()
#13 0x0000000000000000 in ?? ()
Comment 4 Bill Nottingham 2006-08-15 23:23:18 EDT
I believe this is related to the fix for bug 198032 - cc'ing ncurses maintainer.
Comment 5 Bill Nottingham 2006-08-15 23:31:21 EDT
This works around it in screen, FWIW:

diff -ru screen-4.0.2/termcap.c screen-4.0.2.foo/termcap.c
--- screen-4.0.2/termcap.c      2003-09-08 10:45:36.000000000 -0400
+++ screen-4.0.2.foo/termcap.c  2006-08-15 23:38:06.000000000 -0400
@@ -1333,7 +1333,7 @@
   xseteuid(real_uid);
   xsetegid(real_gid);
 #endif
-  r = tgetent(bp, name);
+  r = tgetent(NULL, name);
 #ifdef USE_SETEUID
   xseteuid(eff_uid);
   xsetegid(eff_gid);
Comment 6 Bill Nottingham 2006-08-15 23:37:58 EDT
CC'ing upstream ncurses maintainer - this appears to be an issue with the new
ncurses tgetent caching logic.
Comment 7 Jesse Keating 2006-08-15 23:59:13 EDT
I've confirmed that the workaround does avoid the issue.

I'm going to do some cleanup in the spec file, but we want to wait for a proper
fix before pushing a new package.
Comment 8 Miroslav Lichvar 2006-08-16 04:39:22 EDT
This is an ncurses bug. Minimal code to reproduce it:

tgetent(b2, "rxvt");
tgetent(b1, "rxvt");
tgetent(b3, "screen");
tgetent(b1, "rxvt");
tgetent(b3, "screen.rxvt");
tgetent(b3, "screen");
tgetent(b1, "rxvt");
Comment 9 Thomas E. Dickey 2006-08-16 06:04:37 EDT
Someone reported a problem with the fix a couple of weeks later.

That fix (the most recent one) is in 20060715.  What patchlevel
of ncurses is the RPM based on?
Comment 10 Miroslav Lichvar 2006-08-16 06:13:06 EDT
The package is based on the 20060715 patchlevel, so this bug is something a bit
different.
Comment 11 Thomas E. Dickey 2006-08-16 06:23:32 EDT
Thanks - then that could be an error in the cache logic.
I'll investigate it this evening.
Comment 12 Miroslav Lichvar 2006-08-16 10:41:05 EDT
Created attachment 134313 [details]
Patch fixing the tgetent bug.

It crashes when two cache records have the same last_bufp and both are deleted.
Another problem is that last_bufp isn't set to 0 when the terminal description
isn't found. The patch should fix both.
Comment 13 Miroslav Lichvar 2006-08-16 11:07:20 EDT
Fixed in ncurses-5.5-23.20060715.
Comment 14 Thomas E. Dickey 2006-08-16 15:47:47 EDT
The assignment to LAST_BUF appears to be unnecessary.
Otherwise the patch seems ok (tested for leaks, etc).
Comment 15 Miroslav Lichvar 2006-08-17 03:15:59 EDT
Thomas, without the assignment screen still doesn't work correctly.
The explanation is that after an unsuccessful tgetent there will be a cache
record with the old LAST_BUF, and when tgetent is later called with that value
it will delete the wrong LAST_TRM.
Comment 16 Steven Haigh 2006-08-28 08:39:23 EDT
I'm not 100% sure whether this is related, but if I ssh into a box running
screen and reattach an existing session (screen -DR), then log in again via
another ssh session and bring up screen in 'multi-screen' mode (screen -x), the
irssi that I always have up goes completely mental.

The displayed text scatters all over the screen and is unusable. The only way
to recover is to detach the screwed screen (^a ^d), then use screen -DR to kill
all sessions and attach again.

This basically means that using screen in -x mode is impossible.
Comment 17 Steven Haigh 2006-08-28 08:41:08 EDT
Oh, and I forgot to mention, this is with screen-4.0.2-15.1 and
ncurses-5.5-23.20060715. Both seem to be the current version in rawhide.
Comment 18 Miroslav Lichvar 2006-08-29 10:57:24 EDT
Ah, another one. I can reproduce it, and when I disable caching in tgetent it
works fine, so it seems related to the problem.
Comment 19 Miroslav Lichvar 2006-08-31 08:07:58 EDT
ncurses-5.5-24.20060715 has a patch that modifies the tgetstr function a bit, so
that it returns a pointer to the provided area instead of to an internal ncurses
structure. This should make screen happy again.