Bug 202480

Summary: screen(1) reliably crashes on third attach
Product: [Fedora] Fedora
Reporter: Adam Jackson <ajax>
Component: ncurses
Assignee: Miroslav Lichvar <mlichvar>
Status: CLOSED RAWHIDE
Severity: high
Priority: medium
Version: rawhide
CC: dcantrell, dickey, mlichvar, notting
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-08-16 15:07:20 UTC
Bug Blocks: 150224    
Attachments:
  Description: Patch fixing the tgetent bug.
  Flags: none

Description Adam Jackson 2006-08-14 19:00:46 UTC
Description of problem:

Start two terminals, and:
Term 1: screen -S test
Term 2: screen -rd test
Term 1: screen -rd test
Term 2: screen -rd test

(obviously waiting until the detach/attach completes for each screen -rd).  On
the third screen -rd, the master screen process goes away, the dungeon
collapses, and you die.

Version-Release number of selected component (if applicable):

screen-4.0.2-15.1

How reproducible:
Always.

Comment 1 Jesse Keating 2006-08-16 02:21:31 UTC
I'm able to replicate this after updating multiple systems to rawhide.  This is
a severe problem and makes screen rather less than useful.

Comment 2 Jesse Keating 2006-08-16 02:31:11 UTC
Hrm, installing older releases of screen (4.0.1) isn't helping.  There must be
some other change in the distribution that is causing screen to freak.

Comment 3 Jesse Keating 2006-08-16 03:15:37 UTC
I coaxed a backtrace out of gdb:

#0  0x00002b139fd9388b in free () from /lib64/libc.so.6
#1  0x00002b139f4bbf32 in _nc_free_termtype () from /usr/lib64/libncurses.so.5
#2  0x00002b139f4bc643 in del_curterm () from /usr/lib64/libncurses.so.5
#3  0x00002b139f4beab0 in tgetent () from /usr/lib64/libncurses.so.5
#4  0x0000000000421e15 in getlogin ()
#5  0x00000000004233f8 in getlogin ()
#6  0x0000000000416796 in _nc_timed_wait ()
#7  0x00000000004175db in _nc_timed_wait ()
#8  0x000000000044029e in getlogin ()
#9  0x0000000000407e34 in ?? ()
#10 0x00002b139fd42aa4 in __libc_start_main () from /lib64/libc.so.6
#11 0x0000000000403309 in ?? ()
#12 0x00007ffffff0a0f8 in ?? ()
#13 0x0000000000000000 in ?? ()


Comment 4 Bill Nottingham 2006-08-16 03:23:18 UTC
I believe this is related to the fix for bug 198032 - cc'ing ncurses maintainer.

Comment 5 Bill Nottingham 2006-08-16 03:31:21 UTC
This works around it in screen, FWIW:

diff -ru screen-4.0.2/termcap.c screen-4.0.2.foo/termcap.c
--- screen-4.0.2/termcap.c      2003-09-08 10:45:36.000000000 -0400
+++ screen-4.0.2.foo/termcap.c  2006-08-15 23:38:06.000000000 -0400
@@ -1333,7 +1333,7 @@
   xseteuid(real_uid);
   xsetegid(real_gid);
 #endif
-  r = tgetent(bp, name);
+  r = tgetent(NULL, name);
 #ifdef USE_SETEUID
   xseteuid(eff_uid);
   xsetegid(eff_gid);


Comment 6 Bill Nottingham 2006-08-16 03:37:58 UTC
CC'ing upstream ncurses maintainer - this appears to be an issue with the new
ncurses tgetent caching logic.

Comment 7 Jesse Keating 2006-08-16 03:59:13 UTC
I've confirmed that the workaround does avoid the issue.

I'm going to do some cleanup in the spec file, but we want to wait for a proper
fix before pushing a new package.

Comment 8 Miroslav Lichvar 2006-08-16 08:39:22 UTC
This is an ncurses bug. Minimal code to reproduce it:

tgetent(b2, "rxvt");
tgetent(b1, "rxvt");
tgetent(b3, "screen");
tgetent(b1, "rxvt");
tgetent(b3, "screen.rxvt");
tgetent(b3, "screen");
tgetent(b1, "rxvt");

Comment 9 Thomas E. Dickey 2006-08-16 10:04:37 UTC
Someone reported a problem with the fix a couple of weeks later.

That fix (the most recent one) is in 20060715.  What patchlevel
of ncurses is the RPM based on?

Comment 10 Miroslav Lichvar 2006-08-16 10:13:06 UTC
The package is based on the 20060715 patchlevel, so this bug is something a bit
different.

Comment 11 Thomas E. Dickey 2006-08-16 10:23:32 UTC
thanks - then that could be an error in the cache logic.
I'll investigate it this evening.

Comment 12 Miroslav Lichvar 2006-08-16 14:41:05 UTC
Created attachment 134313
Patch fixing the tgetent bug.

It crashes when two cache records have the same last_bufp and both are deleted.
Another problem is that last_bufp isn't set to 0 when the terminal description
isn't found. The patch should fix that.
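
For readers following along, the double-delete hazard described above can be
sketched with a small stand-alone model. This is not the attached patch and not
ncurses source: struct cache_record, CACHE_SLOTS and release_records() are
hypothetical names invented for illustration. Only the field names last_bufp
and last_term echo the thread. The sketch shows one way to make sure that data
reachable from several records is freed exactly once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 4   /* hypothetical size, not taken from ncurses */

/* Hypothetical stand-in for one tgetent cache record. */
struct cache_record {
    char *last_bufp;    /* caller buffer passed to the lookup */
    void *last_term;    /* heap data loaded for that lookup */
};

/*
 * Release everything cached for bufp. Each distinct last_term is freed
 * exactly once: every record referencing it is cleared before free() is
 * called, so no later call can free it again.
 * Returns the number of free() calls made.
 */
static int release_records(struct cache_record *cache, char *bufp)
{
    int frees = 0;

    assert(cache != NULL);

    for (size_t i = 0; i < CACHE_SLOTS; ++i) {
        if (cache[i].last_bufp != bufp)
            continue;

        void *term = cache[i].last_term;
        if (term != NULL) {
            /* Clear every alias (including slot i) before freeing. */
            for (size_t j = 0; j < CACHE_SLOTS; ++j)
                if (cache[j].last_term == term)
                    cache[j].last_term = NULL;
            free(term);
            ++frees;
        }
        /* Forget the buffer so a stale pointer cannot match later. */
        cache[i].last_bufp = NULL;
    }
    return frees;
}
```

In this model, two records that share both the buffer and the loaded data
cause a single free(), and a repeated call with the same buffer frees nothing.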

Comment 13 Miroslav Lichvar 2006-08-16 15:07:20 UTC
Fixed in ncurses-5.5-23.20060715.

Comment 14 Thomas E. Dickey 2006-08-16 19:47:47 UTC
The assignment to LAST_BUF appears to be unnecessary.
Otherwise the patch seems ok (tested for leaks, etc).

Comment 15 Miroslav Lichvar 2006-08-17 07:15:59 UTC
Thomas, without the assignment screen is still not working correctly.
The explanation is that after an unsuccessful tgetent there will be a cache
record with the old LAST_BUF, and when tgetent is called with this value it will
delete the wrong LAST_TRM.

Comment 16 Steven Haigh 2006-08-28 12:39:23 UTC
I'm not 100% sure if this is related or not, however if I ssh into a box running
screen, and reattach an existing session (screen -DR), then log in again via
another ssh session and bring up screen in 'multi-screen' mode (screen -x), then
irssi that I always have up goes completely mental.

The displaying text scatters all over the screen and is unusable. The only way
to recover is to detach the screwed screen (^a ^d), then use screen -DR to kill
all sessions and attach again.

This basically means that using screen in -x mode is impossible.

Comment 17 Steven Haigh 2006-08-28 12:41:08 UTC
Oh, and I forgot to mention, this is with screen-4.0.2-15.1 and
ncurses-5.5-23.20060715. Both seem to be the current version in rawhide.

Comment 18 Miroslav Lichvar 2006-08-29 14:57:24 UTC
Ah, another one. I can reproduce it, and when I disable caching in tgetent it
works fine, so it seems related to the problem.

Comment 19 Miroslav Lichvar 2006-08-31 12:07:58 UTC
ncurses-5.5-24.20060715 has a patch that modifies the tgetstr function a bit, so
it returns a pointer to the provided area instead of to an internal ncurses
structure. This should make screen happy again.
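
As a rough illustration of the "pointer to provided area" behaviour, here is a
stand-alone model of the traditional termcap convention, in which the string is
copied into caller-supplied storage and the area pointer is advanced past it.
It is a sketch only: copy_cap_to_area() is a hypothetical helper, not the
ncurses implementation of tgetstr, and as with classic termcap the caller is
assumed to provide enough space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the termcap-style contract: copy the capability
 * string into the caller's area, advance *area past the terminating NUL,
 * and return a pointer to the copy rather than to library-owned storage.
 * The caller must guarantee that the area is large enough.
 */
static char *copy_cap_to_area(const char *value, char **area)
{
    assert(value != NULL);

    if (area == NULL || *area == NULL)
        return NULL;                 /* no caller storage in this model */

    char *start = *area;
    size_t len = strlen(value) + 1;  /* include the terminating NUL */

    memcpy(start, value, len);
    *area = start + len;             /* next string goes right after */
    return start;
}
```

Because each returned pointer refers to the caller's own storage, the strings
stay valid even if the library later discards or reloads its internal data,
which is the property a caller like screen would be relying on.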