Bug 2173067 - tcsh hangs on startup for several seconds
Summary: tcsh hangs on startup for several seconds
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: tcsh
Version: 37
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Jan Macku
QA Contact: Fedora Extras Quality Assurance
URL: https://mailman.astron.com/pipermail/...
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-02-23 21:24 UTC by Charles R. Anderson
Modified: 2024-01-12 22:51 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-01-12 22:51:00 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
strace output (43.56 KB, text/plain)
2023-02-23 21:24 UTC, Charles R. Anderson
no flags Details
ltrace output (1.50 MB, text/plain)
2023-02-23 21:29 UTC, Charles R. Anderson
no flags Details
strace --follow-forks output (85.70 KB, text/plain)
2023-02-23 22:32 UTC, Charles R. Anderson
no flags Details
ltrace -f output (1.72 MB, text/plain)
2023-02-23 22:36 UTC, Charles R. Anderson
no flags Details
patch to workaround DNS resolution hang with IPv6 remotehost (632 bytes, patch)
2023-02-23 23:51 UTC, Charles R. Anderson
no flags Details | Diff
Workaround DNS resolution hang with IPv6 remotehost on shell init (1.41 KB, patch)
2023-02-24 01:01 UTC, Charles R. Anderson
no flags Details | Diff

Description Charles R. Anderson 2023-02-23 21:24:04 UTC
Created attachment 1946004 [details]
strace output

Description of problem:

Almost every time tcsh starts, either via a login shell, or separately by typing "tcsh -f" to bypass all startup scripts, tcsh hangs for several seconds before presenting a prompt.

Version-Release number of selected component (if applicable):
tcsh-6.24.07-1.fc37.x86_64

How reproducible:
always

Steps to Reproduce:
1. strace -tt -o /tmp/strace-tcsh.txt tcsh -f
2. ltrace -tt -o /tmp/ltrace-tcsh.txt tcsh -f

Actual results:
The tcsh prompt shows up 4-5 seconds after starting tcsh -f.

Expected results:
No delay.

Additional info:

The delay seems to come from this 3 second gap in strace output right before the SIGCHLD:

16:09:46.141870 uname({sysname="Linux", nodename="gluon.lan", ...}) = 0
16:09:46.141887 rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f34cf08eb20}, NULL, 8) = 0
16:09:46.141895 pipe2([3, 4], 0)        = 0
16:09:46.141906 dup(3)                  = 5
16:09:46.141913 dup(5)                  = 6
16:09:46.141920 close(5)                = 0
16:09:46.141927 close(3)                = 0
16:09:46.141934 fcntl(6, F_SETFD, FD_CLOEXEC) = 0
16:09:46.141940 dup(4)                  = 3
16:09:46.141947 dup(3)                  = 5
16:09:46.141953 dup(5)                  = 7
16:09:46.141960 close(5)                = 0
16:09:46.141967 close(3)                = 0
16:09:46.141973 close(4)                = 0
16:09:46.141980 fcntl(7, F_SETFD, FD_CLOEXEC) = 0
16:09:46.141986 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f34cf04fa10) = 1847949
16:09:46.142027 close(7)                = 0
16:09:46.142034 read(6, "", 4096)       = 0
16:09:49.934579 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1847949, si_uid=10002, si_status=1, si_utime=0, si_stime=0} ---
16:09:49.934790 close(6)                = 0
16:09:49.935045 wait4(1847949, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 1847949
16:09:49.935298 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
16:09:49.935453 getpid()                = 1847948

The ltrace output is similar:

16:21:54.650297 sigaction(SIGCHLD, { nil, <>, 0x1000, 0x17c000 }, nil)              = 0
16:21:54.650329 pipe(0x7ffdc41672f8)                                                = 0
16:21:54.650359 dup(3, 0xffffffff, 0, 0x7fdf00ce4a5d)                               = 5
16:21:54.650389 dup(5, 0xffffffff, 0, 0x7fdf00ce49cb)                               = 6
16:21:54.650417 close(5)                                                            = 0
16:21:54.650444 close(3)                                                            = 0
16:21:54.650471 fcntl64(6, 2, 1, 0x7fdf00ce4944)                                    = 0
16:21:54.650499 dup(4, 0xffffffff, 0, 0x7fdf00ce8f54)                               = 3
16:21:54.650526 dup(3, 0xffffffff, 0, 0x7fdf00ce49cb)                               = 5
16:21:54.650554 dup(5, 0xffffffff, 0, 0x7fdf00ce49cb)                               = 7
16:21:54.650581 close(5)                                                            = 0
16:21:54.650608 close(3)                                                            = 0
16:21:54.650635 close(4)                                                            = 0
16:21:54.650661 fcntl64(7, 2, 1, 0x7fdf00ce4944)                                    = 0
16:21:54.650689 fork()                                                              = 1849058
16:21:54.650833 close(7)                                                            = 0
16:21:54.650870 read(6 <no return ...>
16:21:58.435672 --- SIGCHLD (Child exited) ---
16:21:58.435886 <... read resumed> , "", 4096)                                      = 0
16:21:58.436178 close(6)                                                            = 0
16:21:58.436934 waitpid(1849058, 0x7ffdc41672c0, 0)                                 = 1849058
16:21:58.437305 getenv("TERM")                                                      = "xterm-256color"

Comment 1 Charles R. Anderson 2023-02-23 21:29:38 UTC
Created attachment 1946006 [details]
ltrace output

Comment 2 Charles R. Anderson 2023-02-23 22:32:19 UTC
Created attachment 1946036 [details]
strace --follow-forks output

Added --follow-forks to strace.  The output now shows that this might be related to a DNS lookup:

1855262 17:21:44.570150 rt_sigprocmask(SIG_BLOCK, [HUP USR1 USR2 PIPE ALRM CHLD TSTP URG VTALRM PROF WINCH IO], [], 8) = 0
1855262 17:21:44.570198 futex(0x7fc8bf7b78e0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
1855262 17:21:44.570226 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
1855262 17:21:44.570269 connect(3, {sa_family=AF_UNIX, sun_path="/run/systemd/resolve/io.systemd.Resolve"}, 42) = 0
1855262 17:21:44.570345 sendto(3, "{\"method\":\"io.systemd.Resolve.Re"..., 87, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 87
1855262 17:21:44.570446 mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc8bf693000
1855262 17:21:44.570476 recvfrom(3, 0x7fc8bf693010, 135152, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
1855262 17:21:44.570502 ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=119, tv_nsec=999943000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {
tv_sec=116, tv_nsec=386511062})
1855262 17:21:48.184258 recvfrom(3, "{\"error\":\"io.systemd.Resolve.DNS"..., 135152, MSG_DONTWAIT, NULL, NULL) = 65
1855262 17:21:48.184749 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
1855262 17:21:48.185166 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
1855262 17:21:48.185402 exit_group(1)   = ?
1855262 17:21:48.185792 +++ exited with 1 +++
1855261 17:21:48.185818 <... read resumed>"", 4096) = 0
1855261 17:21:48.185954 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1855262, si_uid=10002, si_status=1, si_utime=0, si_stime=0} ---

Comment 3 Charles R. Anderson 2023-02-23 22:36:46 UTC
Created attachment 1946037 [details]
ltrace -f output

Added -f to ltrace to follow forks.  This is getting more interesting.  It appears that  soemthing is splitting my remote SSH client's IPv6 address on the first colon, then attempting a getaddrinfo DNS lookup on that:

1855367 17:23:23.885767 strcmp("pts/21", "pts/9")                                   = -7
1855367 17:23:23.885795 strcmp("pts/21", "tty2")                                    = -4
1855367 17:23:23.885823 strchr("fdd1:19ae:f1ae:1::e16", ':')                        = ":19ae:f1ae:1::e16"
1855367 17:23:23.885865 inet_addr("fdd1")                                           = 0xffffffff
1855367 17:23:23.885891 strlen("fdd1")                                              = 4
1855367 17:23:23.885915 getaddrinfo("fdd1", nil, 0x7ffe78d03a10, 0x7ffe78d039e8 <no return ...>
1855367 17:23:27.934793 --- SIGALRM (Alarm clock) ---
1855367 17:23:27.934965 _exit(1 <no return ...>
1855367 17:23:27.937894 +++ exited (status 1) +++
1855366 17:23:27.938045 --- SIGCHLD (Child exited) ---
1855366 17:23:27.938085 <... read resumed> , "", 4096)                              = 0
1855366 17:23:27.938239 close(6)                                                    = 0
1855366 17:23:27.938527 waitpid(1855367, 0x7ffe78d03a10, 0)                         = 1855367
1855366 17:23:27.938717 getenv("TERM")                                              = "xterm-256color"

Indeed, if I ssh into the system via IPv4, the hang disappears.

Comment 4 Charles R. Anderson 2023-02-23 23:51:57 UTC
Created attachment 1946038 [details]
patch to workaround DNS resolution hang with IPv6 remotehost

This code is splitting the remote host info from UTMP on the first colon.  The comment even says it doesn't work with IPv6 addresses.  So just #ifdef 0 out this code in the same tradition of all the other #ifdef 0's spread throughout tcsh, including an earlier one related to DNS resolution. This patch solves the DNS hang which was attempting to do a DNS lookup on a partial IPv6 address.

            /* Look for host:display.screen */
            /*                                                                                                                           
             * There is conflict with IPv6 address and X DISPLAY.  So,                                                                   
             * we assume there is no IPv6 address in utmp and don't                                                                      
             * touch here.                                                                                                               
             */
            if ((sptr = strchr(name, ':')) != NULL)
                *sptr = '\0';

Comment 5 Charles R. Anderson 2023-02-24 01:01:42 UTC
Created attachment 1946049 [details]
Workaround DNS resolution hang with IPv6 remotehost on shell init

Update comments on patch.  I've submitted this upstream.

--------------------------------------------------------------------

The REMOTEHOST implementation is splitting the remote host info from
utmp on the first colon.  As the comment indicates, it doesn't work
with IPv6 addresses.  Just #ifdef 0 out this code in the same
tradition of all the other #ifdef 0's spread throughout tcsh,
including an earlier one related to DNS resolution.  This patch
resolves a DNS hang when attempting to do a DNS lookup on a partial
IPv6 address during shell initialization.

            /* Look for host:display.screen */
            /*
             * There is conflict with IPv6 address and X DISPLAY.  So,
             * we assume there is no IPv6 address in utmp and don't
             * touch here.
             */
            if ((sptr = strchr(name, ':')) != NULL)
                *sptr = '\0';

Signed-off-by: Chuck Anderson <cra>

Comment 6 Aoife Moloney 2023-11-23 01:19:20 UTC
This message is a reminder that Fedora Linux 37 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 37 on 2023-12-05.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '37'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 37 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 7 Aoife Moloney 2024-01-12 22:51:00 UTC
Fedora Linux 37 entered end-of-life (EOL) status on 2023-12-05.

Fedora Linux 37 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.