Bug 236530 - mad sshd processes left when a connection isn't closed cleanly
Summary: mad sshd processes left when a connection isn't closed cleanly
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 5
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On: 209881
Blocks:
 
Reported: 2007-04-16 07:27 UTC by Chris Jones
Modified: 2007-11-30 22:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-07-03 07:25:18 UTC
Type: ---
Embargoed:



Description Chris Jones 2007-04-16 07:27:14 UTC
Description of problem:
When an SSH connection doesn't die cleanly, it (sometimes?) leaves behind stray sshd processes which 
appear to loop infinitely, using as much CPU time as they can get.

Network problems (such as a stateful firewall being reset) seem to trigger this bug.

This problem has only appeared since the ssh update of April 3rd.

sshd writes the following to the log shortly before going mad:
secure:Apr 15 20:17:02 silver sshd[22667]: Disconnecting: Timeout, your session not responding.

Version-Release number of selected component (if applicable):
openssh-4.3p2-4.12.fc5

How reproducible:
5 such processes in the last 2 days, both apparently triggered by network problems

Steps to Reproduce:
Unknown

Comment 1 Tomas Mraz 2007-04-16 09:38:33 UTC
Can you try to attach a strace to the stray sshd process? Also which stray
process is that? 'ps axu | grep sshd'


Comment 2 Chris Jones 2007-04-16 15:45:23 UTC
When trying to attach strace I get the following error: "attach: ptrace(PTRACE_ATTACH, ...): Permission 
denied"

The process that goes mad seems to be the one owned by the user. I will attach the relevant lines from the 
ps output next time I see a stray process.

Comment 3 Tomas Mraz 2007-04-16 16:05:26 UTC
Have you tried to run the strace as root?


Comment 4 Chris Jones 2007-04-16 17:40:47 UTC
Yes, same error when running strace as a normal user and root.

There don't seem to be any SELinux denials logged either.

Comment 5 Chris Jones 2007-05-04 11:01:44 UTC
Below is the output of ps axu for the sshd pair that's gone mad:

root      8576  0.0  0.1  62608  3188 ?        Ss   May03   0:00 sshd: arthur [priv]
user    8778 38.6  0.1  62608  2092 ?        R    May03 621:14 sshd: user@notty

Comment 6 Sitsofe Wheeler 2007-05-04 15:46:20 UTC
strace didn't prove to be useful because seemingly no syscalls were being made.
Firing up gdb produced the following:

(gdb) bt ful
#0  __nscd_cache_search (type=GETAI,
    key=0x5555556bef00 "cpc1-cwma2-0-0-cust348.swan.cable.ntl.com", keylen=42,
    mapped=0x5555556cb290) at nscd_helper.c:372
        here = (struct hashentry *) 0x2aaaad6873b9
        work = 256
#1  0x00002aaaabf83b59 in __nscd_getai (
    key=0x5555556bef00 "cpc1-cwma2-0-0-cust348.swan.cable.ntl.com",
    result=0x7fffca95e298, h_errnop=0x7fffca95e2b8) at nscd_getai.c:63
        found = Variable "found" is not available.
(gdb) print here
$3 = (struct hashentry *) 0x2aaaad6873b9
(gdb) print *here
$4 = {type = GETPWBYNAME, first = false, len = -352321536, key = 4602471,
  owner = 0, next = 256, packet = 33554432, {dellist = 0x1000000001000000,
    prevp = 0x1000000001000000}}
(gdb) info locals
here = (struct hashentry *) 0x2aaaad6873b9
work = 256
(gdb) list
63                                                                mapped);
64            if (found != NULL)
65              {
66                ai_resp = &found->data[0].aidata;
67                respdata = (char *) (ai_resp + 1);
68                recend = (const char *) found->data + found->recsize;
69              }
70          }
71
72        /* If we do not have the cache mapped, try to get the data over the
(gdb) s
374           if (type == here->type && keylen == here->len
(gdb)
388           work = here->next;
(gdb)
370       while (work != ENDREF)
(gdb)
372           struct hashentry *here = (struct hashentry *) (mapped->data + work);
(gdb)
374           if (type == here->type && keylen == here->len
(gdb)
388           work = here->next;
(gdb)
370       while (work != ENDREF)
(gdb) li
365                          const struct mapped_database *mapped)
366     {
367       unsigned long int hash = __nis_hash (key, keylen) % mapped->head->module;
368
369       ref_t work = mapped->head->array[hash];
370       while (work != ENDREF)
371         {
372           struct hashentry *here = (struct hashentry *) (mapped->data + work);
373
374           if (type == here->type && keylen == here->len
(gdb)
375               && memcmp (key, mapped->data + here->key, keylen) == 0)
376             {
377               /* We found the entry.  Increment the appropriate counter.  */
378               const struct datahead *dh
379                 = (struct datahead *) (mapped->data + here->packet);
380
381               /* See whether we must ignore the entry or whether something
382                  is wrong because garbage collection is in progress.  */
383               if (dh->usable && ((char *) dh + dh->allocsize
384                                  <= (char *) mapped->head + mapped->mapsize))
(gdb)
385                 return dh;
386             }
387
388           work = here->next;
389         }
390
391       return NULL;
392     }


Comment 7 Sitsofe Wheeler 2007-05-04 15:55:24 UTC
(gdb) print here->len
$5 = 1743454208
(gdb) print type
$6 = GETAI
(gdb) print here->type
$7 = GETPWBYNAME


Comment 8 Sitsofe Wheeler 2007-05-04 16:09:44 UTC
(gdb) print here->next
$39 = 1
(gdb) print (mapped->data + work)
$40 = 0x2aaaad6873ba ""
(gdb) s
388           work = here->next;
(gdb) s
370       while (work != ENDREF)
(gdb) print (mapped->data + work)
$41 = 0x2aaaad6873b9 ""
(gdb) s
372           struct hashentry *here = (struct hashentry *) (mapped->data + work);
(gdb)
374           if (type == here->type && keylen == here->len
(gdb) print (mapped->data + work)
$42 = 0x2aaaad6873b9 ""
(gdb) s
388           work = here->next;
(gdb)
370       while (work != ENDREF)
(gdb) print (mapped->data + work)
$43 = 0x2aaaad6874b8 "�:F"
(gdb) s
372           struct hashentry *here = (struct hashentry *) (mapped->data + work);
(gdb)
374           if (type == here->type && keylen == here->len
(gdb)
388           work = here->next;
(gdb)
370       while (work != ENDREF)
(gdb) print (mapped->data + work)
$44 = 0x2aaaad6873ba ""

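Looks like there's a circular reference: the chain's next offsets keep leading back to entries already visited, so work never reaches ENDREF.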
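
The trace above suggests the search only stops when it reaches ENDREF, so a corrupted chain can spin forever. Below is a minimal sketch of one way such a walk could be bounded. It is not glibc's code or its fix: struct entry is a simplified stand-in that borrows field names from the gdb output, and bounded_search and its step budget are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins; the layout is NOT glibc's struct hashentry. */
typedef uint32_t ref_t;
#define ENDREF UINT32_MAX

struct entry {
    int32_t  type;
    uint32_t len;   /* length of the key in bytes */
    ref_t    key;   /* offset of the key bytes within data[] */
    ref_t    next;  /* offset of the next entry, or ENDREF */
};

/* Hypothetical helper: walk one hash chain stored in data[0..datasize).
   Returns the offset of the matching entry, or ENDREF if there is no
   match, an offset is implausible, or the chain appears to be cyclic. */
static ref_t
bounded_search(const unsigned char *data, size_t datasize, ref_t head,
               int32_t type, const char *key, uint32_t keylen)
{
    /* A sane chain cannot hold more entries than fit in the pool. */
    size_t budget = datasize / sizeof(struct entry);
    ref_t work = head;

    while (work != ENDREF) {
        if (budget-- == 0)
            return ENDREF;  /* too many steps: assume a cycle */
        if (work % sizeof(ref_t) != 0
            || work > datasize
            || datasize - work < sizeof(struct entry))
            return ENDREF;  /* misaligned or out-of-range offset */

        /* Copy the entry out instead of casting into the byte pool. */
        struct entry here;
        memcpy(&here, data + work, sizeof here);

        if (here.type == type && here.len == keylen
            && here.key <= datasize && datasize - here.key >= keylen
            && memcmp(key, data + here.key, keylen) == 0)
            return work;

        work = here.next;
    }
    return ENDREF;
}
```

With a pool in which two entries point at each other, a lookup that matches neither entry returns ENDREF once the budget is spent instead of looping. The alignment test would also reject an odd offset such as the work = 1 implied by the addresses printed above.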

Comment 9 Tomas Mraz 2007-05-04 18:32:52 UTC
That's glibc/nscd code. Do these runaway processes happen if you switch nscd off?

Also what is in your /etc/nsswitch.conf?

Reassigning to glibc for now, although there is still a small possibility that sshd
somehow messes up the glibc internal data structures.


Comment 10 Sitsofe Wheeler 2007-05-05 08:29:44 UTC
We'll have to get back to you on the nscd front. As for nsswitch.conf:

#
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Legal entries are:
#
#       nisplus or nis+         Use NIS+ (NIS version 3)
#       nis or yp               Use NIS (NIS version 2), also called YP
#       dns                     Use DNS (Domain Name Service)
#       files                   Use the local files
#       db                      Use the local database (.db) files
#       compat                  Use NIS on compat mode
#       hesiod                  Use Hesiod for user lookups
#       [NOTFOUND=return]       Stop searching if not found so far
#

# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd:    db files nisplus nis
#shadow:    db files nisplus nis
#group:     db files nisplus nis

passwd:     files ldap
shadow:     files ldap
group:      files ldap

#hosts:     db files nisplus nis dns
hosts:      files dns

# Example - obey only what nisplus tells us...
#services:   nisplus [NOTFOUND=return] files
#networks:   nisplus [NOTFOUND=return] files
#protocols:  nisplus [NOTFOUND=return] files
#rpc:        nisplus [NOTFOUND=return] files
#ethers:     nisplus [NOTFOUND=return] files
#netmasks:   nisplus [NOTFOUND=return] files     

bootparams: nisplus [NOTFOUND=return] files

ethers:     files
netmasks:   files
networks:   files
protocols:  files ldap
rpc:        files
services:   files ldap

netgroup:   files ldap

publickey:  nisplus

automount:  files ldap
aliases:    files nisplus

Comment 11 Sitsofe Wheeler 2007-05-05 08:30:43 UTC
(PS: SELinux was stopping sshd from being straced, but the denial may have been suppressed rather than logged.)

Comment 12 Jakub Jelinek 2007-07-03 07:25:18 UTC
FC5 is no longer supported. If you can reproduce this with F7 or rawhide, please
reopen.

