Description of problem:
When an SSH connection doesn't die cleanly, it (sometimes?) leaves behind a stray sshd process which appears to loop infinitely, using as much CPU time as it can get. Network problems (such as a stateful firewall being reset) seem to trigger this bug. This problem has only appeared since the ssh update of April 3rd.

sshd writes the following to the log shortly before going mad:
secure:Apr 15 20:17:02 silver sshd[22667]: Disconnecting: Timeout, your session not responding.

Version-Release number of selected component (if applicable):
openssh-4.3p2-4.12.fc5

How reproducible:
5 such processes in the last 2 days, both apparently triggered by network problems

Steps to Reproduce:
Unknown
Can you try to attach strace to the stray sshd process? Also, which process is the stray one? Please post the output of 'ps axu | grep sshd'.
When trying to attach strace I get the following error: "attach: ptrace(PTRACE_ATTACH, ...): Permission denied". The process that goes mad seems to be the one owned by the user. I will attach the relevant lines from the ps output next time I see a stray process.
Have you tried running strace as root?
Yes, I get the same error when running strace as a normal user and as root. There don't seem to be any selinux denials logged either.
Below is the output of ps axu for the sshd pair that's gone mad:

root      8576  0.0  0.1  62608  3188 ?  Ss  May03    0:00 sshd: arthur [priv]
user      8778 38.6  0.1  62608  2092 ?  R   May03  621:14 sshd: user@notty
strace didn't prove to be useful because seemingly no syscalls were being made. Firing up gdb produced the following:

(gdb) bt ful
#0  __nscd_cache_search (type=GETAI, key=0x5555556bef00 "cpc1-cwma2-0-0-cust348.swan.cable.ntl.com", keylen=42, mapped=0x5555556cb290) at nscd_helper.c:372
        here = (struct hashentry *) 0x2aaaad6873b9
        work = 256
#1  0x00002aaaabf83b59 in __nscd_getai (key=0x5555556bef00 "cpc1-cwma2-0-0-cust348.swan.cable.ntl.com", result=0x7fffca95e298, h_errnop=0x7fffca95e2b8) at nscd_getai.c:63
        found = Variable "found" is not available.
(gdb) print here
$3 = (struct hashentry *) 0x2aaaad6873b9
(gdb) print *here
$4 = {type = GETPWBYNAME, first = false, len = -352321536, key = 4602471, owner = 0, next = 256, packet = 33554432, {dellist = 0x1000000001000000, prevp = 0x1000000001000000}}
(gdb) info locals
here = (struct hashentry *) 0x2aaaad6873b9
work = 256
(gdb) list
63                                   mapped);
64            if (found != NULL)
65              {
66                ai_resp = &found->data[0].aidata;
67                respdata = (char *) (ai_resp + 1);
68                recend = (const char *) found->data + found->recsize;
69              }
70          }
71
72        /* If we do not have the cache mapped, try to get the data over the
(gdb) s
374           if (type == here->type && keylen == here->len
(gdb)
388           work = here->next;
(gdb)
370       while (work != ENDREF)
(gdb)
372           struct hashentry *here = (struct hashentry *) (mapped->data + work);
(gdb)
374           if (type == here->type && keylen == here->len
(gdb)
388           work = here->next;
(gdb)
370       while (work != ENDREF)
(gdb) li
365                          const struct mapped_database *mapped)
366     {
367       unsigned long int hash = __nis_hash (key, keylen) % mapped->head->module;
368
369       ref_t work = mapped->head->array[hash];
370       while (work != ENDREF)
371         {
372           struct hashentry *here = (struct hashentry *) (mapped->data + work);
373
374           if (type == here->type && keylen == here->len
(gdb)
375               && memcmp (key, mapped->data + here->key, keylen) == 0)
376             {
377               /* We found the entry.  Increment the appropriate counter.  */
378               const struct datahead *dh
379                 = (struct datahead *) (mapped->data + here->packet);
380
381               /* See whether we must ignore the entry or whether something
382                  is wrong because garbage collection is in progress.  */
383               if (dh->usable && ((char *) dh + dh->allocsize
384                                  <= (char *) mapped->head + mapped->mapsize))
(gdb)
385                 return dh;
386             }
387
388           work = here->next;
389         }
390
391       return NULL;
392     }
(gdb) print here->len
$5 = 1743454208
(gdb) print type
$6 = GETAI
(gdb) print here->type
$7 = GETPWBYNAME
(gdb) print here->next
$39 = 1
(gdb) print (mapped->data + work)
$40 = 0x2aaaad6873ba ""
(gdb) s
388           work = here->next;
(gdb) s
370       while (work != ENDREF)
(gdb) print (mapped->data + work)
$41 = 0x2aaaad6873b9 ""
(gdb) s
372           struct hashentry *here = (struct hashentry *) (mapped->data + work);
(gdb)
374           if (type == here->type && keylen == here->len
(gdb) print (mapped->data + work)
$42 = 0x2aaaad6873b9 ""
(gdb) s
388           work = here->next;
(gdb)
370       while (work != ENDREF)
(gdb) print (mapped->data + work)
$43 = 0x2aaaad6874b8 "�:F"
(gdb) s
372           struct hashentry *here = (struct hashentry *) (mapped->data + work);
(gdb)
374           if (type == here->type && keylen == here->len
(gdb)
388           work = here->next;
(gdb)
370       while (work != ENDREF)
(gdb) print (mapped->data + work)
$44 = 0x2aaaad6873ba ""

Looks like there's a circular reference.
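For illustration only, here is a minimal sketch of how a chain walk like the one in the listing could protect itself against a circular or out-of-range reference. This is not glibc's code and not a proposed patch: struct entry, bounded_search and the corrupt flag are hypothetical names, and the table is a plain array rather than nscd's mapped database. The idea is simply that a healthy chain can never contain more hops than there are entries, so exceeding that count means the data is corrupt.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the real nscd layout. */
typedef uint32_t ref_t;
#define ENDREF UINT32_MAX

struct entry {
    int type;          /* request type, e.g. a GETAI-like constant */
    size_t len;        /* length of key */
    const char *key;   /* key bytes */
    ref_t next;        /* index of the next entry in the chain, or ENDREF */
};

/* Walk the chain starting at head. Returns the index of the matching
   entry, or ENDREF if there is none. *corrupt is set to 1 when the walk
   was abandoned because a reference was out of range or the chain had
   more hops than the table has entries (i.e. it must contain a cycle). */
ref_t bounded_search(const struct entry *table, size_t nentries, ref_t head,
                     int type, const char *key, size_t keylen, int *corrupt)
{
    size_t steps = 0;
    ref_t work = head;

    *corrupt = 0;
    while (work != ENDREF) {
        if (work >= nentries || steps++ >= nentries) {
            *corrupt = 1;   /* bad reference or circular chain: give up */
            return ENDREF;
        }

        const struct entry *here = &table[work];
        if (type == here->type && keylen == here->len
            && memcmp(key, here->key, keylen) == 0)
            return work;

        work = here->next;
    }
    return ENDREF;
}
```

With a guard of this kind, a lookup that hits a corrupted chain would fail (and could fall back to asking the daemon over the socket) instead of spinning at 100% CPU. Whether that is the right behaviour for the real code is a separate question.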
That's glibc/nscd code. Do these runaway processes still happen if you switch nscd off? Also, what is in your /etc/nsswitch.conf? Reassigning to glibc for now, although there is still a small possibility that sshd somehow messes up the glibc internal data structures.
We'll have to get back to you on the nscd front. As for nsswitch.conf:

#
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Legal entries are:
#
#       nisplus or nis+         Use NIS+ (NIS version 3)
#       nis or yp               Use NIS (NIS version 2), also called YP
#       dns                     Use DNS (Domain Name Service)
#       files                   Use the local files
#       db                      Use the local database (.db) files
#       compat                  Use NIS on compat mode
#       hesiod                  Use Hesiod for user lookups
#       [NOTFOUND=return]       Stop searching if not found so far
#

# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd:    db files nisplus nis
#shadow:    db files nisplus nis
#group:     db files nisplus nis

passwd:     files ldap
shadow:     files ldap
group:      files ldap

#hosts:     db files nisplus nis dns
hosts:      files dns

# Example - obey only what nisplus tells us...
#services:   nisplus [NOTFOUND=return] files
#networks:   nisplus [NOTFOUND=return] files
#protocols:  nisplus [NOTFOUND=return] files
#rpc:        nisplus [NOTFOUND=return] files
#ethers:     nisplus [NOTFOUND=return] files
#netmasks:   nisplus [NOTFOUND=return] files

bootparams: nisplus [NOTFOUND=return] files

ethers:     files
netmasks:   files
networks:   files
protocols:  files ldap
rpc:        files
services:   files ldap

netgroup:   files ldap

publickey:  nisplus

automount:  files ldap
aliases:    files nisplus
(PS: selinux was stopping sshd from being straced, but may have been suppressing the log messages showing that it was doing so.)
FC5 is no longer supported. If you can reproduce this with F7 or rawhide, please reopen.