Bug 133755

Summary: the commands "host" and "nslookup" fail at a certain stack size limit
Product: Red Hat Enterprise Linux 3
Reporter: Gerhard Niederwieser <gerhard.niederwieser>
Component: bind
Assignee: Jason Vas Dias <jvdias>
Status: CLOSED NOTABUG
QA Contact: Ben Levenson <benl>
Severity: medium
Priority: medium
Version: 3.0
CC: helmut.pedit
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-09-27 17:02:06 UTC

Description Gerhard Niederwieser 2004-09-27 08:43:07 UTC
Description of problem:
After setting the stack size limit to 2097151 kbytes, the two commands
'host' and 'nslookup' fail.


Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 3.0 (kernel 2.4.21-20.ELsmp);
package bind-utils-9.2.4-EL3_10 contains the commands 'host' and 'nslookup'.


How reproducible:
The following errors are reproducible regardless of the physical RAM
size (tested on machines with 1, 2 and 4 Gbyte).


Steps to Reproduce:
1. ulimit -s 2097151
2. host

or 

1. ulimit -s 2097151
2. nslookup

  
Actual results:
sushi:/home/c102/c10253 # ulimit -s 2097151
sushi:/home/c102/c10253 # host
mem.c:653: INSIST(ctx->stats[size].gets > 0U) failed.
Aborted
sushi:/home/c102/c10253 # 

A different error message appears if you decrease the stack size. I
have not determined the exact stack size limit at which the error
message changes.

sushi:/home/c102/c10253 # 
sushi:/home/c102/c10253 # ulimit -s 1500000
sushi:/home/c102/c10253 # host
socket.c:2367: isc_thread_create() failed
host: isc_socketmgr_create: unexpected error
sushi:/home/c102/c10253 # 

Up to a stack size of about 1000000 kbytes, no error message occurred.


Expected results:
sushi:/home/c102/c10253 # ulimit -s 2097151
sushi:/home/c102/c10253 # host
Usage: host [-aCdlrTwv] [-c class] [-n] [-N ndots] [-t type] [-W time]
            [-R number] hostname [server]
       -a is equivalent to -v -t *
       -c specifies query class for non-IN data
       -C compares SOA records on authoritative nameservers
       -d is equivalent to -v
       -l lists all hosts in a domain, using AXFR
       -i Use the old IN6.INT form of IPv6 reverse lookup
       -N changes the number of dots allowed before root lookup is done
       -r disables recursive processing
       -R specifies number of retries for UDP packets
       -t specifies the query type
       -T enables TCP/IP mode
       -v enables verbose output
       -w specifies to wait forever for a reply
       -W specifies how long to wait for a reply
sushi:/home/c102/c10253 #


Additional info:

- When I tried the same under Red Hat Linux 8.0 (kernel 2.4.18-27.8.0smp,
bind-utils-9.2.1-9), the commands 'host' and 'nslookup' worked without
a problem, even at a stack size limit of 2097151 kbytes.

- The commands 'host' and 'nslookup' work fine if I set the stack size
to "unlimited" (ulimit -s unlimited). However, we need to set different
stack size limits for the different jobs running on our cluster nodes.

- When I test the stack limits with a self-written program that
deliberately wastes stack space, the limits are enforced as expected.
It seems that only the two commands 'host' and 'nslookup' fail.

[c10253@zid-cc016 stack-size]$ ulimit -s 1500000
[c10253@zid-cc016 stack-size]$ ./wasteStack 1600000
Segmentation fault
[c10253@zid-cc010 stack-size]$

- The output of the command "strace host" shows that an mmap2 system
call fails. Under Red Hat Linux 8.0 this memory mapping does not occur.

...
mmap2(NULL, 1536000000, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
close(3)                                = 0
close(4)                                = 0
write(2, "socket.c:2367: ", 15socket.c:2367: )         = 15
write(2, "isc_thread_create() failed", 26isc_thread_create() failed) = 26
write(2, "\n", 1
)                       = 1
open("/usr/share/locale/en_US/libisc.cat", O_RDONLY) = -1 ENOENT (No
such file or directory)
...

Comment 1 Jason Vas Dias 2004-09-27 17:02:06 UTC
This is actually caused by glibc's pthread_create():
If the stack size rlimit is not "unlimited", pthread_attr_init()
sets the thread stacksize attribute to the current stack size limit,
which BIND then uses to allocate its thread stacks
by calling pthread_create().

Since the stack size rlimit is unreasonably large in this case
(about 2GB), presumably more than the machine's physical memory,
the stack allocation fails, pthread_create() fails, and the
BIND library is unable to create threads and aborts.

Remember that resource limits are set in kilobytes, so
'ulimit -s 2097151' sets it to 2,147,482,624 bytes (just under 2GB).
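
The conversion can be checked with a few lines of shell. This is only an
illustrative sketch; the function name kb_to_bytes is made up for this
example and is not part of BIND or glibc.

```shell
# Convert a 'ulimit -s' value (given in kilobytes) to bytes.
# kb_to_bytes is a hypothetical helper name used only for illustration.
kb_to_bytes() {
    echo $(( $1 * 1024 ))
}

kb_to_bytes 2097151   # limit from this report: prints 2147482624
kb_to_bytes 2052      # BIND's default thread stack: prints 2101248
```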

If the stacksize rlimit is unlimited, pthread_attr_getstacksize()
returns < 0, so BIND uses a default stacksize of 2101248 bytes
(2052 kilobytes).

Using a reasonable stack size, say with 'ulimit -s 2060', works fine.
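
If jobs on the cluster need a large stack limit, one possible workaround
is to lower the soft limit only around the DNS tools. The sketch below
assumes a shell whose ulimit builtin accepts -S and -s (bash does); the
function name with_stack_limit and the hostname www.example.com are
placeholders chosen for this example.

```shell
# Run a command with a temporary soft stack limit (in kilobytes).
# The subshell confines the change, so the caller's limit is untouched.
# with_stack_limit is a hypothetical helper name.
with_stack_limit() {
    limit_kb=$1
    shift
    ( ulimit -S -s "$limit_kb" && "$@" )
}

# Example (not executed here):
#   with_stack_limit 2060 host www.example.com
```

Only the soft limit is changed, and only inside the subshell, so the job
that calls the wrapper keeps whatever stack limit it was started with.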

BIND is correctly using the recommended thread stack size as
returned by pthread_attr_getstacksize(); hence, this is not a
BIND bug.

If you still believe this to be a bug, reopen it and change the
"Component" to glibc.