Bug 172175

Summary: segfault in getaddrinfo()
Product: Red Hat Enterprise Linux 4
Reporter: Joseph Shraibman <jks>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4.0
CC: drepper
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-11-03 00:44:20 UTC
Attachments:
  small c program that does not produce the same segfault (flags: none)

Description Joseph Shraibman 2005-11-01 03:13:59 UTC

Description of problem:
My Java server program, which had worked fine on Red Hat 9 for a long time, started crashing shortly after I upgraded to RHEL 4 ES. Examination of the core files showed that it always crashes in the same place. The crash seems to happen at random, anywhere from a few hours to a few days after startup.

BTW, is there any way to get RHEL 4 ES to stop turning all IPv4 addresses into IPv6 ones? If I could do that, maybe I could find a way to work around this bug.

Version-Release number of selected component (if applicable):
glibc-2.3.4-2.13

How reproducible:
Sometimes

Steps to Reproduce:
I don't know how to reproduce the problem.

Additional info:

(gdb) bt
#0  0x00a3c7e8 in getaddrinfo () from /lib/tls/libc.so.6
#1  0x7d3cef08 in Java_java_net_Inet6AddressImpl_lookupAllHostAddr () from /usr/local/jdk1.5.0_05/jre/lib/i386/libnet.so
#2  0xb22b6838 in ?? ()
#3  0x7c56b354 in ?? ()
#4  0x7bdb69cc in ?? ()
#5  0x7bdb69c8 in ?? ()
#6  0x7bdb699c in ?? ()
#7  0x00000000 in ?? ()


[root@p3 /]# java -version
java version "1.5.0_05"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_05-b05)
Java HotSpot(TM) Server VM (build 1.5.0_05-b05, mixed mode)
[root@p3 /]# uname -a
Linux p3.selectacast.net 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686 i686 i386 GNU/Linux

Comment 1 Joseph Shraibman 2005-11-02 14:43:13 UTC
How can I build a glibc RPM with debuginfo?

Comment 2 Jakub Jelinek 2005-11-02 14:51:22 UTC
Just grab it from
ftp://people.redhat.com/jakub/glibc/2.3.4-2.13/


Comment 3 Joseph Shraibman 2005-11-02 16:53:33 UTC
OK thanks.  That should really be easier to find.

Anyway now I have 4 different backtraces all showing the same thing.

(gdb) bt
#0  *__GI_getaddrinfo (name=0x7bcfffd8 "web64.miraclehosting.com",
service=0x7d1fcc8a "domain", hints=0x7cb03628, pai=0x7cb03624)
    at ../sysdeps/posix/getaddrinfo.c:1593
#1  0x7d1f0f08 in Java_java_net_Inet6AddressImpl_lookupAllHostAddr () from
/usr/local/jdk1.5.0_05/jre/lib/i386/libnet.so
#2  0xb22b6838 in ?? ()
#3  0x7bd221dc in ?? ()
#4  0x7cb036b4 in ?? ()
#5  0x7cb036b0 in ?? ()
#6  0x7cb03684 in ?? ()
#7  0x00000000 in ?? ()

(gdb) bt
#0  *__GI_getaddrinfo (name=0x74f80ca0 "web64.miraclehosting.com",
service=0x7d3dac8a "domain", hints=0x7b0b5a28, pai=0x7b0b5a24)
    at ../sysdeps/posix/getaddrinfo.c:1593
#1  0x7d3cef08 in Java_java_net_Inet6AddressImpl_lookupAllHostAddr () from
/usr/local/jdk1.5.0_05/jre/lib/i386/libnet.so
#2  0xb296557d in ?? ()
#3  0x7a914ed4 in ?? ()
#4  0x7b0b5a88 in ?? ()
#5  0x7b0b5a84 in ?? ()
#6  0xae314830 in ?? ()
#7  0x82249198 in ?? ()
#8  0x00000000 in ?? ()


(gdb) bt
#0  *__GI_getaddrinfo (name=0x81e0a50 "web64.miraclehosting.com",
service=0x7d77ac8a "domain", hints=0x7c6646f4, pai=0x7c6646f0)
    at ../sysdeps/posix/getaddrinfo.c:1593
#1  0x7d76ef08 in Java_java_net_Inet6AddressImpl_lookupAllHostAddr () from
/usr/local/jdk1.5.0_05/jre/lib/i386/libnet.so
#2  0xb22b6838 in ?? ()
#3  0x7b6ea55c in ?? ()
#4  0x7c664780 in ?? ()
#5  0x7c66477c in ?? ()
#6  0x7c664750 in ?? ()
#7  0x00000000 in ?? ()

(gdb) bt
#0  *__GI_getaddrinfo (name=0x87262d8 "web64.miraclehosting.com",
service=0x7d108c8a "domain", hints=0x7a81a9ac, pai=0x7a81a9a8)
    at ../sysdeps/posix/getaddrinfo.c:1593
#1  0x7d0fcf08 in Java_java_net_Inet6AddressImpl_lookupAllHostAddr () from
/usr/local/jdk1.5.0_05/jre/lib/i386/libnet.so
#2  0xb22b6838 in ?? ()
#3  0x7d211f34 in ?? ()
#4  0x7a81aa38 in ?? ()
#5  0x7a81aa34 in ?? ()
#6  0x7a81aa08 in ?? ()
#7  0x00000000 in ?? ()
(gdb)                 

Comment 4 Joseph Shraibman 2005-11-02 18:46:28 UTC
I don't understand two things.
1) How can the memory of results not be accessible? It is declared right there
on the stack.
2) Why does the segfault happen on line 1593 and not on the line above it?
         results[i].dest_addr = q;  <== this is fine
         results[i].got_source_addr = false;  <== this causes a segfault

I made a small test program to try to replicate the bug, but in my test program
the call to getaddrinfo() returns EAI_SOCKTYPE.


(gdb) p *hints
$4 = {ai_flags = 2, ai_family = 0, ai_socktype = 0, ai_protocol = 0, ai_addrlen
= 0, ai_addr = 0x0, ai_canonname = 0x0, ai_next = 0x0}
(gdb) p *pai
$5 = (struct addrinfo *) 0x14
(gdb) p **pai
$6 = {ai_flags = 0, ai_family = 0, ai_socktype = 0, ai_protocol = 0, ai_addrlen
= 0, ai_addr = 0x0, ai_canonname = 0x0, ai_next = 0x0}
(gdb) p i
$12 = 0
(gdb) p results[i]
Cannot access memory at address 0x7a812600
(gdb) p results
Cannot access memory at address 0x7a812600
(gdb) p nresults
$13 = 246
(gdb) p results[i].dest_addr
Cannot access memory at address 0x7a812600
(gdb) p results
Cannot access memory at address 0x7a812600
(gdb) list
1593              results[i].got_source_addr = false;
1594
1595              /* If we just looked up the address for a different
1596                 protocol, reuse the result.  */
1597              if (last != NULL && last->ai_addrlen == q->ai_addrlen
1598                  && memcmp (last->ai_addr, q->ai_addr, q->ai_addrlen) == 0)
1599                {
1600                  memcpy (&results[i].source_addr, &results[i - 1].source_addr,
1601                          results[i - 1].source_addr_len);
1602                  results[i].source_addr_len = results[i - 1].source_addr_len;
(gdb) list -
1583        {
1584          /* Sort results according to RFC 3484.  */
1585          struct sort_result results[nresults];
1586          struct addrinfo *q;
1587          struct addrinfo *last = NULL;
1588          char *canonname = NULL;
1589
1590          for (i = 0, q = p; q != NULL; ++i, last = q, q = q->ai_next)
1591            {
1592              results[i].dest_addr = q;



Comment 5 Joseph Shraibman 2005-11-02 18:49:00 UTC
Created attachment 120656 [details]
small c program that does not produce the same segfault

The output of this program is:
result of call is -7
EAI_SOCKTYPE
0: name: <unprintable garbage>
Segmentation fault

Comment 6 Jakub Jelinek 2005-11-02 21:20:17 UTC
The testcase in #5 has many bugs.
One is that getaddrinfo fails, and when it fails *res is undefined, yet the
testcase still uses it.
Another one is that:
http://www.opengroup.org/onlinepubs/009695399/functions/freeaddrinfo.html
"     In this hints structure every member other than ai_flags, ai_family,
ai_socktype, and ai_protocol shall be set to zero
     or a null pointer."
Plus those 4 fields of course need to be set to meaningful values.
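For comparison with the testcase in #5, a minimal sketch of the calling convention described above, assuming a POSIX system: hints is zeroed before the four permitted fields are set, the return value is checked, and *res is only read and freed after a successful call. The helper name count_addresses() is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: count the addresses getaddrinfo() returns for
   host/service.  Returns the count, or -1 if the lookup failed. */
int count_addresses(const char *host, const char *service, int flags)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct addrinfo *ai;
    int rc, n = 0;

    /* Every member other than ai_flags, ai_family, ai_socktype and
       ai_protocol must be zero or a null pointer. */
    memset(&hints, 0, sizeof hints);
    hints.ai_flags = flags;
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address */
    hints.ai_protocol = 0;            /* any protocol for that socket type */

    rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0) {
        /* On failure *res is undefined: do not walk it or free it. */
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }
    for (ai = res; ai != NULL; ai = ai->ai_next)
        n++;
    freeaddrinfo(res);                /* only valid after a successful call */
    return n;
}
```

Passing AI_NUMERICHOST with a literal address such as "127.0.0.1" exercises this without any DNS traffic.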

Comment 7 Jakub Jelinek 2005-11-02 21:39:27 UTC
As for the segfault in #4, my guess would be that the JDK calls getaddrinfo
with a prohibitively small thread stack.
The results array is a VLA, and sizeof (results[0]) == 136 bytes if I count
correctly, so for a large nresults (246 in this case) that is a ~32KB allocation
on the stack.
So if the JDK limits the thread stack size to 32K or smaller and calls
getaddrinfo, it will obviously crash.
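Taking the 136-byte estimate above at face value, the VLA alone for this core works out to:

$$246 \times 136\ \text{bytes} = 33\,456\ \text{bytes} \approx 32.7\ \text{KiB} > 32\ \text{KiB} = 32\,768\ \text{bytes}$$

which already exceeds a 32K stack before counting the rest of getaddrinfo's frame and the frames of its callers.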

We could consider using __libc_use_alloca here, I guess (then on i?86 it would
use at most max (4KB, thread_stack_size / 4) of stack for the array and
otherwise fall back to malloc), but I'd still say the JDK is severely broken if
it calls glibc functions that need a lot of stack with such a limited stack size
(e.g. *printf, *scanf, getaddrinfo and various others).
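A rough sketch of the stack-or-heap pattern suggested above. glibc's __libc_use_alloca() is internal and not callable from applications, so the fixed 4096-byte cutoff, the use_stack() helper, struct sort_entry and fill_and_sum() below are hypothetical stand-ins. The point is only that large requests go to malloc() and therefore no longer depend on the thread's stack size.

```c
#define _DEFAULT_SOURCE
#include <alloca.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed cutoff standing in for glibc's internal
   __libc_use_alloca(), which applications cannot call. */
#define STACK_CUTOFF 4096

struct sort_entry {          /* hypothetical; not glibc's struct sort_result */
    int key;
    char payload[132];
};

static bool use_stack(size_t bytes)
{
    return bytes <= STACK_CUTOFF;
}

/* Fill n scratch entries and return the sum of their keys, or -1 on
   overflow or allocation failure.  *on_heap reports the path taken. */
long fill_and_sum(size_t n, bool *on_heap)
{
    struct sort_entry *scratch;
    size_t bytes;
    long sum = 0;

    *on_heap = false;
    if (n > SIZE_MAX / sizeof(struct sort_entry))
        return -1;
    bytes = n * sizeof(struct sort_entry);

    if (use_stack(bytes)) {
        scratch = alloca(bytes);     /* small request: stack is fine */
    } else {
        scratch = malloc(bytes);     /* large request: heap, whatever the stack size */
        if (scratch == NULL)
            return -1;
        *on_heap = true;
    }

    for (size_t i = 0; i < n; i++) {
        scratch[i].key = (int) i;
        memset(scratch[i].payload, 0, sizeof scratch[i].payload);
    }
    for (size_t i = 0; i < n; i++)
        sum += scratch[i].key;

    if (*on_heap)
        free(scratch);
    return sum;
}
```

With 246 entries (the nresults value from the core) and this hypothetical layout, the request is about 33 KB, well above the cutoff, so it takes the malloc() path.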

Comment 8 Joseph Shraibman 2005-11-02 22:24:15 UTC
That may not be the JVM's fault. I specifically set the stack size to be small
because I wanted to squeeze as many threads as possible into the JVM. I assumed
the worst that could happen would be a java.lang.OutOfMemoryError, which I
would catch and handle.

Is there any way to tell from the core if it did run out of stack?  I'm not very
adept at using gdb.

BTW, I set up the input in my testcase the way I did because that is how it
appeared to be set in the core, according to gdb.
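The thread stack size is chosen by whoever creates the thread (here the JVM, following the small stack size configured above), and the same limit can be seen from C with POSIX threads. A minimal sketch, assuming that 256 KB is comfortably enough for getaddrinfo (an assumption, not a documented figure); struct lookup, lookup_thread() and lookup_with_stack() are hypothetical names. Sizes below PTHREAD_STACK_MIN are rejected by pthread_attr_setstacksize() with EINVAL, but any larger size is accepted even if it is too small for what the thread later calls; the overflow then shows up as a segfault, not as an error return.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>

struct lookup {              /* hypothetical job description */
    const char *host;
    const char *service;
    int rc;                  /* getaddrinfo() return code */
};

/* Thread body: perform one lookup and store the result code. */
static void *lookup_thread(void *arg)
{
    struct lookup *job = arg;
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    job->rc = getaddrinfo(job->host, job->service, &hints, &res);
    if (job->rc == 0)
        freeaddrinfo(res);
    return NULL;
}

/* Run one lookup on a thread with an explicit stack size.  Returns 0 and
   stores getaddrinfo()'s result in *gai_rc, or returns -1 if the stack
   size was rejected or the thread could not be created. */
int lookup_with_stack(const char *host, const char *service,
                      size_t stack_bytes, int *gai_rc)
{
    pthread_attr_t attr;
    pthread_t tid;
    struct lookup job = { host, service, 0 };
    int ok;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    ok = pthread_attr_setstacksize(&attr, stack_bytes) == 0 &&
         pthread_create(&tid, &attr, lookup_thread, &job) == 0;
    pthread_attr_destroy(&attr);
    if (!ok)
        return -1;
    pthread_join(tid, NULL);
    *gai_rc = job.rc;
    return 0;
}
```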


Comment 9 Joseph Shraibman 2005-11-03 00:44:20 UTC
OK, I can reliably recreate the problem using a test Java program and a small
stack size. There still might be something to be done here about allocating
that array somewhere other than the stack, but I'm going to mark this as NOTABUG
for now.

BTW, I can replicate the problem on Fedora Core 3 and Red Hat 9 too, but there
it fails in different places. Here is a backtrace from FC3:

(gdb) bt
#0  0x003f85f1 in phys_pages_info () from /lib/tls/libc.so.6
#1  0x003be735 in sysconf () from /lib/tls/libc.so.6
#2  0x0035bae1 in qsort () from /lib/tls/libc.so.6
#3  0x003e2468 in getaddrinfo () from /lib/tls/libc.so.6
#4  0xb229af08 in Java_java_net_Inet6AddressImpl_lookupAllHostAddr ()
   from /mnt/space/fc3local/jdk1.5.0_04/jre/lib/i386/libnet.so
#5  0xb289242b in ?? ()
#6  0x0940f934 in ?? ()
#7  0xbf878ac8 in ?? ()
#8  0xbf878ac4 in ?? ()
#9  0xbf878a98 in ?? ()
#10 0x8cc861c0 in ?? ()
#11 0xbf878ac8 in ?? ()
#12 0x8cc86758 in ?? ()
#13 0x00000000 in ?? ()