Red Hat Bugzilla – Bug 456810
Firefox crashes when resolving host name with many addresses
Last modified: 2008-07-31 15:48:01 EDT
Description of problem:
Firefox crashes when accessing Russian websites. The stack trace
points at a problem in the name resolution code (libnss_dns).
Version-Release number of selected component (if applicable):
How reproducible:
Usually it's random, depending on which ads are served, but I found
a URL that currently crashes it 100% of the time.
Steps to Reproduce:
1. Open a URL (see below)
Here's the 100% crash URL:
I filed the Firefox bug with Mozilla:
There, we found that running under Valgrind produces the following report:
==4082== Thread 11:
==4082== Source and destination overlap in memcpy(0x16629A94, 0x16628D9A, 50049)
==4082== at 0x4A07FCA: memcpy (mc_replace_strmem.c:402)
==4082== by 0xB4C4DDF: (within /lib64/libnss_dns-2.8.90.so)
==4082== by 0xB4C522C: _nss_dns_gethostbyname4_r (in /lib64/libnss_dns-2.8.90.so)
==4082== by 0x3A7AECFEFD: (within /lib64/libc-2.8.90.so)
==4082== by 0x3A7AED1CBC: getaddrinfo (in /lib64/libc-2.8.90.so)
==4082== by 0x3A8A61D577: PR_GetAddrInfoByName (in /lib64/libnspr4.so)
==4082== by 0x3A8AE905B7: (within /usr/lib64/xulrunner-1.9/libxul.so)
==4082== by 0x3A8A629AA2: (within /lib64/libnspr4.so)
==4082== by 0x3A7BA07409: (within /lib64/libpthread-2.8.90.so)
==4082== by 0x3A7AEE8DBC: clone (in /lib64/libc-2.8.90.so)
==4082== ERROR SUMMARY: 4234 errors from 38 contexts (suppressed: 83 from 1)
==4082== malloc/free: in use at exit: 38,821,370 bytes in 175,537 blocks.
==4082== malloc/free: 1,665,276 allocs, 1,489,739 frees, 945,252,276 bytes alloc
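The memcpy warning in the report can be checked by hand: two ranges of length n overlap whenever their start addresses are less than n bytes apart. A minimal sketch in Python, using only the three values Valgrind printed (the helper name `overlap_bytes` is made up for illustration):

```python
def overlap_bytes(dst: int, src: int, n: int) -> int:
    """Return how many bytes the ranges [dst, dst+n) and [src, src+n) share."""
    distance = abs(dst - src)
    return max(0, n - distance)


# Values copied from the Valgrind line: memcpy(0x16629A94, 0x16628D9A, 50049)
dst, src, n = 0x16629A94, 0x16628D9A, 50049
print(dst - src)                   # 3322: distance between the two buffers
print(overlap_bytes(dst, src, n))  # 46727: bytes shared by source and destination
```

The buffers start only 3322 bytes apart while 50049 bytes are copied, so most of the copy overlaps. memcpy has undefined behavior in that case.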
Both 32-bit and 64-bit versions crash in the same way.
The same Firefox works on F9, so I suspect glibc or another library.
Uli, please at least take a brief look at it before the URL I found becomes
useless. It may be impossible to find a reproducer later, but it really
does crash randomly on all sorts of Russian websites.
I looked at the HTML code at that URL and found this:
[zaitcev@niphredil ~]$ host body.imho.ru
;; Truncated, retrying in TCP mode.
body.imho.ru has address 18.104.22.168
body.imho.ru has address 22.214.171.124
body.imho.ru has address 126.96.36.199
body.imho.ru has address 188.8.131.52
body.imho.ru has address 184.108.40.206
body.imho.ru has address 220.127.116.11
body.imho.ru has address 18.104.22.168
body.imho.ru has address 22.214.171.124
body.imho.ru has address 126.96.36.199
body.imho.ru has address 188.8.131.52
body.imho.ru has address 184.108.40.206
body.imho.ru has address 220.127.116.11
body.imho.ru has address 18.104.22.168
body.imho.ru has address 22.214.171.124
body.imho.ru has address 126.96.36.199
body.imho.ru has address 188.8.131.52
body.imho.ru has address 184.108.40.206
body.imho.ru has address 220.127.116.11
Maybe that explains why nss tries to move 48KB of data.
This morning, host(1) no longer says ";; Truncated ...",
and the browser does not crash (it still fails to work, and incorrectly
reports "address not found"). But the number of printed A
records is the same, 18. There must have been some garbage in the
nameserver replies that made them bigger.
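A back-of-the-envelope size estimate supports that guess. The sketch below assumes a plain RFC 1035 reply with name compression, an answer section only (no authority or additional records), and the classic 512-byte UDP limit without EDNS0. The function names are made up for illustration.

```python
HEADER = 12      # fixed DNS header size (RFC 1035)
UDP_LIMIT = 512  # classic UDP payload limit, assuming no EDNS0


def question_size(name: str) -> int:
    # Each label costs 1 length byte plus its characters, then 1 byte for
    # the root label, followed by QTYPE (2 bytes) and QCLASS (2 bytes).
    labels = name.rstrip(".").split(".")
    return sum(1 + len(label) for label in labels) + 1 + 4


def a_reply_size(name: str, records: int) -> int:
    # Each A record: 2 (compressed name pointer) + 2 (type) + 2 (class)
    # + 4 (TTL) + 2 (RDLENGTH) + 4 (IPv4 address) = 16 bytes.
    return HEADER + question_size(name) + records * 16


size = a_reply_size("body.imho.ru", 18)
print(size, size > UDP_LIMIT)  # 318 False
```

Under those assumptions, 18 A records need about 318 bytes, well under 512, so the records alone would not force truncation.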
If the problem is in getaddrinfo then you should be able to reproduce the crash
with getent. At the very least, valgrind should show something.
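Another way to exercise getaddrinfo outside the browser is a few lines of Python. This is a sketch that assumes CPython, whose `socket.getaddrinfo` wraps the C library's getaddrinfo. The helper name `resolve_all` is made up for illustration. The family is left unspecified so that both A and AAAA lookups are requested, as a browser would do.

```python
import socket


def resolve_all(host: str) -> list:
    """Return the distinct addresses getaddrinfo reports for host."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_UNSPEC,
                               type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


# A numeric address keeps this example offline; substitute the failing
# host name to send the lookup through the DNS code path.
print(resolve_all("127.0.0.1"))
```

Running it under valgrind with the problematic host name should go through the same libc call that appears in the trace, though whether it reaches the same internal function depends on the system's nsswitch configuration.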
I've tried the URLs you showed. I do see the truncation message but I don't see a
crash with getent, nor does valgrind complain. I haven't tried firefox (yet)
since my rawhide machine is practically headless.
Also, it would help if you'd install the glibc debuginfo package.
And Pete, if you can reproduce the crash, capture the DNS traffic. I.e., kill
nscd, start wireshark to record port 53, and run firefox.
Actually, I can now see a problem. I'm looking at it...
Created attachment 312749
tcpdump -w ffox.dump -s 1600
Weirdness galore. Now that I have glibc-debuginfo installed, I cannot
get the crash backtrace. Firefox just hangs (but it crashes again
if run without a debugger).
I've checked in a whole bunch of patches upstream for the new resolver code. I
have seen the problem with the listed host names, although at the moment I cannot
confirm the fix works, since the replies returned right now are short enough for UDP.
Anyway, the main problem for this bug was that TCP replies weren't handled
correctly. TCP is only needed when the reply is too large for UDP, e.g. because
of many addresses. I've tested it with my local DNS server and it seems to work nicely now.
Jakub should be able to build a new glibc real soon. Please test it when available.
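For context on why large replies take a different path: a server that cannot fit its answer into a UDP datagram sets the TC (truncated) bit in the DNS header, and the client retries over TCP, where each message is preceded by a two-byte length field (RFC 1035). Below is a rough Python sketch of those two mechanics. It is not glibc's code, and the function names and the sample header are made up for illustration.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags word (RFC 1035)


def is_truncated(message: bytes) -> bool:
    """True if the DNS message has the TC bit set (retry over TCP)."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte big-endian length."""
    return struct.pack("!H", len(message)) + message


def unframe_from_tcp(data: bytes) -> bytes:
    """Strip the length prefix and return exactly one DNS message."""
    if len(data) < 2:
        raise ValueError("length prefix not received yet")
    (length,) = struct.unpack("!H", data[:2])
    body = data[2:2 + length]
    if len(body) != length:
        raise ValueError("short read: TCP reply not complete yet")
    return body


# Hypothetical header: id 0x1234, flags = response (0x8000) | TC | RD (0x0100),
# one question, no answer/authority/additional records.
header = struct.pack("!HHHHHH", 0x1234, 0x8000 | TC_FLAG | 0x0100, 1, 0, 0, 0)
print(is_truncated(header))
print(unframe_from_tcp(frame_for_tcp(header)) == header)
```

The length prefix allows a TCP reply of up to 65535 bytes, far more than a UDP reply without EDNS0, so the receiving buffers have to be sized from that prefix rather than from the UDP limit.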
Will test and close.
Tested to work with glibc-2.8.90-10.x86_64.