456810 – Firefox crashes when resolving host name with many addresses

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	glibc
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	low
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-07-27 15:43 UTC by Pete Zaitcev
Modified:	2008-07-31 19:48 UTC
CC List:	2 users
Fixed In Version:	glibc-2.8.90-10
Clone Of:
Environment:
Last Closed:	2008-07-31 19:48:01 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
tcpdump -w ffox.dump -s 1600 (3.77 KB, application/octet-stream) 2008-07-28 03:19 UTC, Pete Zaitcev

Description Pete Zaitcev 2008-07-27 15:43:57 UTC

Description of problem:

Firefox crashes when accessing Russian websites. The stack trace
points at a problem in nameserver code.

Version-Release number of selected component (if applicable):

firefox-3.0.1-1.fc10.x86_64
glibc-2.8.90-9.x86_64

How reproducible:

Usually it's random, depending on what ads are served, but I found
a URL which crashes 100% at this time.

Steps to Reproduce:
1. Open the URL listed under "Additional info" below.

Actual results:

Firefox crashes

Expected results:

No crash

Additional info:

Here's the 100% crash URL:
http://rian.ru/technology/20080726/114991971.html

I filed the Firefox bug with Mozilla:
https://bugzilla.mozilla.org/show_bug.cgi?id=448151

There, we found that the stack trace with Valgrind looks like so:
==4082== Thread 11:
==4082== Source and destination overlap in memcpy(0x16629A94, 0x16628D9A, 50049)
==4082==    at 0x4A07FCA: memcpy (mc_replace_strmem.c:402)
==4082==    by 0xB4C4DDF: (within /lib64/libnss_dns-2.8.90.so)
==4082==    by 0xB4C522C: _nss_dns_gethostbyname4_r (in /lib64/libnss_dns-2.8.90.so)
==4082==    by 0x3A7AECFEFD: (within /lib64/libc-2.8.90.so)
==4082==    by 0x3A7AED1CBC: getaddrinfo (in /lib64/libc-2.8.90.so)
==4082==    by 0x3A8A61D577: PR_GetAddrInfoByName (in /lib64/libnspr4.so)
==4082==    by 0x3A8AE905B7: (within /usr/lib64/xulrunner-1.9/libxul.so)
==4082==    by 0x3A8A629AA2: (within /lib64/libnspr4.so)
==4082==    by 0x3A7BA07409: (within /lib64/libpthread-2.8.90.so)
==4082==    by 0x3A7AEE8DBC: clone (in /lib64/libc-2.8.90.so)
==4082==
==4082== ERROR SUMMARY: 4234 errors from 38 contexts (suppressed: 83 from 1)
==4082== malloc/free: in use at exit: 38,821,370 bytes in 175,537 blocks.
==4082== malloc/free: 1,665,276 allocs, 1,489,739 frees, 945,252,276 bytes alloc
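
The Valgrind complaint can be checked by hand from the three numbers in the memcpy line. The following is a small sketch in Python (not glibc code; the helper name regions_overlap is made up for illustration) that applies the usual interval test to the reported destination, source, and length:

```python
def regions_overlap(dst: int, src: int, n: int) -> bool:
    """Return True if [dst, dst+n) and [src, src+n) share at least one byte."""
    return n > 0 and dst < src + n and src < dst + n


# Values copied from the Valgrind line:
#   memcpy(0x16629A94, 0x16628D9A, 50049)
DST = 0x16629A94
SRC = 0x16628D9A
LENGTH = 50049

# Distance between the two pointers; the copy is far longer than this gap,
# so the source range runs into the destination range.
GAP = DST - SRC

if __name__ == "__main__":
    print("gap between src and dst:", GAP, "bytes")
    print("copy length:", LENGTH, "bytes")
    print("overlap:", regions_overlap(DST, SRC, LENGTH))
```

The destination starts only 3322 bytes after the source while 50049 bytes are copied, which is why memcpy (as opposed to memmove) is flagged here.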

Both 32-bit and 64-bit versions crash in the same way.

The same Firefox works on F9, so I suspect glibc or another library.

Uli, please at least look briefly at it before the URL I found becomes
useless. It may be impossible to find a reproducer later, but it really
crashes at all sorts of Russian websites randomly.

Comment 1 Pete Zaitcev 2008-07-27 16:08:47 UTC

I looked at the HTML code at that URL and found this:

[zaitcev@niphredil ~]$ host body.imho.ru
;; Truncated, retrying in TCP mode.
body.imho.ru has address 81.19.80.26
body.imho.ru has address 81.19.80.27
body.imho.ru has address 81.19.80.28
body.imho.ru has address 81.19.80.31
body.imho.ru has address 81.19.80.32
body.imho.ru has address 81.19.80.33
body.imho.ru has address 81.19.80.34
body.imho.ru has address 81.19.80.11
body.imho.ru has address 81.19.80.12
body.imho.ru has address 81.19.80.14
body.imho.ru has address 81.19.80.15
body.imho.ru has address 81.19.80.16
body.imho.ru has address 81.19.80.17
body.imho.ru has address 81.19.80.18
body.imho.ru has address 81.19.80.21
body.imho.ru has address 81.19.80.22
body.imho.ru has address 81.19.80.24
body.imho.ru has address 81.19.80.25
[zaitcev@niphredil ~]$ 

Maybe that explains why the nss code tries to move 48 KB of data (the 50049-byte memcpy in the trace above).

Comment 2 Pete Zaitcev 2008-07-27 21:01:16 UTC

This morning, host(1) no longer says ";; Truncated ...", and the
browser does not crash (it still fails to work and incorrectly reports
"address not found"). But the number of printed A records is the same,
18. There must have been some garbage in the nameserver replies that
made them bigger.

Comment 3 Ulrich Drepper 2008-07-27 23:54:56 UTC

If the problem is in getaddrinfo then you should be able to reproduce a crash
with getent.  At the very least, valgrind should show something.

I've tried the URLs you showed.  I do see the truncate message but I don't see a
crash with getent nor does valgrind complain.  I haven't tried firefox (yet)
since my rawhide machine is practically headless.

Also, it would help if you'd install the glibc debuginfo package.
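
For anyone who wants to exercise getaddrinfo outside the browser without getent, here is a minimal sketch in Python. It relies on socket.getaddrinfo, which wraps the C library's getaddrinfo; whether it takes exactly the same code path inside libnss_dns as Firefox does is an assumption, not something verified in this bug. The helper name resolve_all is made up for illustration.

```python
import socket


def resolve_all(host: str) -> list[str]:
    """Resolve host through the C library's getaddrinfo and return the
    distinct addresses in the order they were reported."""
    # family is left unspecified so both A and AAAA lookups may be issued;
    # SOCK_STREAM avoids one duplicate entry per socket type.
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]
        if address not in addresses:
            addresses.append(address)
    return addresses


# Example (needs network access, so it is not run automatically):
#   for address in resolve_all("body.imho.ru"):
#       print(address)
```

Running the interpreter under valgrind with a host name that returns a truncated UDP reply would be one way to look for the same memcpy warning.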

Comment 4 Ulrich Drepper 2008-07-28 00:01:49 UTC

And Pete, if you can reproduce the crash, capture the DNS traffic.  I.e., kill
nscd, start wireshark to record port 53, and run firefox.

Comment 5 Ulrich Drepper 2008-07-28 00:05:13 UTC

Actually, I can now see a problem.  I'm looking at it...

Comment 6 Pete Zaitcev 2008-07-28 03:19:30 UTC

Created attachment 312749 [details]
tcpdump -w ffox.dump -s 1600

Comment 7 Pete Zaitcev 2008-07-28 03:29:18 UTC

Weirdness galore. Now that I have glibc-debuginfo installed, I cannot
get the crash backtrace. Firefox just hangs (but it crashes again
if run without a debugger).

Comment 8 Ulrich Drepper 2008-07-28 23:01:31 UTC

I've checked in upstream a whole bunch of patches for the new resolver code.  I
have seen the problem with the listed host names, although at the moment I cannot
confirm that the fix works for them, since the replies provided right now are short enough for UDP.

Anyway, the main problem for this bug was that TCP replies weren't handled
correctly.  These are only needed if the reply is really large due to many
addresses.  I've tested it with my local DNS server and it seems to work nicely now.

Jakub should be able to build a new glibc real soon.  Please test it when available.
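
As background on the UDP-to-TCP fallback mentioned above: a DNS server that cannot fit its answer into a UDP reply sets the TC (truncated) bit in the header, and the client is expected to repeat the query over TCP, where each message is preceded by a two-byte length. The sketch below shows those two details in Python. It is only an illustration of the wire format from RFC 1035, not the glibc resolver code, and the function names is_truncated and frame_for_tcp are made up.

```python
import struct

DNS_HEADER_LEN = 12   # ID, flags, and four 16-bit section counts
TC_FLAG = 0x0200      # "truncated" bit within the 16-bit flags field


def is_truncated(reply: bytes) -> bool:
    """Return True if the DNS reply has the TC bit set, meaning the
    answer did not fit and the query should be retried over TCP."""
    if len(reply) < DNS_HEADER_LEN:
        raise ValueError("reply is shorter than a DNS header")
    _msg_id, flags = struct.unpack("!HH", reply[:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its length as a 16-bit big-endian
    integer, as required when sending DNS over TCP."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for TCP framing")
    return struct.pack("!H", len(message)) + message
```

A reply received over TCP can therefore be much larger than a plain UDP reply, which matches the observation that the crash only appeared when host(1) printed ";; Truncated, retrying in TCP mode."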

Comment 9 Pete Zaitcev 2008-07-28 23:42:48 UTC

Will test and close.

Comment 10 Pete Zaitcev 2008-07-31 19:48:01 UTC

Tested to work with glibc-2.8.90-10.x86_64.
