Description of problem:
Firefox crashes when accessing Russian websites. The stack trace points at a problem in nameserver code.

Version-Release number of selected component (if applicable):
firefox-3.0.1-1.fc10.x86_64
glibc-2.8.90-9.x86_64

How reproducible:
Usually it's random, depending on which ads are served, but I found a URL that crashes 100% of the time right now.

Steps to Reproduce:
1. Open the URL given below.

Actual results:
Firefox crashes.

Expected results:
No crash.

Additional info:
Here's the URL that crashes 100% of the time:
http://rian.ru/technology/20080726/114991971.html

I filed the Firefox bug with Mozilla: https://bugzilla.mozilla.org/show_bug.cgi?id=448151
There, we found that the stack trace under Valgrind looks like this:

==4082== Thread 11:
==4082== Source and destination overlap in memcpy(0x16629A94, 0x16628D9A, 50049)
==4082==    at 0x4A07FCA: memcpy (mc_replace_strmem.c:402)
==4082==    by 0xB4C4DDF: (within /lib64/libnss_dns-2.8.90.so)
==4082==    by 0xB4C522C: _nss_dns_gethostbyname4_r (in /lib64/libnss_dns-2.8.90.so)
==4082==    by 0x3A7AECFEFD: (within /lib64/libc-2.8.90.so)
==4082==    by 0x3A7AED1CBC: getaddrinfo (in /lib64/libc-2.8.90.so)
==4082==    by 0x3A8A61D577: PR_GetAddrInfoByName (in /lib64/libnspr4.so)
==4082==    by 0x3A8AE905B7: (within /usr/lib64/xulrunner-1.9/libxul.so)
==4082==    by 0x3A8A629AA2: (within /lib64/libnspr4.so)
==4082==    by 0x3A7BA07409: (within /lib64/libpthread-2.8.90.so)
==4082==    by 0x3A7AEE8DBC: clone (in /lib64/libc-2.8.90.so)
==4082==
==4082== ERROR SUMMARY: 4234 errors from 38 contexts (suppressed: 83 from 1)
==4082== malloc/free: in use at exit: 38,821,370 bytes in 175,537 blocks.
==4082== malloc/free: 1,665,276 allocs, 1,489,739 frees, 945,252,276 bytes alloc

Both the 32-bit and 64-bit versions crash in the same way. The same Firefox works on F9, so I suspect glibc or another library. Uli, please at least take a brief look at it before the URL I found becomes useless.
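A quick check of the Valgrind numbers: the destination 0x16629A94 lies 0xCFA (3322) bytes above the source 0x16628D9A, while the copy is 50049 bytes long, so the two ranges overlap heavily. Overlapping memcpy is undefined behaviour; memmove is the call specified for that case. The sketch below is only an illustration, not glibc code: it assumes memcpy behaves like a naive forward byte loop (real implementations vary) and compares that with memmove semantics on a small buffer with the same shape of overlap (destination above source).

```python
"""Illustrate why an overlapping memcpy with dst above src corrupts data.

Assumption: memcpy is modelled as a naive forward byte-by-byte loop.
Real memcpy implementations differ; overlap is simply undefined.
"""

# Destination, source and length as reported by Valgrind above.
DST, SRC, LENGTH = 0x16629A94, 0x16628D9A, 50049


def overlaps(dst: int, src: int, n: int) -> bool:
    """True if [src, src+n) and [dst, dst+n) share at least one byte."""
    return src < dst + n and dst < src + n


def naive_forward_copy(buf: bytearray, dst: int, src: int, n: int) -> None:
    """Hypothetical memcpy model: copies low to high, ignoring overlap."""
    for i in range(n):
        buf[dst + i] = buf[src + i]


def memmove_copy(buf: bytearray, dst: int, src: int, n: int) -> None:
    """memmove semantics: behave as if copying through a temporary buffer."""
    buf[dst:dst + n] = bytes(buf[src:src + n])


if __name__ == "__main__":
    print("offset:", DST - SRC, "overlap:", overlaps(DST, SRC, LENGTH))

    original = bytearray(b"ABCDEFGH")
    a, b = bytearray(original), bytearray(original)
    # Copy 6 bytes from offset 0 to offset 2: the ranges overlap.
    naive_forward_copy(a, 2, 0, 6)
    memmove_copy(b, 2, 0, 6)
    print(a.decode(), b.decode())  # ABABABAB ABABCDEF
```

The forward loop reads bytes it has already overwritten, which is why a large overlapping copy inside the resolver can silently scramble a DNS reply before anything visibly crashes.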
It may be impossible to find a reproducer later, but it really does crash randomly on all sorts of Russian websites.
I looked at the HTML code at that URL and found this:

[zaitcev@niphredil ~]$ host body.imho.ru
;; Truncated, retrying in TCP mode.
body.imho.ru has address 81.19.80.26
body.imho.ru has address 81.19.80.27
body.imho.ru has address 81.19.80.28
body.imho.ru has address 81.19.80.31
body.imho.ru has address 81.19.80.32
body.imho.ru has address 81.19.80.33
body.imho.ru has address 81.19.80.34
body.imho.ru has address 81.19.80.11
body.imho.ru has address 81.19.80.12
body.imho.ru has address 81.19.80.14
body.imho.ru has address 81.19.80.15
body.imho.ru has address 81.19.80.16
body.imho.ru has address 81.19.80.17
body.imho.ru has address 81.19.80.18
body.imho.ru has address 81.19.80.21
body.imho.ru has address 81.19.80.22
body.imho.ru has address 81.19.80.24
body.imho.ru has address 81.19.80.25
[zaitcev@niphredil ~]$

Maybe that explains why libnss_dns tries to move roughly 48KB of data (the 50049-byte memcpy in the trace).
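For reference, host prints ";; Truncated, retrying in TCP mode." when a UDP reply has the TC (truncation) bit set in the 12-byte DNS header defined in RFC 1035 section 4.1.1. A minimal sketch of that check, decoding the header fields by their RFC 1035 layout (the class and function names are made up for this example):

```python
import struct
from typing import NamedTuple


class DnsHeader(NamedTuple):
    """The six 16-bit fields of a DNS message header (RFC 1035 4.1.1)."""
    ident: int
    flags: int
    qdcount: int  # questions
    ancount: int  # answer records
    nscount: int  # authority records
    arcount: int  # additional records

    @property
    def is_response(self) -> bool:
        return bool(self.flags & 0x8000)  # QR bit

    @property
    def truncated(self) -> bool:
        return bool(self.flags & 0x0200)  # TC bit: reply did not fit, retry over TCP

    @property
    def rcode(self) -> int:
        return self.flags & 0x000F  # response code, 0 = no error


def parse_header(message: bytes) -> DnsHeader:
    """Decode the header of a raw DNS message (network byte order)."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    return DnsHeader(*struct.unpack("!6H", message[:12]))
```

A client that sees `truncated` on a UDP reply is expected to repeat the query over TCP, which is what host did here.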
This morning, host(1) no longer says ";; Truncated ...", and the browser does not crash (though it still fails to work, incorrectly reporting "address not found"). But the number of A records printed is the same, 18, so there must have been some garbage in the earlier nameserver replies that made them bigger.
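A rough size estimate is consistent with that reasoning, with one caveat. Without EDNS0, a UDP reply is limited to 512 bytes (RFC 1035). Assuming the server uses name compression, so each answer's owner name is a 2-byte pointer, one A record costs 2 + 2 (type) + 2 (class) + 4 (TTL) + 2 (RDLENGTH) + 4 (address) = 16 bytes, and the whole reply for body.imho.ru comes to about 318 bytes. Only if the server did not compress names would 18 records alone reach 534 bytes and overflow. The sketch counts just header, question and answers, ignoring any authority or additional sections:

```python
HEADER = 12                 # fixed DNS header
QTYPE_QCLASS = 4            # 2 + 2 bytes after the question name
RR_FIXED = 2 + 2 + 4 + 2    # TYPE, CLASS, TTL, RDLENGTH
A_RDATA = 4                 # one IPv4 address
POINTER = 2                 # compressed owner name (pointer to the question)
UDP_LIMIT = 512             # classic limit without EDNS0


def wire_name_length(name: str) -> int:
    """Uncompressed wire length: a length byte plus each label, then the root byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label.encode("ascii")) for label in labels) + 1


def a_reply_size(name: str, n_records: int, compressed: bool = True) -> int:
    """Estimated size of a reply with one question and n A records.

    Authority/additional sections and EDNS0 options are deliberately ignored.
    """
    qname = wire_name_length(name)
    owner = POINTER if compressed else qname
    question = qname + QTYPE_QCLASS
    answers = n_records * (owner + RR_FIXED + A_RDATA)
    return HEADER + question + answers


if __name__ == "__main__":
    for compressed in (True, False):
        size = a_reply_size("body.imho.ru", 18, compressed)
        print(f"compressed={compressed}: {size} bytes, fits in UDP: {size <= UDP_LIMIT}")
```

Since practically every server compresses names, an 18-record answer should fit comfortably in UDP, so a truncated reply suggests extra material in the response.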
If the problem is in getaddrinfo, then you should be able to reproduce a crash with getent, or at least valgrind should show something. I've tried the URLs you listed. I do see the truncation message, but getent does not crash and valgrind does not complain. I haven't tried Firefox yet, since my rawhide machine is practically headless. Also, it would help if you installed the glibc debuginfo package.
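Another way to exercise the same glibc path outside Firefox: CPython's socket.getaddrinfo calls the C library's getaddrinfo(3), which in the trace above leads into _nss_dns_gethostbyname4_r. The sketch below resolves host names given on the command line (for example body.imho.ru) on a small thread pool, loosely mimicking Firefox resolving on worker threads. It could be run under valgrind, though CPython itself produces valgrind noise unless suppressions are used. The script and its function names are hypothetical.

```python
import socket
import sys
from concurrent.futures import ThreadPoolExecutor


def resolve_ipv4(host: str) -> list[str]:
    """Return the distinct IPv4 addresses getaddrinfo(3) reports for host."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def resolve_many(hosts: list[str], workers: int = 4) -> dict[str, list[str]]:
    """Resolve several hosts concurrently on worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(hosts, pool.map(resolve_ipv4, hosts)))


def main(hosts: list[str]) -> None:
    if not hosts:
        print("usage: resolve.py HOST [HOST ...]   (e.g. body.imho.ru)")
        return
    try:
        results = resolve_many(hosts)
    except socket.gaierror as exc:
        print("getaddrinfo failed:", exc)
        return
    for host, addrs in results.items():
        print(host, len(addrs), "addresses:", " ".join(addrs))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With nscd stopped, every call goes to the DNS servers directly, so a large reply like the one above would take the TCP fallback path each time.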
And Pete, if you can reproduce the crash, capture the DNS traffic. That is, kill nscd, start Wireshark recording port 53, and run Firefox.
Actually, I can now see a problem. I'm looking at it...
Created attachment 312749: tcpdump -w ffox.dump -s 1600
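A capture like ffox.dump can be summarised without Wireshark. The sketch below assumes a classic libpcap file (not pcapng), the Ethernet link type, IPv4 without VLAN tags, and no TCP stream reassembly. It lists DNS packets on port 53 by transport and payload size and flags frames cut short by the snapshot length (-s 1600 here). All function names are made up for this example.

```python
import struct
import sys
from typing import Iterator, NamedTuple, Optional

LINKTYPE_ETHERNET = 1
DNS_PORT = 53


class DnsPacket(NamedTuple):
    transport: str    # "udp" or "tcp"
    payload_len: int  # DNS bytes in this frame (TCP: may include the 2-byte length prefix)
    clipped: bool     # True if the snapshot length cut the frame short


def _parse_frame(frame: bytes, clipped: bool) -> Optional[DnsPacket]:
    """Return a DnsPacket for an Ethernet/IPv4 frame to or from port 53, else None."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # IPv4 only, no VLAN tags
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4
    proto = ip[9]
    total_len = struct.unpack("!H", ip[2:4])[0]
    l4 = ip[ihl:total_len]  # drops Ethernet padding after the IP datagram
    if proto == 17 and len(l4) >= 8:
        transport, header_len = "udp", 8
    elif proto == 6 and len(l4) >= 20:
        transport, header_len = "tcp", (l4[12] >> 4) * 4
    else:
        return None
    sport, dport = struct.unpack("!HH", l4[:4])
    payload_len = max(len(l4) - header_len, 0)
    if DNS_PORT not in (sport, dport) or payload_len == 0:
        return None  # not DNS, or a bare SYN/ACK carrying no DNS data
    return DnsPacket(transport, payload_len, clipped)


def dns_packets(data: bytes) -> Iterator[DnsPacket]:
    """Yield DNS packets from the bytes of a classic pcap file."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian writer (usec or nsec timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    linktype = struct.unpack(endian + "I", data[20:24])[0]
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")
    offset = 24  # size of the global header
    while offset + 16 <= len(data):
        _sec, _frac, incl_len, orig_len = struct.unpack(endian + "4I", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        pkt = _parse_frame(frame, clipped=incl_len < orig_len)
        if pkt is not None:
            yield pkt


def summarize(path: str) -> None:
    with open(path, "rb") as fh:
        packets = list(dns_packets(fh.read()))
    for transport in ("udp", "tcp"):
        sizes = [p.payload_len for p in packets if p.transport == transport]
        print(f"{transport}: {len(sizes)} packets, largest payload {max(sizes, default=0)} bytes")
    print("frames clipped by snaplen:", sum(p.clipped for p in packets))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        summarize(sys.argv[1])  # e.g. ffox.dump
```

Large TCP payloads on port 53 in such a summary would point at exactly the oversized replies that force the TCP fallback.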
Weirdness galore. Now that I have glibc-debuginfo installed, I cannot get a backtrace of the crash: Firefox just hangs under the debugger (but it crashes again when run without it).
I've checked a whole bunch of patches for the new resolver code into upstream. I have seen the problem with the listed host names, although at the moment I cannot confirm the fix works, since the replies being served right now are short enough for UDP. Anyway, the main problem in this bug was that TCP replies weren't handled correctly. TCP is only needed when a reply is really large, e.g. because it lists many addresses. I've tested it with my local DNS server and it seems to work nicely now. Jakub should be able to build a new glibc very soon. Please test it when it is available.
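Some background on what handling TCP replies involves: over TCP, each DNS message is preceded by a two-byte length field in network byte order (RFC 1035 section 4.2.2), and a single recv() may return only part of a message. The sketch below shows that framing on the client side. It illustrates the general requirement only and is not a reconstruction of the actual glibc fix, whose details are not described here; the function names are made up.

```python
import socket
import struct


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError(f"peer closed with {remaining} bytes outstanding")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_dns_tcp(sock: socket.socket) -> bytes:
    """Receive one DNS message framed by a 2-byte big-endian length prefix."""
    (length,) = struct.unpack("!H", recv_exact(sock, 2))
    # The prefix allows up to 65535 bytes, far more than a 512-byte UDP reply,
    # so here the amount read is taken from the prefix rather than a fixed size.
    return recv_exact(sock, length)


def send_dns_tcp(sock: socket.socket, message: bytes) -> None:
    """Send one DNS message with its 2-byte length prefix."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for TCP framing")
    sock.sendall(struct.pack("!H", len(message)) + message)
```

The recv_exact loop is what keeps a reply that arrives split across several TCP segments from being treated as complete after the first read.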
Will test and close.
Tested to work with glibc-2.8.90-10.x86_64.