Description of problem:
Valgrind stopped working (at least for some debugging cases) on Fedora 13/i686. When I run pidgin under valgrind I get:

(21:03:41) prefs: /purple/proxy/password changed, scheduling save.
vex x86->IR: unhandled instruction bytes: 0x66 0x66 0x2E 0xF
==16019== valgrind: Unrecognised instruction at address 0xa10de5.
==16019== Your program just tried to execute an instruction that Valgrind
==16019== did not recognise. There are two possible reasons for this.
==16019== 1. Your program has a bug and erroneously jumped to a non-code
==16019==    location. If you are running Memcheck and you just saw a
==16019==    warning about a bad jump, it's probably your program's fault.
==16019== 2. The instruction is legitimate but Valgrind doesn't handle it,
==16019==    i.e. it's Valgrind's fault. If you think this is the case or
==16019==    you are not sure, please let us know and we'll try to fix it.
==16019== Either way, Valgrind will now raise a SIGILL signal which will
==16019== probably kill your program.
==16019==
==16019== Process terminating with default action of signal 4 (SIGILL): dumping core
==16019==  Illegal opcode at address 0xA10DE5
==16019==    at 0xA10DE5: __memset_sse2 (in /lib/libc-2.11.90.so)
==16019==    by 0x22F1252: PR_CallOnce (in /lib/libnspr4.so)
==16019==    by 0x43F247B: ??? (in /lib/libfreebl3.so)
==16019==    by 0x43D155A: ??? (in /lib/libfreebl3.so)
==16019==    by 0x4695C62: ??? (in /usr/lib/libsoftokn3.so)
==16019==    by 0x4677E81: ??? (in /usr/lib/libsoftokn3.so)
==16019==    by 0x467812D: ??? (in /usr/lib/libsoftokn3.so)
==16019==    by 0x2349592: ??? (in /usr/lib/libnss3.so)
==16019==    by 0x2349E72: ??? (in /usr/lib/libnss3.so)
==16019==    by 0x235E01E: SECMOD_LoadModule (in /usr/lib/libnss3.so)
==16019==    by 0x235E19E: SECMOD_LoadModule (in /usr/lib/libnss3.so)
==16019==    by 0x232A002: ??? (in /usr/lib/libnss3.so)
==16019==    by 0x232A752: NSS_NoDB_Init (in /usr/lib/libnss3.so)
==16019==    by 0x4437AE5: ??? (in /usr/lib/purple-2/ssl-nss.so)
==16019==    by 0x573226: purple_plugin_load (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x44B0857: ??? (in /usr/lib/purple-2/ssl.so)
==16019==    by 0x573226: purple_plugin_load (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x5977FF: ??? (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x597832: purple_ssl_init (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x5532C9: purple_core_init (in /usr/lib/libpurple.so.0.6.6)

Sounds to me a lot like the SSE issue when valgrind stopped working in Fedora 12 Alpha/Beta.

Version-Release number of selected component (if applicable):
valgrind-3.5.0-14.fc12.i686
gcc-4.4.3-8.fc13.i686
glibc-2.11.90-15.i686

How reproducible:
Always

Steps to Reproduce:
G_DEBUG="gc-friendly" G_SLICE="always-malloc" valgrind --leak-check=yes --num-callers=20 /usr/bin/pidgin --debug
*** Bug 575083 has been marked as a duplicate of this bug. ***
Looking in Bugzilla, I can see only two backtraces, and they are identical, both starting from nss_Init(). This may also be a bug in nss/nspr. See bug 575083 for details, i.e. the backtrace with debuginfo.
Proposing as F13Target. The bug is a regression and breaks curl's test suite. A minimal example is available; see attachment #401229.
*** Bug 577590 has been marked as a duplicate of this bug. ***
Using the minimal example, I stopped my debugger at prinit.c:807, displayed the called function and got the following sequence:

(PRCallOnceFN) 0x73f9c50 <InitializeArenas>
(PRCallOnceFN) 0x755e3b0 <softoken_LoadDSO>
(PRCallOnceFN) 0x38f8e0 <freebl_LoadDSO>
(PRCallOnceFN) 0x2dd650 <rng_init>
(PRCallOnceFN) 0x2ea460 <init_blinding_params_list>
(PRCallOnceFN) 0x75a1960 <error_once_function>
> (PRCallOnceFN) 0x2dd650 <rng_init>

The crash happens here ^^^.
410 memset(bytes, 0, numBytes);
(gdb) frame
#0 rng_init () at drbg.c:410
(gdb) print &bytes
$1 = (PRUint8 (*)[110]) 0xbffff06e
(gdb) print numBytes
$2 = 110
Created attachment 404788
patch for nss-softokn

The attached patch prevents NSS from crashing. If I build nss-softokn with the patch applied, I am able to run NSS-based applications through valgrind, and the curl test suite finally works!
Hmmm... The code looks correct. bytes is a 110-byte array and numBytes is 110. The only thing I can think of is that we are exceeding the stack that valgrind gives us.

bob
(In reply to comment #8)
> Created an attachment (id=404788) [details]
> patch for nss-softokn
>
> The attached patch prevents NSS from crash. If I build nss-softokn with the
> patch applied, I am able to run NSS based applications through valgrind and the
> curl test-suite finally works!

I can confirm that this patch stops valgrind from crashing.

--Ben
(In reply to comment #9)
> Hmmm...The code looks correct. Bytes is 110 byte array and numBytes is 110
> bytes long. The only thing I can think of is that we are larger than the stack
> that valgrind gives us.

Are you sure? 110 *bytes* of memory does not sound like enough to overflow the stack. The crash always occurs on the same line, no matter how deep the stack leading to that point is. I suspect a binary incompatibility among the NSS libraries.

Where can I get the latest tarballs from upstream? The URL should IMO be part of the specfile. Did I miss it somehow?

Since nobody has been able to reproduce the crash outside of NSS, I am reassigning the bug to nss-softokn for now.
This is really a valgrind bug; it doesn't handle the

  12cbd5: 66 66 2e 0f 1f 84 00    nopw %cs:0x0(%eax,%eax,1)
  12cbdc: 00 00 00 00

instruction (it handles the 10-byte nop 0x66 0x2e 0x0f 0x1f 0x84 0 0 0 0 0, but not the 11-byte one).
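To make the difference between the two encodings concrete, here is a minimal sketch that splits each byte sequence into its legacy prefixes and the 0F 1F /0 multi-byte NOP opcode. The function and array names (count_prefixes, is_long_nop, NOP10, NOP11) are made up for illustration and have nothing to do with valgrind's actual decoder; the sketch only shows that the two forms differ by one extra 0x66 (data16) prefix.

```c
#include <stddef.h>

/* Hypothetical helper: is this byte one of the x86 legacy prefixes? */
static int is_legacy_prefix(unsigned char b)
{
    switch (b) {
    case 0x66: case 0x67:                         /* operand/address size */
    case 0x2e: case 0x36: case 0x3e: case 0x26:   /* cs, ss, ds, es       */
    case 0x64: case 0x65:                         /* fs, gs               */
    case 0xf0: case 0xf2: case 0xf3:              /* lock, repne, rep     */
        return 1;
    default:
        return 0;
    }
}

/* Count the legacy prefix bytes at the start of an instruction. */
size_t count_prefixes(const unsigned char *insn, size_t len)
{
    size_t n = 0;
    while (n < len && is_legacy_prefix(insn[n]))
        n++;
    return n;
}

/* After the prefixes, a multi-byte NOP is 0F 1F with ModRM.reg == 0. */
int is_long_nop(const unsigned char *insn, size_t len)
{
    size_t n = count_prefixes(insn, len);
    return n + 2 < len
        && insn[n] == 0x0f
        && insn[n + 1] == 0x1f
        && ((insn[n + 2] >> 3) & 7) == 0;
}

/* The 10-byte form valgrind handled and the 11-byte form it did not. */
const unsigned char NOP10[10] =
    { 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0, 0, 0, 0, 0 };
const unsigned char NOP11[11] =
    { 0x66, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0, 0, 0, 0, 0 };
```

Calling count_prefixes() on NOP10 and NOP11 gives 2 and 3 respectively, while is_long_nop() accepts both, so a decoder that loops over prefixes instead of matching fixed byte patterns would treat them the same way.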
Thank you for looking into this! I don't see such an instruction at memset-sse2.S:258. Does that mean the crash occurs elsewhere and the backtrace is misleading? I tried to reproduce the crash with memset() on its own, using various pieces of memory and various alignments, but got no hit at all.
Created attachment 404845
nops.c

Testcase which hopefully covers all instructions that gas uses for alignment when optimizing for various CPUs in 32-bit and 64-bit x86/x86_64 code (except for the jmp insns that are used together with lots of nops after them). valgrind for x86_64 accepts all of these instructions, but the 32-bit i?86 valgrind doesn't grok nopw %cs:0(%eax,%eax,1) with more than one data16 prefix.
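For readers without access to the attachment, the following is a rough sketch in the same spirit. It is not the attached nops.c. It assumes GCC or Clang inline assembly, and the function name run_padding_nops is made up for illustration. It executes the 10-byte nop and then the 11-byte nop with the second data16 prefix, so an affected 32-bit valgrind would be expected to stop at the second one.

```c
/*
 * Hypothetical sketch, not the attached nops.c: execute two of the
 * alignment nops discussed in this bug. Returns the number of nops
 * executed, or 0 when not built for x86/x86_64.
 */
int run_padding_nops(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile (
        /* 10-byte form: data16 + cs prefix, 0F 1F /0 with SIB + disp32 */
        ".byte 0x66,0x2e,0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00\n\t"
        /* 11-byte form: the same instruction with a second data16 */
        ".byte 0x66,0x66,0x2e,0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00\n\t"
    );
    return 2;
#else
    return 0;
#endif
}
```

To try it, call run_padding_nops() from a small main(), build it as 32-bit code (for example with gcc -m32, assuming 32-bit development libraries are installed) and run the result under valgrind.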
Should be fixed in valgrind-3.5.0-15.{fc12,fc13,fc14}.
(In reply to comment #15)
> Should be fixed in valgrind-3.5.0-15.{fc12,fc13,fc14}.

I can confirm that this works against unpatched nss on F13 i686.
(In reply to comment #15)
> Should be fixed in valgrind-3.5.0-15.{fc12,fc13,fc14}.

Thanks! I can confirm the crash is gone on fc12. But I can't see any build for rawhide. Is there any schedule to build it?
Confirmed. valgrind works again on FC13/i686. Thanks
valgrind-3.5.0-15.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/valgrind-3.5.0-15.fc13
valgrind-3.5.0-15.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update valgrind'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/valgrind-3.5.0-15.fc13
valgrind-3.5.0-16.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update valgrind'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/valgrind-3.5.0-16.fc13
valgrind-3.5.0-16.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.