574889 – valgrind aborts with "vex x86->IR: unhandled instruction bytes"

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	valgrind
Sub Component:
Version:	13
Hardware:	i686
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	575083 577590
Depends On:
Blocks:	F13Target 580078

Reported:	2010-03-18 19:07 UTC by Stefan Becker
Modified:	2010-04-27 02:20 UTC
CC List:	7 users
Fixed In Version:	valgrind-3.5.0-16.fc13
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	580078
Environment:
Last Closed:	2010-04-27 02:20:14 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
patch for nss-softokn (676 bytes, patch) 2010-04-06 20:56 UTC, Kamil Dudka	no flags
nops.c (1.50 KB, text/plain) 2010-04-07 06:38 UTC, Jakub Jelinek	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
KDE Software Compilation	233576	0	None	None	None	Never

Description Stefan Becker 2010-03-18 19:07:05 UTC

Description of problem:

Valgrind stopped working (at least for some debugging cases) on Fedora 13/i686. When I run pidgin under valgrind I get:

(21:03:41) prefs: /purple/proxy/password changed, scheduling save.
vex x86->IR: unhandled instruction bytes: 0x66 0x66 0x2E 0xF
==16019== valgrind: Unrecognised instruction at address 0xa10de5.
==16019== Your program just tried to execute an instruction that Valgrind
==16019== did not recognise.  There are two possible reasons for this.
==16019== 1. Your program has a bug and erroneously jumped to a non-code
==16019==    location.  If you are running Memcheck and you just saw a
==16019==    warning about a bad jump, it's probably your program's fault.
==16019== 2. The instruction is legitimate but Valgrind doesn't handle it,
==16019==    i.e. it's Valgrind's fault.  If you think this is the case or
==16019==    you are not sure, please let us know and we'll try to fix it.
==16019== Either way, Valgrind will now raise a SIGILL signal which will
==16019== probably kill your program.
==16019== 
==16019== Process terminating with default action of signal 4 (SIGILL): dumping core
==16019==  Illegal opcode at address 0xA10DE5
==16019==    at 0xA10DE5: __memset_sse2 (in /lib/libc-2.11.90.so)
==16019==    by 0x22F1252: PR_CallOnce (in /lib/libnspr4.so)
==16019==    by 0x43F247B: ??? (in /lib/libfreebl3.so)
==16019==    by 0x43D155A: ??? (in /lib/libfreebl3.so)
==16019==    by 0x4695C62: ??? (in /usr/lib/libsoftokn3.so)
==16019==    by 0x4677E81: ??? (in /usr/lib/libsoftokn3.so)
==16019==    by 0x467812D: ??? (in /usr/lib/libsoftokn3.so)
==16019==    by 0x2349592: ??? (in /usr/lib/libnss3.so)
==16019==    by 0x2349E72: ??? (in /usr/lib/libnss3.so)
==16019==    by 0x235E01E: SECMOD_LoadModule (in /usr/lib/libnss3.so)
==16019==    by 0x235E19E: SECMOD_LoadModule (in /usr/lib/libnss3.so)
==16019==    by 0x232A002: ??? (in /usr/lib/libnss3.so)
==16019==    by 0x232A752: NSS_NoDB_Init (in /usr/lib/libnss3.so)
==16019==    by 0x4437AE5: ??? (in /usr/lib/purple-2/ssl-nss.so)
==16019==    by 0x573226: purple_plugin_load (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x44B0857: ??? (in /usr/lib/purple-2/ssl.so)
==16019==    by 0x573226: purple_plugin_load (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x5977FF: ??? (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x597832: purple_ssl_init (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x5532C9: purple_core_init (in /usr/lib/libpurple.so.0.6.6)

This sounds a lot like the SSE issue that stopped valgrind from working in Fedora 12 Alpha/Beta.

Version-Release number of selected component (if applicable):
valgrind-3.5.0-14.fc12.i686
gcc-4.4.3-8.fc13.i686
glibc-2.11.90-15.i686

How reproducible: Always

Steps to Reproduce:
G_DEBUG="gc-friendly" G_SLICE="always-malloc" valgrind --leak-check=yes --num-callers=20 /usr/bin/pidgin --debug

Comment 1 Kamil Dudka 2010-03-19 12:17:02 UTC

*** Bug 575083 has been marked as a duplicate of this bug. ***

Comment 2 Kamil Dudka 2010-03-19 12:33:34 UTC

Looking through Bugzilla, I can see only two backtraces, and they are identical, both starting from nss_Init().  This may also be a bug in nss/nspr.  See bug 575083 for details, i.e. the backtrace with debuginfo.

Comment 3 Kamil Dudka 2010-04-04 13:56:15 UTC

Proposing as F13Target.  The bug is a regression and breaks curl's test suite.

A minimal example is available: see attachment #401229.

Comment 4 Kamil Dudka 2010-04-04 14:07:41 UTC

*** Bug 577590 has been marked as a duplicate of this bug. ***

Comment 5 Kamil Dudka 2010-04-06 13:49:16 UTC

Using the minimal example, I stopped my debugger at prinit.c:807, displayed the called function and got the following sequence:

(PRCallOnceFN) 0x73f9c50 <InitializeArenas>
(PRCallOnceFN) 0x755e3b0 <softoken_LoadDSO>
(PRCallOnceFN) 0x38f8e0 <freebl_LoadDSO>
(PRCallOnceFN) 0x2dd650 <rng_init>
(PRCallOnceFN) 0x2ea460 <init_blinding_params_list>
(PRCallOnceFN) 0x75a1960 <error_once_function>

Comment 6 Kamil Dudka 2010-04-06 19:06:20 UTC

> (PRCallOnceFN) 0x2dd650 <rng_init>

The crash happens here ^^^.

Comment 7 Kamil Dudka 2010-04-06 19:25:55 UTC

410                 memset(bytes, 0, numBytes);

(gdb) frame
#0  rng_init () at drbg.c:410

(gdb) print &bytes
$1 = (PRUint8 (*)[110]) 0xbffff06e

(gdb) print numBytes
$2 = 110

Comment 8 Kamil Dudka 2010-04-06 20:56:11 UTC

Created attachment 404788
patch for nss-softokn

The attached patch prevents NSS from crashing.  If I build nss-softokn with the patch applied, I am able to run NSS-based applications through valgrind and the curl test suite finally works!

Comment 9 Bob Relyea 2010-04-06 22:17:59 UTC

Hmmm... The code looks correct. bytes is a 110-byte array and numBytes is 110. The only thing I can think of is that we are exceeding the stack that valgrind gives us.

bob

Comment 10 Ben Boeckel 2010-04-06 22:56:03 UTC

(In reply to comment #8)
> Created an attachment (id=404788) [details]
> patch for nss-softokn
> 
> The attached patch prevents NSS from crash.  If I build nss-softokn with the
> patch applied, I am able to run NSS based applications through valgrind and the
> curl test-suite finally works!    

I can confirm that this patch stops valgrind from crashing.

--Ben

Comment 11 Kamil Dudka 2010-04-06 23:02:53 UTC

(In reply to comment #9)
> Hmmm...The code looks correct. Bytes is 110 byte array and numBytes is 110
> bytes long. The only thing I can think of is that we are larger than the stack
> that valgrind gives us.

Are you sure?  110 *bytes* of memory does not sound like enough to cause a stack overflow to me.  The crash always occurs on the same line, no matter how deep the stack leading to that point is.

I suspect a binary incompatibility among the NSS libraries.  Where can I get the latest tarballs from upstream?  IMO the URL should be part of the specfile.  Did I miss it somehow?

Since nobody has been able to reproduce the crash outside of NSS, I am reassigning the bug to nss-softokn for now.

Comment 12 Jakub Jelinek 2010-04-06 23:27:13 UTC

This is really a valgrind bug: it doesn't handle the following instruction:
  12cbd5:       66 66 2e 0f 1f 84 00    nopw   %cs:0x0(%eax,%eax,1)
  12cbdc:       00 00 00 00 
(it handles the 10-byte nop 0x66 0x2e 0x0f 0x1f 0x84 0 0 0 0 0, but not the 11-byte one).
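
For illustration only, the difference between the two encodings can be written down as byte arrays. This is a minimal sketch, not valgrind code: the helper count_data16_prefixes and the array names nop10 and nop11 are hypothetical. The bytes are the ones quoted above; the only difference is one extra 0x66 (data16) prefix in front of the same 0x2e (CS override), 0x0f 0x1f (nop opcode), ModRM, SIB and 4-byte displacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of valgrind): count the leading 0x66
 * (data16) operand-size prefixes of an x86 instruction byte sequence. */
static size_t count_data16_prefixes(const uint8_t *insn, size_t len)
{
    size_t n = 0;
    while (n < len && insn[n] == 0x66)
        n++;
    return n;
}

/* The 10-byte nop that, per the comment above, valgrind already handled. */
static const uint8_t nop10[] = {
    0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* The 11-byte nop from the disassembly: same bytes plus one more 0x66. */
static const uint8_t nop11[] = {
    0x66, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Sanity check of the lengths and prefix counts stated above. */
static void check_nop_encodings(void)
{
    assert(sizeof nop10 == 10);
    assert(sizeof nop11 == 11);
    assert(count_data16_prefixes(nop10, sizeof nop10) == 1);
    assert(count_data16_prefixes(nop11, sizeof nop11) == 2);
}
```

The first four bytes of nop11 (0x66 0x66 0x2e 0x0f) match the bytes printed in the "unhandled instruction bytes" message in the original description.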

Comment 13 Kamil Dudka 2010-04-06 23:44:57 UTC

Thank you for looking into this!

I don't see such an instruction at memset-sse2.S:258.  Does that mean the crash occurs elsewhere and the backtrace is misleading?

I was trying to reproduce the crash with memset() on its own - various pieces of memory, various alignments ... but no hit at all.
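
A probe of the kind described above might look like the following sketch. It is an assumption about the approach, not the code that was actually run: probe_memset and the two limits are hypothetical names, and the compiler is free to inline or replace some of the memset() calls. As noted, a test like this did not trigger the crash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_ALIGN_OFFSET = 16, MAX_LEN = 256 };

/* Hypothetical probe: call memset() on a stack buffer at every start
 * offset in [0, MAX_ALIGN_OFFSET) and every length in [0, MAX_LEN],
 * check that the target range was zeroed, and return the number of
 * (offset, length) combinations exercised. */
static size_t probe_memset(void)
{
    unsigned char buf[MAX_ALIGN_OFFSET + MAX_LEN];
    size_t combos = 0;

    for (size_t off = 0; off < MAX_ALIGN_OFFSET; off++) {
        for (size_t len = 0; len <= MAX_LEN; len++) {
            memset(buf, 0xAA, sizeof buf);   /* poison the whole buffer */
            memset(buf + off, 0, len);       /* the call under test */
            for (size_t i = 0; i < len; i++)
                assert(buf[off + i] == 0);
            combos++;
        }
    }
    return combos;
}
```

Calling probe_memset() from main() and running the binary under valgrind exercises the 110-byte case from drbg.c among many others.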

Comment 14 Jakub Jelinek 2010-04-07 06:38:56 UTC

Created attachment 404845
nops.c

A testcase that hopefully covers all the instructions gas uses for alignment when optimizing for various CPUs in 32-bit and 64-bit x86/x86_64 code (except for the jmp insns that are used together with a run of nops after them).
valgrind for x86_64 accepts all of these instructions, but 32-bit i?86 valgrind doesn't grok nopw %cs:0(%eax,%eax,1) with more than one data16 prefix.
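
As a rough idea of how a single case from such a testcase can be written, here is a minimal sketch that emits the 11-byte nop directly as bytes with GCC-style inline assembly. It is not the attached nops.c: the function name run_11_byte_nop and the macro LONG_NOP_SUPPORTED are hypothetical, and the code assumes a GCC-compatible compiler. On non-x86 targets the function does nothing.

```c
#include <assert.h>

#if defined(__i386__) || defined(__x86_64__)
#define LONG_NOP_SUPPORTED 1
#else
#define LONG_NOP_SUPPORTED 0
#endif

/* Hypothetical test function: execute the 11-byte nop (two data16
 * prefixes, CS override, 0f 1f opcode, ModRM, SIB, 32-bit displacement)
 * and return 1, or return 0 on targets where it is not emitted. */
static int run_11_byte_nop(void)
{
#if LONG_NOP_SUPPORTED
    __asm__ volatile(".byte 0x66, 0x66, 0x2e, 0x0f, 0x1f, 0x84, "
                     "0x00, 0x00, 0x00, 0x00, 0x00");
    return 1;
#else
    return 0;
#endif
}
```

Built with -m32, called from main() and run under an affected 32-bit valgrind, a function like this would be expected to hit the same "unhandled instruction bytes" abort, going by the analysis above.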

Comment 15 Jakub Jelinek 2010-04-07 15:30:07 UTC

Should be fixed in valgrind-3.5.0-15.{fc12,fc13,fc14}.

Comment 16 Ben Boeckel 2010-04-07 15:44:40 UTC

(In reply to comment #15)
> Should be fixed in valgrind-3.5.0-15.{fc12,fc13,fc14}.    

I can confirm that this works against unpatched nss on F13 i686.

Comment 17 Kamil Dudka 2010-04-07 16:33:51 UTC

(In reply to comment #15)
> Should be fixed in valgrind-3.5.0-15.{fc12,fc13,fc14}.    

Thanks!  I can confirm the crash is gone on fc12.

But I can't see any build for rawhide.  Is there any schedule to build it?

Comment 18 Stefan Becker 2010-04-07 17:36:45 UTC

Confirmed. valgrind works again on FC13/i686. Thanks

Comment 19 Fedora Update System 2010-04-09 09:45:02 UTC

valgrind-3.5.0-15.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/valgrind-3.5.0-15.fc13

Comment 20 Fedora Update System 2010-04-09 21:07:51 UTC

valgrind-3.5.0-15.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update valgrind'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/valgrind-3.5.0-15.fc13

Comment 21 Fedora Update System 2010-04-14 01:40:00 UTC

valgrind-3.5.0-16.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update valgrind'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/valgrind-3.5.0-16.fc13

Comment 22 Fedora Update System 2010-04-27 02:19:59 UTC

valgrind-3.5.0-16.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.
