Bug 574889 - valgrind aborts with "vex x86->IR: unhandled instruction bytes"
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: valgrind
Version: 13
Hardware: i686 Linux
Priority: high  Severity: high
Assigned To: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
Keywords: Regression
Duplicates: 575083 577590
Depends On:
Blocks: F13Target 580078
Reported: 2010-03-18 15:07 EDT by Stefan Becker
Modified: 2010-04-26 22:20 EDT

Fixed In Version: valgrind-3.5.0-16.fc13
Doc Type: Bug Fix
Clones: 580078
Last Closed: 2010-04-26 22:20:14 EDT
Attachments
patch for nss-softokn (676 bytes, patch), 2010-04-06 16:56 EDT, Kamil Dudka
nops.c (1.50 KB, text/plain), 2010-04-07 02:38 EDT, Jakub Jelinek

External Trackers
Tracker ID Priority Status Summary Last Updated
KDE Software Compilation 233576 None None None Never

Description Stefan Becker 2010-03-18 15:07:05 EDT
Description of problem:

Valgrind stopped working (at least for some debugging cases) on Fedora 13/i686. When I run pidgin under valgrind I get:

(21:03:41) prefs: /purple/proxy/password changed, scheduling save.
vex x86->IR: unhandled instruction bytes: 0x66 0x66 0x2E 0xF
==16019== valgrind: Unrecognised instruction at address 0xa10de5.
==16019== Your program just tried to execute an instruction that Valgrind
==16019== did not recognise.  There are two possible reasons for this.
==16019== 1. Your program has a bug and erroneously jumped to a non-code
==16019==    location.  If you are running Memcheck and you just saw a
==16019==    warning about a bad jump, it's probably your program's fault.
==16019== 2. The instruction is legitimate but Valgrind doesn't handle it,
==16019==    i.e. it's Valgrind's fault.  If you think this is the case or
==16019==    you are not sure, please let us know and we'll try to fix it.
==16019== Either way, Valgrind will now raise a SIGILL signal which will
==16019== probably kill your program.
==16019== 
==16019== Process terminating with default action of signal 4 (SIGILL): dumping core
==16019==  Illegal opcode at address 0xA10DE5
==16019==    at 0xA10DE5: __memset_sse2 (in /lib/libc-2.11.90.so)
==16019==    by 0x22F1252: PR_CallOnce (in /lib/libnspr4.so)
==16019==    by 0x43F247B: ??? (in /lib/libfreebl3.so)
==16019==    by 0x43D155A: ??? (in /lib/libfreebl3.so)
==16019==    by 0x4695C62: ??? (in /usr/lib/libsoftokn3.so)
==16019==    by 0x4677E81: ??? (in /usr/lib/libsoftokn3.so)
==16019==    by 0x467812D: ??? (in /usr/lib/libsoftokn3.so)
==16019==    by 0x2349592: ??? (in /usr/lib/libnss3.so)
==16019==    by 0x2349E72: ??? (in /usr/lib/libnss3.so)
==16019==    by 0x235E01E: SECMOD_LoadModule (in /usr/lib/libnss3.so)
==16019==    by 0x235E19E: SECMOD_LoadModule (in /usr/lib/libnss3.so)
==16019==    by 0x232A002: ??? (in /usr/lib/libnss3.so)
==16019==    by 0x232A752: NSS_NoDB_Init (in /usr/lib/libnss3.so)
==16019==    by 0x4437AE5: ??? (in /usr/lib/purple-2/ssl-nss.so)
==16019==    by 0x573226: purple_plugin_load (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x44B0857: ??? (in /usr/lib/purple-2/ssl.so)
==16019==    by 0x573226: purple_plugin_load (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x5977FF: ??? (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x597832: purple_ssl_init (in /usr/lib/libpurple.so.0.6.6)
==16019==    by 0x5532C9: purple_core_init (in /usr/lib/libpurple.so.0.6.6)

This sounds a lot like the SSE issue that stopped valgrind from working in Fedora 12 Alpha/Beta.

Version-Release number of selected component (if applicable):
valgrind-3.5.0-14.fc12.i686
gcc-4.4.3-8.fc13.i686
glibc-2.11.90-15.i686

How reproducible: Always

Steps to Reproduce:
G_DEBUG="gc-friendly" G_SLICE="always-malloc" valgrind --leak-check=yes --num-callers=20 /usr/bin/pidgin --debug
Comment 1 Kamil Dudka 2010-03-19 08:17:02 EDT
*** Bug 575083 has been marked as a duplicate of this bug. ***
Comment 2 Kamil Dudka 2010-03-19 08:33:34 EDT
Looking through Bugzilla, I can see only two backtraces, both identical and starting from nss_Init().  This may also be a bug in nss/nspr.  See bug 575083 for details, i.e. the backtrace with debuginfo.
Comment 3 Kamil Dudka 2010-04-04 09:56:15 EDT
Proposing as F13Target.  The bug is a regression and breaks curl's test suite.

A minimal example is available: see attachment #401229.
Comment 4 Kamil Dudka 2010-04-04 10:07:41 EDT
*** Bug 577590 has been marked as a duplicate of this bug. ***
Comment 5 Kamil Dudka 2010-04-06 09:49:16 EDT
Using the minimal example, I stopped my debugger at prinit.c:807, displayed the called function and got the following sequence:

(PRCallOnceFN) 0x73f9c50 <InitializeArenas>
(PRCallOnceFN) 0x755e3b0 <softoken_LoadDSO>
(PRCallOnceFN) 0x38f8e0 <freebl_LoadDSO>
(PRCallOnceFN) 0x2dd650 <rng_init>
(PRCallOnceFN) 0x2ea460 <init_blinding_params_list>
(PRCallOnceFN) 0x75a1960 <error_once_function>
Comment 6 Kamil Dudka 2010-04-06 15:06:20 EDT
> (PRCallOnceFN) 0x2dd650 <rng_init>

The crash happens here ^^^.
Comment 7 Kamil Dudka 2010-04-06 15:25:55 EDT
410                 memset(bytes, 0, numBytes);

(gdb) frame
#0  rng_init () at drbg.c:410

(gdb) print &bytes
$1 = (PRUint8 (*)[110]) 0xbffff06e

(gdb) print numBytes
$2 = 110
Comment 8 Kamil Dudka 2010-04-06 16:56:11 EDT
Created attachment 404788
patch for nss-softokn

The attached patch prevents NSS from crashing.  If I build nss-softokn with the patch applied, I am able to run NSS-based applications through valgrind and the curl test suite finally works!
Comment 9 Bob Relyea 2010-04-06 18:17:59 EDT
Hmmm... The code looks correct. bytes is a 110-byte array and numBytes is 110. The only thing I can think of is that we are exceeding the stack that valgrind gives us.

bob
Comment 10 Ben Boeckel 2010-04-06 18:56:03 EDT
(In reply to comment #8)
> Created an attachment (id=404788)
> patch for nss-softokn
>
> The attached patch prevents NSS from crashing.  If I build nss-softokn with the
> patch applied, I am able to run NSS-based applications through valgrind and the
> curl test suite finally works!

I can confirm that this patch stops valgrind from crashing.

--Ben
Comment 11 Kamil Dudka 2010-04-06 19:02:53 EDT
(In reply to comment #9)
> Hmmm...The code looks correct. Bytes is 110 byte array and numBytes is 110
> bytes long. The only thing I can think of is that we are larger than the stack
> that valgrind gives us.

Are you sure?  110 *bytes* of memory does not sound like enough to overflow the stack.  The crash always occurs on the same line, no matter how deep the stack leading to that point is.

I suspect a binary incompatibility among the NSS libraries.  Where can I get the latest tarballs from upstream?  The URL should IMO be part of the specfile.  Did I miss it somehow?

Since nobody has been able to reproduce the crash outside of NSS, I am reassigning the bug to nss-softokn for now.
Comment 12 Jakub Jelinek 2010-04-06 19:27:13 EDT
This is really a valgrind bug; it doesn't handle the following instruction:
  12cbd5:       66 66 2e 0f 1f 84 00    nopw   %cs:0x0(%eax,%eax,1)
  12cbdc:       00 00 00 00 
It handles the 10-byte nop (0x66 0x2e 0x0f 0x1f 0x84 0 0 0 0 0), but not the 11-byte one with the extra 0x66 prefix.
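
To make the difference between the two encodings concrete, here is a minimal sketch in C that only inspects the bytes quoted above. The helper names (count_data16_prefixes, long_nop_length) and the array names are hypothetical and are not taken from valgrind's decoder; the matcher also assumes the zero displacement seen in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 10-byte form valgrind already handled, per the comment above. */
const uint8_t nop10_bytes[10] = {
    0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* The 11-byte form from the disassembly: one extra 0x66 prefix. */
const uint8_t nop11_bytes[11] = {
    0x66, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Count the leading 0x66 (operand-size, "data16") prefix bytes. */
size_t count_data16_prefixes(const uint8_t *code, size_t len)
{
    size_t n = 0;

    assert(code != NULL);
    while (n < len && code[n] == 0x66)
        n++;
    return n;
}

/*
 * Return the total length of a "nopw %cs:0x0(%eax,%eax,1)" encoding with
 * any number of leading 0x66 prefixes, or 0 if the bytes do not match.
 * Tail layout: 2e (cs prefix), 0f 1f (opcode), 84 (ModRM), 00 (SIB),
 * then a 32-bit displacement, assumed to be zero here.
 */
size_t long_nop_length(const uint8_t *code, size_t len)
{
    static const uint8_t tail[9] = {
        0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
    };
    size_t prefixes = count_data16_prefixes(code, len);

    if (len - prefixes < sizeof tail)
        return 0;
    if (memcmp(code + prefixes, tail, sizeof tail) != 0)
        return 0;
    return prefixes + sizeof tail;
}
```

With these helpers, the 10-byte sequence has one 0x66 prefix and the 11-byte sequence has two; the nine bytes after the prefixes are identical, which is consistent with the failure being tied to the extra prefix.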
Comment 13 Kamil Dudka 2010-04-06 19:44:57 EDT
Thank you for looking into this!

I don't see such an instruction at memset-sse2.S:258.  Does it mean that the crash occurs elsewhere and the bt is misleading?

I was trying to reproduce the crash with memset() on its own - various pieces of memory, various alignments ... but no hit at all.
Comment 14 Jakub Jelinek 2010-04-07 02:38:56 EDT
Created attachment 404845
nops.c

A testcase that hopefully covers all the instructions gas uses for alignment when optimizing for various CPUs in 32-bit and 64-bit x86/x86_64 code (except for the jmp insns that are followed by lots of nops).
valgrind for x86_64 accepts all of these instructions, but 32-bit i?86 valgrind doesn't grok nopw %cs:0(%eax,%eax,1) with more than one data16 prefix.
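
For readers without access to the attachment, the following is a much smaller sketch of the same idea, not the attached nops.c itself. It assumes a GCC-compatible compiler on x86 or x86_64 and emits only the two encodings discussed in this bug as raw bytes; the function name run_alignment_nops and the HAVE_X86_NOPS macro are hypothetical. On other targets the function executes nothing.

```c
#include <assert.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_NOPS 1
#else
#define HAVE_X86_NOPS 0
#endif

/*
 * Execute the 10-byte and the 11-byte long-nop encodings from this bug
 * and return how many of them were executed (0 on non-x86 targets).
 * Neither instruction touches memory or registers.
 */
int run_alignment_nops(void)
{
    int executed = 0;

#if HAVE_X86_NOPS
    /* 10-byte form: one data16 prefix. */
    __asm__ volatile(".byte 0x66, 0x2e, 0x0f, 0x1f, 0x84, "
                     "0x00, 0x00, 0x00, 0x00, 0x00");
    executed++;

    /* 11-byte form: two data16 prefixes, the one reported here. */
    __asm__ volatile(".byte 0x66, 0x66, 0x2e, 0x0f, 0x1f, 0x84, "
                     "0x00, 0x00, 0x00, 0x00, 0x00");
    executed++;
#endif

    assert(executed == 2 * HAVE_X86_NOPS);
    return executed;
}
```

Calling run_alignment_nops() from main in a 32-bit build and running the result under an affected valgrind would be expected to stop at the second instruction with the "unhandled instruction bytes" message, while a fixed build should run through both.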
Comment 15 Jakub Jelinek 2010-04-07 11:30:07 EDT
Should be fixed in valgrind-3.5.0-15.{fc12,fc13,fc14}.
Comment 16 Ben Boeckel 2010-04-07 11:44:40 EDT
(In reply to comment #15)
> Should be fixed in valgrind-3.5.0-15.{fc12,fc13,fc14}.    

I can confirm that this works against unpatched nss on F13 i686.
Comment 17 Kamil Dudka 2010-04-07 12:33:51 EDT
(In reply to comment #15)
> Should be fixed in valgrind-3.5.0-15.{fc12,fc13,fc14}.    

Thanks!  I can confirm the crash is gone on fc12.

But I can't see any build for rawhide.  Is there any schedule to build it?
Comment 18 Stefan Becker 2010-04-07 13:36:45 EDT
Confirmed. valgrind works again on FC13/i686. Thanks
Comment 19 Fedora Update System 2010-04-09 05:45:02 EDT
valgrind-3.5.0-15.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/valgrind-3.5.0-15.fc13
Comment 20 Fedora Update System 2010-04-09 17:07:51 EDT
valgrind-3.5.0-15.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update valgrind'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/valgrind-3.5.0-15.fc13
Comment 21 Fedora Update System 2010-04-13 21:40:00 EDT
valgrind-3.5.0-16.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update valgrind'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/valgrind-3.5.0-16.fc13
Comment 22 Fedora Update System 2010-04-26 22:19:59 EDT
valgrind-3.5.0-16.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.
