Bug 1495320

Summary: Valgrind on armv7hl reports illegal instruction within libcrypto.so
Product: Fedora
Reporter: Pablo Greco <pablo>
Component: valgrind
Assignee: Mark Wielaard <mjw>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: unspecified
Version: rawhide
CC: dodji, fche, jakub, mjw
Hardware: arm
OS: Unspecified
Last Closed: 2017-09-26 06:34:53 UTC
Type: Bug

Description Pablo Greco 2017-09-25 19:47:25 UTC
Description of problem:
Valgrind reports "Unrecognised instruction" on any program that requires libcrypto.so


Version-Release number of selected component (if applicable):
valgrind-3.13.0-7.fc28.armv7hl
openssl-libs-1.1.0f-9.fc27.armv7hl

How reproducible:
Always

Steps to Reproduce:
1. Run "valgrind ssh"

Actual results:
disInstr(arm): unhandled instruction: 0xEC510F1E
                 cond=14(0xE) 27:20=197(0xC5) 4:4=1 3:0=14(0xE)
==1246== valgrind: Unrecognised instruction at address 0x48fafa8.
==1246==    at 0x48FAFA8: ??? (in /usr/lib/libcrypto.so.1.1.0f)
==1246== Your program just tried to execute an instruction that Valgrind
==1246== did not recognise.  There are two possible reasons for this.
==1246== 1. Your program has a bug and erroneously jumped to a non-code
==1246==    location.  If you are running Memcheck and you just saw a
==1246==    warning about a bad jump, it's probably your program's fault.
==1246== 2. The instruction is legitimate but Valgrind doesn't handle it,
==1246==    i.e. it's Valgrind's fault.  If you think this is the case or
==1246==    you are not sure, please let us know and we'll try to fix it.
==1246== Either way, Valgrind will now raise a SIGILL signal which will
==1246== probably kill your program.


Expected results:
No unhandled instruction report; the program just runs.

Additional info:

Tested on fully updated Fedora rawhide on a bananapi-m1 (A20):
Linux bpi-fedora 4.14.0-0.rc1.git4.1.fc28.armv7hl #1 SMP Fri Sep 22 23:35:46 UTC 2017 armv7l armv7l armv7l GNU/Linux
Also tested on CentOS 7.4 with an older openssl version, with the same results.

Comment 1 Mark Wielaard 2017-09-25 20:27:53 UTC
Could you run the same program with valgrind --vgdb-error=0 ssh
and then, in another terminal, run gdb ssh
(gdb) target remote | vgdb
(gdb) continue
It should then stop when the SIGILL is reported.
Then run (gdb) disassemble
so we can see exactly which instruction it was?

See also http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver

Comment 2 Pablo Greco 2017-09-25 21:48:47 UTC
(In reply to Mark Wielaard from comment #1)

# gdb ssh
GNU gdb (GDB) Fedora 8.0-25.fc28
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "armv7hl-redhat-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ssh...Reading symbols from /root/ssh...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Missing separate debuginfos, use: dnf debuginfo-install openssh-clients-7.5p1-5.fc27.armv7hl
(gdb) target remote | vgdb
Remote debugging using | vgdb
relaying data between gdb and process 1425
warning: remote target does not support file transfer, attempting to access files from local filesystem.
Reading symbols from /lib/ld-linux-armhf.so.3...(no debugging symbols found)...done.
0x04000c00 in _start () from /lib/ld-linux-armhf.so.3
(gdb) continue
Continuing.
Cannot parse expression `.L1170 4@r4'.
warning: Probes-based dynamic linker interface failed.
Reverting to original interface.


Program received signal SIGILL, Illegal instruction.
0x048fafa8 in _armv7_tick () from /lib/libcrypto.so.1.1
(gdb) disas
Dump of assembler code for function _armv7_tick:
=> 0x048fafa8 <+0>:	mrrc	15, 1, r0, r1, cr14
   0x048fafac <+4>:	bx	lr
End of assembler dump.
(gdb)

Comment 3 Mark Wielaard 2017-09-25 22:03:47 UTC
Thanks, this looks like upstream bug: https://bugs.kde.org/show_bug.cgi?id=331178

If so, libcrypto may be deliberately provoking a SIGILL to find out whether the instruction is supported. Does the program run under valgrind without the extra messages if you use --sigill-diagnostics=no?

Comment 4 Pablo Greco 2017-09-25 22:10:58 UTC
Yes, just normal valgrind messages.

# valgrind --sigill-diagnostics=no ssh
==1625== Memcheck, a memory error detector
==1625== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1625== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==1625== Command: ssh
==1625== 
==1625== Warning: invalid file descriptor 1024 in syscall close()
==1625== Warning: invalid file descriptor 1025 in syscall close()
==1625== Warning: invalid file descriptor 1026 in syscall close()
==1625== Warning: invalid file descriptor 1027 in syscall close()
==1625==    Use --log-fd=<number> to select an alternative log fd.
==1625== Warning: invalid file descriptor 1028 in syscall close()
==1625== Warning: invalid file descriptor 1029 in syscall close()
==1625== Warning: invalid file descriptor 1030 in syscall close()
usage: ssh [-1246AaCfGgKkMNnqsTtVvXxYy] [-b bind_address] [-c cipher_spec]
           [-D [bind_address:]port] [-E log_file] [-e escape_char]
           [-F configfile] [-I pkcs11] [-i identity_file]
           [-J [user@]host[:port]] [-L address] [-l login_name] [-m mac_spec]
           [-O ctl_cmd] [-o option] [-p port] [-Q query_option] [-R address]
           [-S ctl_path] [-W host:port] [-w local_tun[:remote_tun]]
           [user@]hostname [command]
==1625== 
==1625== HEAP SUMMARY:
==1625==     in use at exit: 2,404 bytes in 37 blocks
==1625==   total heap usage: 148 allocs, 111 frees, 58,254 bytes allocated
==1625== 
==1625== LEAK SUMMARY:
==1625==    definitely lost: 112 bytes in 1 blocks
==1625==    indirectly lost: 2,220 bytes in 27 blocks
==1625==      possibly lost: 0 bytes in 0 blocks
==1625==    still reachable: 72 bytes in 9 blocks
==1625==         suppressed: 0 bytes in 0 blocks
==1625== Rerun with --leak-check=full to see details of leaked memory
==1625== 
==1625== For counts of detected and suppressed errors, rerun with: -v
==1625== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Comment 5 Mark Wielaard 2017-09-26 06:34:53 UTC
In that case this isn't really a bug, since the program deliberately executes an instruction that may not be supported and handles the resulting SIGILL itself. If you don't want to see the message, please run with -q or --sigill-diagnostics=no.