Bug 1036615

Summary: SIGILL: Illegal operand at address 0x803096BF8
Product: Red Hat Enterprise Linux 7 Reporter: Miroslav Franc <mfranc>
Component: valgrind Assignee: Mark Wielaard <mjw>
Status: CLOSED CURRENTRELEASE QA Contact: Miroslav Franc <mfranc>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.0 CC: jakub, mbenitez, mfranc, ohudlick
Target Milestone: rc   
Target Release: ---   
Hardware: s390x   
OS: Linux   
Whiteboard:
Fixed In Version: valgrind-3.9.0-2.3.el7 Doc Type: Bug Fix
Last Closed: 2014-06-13 12:53:47 UTC Type: Bug

Comment 1 Mark Wielaard 2013-12-02 12:17:39 UTC
Could you run with valgrind --vgdb-error=0, attach gdb from another window as valgrind suggests, continue, and once it crashes, disassemble to see which instruction this is really about?

Comment 2 Mark Wielaard 2013-12-02 13:06:27 UTC
# valgrind --vgdb-error=0 convert -motion-blur 20x5 munich.jpg munich-with.jpg
==42128== Memcheck, a memory error detector
==42128== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==42128== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==42128== Command: convert -motion-blur 20x5 munich.jpg munich-with.jpg
==42128== 
==42128== (action at startup) vgdb me ... 
==42128== 
==42128== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==42128==   /path/to/gdb convert
==42128== and then give GDB the following command
==42128==   target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=42128
==42128== --pid is optional if only one valgrind process is running
==42128== 

# gdb convert
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-45.el7
Reading symbols from /usr/bin/convert...Reading symbols from /usr/lib/debug/usr/bin/convert.debug...done.
done.
(gdb) target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=42128
Remote debugging using | /usr/lib64/valgrind/../../bin/vgdb --pid=42128
relaying data between gdb and process 42128
Reading symbols from /lib/ld64.so.1...Reading symbols from /usr/lib/debug/usr/lib64/ld-2.17.so.debug...done.
done.
Loaded symbols for /lib/ld64.so.1
0x00000042b3b342d0 in _start () from /lib/ld64.so.1
(gdb) c
Continuing.

Program received signal SIGILL, Illegal instruction.
MotionBlurImageChannel (image=0x456a580, channel=<optimized out>, 
    radius=<optimized out>, sigma=5, angle=0, exception=0x4536b40)
    at magick/effect.c:2676
2676	  point.x=(double) width*sin(DegreesToRadians(angle));
(gdb) disassemble 
Dump of assembler code for function MotionBlurImageChannel:
   0x0000000004103fdc <+0>:	stmg	%r6,%r15,48(%r15)
   0x0000000004103fe2 <+6>:	larl	%r13,0x42303a0
   0x0000000004103fe8 <+12>:	lgr	%r14,%r15
[...]
   0x00000000041041c0 <+484>:	cij	%r2,0,8,0x41045b2 <MotionBlurImageChannel+1494>
   0x00000000041041c6 <+490>:	cxgbr	%f4,%r8
   0x00000000041041ca <+494>:	cgij	%r8,0,4,0x4104540 <MotionBlurImageChannel+1380>
=> 0x00000000041041d0 <+500>:	lxdbr	%f0,%f14
   0x00000000041041d4 <+504>:	ldxbr	%f1,%f4
   0x00000000041041d8 <+508>:	ldr	%f10,%f1
   0x00000000041041da <+510>:	ld	%f4,32(%r13)
[...]
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) c
Continuing.

Program terminated with signal SIGILL, Illegal instruction.
The program no longer exists.

==42128== Process terminating with default action of signal 4 (SIGILL)
==42128==  Illegal operand at address 0x803197080
==42128==    at 0x41041CC: MotionBlurImageChannel (effect.c:2676)
==42128==    by 0x41041BF: MotionBlurImageChannel (effect.c:2668)
==42128== 
==42128== HEAP SUMMARY:
==42128==     in use at exit: 28,396,739 bytes in 817 blocks
==42128==   total heap usage: 1,350 allocs, 533 frees, 28,691,980 bytes allocated
==42128== 
==42128== LEAK SUMMARY:
==42128==    definitely lost: 384 bytes in 1 blocks
==42128==    indirectly lost: 0 bytes in 0 blocks
==42128==      possibly lost: 0 bytes in 0 blocks
==42128==    still reachable: 28,396,355 bytes in 816 blocks
==42128==         suppressed: 0 bytes in 0 blocks
==42128== Rerun with --leak-check=full to see details of leaked memory
==42128== 
==42128== For counts of detected and suppressed errors, rerun with: -v
==42128== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction (core dumped)

Comment 3 Mark Wielaard 2013-12-02 13:19:50 UTC
Hmmm, that address 0x803197080 is probably not in the program itself. Must be a bad valgrind translation then?

Some basic block output (valgrind --vex-guest-chase-thresh=0 --trace-flags=10000000 --trace-notbelow=10243 --run-libc-freeres=no convert -motion-blur 20x5 munich.jpg munich-with.jpg):

==== SB 10243 (evchecks 14075350) [tid 1] 0x41041d0 MotionBlurImageChannel+500 /usr/lib64/libMagickCore.so.5.0.0+0xdf1d0

------------------------ Front end ------------------------

lxdbr    %f0,%f14
              ------ IMark(0x41041D0, 4, 0) ------
              t0 = GET:F64(176)
              PUT(64) = F128HItoF64(F64toF128(t0))
              PUT(80) = F128LOtoF64(F64toF128(t0))
              PUT(336) = 0x41041D4:I64

ldxbr    %f1,0,%f4,0
              ------ IMark(0x41041D4, 4, 0) ------
              t2 = And32(GET:I32(328),0x7:I32)
              t3 = And32(Sub32(0x4:I32,ITE(CmpLE32S(t2,0x3:I32),t2,0x0:I32)),0x3:I32)
              t1 = F128toF64(t3,F64HLtoF128(GET:F64(96),GET:F64(112)))
              PUT(72) = t1
              PUT(336) = 0x41041D8:I64

ldr      %f10,%f1
              ------ IMark(0x41041D8, 2, 0) ------
              PUT(144) = GET:F64(72)
              PUT(336) = 0x41041DA:I64

ld       %f4,32(%r13)
              ------ IMark(0x41041DA, 4, 0) ------
              t4 = Add64(Add64(0x20:I64,GET:I64(296)),0x0:I64)
              PUT(96) = LDbe:F64(t4)
              PUT(336) = 0x41041DE:I64

ld       %f6,40(%r13)
              ------ IMark(0x41041DE, 4, 0) ------
              t5 = Add64(Add64(0x28:I64,GET:I64(296)),0x0:I64)
              PUT(112) = LDbe:F64(t5)
              PUT(336) = 0x41041E2:I64

ld       %f1,16(%r13)
              ------ IMark(0x41041E2, 4, 0) ------
              t6 = Add64(Add64(0x10:I64,GET:I64(296)),0x0:I64)
              PUT(72) = LDbe:F64(t6)
              PUT(336) = 0x41041E6:I64

ld       %f3,24(%r13)
              ------ IMark(0x41041E6, 4, 0) ------
              t7 = Add64(Add64(0x18:I64,GET:I64(296)),0x0:I64)
              PUT(88) = LDbe:F64(t7)
              PUT(336) = 0x41041EA:I64

la       %r2,232(%r15)
              ------ IMark(0x41041EA, 4, 0) ------
              t8 = Add64(Add64(0xE8:I64,GET:I64(312)),0x0:I64)
              PUT(208) = t8
              PUT(336) = 0x41041EE:I64

mxbr     %f0,%f4
              ------ IMark(0x41041EE, 4, 0) ------
              t12 = And32(GET:I32(328),0x7:I32)
              t13 = And32(Sub32(0x4:I32,ITE(CmpLE32S(t12,0x3:I32),t12,0x0:I32)),0x3:I32)
              t9 = F64HLtoF128(GET:F64(64),GET:F64(80))
              t10 = F64HLtoF128(GET:F64(96),GET:F64(112))
              t11 = MulF128(t13,t9,t10)
              PUT(64) = F128HItoF64(t11)
              PUT(80) = F128LOtoF64(t11)
              PUT(336) = 0x41041F2:I64

la       %r3,224(%r15)
              ------ IMark(0x41041F2, 4, 0) ------
              t14 = Add64(Add64(0xE0:I64,GET:I64(312)),0x0:I64)
              PUT(216) = t14
              PUT(336) = 0x41041F6:I64

dxbr     %f0,%f1
              ------ IMark(0x41041F6, 4, 0) ------
              t18 = And32(GET:I32(328),0x7:I32)
              t19 = And32(Sub32(0x4:I32,ITE(CmpLE32S(t18,0x3:I32),t18,0x0:I32)),0x3:I32)
              t15 = F64HLtoF128(GET:F64(64),GET:F64(80))
              t16 = F64HLtoF128(GET:F64(72),GET:F64(88))
              t17 = DivF128(t19,t15,t16)
              PUT(64) = F128HItoF64(t17)
              PUT(80) = F128LOtoF64(t17)
              PUT(336) = 0x41041FA:I64

ldxbr    %f4,0,%f0,0
              ------ IMark(0x41041FA, 4, 0) ------
              t21 = And32(GET:I32(328),0x7:I32)
              t22 = And32(Sub32(0x4:I32,ITE(CmpLE32S(t21,0x3:I32),t21,0x0:I32)),0x3:I32)
              t20 = F128toF64(t22,F64HLtoF128(GET:F64(64),GET:F64(80)))
              PUT(96) = t20
              PUT(336) = 0x41041FE:I64

ldr      %f0,%f4
              ------ IMark(0x41041FE, 2, 0) ------
              PUT(64) = GET:F64(96)
              PUT(336) = 0x4104200:I64

brasl    %r14,.-662392
              ------ IMark(0x4104200, 6, 0) ------
              PUT(304) = 0x4104206:I64
              PUT(336) = 0x4062688:I64
              PUT(336) = GET:I64(336); exit-Call

GuestBytes 41041D0 54  B3 05 00 0E B3 45 00 14 28 A1 68 40 D0 20 68 60 D0 28 68 10 D0 10 68 30 D0 18 41 20 F0 E8 B3 4C 00 04 41 30 F0 E0 B3 4D 00 01 B3 45 00 40 28 04 C0 E5 FF FA F2 44  49461AE0

VexExpansionRatio 54 984   182 :10

==43858== 
==43858== Process terminating with default action of signal 4 (SIGILL)
==43858==  Illegal operand at address 0x80380DC58

Comment 5 Miroslav Franc 2013-12-03 15:26:01 UTC
(In reply to Mark Wielaard from comment #3)
> Hmmm, that address 0x803197080 is probably not in the program itself. Must
> be a bad valgrind translation then?

With --vex-guest-max-insns=8 and smaller the problem went away for me.

Comment 6 Miroslav Franc 2013-12-05 12:28:24 UTC
(In reply to Mark Wielaard from comment #3)
> Hmmm, that address 0x803197080 is probably not in the program itself. Must
> be a bad valgrind translation then?

So I couldn't help myself and did some further digging.  Apparently valgrind is dying upon execution of the following instruction:

# ldxbr   %f10,%f13 

Looks like you cannot use just any FP register pair with it.  Even gas is screaming at me:

# Fatal error: invalid floating point register pair.  Valid fp register pair operands are 0, 1, 4, 5, 8, 9, 12 or 13.

If you compile and execute the following, it dies with SIGILL. Funnily enough, the same program runs fine under valgrind, because valgrind uses %f8 and %f13, which is a valid combination.

---
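int
main (void)
{
  /* ldxbr %f10,%f13: %f10 does not designate a valid register pair,
     so this raises SIGILL when run natively */
  asm volatile (".byte 0xB3,0x45,0x00,0xAD");

  /* ldxbr %f8,%f13: both operands designate valid register pairs */
  /* asm volatile (".byte 0xB3,0x45,0x00,0x8D"); */

  return 0;
}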
---

So the question seems to be: why does valgrind choose an invalid register pair for the ldxbr instruction in the original translation?

Comment 7 Miroslav Franc 2013-12-05 16:30:53 UTC
A good source is probably:
http://publibfi.boulder.ibm.com/epubs/pdf/dz9zr009.pdf

Page 19-34 describes LDXBR and states:
"For LDXBR, LDXBRA, LEXBR, and LEXBRA, the R1 and R2 fields must designate valid floating-point-register pairs; otherwise, a specification exception is recognized."

Page 2-4 describes which registers can be combined into pairs.

Comment 11 Miroslav Franc 2014-03-20 11:44:26 UTC
Works with valgrind-3.9.0-6.el7.

Comment 12 Ludek Smid 2014-06-13 12:53:47 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.