Bug 1149721 - calc: tests hang on s390x
Summary: calc: tests hang on s390x
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: calc
Version: 24
Hardware: s390x
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Matthew Miller
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ZedoraTracker
Reported: 2014-10-06 14:18 UTC by Jakub Čajka
Modified: 2017-07-23 03:53 UTC

Fixed In Version: calc-2.12.5.6-1.fc26
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-23 03:53:57 UTC
Type: Bug
Embargoed:



Description Jakub Čajka 2014-10-06 14:18:48 UTC
Tests in %check hang on s390x.

Running make check, instead of make chk, shows this output (excerpt):


"
.
.
.
8828: exp(64) ^ (2+3i) == power(exp(64), 2+3i)
8829: pi() ^ (257+127i) == power(pi(), 257+127i)
8830: pi() ^ asin(-2) == power(pi(), asin(-2)
8831: (3+4i) ^ (2+3i) == power(3+4i, 2+3i)
8832: ln(-10) ^ (2+3i) == power(ln(-10), 2+3i)
8833: (pi()*1i) ^ asin(-2) == power(pi()*1i, asin(-2))
8834: (exp(1)+pi()*1i) ^ asin(-2) == power(exp(1)+pi()*1i, asin(-2))
8835: Ending test_somenew

8900: Starting test of calc resource functions by Christoph Zurnieden
8901: read -once "test8900"
8902: about to run test8900(1,,8903)"


and hangs...

I have dug into it a bit, and the problem seems to be in func.c and/or in gcc: compiling calc with only the -fPIC, -O > 0 and -fstack-protector-strong flags (make DEBUG="...") makes the tests hang, but when func.c is compiled with -O0 or without the -fstack-protector-strong flag (even at any -O level with -fstack-protector or -fstack-protector-all), the tests pass.

Also, the package builds successfully for f19 (where the build uses -fstack-protector) and hangs on f20+.

Note: Commenting out the test cases with complex numbers (in file cal/test8900.cal) makes the tests pass.

Failed build: http://s390.koji.fedoraproject.org/koji/taskinfo?taskID=1578560

Comment 1 Matthew Miller 2014-10-06 14:21:38 UTC
(In reply to Jakub Čajka from comment #0)
> Note: Commenting out test cases with complex numbers (in file
> cal/test8900.cal) make tests pass.
> 
> Failed build: http://s390.koji.fedoraproject.org/koji/taskinfo?taskID=1578560

Hmmm, interesting. This release specifically fixes at least one bug in complex number handling. Does the previous version exhibit the same failure?

Comment 2 Jakub Čajka 2014-10-07 07:19:23 UTC
The bug seems to have been introduced/triggered by the addition of cal/test8900.cal. Version 2.12.4.8 builds (it doesn't contain test8900.cal); newer versions hang (they do contain test8900.cal). All builds were done for f21. So it doesn't seem to be related to recent changes.

Last successful builds on s390 koji:
http://s390.koji.fedoraproject.org/koji/buildinfo?buildID=190711 (f20)
http://s390.koji.fedoraproject.org/koji/buildinfo?buildID=272650 (f19)

Comment 3 Matthew Miller 2014-10-07 11:54:04 UTC
Do the 8900 tests fail with the old version of calc built with the new flags? You can run the test where it hangs like this:

calc -f /usr/share/calc/test8900.cal 'test8900(1,, 8903);'

(With, of course, the right path if you're using a version that doesn't include that.)

Let's see if we can identify a really small simple case that hangs, and if not bisecting back to the beginning of time, at least identify if it's a new bug or just something that's newly triggered.

Comment 4 Jakub Čajka 2014-10-08 13:44:34 UTC
In mock (f20, f21) with freshly built old versions, it hangs, at least since the beginning of Fedora time (I tried the initial commit, the 3rd, and some in between, in the Fedora repo), on:
"
.
.
.
t031() defined
t032() defined
t033() defined
t034() defined
t035() defined
f(x) defined
t036() defined
t037() defined
test8900(verbose,tnum,testnum)"

test8900.cal is taken from the latest version.

But with the build of version 2.12.4.7-2.fc20 from koji (link in comment 2), the tests pass
(continuation of the snippet posted above):
"
8903: no errors in test t01
8904: no errors in test t02
8905: no errors in test t03
8906: no errors in test t04
8907: no errors in test t05
8908: no errors in test t06
8909: no errors in test t07
.
.
."

It is getting more perplexing...

Version 2.12.4.7-2.fc20 was built using gcc-4.8.0-6.fc19 with flags:

"-DCALC_SRC -UCUSTOM -Wall -W -Wno-comment   -fPIC -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -march=z9-109 -mtune=z10"(f19 flags...)

The versions used in mock were built by recent gcc versions (4.8.3-7.fc20/4.9.1-9.fc21) with flags:

"-DCALC_SRC -UCUSTOM -Wall -W -Wno-comment   -fPIC -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -march=z9-109 -mtune=z10"(f21)

"-DCALC_SRC -UCUSTOM -Wall -W -Wno-comment   -fPIC -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -march=z9-109 -mtune=z10"(f20)

It seems that the bug is triggered by the gcc flags (see the original post) together with functions used with complex number arguments in test8900.cal on s390x. Omitting optflags makes the tests pass (in the oldest Fedora version).

Comment 5 Matthew Miller 2014-10-08 14:51:56 UTC
> It seems that bug is triggered by gcc flags(OP) and functions used with complex number arguments in test8900.cal on s390x..., omitting optflags make tests pass(in oldest fedora version)...

Hmmm. This is going to get into gcc knowledge beyond my expertise, I'm afraid. Is there a gcc expert we can pull in here?

Comment 6 Jakub Čajka 2014-10-10 13:08:32 UTC
Changing component to gcc, as I don't know any.

Please take a look to see whether this is a legitimate gcc-related issue. Thanks.

Comment 7 Matthew Miller 2014-10-13 15:31:06 UTC
Jakub, can you check just to be sure that the problem still exists in calc 2.12.5.0? (Building in rawhide now.)

Comment 8 Jakub Jelinek 2014-10-13 16:51:49 UTC
Why do you think it has anything to do with gcc?  Just verified it builds just fine in f19, where the compiler is the same (4.8.3-7) and also the same -march/-mtune compiler options.

Comment 9 Jakub Jelinek 2014-10-13 17:08:34 UTC
Ah, but there is a -fstack-protector vs. -fstack-protector-strong difference between those.

However, as valgrind complains about that test even when built without -fstack-protector{,-strong} and with opcodes.o built with -O0, I'm not going to spend any further time on this. First fix it up so that it doesn't trigger undefined behavior:

8900: Starting test of calc resource functions by Christoph Zurnieden
8901: read -once "test8900"
8902: about to run test8900(1,,8903)
==5787== Conditional jump or move depends on uninitialised value(s)
==5787==    at 0x409D458: o_assign (opcodes.c:842)
==5787==    by 0x409D735: o_assignpop (opcodes.c:895)
==5787==    by 0x40A8975: calculate (opcodes.c:3914)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x406D207: evaluate (codegen.c:293)
==5787==    by 0x408D0DD: f_eval (func.c:214)
==5787==    by 0x408F157: builtinfunc (func.c:9191)
==5787== 
==5787== Conditional jump or move depends on uninitialised value(s)
==5787==    at 0x409D2BA: o_assign (opcodes.c:815)
==5787==    by 0x409D735: o_assignpop (opcodes.c:895)
==5787==    by 0x40A8975: calculate (opcodes.c:3914)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x406D207: evaluate (codegen.c:293)
==5787==    by 0x408D0DD: f_eval (func.c:214)
==5787==    by 0x408F157: builtinfunc (func.c:9191)
==5787== 
==5787== Conditional jump or move depends on uninitialised value(s)
==5787==    at 0x409D502: o_assign (opcodes.c:856)
==5787==    by 0x409D735: o_assignpop (opcodes.c:895)
==5787==    by 0x40A8975: calculate (opcodes.c:3914)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x406D207: evaluate (codegen.c:293)
==5787==    by 0x408D0DD: f_eval (func.c:214)
==5787==    by 0x408F157: builtinfunc (func.c:9191)
==5787== 
==5787== Conditional jump or move depends on uninitialised value(s)
==5787==    at 0x409D580: o_assign (opcodes.c:861)
==5787==    by 0x409D735: o_assignpop (opcodes.c:895)
==5787==    by 0x40A8975: calculate (opcodes.c:3914)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x406D207: evaluate (codegen.c:293)
==5787==    by 0x408D0DD: f_eval (func.c:214)
==5787==    by 0x408F157: builtinfunc (func.c:9191)
==5787== 
==5787== Conditional jump or move depends on uninitialised value(s)
==5787==    at 0x409D5F8: o_assign (opcodes.c:866)
==5787==    by 0x409D735: o_assignpop (opcodes.c:895)
==5787==    by 0x40A8975: calculate (opcodes.c:3914)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x40A50DF: o_usercall (opcodes.c:2720)
==5787==    by 0x40A89F3: calculate (opcodes.c:3922)
==5787==    by 0x406D207: evaluate (codegen.c:293)
==5787==    by 0x408D0DD: f_eval (func.c:214)
==5787==    by 0x408F157: builtinfunc (func.c:9191)
==5787==

Comment 10 Jakub Jelinek 2014-10-13 17:29:08 UTC
Building it on x86_64 with -fsanitize=undefined,address (on f21) finds another issue:
9815: read -once prompt
=================================================================
==22904==ERROR: AddressSanitizer: heap-use-after-free on address 0x61d00001e261 at pc 0x7ff6ecd5962d bp 0x7fff49d89480 sp 0x7fff49d89440
READ of size 1 at 0x61d00001e261 thread T0
    #0 0x7ff6ecd5962c in strcmp (/lib64/libasan.so.1+0x3862c)
    #1 0x7ff6ec72267f in findglobal /home/jakub/rpmbuild/BUILD/calc-2.12.4.14/symbol.c:149
    #2 0x7ff6ec723f89 in symboltype /home/jakub/rpmbuild/BUILD/calc-2.12.4.14/symbol.c:763
    #3 0x7ff6ec5b5e99 in getfunction /home/jakub/rpmbuild/BUILD/calc-2.12.4.14/codegen.c:388
    #4 0x7ff6ec5b5e99 in getcommands /home/jakub/rpmbuild/BUILD/calc-2.12.4.14/codegen.c:137
    #5 0x7ff6ec5b5bee in getcommands /home/jakub/rpmbuild/BUILD/calc-2.12.4.14/codegen.c:180
    #6 0x7ff6ec5b5bee in getcommands /home/jakub/rpmbuild/BUILD/calc-2.12.4.14/codegen.c:180
    #7 0x7ff6ec5b5bee in getcommands /home/jakub/rpmbuild/BUILD/calc-2.12.4.14/codegen.c:180
    #8 0x404983 in main /home/jakub/rpmbuild/BUILD/calc-2.12.4.14/calc.c:599
    #9 0x327981ffdf in __libc_start_main (/lib64/libc.so.6+0x327981ffdf)
    #10 0x405dc6 (/home/jakub/rpmbuild/BUILD/calc-2.12.4.14/calc+0x405dc6)

0x61d00001e261 is located 481 bytes inside of 2000-byte region [0x61d00001e080,0x61d00001e850)
freed by thread T0 here:
    #0 0x7ff6ecd78a96 in __interceptor_realloc (/lib64/libasan.so.1+0x57a96)
    #1 0x7ff6ec7161ba in addstr /home/jakub/rpmbuild/BUILD/calc-2.12.4.14/str.c:100

previously allocated by thread T0 here:
    #0 0x7ff6ecd787b7 in malloc (/lib64/libasan.so.1+0x577b7)
    #1 0x7ff6ec715c91 in initstr /home/jakub/rpmbuild/BUILD/calc-2.12.4.14/str.c:65

SUMMARY: AddressSanitizer: heap-use-after-free ??:0 strcmp
Shadow bytes around the buggy address:
  0x0c3a7fffbbf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c3a7fffbc00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c3a7fffbc10: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3a7fffbc20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3a7fffbc30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c3a7fffbc40: fd fd fd fd fd fd fd fd fd fd fd fd[fd]fd fd fd
  0x0c3a7fffbc50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3a7fffbc60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3a7fffbc70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3a7fffbc80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3a7fffbc90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Contiguous container OOB:fc
  ASan internal:           fe
==22904==ABORTING

Comment 11 Jakub Jelinek 2014-10-13 18:39:44 UTC
Even the valgrind errors are reproducible on x86_64.
And you can check yourself whether it builds on s390x, just run
s390-koji build --scratch f21 calc*.src.rpm
Note that the valgrind errors on the 89?? test aren't the only errors reported; clearly the package is in bad shape.

Comment 12 Matthew Miller 2014-10-13 18:43:47 UTC
Thanks Jakub. The code is an academic project with a long history. I'll take this back upstream.

Comment 13 Christoph Zurnieden 2014-10-13 19:47:57 UTC
Hi,

I'm the poor guy whose name showed up here several times in relation to a compile failure, but I'm not able to reproduce it for lack of access to the related hardware. Of the two places listed here that Valgrind bristled at, the second seems to be an overreaction by Valgrind (yes, there is a general problem with realloc, but I don't think it applies here), and the first seems to be one too, but that needs some more digging (I didn't write the calc engine; that was Landon Curt Noll. I only wrote some scripts for it).

So, for the reason mentioned above (lack of hardware), I would appreciate it if somebody could send me a listing of the relevant errors and warnings or, better, publish it somewhere for others to read, too.

CZ

Comment 14 Jakub Jelinek 2014-10-13 19:55:04 UTC
On x86_64, the full list of valgrind errors is http://ur1.ca/id6do

Comment 15 Christoph Zurnieden 2014-10-14 18:18:06 UTC
Ah, great, thanks!

This may take a while, please be patient.

CZ

Comment 16 Fedora End Of Life 2015-11-04 15:02:29 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 17 Jan Kurik 2016-02-24 13:16:38 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle.
Changing version to '24'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase

Comment 18 Matthew Miller 2017-05-22 13:55:47 UTC
I think this is fixed now, because the s390 build succeeded in Rawhide here: https://koji.fedoraproject.org/koji/buildinfo?buildID=896009

Comment 19 Fedora Update System 2017-05-22 13:58:19 UTC
calc-2.12.5.6-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-a57c2f7391

Comment 20 Fedora Update System 2017-05-23 18:15:05 UTC
calc-2.12.5.6-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-a57c2f7391

Comment 21 Fedora Update System 2017-07-23 03:53:57 UTC
calc-2.12.5.6-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

