Bug 1555151 - gcc: uninitialized value on armhfp with -O2
Summary: gcc: uninitialized value on armhfp with -O2
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: 31
Hardware: armhfp
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1555753
 
Reported: 2018-03-14 03:59 UTC by Jerry James
Modified: 2020-11-24 16:40 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-24 16:40:41 UTC
Type: Bug
Embargoed:


Attachments
Test case showing problem on 32-bit ARM (10.42 KB, text/x-csrc)
2018-03-14 03:59 UTC, Jerry James
Reduced test case (1.45 KB, text/plain)
2019-02-25 04:04 UTC, Jerry James
Reduced test case (7.87 KB, text/x-csrc)
2019-02-28 16:59 UTC, Jerry James
Reduced test case (7.87 KB, text/x-csrc)
2019-02-28 17:05 UTC, Jerry James


Links
System: GNU Compiler Collection
ID: 89546
Private: 0
Priority: P2
Status: RESOLVED
Summary: [8/9 Regression] Suspected arm flint miscompilation starting with r255510
Last Updated: 2020-02-17 11:27:49 UTC

Description Jerry James 2018-03-14 03:59:42 UTC
Created attachment 1407796 [details]
Test case showing problem on 32-bit ARM

Description of problem:
The flint package failed the mass rebuild on 32-bit ARM, due to a failing test.  The test passes on all other architectures, and passes on all architectures including 32-bit ARM with previous versions of gcc (F27 and older).  Running under valgrind shows a lot of "conditional jump or move depends on uninitialised value(s)" warnings, none of which appear when run under valgrind on other architectures (or, again, on 32-bit ARM when compiled with older versions of gcc).

The problem does not appear at optimization levels -O0 or -O1, but does at -O2.

I have reduced the failing test down to sources which I will attach to this bug.  This is the first warning issued by valgrind:

==9442== Conditional jump or move depends on uninitialised value(s)
==9442==    at 0x11B7C: std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int> >::_M_get_insert_unique_pos(int const&) (stl_tree.h:2055)
==9442==    by 0x11CA3: std::pair<std::_Rb_tree_iterator<int>, bool> std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int> >::_M_insert_unique<int const&>(int const&) (stl_tree.h:2106)
==9442==    by 0x1200F: insert (stl_set.h:511)
==9442==    by 0x1200F: doit (test.cpp:322)
==9442==    by 0x1200F: doit (test.cpp:323)
==9442==    by 0x1200F: std::set<int, std::less<int>, std::allocator<int> > values<flint::tuple<type_n<0, true>, flint::tuple<type_n<1, true>, flint::tuple<type_n<2, true>, flint::empty_tuple> > > >(flint::tuple<type_n<0, true>, flint::tuple<type_n<1, true>, flint::tuple<type_n<2, true>, flint::empty_tuple> > > const&) (test.cpp:347)
==9442==    by 0x11253: main (test.cpp:396)

Version-Release number of selected component (if applicable):
gcc-c++-8.0.1-0.17.fc29.armv7hl

How reproducible:
Always

Steps to Reproduce:
1. Build the attached sources with g++ -O2 -o test test.cpp
2. Run ./test

Actual results:
On 32-bit ARM, the test fails:
FAIL
test.cpp:396: assertion vals1.size() == 4 failed

On all other Fedora architectures, the test passes.

Expected results:
The test should pass on all architectures.

Additional info:

Comment 1 Jonathan Wakely 2018-03-14 12:32:38 UTC
I see constructors that don't initialize members:

    tuple() {};

  type_n() {};

Does changing them to avoid uninitialized members make any difference?

Comment 2 Jonathan Wakely 2018-03-14 12:39:46 UTC
Those constructors look especially suspect given the valgrind output shows:

flint::tuple<type_n<2, true>, flint::empty_tuple>

tuple<type_n<2, true>, empty_tuple> has a default constructor that doesn't explicitly initialize its type_n<2, true> member, and that member has a default constructor which doesn't initialize its int member.

Try:

  type_n() : payload() {}

Comment 3 Jerry James 2018-03-23 01:58:31 UTC
Thanks for the comments, Jonathan.  Sorry to take so long to get back around to this.  I've had soooo many packages break in the last 3 months, it's been difficult to find time for them all.

I changed the constructors like so:

tuple() : head(), tail() {}
type_n() : payload(0) {}

That didn't change the outcome.  This is curious: if I add either -fsanitize=address or -fsanitize=undefined to the compiler flags, the resulting executable passes the test, displays no errors from the sanitizer, and shows no errors under valgrind.  I worked my way through the optimizer differences between -O1 and -O2, and found that -fno-tree-tail-merge also makes the test pass.

To summarize, the tests all pass and valgrind shows no complaints if:
- Any architecture but 32-bit ARM is used
- GCC 7 or earlier is used
- -O0 or -O1 is given
- -fsanitize=address or -fsanitize=undefined is given
- -fno-tree-tail-merge is given

Comment 4 Jonathan Wakely 2018-03-23 10:43:55 UTC
(In reply to Jerry James from comment #3)
> Thanks for the comments, Jonathan.  Sorry to take so long to get back around
> to this.  I've had soooo many packages break in the last 3 months, it's been
> difficult to find time for them all.

Yes, I've noticed how much time you've had to spend in bugzilla!

> I changed the constructors like so:
> 
> tuple() : head(), tail() {}
> type_n() : payload(0) {}
> 
> That didn't change the outcome.

OK, thanks for checking. If that had been the problem I'd have expected it to show up on other arches anyway.

Comment 5 Jerry James 2018-07-24 16:41:35 UTC
I should have mentioned back in May that I added -fno-tree-tail-merge to the flint build flags, on 32-bit ARM only.  But it would be nice if somebody could figure out what is really going wrong here, in particular whether this is a gcc bug or a flint bug.

Comment 6 Jan Kurik 2018-08-14 10:11:00 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 29 development cycle.
Changing version to '29'.

Comment 7 Jerry James 2019-02-20 03:50:23 UTC
The behavior is unchanged with gcc 9.0.1-0.4.fc30.armv7hl.  The test passes with -fno-tree-tail-merge, but fails with just -O2.  Please advise whether this is a gcc bug or a bug in the test code.

Comment 8 Jerry James 2019-02-25 04:04:14 UTC
Created attachment 1538290 [details]
Reduced test case

I've got a smaller, though sadly less comprehensible, test case due to C-Reduce.  For the bug to manifest, all of the following seem to be needed:
- The variable declaration in the inner block must have the same name as the one declared after the inner block.
- Both the inner block and the outer block must end with the same function call (changing exit(0) to return 0 makes the bug stop manifesting).
- Complex variable types.  My attempts at changing the nested template types to something simpler have also made the bug stop manifesting.

Comment 9 Jakub Jelinek 2019-02-28 15:39:54 UTC
With the #c8 testcase, I get FAIL printed and valgrind complaining about "Conditional jump or move depends on uninitialised value" at both -O0 and -O2, with g++ 9.0.1 as well as 8.2.1.
Are you sure it is valid?

Comment 10 Jakub Jelinek 2019-02-28 15:43:50 UTC
And on x86_64 as well (also at -O0).

Comment 11 Jerry James 2019-02-28 16:09:29 UTC
Yes, I let C-Reduce reduce to invalid code. :-(  I'm working on another reduction right now that will hopefully avoid that issue.  I'll attach it here when it is ready.

Comment 12 Jerry James 2019-02-28 16:59:32 UTC
Created attachment 1539610 [details]
Reduced test case

I seem to be having trouble preventing C-Reduce from introducing uninitialized values.  In the meantime, then, here is a hand-reduced version with no preprocessor directives.  However, note that the recipe for triggering the issue has now changed.

When built with "g++ -O1 -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations -Wall -Wextra -o test test.cpp", the program exits with exit code 1.

When built with "g++ -O1 -fno-tree-ter -Wall -Wextra -o test test.cpp", the program exits with exit code 0.

In neither case are any warnings emitted.  None of valgrind, -fsanitize=address, or -fsanitize=undefined report any issues.

Comment 13 Jerry James 2019-02-28 17:04:21 UTC
Argh.  That's -fno-tree-sra to get exit code 0, not -fno-tree-ter.  Sorry about that.

Comment 14 Jerry James 2019-02-28 17:05:47 UTC
Created attachment 1539611 [details]
Reduced test case

Comment 15 Jakub Jelinek 2019-02-28 23:27:53 UTC
I've bisected the #c0 testcase with the #c3 fixes to the GCC change http://gcc.gnu.org/r255510, which doesn't mean anything in itself, apart from telling us that we should study how it changed the code generation and whether something is wrong.

Comment 16 Ben Cotton 2019-08-13 16:56:06 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 17 Ben Cotton 2019-08-13 19:14:13 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

Comment 18 Ben Cotton 2020-11-03 15:00:24 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 19 Ben Cotton 2020-11-24 16:40:41 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

