Bug 2109065

Summary: g++ -O2 compiles buggy code with union aliasing.
Product: Red Hat Enterprise Linux 9 Reporter: Kirby Zhou <kirbyzhou>
Component: gccAssignee: Marek Polacek <mpolacek>
gcc sub component: system-version QA Contact: qe-baseos-tools-bugs
Status: CLOSED UPSTREAM Docs Contact:
Severity: high    
Priority: unspecified CC: ahajkova, bstinson, fweimer, jakub, jwboyer, ohudlick, sipoyare
Version: CentOS StreamKeywords: Reopened
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-07-22 13:17:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
The buggy demo code. none

Description Kirby Zhou 2022-07-20 11:16:48 UTC
Created attachment 1898267 [details]
The buggy demo code.

Description of problem:

g++ -O2 compiles buggy code with union aliasing.

I extracted a code fragment to demo the bug.
The code is from https://github.com/preshing/cpp11-on-multicore which used by apache-pegasus and many other tools.


Version-Release number of selected component (if applicable):

gcc-c++-11.3.1-2.1.el9 
gcc-c++-11.2.1-9.4.el9

How reproducible:

100%


Steps to Reproduce:

1. download the attachement as bitbug.cpp
2. g++ -O2 bitbug.cpp && ./a.out 


Actual results:

good: 00000001 expect 00000001
bad : 00100001 expect 00000001

Expected results:

good: 00000001 expect 00000001
bad : 00000001 expect 00000001

Additional info:

The buggy code is:

            newStatus = oldStatus;
            newStatus.writers-=1;
            waitToRead = oldStatus.waitToRead;
            if (waitToRead > 0) {
                newStatus.waitToRead = 0;
                newStatus.readers = waitToRead;
            }
            m_status = newStatus;


The buggy assembly like:

mov    %eax,%edx
lea    -0x100000(%rax),%ecx
and    $0xfff00000,%eax
shr    $0xa,%edx
and    $0x3ff,%edx
or     %edx,%eax
test   %edx,%edx
cmove  %ecx,%eax

"lea    -0x100000(%rax),%ecx" is related to  "newStatus.writers-=1".
But %ecx is not merged to eax which related to "newStatus.readers = waitToRead".

So if waitToRead>0 then newStatus.writers will not be changed.

Comment 1 Kirby Zhou 2022-07-20 11:24:19 UTC
The bug happens at both x86_64 and aarch64

The bug wont happens at gcc-c++-10.3.1-1.2.el8_5

Comment 2 Marek Polacek 2022-07-21 01:37:17 UTC
This changed with https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d119f34c952f8718fdbabc63e2f369a16e92fa07, so the old behavior can be brought by -fno-ipa-modref.

However, I think the program invokes undefined behavior.  For clarity, this is the union after
preprocessing:

union Status {
  struct Wrapper { uint32_t value; };
  Wrapper wrapper;
  Status(uint32_t v = 0) { wrapper.value = v; } 
  Status &operator=(uint32_t v) { wrapper.value = v; return *this; }
  operator uint32_t &() { return wrapper.value; }
  operator uint32_t() const { return wrapper.value; }
  typedef uint32_t StorageType;
  BitFieldMember<StorageType, 0, 10> readers;
  BitFieldMember<StorageType, 10, 10> waitToRead;
  BitFieldMember<StorageType, 20, 10> writers;
};

so this union has 4 members.  [class.union] says that "At most one of the
non-static data members of an object of union type can be active at any time,
that is, the value of at most one of the non-static data members can be stored
in a union at any time."

Then the program does this:

    Status status; // activates .wrapper
    status.writers = 1; // activates .writers
    status.waitToRead = 1; // activates .waitToRead
    NonRecursiveRWLock lock;
    lock.m_status = status; // invokes Status::operator uint32_t&, which uses the non-active member .wrapper, which is UB

So I think this program is invalid.

Comment 3 Kirby Zhou 2022-07-21 04:06:27 UTC
But the union is actually a union of bitfields.

union {
   int value;
   struct {
      int readers : 10;
      int waitToRead : 10;
      int writers : 10;
   };
}


And this bit-field structure pattern is widely spread in years referred as "Safe Bitfields in C++".

Such as:

https://preshing.com/20150324/safe-bitfields-in-cpp/
https://zditect.com/code/cpp/safe-bitfields-in-c.html
https://m.blog.naver.com/framkang/220414843423
https://www.reddit.com/r/programming/comments/3i6q2w/safe_bitfields_in_c/
https://www.intensedebate.com/people/UUD
https://gfjiangly.github.io/cpu_parallel/you-can-do-any-kind-of-atomic-read-modify-write-operation.html

So if gcc-11 breaks this pattern, a lot of existing code will be hurt.

Comment 4 Marek Polacek 2022-07-21 12:43:50 UTC
Ah, sorry; I missed that BitFieldMember and Wrapper actually *are* standard-layout classes, therefore the exception
about a common initial sequence (here, a uint32_t) in [class.union.general] kicks in.

I'm going to file an upstream bug, sorry again and thanks for the report.

Comment 6 Marek Polacek 2022-07-21 17:25:16 UTC
Upstream PR opened.

Comment 7 Marek Polacek 2022-07-22 13:17:43 UTC
We have to fix this upstream first.  A workaround is to use -fno-strict-aliasing.

Comment 8 Kirby Zhou 2023-06-21 09:31:31 UTC
(In reply to Marek Polacek from comment #4)
> Ah, sorry; I missed that BitFieldMember and Wrapper actually *are*
> standard-layout classes, therefore the exception
> about a common initial sequence (here, a uint32_t) in [class.union.general]
> kicks in.
> 
> I'm going to file an upstream bug, sorry again and thanks for the report.

Is there any progress in upstream?

Comment 9 Marek Polacek 2023-06-21 10:59:21 UTC
(In reply to Kirby Zhou from comment #8)
> (In reply to Marek Polacek from comment #4)
> > Ah, sorry; I missed that BitFieldMember and Wrapper actually *are*
> > standard-layout classes, therefore the exception
> > about a common initial sequence (here, a uint32_t) in [class.union.general]
> > kicks in.
> > 
> > I'm going to file an upstream bug, sorry again and thanks for the report.
> 
> Is there any progress in upstream?

It looks like the developers don't want to implement this rule; rather, users should use may_alias:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106389#c5