Bug 2269010

Summary: clang-18 seems to miscompile systemd code, unit tests fail
Product: [Fedora] Fedora Reporter: Zbigniew Jędrzejewski-Szmek <zbyszek>
Component: clangAssignee: Tom Stellard <tstellar>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: 40CC: agurenko, airlied, jchecahi, kkleine, npopov, sergesanspaille, siddharth.kde, tbaeder, tstellar, tuliom
Target Milestone: ---Flags: tstellar: mirror+
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-05-28 08:52:49 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Zbigniew Jędrzejewski-Szmek 2024-03-11 13:58:05 UTC
$ git branch --show-current; git describe  
v254-stable
v253.17-1-ge4f1fb16ae
$ rpm -q clang       
clang-18.1.0~rc4-2.fc40.x86_64
$ CC=clang CXX=clang++ meson setup build-clang -Dbuildtype=debugoptimized && ninja -C build-clang
...
$ build-clang/test-in-addr-util
...
/* test_in_addr_is_null */
Assertion 'in_addr_is_null(AF_INET6, &i) == true' failed at src/test/test-in-addr-util.c:172, function test_in_addr_is_null(). Aborting.

This is very basic code that hasn't changed in a long time and is regularly tested on different distros, with different compilers. No issue with gcc or with clang-17 or when optimization is disabled.


Reproducible: Always

Comment 1 Zbigniew Jędrzejewski-Szmek 2024-03-11 14:16:09 UTC
https://github.com/systemd/systemd/pull/31633

Comment 2 Timm Bäder 2024-03-11 15:53:10 UTC
Standalone reproducer: https://paste.centos.org/view/610fd24f

This aborts when using clang -O2 (clang 19 here), works with -O2 and with gcc (both O0 and O2).

CC @nikic since it only happens with optimizations.

Comment 3 Timm Bäder 2024-03-11 16:08:25 UTC
The main function just gets optimized to:

; Function Attrs: noreturn nounwind uwtable
define dso_local noundef i32 @main() local_unnamed_addr #1 {
  tail call void @abort() #3
  unreachable
}

Comment 4 Nikita Popov 2024-03-11 16:58:53 UTC
This is not related to LLVM optimizations, the difference also exists with -O2 -Xclang -disable-llvm-optzns

Looking at the diff between the two, I think the relevant part is that the union is initialed differently. With -O0 there is a copy from a constant (of the whole union). With -O2 only the first member of the union is initialized and the rest stores undef values.

I think this is https://github.com/llvm/llvm-project/issues/78034. Clangs interpretation of how union initialization with = {} works (chosen when this was a language extension, and I believe in a way that aligns with C++ semantics) does not match what was standardized in C23, which requires this to actually zero the whole union rather than just the first member.

Based on the discussion on the issue this looks like a complex issue, a full fix to that issue looks quite involved, but maybe we can figure out what change to clang caused the codegen difference (going from memcpy to full expansion) here and revert that to somewhat mitigate the issue.

Comment 5 Nikita Popov 2024-03-11 17:02:21 UTC
I had a hunch, and it looks like this patch does indeed "fix" the issue: https://github.com/llvm/llvm-project/pull/84230

(Doesn't really fix anything, but returns it to the status quo of mostly working out in practice.)

Comment 6 Tom Stellard 2024-03-12 19:26:20 UTC
Is this a blocker for getting clang18 into f40 ?

Comment 7 Zbigniew Jędrzejewski-Szmek 2024-03-13 08:45:24 UTC
That is a good question. Do you think that the issue affects other code?
If it's at least moderately common, I don't think we should publish a beta
candidate with that version of the compiler.

Comment 8 Tom Stellard 2024-03-13 13:37:38 UTC
I have not seen this issue come up in our rebuild testing, so it does not seem to be widespread.  Does systemd need clang to build in Fedora or is this failure only in upstream CI?  It seems we have a workaround that we could backport now.  I could pull this workaround into f40 now if we are going to block the update on it, but if we aren't going to block the update, I'd prefer to wait for something to land in upstream rather than doing something Fedora specific.

Comment 9 Zbigniew Jędrzejewski-Szmek 2024-03-13 13:52:09 UTC
No, systemd does not use clang. We compile it with clang for testing only, and in this particular case, I did it to test the clang update on Fedora.

OK, so let's merge this.

Comment 10 Nikita Popov 2024-05-28 08:52:49 UTC
LLVM with the fix has been updated in rawhide and f40 since, so this should be fixed.