Bug 1940964

Summary: FTBFS: LLVM JIT related tests fail miserably on s390x: incompatible data layouts
Product: [Fedora] Fedora
Reporter: Honza Horak <hhorak>
Component: postgresql
Assignee: Filip Januš <fjanus>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 34
CC: anon.amish, devrim, fjanus, hhorak, jmlich83, panovotn, pkubat, praiskup, sguelton, tgl, tstellar
Hardware: s390x   
OS: Unspecified   
Last Closed: 2022-01-10 23:01:27 UTC
Type: Bug
Bug Blocks: 467765, 1571215, 1883001    

Description Honza Horak 2021-03-19 16:24:51 UTC
Description of problem:
https://koji.fedoraproject.org/koji/taskinfo?taskID=64122369

ERROR:  failed to JIT module: Added modules have incompatible data layouts: E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64 (module) vs E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64 (jit)


Version-Release number of selected component (if applicable):
postgresql-13.2-3.fc34

How reproducible:
Consistently, for the last few weeks.

Steps to Reproduce:
1. Rebuild the postgresql package.

Actual results:
Build fails with this error:
ERROR:  failed to JIT module: Added modules have incompatible data layouts: E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64 (module) vs E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64 (jit)

Expected results:
Build succeeds

Additional info:
This looks like it was caused by the LLVM 12 rebase, as the last build that succeeded was the one with LLVM 11.x.

Comment 1 Honza Horak 2021-03-19 16:28:10 UTC
@sguelton @tstellar Does it sound familiar?

Comment 2 Honza Horak 2021-03-19 16:28:57 UTC
Any tips on how to debug or fix this? (I'm personally clueless so far.)

Comment 3 Tom Lane 2021-03-19 19:13:57 UTC
FWIW, this is thought to work upstream, though not with PG releases older than 13.2.  Our resident expert thinks it sounds like a mismatch between libllvm and clang versions:

https://www.postgresql.org/message-id/20210319190047.7o4bwhbp5dzkqif3%40alap3.anarazel.de

Comment 4 Honza Horak 2021-03-22 16:53:50 UTC
(In reply to Tom Lane from comment #3)
> FWIW, this is thought to work upstream, though not with PG releases older
> than 13.2.  Our resident expert thinks it sounds like a mismatch between
> libllvm and clang versions:
> 
> https://www.postgresql.org/message-id/20210319190047.7o4bwhbp5dzkqif3%40alap3.anarazel.de

Thanks for the pointer, Tom.

However, I see it with these versions that do not seem to be in a mismatch:
$> rpm -q clang llvm
clang-12.0.0-0.7.rc3.fc35.s390x
llvm-12.0.0-0.7.rc3.fc35.s390x
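
For illustration, the same comparison can be scripted. This is a minimal sketch that assumes the usual name-version-release.arch form printed by `rpm -q`; the helper name `major_version` is hypothetical and not part of any packaging tool.

```python
def major_version(nvra):
    """Return the major upstream version from a name-version-release.arch string."""
    # The architecture follows the last '.', and version and release are the
    # last two '-'-separated fields, so package names containing '-' are safe.
    name_version_release, _arch = nvra.rsplit(".", 1)
    _name, version, _release = name_version_release.rsplit("-", 2)
    return int(version.split(".")[0])

# Package strings copied from the rpm -q output above.
packages = [
    "clang-12.0.0-0.7.rc3.fc35.s390x",
    "llvm-12.0.0-0.7.rc3.fc35.s390x",
]
majors = {p: major_version(p) for p in packages}
mismatch = len(set(majors.values())) > 1
print(majors, "mismatch" if mismatch else "same major version")
```

For the two packages listed, both resolve to major version 12, so this check alone does not reveal a clang/llvm mismatch.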

With F34 getting close and the plpython2 removal (https://src.fedoraproject.org/rpms/postgresql/pull-request/28) now blocked by this, I think we could disable llvmjit on s390x until this is solved, since removing plpython2 later will not be possible.

Comment 5 Honza Horak 2021-03-22 17:55:00 UTC
(In reply to Honza Horak from comment #4)
> However, I see it with these versions that do not seem to be in a mismatch:
> $> rpm -q clang llvm
> clang-12.0.0-0.7.rc3.fc35.s390x
> llvm-12.0.0-0.7.rc3.fc35.s390x

Actually, I do see an LLVM 11 artifact left in the buildroot:
llvm11-libs

annobin pulls it in, and annobin is in turn pulled in by redhat-rpm-config. I haven't investigated properly yet, but hopefully a successful rebuild of annobin will help.

Comment 6 Patrik Novotný 2021-03-23 13:25:02 UTC
I will test this in copr. Disabling JIT until this is fixed seems like a reasonable idea to me. I'll update here when I have this tested.

Comment 7 Honza Horak 2021-03-24 07:24:19 UTC
I tried rebuilding annobin in copr to get rid of llvm11-libs. postgresql still fails on s390x, but the failures look different: https://copr.fedorainfracloud.org/coprs/hhorak/test-pgsql-llvmjit/build/2092931/

FAILED (test process exited with exit code 2)

https://download.copr.fedorainfracloud.org/results/hhorak/test-pgsql-llvmjit/fedora-34-s390x/02092931-postgresql/build.log.gz

Comment 8 Honza Horak 2021-04-19 08:42:43 UTC
Even after getting rid of llvm11-libs from the buildroot (it is no longer pulled in on F34), it still does not work; the error is the same.

Comment 9 Honza Horak 2021-04-19 08:43:24 UTC
(In reply to Honza Horak from comment #8)
> Even after getting rid of llvm11-libs from the buildroot (it is not pulled
> in in F34 any more) it does not work, still same error.

Visible on the scratch build:
https://koji.fedoraproject.org/koji/taskinfo?taskID=66082182

Comment 10 Tom Stellard 2021-04-19 16:52:14 UTC
From what I can tell, this is a bug in postgresql. At runtime, it creates a JIT instance using the host CPU target, which has the DataLayout of the host.  However, when compiling JIT code, it is pulling the DataLayout from a bitcode file that is compiled at build time with no specific CPU target and thus a different DataLayout.
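
To make the mismatch concrete, the two layout strings from the error message can be compared component by component. This is only an illustrative sketch: it splits the strings on `-` and does not use LLVM itself, and the helper name `layout_components` is hypothetical.

```python
def layout_components(layout):
    """Split an LLVM data layout string into its '-'-separated components."""
    return layout.split("-")

# Both strings are copied verbatim from the error message in this report.
module_layout = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64"
jit_layout = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64"

module_parts = layout_components(module_layout)
jit_parts = layout_components(jit_layout)

# Components present on one side only are the source of the incompatibility.
only_in_jit = [c for c in jit_parts if c not in module_parts]
only_in_module = [c for c in module_parts if c not in jit_parts]

print("only in jit layout:", only_in_jit)
print("only in module layout:", only_in_module)
```

The only difference is the `v128:64` component, which in LLVM's data layout syntax gives the alignment of 128-bit vectors. It appears in the JIT's host-derived layout but not in the layout of the bitcode built without a specific CPU target.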

Comment 11 Tom Stellard 2021-04-19 20:10:02 UTC
Proposed fix for Fedora: https://src.fedoraproject.org/rpms/postgresql/pull-request/29

Comment 12 Honza Horak 2021-04-22 16:20:23 UTC
Related upstream discussion on bugs list:
https://www.postgresql.org/message-id/20210420225228.qr4x6zv3hqjorh5t%40alap3.anarazel.de

Comment 13 Filip Januš 2021-05-21 06:46:29 UTC
Same issue with postgresql 12.7:
+ERROR:  failed to JIT module: Added modules have incompatible data layouts: E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64 (module) vs E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64 (jit)

https://koji.fedoraproject.org/koji/taskinfo?taskID=68313568

The proposed workaround[1] needs to be updated before it can be applied to PostgreSQL 12.7.

[1] https://src.fedoraproject.org/rpms/postgresql/blob/41cd60000b91c121e1286c194284bffec770081b/f/postgresql-datalayout-mismatch-on-s390.patch

Comment 14 Honza Horak 2022-01-10 23:01:27 UTC
The postgresql package has been building fine on s390x for some time now, even with JIT:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1866221

Closing this for now, as it looks like it's fixed.

Comment 15 Red Hat Bugzilla 2023-09-15 01:03:44 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days