Bug 2279527
Summary: [Tracker for bug https://bugzilla.redhat.com/show_bug.cgi?id=2297113] [IBM Z] rook-ceph-mgr and rook-ceph-mon pods are in CrashLoopBackOff state (dynamic linker and libtcmalloc recursive call loop)

Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Component: unclassified
Version: 4.16
Hardware: s390x
OS: Unspecified
Severity: urgent
Priority: urgent
Status: CLOSED CURRENTRELEASE
Reporter: Maya Anilson <manilson>
Assignee: Ken Dreyer (Red Hat) <kdreyer>
QA Contact: Elad <ebenahar>
Docs Contact:
CC: akandath, amakarau, bhubbard, bkunal, bniver, fweimer, gsitlani, jcaratza, kramdoss, kseeger, mcaldeir, mschaefe, muagarwa, nojha, odf-bz-bot, prsurve, rlaberin, rzarzyns, sapillai, sheggodu, sostapov, srai, tnielsen, tstober
Keywords: TestBlocker
Target Milestone: ---
Target Release: ---
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 2297111 (view as bug list)
Environment:
Last Closed: 2024-09-30 06:41:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2297111, 2297113
Attachments:
Description
Maya Anilson
2024-05-07 11:27:35 UTC
I am seeing this issue on the IBM Z platform. Reproduced the issue on a fresh 4.16.0-97 installation of ODF.

[root@m4204001 ~]# oc -n openshift-storage get pod
NAME                                                      READY   STATUS                  RESTARTS        AGE
csi-addons-controller-manager-758fb9b869-tmz7f            2/2     Running                 0               37m
csi-cephfsplugin-5xh7f                                    2/2     Running                 1 (9m33s ago)   10m
csi-cephfsplugin-hshlx                                    2/2     Running                 0               10m
csi-cephfsplugin-provisioner-75874b57d9-4dxr7             6/6     Running                 0               10m
csi-cephfsplugin-provisioner-75874b57d9-ptk8q             6/6     Running                 2 (9m19s ago)   10m
csi-cephfsplugin-sjcc2                                    2/2     Running                 1 (9m37s ago)   10m
csi-rbdplugin-cmsmj                                       3/3     Running                 1 (9m37s ago)   10m
csi-rbdplugin-g9kf6                                       3/3     Running                 0               10m
csi-rbdplugin-provisioner-5d7bd5f786-pfn7q                6/6     Running                 2 (9m24s ago)   10m
csi-rbdplugin-provisioner-5d7bd5f786-pp8jh                6/6     Running                 1 (9m34s ago)   10m
csi-rbdplugin-tgxb8                                       3/3     Running                 1 (9m33s ago)   10m
noobaa-operator-8589d9d66d-57dv7                          1/1     Running                 0               39m
ocs-client-operator-console-65776cbdb6-5hgxj              1/1     Running                 0               39m
ocs-client-operator-controller-manager-6ff7679cc7-gr5qb   2/2     Running                 0               39m
ocs-operator-7d787f6d56-plmlx                             1/1     Running                 0               39m
odf-console-7d78c85cf8-qgqrn                              1/1     Running                 0               39m
odf-operator-controller-manager-5c84f48f7b-zhvfm          2/2     Running                 0               39m
rook-ceph-mon-a-f9885955d-7hxvh                           0/2     Init:CrashLoopBackOff   6 (3m21s ago)   9m11s
rook-ceph-operator-bf7656f9c-mbml2                        1/1     Running                 0               38m
ux-backend-server-86d577447c-4k8b8                        2/2     Running                 0               39m
[root@m4204001 ~]#

[root@m4204001 ~]# oc -n openshift-storage get csv
NAME                                        DISPLAY                            VERSION            REPLACES   PHASE
mcg-operator.v4.16.0-97.stable              NooBaa Operator                    4.16.0-97.stable              Succeeded
ocs-client-operator.v4.16.0-97.stable       OpenShift Data Foundation Client   4.16.0-97.stable              Succeeded
ocs-operator.v4.16.0-97.stable              OpenShift Container Storage        4.16.0-97.stable              Succeeded
odf-csi-addons-operator.v4.16.0-97.stable   CSI Addons                         4.16.0-97.stable              Succeeded
odf-operator.v4.16.0-97.stable              OpenShift Data Foundation          4.16.0-97.stable              Succeeded
odf-prometheus-operator.v4.16.0-97.stable   Prometheus Operator                4.16.0-97.stable              Succeeded
recipe.v4.16.0-97.stable                    Recipe                             4.16.0-97.stable              Succeeded
rook-ceph-operator.v4.16.0-97.stable        Rook-Ceph                          4.16.0-97.stable              Succeeded
[root@m4204001 ~]#

Created attachment 2032493 [details]
rook-ceph-operator-log
Created attachment 2032494 [details]
ocs-operator-log
Created attachment 2033136 [details]
rook-ceph-operator-log
[root@m4204001 ~]# oc logs rook-ceph-mon-a-57745d545f-txldj
Defaulted container "mon" out of: mon, log-collector, chown-container-data-dir (init), init-mon-fs (init)
Error from server (BadRequest): container "mon" in pod "rook-ceph-mon-a-57745d545f-txldj" is waiting to start: PodInitializing

[root@m4204001 ~]# oc logs rook-ceph-mon-a-57745d545f-txldj -c init-mon-fs -p
unable to retrieve container logs for cri-o://e86b050ca568944fe6188157184e6e8835684aac1ba4b34348f00434a3c277b0

I have looked at the rook-ceph-mon-x deployments and tried to pull the image referenced in their initContainers - these are the ones actually crashlooping.

$ oc get pod -l app=rook-ceph-mon
NAME                               READY   STATUS                  RESTARTS         AGE
rook-ceph-mon-a-95d644fc5-dmw6z    0/2     Init:CrashLoopBackOff   27 (3m10s ago)   116m
rook-ceph-mon-b-d74f79cf9-lb4x7    0/2     Init:CrashLoopBackOff   23 (2m37s ago)   95m
rook-ceph-mon-c-5cd7cfc5d9-qpp2f   0/2     Init:CrashLoopBackOff   19 (2m42s ago)   75m

$ oc describe pod/rook-ceph-mon-a-95d644fc5-dmw6z
...
  init-mon-fs:
    Container ID:  cri-o://c48ce0fc8733f297c6232dbe01ee221bd167025f3722428435d403801a7855f1
    Image:         registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2
    Image ID:      registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:40cfef4cf12f20a20344ba101a60004f437ba3e862ddf9ca042ed1fef8a7be3e
...
Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       117m                   default-scheduler  Successfully assigned openshift-storage/rook-ceph-mon-a-95d644fc5-dmw6z to worker-2.odf-ci-2.test.ocs
  Normal   AddedInterface  117m                   multus             Add eth0 [10.128.2.22/23] from ovn-kubernetes
  Normal   Pulled          117m                   kubelet            Container image "registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2" already present on machine
  Normal   Created         117m                   kubelet            Created container chown-container-data-dir
  Normal   Started         117m                   kubelet            Started container chown-container-data-dir
  Normal   Pulled          115m (x5 over 117m)    kubelet            Container image "registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2" already present on machine
  Normal   Created         115m (x5 over 117m)    kubelet            Created container init-mon-fs
  Normal   Started         115m (x5 over 117m)    kubelet            Started container init-mon-fs
  Warning  BackOff         2m9s (x531 over 117m)  kubelet            Back-off restarting failed container init-mon-fs in pod rook-ceph-mon-a-95d644fc5-dmw6z_openshift-storage(f8626ff3-1fc7-4a57-ae16-68a9d8812589)

Pulling registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2 gives me a "manifest unknown" error:

$ podman pull registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2
Trying to pull registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2...
Error: initializing source docker://registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2: reading manifest sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2 in registry.redhat.io/rhceph/rhceph-7-rhel9: manifest unknown

It cannot be an authentication issue on my side. Obviously the image reference is stale or erroneous:

$ podman pull registry.redhat.io/rhceph/rhceph-7-rhel9:latest
Trying to pull registry.redhat.io/rhceph/rhceph-7-rhel9:latest...
Getting image source signatures
Checking if image destination supports signatures
Copying blob b5e915a48307 skipped: already exists
Copying blob 9ef500aca087 skipped: already exists
Copying config 4339069a48 done
Writing manifest to image destination
Storing signatures
4339069a488e6e962b369f26db774f36592c0c971cf4a383dad03233c6b0befc

The mon pods are pulling the image successfully and starting up, but the init container "init-mon-fs" is failing. The question is why this init container is failing, as it is not returning its logs, as Maya reported. We really need this log to show what the ceph init-mon-fs failure is.

[root@m4204001 ~]# oc logs rook-ceph-mon-a-57745d545f-txldj -c init-mon-fs -p
unable to retrieve container logs for cri-o://e86b050ca568944fe6188157184e6e8835684aac1ba4b34348f00434a3c277b0

Does it make a difference without the -p flag?

# oc logs rook-ceph-mon-a-57745d545f-txldj -c init-mon-fs

No output for:

[root@m4204001 ~]# oc logs rook-ceph-mon-a-57745d545f-c488g -c init-mon-fs
[root@m4204001 ~]#

[root@m4204001 ~]# oc exec -it rook-ceph-mon-a-58b4cd5b8b-w2tm4 -c init-mon-fs sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
sh-5.1# ceph-mon --fsid=543968ca-b0df-4cae-9dec-0ad47bcc8e58 --keyring=/etc/ceph/keyring-store/keyring --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true --default-log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false --mon-host=[v2:172.31.137.183:3300],[v2:172.31.202.37:3300],[v2:172.31.15.98:3300] --mon-initial-members=a,c,b --id=a --setuser=ceph --setgroup=ceph --public-addr=172.31.137.183 --mkfs
Segmentation fault (core dumped)

Had a debug session with @Subham Rai; these are the outputs collected during that session:

oc debug node/worker-1.m1301015.lnxero1.boe
Starting pod/worker-1m1301015lnxero1boe-debug-85x8x ...
To use host binaries, run `chroot /host`
Pod IP: 172.23.232.81
If you don't see a command prompt, try pressing enter.
sh-4.4#
sh-4.4# chroot /host
sh-5.1# crictl ps -a | grep mon
513ba3caa4083   6338836cc3a4f7734362a7bd20788f87c98d8fd610981d6622266020ff498770   About a minute ago   Exited   init-mon-fs   17   db72a799f0d41   rook-ceph-mon-c-7f8c67dd6c-fvr6w
sh-5.1# crictl logs 513ba3caa4083
sh-5.1#
sh-5.1#
sh-5.1# crictl logs 513ba3caa4083
sh-5.1#
sh-5.1#
sh-5.1# crictl start 513ba3caa4083
E0522 10:55:00.507152 1309519 remote_runtime.go:343] "StartContainer from runtime service failed" err="rpc error: code = Unknown desc = container 513ba3caa4083808b02111ca9f1a408469bdfd7e981b92d0eb6b84fe5e513945 is not in created state: stopped" containerID="513ba3caa4083"
FATA[0000] starting the container "513ba3caa4083": rpc error: code = Unknown desc = container 513ba3caa4083808b02111ca9f1a408469bdfd7e981b92d0eb6b84fe5e513945 is not in created state: stopped
sh-5.1# cd coredump/
sh-5.1# ls
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1252306.1716371438000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1252367.1716371440000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1252671.1716371455000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1253159.1716371482000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1253796.1716371524000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1255256.1716371619000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1257763.1716371788000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1262272.1716372092000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1266858.1716372407000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1271352.1716372712000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1275908.1716373024000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1280446.1716373334000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1284893.1716373637000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1289307.1716373939000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1293815.1716374242000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1298285.1716374548000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1302719.1716374850000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1307473.1716375163000000.zst

[root@m1301015 ~]# oc exec rook-ceph-operator-747b8d84cc-bcqgm -it sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
sh-5.1$
sh-5.1$ ceph crash ls
Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')
sh-5.1$
sh-5.1$ ceph crash ls --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --connect-timeout=5
timed out
sh-5.1$

I am uploading the coredumps.

Created attachment 2034584 [details]
coredump
Created attachment 2034585 [details]
coredump-1
Created attachment 2034586 [details]
coredump-2
This issue is a test blocker for IBM Z. Per Maya, all deployments run into this issue. They are unable to proceed with any testing on 4.16.

Created attachment 2034587 [details]
coredump-3
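For anyone picking up the attached coredumps, here is a minimal sketch of how they can be inspected, assuming zstd, gdb, and the matching debuginfo packages are available inside the same rhceph-7-rhel9 container that produced the crash (the debuginfo repo setup and the gperftools-libs package name are assumptions; the core file name is just one of the attached files):

$ zstd -d core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1252306.1716371438000000.zst
$ dnf debuginfo-install ceph-mon glibc gperftools-libs   # assumes debuginfo repos are reachable
$ gdb /usr/bin/ceph-mon core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1252306.1716371438000000
(gdb) bt
(gdb) info sharedlibrary tcmalloc

With symbols installed, the backtrace should match the _dl_update_slotinfo / tc_free recursion captured in the live gdb session further below.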
Just to clarify, this problem is only occurring on the IBM Z platform and is not seen on the IBM P platform.

Current update: this might be an issue on the IBM Z side and is being reviewed by that team. No action item on ODF/Ceph engineering at the moment.

Managed to deploy ODF 4.16.0-110.stable successfully after changing the Ceph image to "quay.io/rhceph-dev/rhceph/rhceph-7-rhel9:cda4d8682b12f13ce90211cad773100c32584b6bcea33a6cb69a66d9aece86f5", which has RHEL 9.3 and Ceph version 18.2.1-136. The default image used by ODF has RHEL 9.4 and Ceph version 18.2.1-188. Not sure what is causing the issue here. Is it possible to get the latest Ceph image built on RHEL 9.3?

Simply executing the ceph-mon, ceph-osd, and ceph-mgr commands without any arguments on this container image throws a segmentation fault:

- registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:17e899c9c4f2f64bc7acea361446a64927b829d6766e6dde42f8d0336b9125a4
- quay.io/rhceph-dev/rhceph/rhceph-7-rhel9@sha256:17e899c9c4f2f64bc7acea361446a64927b829d6766e6dde42f8d0336b9125a4

[root@6ea84ace2311 /]# ceph-mon
Segmentation fault (core dumped)
[root@6ea84ace2311 /]#
[root@6ea84ace2311 /]# ceph-osd
Segmentation fault (core dumped)
[root@6ea84ace2311 /]#
[root@6ea84ace2311 /]# ceph-mgr
Segmentation fault (core dumped)
[root@6ea84ace2311 /]#

[root@m1301015 ~]# podman run -it quay.io/rhceph-dev/rhceph/rhceph-7-rhel9:cda4d8682b12f13ce90211cad773100c32584b6bcea33a6cb69a66d9aece86f5 bash
[root@82bfd8a29a24 /]#
[root@82bfd8a29a24 /]# ceph -v
ceph version 18.2.1-136.el9cp (e7edde2b655d0dd9f860dda675f9d7954f07e6e3) reef (stable)
[root@82bfd8a29a24 /]#
[root@82bfd8a29a24 /]# ceph-mon
ceph-mon: -h or --help for usage
[root@82bfd8a29a24 /]#
[root@82bfd8a29a24 /]# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.3 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.3"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.3 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.3"
[root@82bfd8a29a24 /]#

[root@m1301015 ~]# podman run -it quay.io/rhceph-dev/rhceph/rhceph-7-rhel9:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2 bash
[root@fa200b32eb21 /]#
[root@fa200b32eb21 /]# ceph -v
ceph version 18.2.1-167.el9cp (e8c836edb24adb7717a6c8ba1e93a07e3efede29) reef (stable)
[root@fa200b32eb21 /]#
[root@fa200b32eb21 /]# ceph-mon
Segmentation fault (core dumped)
[root@fa200b32eb21 /]#
[root@fa200b32eb21 /]# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.4 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.4 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.4
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.4"
[root@fa200b32eb21 /]#

The tcmalloc libs are identical in both images:
quay.io/rhceph-dev/rhceph/rhceph-7-rhel9:cda4d8682b12f13ce90211cad773100c32584b6bcea33a6cb69a66d9aece86f5
79acb80d055cec93ba16ccdc3fde97bd  /lib64/libtcmalloc.so.4.5.9

quay.io/rhceph-dev/rhceph/rhceph-7-rhel9:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2
79acb80d055cec93ba16ccdc3fde97bd  /lib64/libtcmalloc.so.4.5.9

GDB debug:

sh-5.1# gdb /usr/bin/ceph-mon
GNU gdb (GDB) Red Hat Enterprise Linux 10.2-13.el9
...
Reading symbols from /usr/bin/ceph-mon...
Reading symbols from .gnu_debugdata for /usr/bin/ceph-mon...
(No debugging symbols found in .gnu_debugdata for /usr/bin/ceph-mon)
Missing separate debuginfos, use: dnf debuginfo-install ceph-mon-18.2.1-167.el9cp.s390x
(gdb) r
Starting program: /usr/bin/ceph-mon
warning: Error disabling address space randomization: Function not implemented
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
_dl_update_slotinfo (req_modid=3, new_gen=2) at dl-tls.c:728
728     dl-tls.c: No such file or directory.
(gdb) bt
#0  _dl_update_slotinfo (req_modid=3, new_gen=2) at dl-tls.c:728
#1  0x000003ff9861315c in update_get_addr (ti=0x3ff979d0d10, gen=<optimized out>) at dl-tls.c:922
#2  0x000003ff979bab34 in tc_free () from /lib64/libtcmalloc.so.4
#3  0x000003ff98613072 in free (ptr=<optimized out>) at ../include/rtld-malloc.h:50
#4  _dl_update_slotinfo (req_modid=3, new_gen=2) at dl-tls.c:828
#5  0x000003ff9861315c in update_get_addr (ti=0x3ff979d0d10, gen=<optimized out>) at dl-tls.c:922
#6  0x000003ff979bab34 in tc_free () from /lib64/libtcmalloc.so.4
#7  0x000003ff98613072 in free (ptr=<optimized out>) at ../include/rtld-malloc.h:50
#8  _dl_update_slotinfo (req_modid=3, new_gen=2) at dl-tls.c:828
#9  0x000003ff9861315c in update_get_addr (ti=0x3ff979d0d10, gen=<optimized out>) at dl-tls.c:922
#10 0x000003ff979bab34 in tc_free () from /lib64/libtcmalloc.so.4
#11 0x000003ff98613072 in free (ptr=<optimized out>) at ../include/rtld-malloc.h:50
#12 _dl_update_slotinfo (req_modid=3, new_gen=2) at dl-tls.c:828
...

Here is the dmesg output for the system with failing ceph:

[526151.580822]  [<000003ff7fc13156>] __dm_stat_init_temporary_percpu_totals+0xe6/0x270 [dm_mod]
[526462.504134] User process fault: interruption code 0011 ilc:3 in libtcmalloc.so.4.5.9[3ff9a400000+4e000]
[526462.504148] Failing address: 000003ffd997a000 TEID: 000003ffd997a400
[526462.504150] Fault in primary space mode while using user ASCE.
[526462.504152] AS:00000005512d81c7 R3:00000003ff500007 S:000000033bc2d800 P:0000000000000400
[526462.504158] CPU: 15 PID: 3553346 Comm: ceph-mon Not tainted 5.14.0-427.el9.s390x #1
[526462.504162] Hardware name: IBM 3906 M04 701 (z/VM 7.3.0)
[526462.504164] User PSW : 0705200180000000 000003ff9a43ab00
[526462.504166]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
[526462.504169] User GPRS: fffffffffffff001 000003ff9a43ab00 0000000000000000 0000000000000002
[526462.504171]            0000000000000002 000003ff9b0e5d00 000000000000000c 000003ff9b0e65e0
[526462.504172]            000003ff9b0e6690 000000000000000b 000003ff9b0e8320 000000000000000c
[526462.504174]            000003ff9a44ff38 0000000000000000 000003ff9b093072 000003ffd997af88
[526462.504183] User Code: 000003ff9a43aafa: 0707          bcr  0,%r7
                           000003ff9a43aafc: 0707          bcr  0,%r7
                          #000003ff9a43aafe: 0707          bcr  0,%r7
                          >000003ff9a43ab00: eb9ff0480024  stmg %r9,%r15,72(%r15)
                           000003ff9a43ab06: c0c00000aa19  larl %r12,000003ff9a44ff38
                           000003ff9a43ab0c: e3f0ff60ff71  lay  %r15,-160(%r15)
                           000003ff9a43ab12: c4180000b38f  lgrl %r1,000003ff9a451230
                           000003ff9a43ab18: ec160058007c  cgij %r1,0,6,000003ff9a43abc8
[526462.504201] Last Breaking-Event-Address:
[526462.504202]  [<000003ff9b093070>] 0x3ff9b093070
[526772.494593] User process fault: interruption code 0011 ilc:3 in libtcmalloc.so.4.5.9[3ffad400000+4e000]
[526772.494611] Failing address: 000003ffd70fa000 TEID: 000003ffd70fa400
[526772.494612] Fault in primary space mode while using user ASCE.
[526772.494615] AS:000000060b0b41c7 R3:000000055f1a4007 S:00000002539eb800 P:0000000000000400
[526772.494620] CPU: 4 PID: 3557863 Comm: ceph-mon Not tainted 5.14.0-427.el9.s390x #1
[526772.494622] Hardware name: IBM 3906 M04 701 (z/VM 7.3.0)
[526772.494624] User PSW : 0705200180000000 000003ffad43ab00
[526772.494625]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
[526772.494628] User GPRS: fffffffffffff001 000003ffad43ab00 0000000000000000 0000000000000002
[526772.494630]            0000000000000002 000003ffae0e5d00 000000000000000c 000003ffae0e65e0
[526772.494631]            000003ffae0e6690 000000000000000b 000003ffae0e8320 000000000000000c
[526772.494633]            000003ffad44ff38 0000000000000000 000003ffae093072 000003ffd70faf68
[526772.494640] User Code: 000003ffad43aafa: 0707          bcr  0,%r7
                           000003ffad43aafc: 0707          bcr  0,%r7
                          #000003ffad43aafe: 0707          bcr  0,%r7
                          >000003ffad43ab00: eb9ff0480024  stmg %r9,%r15,72(%r15)
                           000003ffad43ab06: c0c00000aa19  larl %r12,000003ffad44ff38
                           000003ffad43ab0c: e3f0ff60ff71  lay  %r15,-160(%r15)
                           000003ffad43ab12: c4180000b38f  lgrl %r1,000003ffad451230
                           000003ffad43ab18: ec160058007c  cgij %r1,0,6,000003ffad43abc8
[526772.494657] Last Breaking-Event-Address:
[526772.494657]  [<000003ffae093070>] 0x3ffae093070

Adding comments from a Slack conversation to this:

Radosław Zarzyński: there is already a very clear indicator that there is a clash between tcmalloc (which is unsupported upstream on s390, BTW) and the dynamic linker of libc.

Thomas Stober: We see different behavior between ceph 18.2.1-136 on RHEL 9.3 and ceph 18.2.1-188 on RHEL 9.4, and are trying to find out what has changed. (The first works fine, the second doesn't.)

I tested this in an s390x VM. In the RHEL 9.3 container (cda4d8682b12f13ce90211cad773100c32584b6bcea33a6cb69a66d9aece86f5), when I update glibc, ceph-mon crashes. This happens for both ceph-mon-18.2.1-136.el9cp and ceph-mon-18.2.1-167.el9cp.

Works:   glibc-2.34-83.el9_3.12.s390x
Crashes: glibc-2.34-100.el9_4.2.s390x

glibc-2.34-100.el9 also crashes. CentOS Stream has more granular glibc builds, so I was able to bisect this further.
glibc-2.34-87.el9 (https://kojihub.stream.centos.org/koji/buildinfo?buildID=40721) is the first glibc build that causes Ceph to crash. The change is https://gitlab.com/redhat/centos-stream/rpms/glibc/-/commit/2ea2e4b80215f5f1eb5146d5cab677b4357780e0

Try reproducing under valgrind?

$ valgrind --trace-children=yes --show-reachable=yes --track-origins=yes --read-var-info=yes --tool=memcheck --leak-check=full --num-callers=50 -v --log-file=leaky.log /usr/bin/ceph-mon

Created attachment 2035781 [details]
valgrind leaky.log with glibc-2.34-100.el9_4.2
To generate this log, I started with the RHEL 9.3 container image (with ceph-18.2.1-136.el9cp.s390x), then updated glibc to a version that leads to the crash (glibc-2.34-100.el9_4.2). Interestingly, ceph-mon did not crash when I ran it under valgrind.
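A minimal sketch of that reproduction path outside of ODF, using the image tag and glibc versions quoted in the comments above (the container hostname is a placeholder, and access to a repository carrying the newer RHEL 9.4 glibc build is an assumption):

$ podman run -it quay.io/rhceph-dev/rhceph/rhceph-7-rhel9:cda4d8682b12f13ce90211cad773100c32584b6bcea33a6cb69a66d9aece86f5 bash
[root@container /]# rpm -q glibc          # glibc-2.34-83.el9_3.12.s390x here; ceph-mon only prints its usage hint
[root@container /]# ceph-mon
ceph-mon: -h or --help for usage
[root@container /]# dnf -y update glibc   # pulls in glibc-2.34-100.el9_4.2 from a RHEL 9.4 repo (assumption)
[root@container /]# ceph-mon
Segmentation fault (core dumped)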
It's a bit of a stab in the dark, but I made a test build with a potential glibc-side workaround:

https://people.redhat.com/~fweimer/dUJsQxSPin50/glibc-2.34-100.el9_4.2.0.0.testfix.1.RHEL36148/

You can update a test system using:

yum update --nogpgcheck --repofrompath=test,https://people.redhat.com/~fweimer/dUJsQxSPin50/glibc-2.34-100.el9_4.2.0.0.testfix.1.RHEL36148/

There is an upstream discussion about the phenomenon:

New TLS usage in libgcc_s.so.1, compatibility impact
<https://inbox.sourceware.org/gcc/8734v1ieke.fsf@oldenburg.str.redhat.com/>

The root cause is a malloc replacement that uses dynamic TLS for its thread-local data structures. For compatibility reasons, glibc uses malloc to allocate its internal dynamic TLS data structures. This circularity obviously leads to problems. The proper fix is to build malloc replacements with initial-exec (static) TLS. The patch in the test build avoids calling free in some cases. For the previously encountered issue, it was sufficient, but obviously it does not address the root cause.

(In reply to Florian Weimer from comment #56)
> It's a bit of a stab in the dark, but I made a test build with a potential
> glibc-side workaround: [...]

Thanks very much for the information Florian, much appreciated, and I will test this shortly. I'm wondering why we are only seeing this so far on s390x? Would you have any thoughts on that?

I proposed the glibc-side workaround upstream:

[PATCH] elf: Avoid some free (NULL) calls in _dl_update_slotinfo
<https://inbox.sourceware.org/libc-alpha/87bk4i2yxn.fsf@oldenburg.str.redhat.com/>

I'm not sure if upstream will accept this patch. For RHEL integration, it's probably best to file a separate RHEL ticket on issues.redhat.com. The ticket linked here (RHEL-39415) may turn out to be a duplicate, but it concerns a completely different piece of software with a different interposed malloc. I don't know how quickly the reporters for RHEL-39415 will be able to verify the workaround, so a separate ticket makes sense.

In parallel, you really should start building tcmalloc with -ftls-model=initial-exec. The glibc-side workaround probably does not cover all eventualities (and some crashes might not even be new).

We copy tcmalloc from epel9 (https://src.fedoraproject.org/rpms/gperftools/tree/epel9). What change do we need to make there? I wonder if Tom's gperftools-2.7.90-disable-generic-dynamic-tls.patch is relevant?

(In reply to Ken Dreyer (Red Hat) from comment #63)
> We copy tcmalloc from epel9
> (https://src.fedoraproject.org/rpms/gperftools/tree/epel9).
> What change do we need to make there? I wonder if Tom's
> gperftools-2.7.90-disable-generic-dynamic-tls.patch is relevant?

It seems to go in the opposite direction of what's required. You really need to build tcmalloc in such a way that it only uses static TLS (so no references to symbols like __tls_get_addr, __tls_get_addr_opt, __tls_get_offset). Maybe Tom's patch is relevant to other parts of gperftools, those that are expected to be loaded by dlopen. (We encountered a similar mix in lttng.) But building tcmalloc with global-dynamic TLS does not make sense because it cannot be dlopen'ed anyway, so using initial-exec TLS would be completely fine.

(In reply to Ken Dreyer (Red Hat) from comment #63)
> We copy tcmalloc from epel9
> (https://src.fedoraproject.org/rpms/gperftools/tree/epel9). What change do
> we need to make there? I wonder if Tom's
> gperftools-2.7.90-disable-generic-dynamic-tls.patch is relevant?

Hi Ken, did I understand it correctly that you copied the source code of tcmalloc from epel9 and built it for Ceph? If this is the case, could you please build it again with the option -ftls-model=initial-exec?

Hi Team,

Tom has built the updated tcmalloc package:
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-4e05a33ad0

Does anybody know who is the contact for:

1) the RHEL 9.4 build?
2) the UBI9 build?

Thank you all!!!

(In reply to Aliaksei Makarau (IBM) from comment #66)
> Tom has built the updated tcmalloc package:
> https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-4e05a33ad0
>
> Does anybody know who is the contact for the:
> 1) RHEL 9.4 build?
> 2) UBI9 build?

The gperftools package is not part of RHEL/UBI. It is built as part of Ceph and maintained by the Ceph team. Ken Dreyer did the last build/import from Fedora.
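As a follow-up, here is a small sketch of passing the flag to a gperftools/tcmalloc rebuild and then checking that the resulting library no longer uses dynamic TLS. The configure invocation is only an illustration; the exact spec-file change is left to the package maintainer, and the symbol check is based on the symbols Florian lists above:

# Build tcmalloc so its thread-local data uses initial-exec (static) TLS.
$ ./configure CFLAGS="-O2 -ftls-model=initial-exec" CXXFLAGS="-O2 -ftls-model=initial-exec"
$ make

# A library restricted to static TLS should no longer import the dynamic TLS
# resolver: __tls_get_offset on s390x, __tls_get_addr on most other arches.
$ nm -D /lib64/libtcmalloc.so.4.5.9 | grep -i tls_get
# On the failing image this is expected to show an undefined (U) reference;
# after a rebuild with -ftls-model=initial-exec there should be no output.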