Bug 2279527

Summary: [Tracker for bug https://bugzilla.redhat.com/show_bug.cgi?id=2297113] [IBM Z] rook-ceph-mgr and rook-ceph-mon pods are in CrashLoopBackOff state (dynamic linker and libtcmalloc recursive call loop)
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Maya Anilson <manilson>
Component: unclassified
Assignee: Ken Dreyer (Red Hat) <kdreyer>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: urgent
Priority: urgent
Version: 4.16
CC: akandath, amakarau, bhubbard, bkunal, bniver, fweimer, gsitlani, jcaratza, kramdoss, kseeger, mcaldeir, mschaefe, muagarwa, nojha, odf-bz-bot, prsurve, rlaberin, rzarzyns, sapillai, sheggodu, sostapov, srai, tnielsen, tstober
Keywords: TestBlocker
Target Milestone: ---
Target Release: ---
Hardware: s390x
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2297111
Environment:
Last Closed: 2024-09-30 06:41:03 UTC
Type: Bug
Regression: ---
Embargoed:
Bug Depends On:
Bug Blocks: 2297111, 2297113
Attachments:
- rook-ceph-operator-log
- ocs-operator-log
- rook-ceph-operator-log
- coredump
- coredump-1
- coredump-2
- coredump-3
- valgrind leaky.log with glibc-2.34-100.el9_4.2

Description Maya Anilson 2024-05-07 11:27:35 UTC
Description of problem (please be as detailed as possible and provide log snippets):


rook-ceph-mgr-a-95fb79f8f-htnx6                                   2/3     CrashLoopBackOff        5 (41s ago)    3m48s
rook-ceph-mgr-b-67c98446fd-fxzhp                                  2/3     CrashLoopBackOff        5 (19s ago)    3m30s
rook-ceph-mon-a-56b6499f84-h2f4k                                  2/2     Running                 0              25h
rook-ceph-mon-b-76c5468d48-d8b2w                                  2/2     Running                 0              25h
rook-ceph-mon-e-6d47c4b7fb-tlvfl                                  0/2     Init:CrashLoopBackOff   3 (16s ago)    76s


Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  BackOff           8s (x7 over 100s)   kubelet            Back-off restarting failed container init-mon-fs in pod rook-ceph-mon-e-6d47c4b7fb-tlvfl_openshift-storage(973bc8c3-9360-4f75-b617-c03122b5c788)



Version of all relevant components (if applicable):
ODF 4.16.0-94

rook-ceph-operator.v4.16.0-94.stable        Rook-Ceph                          4.16.0-94.stable   rook-ceph-operator.v4.16.0-93.stable        Succeeded
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?

no

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:


Expected results:

The pods should be in Running state.


Additional info:

Comment 3 Maya Anilson 2024-05-07 13:42:00 UTC
I am seeing this issue on the IBM Z platform.

Comment 7 Abdul Kandathil (IBM) 2024-05-10 09:05:34 UTC
Reproduced the issue on a fresh installation of ODF 4.16.0-97.

[root@m4204001 ~]# oc -n openshift-storage get pod
NAME                                                      READY   STATUS                  RESTARTS        AGE
csi-addons-controller-manager-758fb9b869-tmz7f            2/2     Running                 0               37m
csi-cephfsplugin-5xh7f                                    2/2     Running                 1 (9m33s ago)   10m
csi-cephfsplugin-hshlx                                    2/2     Running                 0               10m
csi-cephfsplugin-provisioner-75874b57d9-4dxr7             6/6     Running                 0               10m
csi-cephfsplugin-provisioner-75874b57d9-ptk8q             6/6     Running                 2 (9m19s ago)   10m
csi-cephfsplugin-sjcc2                                    2/2     Running                 1 (9m37s ago)   10m
csi-rbdplugin-cmsmj                                       3/3     Running                 1 (9m37s ago)   10m
csi-rbdplugin-g9kf6                                       3/3     Running                 0               10m
csi-rbdplugin-provisioner-5d7bd5f786-pfn7q                6/6     Running                 2 (9m24s ago)   10m
csi-rbdplugin-provisioner-5d7bd5f786-pp8jh                6/6     Running                 1 (9m34s ago)   10m
csi-rbdplugin-tgxb8                                       3/3     Running                 1 (9m33s ago)   10m
noobaa-operator-8589d9d66d-57dv7                          1/1     Running                 0               39m
ocs-client-operator-console-65776cbdb6-5hgxj              1/1     Running                 0               39m
ocs-client-operator-controller-manager-6ff7679cc7-gr5qb   2/2     Running                 0               39m
ocs-operator-7d787f6d56-plmlx                             1/1     Running                 0               39m
odf-console-7d78c85cf8-qgqrn                              1/1     Running                 0               39m
odf-operator-controller-manager-5c84f48f7b-zhvfm          2/2     Running                 0               39m
rook-ceph-mon-a-f9885955d-7hxvh                           0/2     Init:CrashLoopBackOff   6 (3m21s ago)   9m11s
rook-ceph-operator-bf7656f9c-mbml2                        1/1     Running                 0               38m
ux-backend-server-86d577447c-4k8b8                        2/2     Running                 0               39m
[root@m4204001 ~]#


[root@m4204001 ~]# oc -n openshift-storage get csv
NAME                                        DISPLAY                            VERSION            REPLACES   PHASE
mcg-operator.v4.16.0-97.stable              NooBaa Operator                    4.16.0-97.stable              Succeeded
ocs-client-operator.v4.16.0-97.stable       OpenShift Data Foundation Client   4.16.0-97.stable              Succeeded
ocs-operator.v4.16.0-97.stable              OpenShift Container Storage        4.16.0-97.stable              Succeeded
odf-csi-addons-operator.v4.16.0-97.stable   CSI Addons                         4.16.0-97.stable              Succeeded
odf-operator.v4.16.0-97.stable              OpenShift Data Foundation          4.16.0-97.stable              Succeeded
odf-prometheus-operator.v4.16.0-97.stable   Prometheus Operator                4.16.0-97.stable              Succeeded
recipe.v4.16.0-97.stable                    Recipe                             4.16.0-97.stable              Succeeded
rook-ceph-operator.v4.16.0-97.stable        Rook-Ceph                          4.16.0-97.stable              Succeeded
[root@m4204001 ~]#

Comment 8 Maya Anilson 2024-05-10 11:19:39 UTC
Created attachment 2032493 [details]
rook-ceph-operator-log

Comment 9 Maya Anilson 2024-05-10 11:21:04 UTC
Created attachment 2032494 [details]
ocs-operator-log

Comment 12 Maya Anilson 2024-05-14 09:53:57 UTC
Created attachment 2033136 [details]
rook-ceph-operator-log

Comment 18 Maya Anilson 2024-05-16 12:25:20 UTC
[root@m4204001 ~]# oc logs rook-ceph-mon-a-57745d545f-txldj 
Defaulted container "mon" out of: mon, log-collector, chown-container-data-dir (init), init-mon-fs (init)
Error from server (BadRequest): container "mon" in pod "rook-ceph-mon-a-57745d545f-txldj" is waiting to start: PodInitializing

[root@m4204001 ~]# oc logs rook-ceph-mon-a-57745d545f-txldj  -c init-mon-fs -p
unable to retrieve container logs for cri-o://e86b050ca568944fe6188157184e6e8835684aac1ba4b34348f00434a3c277b0

Comment 19 Michael Schaefer 2024-05-16 14:03:20 UTC
I have looked at the rook-ceph-mon-x deployments and tried to pull the image referenced in their initContainers - these are the containers that are actually crash-looping.

$ oc get pod -l app=rook-ceph-mon
NAME                               READY   STATUS                  RESTARTS         AGE
rook-ceph-mon-a-95d644fc5-dmw6z    0/2     Init:CrashLoopBackOff   27 (3m10s ago)   116m
rook-ceph-mon-b-d74f79cf9-lb4x7    0/2     Init:CrashLoopBackOff   23 (2m37s ago)   95m
rook-ceph-mon-c-5cd7cfc5d9-qpp2f   0/2     Init:CrashLoopBackOff   19 (2m42s ago)   75m

$ oc describe pod/rook-ceph-mon-a-95d644fc5-dmw6z
...
  init-mon-fs:
    Container ID:  cri-o://c48ce0fc8733f297c6232dbe01ee221bd167025f3722428435d403801a7855f1
    Image:         registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2
    Image ID:      registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:40cfef4cf12f20a20344ba101a60004f437ba3e862ddf9ca042ed1fef8a7be3e
...
Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       117m                   default-scheduler  Successfully assigned openshift-storage/rook-ceph-mon-a-95d644fc5-dmw6z to worker-2.odf-ci-2.test.ocs
  Normal   AddedInterface  117m                   multus             Add eth0 [10.128.2.22/23] from ovn-kubernetes
  Normal   Pulled          117m                   kubelet            Container image "registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2" already present on machine
  Normal   Created         117m                   kubelet            Created container chown-container-data-dir
  Normal   Started         117m                   kubelet            Started container chown-container-data-dir
  Normal   Pulled          115m (x5 over 117m)    kubelet            Container image "registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2" already present on machine
  Normal   Created         115m (x5 over 117m)    kubelet            Created container init-mon-fs
  Normal   Started         115m (x5 over 117m)    kubelet            Started container init-mon-fs
  Warning  BackOff         2m9s (x531 over 117m)  kubelet            Back-off restarting failed container init-mon-fs in pod rook-ceph-mon-a-95d644fc5-dmw6z_openshift-storage(f8626ff3-1fc7-4a57-ae16-68a9d8812589)

Pulling registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2 gives a 'manifest unknown' error:

$ podman pull registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2
Trying to pull registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2...
Error: initializing source docker://registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2: reading manifest sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2 in registry.redhat.io/rhceph/rhceph-7-rhel9: manifest unknown  

It cannot be an authentication issue on my side, since pulling the :latest tag works (below). The digest in the image reference appears to be stale or erroneous:

$ podman pull registry.redhat.io/rhceph/rhceph-7-rhel9:latest
Trying to pull registry.redhat.io/rhceph/rhceph-7-rhel9:latest...
Getting image source signatures
Checking if image destination supports signatures
Copying blob b5e915a48307 skipped: already exists  
Copying blob 9ef500aca087 skipped: already exists  
Copying config 4339069a48 done  
Writing manifest to image destination
Storing signatures
4339069a488e6e962b369f26db774f36592c0c971cf4a383dad03233c6b0befc

Comment 20 Travis Nielsen 2024-05-16 19:10:41 UTC
The mon pods are pulling the image successfully and starting up, but the init container "init-mon-fs" is failing.

The question is why this init container is failing; it is not returning its logs, as Maya reported.
We really need this log to show what the Ceph init-mon-fs failure is.

[root@m4204001 ~]# oc logs rook-ceph-mon-a-57745d545f-txldj  -c init-mon-fs -p
unable to retrieve container logs for cri-o://e86b050ca568944fe6188157184e6e8835684aac1ba4b34348f00434a3c277b0

Does it make a difference without the -p flag?

# oc logs rook-ceph-mon-a-57745d545f-txldj -c init-mon-fs

Comment 21 Maya Anilson 2024-05-17 04:22:59 UTC
No output for 

[root@m4204001 ~]# oc logs rook-ceph-mon-a-57745d545f-c488g -c init-mon-fs
[root@m4204001 ~]#

Comment 22 Maya Anilson 2024-05-17 11:01:14 UTC
[root@m4204001 ~]# oc exec -it rook-ceph-mon-a-58b4cd5b8b-w2tm4 -c init-mon-fs sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
sh-5.1# ceph-mon   --fsid=543968ca-b0df-4cae-9dec-0ad47bcc8e58 --keyring=/etc/ceph/keyring-store/keyring --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true  --default-log-stderr-prefix=debug   --default-log-to-file=false --default-mon-cluster-log-to-file=false --mon-host=[v2:172.31.137.183:3300],[v2:172.31.202.37:3300],[v2:172.31.15.98:3300] --mon-initial-members=a,c,b  --id=a   --setuser=ceph  --setgroup=ceph   --public-addr=172.31.137.183  --mkfs
Segmentation fault (core dumped)

Comment 23 Maya Anilson 2024-05-22 11:37:52 UTC
Had a debug session with @Subham Rai.

These are the outputs collected during that session:

 oc debug node/worker-1.m1301015.lnxero1.boe
Starting pod/worker-1m1301015lnxero1boe-debug-85x8x ...
To use host binaries, run `chroot /host`

Pod IP: 172.23.232.81
If you don't see a command prompt, try pressing enter.
sh-4.4# 
sh-4.4# chroot /host
sh-5.1# crictl ps -a | grep mon
513ba3caa4083       6338836cc3a4f7734362a7bd20788f87c98d8fd610981d6622266020ff498770                                                                            About a minute ago   Exited              init-mon-fs                             17                  db72a799f0d41       rook-ceph-mon-c-7f8c67dd6c-fvr6w
sh-5.1# crictl logs 513ba3caa4083 
sh-5.1# 
sh-5.1# 
sh-5.1# crictl logs 513ba3caa4083
sh-5.1# 
sh-5.1# 
sh-5.1# crictl start 513ba3caa4083
E0522 10:55:00.507152 1309519 remote_runtime.go:343] "StartContainer from runtime service failed" err="rpc error: code = Unknown desc = container 513ba3caa4083808b02111ca9f1a408469bdfd7e981b92d0eb6b84fe5e513945 is not in created state: stopped" containerID="513ba3caa4083"
FATA[0000] starting the container "513ba3caa4083": rpc error: code = Unknown desc = container 513ba3caa4083808b02111ca9f1a408469bdfd7e981b92d0eb6b84fe5e513945 is not in created state: stopped 
sh-5.1# cd coredump/
sh-5.1# ls
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1252306.1716371438000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1252367.1716371440000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1252671.1716371455000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1253159.1716371482000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1253796.1716371524000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1255256.1716371619000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1257763.1716371788000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1262272.1716372092000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1266858.1716372407000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1271352.1716372712000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1275908.1716373024000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1280446.1716373334000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1284893.1716373637000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1289307.1716373939000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1293815.1716374242000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1298285.1716374548000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1302719.1716374850000000.zst
core.ceph-mon.0.4ebde0cced644250a4d40759a64e11b3.1307473.1716375163000000.zst

Comment 24 Maya Anilson 2024-05-22 11:39:05 UTC
[root@m1301015 ~]# oc exec rook-ceph-operator-747b8d84cc-bcqgm -it sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
sh-5.1$ 
sh-5.1$ ceph crash ls
Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')
sh-5.1$ 
sh-5.1$ ceph crash ls --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring  --connect-timeout=5 
timed out
sh-5.1$ 


I am uploading the coredumps.

Comment 25 Maya Anilson 2024-05-22 11:40:21 UTC
Created attachment 2034584 [details]
coredump

Comment 26 Maya Anilson 2024-05-22 12:00:19 UTC
Created attachment 2034585 [details]
coredump-1

Comment 27 Maya Anilson 2024-05-22 12:02:44 UTC
Created attachment 2034586 [details]
coredump-2

Comment 28 krishnaram Karthick 2024-05-22 12:04:19 UTC
This issue is a test blocker for IBM Z. Per Maya, all deployments run into this issue.
They are unable to proceed with any testing on 4.16.

Comment 29 Maya Anilson 2024-05-22 12:05:43 UTC
Created attachment 2034587 [details]
coredump-3

Comment 35 Maya Anilson 2024-05-28 08:23:20 UTC
Just to clarify, this problem only occurs on the IBM Z platform and is not seen on the IBM Power platform.

Comment 36 Mudit Agarwal 2024-05-28 14:10:42 UTC
Current update:
This might be an issue on the IBM Z side and is being reviewed by that team. No action item for ODF/Ceph engineering at the moment.

Comment 41 Abdul Kandathil (IBM) 2024-05-29 08:23:46 UTC
Managed to deploy ODF 4.16.0-110.stable successfully after changing the Ceph image to "quay.io/rhceph-dev/rhceph/rhceph-7-rhel9:cda4d8682b12f13ce90211cad773100c32584b6bcea33a6cb69a66d9aece86f5", which is based on RHEL 9.3 and Ceph version 18.2.1-136.

But the default image used by ODF is based on RHEL 9.4 and Ceph version 18.2.1-188.

Not sure what is causing the issue here. Is it possible to get the latest Ceph image built on RHEL 9.3?

Comment 42 Abdul Kandathil (IBM) 2024-05-29 12:22:46 UTC
Simply executing the ceph-mon, ceph-osd, and ceph-mgr commands without any arguments in this container image throws a segmentation fault.

- registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:17e899c9c4f2f64bc7acea361446a64927b829d6766e6dde42f8d0336b9125a4
- quay.io/rhceph-dev/rhceph/rhceph-7-rhel9@sha256:17e899c9c4f2f64bc7acea361446a64927b829d6766e6dde42f8d0336b9125a4

[root@6ea84ace2311 /]# ceph-mon
Segmentation fault (core dumped)
[root@6ea84ace2311 /]#
[root@6ea84ace2311 /]# ceph-osd
Segmentation fault (core dumped)
[root@6ea84ace2311 /]#
[root@6ea84ace2311 /]# ceph-mgr
Segmentation fault (core dumped)
[root@6ea84ace2311 /]#

Comment 43 Aliaksei Makarau (IBM) 2024-05-29 14:03:24 UTC
[root@m1301015 ~]# podman run -it quay.io/rhceph-dev/rhceph/rhceph-7-rhel9:cda4d8682b12f13ce90211cad773100c32584b6bcea33a6cb69a66d9aece86f5 bash
[root@82bfd8a29a24 /]#
[root@82bfd8a29a24 /]# ceph -v
ceph version 18.2.1-136.el9cp (e7edde2b655d0dd9f860dda675f9d7954f07e6e3) reef (stable)
[root@82bfd8a29a24 /]#
[root@82bfd8a29a24 /]# ceph-mon
ceph-mon: -h or --help for usage
[root@82bfd8a29a24 /]#
[root@82bfd8a29a24 /]# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.3 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.3"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.3 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.3"
[root@82bfd8a29a24 /]#



[root@m1301015 ~]# podman run -it quay.io/rhceph-dev/rhceph/rhceph-7-rhel9:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2 bash
[root@fa200b32eb21 /]#
[root@fa200b32eb21 /]# ceph -v
ceph version 18.2.1-167.el9cp (e8c836edb24adb7717a6c8ba1e93a07e3efede29) reef (stable)
[root@fa200b32eb21 /]#
[root@fa200b32eb21 /]# ceph-mon
Segmentation fault (core dumped)
[root@fa200b32eb21 /]#
[root@fa200b32eb21 /]# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.4 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.4 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.4
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.4"
[root@fa200b32eb21 /]#

Comment 44 Aliaksei Makarau (IBM) 2024-05-29 14:04:48 UTC
The tcmalloc library is identical in both images:

quay.io/rhceph-dev/rhceph/rhceph-7-rhel9:cda4d8682b12f13ce90211cad773100c32584b6bcea33a6cb69a66d9aece86f5
79acb80d055cec93ba16ccdc3fde97bd  /lib64/libtcmalloc.so.4.5.9


quay.io/rhceph-dev/rhceph/rhceph-7-rhel9:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2
79acb80d055cec93ba16ccdc3fde97bd  /lib64/libtcmalloc.so.4.5.9

Comment 45 Aliaksei Makarau (IBM) 2024-05-29 14:06:40 UTC
GDB debug (ceph-mon crashes in a recursive loop between glibc's _dl_update_slotinfo and tcmalloc's tc_free):

sh-5.1# gdb /usr/bin/ceph-mon
GNU gdb (GDB) Red Hat Enterprise Linux 10.2-13.el9
...
Reading symbols from /usr/bin/ceph-mon...
Reading symbols from .gnu_debugdata for /usr/bin/ceph-mon...
(No debugging symbols found in .gnu_debugdata for /usr/bin/ceph-mon)
Missing separate debuginfos, use: dnf debuginfo-install ceph-mon-18.2.1-167.el9cp.s390x
(gdb) r
Starting program: /usr/bin/ceph-mon 
warning: Error disabling address space randomization: Function not implemented
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
_dl_update_slotinfo (req_modid=3, new_gen=2) at dl-tls.c:728
728	dl-tls.c: No such file or directory.
(gdb) bt
#0  _dl_update_slotinfo (req_modid=3, new_gen=2) at dl-tls.c:728
#1  0x000003ff9861315c in update_get_addr (ti=0x3ff979d0d10, gen=<optimized out>) at dl-tls.c:922
#2  0x000003ff979bab34 in tc_free () from /lib64/libtcmalloc.so.4
#3  0x000003ff98613072 in free (ptr=<optimized out>) at ../include/rtld-malloc.h:50
#4  _dl_update_slotinfo (req_modid=3, new_gen=2) at dl-tls.c:828
#5  0x000003ff9861315c in update_get_addr (ti=0x3ff979d0d10, gen=<optimized out>) at dl-tls.c:922
#6  0x000003ff979bab34 in tc_free () from /lib64/libtcmalloc.so.4
#7  0x000003ff98613072 in free (ptr=<optimized out>) at ../include/rtld-malloc.h:50
#8  _dl_update_slotinfo (req_modid=3, new_gen=2) at dl-tls.c:828
#9  0x000003ff9861315c in update_get_addr (ti=0x3ff979d0d10, gen=<optimized out>) at dl-tls.c:922
#10 0x000003ff979bab34 in tc_free () from /lib64/libtcmalloc.so.4
#11 0x000003ff98613072 in free (ptr=<optimized out>) at ../include/rtld-malloc.h:50
#12 _dl_update_slotinfo (req_modid=3, new_gen=2) at dl-tls.c:828
...

Comment 46 Aliaksei Makarau (IBM) 2024-05-29 14:08:20 UTC
Here is the dmesg output for the system with failing ceph:

[526151.580822]  [<000003ff7fc13156>] __dm_stat_init_temporary_percpu_totals+0xe6/0x270 [dm_mod]
[526462.504134] User process fault: interruption code 0011 ilc:3 in libtcmalloc.so.4.5.9[3ff9a400000+4e000]
[526462.504148] Failing address: 000003ffd997a000 TEID: 000003ffd997a400
[526462.504150] Fault in primary space mode while using user ASCE.
[526462.504152] AS:00000005512d81c7 R3:00000003ff500007 S:000000033bc2d800 P:0000000000000400
[526462.504158] CPU: 15 PID: 3553346 Comm: ceph-mon Not tainted 5.14.0-427.el9.s390x #1
[526462.504162] Hardware name: IBM 3906 M04 701 (z/VM 7.3.0)
[526462.504164] User PSW : 0705200180000000 000003ff9a43ab00
[526462.504166]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
[526462.504169] User GPRS: fffffffffffff001 000003ff9a43ab00 0000000000000000 0000000000000002
[526462.504171]            0000000000000002 000003ff9b0e5d00 000000000000000c 000003ff9b0e65e0
[526462.504172]            000003ff9b0e6690 000000000000000b 000003ff9b0e8320 000000000000000c
[526462.504174]            000003ff9a44ff38 0000000000000000 000003ff9b093072 000003ffd997af88
[526462.504183] User Code: 000003ff9a43aafa: 0707		bcr	0,%r7
                           000003ff9a43aafc: 0707		bcr	0,%r7
                          #000003ff9a43aafe: 0707		bcr	0,%r7
                          >000003ff9a43ab00: eb9ff0480024	stmg	%r9,%r15,72(%r15)
                           000003ff9a43ab06: c0c00000aa19	larl	%r12,000003ff9a44ff38
                           000003ff9a43ab0c: e3f0ff60ff71	lay	%r15,-160(%r15)
                           000003ff9a43ab12: c4180000b38f	lgrl	%r1,000003ff9a451230
                           000003ff9a43ab18: ec160058007c	cgij	%r1,0,6,000003ff9a43abc8
[526462.504201] Last Breaking-Event-Address:
[526462.504202]  [<000003ff9b093070>] 0x3ff9b093070
[526772.494593] User process fault: interruption code 0011 ilc:3 in libtcmalloc.so.4.5.9[3ffad400000+4e000]
[526772.494611] Failing address: 000003ffd70fa000 TEID: 000003ffd70fa400
[526772.494612] Fault in primary space mode while using user ASCE.
[526772.494615] AS:000000060b0b41c7 R3:000000055f1a4007 S:00000002539eb800 P:0000000000000400
[526772.494620] CPU: 4 PID: 3557863 Comm: ceph-mon Not tainted 5.14.0-427.el9.s390x #1
[526772.494622] Hardware name: IBM 3906 M04 701 (z/VM 7.3.0)
[526772.494624] User PSW : 0705200180000000 000003ffad43ab00
[526772.494625]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
[526772.494628] User GPRS: fffffffffffff001 000003ffad43ab00 0000000000000000 0000000000000002
[526772.494630]            0000000000000002 000003ffae0e5d00 000000000000000c 000003ffae0e65e0
[526772.494631]            000003ffae0e6690 000000000000000b 000003ffae0e8320 000000000000000c
[526772.494633]            000003ffad44ff38 0000000000000000 000003ffae093072 000003ffd70faf68
[526772.494640] User Code: 000003ffad43aafa: 0707		bcr	0,%r7
                           000003ffad43aafc: 0707		bcr	0,%r7
                          #000003ffad43aafe: 0707		bcr	0,%r7
                          >000003ffad43ab00: eb9ff0480024	stmg	%r9,%r15,72(%r15)
                           000003ffad43ab06: c0c00000aa19	larl	%r12,000003ffad44ff38
                           000003ffad43ab0c: e3f0ff60ff71	lay	%r15,-160(%r15)
                           000003ffad43ab12: c4180000b38f	lgrl	%r1,000003ffad451230
                           000003ffad43ab18: ec160058007c	cgij	%r1,0,6,000003ffad43abc8
[526772.494657] Last Breaking-Event-Address:
[526772.494657]  [<000003ffae093070>] 0x3ffae093070

Comment 47 tstober 2024-05-29 14:44:48 UTC
Adding comments from a Slack conversation to this:

Radosław Zarzyński
There is already a very clear indicator of a clash between tcmalloc (which, BTW, is unsupported on s390 upstream) and the dynamic linker of glibc.


Thomas Stober
We see different behavior:
ceph 18.2.1-136 on RHEL 9.3 (works fine)
ceph 18.2.1-188 on RHEL 9.4 (doesn't)
and we are trying to find out what has changed.

Comment 49 Ken Dreyer (Red Hat) 2024-05-29 22:39:59 UTC
I tested this in an s390x VM. In the RHEL 9.3 container (cda4d8682b12f13ce90211cad773100c32584b6bcea33a6cb69a66d9aece86f5), when I update glibc, ceph-mon crashes. This happens for both ceph-mon-18.2.1-136.el9cp and ceph-mon-18.2.1-167.el9cp

Works:   glibc-2.34-83.el9_3.12.s390x
Crashes: glibc-2.34-100.el9_4.2.s390x

Comment 50 Ken Dreyer (Red Hat) 2024-05-29 23:02:59 UTC
glibc-2.34-100.el9 also crashes.

Comment 51 Ken Dreyer (Red Hat) 2024-05-29 23:20:09 UTC
CentOS Stream has more granular glibc builds, so I was able to bisect this further. glibc-2.34-87.el9 (https://kojihub.stream.centos.org/koji/buildinfo?buildID=40721) is the first glibc build that causes Ceph to crash. The change is https://gitlab.com/redhat/centos-stream/rpms/glibc/-/commit/2ea2e4b80215f5f1eb5146d5cab677b4357780e0

Comment 52 Brad Hubbard 2024-05-30 01:53:22 UTC
Try reproducing under valgrind?

$ valgrind --trace-children=yes --show-reachable=yes --track-origins=yes --read-var-info=yes --tool=memcheck --leak-check=full --num-callers=50 -v --log-file=leaky.log /usr/bin/ceph-mon

Comment 53 Ken Dreyer (Red Hat) 2024-05-30 20:19:34 UTC
Created attachment 2035781 [details]
valgrind leaky.log with glibc-2.34-100.el9_4.2

To generate this log, I started with the RHEL 9.3 container image (with ceph-18.2.1-136.el9cp.s390x), then updated glibc to a version that leads to the crash (glibc-2.34-100.el9_4.2). Interestingly, ceph-mon did not crash when I ran it under valgrind.

Comment 56 Florian Weimer 2024-05-31 17:08:20 UTC
It's a bit of a stab in the dark, but I made a test build with a potential glibc-side workaround:

https://people.redhat.com/~fweimer/dUJsQxSPin50/glibc-2.34-100.el9_4.2.0.0.testfix.1.RHEL36148/

You can update a test system using:

yum update --nogpgcheck --repofrompath=test,https://people.redhat.com/~fweimer/dUJsQxSPin50/glibc-2.34-100.el9_4.2.0.0.testfix.1.RHEL36148/

There is an upstream discussion about the phenomenon:

    New TLS usage in libgcc_s.so.1, compatibility impact
    <https://inbox.sourceware.org/gcc/8734v1ieke.fsf@oldenburg.str.redhat.com/>

The root cause is a malloc replacement that uses dynamic TLS for its thread-local data structures. For compatibility reasons, glibc uses malloc to allocate its internal dynamic TLS data structures. This circularity obviously leads to problems. The proper fix is to build malloc replacements with initial-exec (static) TLS. The patch in the test build avoids calling free in some cases. For the previously encountered issue, it was sufficient, but obviously it does not address the root cause.
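To make the circularity concrete, here is a minimal sketch -- illustrative only, not tcmalloc's actual code -- of a free() interposer whose thread-local state uses global-dynamic TLS, which is enough to produce the shape of the loop in the comment 45 backtrace:

/* sketch.c -- hypothetical interposer for illustration, not tcmalloc source.
 * Build as a shared object, e.g.: gcc -shared -fPIC sketch.c -o libsketch.so -ldl
 * Note: built without -ftls-model=initial-exec, so `in_free' ends up in the
 * global-dynamic TLS model. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>

static __thread int in_free;             /* global-dynamic TLS: the problem */
static void (*real_free)(void *);

void free(void *ptr)
{
    if (!real_free)
        real_free = (void (*)(void *)) dlsym(RTLD_NEXT, "free");

    /* The first access to `in_free' on a thread can go through
     * __tls_get_offset (s390x) -> _dl_update_slotinfo.  With the glibc
     * change bisected in comment 51, _dl_update_slotinfo itself calls
     * free() on stale slotinfo data, re-entering this interposer and
     * recursing until the stack is exhausted -- the repeating frames in
     * the comment 45 backtrace. */
    in_free++;
    real_free(ptr);
    in_free--;
}

The fix direction described above is to keep such thread-locals out of the dynamic TLS path entirely; a matching sketch appears at the end of comment 64.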

Comment 57 Brad Hubbard 2024-05-31 21:47:57 UTC
(In reply to Florian Weimer from comment #56)

Thanks very much for the information Florian, much appreciated, and I will test
this shortly.

I'm wondering why we are only seeing this on s390x so far. Would you have any thoughts on that?

Comment 61 Florian Weimer 2024-06-03 08:54:43 UTC
I proposed the glibc-side workaround upstream:

  [PATCH] elf: Avoid some free (NULL) calls in _dl_update_slotinfo
  <https://inbox.sourceware.org/libc-alpha/87bk4i2yxn.fsf@oldenburg.str.redhat.com/>

I'm not sure if upstream will accept this patch.

For RHEL integration, it's probably best to file a separate RHEL ticket on issues.redhat.com. The ticket linked here (RHEL-39415) may turn out to be a duplicate, but it concerns a completely different piece of software with a different interposed malloc. I don't know how quickly the reporters for RHEL-39415 will be able to verify the workaround, so a separate ticket makes sense.

In parallel, you really should start building tcmalloc with -ftls-model=initial-exec. The glibc-side workaround probably does not cover all eventualities (and some crashes might not even be new).

Comment 63 Ken Dreyer (Red Hat) 2024-06-03 14:56:01 UTC
We copy tcmalloc from epel9 (https://src.fedoraproject.org/rpms/gperftools/tree/epel9). What change do we need to make there? I wonder if Tom's gperftools-2.7.90-disable-generic-dynamic-tls.patch is relevant?

Comment 64 Florian Weimer 2024-06-03 15:33:58 UTC
(In reply to Ken Dreyer (Red Hat) from comment #63)
> We copy tcmalloc from epel9
> (https://src.fedoraproject.org/rpms/gperftools/tree/epel9). What change do
> we need to make there? I wonder if Tom's
> gperftools-2.7.90-disable-generic-dynamic-tls.patch is relevant?

It seems to go in the opposite direction of what's required. You really need to build tcmalloc in such a way that it only uses static TLS (so no references to symbols like __tls_get_addr, __tls_get_addr_opt, __tls_get_offset). Maybe Tom's patch is relevant to other parts of gperftools, those that are expected to be loaded by dlopen. (We encountered a similar mix in lttng.) But building tcmalloc with global-dynamic TLS does not make sense because it cannot be dlopen'ed anyway, so using initial-exec TLS would be completely fine.
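As an illustration of that fix direction -- a sketch under the same assumptions as the one after comment 56, with a hypothetical variable name -- the thread-local simply moves to the initial-exec model:

/* Same interposer state as the comment 56 sketch, but forced into the
 * initial-exec (static) TLS model.  Accessing `in_free' now uses a fixed
 * offset from the thread pointer and never calls
 * __tls_get_addr/__tls_get_offset, so an interposed free() can no longer
 * re-enter _dl_update_slotinfo.  Compiling the whole library with
 * -ftls-model=initial-exec has the same effect without per-variable
 * attributes. */
static __thread int in_free __attribute__((tls_model("initial-exec")));

A rebuilt libtcmalloc can then be checked for leftover dynamic TLS by confirming its dynamic symbol table has no undefined references to __tls_get_offset (s390x) or __tls_get_addr.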

Comment 65 saliu 2024-06-05 11:23:17 UTC
(In reply to Ken Dreyer (Red Hat) from comment #63)
> We copy tcmalloc from epel9
> (https://src.fedoraproject.org/rpms/gperftools/tree/epel9). What change do
> we need to make there? I wonder if Tom's
> gperftools-2.7.90-disable-generic-dynamic-tls.patch is relevant?

Hi Ken, did I understand correctly that you copied the source code of tcmalloc from epel9 and built it for Ceph? If so, could you please build it again with the option -ftls-model=initial-exec?

Comment 66 Aliaksei Makarau (IBM) 2024-06-10 07:03:33 UTC
Hi Team,

Tom has built the updated tcmalloc package: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-4e05a33ad0

Does anybody know who the contact is for:

1) RHEL 9.4 build?
2) UBI9 build?

Thank you all!!!

Comment 67 Florian Weimer 2024-06-10 07:24:44 UTC
(In reply to Aliaksei Makarau (IBM) from comment #66)
> Hi Team,
> 
> Tom has build the updated tcmalloc package:
> https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-4e05a33ad0
> 
> Does anybody know who is the contact for the:
> 
> 1) RHEL 9.4 build?
> 2) UBI9 build?


The gperftools package is not part of RHEL/UBI. It's built as part of Ceph and maintained by the Ceph team. Ken Dreyer did the last build/import from Fedora.