Bug 2007566 - [IBM Z] ceph osd heap profiler fails with "not using tcmalloc" error
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.9
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Scott Ostapovicz
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-24 09:41 UTC by Abdul Kandathil (IBM)
Modified: 2023-08-09 16:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-14 02:33:16 UTC
Embargoed:



Description Abdul Kandathil (IBM) 2021-09-24 09:41:28 UTC
Description of problem (please be detailed as possible and provide log
snippets):
The ocs-ci test "tests/manage/z_cluster/test_osd_heap_profile.py::TestOSDHeapProfile::test_osd_heap_profile" fails with "could not issue heap profiler command -- not using tcmalloc!"

Error:

E           ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n openshift-storage rsh rook-ceph-tools-65f5c5798c-zm4t7 ceph tell osd.0 heap start_profiler.
E           Error is Error ENOTSUP: could not issue heap profiler command -- not using tcmalloc!
E           command terminated with exit code 95


Version of all relevant components (if applicable):
ocs 4.9 (tested with 4.9.0-154.ci)

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
yes

Can this issue reproduce from the UI?
no

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install ocp cluster
2. Deploy ODF along with LSO
3. Execute the ocs-ci test, or issue the command "oc -n openshift-storage rsh rook-ceph-tools-65f5c5798c-zm4t7 ceph tell osd.2 heap start_profiler"
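
Step 3 can be scripted. The following is a minimal sketch, not the ocs-ci implementation: the helper names `heap_profiler_argv` and `run_heap_profiler` are hypothetical, and the pod name is the one from this report, which will differ on any other cluster. It only assembles the command shown above and, optionally, runs it with the standard library.

```python
import shlex
import subprocess


def heap_profiler_argv(toolbox_pod, osd_id, action="start_profiler",
                       namespace="openshift-storage"):
    """Build the argv for `ceph tell osd.<id> heap <action>` run through `oc rsh`."""
    return [
        "oc", "-n", namespace, "rsh", toolbox_pod,
        "ceph", "tell", f"osd.{osd_id}", "heap", action,
    ]


def run_heap_profiler(toolbox_pod, osd_id):
    """Run the command and return (exit_code, stdout, stderr).

    Requires `oc` on PATH and a logged-in session; not called in this sketch.
    """
    proc = subprocess.run(heap_profiler_argv(toolbox_pod, osd_id),
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Pod name copied from this report (an assumption for any other cluster).
    print(shlex.join(heap_profiler_argv("rook-ceph-tools-65f5c5798c-zm4t7", 2)))
```

Keeping the command as a list avoids shell quoting problems, and returning the exit code alongside the output makes it possible to compare against the "exit code 95" reported below.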


Actual results:

```
E           ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n openshift-storage rsh rook-ceph-tools-65f5c5798c-zm4t7 ceph tell osd.2 heap start_profiler.
E           Error is Error ENOTSUP: could not issue heap profiler command -- not using tcmalloc!
E           command terminated with exit code 95
```

Expected results:

The command executes without any errors.

Additional info:
Test & must-gather logs : https://drive.google.com/file/d/1Ya-vU3cPD9hnNfBdw71TNSUH_ZzOFIRX/view?usp=sharing

Comment 3 Scott Ostapovicz 2021-09-28 13:48:54 UTC
This looks like a CI build problem to me.

Comment 4 Mudit Agarwal 2021-10-06 08:21:36 UTC
Petr, can you please check whether this is a CI issue?

Comment 5 Petr Balogh 2021-10-06 11:12:29 UTC
Hello Abdul, is this issue consistently reproducible? 

Is this error coming from the `oc rsh` command itself, or is it really the output returned from:

ceph tell osd.2 heap start_profiler

in the toolbox pod?


If it's consistently reproducible, can you please just RSH into the toolbox pod and try running the command locally there in the pod?


This is the first time I have seen such an error, so I'm not sure what the problem could be. But if this output is coming from the command itself (ceph tell osd.2 heap start_profiler), and the command is valid, it doesn't look like an issue in OCS-CI.

If it's returned from the oc command, then it could be an OCP issue with running the RSH command on the pod, which could be a temporary glitch or a bug; I'm not sure.
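
One rough way to tell the two cases apart from the captured output alone is to look for the `Error E<NAME>: <message>` shape that appears in this report. The sketch below is a heuristic under that assumption, not a documented contract of either CLI; `classify_failure` and the `oc-or-unknown` label are hypothetical names.

```python
import re

# Shape of the ceph-side error seen in this report: "Error ENOTSUP: <message>".
CEPH_ERROR = re.compile(r"\bError (E[A-Z0-9]+): (.*)")


def classify_failure(output):
    """Guess which layer produced a failure message (a heuristic, not a guarantee)."""
    match = CEPH_ERROR.search(output)
    if match:
        # An errno-style name suggests the message was returned by ceph itself.
        return {
            "origin": "ceph",
            "errno_name": match.group(1),
            "message": match.group(2).strip(),
        }
    # Anything else may come from `oc rsh`, the pod, or the network.
    return {"origin": "oc-or-unknown", "errno_name": None, "message": output.strip()}
```

Applied to the error text in the description, this reports `ceph` as the origin with `ENOTSUP` as the errno name, which points away from `oc rsh`. Running the command directly inside the toolbox pod remains the more reliable check.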

Comment 6 Brad Hubbard 2021-10-11 23:00:26 UTC
I'd suggest you check whether the 'z' build disables tcmalloc. If so, this error is totally expected.

https://github.com/ceph/ceph/blob/29bda6fd2aabcb37cf1c46a6edddf004d28bb164/src/osd/OSD.cc#L11509-L11513
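
If a build without tcmalloc is a legitimate configuration, a caller could recognize this specific failure and treat it as "profiler unsupported" rather than as a generic command failure. The sketch below assumes the message text and exit code shown in this report; `tcmalloc_unavailable` is a hypothetical helper, not part of ceph or ocs-ci. On Linux, ENOTSUP has the value 95, which is consistent with the reported exit code.

```python
import re


def tcmalloc_unavailable(output, exit_code):
    """Return True when a failure matches the 'not using tcmalloc' ENOTSUP case.

    Both the errno name and the message text are checked, so unrelated
    ENOTSUP errors are not mistaken for this one.
    """
    names_enotsup = re.search(r"\bError ENOTSUP:", output) is not None
    mentions_tcmalloc = "not using tcmalloc" in output
    return exit_code != 0 and names_enotsup and mentions_tcmalloc
```

A test harness could use such a check to skip the heap profiler test on builds where tcmalloc is disabled, while still failing on any other error.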

Comment 7 Abdul Kandathil (IBM) 2021-10-12 10:36:00 UTC
With the newer version (odf 4.9.0-164.ci), I am not able to reproduce this issue. 

sh-4.4$ ceph tell osd.0 heap start_profiler
osd.0 started profiler
sh-4.4$

Comment 8 Brad Hubbard 2021-10-14 00:48:54 UTC
I think what probably happened here is that the original ceph build you tested for 4.9.0-154.ci had tcmalloc disabled (I remember hearing something about this happening on some earlier builds), while the ceph build for 4.9.0-164.ci now has tcmalloc enabled.

Comment 9 Mudit Agarwal 2021-10-14 02:33:16 UTC
Please reopen if this still exists.

