Description of problem (please be as detailed as possible and provide log snippets):
The ocs-ci test "tests/manage/z_cluster/test_osd_heap_profile.py::TestOSDHeapProfile::test_osd_heap_profile" fails with "could not issue heap profiler command -- not using tcmalloc!":

```
E ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n openshift-storage rsh rook-ceph-tools-65f5c5798c-zm4t7 ceph tell osd.0 heap start_profiler.
E Error is Error ENOTSUP: could not issue heap profiler command -- not using tcmalloc!
E command terminated with exit code 95
```

Version of all relevant components (if applicable):
OCS 4.9 (tested with 4.9.0-154.ci)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install an OCP cluster.
2. Deploy ODF along with LSO.
3. Execute the ocs-ci test, or issue the command "oc -n openshift-storage rsh rook-ceph-tools-65f5c5798c-zm4t7 ceph tell osd.2 heap start_profiler".

Actual results:
```
E ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n openshift-storage rsh rook-ceph-tools-65f5c5798c-zm4t7 ceph tell osd.2 heap start_profiler.
E Error is Error ENOTSUP: could not issue heap profiler command -- not using tcmalloc!
E command terminated with exit code 95
```

Expected results:
The command executes without any errors.

Additional info:
Test & must-gather logs: https://drive.google.com/file/d/1Ya-vU3cPD9hnNfBdw71TNSUH_ZzOFIRX/view?usp=sharing
This looks like a CI build problem to me.
Petr, can you please take a look and check whether this is a CI issue?
Hello Abdul, is this issue consistently reproducible? Is this error coming from the `oc rsh` command itself, or is it really the output returned by `ceph tell osd.2 heap start_profiler` in the toolbox pod? If it's consistently reproducible, can you please RSH into the toolbox pod and run the command locally there? This is the first time I've seen this error, so I'm not sure what the problem could be. If the output comes from the command itself (`ceph tell osd.2 heap start_profiler`) and that is a valid command, it doesn't look like an issue in OCS-CI. If it's returned by the `oc` command, then it could be an OCP issue with running RSH on the pod, which could be a temporary glitch or a bug; I'm not sure.
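A minimal Python sketch of how the two failure sources could be told apart. It assumes, as the "command terminated with exit code 95" line suggests, that `oc rsh` relays the remote command's exit status, and that the ceph CLI exits with the errno it reports ("Error ENOTSUP"); on Linux, ENOTSUP is 95. The helper names and the string match are hypothetical and not part of ocs-ci; the namespace and pod name are the ones from the report.

```python
import subprocess

# Values taken from the report; adjust for your cluster.
NAMESPACE = "openshift-storage"
TOOLBOX_POD = "rook-ceph-tools-65f5c5798c-zm4t7"

# Assumption: the pod runs Linux, where ENOTSUP (== EOPNOTSUPP) is 95.
LINUX_ENOTSUP = 95


def classify_heap_failure(returncode: int, output: str) -> str:
    """Hypothetical helper: guess where a heap profiler failure came from."""
    if returncode == 0:
        return "ok"
    if returncode == LINUX_ENOTSUP and "not using tcmalloc" in output:
        # The ceph CLI itself answered ENOTSUP: the OSD is not using tcmalloc.
        return "ceph: osd not using tcmalloc"
    if returncode == LINUX_ENOTSUP:
        return "ceph: operation not supported"
    # Anything else may come from oc rsh (auth, pod not found, ...) or from ceph.
    return "other failure (possibly oc rsh itself)"


def start_heap_profiler(osd_id: int) -> tuple[int, str]:
    """Run the command from the report and return (exit code, combined output)."""
    cmd = ["oc", "-n", NAMESPACE, "rsh", TOOLBOX_POD,
           "ceph", "tell", f"osd.{osd_id}", "heap", "start_profiler"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # oc rsh may allocate a TTY and merge stderr into stdout, so check both.
    return result.returncode, result.stdout + result.stderr
```

Usage would be something like `classify_heap_failure(*start_heap_profiler(2))`; with the output in the report it would land in the "osd not using tcmalloc" branch, pointing at the ceph command rather than `oc`.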
I'd suggest you check whether the 'z' build disables tcmalloc. If so, this error is entirely expected: https://github.com/ceph/ceph/blob/29bda6fd2aabcb37cf1c46a6edddf004d28bb164/src/osd/OSD.cc#L11509-L11513
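One rough way to check that from a running build, sketched in Python under several assumptions: `ldd` is available in the OSD container, the OSD binary lives at `/usr/bin/ceph-osd` (an assumed path), and tcmalloc, when enabled, is linked dynamically (a statically linked allocator would not show up this way). `links_tcmalloc` and `check_osd_binary` are hypothetical helpers.

```python
import subprocess

# Assumed location of the OSD binary inside the OSD container.
CEPH_OSD_BINARY = "/usr/bin/ceph-osd"


def links_tcmalloc(ldd_output: str) -> bool:
    """Return True if any shared library listed by ldd is a tcmalloc library."""
    for line in ldd_output.splitlines():
        # ldd lines look like "libfoo.so.1 => /lib64/libfoo.so.1 (0x...)";
        # the first whitespace-separated token is the library soname.
        parts = line.split()
        if parts and parts[0].startswith("libtcmalloc"):
            return True
    return False


def check_osd_binary(path: str = CEPH_OSD_BINARY) -> bool:
    """Run ldd on the OSD binary and report whether tcmalloc is linked."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return links_tcmalloc(result.stdout)
```

If `check_osd_binary()` returns False on the 4.9.0-154.ci OSD image, that would be consistent with the ENOTSUP error above.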
With the newer version (ODF 4.9.0-164.ci), I am not able to reproduce this issue:

```
sh-4.4$ ceph tell osd.0 heap start_profiler
osd.0 started profiler
sh-4.4$
```
I think what probably happened here is that the original Ceph build you tested for 4.9.0-154.ci had tcmalloc disabled (I remember hearing about this happening on some earlier builds), and that the Ceph build for 4.9.0-164.ci now has tcmalloc enabled.
Please reopen if this still exists.