Description of problem (please be as detailed as possible and provide log snippets):
It is noticed that running NooBaa cluster diagnostics doesn't capture some of the command outputs because the binaries don't exist in the noobaa-core-0 pod.
===================================================
$ cat diagnostics_collection.log
collected /var/log/messages files successfully
collected chkconfig.out successfully
collecting /log/nbfedump directory failed with error: Error: cp -fpR /log/nbfedump /tmp/diag exited with error Error: Command failed: cp -fpR /log/nbfedump /tmp/diag
cp: cannot stat '/log/nbfedump': No such file or directory
collecting slabtop.out failed with error: Error: slabtop -o &> /tmp/diag/slabtop.out exited with error Error: Command failed: slabtop -o &> /tmp/diag/slabtop.out
collecting top.out failed with error: Error: COLUMNS=512 top -c -b -n 1 &> /tmp/diag/top.out exited with error Error: Command failed: COLUMNS=512 top -c -b -n 1 &> /tmp/diag/top.out
collecting client_noobaa.log failed with error: Error: cp -fp /log/client_noobaa.log* /tmp/diag exited with error Error: Command failed: cp -fp /log/client_noobaa.log* /tmp/diag
cp: cannot stat '/log/client_noobaa.log*': No such file or directory
collecting noobaa_deploy.log failed with error: Error: cp -fp /log/noobaa_deploy* /tmp/diag exited with error Error: Command failed: cp -fp /log/noobaa_deploy* /tmp/diag
cp: cannot stat '/log/noobaa_deploy*': No such file or directory
collected supervisor logs successfully
collected df.out successfully
collected noobaa.log files successfully
finished get_system_collections_dump successfully
finished writing hosts list successfully
collected statistics successfully
collected lsof.out successfully
========================================================
In the output above, the `top` output is missing. It would help in understanding the causes of high CPU usage in the noobaa-core-0 db container.
Version of all relevant components (if applicable):
OCS v4.2.2
OCP 4.3.1

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No. However, it affects troubleshooting.

Is there any workaround available to the best of your knowledge?
Not to my knowledge.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Can this issue be reproduced?
Yes.

Can this issue be reproduced from the UI?
Yes.

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Download the diagnostics from the NooBaa console: System Management -> Support & Diagnostics -> Download Diagnostics.
2. Check the contents of */tmp/diag in the archive.

Actual results:
diag]$ cat top.out
/bin/sh: top: command not found
diag]$ cat slabtop.out
/bin/sh: slabtop: command not found

Expected results:
diagnostics_collection should not report failures for these outputs.

Additional info:
It is also noticed that the `ps` command is not present inside the noobaa pods.
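Step 2 can be checked quickly on an extracted archive with a small shell helper that lists every `.out` file recording a missing binary. This is a sketch only: the function name `find_missing_commands` is hypothetical, and it assumes the archive has already been extracted so that its `tmp/diag` directory is reachable locally.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NooBaa): print the path of each
# diagnostics output file that contains a "command not found" message.
# $1 is the extracted diag directory, e.g. the archive's tmp/diag.
find_missing_commands() {
  local diag_dir=$1
  # -l prints only matching file names; errors (e.g. no *.out files) are hidden
  grep -l -- 'command not found' "$diag_dir"/*.out 2>/dev/null
  # grep exits 1 when nothing matches; that is not a failure here
  return 0
}
```

Run against the archive described in this report, the helper would be expected to list slabtop.out and top.out; after a fix it should print nothing.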
@ohad, let's suppress errors if FE dumps don't exist.
"top" and "slabtop" do not exist on the base image that NooBaa uses. My intention is to remove the calls for these commands and suppress the errors for cp command on non-existing files (fedumps and noobaa_deploy logs)
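The "suppress cp errors on non-existing files" part of that plan can be illustrated with a shell sketch that copies each optional source only if it exists. This is not the upstream NooBaa change (the collector itself is not shown in this report); the function name `copy_if_present` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the upstream NooBaa fix: copy optional
# diagnostics into DEST, silently skipping sources that do not exist
# (such as missing FE dumps or noobaa_deploy logs).
# Usage: copy_if_present DEST SRC...
copy_if_present() {
  local dest=$1
  shift
  local src status=0
  for src in "$@"; do
    # An unmatched glob reaches us as its literal pattern, which fails -e
    if [ -e "$src" ]; then
      # Real copy failures on existing files are still reported
      cp -fpR -- "$src" "$dest" || status=1
    fi
  done
  return "$status"
}
```

For example, `copy_if_present /tmp/diag /log/nbfedump /log/client_noobaa.log* /log/noobaa_deploy*` succeeds quietly when none of those paths exist, while a genuine `cp` failure on an existing file still returns a non-zero status.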
Fixed in upstream PR (see links)
(In reply to Ohad from comment #3)
> "top" and "slabtop" do not exist on the base image that NooBaa uses.
> My intention is to remove the calls for these commands and suppress the
> errors for cp command on non-existing files (fedumps and noobaa_deploy logs)

Hi Ohad,

Can you please help me understand how we can check the causes of high CPU usage in noobaa-core? I would like to know what we can check from the NooBaa perspective in such a case. Any pointers would be helpful.

Thanks.
Hi Deepu,

Currently, we do not have a dedicated way to find out whether a process in the container uses high CPU. You can find CPU-related information in Prometheus, or by running "oc describe node" for the node hosting the pod.
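The node-side check can be scripted as follows. This is a sketch under stated assumptions: the pod is `noobaa-core-0` in the `openshift-storage` namespace (the usual OCS namespace, adjust if yours differs), and the function name `describe_noobaa_node` is hypothetical. Note that `oc describe node` shows per-pod CPU requests and limits rather than live usage; live usage comes from Prometheus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "oc describe node" for the node hosting a NooBaa pod.
# Assumed defaults: pod noobaa-core-0, namespace openshift-storage.
describe_noobaa_node() {
  local pod=${1:-noobaa-core-0}
  local ns=${2:-openshift-storage}
  local node
  # .spec.nodeName is the node the scheduler placed the pod on
  node=$(oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}') || return 1
  if [ -z "$node" ]; then
    echo "pod $pod in $ns has no node assigned" >&2
    return 1
  fi
  # Output includes per-pod CPU requests/limits and "Allocated resources"
  oc describe node "$node"
}
```

Calling `describe_noobaa_node` with no arguments uses the assumed defaults; pass a different pod name and namespace as the first and second arguments.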
The issue was resolved as described in Comment 2 and Comment 3: there are no more `command not found` errors, because the files top.out and slabtop.out are no longer present in /tmp/diag/.

Tested with: ocs-operator.v4.4.0-413.ci
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2393