Bug 1815088 - [OCS 4.2][Noobaa]: Cleanup irrelevant commands from collecting noobaa diagnostics
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.2
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: OCS 4.4.0
Assignee: Ohad
QA Contact: Filip Balák
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-19 12:54 UTC by Deepu K S
Modified: 2023-09-07 22:29 UTC
CC List: 9 users

Fixed In Version: 4.4.0-410
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-04 12:54:39 UTC
Embargoed:




Links
Github: noobaa noobaa-core pull 5955 (closed), "Refactor diagnostics collection code", last updated 2020-10-19 15:57:07 UTC
Red Hat Product Errata: RHBA-2020:2393, last updated 2020-06-04 12:54:53 UTC

Description Deepu K S 2020-03-19 12:54:32 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
Running the noobaa cluster diagnostics does not capture some of the command outputs because the corresponding binaries do not exist in the noobaa-core-0 pod.

===================================================
$ cat diagnostics_collection.log 
collected /var/log/messages files successfully

collected chkconfig.out successfully

collecting /log/nbfedump directory failed with error: Error: cp -fpR /log/nbfedump /tmp/diag exited with error Error: Command failed: cp -fpR /log/nbfedump /tmp/diag
cp: cannot stat '/log/nbfedump': No such file or directory


collecting slabtop.out failed with error: Error: slabtop -o &> /tmp/diag/slabtop.out exited with error Error: Command failed: slabtop -o &> /tmp/diag/slabtop.out


collecting top.out failed with error: Error: COLUMNS=512 top -c -b -n 1 &> /tmp/diag/top.out exited with error Error: Command failed: COLUMNS=512 top -c -b -n 1 &> /tmp/diag/top.out


collecting client_noobaa.log failed with error: Error: cp -fp /log/client_noobaa.log* /tmp/diag exited with error Error: Command failed: cp -fp /log/client_noobaa.log* /tmp/diag
cp: cannot stat '/log/client_noobaa.log*': No such file or directory


collecting noobaa_deploy.log failed with error: Error: cp -fp /log/noobaa_deploy* /tmp/diag exited with error Error: Command failed: cp -fp /log/noobaa_deploy* /tmp/diag
cp: cannot stat '/log/noobaa_deploy*': No such file or directory


collected supervisor logs successfully

collected df.out successfully

collected noobaa.log files successfully

finished get_system_collections_dump successfully

finished writing hosts list successfully

collected statistics successfully

collected lsof.out successfully

========================================================

In the output above, the `top` command output is missing. It would help to understand the causes of high CPU usage in the noobaa-core-0 db container.

Version of all relevant components (if applicable):
OCS v4.2.2
OCP 4.3.1

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No. However, it hinders troubleshooting.

Is there any workaround available to the best of your knowledge?
Not to my knowledge.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes.

Can this issue be reproduced from the UI?
Yes.

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Download the diagnostics from the Noobaa console:
   System Management -> Support & Diagnostics -> Download Diagnostics.
2. Check the contents of */tmp/diag in the archive.


Actual results:
diag]$ cat top.out 
/bin/sh: top: command not found

diag]$ cat slabtop.out 
/bin/sh: slabtop: command not found
$

Expected results:
diagnostics_collection should not fail for any of the outputs.

Additional info:
The `ps` command is also not present inside the noobaa pods.

Comment 2 Nimrod Becker 2020-03-19 12:56:12 UTC
@ohad, let's suppress errors if FE dumps don't exist.

Comment 3 Ohad 2020-03-25 12:54:13 UTC
"top" and "slabtop" do not exist on the base image that NooBaa uses.
My intention is to remove the calls for these commands and suppress the errors for cp command on non-existing files (fedumps and noobaa_deploy logs)
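The cleanup described above has two parts: calls to binaries missing from the base image (`top`, `slabtop`) are simply dropped, and a copy whose source path matches nothing (FE dumps, noobaa_deploy logs) is logged as skipped instead of failed. The second part could look roughly like the sketch below. This is a hypothetical Python illustration under those assumptions, not the noobaa-core code from the linked PR; `collect_files` and its log messages are invented for the example.

```python
import glob
import os
import shutil
import tempfile


def collect_files(pattern: str, dest_dir: str, log: list[str]) -> int:
    """Copy every path matching `pattern` into `dest_dir`.

    Hypothetical helper: a pattern that matches nothing (e.g. no FE dumps
    on this pod) is logged as skipped rather than as a failure.
    Returns the number of matched paths that were copied.
    """
    matches = glob.glob(pattern)
    if not matches:
        log.append(f"skipped {pattern}: no matching files")
        return 0
    os.makedirs(dest_dir, exist_ok=True)
    copied = 0
    for src in matches:
        if os.path.isdir(src):
            # Similar to `cp -R`: copy the whole directory tree.
            target = os.path.join(dest_dir, os.path.basename(src))
            shutil.copytree(src, target, dirs_exist_ok=True)
        else:
            # copy2 preserves timestamps, similar to `cp -p`.
            shutil.copy2(src, dest_dir)
        copied += 1
    log.append(f"collected {pattern} successfully")
    return copied


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for /log and /tmp/diag.
    log: list[str] = []
    with tempfile.TemporaryDirectory() as root:
        log_dir = os.path.join(root, "log")
        diag_dir = os.path.join(root, "diag")
        os.makedirs(log_dir)
        with open(os.path.join(log_dir, "noobaa.log"), "w") as f:
            f.write("example\n")
        collect_files(os.path.join(log_dir, "noobaa.log*"), diag_dir, log)
        collect_files(os.path.join(log_dir, "noobaa_deploy*"), diag_dir, log)
    print("\n".join(log))
```

The demo logs one "collected" line and one "skipped" line; nothing is reported as an error for the missing noobaa_deploy files.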

Comment 4 Ohad 2020-03-25 15:08:44 UTC
Fixed in upstream PR (see links)

Comment 7 Deepu K S 2020-03-25 19:49:31 UTC
(In reply to Ohad from comment #3)
> "top" and "slabtop" do not exist on the base image that NooBaa uses.
> My intention is to remove the calls for these commands and suppress the
> errors for cp command on non-existing files (fedumps and noobaa_deploy logs)

Hi Ohad,

Can you please help me understand how we can check the causes of high CPU usage for noobaa-core? I would like to know what we can check from the noobaa perspective in such a case.
Any pointers would be helpful.

Thanks.

Comment 8 Ohad 2020-04-01 08:34:14 UTC
Hi Deepu,
Currently, we do not have a dedicated way to determine whether a process in the container is using high CPU.
You can find CPU related information in Prometheus or using the command "oc describe node" on the node hosting the pod.

Comment 10 Filip Balák 2020-04-24 12:41:26 UTC
The issue was resolved as described in Comment 2 and Comment 3:
there are no `command not found` errors because the files top.out and slabtop.out are no longer present in /tmp/diag/.

Tested with:
ocs-operator.v4.4.0-413.ci
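The verification above (no `command not found` errors, no top.out or slabtop.out) could be repeated against an extracted diagnostics archive with a small script. This is a hypothetical Python sketch: the function names are invented, and it relies only on the `*.out` file naming and the `collecting ... failed with error` log format shown in the description of this bug.

```python
import os
import pathlib
import tempfile


def find_command_not_found(diag_dir: str) -> list[str]:
    """Return the names of *.out files in diag_dir that report a missing binary."""
    hits = []
    for path in sorted(pathlib.Path(diag_dir).glob("*.out")):
        # errors="replace" keeps odd bytes in captured output from aborting the scan.
        if "command not found" in path.read_text(errors="replace"):
            hits.append(path.name)
    return hits


def failed_collections(log_path: str) -> list[str]:
    """Return the item names from 'collecting X failed with error' log lines."""
    marker = " failed with error"
    failed = []
    with open(log_path, errors="replace") as f:
        for line in f:
            if line.startswith("collecting ") and marker in line:
                # "collecting slabtop.out failed with error: ..." -> "slabtop.out"
                failed.append(line[len("collecting "):line.index(marker)])
    return failed


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics an extracted /tmp/diag.
    with tempfile.TemporaryDirectory() as diag:
        with open(os.path.join(diag, "top.out"), "w") as f:
            f.write("/bin/sh: top: command not found\n")
        with open(os.path.join(diag, "df.out"), "w") as f:
            f.write("Filesystem Size Used Avail Use% Mounted on\n")
        log_path = os.path.join(diag, "diagnostics_collection.log")
        with open(log_path, "w") as f:
            f.write("collected df.out successfully\n\n")
            f.write("collecting top.out failed with error: Error: ...\n")
        print(find_command_not_found(diag))  # ['top.out']
        print(failed_collections(log_path))  # ['top.out']
```

On a build that includes the fix, both functions would be expected to return empty lists for the extracted diag directory and its diagnostics_collection.log.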

Comment 12 errata-xmlrpc 2020-06-04 12:54:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2393

