Bug 2014849 - Some ceph commands return an error [exit code different from 0].
Summary: Some ceph commands return an error [exit code different from 0].
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: must-gather
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.9.0
Assignee: Mudit Agarwal
QA Contact: Oded
URL:
Whiteboard:
Depends On:
Blocks: 2021427
 
Reported: 2021-10-17 11:34 UTC by Oded
Modified: 2023-08-09 16:35 UTC
CC List: 7 users

Fixed In Version: v4.9.0-201.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-07 17:46:31 UTC
Embargoed:


Attachments: none


Links
GitHub red-hat-storage/ocs-operator pull 1379 (open): must-gather: remove invalid ceph commands (last updated 2021-10-19 05:53:13 UTC)
GitHub red-hat-storage/ocs-operator pull 1380 (open): Bug 2014849: [release-4.9] must-gather: remove invalid ceph commands (last updated 2021-10-19 10:48:08 UTC)

Description Oded 2021-10-17 11:34:44 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
Some ceph commands run by must-gather return an error [exit code different from 0].
I tried running the relevant ceph commands manually from the tools pod, but the commands still returned errors.
 

Version of all relevant components (if applicable):
Provider: VMware
ODF Version: 4.9.0-183.ci
OCP Version: 4.9.0-0.nightly-2021-10-08-232649
$ ceph versions
{
    "mon": {
        "ceph version 16.2.0-140.el8cp (747f7a0286d51abc59b3a3a1a7cb17ec7a35754e) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.0-140.el8cp (747f7a0286d51abc59b3a3a1a7cb17ec7a35754e) pacific (stable)": 1
    },
    "osd": {
        "ceph version 16.2.0-140.el8cp (747f7a0286d51abc59b3a3a1a7cb17ec7a35754e) pacific (stable)": 3
    },
    "mds": {
        "ceph version 16.2.0-140.el8cp (747f7a0286d51abc59b3a3a1a7cb17ec7a35754e) pacific (stable)": 2
    },
    "rgw": {
        "ceph version 16.2.0-140.el8cp (747f7a0286d51abc59b3a3a1a7cb17ec7a35754e) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.0-140.el8cp (747f7a0286d51abc59b3a3a1a7cb17ec7a35754e) pacific (stable)": 10
    }
}
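
(Aside: a quick way to confirm that every daemon is on the same build is to parse the `ceph versions` JSON above. A minimal sketch, assuming the output is saved to a hypothetical local file named `ceph_versions.json`:

import json

# Hypothetical file holding the `ceph versions` output shown above.
with open("ceph_versions.json") as f:
    versions = json.load(f)

# Each daemon type maps a version string to a daemon count; collect the
# distinct version strings across all daemon types, skipping the
# "overall" summary entry.
distinct = {v for daemon, counts in versions.items() if daemon != "overall"
            for v in counts}
print("consistent" if len(distinct) == 1 else f"mixed versions: {distinct}")

Here all six daemon types report the same 16.2.0-140.el8cp build, so the check prints "consistent".)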

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
Test Process:
1. Run the must-gather command:
$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.9
2. Find the files that contain errors (using the Python script below).
3. Run the relevant commands manually from the tools pod; the ceph commands return the same errors.


********************************************************************************
Path1:
/ceph/logs/gather-ceph osd drain status-json-debug.log

no valid command found; 10 closest matches:
osd perf
osd df [plain|tree] [class|name] [<filter>]
osd blocked-by
osd pool stats [<pool_name>]
osd pool scrub <who>...
osd pool deep-scrub <who>...
osd pool repair <who>...
osd pool force-recovery <who>...
osd pool force-backfill <who>...
osd pool cancel-force-recovery <who>...
Error EINVAL: invalid command
command terminated with exit code 22
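
(Aside: EINVAL is errno 22 on Linux, which is why each of these failing commands terminates with exit code 22. A one-line check in Python:

import errno
print(errno.EINVAL)  # 22, matching "command terminated with exit code 22"

The same exit code appears in all the paths below.)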

********************************************************************************
Path2:
/ceph/logs/gather-ceph balancer dump-debug.log

Invalid command: missing required parameter plan(<string>)
balancer dump <plan> :  Show an optimization plan
Error EINVAL: invalid command
command terminated with exit code 22


********************************************************************************
Path3:
/ceph/logs/gather-ceph pool autoscale-status-json-debug.log

no valid command found; 10 closest matches:
pg stat
pg getmap
pg dump [all|summary|sum|delta|pools|osds|pgs|pgs_brief...]
pg dump_json [all|summary|sum|pools|osds|pgs...]
pg dump_pools_json
pg ls-by-pool <poolstr> [<states>...]
pg ls-by-primary <id|osd.id> [<pool:int>] [<states>...]
pg ls-by-osd <id|osd.id> [<pool:int>] [<states>...]
pg ls [<pool:int>] [<states>...]
pg dump_stuck [inactive|unclean|stale|undersized|degraded...] [<threshold:int>]
Error EINVAL: invalid command
command terminated with exit code 22

********************************************************************************
Path4:
/ceph/logs/gather-ceph osd drain status-debug.log

no valid command found; 10 closest matches:
osd perf
osd df [plain|tree] [class|name] [<filter>]
osd blocked-by
osd pool stats [<pool_name>]
osd pool scrub <who>...
osd pool deep-scrub <who>...
osd pool repair <who>...
osd pool force-recovery <who>...
osd pool force-backfill <who>...
osd pool cancel-force-recovery <who>...
Error EINVAL: invalid command
command terminated with exit code 22

********************************************************************************
Path5:
/ceph/logs/gather-ceph pool autoscale-status-debug.log

no valid command found; 10 closest matches:
pg stat
pg getmap
pg dump [all|summary|sum|delta|pools|osds|pgs|pgs_brief...]
pg dump_json [all|summary|sum|pools|osds|pgs...]
pg dump_pools_json
pg ls-by-pool <poolstr> [<states>...]
pg ls-by-primary <id|osd.id> [<pool:int>] [<states>...]
pg ls-by-osd <id|osd.id> [<pool:int>] [<states>...]
pg ls [<pool:int>] [<states>...]
pg dump_stuck [inactive|unclean|stale|undersized|degraded...] [<threshold:int>]
Error EINVAL: invalid command
command terminated with exit code 22

********************************************************************************
Path6:
/ceph/logs/gather-ceph balancer dump-json-debug.log


Invalid command: missing required parameter plan(<string>)
balancer dump <plan> :  Show an optimization plan
Error EINVAL: invalid command
command terminated with exit code 22

********************************************************************************
I ran this Python script to find the failing files:

import os

dir_path_4_9 = "/home/odedviner/ClusterPath/auth/must-gather.local.3413281135519036243"

# Non-zero exit-code markers to look for in the gathered files.
errors = ["exit code 1", "exit code 2", "exit code 3", "exit code 4", "exit code 5",
          "exit code 6", "exit code 7", "exit code 8", "exit code 9"]

# Walk the must-gather output tree and print every file containing a marker.
for root, dirs, files in os.walk(dir_path_4_9):
    for file in files:
        path = os.path.join(root, file)
        try:
            with open(path, 'r') as f:
                data = f.read()
        except OSError:
            continue  # skip unreadable files
        for error in errors:
            if error.lower() in data.lower():
                print(path)
                break  # report each file at most once
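
Note that the single-digit markers also catch multi-digit exit codes by substring match (e.g. "exit code 2" matches "exit code 22"). A regex variant (a sketch, not part of the original run) matches any non-zero exit code directly:

import os
import re

dir_path_4_9 = "/home/odedviner/ClusterPath/auth/must-gather.local.3413281135519036243"

# Matches "exit code N" for any non-zero N, e.g. "exit code 22".
NONZERO_EXIT = re.compile(r"exit code [1-9][0-9]*", re.IGNORECASE)

for root, dirs, files in os.walk(dir_path_4_9):
    for name in files:
        path = os.path.join(root, name)
        try:
            with open(path, "r", errors="ignore") as f:
                if NONZERO_EXIT.search(f.read()):
                    print(path)
        except OSError:
            continue  # skip unreadable files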


Actual results:
Some ceph commands return errors.

Expected results:
All commands should be valid; any invalid command should be removed from the must-gather command list.

Additional info:

Comment 1 Martin Bukatovic 2021-10-18 15:33:21 UTC
Moving to ODF product.

Comment 4 Mudit Agarwal 2021-10-19 05:44:26 UTC
The following commands do not exist:

1.
sh-4.4# ceph osd drain status
no valid command found; 10 closest matches:
osd perf
osd df [plain|tree] [class|name] [<filter>]
osd blocked-by
osd pool stats [<pool_name>]
osd pool scrub <who>...
osd pool deep-scrub <who>...
osd pool repair <who>...
osd pool force-recovery <who>...
osd pool force-backfill <who>...
osd pool cancel-force-recovery <who>...

2. This one is not an invalid command; it requires a plan argument, which is not supplied.
sh-4.4# ceph balancer dump
Invalid command: missing required parameter plan(<string>)
balancer dump <plan> :  Show an optimization plan
Error EINVAL: invalid command

We can remove it, as we already have "ceph balancer status", which gives all the needed information:

sh-4.4# ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.001034",
    "last_optimize_started": "Tue Oct 19 05:35:19 2021",
    "mode": "upmap",
    "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
    "plans": []
}


3. 
sh-4.4# ceph pool autoscale-status
no valid command found; 10 closest matches:
pg stat
pg getmap
pg dump [all|summary|sum|delta|pools|osds|pgs|pgs_brief...]
pg dump_json [all|summary|sum|pools|osds|pgs...]
pg dump_pools_json
pg ls-by-pool <poolstr> [<states>...]
pg ls-by-primary <id|osd.id> [<pool:int>] [<states>...]
pg ls-by-osd <id|osd.id> [<pool:int>] [<states>...]
pg ls [<pool:int>] [<states>...]
pg dump_stuck [inactive|unclean|stale|undersized|degraded...] [<threshold:int>]
Error EINVAL: invalid command

Oded, I will be sending a PR to remove these commands (and their JSON counterparts); please let me know if any other command is misbehaving.
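
(For illustration only: the actual fix in the PRs linked above simply deletes these entries from the must-gather collection script. A hypothetical pre-flight probe that filters out commands failing with EINVAL, not the shipped fix, could look like this in Python:

import subprocess

# Commands flagged in the must-gather logs above.
candidates = [
    ["ceph", "osd", "drain", "status"],
    ["ceph", "balancer", "dump"],
    ["ceph", "pool", "autoscale-status"],
]

for cmd in candidates:
    # Assumes the ceph CLI is available, e.g. when run inside the tools pod.
    # EINVAL (exit code 22) marks an unrecognized command or a missing
    # required argument, as seen in the logs above.
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(("skip:" if result.returncode == 22 else "keep:"), " ".join(cmd))

In practice, hard-coding a known-good command list, as the PRs do, is simpler and avoids running extra probe commands against the cluster.)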

Comment 7 Oded 2021-10-26 14:13:57 UTC
No ceph command returns an error.

SetUp:
Provider: Vmware
OCP Version: 4.9.0-0.nightly-2021-10-26-021742
ODF Version: 4.9.0-203.ci

Test Process:
1. Run the must-gather command:
$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.9

2. Run the script; no ceph command returns an error:
import os

dir_path_4_9 = "/home/odedviner/ClusterPath/auth/must-gather.local.3413281135519036243"

# Non-zero exit-code markers to look for in the gathered files.
errors = ["exit code 1", "exit code 2", "exit code 3", "exit code 4", "exit code 5",
          "exit code 6", "exit code 7", "exit code 8", "exit code 9"]

# Walk the must-gather output tree and print every file containing a marker.
for root, dirs, files in os.walk(dir_path_4_9):
    for file in files:
        path = os.path.join(root, file)
        try:
            with open(path, 'r') as f:
                data = f.read()
        except OSError:
            continue  # skip unreadable files
        for error in errors:
            if error.lower() in data.lower():
                print(path)
                break  # report each file at most once

