Bug 2219769

Summary: Error message on node where OSDs are scheduled "ceph daemon health check failed with the following output:" - command parameters are wrong.
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Component: ceph
ceph sub component: RADOS
Reporter: Elvir Kuric <ekuric>
Assignee: Radoslaw Zarzynski <rzarzyns>
QA Contact: Elad <ebenahar>
Docs Contact:
Status: NEW
Severity: unspecified
Priority: unspecified
CC: bniver, brgardne, idryomov, muagarwa, nojha, odf-bz-bot, sostapov, tnielsen
Version: 4.13
Keywords: Reopened
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-08-08 15:19:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Elvir Kuric 2023-07-05 09:37:47 UTC
Description of problem (please be as detailed as possible and provide log snippets):

Note: I classified this BZ as "ceph" -> RBD. I think it is just Ceph related; please reassign as necessary, as I am not sure which category this BZ belongs in.

On nodes where OSDs are scheduled I very often see the error message below [1].
It appears that a ceph command issued by the health check is invoked with wrong parameters and fails, and the daemon reports the closest matching commands instead. A way to inspect the admin socket manually is sketched after the log excerpt.
[1] 

Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         ceph daemon health check failed with the following output:
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > no valid command found; 10 closest matches:
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > 0
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > 1
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > 2
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > abort
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > assert
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluefs debug_inject_read_zeros
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluefs files list
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluefs stats
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluestore allocator dump block
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluestore allocator fragmentation block
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > admin_socket: invalid command
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:  >
Jul 04 11:09:22 f12-h06-000-1029u ovs-vswitchd[2164]: ovs|451613|connmgr|INFO|br-ex<->unix#2713321: 2 flow_mods in the last 0 s (2 adds)
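
For reference, a minimal sketch of how to query an OSD admin socket by hand (the socket path and OSD id below are assumptions; substitute the real ones from inside the OSD pod):

    # Assumed socket path; adjust the OSD id to match the pod.
    # List the commands this admin socket actually accepts:
    ceph daemon /run/ceph/ceph-osd.0.asok help
    # Run the status check a health probe would typically issue:
    ceph daemon /run/ceph/ceph-osd.0.asok status

If the probe sends a command that does not appear in the "help" output, that would explain the "no valid command found" response above.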




Version of all relevant components (if applicable):
ODF v4.13
ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

No
Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes; check the logs on nodes where OSD pods are scheduled.
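
A minimal sketch of how to spot the message on an OpenShift node (the node name is a placeholder; oc debug and journalctl are standard tooling):

    # Grep the node journal for the failing health check logged by the kubelet.
    oc debug node/<node-name> -- chroot /host journalctl --no-pager | grep "ceph daemon health check failed"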

Can this issue be reproduced from the UI?

NA
If this is a regression, please provide more details to justify this:


Steps to Reproduce:
There are no actual steps to reproduce; I think we just need to identify which command fails and invoke it with the right parameters.



Actual results:
Error message in the logs; the command fails because of wrong parameters.
Expected results:
The command succeeds, avoiding misleading error messages in the logs.


Additional info:
NA

Comment 3 Blaine Gardner 2023-07-11 15:28:41 UTC
Thanks Parth for taking a further look.

Comment 9 Travis Nielsen 2023-08-08 15:19:28 UTC
Please reopen if there is more to investigate