Bug 2219769 - Error message on node where OSDs are scheduled "ceph daemon health check failed with the following output:" - command parameters are wrong.
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Radoslaw Zarzynski
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-07-05 09:37 UTC by Elvir Kuric
Modified: 2023-08-15 15:17 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-08 15:19:28 UTC
Embargoed:



Description Elvir Kuric 2023-07-05 09:37:47 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Note: I classified this BZ as "ceph" -> RBD. I think it is just Ceph related; please re-assign as necessary, as I am not sure which category this BZ belongs in.

On nodes where OSDs are scheduled, I very often see the error message below [1].
It seems a Ceph command that is meant to be executed is given wrong parameters and fails, reporting the closest matching commands (a sketch for checking this manually follows the log snippet).
[1] 

Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         ceph daemon health check failed with the following output:
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > no valid command found; 10 closest matches:
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > 0
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > 1
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > 2
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > abort
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > assert
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluefs debug_inject_read_zeros
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluefs files list
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluefs stats
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluestore allocator dump block
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluestore allocator fragmentation block
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > admin_socket: invalid command
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:  >
Jul 04 11:09:22 f12-h06-000-1029u ovs-vswitchd[2164]: ovs|451613|connmgr|INFO|br-ex<->unix#2713321: 2 flow_mods in the last 0 s (2 adds)
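
For reference, the set of commands the OSD admin socket actually accepts, and a correctly formed query against it, can be checked manually from inside the OSD pod. This is only a sketch; the namespace, pod name, OSD id and socket path below are placeholders, and the exact command the failing health check runs is not confirmed here:

# list the commands this OSD's admin socket accepts
oc -n openshift-storage exec <rook-ceph-osd-pod> -- ceph daemon osd.<id> help

# example of a well-formed query against the same socket
oc -n openshift-storage exec <rook-ceph-osd-pod> -- ceph daemon osd.<id> status

# equivalent form addressing the socket file directly (path may differ per deployment)
oc -n openshift-storage exec <rook-ceph-osd-pod> -- ceph --admin-daemon /run/ceph/ceph-osd.<id>.asok status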




Version of all relevant components (if applicable):
ODF v4.13
ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

No
Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes, check the logs on nodes where OSD pods are scheduled, for example:
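
A rough way to spot the message on a given node; the node name is a placeholder, and this assumes the probe output lands in the node journal as in the snippet above:

# inspect the node journal for the failing health check output
oc debug node/<osd-node> -- chroot /host journalctl --no-pager | grep "ceph daemon health check failed"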

Can this issue be reproduced from the UI?

NA
If this is a regression, please provide more details to justify this:


Steps to Reproduce:
There are no actual steps to reproduce; I think we just need to check which command fails and invoke it with the right parameters.



Actual results:
Error message in the logs; the command fails because it is invoked with wrong parameters.
Expected results:
The command should succeed, avoiding misleading error messages in the logs.


Additional info:
NA

Comment 3 Blaine Gardner 2023-07-11 15:28:41 UTC
Thanks Parth for taking a further look.

Comment 9 Travis Nielsen 2023-08-08 15:19:28 UTC
Please reopen if there is more to investigate

