Bug 2219769 - Error message on node where OSDs are scheduled "ceph daemon health check failed with the following output:" - command parameters are wrong.
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Radoslaw Zarzynski
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-07-05 09:37 UTC by Elvir Kuric
Modified: 2023-08-15 15:17 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-08 15:19:28 UTC
Embargoed:



Description Elvir Kuric 2023-07-05 09:37:47 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Note: I classified this BZ as "ceph" -> RBD. I think it is just Ceph related; please re-assign as necessary, as I am not sure which category this BZ belongs in.

On nodes where OSDs are scheduled, I very often see the error message below [1].
It seems a Ceph command that is meant to be executed is given wrong parameters and fails, reporting the closest matching commands (a sketch for checking this manually follows the log snippet).
[1] 

Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         ceph daemon health check failed with the following output:
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > no valid command found; 10 closest matches:
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > 0
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > 1
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > 2
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > abort
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > assert
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluefs debug_inject_read_zeros
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluefs files list
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluefs stats
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluestore allocator dump block
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > bluestore allocator fragmentation block
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:         > admin_socket: invalid command
Jul 04 11:09:21 f12-h06-000-1029u kubenswrapper[4818]:  >
Jul 04 11:09:22 f12-h06-000-1029u ovs-vswitchd[2164]: ovs|451613|connmgr|INFO|br-ex<->unix#2713321: 2 flow_mods in the last 0 s (2 adds)
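
For reference, the set of commands the OSD admin socket actually accepts, and a correctly formed query against it, can be checked manually from inside the OSD pod. This is only a sketch; the namespace, pod name, OSD id and socket path below are placeholders, and the exact command the failing health check runs is not confirmed here:

# list the commands this OSD's admin socket accepts
oc -n openshift-storage exec <rook-ceph-osd-pod> -- ceph daemon osd.<id> help

# example of a well-formed query against the same socket
oc -n openshift-storage exec <rook-ceph-osd-pod> -- ceph daemon osd.<id> status

# equivalent form addressing the socket file directly (path may differ per deployment)
oc -n openshift-storage exec <rook-ceph-osd-pod> -- ceph --admin-daemon /run/ceph/ceph-osd.<id>.asok status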




Version of all relevant components (if applicable):
ODF v4.13
ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

No
Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes, check the logs on nodes where OSD pods are scheduled, for example:
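
A rough way to spot the message on a given node; the node name is a placeholder, and this assumes the probe output lands in the node journal as in the snippet above:

# inspect the node journal for the failing health check output
oc debug node/<osd-node> -- chroot /host journalctl --no-pager | grep "ceph daemon health check failed"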

Can this issue be reproduced from the UI?

NA
If this is a regression, please provide more details to justify this:


Steps to Reproduce:
There are no actual steps to reproduce; I think we just need to check which command fails and invoke it with the right parameters.



Actual results:
Error message in the logs; the command fails because it is invoked with wrong parameters.
Expected results:
The command should succeed, avoiding misleading error messages in the logs.


Additional info:
NA

Comment 3 Blaine Gardner 2023-07-11 15:28:41 UTC
Thanks Parth for taking a further look.

Comment 9 Travis Nielsen 2023-08-08 15:19:28 UTC
Please reopen if there is more to investigate

