Bug 2151908

Summary:	[RFE] mgr/cephadm: timeouts for ssh/binary commands to help users to know what might be going wrong
Product:	[Red Hat Storage] Red Hat Ceph Storage	Reporter:	Vasishta <vashastr>
Component:	Cephadm	Assignee:	Adam King <adking>
Status:	CLOSED ERRATA	QA Contact:	Vinayak Papnoi <vpapnoi>
Severity:	high	Docs Contact:	Akash Raj <akraj>
Priority:	unspecified
Version:	5.3	CC:	adking, akraj, cephqe-warriors, msaini, prprakas, qshi, rsachere, saraut, trchakra, tserlin, vereddy, vpapnoi
Target Milestone:	---	Keywords:	FutureFeature
Target Release:	6.1z1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	ceph-17.2.6-84.el9cp	Doc Type:	Enhancement
Doc Text:	The Cephadm commands run on the host from the Cephadm mgr module now have timeouts. Previously, a Cephadm command would occasionally hang indefinitely, and it was difficult for users to notice and resolve the issue. With this release, timeouts are introduced for the Cephadm commands that are run on the host from the Cephadm mgr module. If one of the commands hangs, it eventually fails and users are alerted with a health warning. The timeout is configurable with the `mgr/cephadm/default_cephadm_command_timeout` setting, and defaults to 900 seconds.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-08-03 16:45:09 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	2192813, 2221020

Description Vasishta 2022-12-08 14:35:01 UTC

Description of problem:
Orchestrator operations sometimes get stuck on nodes for various reasons.

Example:
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/__init__.py", line 998, in emit
    self.flush()
  File "/usr/lib64/python3.6/logging/__init__.py", line 978, in flush
    self.stream.flush()
OSError: [Errno 28] No space left on device


Ceph does not report any information back to the user; the operation gets stuck without any notification, even in the DEBUG logs.

This BZ is a downstream tracker for an ongoing effort to add timeouts, so that users know the operation was actually attempted but timed out, possibly because of one of several scenarios (x, y, z).
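
The idea behind the request can be illustrated with a minimal sketch. This is not the cephadm implementation: the names `run_with_timeout`, `CommandTimeout`, and `DEFAULT_TIMEOUT_SECONDS` are hypothetical, and a local subprocess stands in for a command run over ssh. It only shows how a time budget turns a silent hang into an error that can be reported to the user.

```python
import asyncio

# Hypothetical constant; 900 mirrors the default stated in the Doc Text.
DEFAULT_TIMEOUT_SECONDS = 900


class CommandTimeout(Exception):
    """Raised when a host command exceeds its time budget."""


async def run_with_timeout(argv, timeout=DEFAULT_TIMEOUT_SECONDS):
    """Run argv as a subprocess and raise CommandTimeout if it hangs."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # wait_for cancels the pending read once the budget is exhausted.
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Stop the stuck process and reap it so it does not linger.
        proc.kill()
        await proc.wait()
        raise CommandTimeout(f"{' '.join(argv)!r} timed out after {timeout}s")
    return proc.returncode, out.decode(), err.decode()
```

A caller would catch `CommandTimeout` and surface it (for example as a warning) instead of waiting forever; what the warning should say and how it is raised is left out of this sketch.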

Comment 1 Vasishta 2022-12-08 15:39:47 UTC

*** Bug 2149606 has been marked as a duplicate of this bug. ***

Comment 2 Vasishta 2022-12-08 15:43:55 UTC

*** Bug 2149564 has been marked as a duplicate of this bug. ***

Comment 8 Adam King 2023-03-31 19:14:39 UTC

*** Bug 2102485 has been marked as a duplicate of this bug. ***

Comment 9 Adam King 2023-03-31 19:22:20 UTC

*** Bug 2133406 has been marked as a duplicate of this bug. ***

Comment 10 Adam King 2023-03-31 19:34:54 UTC

*** Bug 2153709 has been marked as a duplicate of this bug. ***

Comment 36 errata-xmlrpc 2023-08-03 16:45:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:4473