Bug 2151908 - [RFE] mgr/cephadm: timeouts for ssh/binary commands to help users to know what might be going wrong
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 5.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.1z1
Assignee: Adam King
QA Contact: Vinayak Papnoi
Docs Contact: Akash Raj
URL:
Whiteboard:
Duplicates: 2102485 2133406 2149564 2149606 2153709
Depends On:
Blocks: 2192813 2221020
Reported: 2022-12-08 14:35 UTC by Vasishta
Modified: 2025-01-23 06:43 UTC
CC List: 12 users

Fixed In Version: ceph-17.2.6-84.el9cp
Doc Type: Enhancement
Doc Text:
.The Cephadm commands run on the host from the Cephadm mgr module now have timeouts
Previously, one of the Cephadm commands would occasionally hang indefinitely, and it was difficult for users to notice and resolve the issue. With this release, timeouts are introduced for the Cephadm commands that the Cephadm mgr module runs on the host. If one of these commands hangs, users are now alerted with a health warning about the eventual failure. The timeout is configurable with the `mgr/cephadm/default_cephadm_command_timeout` setting and defaults to 900 seconds.
Clone Of:
Environment:
Last Closed: 2023-08-03 16:45:09 UTC
Embargoed:



Links
- Ceph Project Bug Tracker 54024 (last updated 2022-12-08 14:35:01 UTC)
- Red Hat Issue Tracker RHCEPH-5763 (last updated 2022-12-08 14:45:22 UTC)
- Red Hat Product Errata RHBA-2023:4473 (last updated 2023-08-03 16:46:06 UTC)

Description Vasishta 2022-12-08 14:35:01 UTC
Description of problem:
Orchestrator operations can sometimes get stuck on nodes for various reasons.

Example:
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/__init__.py", line 998, in emit
    self.flush()
  File "/usr/lib64/python3.6/logging/__init__.py", line 978, in flush
    self.stream.flush()
OSError: [Errno 28] No space left on device


Ceph doesn't report any information back to the user; the operation simply hangs without any notification, even in the DEBUG logs.

This BZ is a downstream tracker for an ongoing effort to add timeouts, so that users know the operation was actually attempted but timed out, possibly due to scenario x, y, or z.
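The idea behind the requested timeouts can be illustrated with a minimal sketch: run a host command with an upper bound on its runtime and turn a hang into an explicit, user-visible failure message instead of waiting forever. This is not the cephadm implementation. The function name `run_host_command` and the `DEFAULT_COMMAND_TIMEOUT` constant are hypothetical; the 900-second value only mirrors the documented default of `mgr/cephadm/default_cephadm_command_timeout`, and the returned message stands in for what the mgr module would surface as a health warning.

```python
import subprocess
import sys

# Hypothetical constant mirroring the documented 900-second default of
# mgr/cephadm/default_cephadm_command_timeout (not read from Ceph here).
DEFAULT_COMMAND_TIMEOUT = 900


def run_host_command(cmd, timeout=DEFAULT_COMMAND_TIMEOUT):
    """Run cmd and return (ok, message), never blocking longer than timeout seconds.

    Hypothetical helper for illustration only.
    """
    try:
        # subprocess.run kills the child process if the timeout expires.
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Report the hang explicitly instead of staying silent.
        return False, f"command timed out after {timeout}s: {' '.join(cmd)}"
    if result.returncode != 0:
        return False, f"command failed with exit code {result.returncode}: {result.stderr.strip()}"
    return True, result.stdout.strip()


if __name__ == "__main__":
    # Simulate a hung host command with a child process that sleeps too long.
    hung = [sys.executable, "-c", "import time; time.sleep(10)"]
    ok, message = run_host_command(hung, timeout=1)
    print(ok, message)
```

Returning a message rather than raising keeps the sketch short; a real orchestrator would more likely raise an error and record it as a health check so the failure stays visible after the command returns.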

Comment 1 Vasishta 2022-12-08 15:39:47 UTC
*** Bug 2149606 has been marked as a duplicate of this bug. ***

Comment 2 Vasishta 2022-12-08 15:43:55 UTC
*** Bug 2149564 has been marked as a duplicate of this bug. ***

Comment 8 Adam King 2023-03-31 19:14:39 UTC
*** Bug 2102485 has been marked as a duplicate of this bug. ***

Comment 9 Adam King 2023-03-31 19:22:20 UTC
*** Bug 2133406 has been marked as a duplicate of this bug. ***

Comment 10 Adam King 2023-03-31 19:34:54 UTC
*** Bug 2153709 has been marked as a duplicate of this bug. ***

Comment 36 errata-xmlrpc 2023-08-03 16:45:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:4473

