Bug 2346030
| Summary: | [10k osd]cephadm timeout error during refresh | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Paul Cuzner <pcuzner> |
| Component: | Cephadm | Assignee: | Shweta Bhosale <shbhosal> |
| Status: | CLOSED ERRATA | QA Contact: | Sayalee <saraut> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.0 | CC: | adking, akane, cephqe-warriors, jansingh, saraut, shbhosal |
| Target Milestone: | --- | | |
| Target Release: | 9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-20.1.0-16 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2026-01-29 06:53:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 9.0 Security and Enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2026:1536

Created attachment 2076788: timeout error

Description of problem:
The 10K OSD testing hit a timeout exception at 399 hosts. It is unclear whether this is the result of lab issues, but it does highlight how the condition is handled. For example, should the exception raise a healthcheck so that the admin knows to change any associated timeout values?

Version-Release number of selected component (if applicable):
19.2.0-55

How reproducible:
Unclear. Other network stability issues have prevented repeat work.

Steps to Reproduce:
1. Scale a cluster to over 399 hosts (>6000 OSDs)
2. Check the mgr log

Actual results:
Timeout observed (see attached file)

Expected results:
Any timeout should post a healthcheck to trigger administrator action to resolve.

Additional info: