Bug 2237038

Summary: [RHOSP 17.1 Hackfest] Attempt to connect to RHCS 5.3 cluster from RHEL 9.2 using v1 protocol causes client core dump
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Andrew Austin <aaustin>
Component: RADOS
Assignee: Radoslaw Zarzynski <rzarzyns>
Status: CLOSED ERRATA
QA Contact: Harsh Kumar <hakumar>
Severity: high
Docs Contact:
Priority: unspecified
Version: 5.3
CC: bhubbard, ceph-eng-bugs, cephqe-warriors, ngangadh, nojha, sostapov, tserlin, vumrao
Target Milestone: ---
Target Release: 7.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-18.2.1-61.el9cp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-06-13 14:21:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Andrew Austin 2023-09-03 02:47:57 UTC
Created attachment 1986778 [details]
core dump from RHCS 5 client

Description of problem:
Executing ceph client commands from a RHEL 9.2 host against an RHCS 5.3 cluster over the messenger v1 protocol causes the client to crash with a core dump.

Commands succeed when the client is limited to v2; however, RHOSP always configures the v1 port, so this breaks integration between RHOSP 17.1 and an external RHCS 5.3 cluster.

Version-Release number of selected component (if applicable):
Tested with RHEL 9.2 (kernel 5.14.0-284.25.1.el9_2.x86_64) and both RHCS 5 (16.2.10-208.el9cp) and RHCS 6 (17.2.6-100.el9cp) clients.

The cluster tested was running version 16.2.10-187.el8cp. Connecting to an RHCS 6.1 cluster appeared to work fine.

How reproducible:

From a fresh RHEL 9.2 machine with ceph-common installed, configure the ceph client to communicate with the RHCS cluster over the v1 protocol only, as in the sketch below. Run ceph df or ceph status to trigger the crash.
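
A minimal client-side ceph.conf that forces v1 looks roughly like this; the fsid and the monitor addresses are placeholders and must be replaced with the real cluster values:

    [global]
    fsid = <cluster fsid>
    # list only the v1 (port 6789) monitor endpoints so the client never negotiates msgr v2
    mon_host = [v1:192.0.2.11:6789],[v1:192.0.2.12:6789],[v1:192.0.2.13:6789]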

Steps to Reproduce:
1. Configure a RHEL 9.2 machine with a ceph client pointing to an RHCS 5.3 cluster with only v1 ports
2. Run ceph df
3. Observe the crash and core dump

Actual results:

When v1 protocol is used (forced or fallback), there is a core dump with the message below.

Expected results:

The command should return the normal ceph df output.

Additional info:

stderr text on crash:

/usr/include/c++/11/bits/random.tcc:2667: void std::discrete_distribution<_IntType>::param_type::_M_initialize() [with _IntType = int]: Assertion '__sum > 0' failed.
Aborted (core dumped)

Workaround for users of the ceph CLI: configure only v2 endpoints in ceph.conf, for example as sketched below.
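
A mon_host line that lists only v2 (port 3300) endpoints avoids the crash for CLI users; the addresses below are placeholders for the real monitor IPs:

    [global]
    # v2-only monitor endpoints; with no v1 (6789) addresses listed, the client stays on msgr v2
    mon_host = [v2:192.0.2.11:3300],[v2:192.0.2.12:3300],[v2:192.0.2.13:3300]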

Since OpenStack always configures libvirt to use port 6789, there does not seem to be a workaround for RHOSP 17.1 integration.

Comment 1 RHEL Program Management 2023-09-03 02:48:08 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 Andrew Austin 2023-09-03 17:36:42 UTC
This seems to be related to https://bugzilla.redhat.com/show_bug.cgi?id=2235738. After locating that bug, I found that one of my mons had weight = 10 while the other two had weight = 0. Setting the odd mon to weight 0 resolved the issue.
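
For anyone else hitting this, the monitor weights can be checked and reset roughly as follows. This is a sketch: the mon name is a placeholder, and it assumes the ceph mon set-weight command is available on the installed release (check ceph mon -h if it is not):

    # dump the monmap and inspect the per-mon "weight" field
    ceph mon dump -f json-pretty
    # set the odd monitor's weight back to 0 so it matches the other two
    ceph mon set-weight <mon-name> 0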

Comment 9 errata-xmlrpc 2024-06-13 14:21:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925