Bug 2167343

Summary: [luminous] FAILED assert(authenticate_err == 0)
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: shiqi <qshi>
Component: RADOS
Assignee: Brad Hubbard <bhubbard>
Status: ASSIGNED
QA Contact: Pawan <pdhiran>
Severity: low
Priority: low
Version: 3.1
CC: bhubbard, bkunal, ceph-eng-bugs, cephqe-warriors, vumrao, xili, yyin
Keywords: Reopened
Target Release: 6.2
Hardware: x86_64
OS: Linux
Last Closed: 2023-02-15 07:07:03 UTC
Type: Bug

Description shiqi 2023-02-06 11:00:26 UTC
Description of problem:
The nova-compute service shut down with the following log:
/builddir/build/BUILD/ceph-12.2.5/src/mon/MonClient.cc: In function 'int MonClient::authenticate(double)' thread 7f7bc63e9740 time 2023-02-06 13:55:52.904855
/builddir/build/BUILD/ceph-12.2.5/src/mon/MonClient.cc: 479: FAILED assert(authenticate_err == 0)
 ceph version 12.2.5-59.el7cp (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f7baab12ec0]
 2: (MonClient::authenticate(double)+0xa17) [0x7f7baab5d057]
 3: (librados::RadosClient::connect()+0x10ac) [0x7f7bb35d92bc]
 4: (rados_connect()+0x20) [0x7f7bb3583ff0]

Version-Release number of selected component (if applicable):
 ceph version 12.2.5-59.el7cp (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 17 shiqi 2023-02-17 08:42:17 UTC

1. See the call stack above. I think this is the core dump information.
2. As far as I understand, __ceph_assert_fail cannot trigger core dump generation.
3. Setting debug_auth=20, debug_monc=20, and debug_ms=1 will generate a large volume of logs, filling the disks and reducing performance. According to the customer, the fault occurs once every two months. Can we lower the log levels instead, for example to debug_auth=5, debug_monc=5, debug_ms=1? Do you think these log levels would affect the production environment?
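The reduced levels proposed in item 3 could be placed in a client-side `ceph.conf` fragment like the one below. This is a sketch under stated assumptions: it assumes the librados client inside nova-compute reads the `[client]` section of the `ceph.conf` it is given, and the `log_file` path is a hypothetical example, not one taken from the case. Because the file is read when the client initializes, new values would typically take effect only after nova-compute is restarted. Whether these levels are acceptable in production remains the open question above.

```ini
# Hypothetical client-side ceph.conf fragment on the nova-compute host.
# Assumption: nova-compute's librados client reads the [client] section.
[client]
# Reduced levels proposed in item 3 (instead of auth/monc at 20).
debug_auth = 5
debug_monc = 5
debug_ms = 1
# Hypothetical path: it must be writable by the nova-compute process
# and, in a containerized deployment, mounted from the host.
log_file = /var/log/ceph/$cluster-$name.$pid.log
```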

Comment 25 shiqi 2023-04-07 03:41:40 UTC
@Brad Hubbard
Hello Brad. All the logs and the core dump are attached to the related case (03429811), including the core dump (abrt.tar(1).gz), the Ceph client log (ceph.client.log.tar(1).gz), and the Ceph call stack in the Docker log (W-PC-SRH310-369--docker-log-all.txt). Please take a look at these attachments. Could you tell us which bug this is, whether it has been fixed, and whether a fix is planned for this release?