Bug 1568897

Summary: [RFE][CEE/SD] change "osd_max_markdown_count" dout level from (10) -> (0) for RHCS 2.y version
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tomas Petr <tpetr>
Component: RADOS
Assignee: Josh Durgin <jdurgin>
Status: CLOSED ERRATA
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium
Docs Contact: Aron Gunn <agunn>
Priority: medium
Version: 2.5
CC: agunn, ceph-eng-bugs, ceph-qe-bugs, dzafman, hnallurv, jdurgin, kchai, mhackett, tchandra, tserlin, vumrao
Target Milestone: z1
Keywords: CodeChange, FutureFeature
Target Release: 2.5
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.10-22.el7cp Ubuntu: ceph_10.2.10-19redhat1
Doc Type: Enhancement
Doc Text:
By default, OSDs now log when they shut down due to disk operations timing out.
Last Closed: 2018-07-26 18:06:41 UTC
Type: Bug
Bug Blocks: 1536401

Description Tomas Petr 2018-04-18 11:53:52 UTC
Description of problem:
If possible, change the dout level of the "osd_max_markdown_count" shutdown message from 10 to 0.
 - This would also match upstream "master" and downstream "RHCS 3":

From ceph-10.2.10/src/osd/OSD.cc:

    if ((int)osd_markdown_log.size() > g_conf->osd_max_markdown_count) {
      dout(10) << __func__ << " marked down "
               << osd_markdown_log.size()
               << " > osd_max_markdown_count "
               << g_conf->osd_max_markdown_count
               << " in last " << grace << " seconds, shutting down"
               << dendl;
      do_restart = false;
      do_shutdown = true;

To:
    if ((int)osd_markdown_log.size() > g_conf->osd_max_markdown_count) {
      dout(0) << __func__ << " marked down "
              << osd_markdown_log.size()
              << " > osd_max_markdown_count "
              << g_conf->osd_max_markdown_count
              << " in last " << grace << " seconds, shutting down"
              << dendl;
      do_restart = false;
      do_shutdown = true;
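
For readers unfamiliar with the check, the condition in the excerpt can be modelled roughly as a sliding-window counter. This is only a sketch and not the code from OSD.cc: the `MarkdownTracker` type, its members, and the use of plain `double` timestamps are hypothetical, and the pruning of entries older than the grace period is an assumption about what happens before the comparison, since the excerpt does not show it.

```cpp
#include <cassert>
#include <deque>

// Hypothetical model of the markdown check; names do not come from Ceph.
struct MarkdownTracker {
  std::deque<double> log;  // timestamps (seconds) of recent mark-downs
  double grace;            // window length in seconds
  int max_count;           // plays the role of osd_max_markdown_count

  MarkdownTracker(double grace_s, int max) : grace(grace_s), max_count(max) {}

  // Record a mark-down at time `now`; return true if the OSD should shut down.
  bool record(double now) {
    log.push_back(now);
    // Assumed pruning step: drop entries that fell out of the grace window.
    while (!log.empty() && log.front() + grace < now) {
      log.pop_front();
    }
    // Same comparison as in the excerpt: strictly more than the maximum.
    return static_cast<int>(log.size()) > max_count;
  }
};
```

With a maximum of 2 and a 600-second window, the third mark-down inside the window returns true, while a mark-down that arrives after the earlier ones have aged out does not.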


Reason: it will make our life a lot easier =D
  - with the default logging level, ceph-osd.log will show that this was the reason the OSD terminated itself (graceful suicide)
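
To illustrate why the level matters, here is a minimal sketch of level-gated logging. It assumes the simple rule that a message is written only when its level is at or below the configured debug level. `SketchLogger` is a hypothetical stand-in, not Ceph's dout implementation, and the `lines` vector merely stands in for ceph-osd.log.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a level-gated logger; not Ceph's real logging code.
struct SketchLogger {
  int debug_level;                 // configured verbosity for the subsystem
  std::vector<std::string> lines;  // stands in for the log file

  explicit SketchLogger(int level) : debug_level(level) {}

  // Write msg only if its level passes the gate; return whether it was written.
  bool log(int level, const std::string& msg) {
    if (level > debug_level) {
      return false;  // too verbose for the current setting, dropped
    }
    lines.push_back(msg);
    return true;
  }
};
```

Under this rule, a logger configured at level 0 drops a level-10 message but keeps a level-0 one, which is the behaviour the request relies on.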

Target: next RHCS 2.x release

Version-Release number of selected component (if applicable):
RHCS 2.5

Comment 4 Vikhyat Umrao 2018-04-30 20:07:25 UTC
jewel upstream backport: https://github.com/ceph/ceph/pull/21747

Comment 6 Vikhyat Umrao 2018-05-24 21:07:45 UTC
(In reply to Vikhyat Umrao from comment #4)
> jewel upstream backport: https://github.com/ceph/ceph/pull/21747

Merged Upstream.

Comment 7 Josh Durgin 2018-06-26 16:40:13 UTC
added to ceph-2-rhel-patches

Comment 16 errata-xmlrpc 2018-07-26 18:06:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2261