Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

This project is now read‑only. Starting Monday, February 2, please use https://ibm-ceph.atlassian.net/ for all bug tracking management.

Bug 1390779

Summary:

Apparently unable to change dynamically osd_recovery_max_active parameter

Product:

[Red Hat Storage] Red Hat Ceph Storage

Reporter:

Jean-Charles Lopez <jelopez>

Component:

RADOS

Assignee:

Brad Hubbard <bhubbard>

Status:

CLOSED NOTABUG

QA Contact:

ceph-qe-bugs <ceph-qe-bugs>

Severity:

medium

Docs Contact:

Priority:

unspecified

Version:

2.0

CC:

bhubbard, ceph-eng-bugs, dzafman, jdurgin, jelopez, kchai, tbrekke

Target Milestone:

Target Release:

2.3

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2017-04-01 01:28:38 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
OSD 0 log file after changing osd_recovery_max_active to 1 immediately after the OSD started	none

Description Jean-Charles Lopez 2016-11-02 00:11:05 UTC

Created attachment 1216280 [details]
OSD 0 log file after changing osd_recovery_max_active to 1 immediately after the OSD started

Description of problem:
While trying to use "ceph osd daemon osd.x config set" or "ceph tell osd.x injectargs" to modify dynamically the osd_recovery_max_active parameter for one or more OSDs I get an output message telling me this parameter is unchangeable.


Version-Release number of selected component (if applicable):
2.0

How reproducible:
Easy 100% of the times

Steps to Reproduce:
1. ceph daemon osd.0 config set osd_recovery_max_active 5
2. 
3.

Actual results:
"success": "osd_recovery_max_active = '5' (unchangeable) "

Expected results:
"success": "osd_recovery_max_active = '5'"

Additional info:
Not sure if this parameter has become static only though and couldn't find anything in the mailing lists.

I have made a test and collected a lot to verify if the change was actually made because if I run a get command, it shows me the modified value
ceph daemon osd.0 config get osd_recovery_max_active
{
    "osd_recovery_max_active": "5"
}

I have attached a log after setting the value to 1 for a given OSD and enabling debug_osd=20 after the OSD started (I set noout and stopped the OSD and loaded some data in 3 PGs for which this OSD was primary before I stopped it.

Befroe restarting the OSD I modified the config file to include debug_osd=20 osd_recovery_max_active=2

Comment 2 Brad Hubbard 2016-11-02 22:14:07 UTC

Here's the history behind this I believe.

http://tracker.ceph.com/issues/11692
https://github.com/ceph/ceph/pull/7085
https://github.com/ceph/ceph/commit/d93d92a7f48cea65edfdf89e3d6cff64a95d4070

So it is a warning that nothing is observing this value so it is "unchangeable" in the sense that any change will not take effect without a restart (can not be altered with any effect at runtime).

The key piece of code is here.

 613 void md_config_t::_apply_changes(std::ostream *oss)
 614 {
 615   /* Maps observers to the configuration options that they care about which
 616    * have changed. */
 617   typedef std::map < md_config_obs_t*, std::set <std::string> > rev_obs_map_t;
 618 
 619   expand_all_meta();
 620 
 621   // create the reverse observer mapping, mapping observers to the set of
 622   // changed keys that they'll get.
 623   rev_obs_map_t robs;
 624   std::set <std::string> empty_set;
 625   char buf[128];
 626   char *bufptr = (char*)buf;
 627   for (changed_set_t::const_iterator c = changed.begin();
 628        c != changed.end(); ++c) {
 629     const std::string &key(*c);
 630     pair < obs_map_t::iterator, obs_map_t::iterator >
 631       range(observers.equal_range(key));
 632     if ((oss) &&
 633 »·······(!_get_val(key.c_str(), &bufptr, sizeof(buf))) &&
 634 »·······!_internal_field(key)) {
 635       (*oss) << key << " = '" << buf << "' ";
 636       if (range.first == range.second) {
 637 »·······(*oss) << "(unchangeable) ";
 638       }
 639     }
 640     for (obs_map_t::iterator r = range.first; r != range.second; ++r) {
 641       rev_obs_map_t::value_type robs_val(r->second, empty_set);
 642       pair < rev_obs_map_t::iterator, bool > robs_ret(robs.insert(robs_val));
 643       std::set <std::string> &keys(robs_ret.first->second);
 644       keys.insert(key);
 645     }
 646   }

HTH

Comment 3 Brad Hubbard 2016-11-02 22:29:05 UTC

Perhaps there is a more suitable text than "unchangeable"?

"changing this value at runtime will have no effect as it is not being observed by any subsystem" ?

Just a thought.

Comment 4 Jean-Charles Lopez 2016-11-03 14:56:03 UTC

Hi Brad,

I got your comment and understand what's beneath although the Tracker you are quoting doesn't explicitely name this particular parameter.

Our major concern here is that this parameter was heavily used by the PS team when on site in order to dynamically influence the recovery traffic in version prior to RHCS 2.0.

According to what I am reading from your comments is that this will no longer doable and only a restart of the OSD(s) will allow us to do that which will turn to be a real pain in the field I guess. 

Thanks for giving it a look.
JC


(In reply to Brad Hubbard from comment #3)
> Perhaps there is a more suitable text than "unchangeable"?
> 
> "changing this value at runtime will have no effect as it is not being
> observed by any subsystem" ?
> 
> Just a thought.

Comment 5 Brad Hubbard 2016-11-04 02:13:19 UTC

(In reply to Jean-Charles Lopez from comment #4)
> Hi Brad,
> 
> I got your comment and understand what's beneath although the Tracker you
> are quoting doesn't explicitely name this particular parameter.

Right, as you correctly worked out, my comment was more of an explanation of the "unchangeable" notation in the output. i probably should have made that clearer.

> 
> Our major concern here is that this parameter was heavily used by the PS
> team when on site in order to dynamically influence the recovery traffic in
> version prior to RHCS 2.0.
> 
> According to what I am reading from your comments is that this will no
> longer doable and only a restart of the OSD(s) will allow us to do that
> which will turn to be a real pain in the field I guess. 

I searched for the "rops)" output from below.

void OSDService::start_recovery_op(PG *pg, const hobject_t& soid)
{
  Mutex::Locker l(recovery_lock);
  dout(10) << "start_recovery_op " << *pg << " " << soid 
»·······   << " (" << recovery_ops_active << "/"
»·······   << cct->_conf->osd_recovery_max_active << " rops)"
»·······   << dendl;
  recovery_ops_active++;

#ifdef DEBUG_RECOVERY_OIDS
  dout(20) << "  active was " << recovery_oids[pg->info.pgid] << dendl;
  assert(recovery_oids[pg->info.pgid].count(soid) == 0);
  recovery_oids[pg->info.pgid].insert(soid);
#endif
}

void OSDService::finish_recovery_op(PG *pg, const hobject_t& soid, bool dequeue) 
{
  Mutex::Locker l(recovery_lock);
  dout(10) << "finish_recovery_op " << *pg << " " << soid 
»·······   << " dequeue=" << dequeue
»·······   << " (" << recovery_ops_active << "/" << cct->_conf->osd_recovery_max_active << " rops)"
»·······   << dendl;

It is always "1" in the log you provided.

It looks like the mechanism to observe changes is there in the OSD class but osd_recovery_max_active does not seem to be being tracked in OSD::handle_conf_change (osd_min_recovery_priority is, for example) however we use it here.

bool OSDService::_recover_now(uint64_t *available_pushes)
{
  uint64_t max = cct->_conf->osd_recovery_max_active;
  if (max <= recovery_ops_active + recovery_ops_reserved) {
    dout(15) << "_recover_now active " << recovery_ops_active
»·······     << " + reserved " << recovery_ops_reserved
»·······     << " >= max " << max << dendl;
    if (available_pushes)
      *available_pushes = 0; 
    return false;
  }

Which looks to me like we are using it dynamically and that each time this function is called it will use the current value of cct->_conf->osd_recovery_max_active which should be updated by setting it via the admin socket. I added this patch to test that theory.

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 265c4ff..a9e3a7c 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -8402,6 +8402,7 @@ void OSDService::_maybe_queue_recovery() {
 bool OSDService::_recover_now(uint64_t *available_pushes)
 {
   uint64_t max = cct->_conf->osd_recovery_max_active;
+  dout(15) << "_recover_now " << "max = " << max << dendl;
   if (max <= recovery_ops_active + recovery_ops_reserved) {
     dout(15) << "_recover_now active " << recovery_ops_active
             << " + reserved " << recovery_ops_reserved

Trigger recovery with default settings and grep for "_recover_now".

out/osd.2.log:2016-11-04 12:00:21.733586 7f7a46dad700 15 osd.2 14 _recover_now max = 3

$ bin/ceph daemon osd.2 config set osd_recovery_max_active 1

Trigger recovery again and grep for "_recover_now".

out/osd.2.log:2016-11-04 12:05:15.097952 7f7a435a6700 15 osd.2 19 _recover_now max = 1

So it changes without a restart in the function where it is used so the "unchangeable" output seems inaccurate in this case.

So I believe you *can* change osd_recovery_max_active dynamically (and see it's effects) despite the "unchangeable" tag.
 
> 
> Thanks for giving it a look.
> JC
>

Comment 6 Jean-Charles Lopez 2016-11-04 14:34:45 UTC

Hi Brad,

your last test actually checks what I was trying to test and the reason I joined the log file.

I started the OSD with a config file value of 2 and immediately as the OSD was starting I set a value of 1 through the admin socket to verify with a debug_osd level 20 if we were doing 1 at a time or 2 at a time.

So through your test it is obvious that it is the message dispalyed that is incorrect as the value we set on the fly is indeed being used to control the number of active recoveries.

So we should fix the bad message.

Thanks for the investigation
JC

Comment 7 Brad Hubbard 2016-11-04 23:39:46 UTC

(In reply to Jean-Charles Lopez from comment #6)
> Hi Brad,
> 
> your last test actually checks what I was trying to test and the reason I
> joined the log file.
> 
> I started the OSD with a config file value of 2 and immediately as the OSD
> was starting I set a value of 1 through the admin socket to verify with a
> debug_osd level 20 if we were doing 1 at a time or 2 at a time.
> 
> So through your test it is obvious that it is the message dispalyed that is
> incorrect as the value we set on the fly is indeed being used to control the
> number of active recoveries.

That is my understanding, yes.

> 
> So we should fix the bad message.

So maybe it needs to be something like...

"changing this value at runtime *may* have no effect as it is not being observed by any subsystem" ?

> 
> Thanks for the investigation
> JC

Comment 8 Brad Hubbard 2017-01-04 22:43:34 UTC

Moving this to 2.3 as I need to get a patch in upstream (will do today).

Comment 9 Brad Hubbard 2017-01-10 05:40:23 UTC

ttps://github.com/ceph/ceph/pull/12855

Comment 10 Josh Durgin 2017-04-01 01:28:38 UTC

The config value is still changeable.

The 'unchangeable' message displayed is updated for 3.0, no need for a bz tracking it at this point so closing.