Bug 1390779 - Apparently unable to change dynamically osd_recovery_max_active parameter
Summary: Apparently unable to change dynamically osd_recovery_max_active parameter
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 2.0
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: rc
: 2.3
Assignee: Brad Hubbard
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-11-02 00:11 UTC by Jean-Charles Lopez
Modified: 2020-07-16 08:58 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-01 01:28:38 UTC
Target Upstream Version:


Attachments (Terms of Use)
OSD 0 log file after changing osd_recovery_max_active to 1 immediately after the OSD started (2.62 MB, application/x-gzip)
2016-11-02 00:11 UTC, Jean-Charles Lopez
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 18424 0 None None None 2017-01-05 02:08:07 UTC

Description Jean-Charles Lopez 2016-11-02 00:11:05 UTC
Created attachment 1216280 [details]
OSD 0 log file after changing osd_recovery_max_active to 1 immediately after the OSD started

Description of problem:
While trying to use "ceph osd daemon osd.x config set" or "ceph tell osd.x injectargs" to modify dynamically the osd_recovery_max_active parameter for one or more OSDs I get an output message telling me this parameter is unchangeable.


Version-Release number of selected component (if applicable):
2.0

How reproducible:
Easy 100% of the times

Steps to Reproduce:
1. ceph daemon osd.0 config set osd_recovery_max_active 5
2. 
3.

Actual results:
"success": "osd_recovery_max_active = '5' (unchangeable) "

Expected results:
"success": "osd_recovery_max_active = '5'"

Additional info:
Not sure if this parameter has become static only though and couldn't find anything in the mailing lists.

I have made a test and collected a lot to verify if the change was actually made because if I run a get command, it shows me the modified value
ceph daemon osd.0 config get osd_recovery_max_active
{
    "osd_recovery_max_active": "5"
}

I have attached a log after setting the value to 1 for a given OSD and enabling debug_osd=20 after the OSD started (I set noout and stopped the OSD and loaded some data in 3 PGs for which this OSD was primary before I stopped it.

Befroe restarting the OSD I modified the config file to include debug_osd=20 osd_recovery_max_active=2

Comment 2 Brad Hubbard 2016-11-02 22:14:07 UTC
Here's the history behind this I believe.

http://tracker.ceph.com/issues/11692
https://github.com/ceph/ceph/pull/7085
https://github.com/ceph/ceph/commit/d93d92a7f48cea65edfdf89e3d6cff64a95d4070

So it is a warning that nothing is observing this value so it is "unchangeable" in the sense that any change will not take effect without a restart (can not be altered with any effect at runtime).

The key piece of code is here.

 613 void md_config_t::_apply_changes(std::ostream *oss)
 614 {
 615   /* Maps observers to the configuration options that they care about which
 616    * have changed. */
 617   typedef std::map < md_config_obs_t*, std::set <std::string> > rev_obs_map_t;
 618 
 619   expand_all_meta();
 620 
 621   // create the reverse observer mapping, mapping observers to the set of
 622   // changed keys that they'll get.
 623   rev_obs_map_t robs;
 624   std::set <std::string> empty_set;
 625   char buf[128];
 626   char *bufptr = (char*)buf;
 627   for (changed_set_t::const_iterator c = changed.begin();
 628        c != changed.end(); ++c) {
 629     const std::string &key(*c);
 630     pair < obs_map_t::iterator, obs_map_t::iterator >
 631       range(observers.equal_range(key));
 632     if ((oss) &&
 633 »·······(!_get_val(key.c_str(), &bufptr, sizeof(buf))) &&
 634 »·······!_internal_field(key)) {
 635       (*oss) << key << " = '" << buf << "' ";
 636       if (range.first == range.second) {
 637 »·······(*oss) << "(unchangeable) ";
 638       }
 639     }
 640     for (obs_map_t::iterator r = range.first; r != range.second; ++r) {
 641       rev_obs_map_t::value_type robs_val(r->second, empty_set);
 642       pair < rev_obs_map_t::iterator, bool > robs_ret(robs.insert(robs_val));
 643       std::set <std::string> &keys(robs_ret.first->second);
 644       keys.insert(key);
 645     }
 646   }

HTH

Comment 3 Brad Hubbard 2016-11-02 22:29:05 UTC
Perhaps there is a more suitable text than "unchangeable"?

"changing this value at runtime will have no effect as it is not being observed by any subsystem" ?

Just a thought.

Comment 4 Jean-Charles Lopez 2016-11-03 14:56:03 UTC
Hi Brad,

I got your comment and understand what's beneath although the Tracker you are quoting doesn't explicitely name this particular parameter.

Our major concern here is that this parameter was heavily used by the PS team when on site in order to dynamically influence the recovery traffic in version prior to RHCS 2.0.

According to what I am reading from your comments is that this will no longer doable and only a restart of the OSD(s) will allow us to do that which will turn to be a real pain in the field I guess. 

Thanks for giving it a look.
JC


(In reply to Brad Hubbard from comment #3)
> Perhaps there is a more suitable text than "unchangeable"?
> 
> "changing this value at runtime will have no effect as it is not being
> observed by any subsystem" ?
> 
> Just a thought.

Comment 5 Brad Hubbard 2016-11-04 02:13:19 UTC
(In reply to Jean-Charles Lopez from comment #4)
> Hi Brad,
> 
> I got your comment and understand what's beneath although the Tracker you
> are quoting doesn't explicitely name this particular parameter.

Right, as you correctly worked out, my comment was more of an explanation of the "unchangeable" notation in the output. i probably should have made that clearer.

> 
> Our major concern here is that this parameter was heavily used by the PS
> team when on site in order to dynamically influence the recovery traffic in
> version prior to RHCS 2.0.
> 
> According to what I am reading from your comments is that this will no
> longer doable and only a restart of the OSD(s) will allow us to do that
> which will turn to be a real pain in the field I guess. 

I searched for the "rops)" output from below.

void OSDService::start_recovery_op(PG *pg, const hobject_t& soid)
{
  Mutex::Locker l(recovery_lock);
  dout(10) << "start_recovery_op " << *pg << " " << soid 
»·······   << " (" << recovery_ops_active << "/"
»·······   << cct->_conf->osd_recovery_max_active << " rops)"
»·······   << dendl;
  recovery_ops_active++;

#ifdef DEBUG_RECOVERY_OIDS
  dout(20) << "  active was " << recovery_oids[pg->info.pgid] << dendl;
  assert(recovery_oids[pg->info.pgid].count(soid) == 0);
  recovery_oids[pg->info.pgid].insert(soid);
#endif
}

void OSDService::finish_recovery_op(PG *pg, const hobject_t& soid, bool dequeue) 
{
  Mutex::Locker l(recovery_lock);
  dout(10) << "finish_recovery_op " << *pg << " " << soid 
»·······   << " dequeue=" << dequeue
»·······   << " (" << recovery_ops_active << "/" << cct->_conf->osd_recovery_max_active << " rops)"
»·······   << dendl;

It is always "1" in the log you provided.

It looks like the mechanism to observe changes is there in the OSD class but osd_recovery_max_active does not seem to be being tracked in OSD::handle_conf_change (osd_min_recovery_priority is, for example) however we use it here.

bool OSDService::_recover_now(uint64_t *available_pushes)
{
  uint64_t max = cct->_conf->osd_recovery_max_active;
  if (max <= recovery_ops_active + recovery_ops_reserved) {
    dout(15) << "_recover_now active " << recovery_ops_active
»·······     << " + reserved " << recovery_ops_reserved
»·······     << " >= max " << max << dendl;
    if (available_pushes)
      *available_pushes = 0; 
    return false;
  }

Which looks to me like we are using it dynamically and that each time this function is called it will use the current value of cct->_conf->osd_recovery_max_active which should be updated by setting it via the admin socket. I added this patch to test that theory.

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 265c4ff..a9e3a7c 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -8402,6 +8402,7 @@ void OSDService::_maybe_queue_recovery() {
 bool OSDService::_recover_now(uint64_t *available_pushes)
 {
   uint64_t max = cct->_conf->osd_recovery_max_active;
+  dout(15) << "_recover_now " << "max = " << max << dendl;
   if (max <= recovery_ops_active + recovery_ops_reserved) {
     dout(15) << "_recover_now active " << recovery_ops_active
             << " + reserved " << recovery_ops_reserved

Trigger recovery with default settings and grep for "_recover_now".

out/osd.2.log:2016-11-04 12:00:21.733586 7f7a46dad700 15 osd.2 14 _recover_now max = 3

$ bin/ceph daemon osd.2 config set osd_recovery_max_active 1

Trigger recovery again and grep for "_recover_now".

out/osd.2.log:2016-11-04 12:05:15.097952 7f7a435a6700 15 osd.2 19 _recover_now max = 1

So it changes without a restart in the function where it is used so the "unchangeable" output seems inaccurate in this case.

So I believe you *can* change osd_recovery_max_active dynamically (and see it's effects) despite the "unchangeable" tag.
 
> 
> Thanks for giving it a look.
> JC
>

Comment 6 Jean-Charles Lopez 2016-11-04 14:34:45 UTC
Hi Brad,

your last test actually checks what I was trying to test and the reason I joined the log file.

I started the OSD with a config file value of 2 and immediately as the OSD was starting I set a value of 1 through the admin socket to verify with a debug_osd level 20 if we were doing 1 at a time or 2 at a time.

So through your test it is obvious that it is the message dispalyed that is incorrect as the value we set on the fly is indeed being used to control the number of active recoveries.

So we should fix the bad message.

Thanks for the investigation
JC

Comment 7 Brad Hubbard 2016-11-04 23:39:46 UTC
(In reply to Jean-Charles Lopez from comment #6)
> Hi Brad,
> 
> your last test actually checks what I was trying to test and the reason I
> joined the log file.
> 
> I started the OSD with a config file value of 2 and immediately as the OSD
> was starting I set a value of 1 through the admin socket to verify with a
> debug_osd level 20 if we were doing 1 at a time or 2 at a time.
> 
> So through your test it is obvious that it is the message dispalyed that is
> incorrect as the value we set on the fly is indeed being used to control the
> number of active recoveries.

That is my understanding, yes.

> 
> So we should fix the bad message.

So maybe it needs to be something like...

"changing this value at runtime *may* have no effect as it is not being observed by any subsystem" ?

> 
> Thanks for the investigation
> JC

Comment 8 Brad Hubbard 2017-01-04 22:43:34 UTC
Moving this to 2.3 as I need to get a patch in upstream (will do today).

Comment 9 Brad Hubbard 2017-01-10 05:40:23 UTC
ttps://github.com/ceph/ceph/pull/12855

Comment 10 Josh Durgin 2017-04-01 01:28:38 UTC
The config value is still changeable.

The 'unchangeable' message displayed is updated for 3.0, no need for a bz tracking it at this point so closing.


Note You need to log in before you can comment on or make changes to this bug.