Hide Forgot
Created attachment 1216280 [details] OSD 0 log file after changing osd_recovery_max_active to 1 immediately after the OSD started Description of problem: While trying to use "ceph osd daemon osd.x config set" or "ceph tell osd.x injectargs" to modify dynamically the osd_recovery_max_active parameter for one or more OSDs I get an output message telling me this parameter is unchangeable. Version-Release number of selected component (if applicable): 2.0 How reproducible: Easy 100% of the times Steps to Reproduce: 1. ceph daemon osd.0 config set osd_recovery_max_active 5 2. 3. Actual results: "success": "osd_recovery_max_active = '5' (unchangeable) " Expected results: "success": "osd_recovery_max_active = '5'" Additional info: Not sure if this parameter has become static only though and couldn't find anything in the mailing lists. I have made a test and collected a lot to verify if the change was actually made because if I run a get command, it shows me the modified value ceph daemon osd.0 config get osd_recovery_max_active { "osd_recovery_max_active": "5" } I have attached a log after setting the value to 1 for a given OSD and enabling debug_osd=20 after the OSD started (I set noout and stopped the OSD and loaded some data in 3 PGs for which this OSD was primary before I stopped it. Befroe restarting the OSD I modified the config file to include debug_osd=20 osd_recovery_max_active=2
Here's the history behind this I believe. http://tracker.ceph.com/issues/11692 https://github.com/ceph/ceph/pull/7085 https://github.com/ceph/ceph/commit/d93d92a7f48cea65edfdf89e3d6cff64a95d4070 So it is a warning that nothing is observing this value so it is "unchangeable" in the sense that any change will not take effect without a restart (can not be altered with any effect at runtime). The key piece of code is here. 613 void md_config_t::_apply_changes(std::ostream *oss) 614 { 615 /* Maps observers to the configuration options that they care about which 616 * have changed. */ 617 typedef std::map < md_config_obs_t*, std::set <std::string> > rev_obs_map_t; 618 619 expand_all_meta(); 620 621 // create the reverse observer mapping, mapping observers to the set of 622 // changed keys that they'll get. 623 rev_obs_map_t robs; 624 std::set <std::string> empty_set; 625 char buf[128]; 626 char *bufptr = (char*)buf; 627 for (changed_set_t::const_iterator c = changed.begin(); 628 c != changed.end(); ++c) { 629 const std::string &key(*c); 630 pair < obs_map_t::iterator, obs_map_t::iterator > 631 range(observers.equal_range(key)); 632 if ((oss) && 633 »·······(!_get_val(key.c_str(), &bufptr, sizeof(buf))) && 634 »·······!_internal_field(key)) { 635 (*oss) << key << " = '" << buf << "' "; 636 if (range.first == range.second) { 637 »·······(*oss) << "(unchangeable) "; 638 } 639 } 640 for (obs_map_t::iterator r = range.first; r != range.second; ++r) { 641 rev_obs_map_t::value_type robs_val(r->second, empty_set); 642 pair < rev_obs_map_t::iterator, bool > robs_ret(robs.insert(robs_val)); 643 std::set <std::string> &keys(robs_ret.first->second); 644 keys.insert(key); 645 } 646 } HTH
Perhaps there is a more suitable text than "unchangeable"? "changing this value at runtime will have no effect as it is not being observed by any subsystem" ? Just a thought.
Hi Brad, I got your comment and understand what's beneath although the Tracker you are quoting doesn't explicitely name this particular parameter. Our major concern here is that this parameter was heavily used by the PS team when on site in order to dynamically influence the recovery traffic in version prior to RHCS 2.0. According to what I am reading from your comments is that this will no longer doable and only a restart of the OSD(s) will allow us to do that which will turn to be a real pain in the field I guess. Thanks for giving it a look. JC (In reply to Brad Hubbard from comment #3) > Perhaps there is a more suitable text than "unchangeable"? > > "changing this value at runtime will have no effect as it is not being > observed by any subsystem" ? > > Just a thought.
(In reply to Jean-Charles Lopez from comment #4) > Hi Brad, > > I got your comment and understand what's beneath although the Tracker you > are quoting doesn't explicitely name this particular parameter. Right, as you correctly worked out, my comment was more of an explanation of the "unchangeable" notation in the output. i probably should have made that clearer. > > Our major concern here is that this parameter was heavily used by the PS > team when on site in order to dynamically influence the recovery traffic in > version prior to RHCS 2.0. > > According to what I am reading from your comments is that this will no > longer doable and only a restart of the OSD(s) will allow us to do that > which will turn to be a real pain in the field I guess. I searched for the "rops)" output from below. void OSDService::start_recovery_op(PG *pg, const hobject_t& soid) { Mutex::Locker l(recovery_lock); dout(10) << "start_recovery_op " << *pg << " " << soid »······· << " (" << recovery_ops_active << "/" »······· << cct->_conf->osd_recovery_max_active << " rops)" »······· << dendl; recovery_ops_active++; #ifdef DEBUG_RECOVERY_OIDS dout(20) << " active was " << recovery_oids[pg->info.pgid] << dendl; assert(recovery_oids[pg->info.pgid].count(soid) == 0); recovery_oids[pg->info.pgid].insert(soid); #endif } void OSDService::finish_recovery_op(PG *pg, const hobject_t& soid, bool dequeue) { Mutex::Locker l(recovery_lock); dout(10) << "finish_recovery_op " << *pg << " " << soid »······· << " dequeue=" << dequeue »······· << " (" << recovery_ops_active << "/" << cct->_conf->osd_recovery_max_active << " rops)" »······· << dendl; It is always "1" in the log you provided. It looks like the mechanism to observe changes is there in the OSD class but osd_recovery_max_active does not seem to be being tracked in OSD::handle_conf_change (osd_min_recovery_priority is, for example) however we use it here. bool OSDService::_recover_now(uint64_t *available_pushes) { uint64_t max = cct->_conf->osd_recovery_max_active; if (max <= recovery_ops_active + recovery_ops_reserved) { dout(15) << "_recover_now active " << recovery_ops_active »······· << " + reserved " << recovery_ops_reserved »······· << " >= max " << max << dendl; if (available_pushes) *available_pushes = 0; return false; } Which looks to me like we are using it dynamically and that each time this function is called it will use the current value of cct->_conf->osd_recovery_max_active which should be updated by setting it via the admin socket. I added this patch to test that theory. diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc index 265c4ff..a9e3a7c 100644 --- a/src/osd/OSD.cc +++ b/src/osd/OSD.cc @@ -8402,6 +8402,7 @@ void OSDService::_maybe_queue_recovery() { bool OSDService::_recover_now(uint64_t *available_pushes) { uint64_t max = cct->_conf->osd_recovery_max_active; + dout(15) << "_recover_now " << "max = " << max << dendl; if (max <= recovery_ops_active + recovery_ops_reserved) { dout(15) << "_recover_now active " << recovery_ops_active << " + reserved " << recovery_ops_reserved Trigger recovery with default settings and grep for "_recover_now". out/osd.2.log:2016-11-04 12:00:21.733586 7f7a46dad700 15 osd.2 14 _recover_now max = 3 $ bin/ceph daemon osd.2 config set osd_recovery_max_active 1 Trigger recovery again and grep for "_recover_now". out/osd.2.log:2016-11-04 12:05:15.097952 7f7a435a6700 15 osd.2 19 _recover_now max = 1 So it changes without a restart in the function where it is used so the "unchangeable" output seems inaccurate in this case. So I believe you *can* change osd_recovery_max_active dynamically (and see it's effects) despite the "unchangeable" tag. > > Thanks for giving it a look. > JC >
Hi Brad, your last test actually checks what I was trying to test and the reason I joined the log file. I started the OSD with a config file value of 2 and immediately as the OSD was starting I set a value of 1 through the admin socket to verify with a debug_osd level 20 if we were doing 1 at a time or 2 at a time. So through your test it is obvious that it is the message dispalyed that is incorrect as the value we set on the fly is indeed being used to control the number of active recoveries. So we should fix the bad message. Thanks for the investigation JC
(In reply to Jean-Charles Lopez from comment #6) > Hi Brad, > > your last test actually checks what I was trying to test and the reason I > joined the log file. > > I started the OSD with a config file value of 2 and immediately as the OSD > was starting I set a value of 1 through the admin socket to verify with a > debug_osd level 20 if we were doing 1 at a time or 2 at a time. > > So through your test it is obvious that it is the message dispalyed that is > incorrect as the value we set on the fly is indeed being used to control the > number of active recoveries. That is my understanding, yes. > > So we should fix the bad message. So maybe it needs to be something like... "changing this value at runtime *may* have no effect as it is not being observed by any subsystem" ? > > Thanks for the investigation > JC
Moving this to 2.3 as I need to get a patch in upstream (will do today).
ttps://github.com/ceph/ceph/pull/12855
The config value is still changeable. The 'unchangeable' message displayed is updated for 3.0, no need for a bz tracking it at this point so closing.