Bug 1593110
Summary: | Ceph mgr daemon crashing after starting balancer module in automatic mode | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | liuwei <wliu> |
Component: | RADOS | Assignee: | Brad Hubbard <bhubbard> |
Status: | CLOSED ERRATA | QA Contact: | Manohar Murthy <mmurthy> |
Severity: | medium | Docs Contact: | John Brier <jbrier> |
Priority: | low | ||
Version: | 3.0 | CC: | agunn, anharris, bengland, bhubbard, ceph-eng-bugs, dzafman, jbrier, kchai, mmurthy, pasik, tchandra, tserlin |
Target Milestone: | z2 | ||
Target Release: | 3.2 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHEL: ceph-12.2.8-113.el7cp Ubuntu: ceph_12.2.8-96redhat1xenial | Doc Type: | Bug Fix |
Doc Text: |
.The `ceph-mgr` daemon no longer crashes after starting the balancer module in automatic mode
Previously, due to a CRUSH bug, invalid mappings were created. When an invalid mapping was encountered in the `_apply_upmap` function, the code caused a segmentation fault. With this release, the code has been updated to check that the values are within an expected range. If not, the invalid values are ignored.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2019-04-30 15:56:43 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1629656 |
Description
liuwei
2018-06-20 05:30:02 UTC
Using a binary identical environment...

# gdb -q /usr/bin/ceph-osd
Reading symbols from /usr/bin/ceph-osd...Reading symbols from /usr/lib/debug/usr/bin/ceph-osd.debug...done.
done.
(gdb) p &OSDMap::_apply_upmap
$1 = (void (OSDMap::*)(const OSDMap * const, const pg_pool_t &, pg_t, std::vector<int, std::allocator<int> > *)) 0xb36330 <OSDMap::_apply_upmap(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >*) const>

So that's the address of our function, 0xb36330. Now we can find the exact point where we crashed by adding 0x17d (the offset into the function given in frame 3 of the backtrace above) to 0xb36330.

(gdb) p/x 0xb36330+0x17d
$2 = 0xb364ad

Also note that the decimal equivalent of 0x17d is 381.

(gdb) p/d 0x17d
$4 = 381

This can also be used to find the correct instruction by its offset (+381). The following command disassembles the function containing the address 0xb364ad and interleaves the source code.

(gdb) disass /m 0xb364ad
...
1979	            pos < 0 &&
   0x0000000000b36492 <+354>:	cmp    %edx,%edi
   0x0000000000b36494 <+356>:	jne    0xb36480 <OSDMap::_apply_upmap(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >*) const+336>
   0x0000000000b3649c <+364>:	cmp    $0x7fffffff,%esi
   0x0000000000b364a2 <+370>:	je     0xb364b5 <OSDMap::_apply_upmap(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >*) const+389>

1980	            !(r.second != CRUSH_ITEM_NONE && r.second < max_osd &&
   0x0000000000b364a4 <+372>:	cmp    %esi,0x38(%rbx)
   0x0000000000b364a7 <+375>:	jle    0xb364b5 <OSDMap::_apply_upmap(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >*) const+389>

1981	              osd_weight[r.second] == 0)) {
   0x0000000000b36467 <+311>:	movslq %esi,%rax
   0x0000000000b3646a <+314>:	mov    (%r10),%edi
   0x0000000000b364a9 <+377>:	mov    0x78(%rbx),%rdx
   0x0000000000b364ad <+381>:	mov    (%rdx,%r12,1),%edx   <---- HERE
   0x0000000000b364b1 <+385>:	test   %edx,%edx
   0x0000000000b364b3 <+387>:	je     0xb36480 <OSDMap::_apply_upmap(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >*) const+336>

(gdb) l 1981
1976	      }
1977	      // ignore mapping if target is marked out (or invalid osd id)
1978	      if (osd == r.first &&
1979	          pos < 0 &&
1980	          !(r.second != CRUSH_ITEM_NONE && r.second < max_osd &&
1981	            osd_weight[r.second] == 0)) {   <---- HERE
1982	        pos = i;
1983	      }
1984	    }
1985	    if (!exists && pos >= 0) {

So to me this looks like we are indexing outside the bounds of the osd_weight array, so far outside it that we actually accessed memory that caused a segfault. The -823648512 value looks like the culprit to me. If we had a coredump we could verify that.

The naive solution might be to check if r.second < 0 and ignore the mapping if it is, but I want to look further into why/how this came about and the best way to solve it going forward. That will involve tracing where that -823648512 value is coming from. If we can get a coredump that might help, especially if the customer can recreate this easily. I'll continue with this tomorrow morning.

I have a solution for the segfault going into master (details in the upstream tracker). I'll create a separate bug for the python balancer code sending negative values to the mgr but, with this fix in place, those values will be ignored.

Hi Brad,

This would be really, really important to get working in RHCS 3 for large clusters; I'm having a similar problem with it. This was RHOSP (OpenStack) 13 GA, which is an LTS and therefore widely used. If you get the fix into RHCS 3 it should then make its way into RHOSP 13 via the Ceph container images.

What I saw is that I would enable the balancer module, try to run it, and it would no longer be enabled.
[root@overcloud-controller-2 ~]# ceph mgr module enable balancer
[root@overcloud-controller-2 ~]# ceph balancer eval
Error EINVAL: No handler found for 'balancer eval'
[root@overcloud-controller-2 ~]# ceph mgr module ls
{
    "enabled_modules": [],
    "disabled_modules": [
        "balancer",
        "dashboard",
        "influx",
        "localpool",
        "prometheus",
        "restful",
        "selftest",
        "status",
        "zabbix"
    ]
}
[root@overcloud-controller-2 ~]# rpm -qa | grep ceph
...
ceph-common-12.2.4-10.el7cp.x86_64

For a large cluster, the regular PG distribution across OSDs can lead to very inefficient operation, where a couple of OSDs run with 20-30% more load and slow down the entire cluster just because they have more PGs than everyone else. To some extent this can be ameliorated by "ceph osd reweight-by-utilization", but I was looking forward to having this tool to deal with it, particularly in upmap mode.

(In reply to Ben England from comment #8)
> This would be really, really important to get working in RHCS 3 for large
> clusters; I'm having a similar problem with it.
> [...]
> To some extent this can be ameliorated by "ceph osd
> reweight-by-utilization", but I was looking forward to having this tool to
> deal with it, particularly in upmap mode.

Hi Ben,

https://bugzilla.redhat.com/show_bug.cgi?id=1612623 is the actual issue; this segfault won't occur if that is resolved. Perhaps an adjustment of the priority/severity of that bug is in order?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911
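The fix described in the Doc Text above (validate the upmap target before using it as an index into osd_weight) can be sketched as a small standalone predicate. This is an illustrative model only, not the actual upstream `OSDMap::_apply_upmap` patch: the function name `upmap_target_valid` is hypothetical, and `max_osd`/`osd_weight` are passed as plain parameters rather than read from OSDMap members. `CRUSH_ITEM_NONE` is the real CRUSH "no item" marker (0x7fffffff, visible as the `cmp $0x7fffffff,%esi` in the disassembly above).

```cpp
#include <utility>
#include <vector>

// Real CRUSH constant for "no item" (from crush.h).
static const int CRUSH_ITEM_NONE = 0x7fffffff;

// Sketch: may the upmap pair r be applied? The target OSD id (r.second)
// must be a valid index before osd_weight[r.second] is dereferenced.
// Without the r.second >= 0 guard, a garbage negative id such as
// -823648512 passes the "r.second < max_osd" test, indexes far outside
// the array, and segfaults -- which is the crash analyzed in this bug.
bool upmap_target_valid(const std::pair<int, int>& r,
                        int max_osd,
                        const std::vector<unsigned>& osd_weight)
{
    if (r.second == CRUSH_ITEM_NONE)
        return false;                  // explicit "no item" marker
    if (r.second < 0 || r.second >= max_osd)
        return false;                  // out-of-range id: ignore the mapping
    return osd_weight[r.second] > 0;   // safe to index; marked-out OSD == 0
}
```

The sense of the check is simplified relative to the upstream condition on lines 1978-1981, which remembers the position of a mapping to drop; the point is only that the range guard must come before the `osd_weight` access.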