Created attachment 1022666 [details]
All logs: osd, mon

Description of problem:
A cluster with 3 monitors was created; all the monitors crashed after multiple crush map edits.

Version-Release number of selected component (if applicable):
[root@hp-ms-01-c05 home]# rpm -qa | grep ceph
ceph-common-0.94.1-5.el7cp.x86_64
ceph-mon-0.94.1-5.el7cp.x86_64
ceph-0.94.1-5.el7cp.x86_64
ceph-osd-0.94.1-5.el7cp.x86_64

How reproducible:
Tried only once

Steps to Reproduce:
1. Created a cluster with 3 monitors and 5 OSDs.
2. Started doing rados put operations.
3. To induce misplaced PGs, edited the crush map by changing some values in the rules section of the crush map.
4. After many edits and bringing OSDs down, out, and in, the monitors suddenly became unresponsive.

Actual results:
All the monitors crashed.

Expected results:

Additional info:

Backtrace
=========
--- begin dump of recent events ---
     0> 2015-05-06 07:28:28.359165 7fad29cf5700 -1 *** Caught signal (Aborted) **
 in thread 7fad29cf5700

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: /usr/bin/ceph-mon() [0x9017e2]
 2: (()+0xf130) [0x7fad3005b130]
 3: (gsignal()+0x37) [0x7fad2ea755d7]
 4: (abort()+0x148) [0x7fad2ea76cc8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fad2f3799b5]
 6: (()+0x5e926) [0x7fad2f377926]
 7: (()+0x5e953) [0x7fad2f377953]
 8: (()+0x5eb73) [0x7fad2f377b73]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0x7b361a]
 10: (PGMap::get_filtered_pg_stats(std::string&, long, long, bool, std::set<pg_t, std::less<pg_t>, std::allocator<pg_t> >&)+0x1d3) [0x885ea3]
 11: (PGMonitor::preprocess_command(MMonCommand*)+0x1ccd) [0x66398d]
 12: (PGMonitor::preprocess_query(PaxosServiceMessage*)+0x27f) [0x66584f]
 13: (PaxosService::dispatch(PaxosServiceMessage*)+0x833) [0x5cacd3]
 14: (Monitor::handle_command(MMonCommand*)+0x1549) [0x591b19]
 15: (Monitor::dispatch(MonSession*, Message*, bool)+0xf9) [0x594c89]
 16: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x595936]
 17: (Monitor::ms_dispatch(Message*)+0x23) [0x5b5403]
 18: (DispatchQueue::entry()+0x64a) [0x8a1d9a]
 19: (DispatchQueue::DispatchThread::entry()+0xd) [0x79bd9d]
 20: (()+0x7df5) [0x7fad30053df5]
 21: (clone()+0x6d) [0x7fad2eb361ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

No core was generated since ulimit was set to 0 by default.

Cluster status
==============
[root@hp-ms-01-c02 ~]# ceph -s
    cluster f75f054f-b849-4832-9b63-a71cec24bdc6
     health HEALTH_WARN
            8 pgs degraded
            5 pgs stuck degraded
            106 pgs stuck unclean
            5 pgs stuck undersized
            8 pgs undersized
            recovery 1/60 objects degraded (1.667%)
            recovery 2/60 objects misplaced (3.333%)
            too many PGs per OSD (404 > max 300)
     monmap e1: 3 mons at {mon1=10.12.27.2:6789/0,mon2=10.12.27.3:6789/0,mon3=10.12.27.5:6789/0}
            election epoch 8, quorum 0,1,2 mon1,mon2,mon3
     osdmap e160: 6 osds: 5 up, 4 in; 106 remapped pgs
      pgmap v1482: 576 pgs, 2 pools, 18400 bytes data, 20 objects
            40169 MB used, 360 GB / 399 GB avail
            1/60 objects degraded (1.667%)
            2/60 objects misplaced (3.333%)
                 470 active+clean
                  98 active+remapped
                   5 active+undersized+degraded+remapped
                   3 active+undersized+degraded

Edits performed on crush map
============================
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

The above rule was modified to:

rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 1 type host
        step emit
}

The above change in the crush map made some PGs misplaced; then one of the OSDs was brought down, which made the cluster show degraded + misplaced PGs.
This operation was repeated a couple of times, and then the crash was seen.

Attaching all the logs with the bug.

Crush map
=========
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host osd1 {
        id -2           # do not change unnecessarily
        # weight 1.100
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 1.000
        item osd.5 weight 0.100
}
host osd2 {
        id -4           # do not change unnecessarily
        # weight 0.100
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 0.100
}
host osd3 {
        id -3           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 1.000
}
host osd4 {
        id -5           # do not change unnecessarily
        # weight 0.100
        alg straw
        hash 0  # rjenkins1
        item osd.3 weight 0.100
}
host osd5 {
        id -6           # do not change unnecessarily
        # weight 0.100
        alg straw
        hash 0  # rjenkins1
        item osd.4 weight 0.100
}
root default {
        id -1           # do not change unnecessarily
        # weight 2.400
        alg straw
        hash 0  # rjenkins1
        item osd1 weight 1.100
        item osd2 weight 0.100
        item osd3 weight 1.000
        item osd4 weight 0.100
        item osd5 weight 0.100
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
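For reference, the edit cycle described above can be sketched as a short shell sequence. The `ceph osd getcrushmap`/`setcrushmap` and `crushtool` invocations are the standard round-trip for editing a crush map; the filenames are examples, and the `sed` line expresses the specific rule change made here (`firstn 0` → `firstn 1`).

```shell
# Round-trip for editing the crush map (requires a live cluster; shown as context):
#   ceph osd getcrushmap -o crush.bin        # fetch the compiled map from the mons
#   crushtool -d crush.bin -o crush.txt      # decompile to editable text
#   crushtool -c crush.txt -o crush.new      # recompile after editing
#   ceph osd setcrushmap -i crush.new        # inject the edited map back

# The edit itself, applied to the decompiled rule text: change the replica
# fan-out from "firstn 0" (use the pool's full size) to "firstn 1" (a single
# host), which is what made PGs misplaced.
cat > crush.txt <<'EOF'
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
EOF
sed -i 's/step chooseleaf firstn 0 type host/step chooseleaf firstn 1 type host/' crush.txt
grep 'chooseleaf firstn' crush.txt
```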
Greg, would you mind taking a look at this one (or re-assigning as appropriate)?
This crash can only have been caused by some tool issuing a "pg ls" request (with specific states!) to the monitor. Was this done explicitly, or perhaps as part of Calamari? It looks like this is in some new features added to support Calamari functionality, and I have no idea how often it's used.

I have created an upstream bug to fix what I think is the actual issue: http://tracker.ceph.com/issues/11569

In other news, the crush change you made is fairly nonsensical, since it reduces every PG's mapping to size one (although the OSDs will maintain the previous mappings in pg_temp in order to keep the replica counts the pools requested). Not sure if that's deliberate or not.

Assigning this to Kefu to dig into further, since we're trying to bring him up to speed on the monitors.
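To illustrate why a single "pg ls" query could take down every monitor, here is a toy Python model, not Ceph source, of the general failure pattern: a command handler that validates client-supplied input with an assertion. The state names and the exact condition are illustrative (the real failing check lives in `PGMap::get_filtered_pg_stats`, frame 10 of the backtrace, and may differ); the point is that an assert in the dispatch path aborts the whole daemon, and every mon that processes the same query dies the same way.

```python
# Toy model (not Ceph code) of validating untrusted input with an assertion.
KNOWN_STATES = {"active", "clean", "degraded", "remapped", "undersized"}

ALL_PGS = [
    {"id": "1.0", "states": {"active", "clean"}},
    {"id": "1.1", "states": {"active", "remapped"}},
]

def get_filtered_pg_stats_buggy(state):
    # Buggy pattern: assert on a client-supplied filter. In a C++ daemon,
    # ceph_assert aborts the entire process, so the monitor crashes rather
    # than returning an error to the client.
    assert state in KNOWN_STATES, f"unknown state {state!r}"
    return [pg for pg in ALL_PGS if state in pg["states"]]

def get_filtered_pg_stats_fixed(state):
    # Fixed pattern: reject a bad filter with an error code (-EINVAL here)
    # and keep the daemon alive.
    if state not in KNOWN_STATES:
        return -22, []
    return 0, [pg for pg in ALL_PGS if state in pg["states"]]
```

Run against a valid filter both behave the same; against an unexpected one, the buggy version raises (aborts) while the fixed version returns an error.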
(In reply to Greg Farnum from comment #3)
> This crash can only have been caused by some tool issuing a request to "pg
> ls" (with specific states!) on the monitor. Was this done explicitly, or
> perhaps as part of Calamari? It looks like this is in some new features

This operation was done explicitly, i.e.:
1. Get the crush map.
2. Edit the map.
3. Put it back into the cluster.
4. Check ceph -s and ceph pg dump.

I am not using Calamari.

> added to support Calamari functionality and I have no idea how often it's
> used.
>
> I have created an upstream bug to fix what I think is the actual issue:
> http://tracker.ceph.com/issues/11569
>
> In other news, the crush change you made is fairly nonsensical, since it's
> reducing every PG's mapping to size one (although the OSD will maintain the
> previous mappings in pg_temp in order to keep the sizes requested as
> replicated). Not sure if that's deliberate or not.

This was done deliberately, to create misplaced PGs, which is a necessary condition for the test I am doing.

> Assigning this to Kefu to dig into further since we're trying to bring him
> up on the monitors.
Looks like the fix is still undergoing review upstream for master (https://github.com/ceph/ceph/pull/4643).

From what I understand, this crash is pretty rare, right? Based on that assumption I'm going to un-target this bugfix from the 1.3.0 release.
> From what I understand, this crash is pretty rare, right?

Ken, as long as the user does not send "pg ls* recovery" to the ceph CLI, we are good.

> Based on that assumption I'm going to un-target this bugfix from the 1.3.0 release

Thank you!
Pending backport: http://tracker.ceph.com/issues/11910
Thanks Kefu! We should be able to pull those patches in downstream in time for 1.3.1.
Patches to pull downstream: https://github.com/ceph/ceph/pull/5160/commits
Verified on:

rpm -qa | grep ceph
ceph-mon-0.94.3-1.el7cp.x86_64
ceph-common-0.94.3-1.el7cp.x86_64
ceph-0.94.3-1.el7cp.x86_64

Even after many crush map edits I don't see any mon crash, hence marking this as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:2512
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:2066