Created attachment 1038776 [details]
Mon log

Description of problem:
Seeing a monitor crash while creating an erasure-coded pool with wrong parameters.

Version-Release number of selected component (if applicable):
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)

How reproducible:
1/1

Steps to Reproduce:
1. Create an EC profile with the command below:
ceph osd erasure-code-profile set myprofile plugin=lrc mapping=__DD__DD layers='[[ "_cDD_cDD", "" ],[ "cDDD____", "" ],[ "____cDDD", "" ],]' ruleset-steps='[ [ "choose", "datacenter", 3 ], [ "chooseleaf", "osd", 0] ]'
2. ceph osd pool create ecpool 12 12 erasure myprofile

Actual results:
The monitor crashes.

BT:
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: /usr/bin/ceph-mon() [0x901e52]
 2: (()+0xf130) [0x7f205a9ca130]
 3: (crush_do_rule()+0x291) [0x833501]
 4: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >*, int*, unsigned int*) const+0xff) [0x77b76f]
 5: (OSDMap::_pg_to_up_acting_osds(pg_t const&, std::vector<int, std::allocator<int> >*, int*, std::vector<int, std::allocator<int> >*, int*) const+0x104) [0x77be24]
 6: (PGMonitor::map_pg_creates()+0x268) [0x65b748]
 7: (PGMonitor::post_paxos_update()+0x25) [0x65bf35]
 8: (Monitor::refresh_from_paxos(bool*)+0x221) [0x575721]
 9: (Monitor::init_paxos()+0x95) [0x575ac5]
 10: (Monitor::preinit()+0x7f1) [0x57a881]
 11: (main()+0x24a1) [0x54d881]
 12: (__libc_start_main()+0xf5) [0x7f20593d0af5]
 13: /usr/bin/ceph-mon() [0x55d0f9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Expected results:
There should not be any crash.

While executing any ceph command, I am getting an error:

ceph -s
2015-06-15 02:09:16.514340 7fe8301a6700  0 -- :/1012129 >> 10.16.154.227:6789/0 pipe(0x7fe82c028050 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe82c0250d0).fault
2015-06-15 02:09:19.513490 7fe827d78700  0 -- :/1012129 >> 10.16.154.227:6789/0 pipe(0x7fe81c000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe81c004ea0).fault

Also, there is no way to recover once the monitor has crashed; I have to purge and re-create the cluster.

Additional info:
Mon log, Crush Map
Created attachment 1038777 [details] Mon log
Sam, blocker for 1.3.0?
This is not a blocker for 1.3.0.
Loic, is there a fix merged upstream that we want for 1.3.1?
(In reply to Samuel Just from comment #5)
> Loic, is there a fix merged upstream that we want for 1.3.1?

If not, please re-target to 1.3.2.
Loic, it's not totally clear to me what exact patches we'd need on top of 0.94.2? Can you please clarify?
Ken, http://tracker.ceph.com/issues/11814 has "Copied to" issues, each of them being a backport of the fix to the relevant stable releases. In this case there is only one, http://tracker.ceph.com/issues/11824, which is targeted to hammer, as shown by the Release field. The description of http://tracker.ceph.com/issues/11824 is a link to the pull request that has the commits that were backported. This convention is strictly enforced (the stable release team verifies it on a weekly basis).

The Target version field tells you in which version this backport will be published. In this case it is v0.94.4, i.e. the next hammer point release at the time of this writing. When this field is set, it means the backport has been tested (via the relevant ceph-qa-suite) and approved by the developer who merged the corresponding commit in master.

To answer your question, the patches you need are at https://github.com/ceph/ceph/pull/5276 (3 of them). If you'd like to know more about the backport process, it's at http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO

Cheers
Thanks Loic. The thing that confused me about the upstream tracker was that there were two issues marked as "related", and 11824 upstream also has a comment about "It must be backported together with the fix for #12419" ... so I really appreciate your clarification that only https://github.com/ceph/ceph/pull/5276 is needed.
Loic, I take it that we'll also want the patches at http://tracker.ceph.com/issues/13477 ? "crush/mapper: ensure bucket id is valid before indexing buckets array" and "crush/mapper: ensure take bucket value is valid" ?
Ken, yes, that's a patch worth having
If we re-spin to fix the Ceph-deploy issues, we will include this fix as well.
I've cherry-picked the changes from https://github.com/ceph/ceph/pull/5276 and https://github.com/ceph/ceph/pull/6430 here.

(I did not cherry-pick 6f0af185ad7cf9640557efb7f61a7ea521871b5b because it only fixes the vstart.sh file in /src/test/, and the upstream v0.94.3 tarball does not contain /src/test. Also, vstart.sh is not used in RHCS downstream, so the patch is not relevant in any regard.)

The exact commands I ran on the "ceph-1.3-rhel-patches" branch in Gerrit (for RHEL) and the rhcs-0.94.3-ubuntu branch in GitHub (for Ubuntu):

git cherry-pick -x b58cbbab4f74e352c3d4a61190cea2731057b3c9
git cherry-pick -x f47ba4b1a1029a55f8bc4ab393a7fa3712cd4e00
git fetch https://github.com/SUSE/ceph wip-13654-hammer
git cherry-pick -x 81d8aa14f3f2b7bf4bdd0b4e53e3a653a600ef38
git cherry-pick -x a52f7cb372339dffbeed7dae8ce2680586760754
Running the same command as mentioned in the bug is not working. Now I am not seeing the crash, but it gets stuck in pg creation forever.

1. Create an EC profile with the command below:
ceph osd erasure-code-profile set myprofile plugin=lrc mapping=__DD__DD layers='[[ "_cDD_cDD", "" ],[ "cDDD____", "" ],[ "____cDDD", "" ],]' ruleset-steps='[ [ "choose", "datacenter", 3 ], [ "chooseleaf", "osd", 0] ]'
2. ceph osd pool create ecpool 12 12 erasure myprofile

ceph osd tree
ID WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 14.52994 root default
-2  1.44998     host cephqe11
 1  0.35999         osd.1           up  1.00000          1.00000
14  1.09000         osd.14          up  1.00000          1.00000
-3  4.35999     host cephqe8
 2  1.09000         osd.2           up  1.00000          1.00000
 3  1.09000         osd.3           up  1.00000          1.00000
 4  1.09000         osd.4           up  1.00000          1.00000
 5  1.09000         osd.5           up  1.00000          1.00000
-4  4.35999     host cephqe9
 6  1.09000         osd.6           up  1.00000          1.00000
 7  1.09000         osd.7           up  1.00000          1.00000
 8  1.09000         osd.8           up  1.00000          1.00000
 9  1.09000         osd.9           up  1.00000          1.00000
-5  4.35999     host cephqe10
10  1.09000         osd.10          up  1.00000          1.00000
11  1.09000         osd.11          up  1.00000          1.00000
12  1.09000         osd.12          up  1.00000          1.00000
13  1.09000         osd.13          up  1.00000          1.00000
 0        0 osd.0                 down        0          1.00000

ceph -s
    cluster 4b86e8aa-7004-45b4-8328-319f23fbcd6f
     health HEALTH_WARN
            clock skew detected on mon.cephqe5, mon.cephqe6
            12 pgs stuck inactive
            12 pgs stuck unclean
            too few PGs per OSD (13 < min 30)
            Monitor clock skew detected
     monmap e1: 3 mons at {cephqe4=10.70.44.42:6789/0,cephqe5=10.70.44.44:6789/0,cephqe6=10.70.44.46:6789/0}
            election epoch 4, quorum 0,1,2 cephqe4,cephqe5,cephqe6
     osdmap e79: 15 osds: 14 up, 14 in
      pgmap v184: 76 pgs, 2 pools, 3072 MB data, 3 objects
            9701 MB used, 14816 GB / 14826 GB avail
                  64 active+clean
                  12 creating

ceph pg dump | grep ^1
dumped all in format plain
1.a 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669950 0'0 2015-11-08 05:22:31.669950
1.b 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669951 0'0 2015-11-08 05:22:31.669951
1.8 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669949 0'0 2015-11-08 05:22:31.669949
1.9 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669950 0'0 2015-11-08 05:22:31.669950
1.6 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669937 0'0 2015-11-08 05:22:31.669937
1.7 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669949 0'0 2015-11-08 05:22:31.669949
1.4 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669936 0'0 2015-11-08 05:22:31.669936
1.5 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669936 0'0 2015-11-08 05:22:31.669936
1.2 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669935 0'0 2015-11-08 05:22:31.669935
1.3 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669935 0'0 2015-11-08 05:22:31.669935
1.0 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669934 0'0 2015-11-08 05:22:31.669934
1.1 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-11-08 05:22:31.669934 0'0 2015-11-08

Version: ceph-0.94.3-3.el7cp.x86_64
(In reply to Tanay Ganguly from comment #17)
> Running the same command as mentioned in the BUG is not working.
> Now i am not seeing the crash, but its getting stuck in pg creation forever.

Loic, any idea why pg creation would get stuck here? Is there another bug to fix with this?
> ruleset-steps='[ [ "choose", "datacenter", 3 ], [ "chooseleaf", "osd", 0] ]'

The crush rule requires 3 datacenters, but there are none in the crush map, so no OSD can be mapped to the PGs and they are stuck.
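For reference, a crush map satisfying this rule would need bucket instances of type "datacenter"; a hypothetical fragment (bucket name, id, and weight invented for illustration) would look like:

```
datacenter dc1 {
	id -10
	alg straw
	hash 0	# rjenkins1
	item cephqe8 weight 4.360
}
```

Only when three such buckets (dc1, dc2, dc3) exist under the root can "choose datacenter 3" return three distinct buckets.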
The fix appears to have failed, and it was a non-blocker "target of opportunity" in the re-spin. I agree with Harish's assessment; pushing to 1.3.2.
With ceph 0.94.3-3, it sounds like the following is occurring:

1) The user specifies an erasure code profile with three datacenters.
2) There are not three datacenters in the crush map.
3) Ceph waits for a crush map adjustment.

If I'm understanding correctly, there is no remaining issue to fix here. If a user were in this situation, the solution would be to re-do the erasure code profile with something that aligns with the current crush map, or else adjust the crush map to fit the erasure code profile. But Ceph can't guess what the user wants to do in this situation. Do I have that right, Loic?
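For illustration, the two remedies could look like the following. This is only a sketch: the bucket names (dc1, dc2, dc3) are invented, and the commands assume a running cluster with an admin keyring.

```shell
# Remedy 1: make the crush map fit the profile by adding datacenter
# buckets and moving the existing hosts under them.
ceph osd crush add-bucket dc1 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move cephqe8 datacenter=dc1
# (repeat for dc2/dc3 and the remaining hosts)

# Remedy 2: re-do the profile so its steps match the existing map,
# e.g. choose hosts instead of datacenters.
ceph osd erasure-code-profile set myprofile plugin=lrc mapping=__DD__DD \
    layers='[[ "_cDD_cDD", "" ],[ "cDDD____", "" ],[ "____cDDD", "" ]]' \
    ruleset-steps='[ [ "choose", "host", 3 ], [ "chooseleaf", "osd", 0 ] ]' \
    --force
```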
When an invalid value, as mentioned above, was provided, the command hung. The expectation is that it should time out and print a user-understandable error message rather than hanging.
(In reply to Harish NV Rao from comment #22)
> When invalid value as mentioned above was provided, the command hung.

Which command hung?
Hi Ken,

Just to clarify: the pg creation gets hung, not the command. The Ceph cluster never reaches the active+clean state because it doesn't get valid parameters, i.e. if the user specifies a datacenter or rack in ruleset-steps and these entities do not exist in the crush map, then PG creation gets stuck.

This was part of negative testing, which a customer can sometimes run into. The only solution is to delete the pool created using that profile, then delete the profile, and also remove the ruleset from the crush map (to avoid the user using the same ruleset again).

From a QE standpoint, it would have been better if the command failed, stating the reason.

Thanks,
Tanay
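The cleanup described above could be sketched as follows (assumes a running cluster; the rule name "ecpool" is an assumption, since a rule created implicitly for a pool usually takes the pool's name):

```shell
ceph osd pool delete ecpool ecpool --yes-i-really-really-mean-it
ceph osd erasure-code-profile rm myprofile
ceph osd crush rule rm ecpool   # assumed rule name; verify with `ceph osd crush rule ls`
```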
(In reply to Tanay Ganguly from comment #26)
> From QE stand point, it would have been better if the command failed stating
> the reason for it.

Loic, would it be possible to do some sort of input validation as Tanay describes here, so it's clearer to the user?
> Loic, would it be possible to do some sort of input validation as Tanay describes here, so it's clearer to the user?

There would be value in clarifying this, indeed. It's worth a discussion on ceph-devel, I think: I can't think of a trivial way to do that.
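To make the idea concrete, here is a hypothetical validation sketch (not existing Ceph behaviour; the file contents stand in for `crushtool -d` output): before creating the pool, check that every bucket type named in the profile's ruleset-steps has at least one bucket instance in the decompiled crush map.

```shell
# Stand-in for a decompiled crush map like the one in this report:
# host and root buckets exist, but no "datacenter" bucket is defined.
crushmap=$(mktemp)
cat > "$crushmap" <<'EOF'
host cephqe8 {
	id -3
	alg straw
	hash 0
	item osd.2 weight 1.090
}
root default {
	id -1
	alg straw
	hash 0
	item cephqe8 weight 4.360
}
EOF

# Bucket types referenced by the "choose" steps of the profile in this bug.
wanted="datacenter"

# A bucket instance starts a line with its type, e.g. "datacenter dc1 {".
missing=""
for t in $wanted; do
    grep -q "^${t} " "$crushmap" || missing="$missing $t"
done
echo "missing bucket types:${missing:- none}"   # → missing bucket types: datacenter
rm -f "$crushmap"
```

A check like this, run before pool creation, could fail fast with a readable message instead of leaving PGs in the creating state.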
@Ken

> If I'm understanding correctly, there is no remaining issue to fix here.

Yes, this issue does not need fixing. Another could be opened to suggest a usability improvement.
Validating a CRUSH map would be a different bug. The remainder of the issue is resolved. No QE required.