Bug 1600040
| Summary: | [bluestore]: 1 Pg stuck in active+clean+remapped state after running reweight-by-utilization | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Parikshith <pbyregow> |
| Component: | RADOS | Assignee: | Josh Durgin <jdurgin> |
| Status: | CLOSED NOTABUG | QA Contact: | Parikshith <pbyregow> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.1 | CC: | ceph-eng-bugs, dzafman, hnallurv, jdurgin, kchai, nojha |
| Target Milestone: | rc | | |
| Target Release: | 3.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-07-11 23:32:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | pg query, osd dump, crush map (see below) | | |
Created attachment 1458032 [details]: pg query
Created attachment 1458033 [details]: osd dump
Being remapped isn't an error. In this case CRUSH happens not to reach an assignment for shard 4, so the OSDs override it and add one.
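For context, the 2147483647 entry in the "up" set of the pg query later in this report is the placeholder ceph pg dump prints as NONE: it marks the shard CRUSH could not assign, while the acting set fills the gap. Below is a minimal sketch of how one might spot such PGs; it shells out to the same ceph pg <pgid> query command used in this report, assumes that command's JSON layout (the up/acting keys seen in the attached query), and hard-codes the PG id 13.5 from this bug.

```python
#!/usr/bin/env python3
"""Minimal sketch: list the shards of a PG that CRUSH left unmapped,
i.e. entries equal to the NONE placeholder (2147483647) in the 'up'
set. Assumes the JSON layout of `ceph pg <pgid> query` as attached
to this bug."""
import json
import subprocess

CRUSH_ITEM_NONE = 2147483647  # 0x7fffffff, rendered as NONE by `ceph pg dump`


def pg_query(pgid: str) -> dict:
    """Run `ceph pg <pgid> query` and parse its JSON output."""
    out = subprocess.run(["ceph", "pg", pgid, "query"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def unmapped_shards(pgid: str) -> list:
    """Return shard indexes for which CRUSH reached no assignment."""
    info = pg_query(pgid)
    return [shard for shard, osd in enumerate(info["up"])
            if osd == CRUSH_ITEM_NONE]


if __name__ == "__main__":
    pgid = "13.5"  # the PG from this report
    shards = unmapped_shards(pgid)
    if shards:
        print(f"PG {pgid}: CRUSH left shard(s) {shards} unmapped; "
              f"the acting set overrides them")
    else:
        print(f"PG {pgid}: every shard was mapped by CRUSH")
```

Against the cluster in this report it should flag shard 4, matching the [4,3,10,11,NONE,8] up set in the pg dump output below.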
Created attachment 1458031 [details]: crush map

Description of problem:
1 PG stuck in the active+clean+remapped state for over 12 hours after running reweight-by-utilization.

Version-Release number of selected component (if applicable):
ceph version 12.2.5-27.el7cp

Steps to Reproduce:
1. Created a cluster with only bluestore OSDs. Filled the cluster with 50-60% data.
2. Ran ceph osd reweight-by-utilization (default threshold).

Actual results:
One of the PGs (13.5), belonging to an EC pool with overwrites enabled, is stuck in the active+clean+remapped state.

$ ceph osd erasure-code-profile get newprofile
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8

$ ceph osd tree
ID  CLASS WEIGHT   TYPE NAME         STATUS REWEIGHT PRI-AFF
 -1       10.91510 root default
 -3        2.72878     host magna106
  1   hdd  0.90959         osd.1         up  1.00000 1.00000
  2   hdd  0.90959         osd.2         up  1.00000 1.00000
  3   hdd  0.90959         osd.3         up  1.00000 1.00000
 -9        1.81918     host magna107
  9   hdd  0.90959         osd.9         up  1.00000 1.00000
 10   hdd  0.90959         osd.10        up  0.90002 1.00000
 -7        0.90959     host magna108
  8   hdd  0.90959         osd.8         up  1.00000 1.00000
 -5        2.72878     host magna113
  0   hdd  0.90959         osd.0         up  1.00000 1.00000
  4   hdd  0.90959         osd.4         up  1.00000 1.00000
  5   hdd  0.90959         osd.5         up  1.00000 1.00000
-13        1.81918     host magna114
  7   hdd  0.90959         osd.7         up  1.00000 1.00000
 11   hdd  0.90959         osd.11        up  1.00000 1.00000
-11        0.90959     host magna115
  6   hdd  0.90959         osd.6         up  0.95001 1.00000

$ ceph pg dump | grep active+clean+remapped
dumped all
13.5  2691  0  0  2691  0  3178894720  9560  9560  active+clean+remapped  2018-07-11 07:59:52.847350  377'9560  509:16941  [4,3,10,11,NONE,8]  4  [4,3,10,11,8,8]  4  0'0  2018-07-09 13:11:58.937740  0'0  2018-07-09 13:11:58.937740  0

$ ceph pg 13.5 query
{
    "state": "active+clean+remapped",
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "epoch": 537,
    "up": [
        4,
        3,
        10,
        11,
        2147483647,
        8
    ],
    "acting": [
        4,
        3,
        10,
        11,
        8,
        8
    ],
    "actingbackfill": [
        "3(1)",
        "4(0)",
        "8(4)",
        "8(5)",
        "10(2)",
        "11(3)"
    ],

Attached the complete pg query, ceph osd dump, and osd crush dump.

Additional info:
I did not manually change the crush map.
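On why CRUSH can come up short here: the profile uses k=4, m=2 with crush-failure-domain=host, so each PG needs k+m = 6 distinct hosts, and the osd tree shows exactly six; once reweight-by-utilization lowers the reweight of osd.10 and osd.6, CRUSH's limited retries can fail to settle on a host for one shard, which is consistent with the shard-4 gap noted in the comment above. A rough sanity check along those lines is sketched below; the profile name newprofile comes from this report, while the field names of the ceph osd tree -f json output are an assumption about that command's JSON form.

```python
#!/usr/bin/env python3
"""Rough sanity check: compare the number of EC chunks (k+m) with the
number of 'host' failure domains in the CRUSH tree. With no spare
hosts, a lowered OSD reweight can leave CRUSH unable to map every
shard, as seen with PG 13.5 in this report. The JSON field names of
`ceph osd tree -f json` are assumed; 'newprofile' is the profile
from this report."""
import json
import subprocess


def ceph(*args) -> str:
    """Run a ceph CLI command and return its stdout."""
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout


def ec_chunks(profile: str) -> int:
    """Parse k and m from `ceph osd erasure-code-profile get <profile>`."""
    kv = dict(line.split("=", 1)
              for line in ceph("osd", "erasure-code-profile",
                               "get", profile).splitlines()
              if "=" in line)
    return int(kv["k"]) + int(kv["m"])


def host_count() -> int:
    """Count 'host' buckets in the CRUSH tree (assumed JSON layout)."""
    tree = json.loads(ceph("osd", "tree", "-f", "json"))
    return sum(1 for node in tree.get("nodes", [])
               if node.get("type") == "host")


if __name__ == "__main__":
    chunks, hosts = ec_chunks("newprofile"), host_count()
    print(f"EC chunks (k+m): {chunks}, host failure domains: {hosts}")
    if hosts <= chunks:
        print("No spare hosts: after reweighting, CRUSH may fail to map a "
              "shard and the PG can sit in active+clean+remapped.")
```

The check only compares counts; it does not reproduce CRUSH's actual retry behaviour, which depends on the crush map and the per-OSD reweights.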