Bug 1814681
| Summary: | [RFE] use topologySpreadConstraints to evenly spread OSDs across hosts | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | acalhoun |
| Component: | ocs-operator | Assignee: | Kesavan <kvellalo> |
| Status: | CLOSED ERRATA | QA Contact: | Warren <wusui> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.3 | CC: | assingh, ebenahar, hnallurv, jdurgin, kbader, kvellalo, madam, mbukatov, mmuench, muagarwa, nberry, ocs-bugs, owasserm, ratamir, rohgupta, rojoseph, rperiyas, sabose, shan, sostapov, tnielsen |
| Target Milestone: | --- | Keywords: | AutomationBackLog, FutureFeature, Performance |
| Target Release: | OCS 4.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-19 09:14:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1776562, 1817438 | | |
Description
acalhoun
2020-03-18 14:12:33 UTC
Created attachment 1671098 [details]
Per OSD Capacity
Observe that OSDs 0, 1, 2, and 11 have significantly higher capacity usage than the remaining OSDs. At ~20:00 these OSDs reached their full limits (total capacity is ~2.2 TiB) and prevented additional writes to the cluster.
Comment on attachment 1671097 [details]
PGs per OSD
Added additional OSDs at ~18:37; initial PGs were allocated, but rebalancing did not occur.
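For reference, whether the pg autoscaler and the mgr balancer are actually active can be checked from the rook-ceph toolbox pod. A minimal sketch, assuming the usual toolbox deployment (the <id> pod suffix below is illustrative):

# run against the openshift-storage namespace; <id> stands for the generated pod suffix
oc -n openshift-storage rsh rook-ceph-tools-<id> ceph balancer status            # shows whether the balancer is on and which mode it uses
oc -n openshift-storage rsh rook-ceph-tools-<id> ceph osd pool autoscale-status  # shows per-pool pg_num targets and ratios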
14.2.8 would be post-4.3 - that's the basis of RHCS 4.1, which is still in development. As noted in email, the pg counts are lower than they should be. Which pool(s) were you filling up? ocs-operator sets up the target ratio for the rbd and cephfs data pools [1]. If you're creating additional pools, you'll need to set a target ratio or size for them so the autoscaler can set the pg count appropriately. You can see the target size ratios for each pool in 'ceph osd pool ls detail'.

My cluster is no longer responding, but I was using the default pools provided by the storage cluster, example-storagecluster-cephblockpool.
Below is the previously recorded pg autoscaling status and ceph osd df output:
oc rsh rook-ceph-tools-7f96779fb9-48c6h ceph osd pool autoscale-status
POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
example-storagecluster-cephblockpool 3801G 3.0 28317G 0.4027 0.4900 1.0 256 on
example-storagecluster-cephobjectstore.rgw.control 0 3.0 28317G 0.0000 1.0 32 on
example-storagecluster-cephobjectstore.rgw.log 0 3.0 28317G 0.0000 1.0 32 on
.rgw.root 0 3.0 28317G 0.0000 1.0 32 on
example-storagecluster-cephfilesystem-metadata 2286 3.0 28317G 0.0000 4.0 32 on
example-storagecluster-cephfilesystem-data0 0 3.0 28317G 0.0000 0.4900 1.0 256 on
example-storagecluster-cephobjectstore.rgw.meta 0 3.0 28317G 0.0000 1.0 32 on
[root@f03-h29-000-r620 test-files]# oc rsh rook-ceph-tools-7f96779fb9-48c6h ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 ssd 2.91100 1.00000 2.9 TiB 2.0 TiB 2.0 TiB 71 KiB 3.6 GiB 886 GiB 70.28 1.74 341 up
11 ssd 2.91100 1.00000 2.9 TiB 1.7 TiB 1.7 TiB 32 KiB 3.0 GiB 1.2 TiB 57.47 1.42 319 up
0 ssd 2.18320 1.00000 2.2 TiB 1.9 TiB 1.9 TiB 43 KiB 3.2 GiB 333 GiB 85.11 2.11 329 up
1 ssd 2.18320 1.00000 2.2 TiB 1.9 TiB 1.9 TiB 52 KiB 3.2 GiB 330 GiB 85.22 2.11 327 up
3 ssd 2.18320 1.00000 2.2 TiB 551 GiB 549 GiB 20 KiB 1.3 GiB 1.6 TiB 24.63 0.61 103 up
4 ssd 2.18320 1.00000 2.2 TiB 397 GiB 396 GiB 27 KiB 1.0 GiB 1.8 TiB 17.75 0.44 97 up
5 ssd 2.18320 1.00000 2.2 TiB 539 GiB 538 GiB 12 KiB 1.2 GiB 1.7 TiB 24.12 0.60 98 up
6 ssd 2.18320 1.00000 2.2 TiB 474 GiB 473 GiB 20 KiB 1.2 GiB 1.7 TiB 21.21 0.53 78 up
7 ssd 2.18320 1.00000 2.2 TiB 417 GiB 416 GiB 19 KiB 1.0 GiB 1.8 TiB 18.65 0.46 80 up
8 ssd 2.18320 1.00000 2.2 TiB 461 GiB 460 GiB 12 KiB 1.0 GiB 1.7 TiB 20.63 0.51 78 up
9 ssd 2.18320 1.00000 2.2 TiB 493 GiB 492 GiB 20 KiB 1.2 GiB 1.7 TiB 22.06 0.55 90 up
10 ssd 2.18320 1.00000 2.2 TiB 478 GiB 477 GiB 22 KiB 1.2 GiB 1.7 TiB 21.40 0.53 76 up
TOTAL 28 TiB 11 TiB 11 TiB 354 KiB 22 GiB 16 TiB 40.35
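As an aside on Josh's note above about additional pools: a target ratio can be given to a non-default pool so the autoscaler sizes its pg count sensibly. A minimal sketch from the toolbox shell, where the pool name is purely illustrative:

ceph osd pool ls detail                                 # shows target_size_ratio / target_size_bytes per pool
ceph osd pool set my-extra-pool target_size_ratio 0.2   # hypothetical pool; expected share of raw capacity
ceph osd pool set my-extra-pool pg_autoscale_mode on    # make sure the autoscaler manages this pool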
Doing the math again, the number of pgs does match the target ratio: 12 * 100 / 3 * 0.49 = 196 -> rounded to 256 pgs. When you reproduce, can you attach a pg dump and a binary osdmap (ceph osd getmap -o /tmp/osdmap)? That will let us see the distribution of pgs per pool and check the balancer's behavior.

not 4.3 material

Retested with quay.io/rhceph-dev/ocs-olm-operator:4.3.0-rc2 and the same issue occurred.

Evaluated the cluster configuration with Ben England and observed that I had non-uniform racks: Rack0 had 3 hosts, Rack1 had 3 hosts, Rack2 had 2 hosts. We suspected that because each rack did not have the same number of hosts, OCS had improperly balanced OSDs/PGs.

When deploying OSDs, OCS deployed them as below:

Rack 0
-host 0
--osd0
-host 1
--osd1
Rack 1
-host 2
--osd2
-host 3
--osd11
Rack 2
-host 4
--osd3
--osd4
--osd5
--osd6
-host 5
--osd7
--osd8
--osd9
--osd10

Is it expected that users have the same number of hosts/capacity for each rack? Is this just a configuration error, or still a functional issue?

(In reply to acalhoun from comment #8)
> Retested with quay.io/rhceph-dev/ocs-olm-operator:4.3.0-rc2 and the same
> issue occurred.
>
> Evaluated the cluster configuration with Ben England and observed that I had
> non-uniform racks: Rack0 had 3 hosts, Rack1 had 3 hosts, Rack2 had 2 hosts.
> We suspected that because each rack did not have the same number of hosts,
> OCS had improperly balanced OSDs/PGs.
>
> When deploying OSDs, OCS deployed them as below:
>
> Rack 0
> -host 0
> --osd0
> -host 1
> --osd1
> Rack 1
> -host 2
> --osd2
> -host 3
> --osd11
> Rack 2
> -host 4
> --osd3
> --osd4
> --osd5
> --osd6
> -host 5
> --osd7
> --osd8
> --osd9
> --osd10
>
> Is it expected that users have the same number of hosts/capacity for each
> rack? Is this just a configuration error, or still a functional issue?

If crush is configured to split across racks, then it's expected to have similar capacity in each, otherwise you may not be able to use the full capacity or balance data appropriately. Same for splitting across hosts. Without this, crush constrains how balanced the cluster can be. For example, if you have two hosts with 5x8 TB osds and one host with 1x8 TB osd, you can only use up to 10 TB of 3x replicated space, and the single-osd host will have 5x the pgs/osd as the other two. For maximum parallelism and performance, you need equally sized hosts and racks.

I'm not sure how ocs/rook/ocp is controlling scheduling of osds onto hosts/racks - Seb, where is this controlled?

Hi acalhoun, is this a bare metal only issue?

Tamir, based on my testing I haven't observed this issue on AWS, although the rack topology setup is different between AWS and BM. I believe in AWS "rack" aligns with Availability Zones, while with BM it is a rack in the traditional sense. Overall, I believe Josh's assessment is correct about the poor distribution being due to the variation in rack/host capacity. I am surprised at how significant this is, but am not sure if this is "okay" or whether changes are necessary. I re-ran this test with balanced racks (1 rack to 2 hosts, with a total of 4 racks); the devices still varied from 2.2 TiB to 2.9 TiB, and this difference still resulted in high variation of data distribution and an incorrect full status at 12 OSDs.

@Josh, the kube scheduler is responsible for this. OCS labels the nodes and passes that selector to Rook, which then creates the resources and leaves the rest to Kubernetes.
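For reference, the data Josh asked for above (a pg dump and a binary osdmap) could be gathered roughly like this from the toolbox pod; the pod name suffix is illustrative:

oc -n openshift-storage rsh rook-ceph-tools-<id> ceph pg dump > pg_dump.txt      # per-PG placement plus per-OSD summaries
oc -n openshift-storage rsh rook-ceph-tools-<id> ceph osd getmap -o /tmp/osdmap  # binary osdmap, written inside the toolbox pod
oc -n openshift-storage cp rook-ceph-tools-<id>:/tmp/osdmap ./osdmap             # copy it out so it can be attached to the BZ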
We need 1.18 to get proper placement with topologySpreadConstraints; this is already tracked upstream: https://github.com/rook/rook/issues/4387

We noticed that Kubernetes didn't have a thing like topologySpreadConstraints when we were doing failure domain testing of 4.2. The resulting scheme was to have storageDeviceSets in the storageCluster, and then build a cephCluster storageClassDeviceSets list with a set for each failure domain (rack or zone). If the rack or zone failure-domain labels didn't exist, we'd create virtual 'racks' (for VMware, because their cloud provider didn't lend the ability to surface failure-domain data to OCP, at least in 4.2). Pools would use rack or zone as the failure domain to decluster replicas. The UI for 4.2 would surface the failure-domain information, and when you "picked" nodes you needed to pick an equal number from each failure domain, and have at least 3 failure domains.

With bare metal, we don't use the UI, and the docs don't have "here be dragons" text when we instruct the user to add OCS labels to hosts. The result is that someone might end up labeling hosts in 2 racks, with an unbalanced number of nodes in each. Because the rack label exists, we can't create virtual ones. Since there are < 3 failure-domain racks, we fall back to host, and we presumably have a single set in storageClassDeviceSets. In this scenario, there is no way to ensure OSDs are balanced across hosts.

Docs should probably add text that tells people not to point a gun at their foot when labeling hosts on bare metal. Second, it might be worthwhile to have the OCS Operator use a crush bucket type between "rack" and "host" for "virtual racks" if the rack/zone failure-domain count is less than three. I haven't thought about how topologySpreadConstraints would change the strategy. Also, since we have a bug where the PVC ID is used for the host bucket name in crush, even when portable: false, we can't even guarantee replicas are on distinct hosts!

@Kyle Besides the potential "portable: false" issue, are you seeing any reason we couldn't solve these issues with documentation for bare metal? Or is the concern basically how we can help the user not shoot themselves in the foot while following the documentation?

I'm doing this in parallel with Annette. Basically, after folks label their nodes, we're going to have them run:

oc get nodes -L failure-domain.beta.kubernetes.io/zone,failure-domain.beta.kubernetes.io/rack -l cluster.ocs.openshift.io/openshift-storage=''
NAME                                         STATUS   ROLES    AGE     VERSION   ZONE         RACK
ip-10-0-128-167.us-west-2.compute.internal   Ready    worker   41h     v1.16.2   us-west-2a
ip-10-0-133-93.us-west-2.compute.internal    Ready    worker   5d15h   v1.16.2   us-west-2a
ip-10-0-159-206.us-west-2.compute.internal   Ready    worker   5d15h   v1.16.2   us-west-2b
ip-10-0-172-122.us-west-2.compute.internal   Ready    worker   5d15h   v1.16.2   us-west-2c

to verify they do in fact have an even number of nodes in at least 3 distinct racks or zones. Personally, I'd set the minimum at closing #1816820 combined with a docs fix along the lines of the above, and then look into the viability of switching to a new crush bucket type between host and rack/zone as our "virtual rack" for 4.4.

(In reply to leseb from comment #12)
> @Josh, the kube scheduler is responsible for this, OCS labels the nodes and
> passes that selector to Rook which then creates the resources and leaves the
> rest to Kubernetes.

@Seb, not 100% correct, afaik. OCS operator does a bit more.
See below:

(In reply to Kyle Bader from comment #13)
> We noticed that Kubernetes didn't have a thing like
> topologySpreadConstraints when we were doing failure domain testing of 4.2.
> The resulting scheme was to have storageDeviceSets in the storageCluster,
> and then build a cephCluster storageClassDeviceSets list with a set for each
> failure domain (rack or zone). If the rack or zone failure-domain labels
> didn't exist, we'd create virtual 'racks' (for VMware, because their cloud
> provider didn't lend the ability to surface failure-domain data to OCP, at
> least in 4.2). Pools would use rack or zone as the failure domain to
> decluster replicas.

@Kyle, I'm a little confused about the "we" here: this sounds like what we did in the development of the OCS operator (see below). Are you referring to that work, or are you describing what you did in testing with manual preparation of the OCP cluster?

> The UI for 4.2 would surface the failure-domain
> information, and when you "picked" nodes you needed to pick an equal number
> from each failure domain, and have at least 3 failure domains.
>
> With bare metal, we don't use the UI,

Yeah... My original understanding was that we would only be using the CLI to set up the LSO and PVs for the backend disks, and would use the UI just like normal from there on. Maybe that is not how it ended up.

> and the docs don't have "here be
> dragons" text when we instruct the user to add OCS labels to hosts. The
> result is someone might end up labeling hosts in 2 racks, with an unbalanced
> number of nodes in each. Because the rack label exists, we can't create
> virtual ones. Since there are < 3 failure-domain racks, we fall back to host,
> and we presumably have a single set in storageClassDeviceSets. In this
> scenario, there is no way to ensure OSDs are balanced across hosts.
>
> Docs should probably add text that tells people not to point a gun at their
> foot when labeling hosts on bare metal. Second, it might be worthwhile to
> have the OCS Operator use a crush bucket type between "rack" and "host" for
> "virtual racks" if the rack/zone failure-domain count is less than three.

ocs-operator is using "rack" as a virtual zone. This was done for cases where we have < #replica AZs in AWS. In general, if it does not find enough (>= #replica) zone labels on the nodes, it will create #replica rack labels on the nodes, distributing the nodes as evenly as possible across the racks, and will try to make sure to have an as-even-as-possible distribution of the osds among the racks using affinity and anti-affinity settings. It also chops the StorageDeviceSet up into multiple StorageClassDeviceSets.

Within the rack label, the kubernetes scheduler is responsible for placing OSDs, so we will not necessarily have an even distribution of OSDs among nodes within the same rack.

So if, in a bare-metal environment, the admin has already created rack labels, then ocs-operator would honor them and just try to distribute the OSDs among them as evenly as possible. But it is indeed important to spread the storage nodes evenly across the racks. The description here looks as if the hosts have been distributed across the racks well (2 in each rack), so that is fine. Not sure why the OSDs are not distributed well across racks.

Note that all this "magic" would only happen in ocs-operator if no "placement" is explicitly configured in the StorageDeviceSet.
See: https://github.com/openshift/ocs-operator/blob/release-4.3/deploy/olm-catalog/ocs-operator/4.3.0/storagecluster.crd.yaml#L110

I am not 100% sure which doc was followed to set this up.
- Was the UI not used for setup after the LSO and backend PVs were created?
- Was "placement" used?
- Can I see the StorageCluster CR that was used?

Regarding the introduction of a bucket between rack and host: we are doing pretty much what you are describing, but with racks. We were not aware of any existing level between rack and host, so we ended up using rack since ceph knows about it, and we didn't know that OCP would set these labels automatically (like it sets zone labels automatically for AWS AZs...).

> I haven't thought about how topologySpreadConstraints would change the
> strategy.

As BM is going to be GAed in 4.4 and this issue wasn't observed in AWS (comment #11), marking as a blocker to 4.4.

(In reply to Raz Tamir from comment #19)
> As BM is going to be GAed in 4.4 and this issue wasn't observed in AWS
> (comment #11), marking as a blocker to 4.4

Have you not seen it in AWS ever, or possibly just not when running with at least 3 zones?

This whole BZ is a bit convoluted. I cannot see clearly what the actual problem is that this BZ is about. If I understand it correctly, what is described here is a combination of various aspects and currently mostly works as designed...

(In reply to Josh Durgin from comment #9)
>
> If crush is configured to split across racks, then it's expected to have
> similar capacity in each, otherwise you may not be able to use the full
> capacity or balance data appropriately. Same for splitting across hosts.
> Without this, crush constrains how balanced the cluster can be. For example,
> if you have two hosts with 5x8 TB osds and one host with 1x8 TB osd you can
> only use up to 10TB of 3x replicated space, and the single osd host will
> have 5x the pgs/osd as the other two. For maximum parallelism and
> performance, you need equally sized hosts and racks.
>
> I'm not sure how ocs/rook/ocp is controlling scheduling of osds onto
> hosts/racks - Seb where is this controlled?

As explained in comment #18,

* The ocs operator either detects a failure domain (zone, corresponding to an AWS AZ) or creates one (rack) by labelling nodes into racks artificially.

* The various OSDs should be distributed across the failure domains (rack or zone) as evenly as possible. In particular, we should have roughly the same capacity in each failure domain (zone or rack).
  ==> If this is not the case, then this is a bug.

* Within the failure domain (rack/zone), the distribution is entirely up to the kubernetes scheduler. This is currently NOT done homogeneously across nodes; it will frequently happen that some nodes get many OSDs, some nodes get only a few, and some get none. There is just nothing we can currently do about it, and if Ceph assumes the hosts to be of similar capacity (even if the failure domain is set to rack or zone), then this is just a fact that we have to accept at this point. With Kube 1.18 / OCP 4.6, we will fix this by the use of topologySpreadConstraints.

* There is one possible problem with portable=true on OSDs for bare metal, but it is treated in a separate BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1816820

* Is there an additional problem at all with the pg adjusting? (I really don't know; it just seems that it is a combination of the above...)
  ==> @Josh?
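For reference, the resources Michael asks about above can be dumped as follows, assuming the default openshift-storage namespace:

oc -n openshift-storage get storagecluster -o yaml   # shows the storageDeviceSets and any explicit "placement"
oc -n openshift-storage get cephcluster -o yaml      # shows the generated storageClassDeviceSets handed to Rook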
I think the google doc you're following is outdated. It specifically mentions rack labels in the StorageCluster object [1]. It should not. I don't see a must-gather attached, so I can't check if that's the exact cause. I think it was corrected in the doc review process [2].

[1] https://docs.google.com/document/d/1AqFw3ylCfZ7vxq-63QVbGPHESPW-xep4reGVS0mPgXQ/edit
[2] Step 1.2, ctrl+f "kind: storagecluster": https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.3/html-single/deploying_openshift_container_storage/index?lb_target=preview#installing-openshift-container-storage-using-local-storage-devices_rhocs

> * The various OSDs should be distributed across the failure domain (rack or
> zone) as evenly as possible. In particular we should roughly the same
> capacity in each failure domain (zone or rack).
> ==> If this is not the case, then this is a bug.
To clarify, specifying the rack labels in the StorageCluster.Spec.StorageDeviceSets would cause the issue Michael mentioned here.
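A quick way to cross-check how the rack/host hierarchy actually ended up, versus what the labels were meant to produce, is to compare the CRUSH tree with the OSD deployments. A sketch, assuming the usual app=rook-ceph-osd label and toolbox pod (names are illustrative); the jsonpath query only returns data once the operator actually sets topologySpreadConstraints on the OSD deployments:

oc -n openshift-storage rsh rook-ceph-tools-<id> ceph osd tree   # rack and host buckets with per-OSD CRUSH weights
oc -n openshift-storage get deploy -l app=rook-ceph-osd -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.template.spec.topologySpreadConstraints}{"\n"}{end}'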
(In reply to Rohan CJ from comment #23)
> > * The various OSDs should be distributed across the failure domain (rack or
> > zone) as evenly as possible. In particular we should roughly the same
> > capacity in each failure domain (zone or rack).
> > ==> If this is not the case, then this is a bug.
>
> To clarify, specifying the rack labels in the
> StorageCluster.Spec.StorageDeviceSets would cause the issue Michael
> mentioned here.

Hmm, I don't understand. It is perfectly fine if rack labels already exist. In this case ocs-operator is going to honour them. ocs-operator will only create rack labels if nodes are in less than 3 zones and if there are no rack labels already.

If the admin has distributed the nodes into racks with labels, it should just work, and ocs-operator should still make sure to distribute the disks evenly across those racks. If that is not working (the even distribution across racks or zones), it is a bug in ocs-operator that needs to be fixed, I believe. I'd have to double check whether the expansion scenario is different, though.

(In reply to Michael Adam from comment #21)
> * Within the failure domain (rack/zone), the distribution is entirely up
> to the kubernetes scheduler. This is currently NOT done homogeneously
> across nodes, but it will frequently happen that some nodes get many
> OSDs, some nodes get only a few and some get none.
> > > There is just nothing we can currently do about it, and if Ceph assumes
> > > the hosts to be of similar capacity (even if the failure domain is set
> > > to rack or zone), then this is just a fact that we have to accept at this
> > > point.
> > > With Kube 1.18 / OCP 4.6, we will fix this by the use of
> > > topologySpreadConstraints.
> >
> > IMO this BZ should track this problem.
>
> So why is this bug not targeted to 4.6?

I assumed Michael was waiting for other comments. Since there are none in a week, going ahead with my suggestion to track only this problem with this BZ, and retitling/moving as appropriate. If anyone wants to track another issue, please open a separate BZ.

(In reply to Michael Adam from comment #20)
> (In reply to Raz Tamir from comment #19)
> > As BM is going to be GAed in 4.4 and this issue wasn't observed in AWS
> > (comment #11), marking as a blocker to 4.4
>
> Have you not seen it in AWS ever, or possibly just not when running with at
> least 3 zones?

Not that I'm aware of. We are checking OSD distribution; I remember we had a few bugs, but nothing new on AWS.

@Rajat confirmed that the API is available in 4.5 onwards. We're okay with using beta. We're aiming to land this in 4.6. Removing the blocker flag which was added in 4.4.

This is now an epic for 4.7: https://issues.redhat.com/browse/KNIP-1512
==> moving to 4.7

Hi Kesavan, are the fixes for https://bugzilla.redhat.com/show_bug.cgi?id=1817438 and this bug inter-related? Or do they need to be verified separately?

Hey Neha, the two bugs need to be verified separately, as one of them is the bare metal scenario and the other one is AWS. The topology spread for bare metal is rack, and for AWS it is AZ (zones).

Adding pm-ack (which was not given automatically b/c of the [RFE] tag), since this is an approved epic for 4.7.

*** Bug 1778216 has been marked as a duplicate of this bug. ***

On AWS, pg distribution is in the range of 92 to 100, which is better than 32 to 256.
I will try to do this on vSphere on Wednesday
====================================================================================================
(venv) wusui@localhost:~/ocs-ci$ oc -n openshift-storage get pods | grep osd
rook-ceph-osd-0-7dc45754fc-8w5vs 2/2 Running 0 7h53m
rook-ceph-osd-1-588f9fdf9-t8v4d 2/2 Running 0 7h53m
rook-ceph-osd-2-779d9c795b-bxjdk 2/2 Running 0 7h53m
rook-ceph-osd-prepare-ocs-deviceset-0-data-0x52qn-zq8gc 0/1 Completed 0 7h53m
rook-ceph-osd-prepare-ocs-deviceset-0-data-1xc52x-2tcws 0/1 Init:0/2 0 8s
rook-ceph-osd-prepare-ocs-deviceset-1-data-0fw9zb-qkmzd 0/1 Completed 0 7h53m
rook-ceph-osd-prepare-ocs-deviceset-1-data-16cg8r-9vxjv 0/1 Init:0/2 0 7s
rook-ceph-osd-prepare-ocs-deviceset-2-data-0l8zh2-gd8st 0/1 Completed 0 7h53m
rook-ceph-osd-prepare-ocs-deviceset-2-data-1h46wn-g6wqk 0/1 Init:0/2 0 7s
(venv) wusui@localhost:~/ocs-ci$ sleep 120; !!
sleep 120; oc -n openshift-storage get pods | grep osd
rook-ceph-osd-0-7dc45754fc-8w5vs 2/2 Running 0 7h55m
rook-ceph-osd-1-588f9fdf9-t8v4d 2/2 Running 0 7h55m
rook-ceph-osd-2-779d9c795b-bxjdk 2/2 Running 0 7h55m
rook-ceph-osd-3-58577cf8c5-przg2 2/2 Running 0 106s
rook-ceph-osd-4-799f945b7f-zwrp5 2/2 Running 0 105s
rook-ceph-osd-5-856545cfc7-7bspr 2/2 Running 0 103s
rook-ceph-osd-prepare-ocs-deviceset-0-data-0x52qn-zq8gc 0/1 Completed 0 7h55m
rook-ceph-osd-prepare-ocs-deviceset-0-data-1xc52x-2tcws 0/1 Completed 0 2m14s
rook-ceph-osd-prepare-ocs-deviceset-1-data-0fw9zb-qkmzd 0/1 Completed 0 7h55m
rook-ceph-osd-prepare-ocs-deviceset-1-data-16cg8r-9vxjv 0/1 Completed 0 2m13s
rook-ceph-osd-prepare-ocs-deviceset-2-data-0l8zh2-gd8st 0/1 Completed 0 7h55m
rook-ceph-osd-prepare-ocs-deviceset-2-data-1h46wn-g6wqk 0/1 Completed 0 2m13s
====================================================================================================
sh-4.4# ceph status
cluster:
id: 467e00f5-3885-4fb5-949e-6f3eef7d40a1
health: HEALTH_OK
sh-4.4# ceph osd pool autoscale-status
POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO EFFECTIVE RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
ocs-storagecluster-cephblockpool 4513M 3.0 12288G 0.0011 0.4900 1.0000 1.0 128 on
ocs-storagecluster-cephfilesystem-metadata 56886 3.0 12288G 0.0000 4.0 32 on
ocs-storagecluster-cephfilesystem-data0 158 3.0 12288G 0.0000 1.0 32 on
sh-4.4# ceph osd df output
Error EINVAL: you must specify both 'filter_by' and 'filter'
sh-4.4# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
4 ssd 2.00000 1.00000 2 TiB 3.3 GiB 2.3 GiB 0 B 1 GiB 2.0 TiB 0.16 1.03 94 up
2 ssd 2.00000 1.00000 2 TiB 3.1 GiB 2.1 GiB 0 B 1 GiB 2.0 TiB 0.15 0.97 98 up
1 ssd 2.00000 1.00000 2 TiB 3.0 GiB 2.0 GiB 0 B 1 GiB 2.0 TiB 0.15 0.95 96 up
3 ssd 2.00000 1.00000 2 TiB 3.4 GiB 2.4 GiB 0 B 1 GiB 2.0 TiB 0.16 1.05 96 up
0 ssd 2.00000 1.00000 2 TiB 3.1 GiB 2.1 GiB 0 B 1 GiB 2.0 TiB 0.15 0.95 92 up
5 ssd 2.00000 1.00000 2 TiB 3.4 GiB 2.4 GiB 0 B 1 GiB 2.0 TiB 0.16 1.05 100 up
TOTAL 12 TiB 19 GiB 13 GiB 0 B 6 GiB 12 TiB 0.16
MIN/MAX VAR: 0.95/1.05 STDDEV: 0.01
sh-4.4#
services:
mon: 3 daemons, quorum a,b,c (age 8h)
mgr: a(active, since 8h)
mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-b=up:active} 1 up:standby-replay
osd: 6 osds: 6 up (since 18m), 6 in (since 18m)
task status:
scrub status:
mds.ocs-storagecluster-cephfilesystem-a: idle
mds.ocs-storagecluster-cephfilesystem-b: idle
data:
pools: 3 pools, 192 pgs
objects: 1.33k objects, 4.4 GiB
usage: 18 GiB used, 12 TiB / 12 TiB avail
pgs: 192 active+clean
io:
client: 853 B/s rd, 124 KiB/s wr, 1 op/s rd, 1 op/s wr
On vSphere, I added three OSDs to a 3 OSD cluster and got the following results:
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 2.00000 1.00000 2 TiB 1.3 GiB 347 MiB 0 B 1 GiB 2.0 TiB 0.07 0.99 133 up
5 hdd 2.00000 1.00000 2 TiB 1.4 GiB 366 MiB 0 B 1 GiB 2.0 TiB 0.07 1.00 127 up
0 hdd 2.00000 1.00000 2 TiB 1.5 GiB 480 MiB 0 B 1 GiB 2.0 TiB 0.07 1.08 146 up
3 hdd 2.00000 1.00000 2 TiB 1.3 GiB 341 MiB 0 B 1 GiB 2.0 TiB 0.07 0.98 128 up
1 hdd 2.00000 1.00000 2 TiB 1.3 GiB 314 MiB 0 B 1 GiB 2.0 TiB 0.06 0.96 127 up
4 hdd 2.00000 1.00000 2 TiB 1.3 GiB 347 MiB 0 B 1 GiB 2.0 TiB 0.07 0.99 155 up
TOTAL 12 TiB 8.1 GiB 2.1 GiB 0 B 6 GiB 12 TiB 0.07
MIN/MAX VAR: 0.96/1.08 STDDEV: 0.00
Is the range from 127 to 155 pgs per osd considered balanced?
(venv) wusui@localhost:~/ocs-ci$ oc -n openshift-storage get pods | grep ceph-osd-
rook-ceph-osd-0-6b9cbf8bc-h954h 2/2 Running 0 10h
rook-ceph-osd-1-bb68456bb-ssp6r 2/2 Running 0 10h
rook-ceph-osd-2-6f87d6fdcc-8sknc 2/2 Running 0 10h
rook-ceph-osd-3-69f8fd65d9-nhcv5 2/2 Running 0 9h
rook-ceph-osd-4-6fffbb8c46-btcdz 2/2 Running 0 9h
rook-ceph-osd-5-6bd4654c58-2pc5x 2/2 Running 0 9h
rook-ceph-osd-6-5b4d8c9595-8r6f6 0/2 Pending 0 25m
rook-ceph-osd-7-57df9c4b4-xwdrg 0/2 Pending 0 25m
rook-ceph-osd-8-8bcf9447b-fhmrk 0/2 Pending 0 25m
rook-ceph-osd-prepare-ocs-deviceset-0-data-0r9rr2-475cs 0/1 Completed 0 10h
rook-ceph-osd-prepare-ocs-deviceset-0-data-1g2jp2-q5m88 0/1 Completed 0 9h
rook-ceph-osd-prepare-ocs-deviceset-0-data-24tn86-cnf7h 0/1 Completed 0 25m
rook-ceph-osd-prepare-ocs-deviceset-1-data-0z54ww-kjh7g 0/1 Completed 0 10h
rook-ceph-osd-prepare-ocs-deviceset-1-data-1g7qfw-gn2dn 0/1 Completed 0 9h
rook-ceph-osd-prepare-ocs-deviceset-1-data-2xjzmk-zpdbw 0/1 Completed 0 25m
rook-ceph-osd-prepare-ocs-deviceset-2-data-069r9t-xlscm 0/1 Completed 0 10h
rook-ceph-osd-prepare-ocs-deviceset-2-data-1fhmtq-m9rgm 0/1 Completed 0 9h
rook-ceph-osd-prepare-ocs-deviceset-2-data-2zmc4m-729jf 0/1 Completed 0 25m
(venv) wusui@localhost:~/ocs-ci$ oc -n openshift-storage get pods | grep ceph-osd- | egrep "Running|Pending" | sed -e 's/ .*//'
rook-ceph-osd-0-6b9cbf8bc-h954h
rook-ceph-osd-1-bb68456bb-ssp6r
rook-ceph-osd-2-6f87d6fdcc-8sknc
rook-ceph-osd-3-69f8fd65d9-nhcv5
rook-ceph-osd-4-6fffbb8c46-btcdz
rook-ceph-osd-5-6bd4654c58-2pc5x
rook-ceph-osd-6-5b4d8c9595-8r6f6
rook-ceph-osd-7-57df9c4b4-xwdrg
rook-ceph-osd-8-8bcf9447b-fhmrk
(venv) wusui@localhost:~/ocs-ci$ for i in `cat /tmp/foo`; do oc -n openshift-storage describe pod $i | grep topology-location; done
topology-location-host=ocs-deviceset-1-data-0z54ww
topology-location-rack=rack1
topology-location-root=default
topology-location-host=ocs-deviceset-0-data-0r9rr2
topology-location-rack=rack2
topology-location-root=default
topology-location-host=ocs-deviceset-2-data-069r9t
topology-location-rack=rack0
topology-location-root=default
topology-location-host=ocs-deviceset-1-data-1g7qfw
topology-location-rack=rack1
topology-location-root=default
topology-location-host=ocs-deviceset-0-data-1g2jp2
topology-location-rack=rack2
topology-location-root=default
topology-location-host=ocs-deviceset-2-data-1fhmtq
topology-location-rack=rack0
topology-location-root=default
topology-location-host=ocs-deviceset-2-data-2zmc4m
topology-location-rack=rack0
topology-location-root=default
topology-location-host=ocs-deviceset-0-data-24tn86
topology-location-rack=rack2
topology-location-root=default
topology-location-host=ocs-deviceset-1-data-2xjzmk
topology-location-rack=rack1
topology-location-root=default
So the above output shows what I saw after I added three OSDs, waited some number of hours, and then added three more OSDs.
The first three added OSDs appear to be part of the cluster. The next three are still in Pending state and are causing Ceph to be unhealthy.
The OSD pods are evenly allocated across the nodes, so that appears to be correct. However, the new OSDs are not Running.
Is this a separate bug to be reported? And does this verify that the new OSDs are evenly distributed, even if faulty? In other words, has the actual gist of this change been verified, and are we just running into another issue?
I am asking for more info on both of these questions.
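One way to sanity-check the even-spread part of this verification is to list which node each OSD pod landed on; a sketch, assuming the usual app=rook-ceph-osd label:

oc -n openshift-storage get pods -l app=rook-ceph-osd -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName | sort -k2   # group OSD pods by node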
On looking into the cluster, the newly added OSDs moved to Pending state because of insufficient memory. We probably need to run on a cluster with higher specs when adding multiple OSDs:
0/6 nodes are available: 3 Insufficient memory, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
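To confirm a scheduling failure like this one, the pending pod's events and the nodes' memory requests can be inspected roughly as follows (the pod name is taken from the output above; the node name is a placeholder):

oc -n openshift-storage describe pod rook-ceph-osd-6-5b4d8c9595-8r6f6 | tail -n 20   # the Events section shows the FailedScheduling reason
oc describe node <worker-node> | grep -A 10 "Allocated resources"                    # compare memory requests against allocatable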
According to Kesavan, the Pending status is due to the fact that we are out of available memory on this cluster. This is the expected behavior in this case. Since the distribution of the added OSDs is even among all nodes, I am marking this as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.