Bug 1921811
Summary: | [IBM Z and Power] Ceph cluster goes into Warning state and also OSDs OOM during various tier1 tests listed | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Sravika <sbalusu> |
Component: | ceph | Assignee: | Neha Ojha <nojha> |
Status: | CLOSED DUPLICATE | QA Contact: | Raz Tamir <ratamir> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 4.6 | CC: | aaaggarw, bniver, jdurgin, madam, manokuma, ocs-bugs |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-02-01 15:31:49 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |
Description
Sravika
2021-01-28 16:26:39 UTC
Created attachment 1751752 [details]
OSDs describe
We are also seeing multiple restarts of OSD pods due to OOMKilled on our IBM Power platform. I ran the scale and tier1 tests and also started a performance test on the cluster. The osd-1 pod restarted 25 times:

[root@ocs4-aaragga1-5ed0-bastion-0 ~]# oc get pods -n openshift-storage |grep osd-1
rook-ceph-osd-1-5bd4d44b6f-dd6nq 1/1 Running 25 3d12h

I set up a kruize pod to monitor the OSD pod. While the performance test was running, I checked the recommendations generated by kruize:

[root@ocs4-aaragga1-5ed0-bastion-0 ~]# curl http://kruize-openshift-monitoring.apps.ocs4-aaragga1-5ed0.ibm.com/recommendations?application_name=rook-ceph-osd-1
[
  {
    "application_name": "rook-ceph-osd-1",
    "resources": {
      "requests": { "memory": "3427.2M", "cpu": 0.5 },
      "limits": { "memory": "6415.4M", "cpu": 1.0 }
    }
  }
]

We are using this storagecluster.yaml file to deploy our storage cluster:
https://github.com/red-hat-storage/ocs-ci/blob/master/ocs_ci/templates/ocs-deployment/ibm-storage-cluster.yaml

We have 3 worker nodes, each with 16 vCPUs, 64GB of memory, and an additional 500GB disk. The OCS version is 4.6.2 (4.6.2-233.ci).

The 3 OSDs are configured as follows:

Limits:
  cpu: 2
  memory: 5Gi
Requests:
  cpu: 2
  memory: 5Gi

*** This bug has been marked as a duplicate of bug 1917815 ***
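
For reference, a minimal Python sketch comparing the kruize memory recommendation quoted above with the configured OSD limit of 5Gi. It assumes that the "M" suffix in the kruize output has the Kubernetes decimal meaning (10^6 bytes); the report does not confirm this. The helper name to_bytes is hypothetical and only illustrates the arithmetic.

```python
import json

# Multipliers for the quantity suffixes that appear in this report.
# Assumption: kruize's "M" means 10**6 bytes, as in Kubernetes quantities.
SUFFIXES = {"Gi": 2**30, "Mi": 2**20, "G": 10**9, "M": 10**6}


def to_bytes(quantity: str) -> float:
    """Convert a quantity such as '5Gi' or '6415.4M' to bytes (hypothetical helper)."""
    # Check two-letter suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    return float(quantity)  # plain byte count, no suffix


# Recommendation copied from the kruize output in the description.
kruize_output = json.loads("""
[ { "application_name": "rook-ceph-osd-1",
    "resources": {
      "requests": { "memory": "3427.2M", "cpu": 0.5 },
      "limits":   { "memory": "6415.4M", "cpu": 1.0 } } } ]
""")

configured_limit = to_bytes("5Gi")  # limit set on the 3 OSDs
recommended_limit = to_bytes(kruize_output[0]["resources"]["limits"]["memory"])
shortfall = recommended_limit - configured_limit

print(f"configured limit : {configured_limit / 2**30:.2f} GiB")
print(f"recommended limit: {recommended_limit / 2**30:.2f} GiB")
print(f"shortfall        : {shortfall / 2**30:.2f} GiB")
```

Under that unit assumption, the recommended memory limit comes out higher than the configured 5Gi limit, while the recommended request stays below it.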