Bug 2131703 - Ceph is in HEALTH_WARN right after deployment with size 12
Summary: Ceph is in HEALTH_WARN right after deployment with size 12
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Leela Venkaiah Gangavarapu
QA Contact: Filip Balák
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-10-03 11:44 UTC by Filip Balák
Modified: 2023-08-09 17:00 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-06 10:10:54 UTC
Embargoed:


Description Filip Balák 2022-10-03 11:44:22 UTC
Description of problem:
Right after deployment of the ODF Managed Service addon with size 12, Ceph is in an unhealthy state:
HEALTH_WARN 1 slow ops, oldest one blocked for 9728 sec, mon.c has slow ops

Version-Release number of selected component (if applicable):
ocs-osd-deployer.v2.0.7

How reproducible:
1/1

Steps to Reproduce:
1. Deploy a service with the dev addon:
rosa create service --type ocs-provider-dev --name fbalak-pr --machine-cidr 10.0.0.0/16 --size 12 --onboarding-validation-key <key> --subnet-ids <subnets> --region us-east-1
2. Check health status:
oc rsh -n openshift-storage $(oc get pods -n openshift-storage|grep tool|awk '{print$1}') ceph health
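Step 2 is a one-off check. Because the warning here persisted long after installation, repeating that check for a bounded time can show whether a fresh cluster settles to HEALTH_OK or stays degraded. The following is a minimal POSIX sh sketch, assuming the health command prints a single line beginning with the status word. `wait_for_health_ok`, its argument order, and its timeout semantics are hypothetical and are not part of ODF or Ceph tooling.

```sh
#!/bin/sh
# Hypothetical helper (not part of ODF tooling): re-run a health command
# until its output starts with HEALTH_OK or the timeout expires.
# Usage: wait_for_health_ok TIMEOUT_SEC INTERVAL_SEC COMMAND [ARGS...]
# Exit status: 0 once HEALTH_OK is seen, 1 on timeout.
wait_for_health_ok() (
  timeout=$1 interval=$2
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  while :; do
    health=$("$@" 2>/dev/null)          # capture the command's stdout
    case $health in
      HEALTH_OK*) printf '%s\n' "$health"; exit 0 ;;
    esac
    if [ "$(date +%s)" -ge "$deadline" ]; then
      printf 'timed out; last status: %s\n' "${health:-<none>}" >&2
      exit 1                             # body runs in a subshell, so exit only ends the function
    fi
    sleep "$interval"
  done
)
```

As an example, with the toolbox pod name from step 2 stored in `tools_pod`, the call `wait_for_health_ok 600 30 oc rsh -n openshift-storage "$tools_pod" ceph health` would poll every 30 seconds for up to 10 minutes. The function body runs in a subshell, so its variables stay local without relying on the non-POSIX `local`.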

Actual results:
HEALTH_WARN 1 slow ops, oldest one blocked for 9728 sec, mon.c has slow ops

Expected results:
HEALTH_OK
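The actual result packs several facts into one line: the overall status, the number of slow ops, how long the oldest one has been blocked, and which daemon reports them (here `mon.c`). When comparing many fresh deployments, it can help to split that line into fields. The following is a minimal POSIX sh sketch that assumes exactly the single-line wording quoted above. `parse_ceph_health` and its `key=value` output format are hypothetical. Other warning types fall back to reporting only the status word.

```sh
#!/bin/sh
# Hypothetical helper (not part of Ceph or ODF): split a one-line
# `ceph health` result into the overall status and, if present, the
# slow-ops details in the wording seen in this report.
parse_ceph_health() (
  line=$1
  status=${line%% *}   # first word, e.g. HEALTH_WARN or HEALTH_OK
  # Anchor on the space before the count so multi-digit counts are kept whole.
  details=$(printf '%s\n' "$line" | sed -n \
    's/.* \([0-9][0-9]*\) slow ops, oldest one blocked for \([0-9][0-9]*\) sec, \([^ ]*\) has slow ops.*/slow_ops=\1 blocked_sec=\2 daemon=\3/p')
  if [ -n "$details" ]; then
    printf 'status=%s %s\n' "$status" "$details"
  else
    printf 'status=%s\n' "$status"
  fi
)
```

For the actual result above, this sketch prints `status=HEALTH_WARN slow_ops=1 blocked_sec=9728 daemon=mon.c`. For the expected result, it prints `status=HEALTH_OK`.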

Additional info:
The cluster was deployed with the dev addon, which contains the changes for epic ODFMS-55.

Comment 2 Leela Venkaiah Gangavarapu 2022-10-10 11:25:44 UTC
hi,

- this seems to be a legit issue and needs changes to the resource calculations as well
- for the time being, I'm assigning the bug to myself

@fbalak,

- does this affect the IO/management ops directly?

Thanks,
leela.

Comment 3 Leela Venkaiah Gangavarapu 2022-10-27 13:41:06 UTC
- Still waiting to hear back about any repercussions caused by this bug
- Orit is also looking into it; awaiting her update

Comment 4 Filip Balák 2022-10-27 13:54:27 UTC
No IO was tested with the cluster. This was a state right after installation without any operation.

Comment 6 Leela Venkaiah Gangavarapu 2022-11-04 07:40:44 UTC
- please note that the above workaround has to be applied after each upscale

Comment 7 Leela Venkaiah Gangavarapu 2022-11-08 09:33:21 UTC
- Bug is resolved; the dependent Jira issue has been fixed on the OCM side

Comment 16 Elad 2023-01-17 12:41:44 UTC
Moving to 4.12.z as the verification would be done against the ODF MS rollout that would be based on ODF 4.12

Comment 17 Elad 2023-01-17 13:18:12 UTC
Moving to VERIFIED based on regression testing.
We will clone this bug for the sake of verifying the scenario as part of ODF MS testing over ODF 4.12 or with the provider-consumer layout

Comment 19 Filip Balák 2023-02-06 10:10:54 UTC
Size 12 is not going to be supported now. --> CLOSED NOTABUG