Bug 2132006 - Deployment of Managed Service cluster with size 20 fails
Summary: Deployment of Managed Service cluster with size 20 fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Leela Venkaiah Gangavarapu
QA Contact: Jilju Joy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-10-04 12:56 UTC by Filip Balák
Modified: 2023-08-09 17:00 UTC
6 users

Fixed In Version: 2.0.11-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-23 06:53:56 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 2131237 0 unspecified CLOSED Managed Service cluster with size 8 can not be installed 2023-08-09 17:00:26 UTC

Description Filip Balák 2022-10-04 12:56:30 UTC
Description of problem:
A cluster deployed with the dev addon that contains the topology changes related to ODFMS-55 cannot finish the ODF addon installation. After a while, the installation turns from the Installing state to the Failed state with the description: 'ocs-osd-deployer' : 'InstallCheckFailed'

Version-Release number of selected component (if applicable):
ocs-osd-deployer.v2.0.8

How reproducible:
1/1

Steps to Reproduce:
1. Install provider:
rosa create service --type ocs-provider-dev --name fbalak-pr --machine-cidr 10.0.0.0/16 --size 20 --onboarding-validation-key <key> --subnet-ids <subnet-ids> --region us-east-1
2. Wait until installation finishes.
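The wait in step 2 can be sketched as a polling loop. A minimal, hypothetical sketch in Python; the `get_addon_state` callable stands in for whatever reports the addon state (e.g. an OCM API or `rosa` query) and is not a real client:

```python
import time

# Terminal states reported for the ODF addon installation.
TERMINAL_STATES = {"Succeeded", "Failed"}

def wait_for_addon(get_addon_state, timeout_s=3600, poll_s=30, sleep=time.sleep):
    """Poll get_addon_state() until it reports a terminal state.

    get_addon_state is a hypothetical callable standing in for an
    OCM/rosa status query; it returns strings such as "Installing",
    "Succeeded", or "Failed".
    """
    waited = 0
    while waited <= timeout_s:
        state = get_addon_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_s)
        waited += poll_s
    raise TimeoutError("addon did not reach a terminal state")

# Example: a canned state sequence mimicking the failure seen in this bug.
states = iter(["Installing", "Installing", "Failed"])
result = wait_for_addon(lambda: next(states), sleep=lambda _: None)
print(result)  # the terminal state, here "Failed"
```

In the failure reported here, such a loop would return "Failed" once the addon's install check gives up.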

Actual results:
Installation of the ODF addon fails with the description: 'ocs-osd-deployer' : 'InstallCheckFailed'

Expected results:
Installation succeeds.

Additional info:
The cluster was deployed with a dev addon that contains the changes for epic ODFMS-55.

Comment 2 Leela Venkaiah Gangavarapu 2022-10-10 11:15:11 UTC
hi,

- from the must-gather I can see that the installation failure is due to unscheduled monitors
- however, at an earlier time (11:38, from events) I can see all monitors were scheduled and running fine
- by 13:40, all monitors had been killed, and when they came up again they went into an unscheduled state and the installation got stuck
- if possible, when you repeat the operation and hit this issue, please give me access to the cluster
- I believe the nodes were being upgraded during this time; however, this doesn't explain why the monitors failed to schedule later

Thanks,
Leela.
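The "unscheduled monitors" observation above can be checked mechanically from each pod's PodScheduled condition, in the JSON shape that `oc get pods -o json` returns. A small hypothetical helper (not part of the deployer) to pick out stuck pods:

```python
def unscheduled_pods(pods):
    """Return names of pods whose PodScheduled condition is False.

    `pods` is a list of pod objects in the shape of the .items array
    returned by `oc get pods -o json`; a pod is considered unscheduled
    when its PodScheduled condition has status "False" (the scheduler
    sets reason "Unschedulable" in that case).
    """
    stuck = []
    for pod in pods:
        conditions = pod.get("status", {}).get("conditions", [])
        for cond in conditions:
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                stuck.append(pod["metadata"]["name"])
    return stuck

# Example with minimal pod objects mimicking two ceph monitors.
pods = [
    {"metadata": {"name": "rook-ceph-mon-a"},
     "status": {"conditions": [{"type": "PodScheduled", "status": "True"}]}},
    {"metadata": {"name": "rook-ceph-mon-b"},
     "status": {"conditions": [{"type": "PodScheduled", "status": "False",
                                "reason": "Unschedulable"}]}},
]
print(unscheduled_pods(pods))  # only rook-ceph-mon-b is unscheduled
```

Running this over the monitor pods in the must-gather would list exactly the monitors that were stuck, matching the 13:40 observation.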

Comment 3 Leela Venkaiah Gangavarapu 2022-10-27 13:38:29 UTC
- Still awaiting confirmation on whether this is hit or not

Comment 4 Jilju Joy 2022-11-24 05:01:11 UTC
Deployment of a provider cluster with size 20 was tested with the dev addon which contains the topology changes related to ODFMS-55. Deployment was successful.
Adding must-gather logs for reference - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-n22-pr/jijoy-n22-pr_20221122T075048/logs/testcases_1669119379/

Comment 8 Ritesh Chikatwar 2023-01-23 08:49:22 UTC
As per Comment 4, closing the bug: it's fixed in 2.0.11-1 and verified by QA.

