Bug 2209623 - [Fusion-aaS] Agent deployment on Privatelink setup failed for provider and mons, osds are not created
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Rewant
QA Contact: suchita
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-05-24 09:34 UTC by suchita
Modified: 2023-08-09 17:00 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:


Description suchita 2023-05-24 09:34:45 UTC
Description of problem:
Agent deployment on a Privatelink setup fails for the provider: mons and osds are not created.
rook-ceph-tools is stuck in ContainerCreating.

Version-Release number of selected component (if applicable):
catsrc image: quay.io/rhceph-dev/ocs-registry:4.12.3-17

$ oc get csv ocs-operator.v4.12.3-rhodf -o yaml | grep full
    full_version: 4.12.3-17
ROSA/OCP  4.12.16 


How reproducible:
3/3


Steps to Reproduce:
1. Deploy a ROSA Privatelink cluster.
2. Deploy the agent with image 4.12.3-17 or 4.12.3-16.

Actual results:
rook-ceph-tools is stuck in the ContainerCreating state.
mons and osds are not created.

Expected results:
All pods should be created and the CSV should reach the Succeeded state.

Additional info:
$ oc get pods
NAME                                                              READY   STATUS              RESTARTS   AGE
4261301e16ee6ab3d8f05ac122f9f46dd6d6b340676b8aeb8a8c49a9d7qkw7f   0/1     Completed           0          4m26s
managed-fusion-offering-catalog-z7jsx                             1/1     Running             0          4m57s
ocs-metrics-exporter-8d85ddb7f-dkcjq                              1/1     Running             0          3m45s
ocs-operator-d9b889d-f7rlt                                        1/1     Running             0          3m46s
rook-ceph-operator-6df5dcc59-fdvxr                                1/1     Running             0          3m46s
rook-ceph-tools-774c9c597c-4gqrl                                  0/1     ContainerCreating   0          3m34s


$ oc get service
NAME                              TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
managed-fusion-offering-catalog   ClusterIP      172.30.165.125   <none>        50051/TCP         11m
ocs-provider-server               LoadBalancer   172.30.126.91    <pending>     50051:31659/TCP   9m40s

$ oc describe service ocs-provider-server
Name:                     ocs-provider-server
Namespace:                fusion-storage
Labels:                   <none>
Annotations:              service.beta.openshift.io/serving-cert-secret-name: ocs-provider-server-cert
Selector:                 app=ocsProviderApiServer
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.126.91
IPs:                      172.30.126.91
Port:                     <unset>  50051/TCP
TargetPort:               ocs-provider/TCP
NodePort:                 <unset>  31659/TCP
Endpoints:                <none>
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type     Reason                  Age                  From                Message
  ----     ------                  ----                 ----                -------
  Normal   EnsuringLoadBalancer    4m52s (x7 over 10m)  service-controller  Ensuring load balancer
  Warning  SyncLoadBalancerFailed  4m52s (x7 over 10m)  service-controller  Error syncing load balancer: failed to ensure load balancer: could not find any suitable subnets for creating the ELB

Comment 1 Rewant 2023-05-24 12:48:52 UTC
Going through the AWS docs, I found that for a LoadBalancer service in Privatelink clusters, we need to tag all the private subnets on the AWS side and annotate the service on the cluster side.


Tags required ->
kubernetes.io/cluster/sgatfane-p3421-8x6jn	shared (should already exist)
kubernetes.io/role/internal-elb	1

Annotation required for the ocs-provider-server service ->
service.beta.kubernetes.io/aws-load-balancer-internal: "true"

Docs:
https://docs.aws.amazon.com/eks/latest/userguide/network-load-balancing.html
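
As a rough sketch of how the tags and annotation above could be applied by hand: the script below uses the cluster tag, namespace, and service name quoted in this bug, while the subnet IDs passed to it are placeholders to be replaced with the cluster's real private subnets. The helper names (run_or_print, tag_private_subnets, annotate_provider_service) are made up for illustration. By default it only prints the aws and oc commands; it runs them only when RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands unless RUN=1 is set in the environment.

CLUSTER_TAG="kubernetes.io/cluster/sgatfane-p3421-8x6jn"  # cluster tag from this comment
NAMESPACE="fusion-storage"                                 # namespace from the describe output
SERVICE="ocs-provider-server"                              # service stuck with EXTERNAL-IP <pending>

# run_or_print CMD ARGS...: execute the command when RUN=1, otherwise echo it.
run_or_print() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# tag_private_subnets SUBNET_ID...: add both required tags to each private subnet.
tag_private_subnets() {
  local subnet
  for subnet in "$@"; do
    run_or_print aws ec2 create-tags --resources "$subnet" \
      --tags "Key=${CLUSTER_TAG},Value=shared" "Key=kubernetes.io/role/internal-elb,Value=1"
  done
}

# annotate_provider_service: mark the LoadBalancer service as internal.
annotate_provider_service() {
  run_or_print oc annotate service "$SERVICE" -n "$NAMESPACE" --overwrite \
    "service.beta.kubernetes.io/aws-load-balancer-internal=true"
}
```

Usage would be something like `tag_private_subnets <private-subnet-id> ...` followed by `annotate_provider_service`. Afterwards, `oc get service ocs-provider-server -n fusion-storage` should show whether EXTERNAL-IP has moved on from <pending>.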

