Bug 1911265 - Submariner join failed: Deployment does not have minimum availability
Summary: Submariner join failed: Deployment does not have minimum availability
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Submariner
Version: rhacm-2.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Nir Yechiel
QA Contact: Noam Manos
Docs Contact: Christopher Dawson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-28 12:31 UTC by Noam Manos
Modified: 2021-02-22 14:41 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-03 14:26:45 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | open-cluster-management backlog issues 8191 | 0 | None | None | None | 2021-02-22 14:41:09 UTC

Description Noam Manos 2020-12-28 12:31:59 UTC
Description of problem:
Submariner join failed: Deployment does not have minimum availability


Version-Release number of selected component (if applicable):
Submariner 0.8.0

How reproducible:
Always, on my downstream (d/s) CI:
https://qe-jenkins-csb-skynet.cloud.paas.psi.redhat.com/job/debug_job/901/Test-Report/

Steps to Reproduce:

subctl join ./broker-info.subm --cable-driver libreswan \
    --ikeport 501 --nattport 4501 --enable-pod-debugging --ipsec-debug --health-check \
    --image-override submariner-operator=registry.gitlab.com/smattar/submariner-rhel8-operator:v0.8.0 \
    --image-override submariner-gateway=registry.gitlab.com/smattar/submariner-gateway-rhel8:v0.8.0 \
    --image-override submariner-route-agent=registry.gitlab.com/smattar/submariner-route-agent-rhel8:v0.8.0 \
    --image-override submariner-globalnet=registry.gitlab.com/smattar/submariner-globalnet-rhel8:v0.8.0 \
    --image-override submariner-networkplugin-syncer=registry.gitlab.com/smattar/submariner-networkplugin-syncer-rhel8:v0.8.0 \
    --image-override lighthouse-agent=registry.gitlab.com/smattar/lighthouse-agent-rhel8:v0.8.0 \
    --image-override lighthouse-coredns=registry.gitlab.com/smattar/lighthouse-coredns-rhel8:v0.8.0


Actual results:

https://api.nmanos-cluster-a.devcluster.openshift.com:6443
 • Discovering network details  ...
* There are 1 labeled nodes in the cluster:
  - ip-10-166-25-149.us-west-1.compute.internal
 ✓ Discovering network details
    Discovered network details:
        Network plugin:  OpenShiftSDN
        Service CIDRs:   [100.96.0.0/16]
        Cluster CIDRs:   [10.252.0.0/14]
 • Discovering multi cluster details  ...
 • Validating Globalnet configurations  ...
 ✓ Validating Globalnet configurations
 • Assigning Globalnet IPs  ...
 ✓ Assigning Globalnet IPs
 ✓ Allocated GlobalCIDR: 169.254.0.0/19
 ✓ Discovering multi cluster details
 • Deploying the Submariner operator  ...
 ✗ Deploying the Submariner operator
 ✓ Created operator CRDs
 ✓ Created operator namespace: submariner-operator
 ✓ Created operator service account and role
 ✓ Created lighthouse service account and role
 ✓ Created Lighthouse service accounts and roles
Error deploying the operator: timed out waiting for the condition

The operator Deployment shows:

image: registry.gitlab.com/smattar/submariner-rhel8-operator:v0.8.0
          imagePullPolicy: Always
          name: submariner-operator
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: submariner-operator
        serviceAccountName: submariner-operator
        terminationGracePeriodSeconds: 30
  status:
    conditions:
    - lastTransitionTime: "2020-12-28T09:17:41Z"
      lastUpdateTime: "2020-12-28T09:17:41Z"
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
    - lastTransitionTime: "2020-12-28T09:27:42Z"
      lastUpdateTime: "2020-12-28T09:27:42Z"
      message: ReplicaSet "submariner-operator-677668ff95" has timed out progressing.
      reason: ProgressDeadlineExceeded
      status: "False"
      type: Progressing
    observedGeneration: 1
    replicas: 1
    unavailableReplicas: 1
    updatedReplicas: 

Expected results:
The join should complete.
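
For readers triaging a similar failure, the Deployment conditions quoted in this report can be summarized mechanically. The following is a minimal sketch, not part of the original report: the helper names `failing_conditions` and `seconds_between` are hypothetical, and the condition data is copied by hand from the status block above rather than fetched from a cluster.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Conditions copied from the Deployment status pasted in this report.
CONDITIONS: List[Dict[str, str]] = [
    {
        "type": "Available",
        "status": "False",
        "reason": "MinimumReplicasUnavailable",
        "lastTransitionTime": "2020-12-28T09:17:41Z",
    },
    {
        "type": "Progressing",
        "status": "False",
        "reason": "ProgressDeadlineExceeded",
        "lastTransitionTime": "2020-12-28T09:27:42Z",
    },
]


def failing_conditions(conditions: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (type, reason) for every condition whose status is not "True"."""
    return [(c["type"], c["reason"]) for c in conditions if c.get("status") != "True"]


def seconds_between(first: str, second: str) -> float:
    """Seconds elapsed between two RFC 3339 UTC timestamps of the form ...Z."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (datetime.strptime(second, fmt) - datetime.strptime(first, fmt)).total_seconds()


FAILING = failing_conditions(CONDITIONS)
ELAPSED = seconds_between(
    CONDITIONS[0]["lastTransitionTime"], CONDITIONS[1]["lastTransitionTime"]
)

for cond_type, reason in FAILING:
    print(f"{cond_type}: {reason}")
print(f"Seconds between the two transitions: {ELAPSED:.0f}")
```

The two transitions are roughly ten minutes apart, which is consistent with the Kubernetes default `progressDeadlineSeconds` of 600. The conditions only say that the single replica never became available. They do not say why, so the pod's container state still has to be inspected separately.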

Comment 2 Mike Ng 2021-01-04 14:24:05 UTC
G2Bsync 753781471 comment 
 skeeey Mon, 04 Jan 2021 06:20:07 UTC 
 G2Bsync
It seems the operator pod cannot be created. Could you increase the CPU/memory for the Submariner operator?

Comment 3 Noam Manos 2021-01-05 15:06:03 UTC
This seems to be related to image availability on the remote registry.

In another attempt:

subctl join ./broker-info.subm --cable-driver libreswan \
    --ikeport 501 --nattport 4501 --health-check \
    --image-override submariner-operator=registry.redhat.io/rhacm2-tech-preview/submariner-rhel8-operator:v0.8.0 \
    --image-override submariner=registry.redhat.io/rhacm2-tech-preview/submariner-gateway-rhel8:v0.8.0
16:44:21 * ./broker-info.subm says broker is at: https://api.nmanos-cluster-a.devcluster.openshift.com:6443
16:44:21 * There are 1 labeled nodes in the cluster:
16:44:21  • Discovering network details  ...
16:44:21   - default-cl1-l5mpb-worker-8d6gh
16:44:21  ✓ Discovering network details
16:44:21     Discovered network details:
16:44:21         Network plugin:  OpenShiftSDN
16:44:21         Service CIDRs:   [100.96.0.0/16]
16:44:21         Cluster CIDRs:   [10.252.0.0/14]
16:44:22  • Discovering multi cluster details  ...
16:44:22  • Validating Globalnet configurations  ...
16:44:22  ✓ Validating Globalnet configurations
16:44:22  • Assigning Globalnet IPs  ...
16:44:22  ✓ Assigning Globalnet IPs
16:44:22  ✓ Allocated GlobalCIDR: 169.254.32.0/19
16:44:22  ✓ Discovering multi cluster details
16:44:22  • Deploying the Submariner operator  ...

16:54:22  ✗ Deploying the Submariner operator
16:54:22  ✓ Created operator CRDs
16:54:22  ✓ Created operator service account and role
16:54:22  ✓ Created lighthouse service account and role
16:54:22  ✓ Created Lighthouse service accounts and roles
16:54:22 Error deploying the operator: timed out waiting for the condition
16:54:22 
16:54:22 subctl version: v0.8.0


Looking at the globalnet pod, I see:

16:54:39   brokerK8sRemoteNamespace:  submariner-k8s-broker
16:54:39   Cable Driver:              libreswan
16:54:39   Ce IP Sec Debug:           false
16:54:39   Ce IP Sec IKE Port:        501
16:54:39   Ce IP Sec NATT Port:       4501
16:54:39   Ce IP Sec PSK:             up9ryrrxCxn3ngjyOrJvyKLO+mw6r+wTxbV/Nj/2njBOZmG08/yIbs8VbylC6Pjn
16:54:39   Cluster CIDR:              
16:54:39   Cluster ID:                nmanos-cluster-a
16:54:39   Color Codes:               blue
16:54:39   Connection Health Check:
16:54:39     Enabled:                true
16:54:39     Interval Seconds:       1
16:54:39     Max Packet Loss Count:  5
16:54:39   Debug:                    false
16:54:39   Global CIDR:              169.254.0.0/19
16:54:39   Image Overrides:
16:54:39     Submariner:               registry.redhat.io/rhacm2-tech-preview/submariner-gateway-rhel8:v0.8.0
16:54:39     Submariner - Operator:    registry.redhat.io/rhacm2-tech-preview/submariner-rhel8-operator:v0.8.0
16:54:39   Namespace:                  submariner-operator
16:54:39   Nat Enabled:                true
16:54:39   Repository:                 quay.io/submariner
16:54:39   Service CIDR:               
16:54:39   Service Discovery Enabled:  true
16:54:39   Version:                    0.8.0
16:54:39 Status:
16:54:39   Cluster CIDR:  10.252.0.0/14
16:54:39   Cluster ID:    nmanos-cluster-a
16:54:39   Color Codes:   blue
16:54:39   Engine Daemon Set Status:
16:54:39     Last Resource Version:        183930
16:54:39     Mismatched Container Images:  false
16:54:39     Non Ready Container States:
16:54:39     Status:
16:54:39       Current Number Scheduled:  1
16:54:39       Desired Number Scheduled:  1
16:54:39       Number Available:          1
16:54:39       Number Misscheduled:       0
16:54:39       Number Ready:              1
16:54:39       Observed Generation:       1
16:54:39       Updated Number Scheduled:  1
16:54:39   Gateways:
16:54:39     Connections:
16:54:39     Ha Status:  active
16:54:39     Local Endpoint:
16:54:39       Backend:          libreswan
16:54:39       cable_name:       submariner-cable-nmanos-cluster-a-10-166-80-167
16:54:39       cluster_id:       nmanos-cluster-a
16:54:39       Health Check IP:  10.254.2.1
16:54:39       Hostname:         ip-10-166-80-167
16:54:39       nat_enabled:      true
16:54:39       private_ip:       10.166.80.167
16:54:39       public_ip:        13.57.57.11
16:54:39       Subnets:
16:54:39         169.254.0.0/19
16:54:39     Status Failure:  
16:54:39     Version:         0.8.0
16:54:39   Global CIDR:       169.254.0.0/19
16:54:39   Globalnet Daemon Set Status:
16:54:39     Last Resource Version:        183753
16:54:39     Mismatched Container Images:  false
16:54:39     Non Ready Container States:
16:54:39       Waiting:
16:54:39         Message:  rpc error: code = Unknown desc = Error reading manifest 0.8.0 in quay.io/submariner/submariner-globalnet-rhel8: unauthorized: access to the requested resource is not authorized
16:54:39         Reason:   ErrImagePull
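
The waiting message above names the image that could not be pulled. As a rough illustration (a sketch only: the regular expression is fitted to this one message, and `parse_pull_error` is a hypothetical helper, not part of subctl or Kubernetes), the repository, tag, and registry error can be extracted and compared with the `Repository` value shown in the same output:

```python
import re
from typing import Dict, Optional

# Pattern fitted to the single error message quoted in this report; other
# container runtimes or versions may word the error differently.
PULL_ERROR = re.compile(
    r"Error reading manifest (?P<tag>\S+) in (?P<repo>\S+): (?P<reason>.+)$"
)


def parse_pull_error(message: str) -> Optional[Dict[str, str]]:
    """Extract tag, repository and registry error from an ErrImagePull message."""
    match = PULL_ERROR.search(message)
    return match.groupdict() if match else None


# Message and Repository value copied from the output above.
MESSAGE = (
    "rpc error: code = Unknown desc = Error reading manifest 0.8.0 in "
    "quay.io/submariner/submariner-globalnet-rhel8: unauthorized: "
    "access to the requested resource is not authorized"
)
DEFAULT_REPOSITORY = "quay.io/submariner"

INFO = parse_pull_error(MESSAGE)
if INFO is not None:
    print(f"image:  {INFO['repo']}:{INFO['tag']}")
    print(f"error:  {INFO['reason']}")
    # True when the failing image sits under the default Repository rather
    # than under one of the registries given via --image-override.
    print(f"from default repository: {INFO['repo'].startswith(DEFAULT_REPOSITORY + '/')}")
```

In this output the failing globalnet image sits under the default `quay.io/submariner` repository, while the listed image overrides cover only the operator and gateway images. That would be consistent with the globalnet image not having an override in this second attempt, though the report does not confirm this as the root cause.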

Comment 4 Noam Manos 2021-01-12 09:05:59 UTC
To work around this, I used the subctl devel version (newer than 0.8.0).

