Bug 2003543

Summary: timed out waiting for annotations: context deadline exceeded
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Networking
Assignee: Patryk Diak <pdiak>
Networking sub component: ovn-kubernetes
QA Contact: Anurag saxena <anusaxen>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: unspecified
Priority: high
CC: aconstan
Version: 4.9
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-09-17 09:27:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Junqi Zhao 2021-09-13 08:10:34 UTC
Description of problem:
IPI_GCP_OVN_IPSec cluster with FIPS enabled:
**************
networkType: "OVNKubernetes"
ovn_ipsec_config: "yes"
fips_enable: true
**************
# oc get network/cluster -oyaml
...
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy: {}
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
...
# oc -n openshift-ovn-kubernetes get pod
NAME                   READY   STATUS    RESTARTS        AGE
ovn-ipsec-6wbbm        1/1     Running   0               5h9m
ovn-ipsec-7qk28        1/1     Running   0               4h57m
ovn-ipsec-f2qh5        1/1     Running   0               4h55m
ovn-ipsec-rxzdj        1/1     Running   0               4h59m
ovn-ipsec-w66g8        1/1     Running   0               5h9m
ovn-ipsec-wttkp        1/1     Running   0               5h9m
ovnkube-master-7j5gd   6/6     Running   6 (5h7m ago)    5h9m
ovnkube-master-rc9cs   6/6     Running   0               5h9m
ovnkube-master-zsgnl   6/6     Running   7 (5h7m ago)    5h9m
ovnkube-node-8pmpp     4/4     Running   1 (5h6m ago)    5h9m
ovnkube-node-9jpmz     4/4     Running   1 (5h6m ago)    5h9m
ovnkube-node-9rjqn     4/4     Running   1 (4h54m ago)   4h57m
ovnkube-node-g64qb     4/4     Running   1 (5h6m ago)    5h9m
ovnkube-node-tr5r5     4/4     Running   1 (4h54m ago)   4h55m
ovnkube-node-txxbb     4/4     Running   4 (4h58m ago)   4h59m

# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          4h50m   Unable to apply 4.9.0-0.nightly-2021-09-10-170926: the cluster operator monitoring has not yet successfully rolled out

# oc get co monitoring -oyaml
...
  - lastTransitionTime: "2021-09-13T03:33:37Z"
    message: 'Failed to rollout the stack. Error: updating alertmanager: waiting for
      Alertmanager object changes failed: waiting for Alertmanager openshift-monitoring/main:
      expected 3 replicas, got 2 updated replicas'
    reason: UpdatingAlertmanagerFailed
    status: "True"
    type: Degraded
...
# oc -n openshift-monitoring get pod -o wide | grep alertmanager-main
alertmanager-main-0                            5/5     Running             0               5h15m   10.129.2.7     jwei0913-1-4wkkj-worker-b-6x892.c.openshift-qe.internal   <none>           <none>
alertmanager-main-1                            5/5     Running             0               5h15m   10.131.0.19    jwei0913-1-4wkkj-worker-c-bf6w7.c.openshift-qe.internal   <none>           <none>
alertmanager-main-2                            0/5     ContainerCreating   0               5h15m   <none>         jwei0913-1-4wkkj-worker-b-6x892.c.openshift-qe.internal   <none>           <none>
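
As a triage aid, the listing above can be filtered for pods whose READY column is not n/n. This is a minimal sketch, assuming the usual whitespace-separated `oc get pod` layout; the function name and the abbreviated sample rows are illustrative and not part of the original report.

```python
def not_ready_pods(lines):
    """Return (name, ready, total, status) for rows whose READY column is not n/n."""
    result = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row (its READY column has no "/").
        if len(fields) < 3 or "/" not in fields[1]:
            continue
        ready, total = (int(part) for part in fields[1].split("/"))
        if ready != total:
            result.append((fields[0], ready, total, fields[2]))
    return result


# Rows abbreviated from the output above (node and IP columns dropped).
sample = [
    "NAME                  READY   STATUS              RESTARTS   AGE",
    "alertmanager-main-0   5/5     Running             0          5h15m",
    "alertmanager-main-1   5/5     Running             0          5h15m",
    "alertmanager-main-2   0/5     ContainerCreating   0          5h15m",
]

for name, ready, total, status in not_ready_pods(sample):
    print(f"{name}: {ready}/{total} ready, status {status}")
```

Only alertmanager-main-2 is reported, which matches the "expected 3 replicas, got 2 updated replicas" message from the monitoring operator.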


# oc -n openshift-monitoring describe pod alertmanager-main-2
...
Events:
  Type     Reason                  Age                    From     Message
  ----     ------                  ----                   ----     -------
  Warning  FailedCreatePodSandBox  20s (x215 over 4h23m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_alertmanager-main-2_openshift-monitoring_c074794b-5087-44aa-928d-9ca89e5a820d_0(b8b148d66cc8cd56f4ad22e403aceb71f510c23839636b9acec13cc58a62160e): error adding pod openshift-monitoring_alertmanager-main-2 to CNI network "multus-cni-network": [openshift-monitoring/alertmanager-main-2:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-monitoring/alertmanager-main-2 b8b148d66cc8cd56f4ad22e403aceb71f510c23839636b9acec13cc58a62160e] [openshift-monitoring/alertmanager-main-2 b8b148d66cc8cd56f4ad22e403aceb71f510c23839636b9acec13cc58a62160e] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
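
The event message above is long, so the relevant fields can be pulled out with a regular expression. This is a rough sketch: the sandbox name layout (k8s_<pod>_<namespace>_<uid>_<attempt>(<sandbox id>)) is inferred from this one message rather than taken from a specification, and the helper name is hypothetical.

```python
import re

# Layout inferred from the event above; treat it as an assumption.
SANDBOX_RE = re.compile(
    r"k8s_(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<uid>[0-9a-f-]+)_(?P<attempt>\d+)"
    r"\((?P<sandbox_id>[0-9a-f]+)\)"
)


def summarize_sandbox_failure(message):
    """Extract pod, namespace, UID and sandbox ID; flag the annotation timeout."""
    match = SANDBOX_RE.search(message)
    if match is None:
        return None
    info = match.groupdict()
    info["annotation_timeout"] = "timed out waiting for annotations" in message
    return info


# Abbreviated copy of the event message (middle part omitted).
event_message = (
    "Failed to create pod sandbox: rpc error: code = Unknown desc = failed to "
    "create pod network sandbox k8s_alertmanager-main-2_openshift-monitoring_"
    "c074794b-5087-44aa-928d-9ca89e5a820d_0"
    "(b8b148d66cc8cd56f4ad22e403aceb71f510c23839636b9acec13cc58a62160e): "
    "failed to get pod annotation: timed out waiting for annotations: "
    "context deadline exceeded"
)

print(summarize_sandbox_failure(event_message))
```

The annotation_timeout flag separates this failure (the CNI plugin never saw the pod annotation it was waiting for) from other FailedCreatePodSandBox causes.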


# oc -n openshift-monitoring get pod alertmanager-main-2 -oyaml | grep annotations -A10
  annotations:
    kubectl.kubernetes.io/default-container: alertmanager
    openshift.io/scc: nonroot
  creationTimestamp: "2021-09-13T03:23:34Z"
  generateName: alertmanager-main-
  labels:
    alertmanager: main
    app: alertmanager
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/managed-by: prometheus-operator

A normal alertmanager pod should look like this:
# oc -n openshift-monitoring get pod alertmanager-main-1 -oyaml | grep annotations -A30
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.131.0.19/23"],"mac_address":"0a:58:0a:83:00:13","gateway_ips":["10.131.0.1"],"ip_address":"10.131.0.19/23","gateway_ip":"10.131.0.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.131.0.19"
          ],
          "mac": "0a:58:0a:83:00:13",
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.131.0.19"
          ],
          "mac": "0a:58:0a:83:00:13",
          "default": true,
          "dns": {}
      }]
    kubectl.kubernetes.io/default-container: alertmanager
    openshift.io/scc: nonroot
  creationTimestamp: "2021-09-13T03:23:34Z"
  generateName: alertmanager-main-
  labels:
    alertmanager: main
    app: alertmanager
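
The difference between the two pods comes down to the k8s.ovn.org/pod-networks annotation, which is present on alertmanager-main-1 and missing on alertmanager-main-2. The sketch below checks for it and sanity-checks its contents against the cluster network settings shown earlier (10.128.0.0/14, hostPrefix 23). The function names are illustrative. In this report the MAC address also carries the pod IPv4 address in its last four octets; treating that as a general rule is an assumption.

```python
import ipaddress
import json

OVN_ANNOTATION = "k8s.ovn.org/pod-networks"


def parse_pod_networks(annotations):
    """Return the 'default' network entry, or None if the annotation is absent."""
    raw = annotations.get(OVN_ANNOTATION)
    if raw is None:
        return None
    return json.loads(raw).get("default")


def ip_in_cluster_network(ip_cidr, cluster_cidr, host_prefix):
    """True if the pod IP is inside the cluster CIDR and uses the host prefix length."""
    iface = ipaddress.ip_interface(ip_cidr)
    in_range = iface.ip in ipaddress.ip_network(cluster_cidr)
    return in_range and iface.network.prefixlen == host_prefix


def mac_matches_ip(mac, ip_cidr):
    """True if the last four MAC octets equal the IPv4 address bytes (assumed scheme)."""
    ip = ipaddress.ip_interface(ip_cidr).ip
    mac_bytes = bytes(int(octet, 16) for octet in mac.split(":"))
    return mac_bytes[2:] == ip.packed


# Annotations copied from the two pods in this report.
healthy = {
    OVN_ANNOTATION: '{"default":{"ip_addresses":["10.131.0.19/23"],'
    '"mac_address":"0a:58:0a:83:00:13","gateway_ips":["10.131.0.1"],'
    '"ip_address":"10.131.0.19/23","gateway_ip":"10.131.0.1"}}',
    "openshift.io/scc": "nonroot",
}
stuck = {
    "kubectl.kubernetes.io/default-container": "alertmanager",
    "openshift.io/scc": "nonroot",
}

for name, annotations in (("alertmanager-main-1", healthy), ("alertmanager-main-2", stuck)):
    entry = parse_pod_networks(annotations)
    if entry is None:
        print(f"{name}: {OVN_ANNOTATION} is missing")
        continue
    ip_cidr = entry["ip_addresses"][0]
    print(name, ip_cidr,
          ip_in_cluster_network(ip_cidr, "10.128.0.0/14", 23),
          mac_matches_ip(entry["mac_address"], ip_cidr))
```

A pod for which parse_pod_networks returns None has not received its OVN annotation, which is consistent with the "timed out waiting for annotations" error seen for alertmanager-main-2.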


Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-09-10-170926

How reproducible:
Not sure

Steps to Reproduce:
1. Check the monitoring cluster operator (oc get co monitoring -oyaml).

Actual results:
The alertmanager-main-2 pod is stuck in ContainerCreating (0/5 ready), and the monitoring cluster operator is Degraded.

Expected results:
The alertmanager-main-2 pod should be Running with 5/5 containers ready.

Additional info:

Comment 4 Patryk Diak 2021-09-17 09:27:47 UTC

*** This bug has been marked as a duplicate of bug 1997072 ***