Bug 1997205
| Summary: | ovn-kube test flake "operator conditions network" "DaemonSet "..." is not available (awaiting 1 nodes) | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Winship <danw> |
| Component: | Networking | Assignee: | Surya Seetharaman <surya> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | low | CC: | qili, rravaiol, trozet, zzhao |
| Version: | 4.9 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-09-23 10:03:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Dan Winship
2021-08-24 15:56:22 UTC
we also hit this issue on ipv4 cluster.
$ oc get pod -n openshift-network-diagnostics -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
network-check-source-79dbb9d7bf-mc6qh 1/1 Running 0 10h 10.128.2.13 ip-10-0-219-200.us-east-2.compute.internal <none> <none>
network-check-target-4wk49 1/1 Running 0 10h 10.129.2.4 ip-10-0-163-213.us-east-2.compute.internal <none> <none>
network-check-target-68bgx 1/1 Running 0 10h 10.128.0.4 ip-10-0-175-19.us-east-2.compute.internal <none> <none>
network-check-target-9d8s4 1/1 Running 0 10h 10.130.0.4 ip-10-0-153-43.us-east-2.compute.internal <none> <none>
network-check-target-jsnpx 0/1 ContainerCreating 0 10h <none> ip-10-0-146-178.us-east-2.compute.internal <none> <none>
network-check-target-pwsfs 1/1 Running 0 10h 10.128.2.4 ip-10-0-219-200.us-east-2.compute.internal <none> <none>
network-check-target-zpnfg 1/1 Running 0 10h 10.129.0.4 ip-10-0-210-23.us-east-2.compute.internal <none> <none>
$ oc describe pod network-check-target-jsnpx -n openshift-network-diagnostics
Name: network-check-target-jsnpx
Namespace: openshift-network-diagnostics
Priority: 1000000000
Priority Class Name: openshift-user-critical
Node: ip-10-0-146-178.us-east-2.compute.internal/10.0.146.178
Start Time: Thu, 02 Sep 2021 00:49:29 +0800
Labels: app=network-check-target
controller-revision-hash=55cd96fd4f
kubernetes.io/os=linux
pod-template-generation=1
Annotations: k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["10.131.0.4/23"],"mac_address":"0a:58:0a:83:00:04","gateway_ips":["10.131.0.1"],"ip_address":"10.131.0.4/23","...
openshift.io/scc: restricted
Status: Pending
IP:
IPs: <none>
Controlled By: DaemonSet/network-check-target
Containers:
network-check-target-container:
Container ID:
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:159bd605c612c8eee47fa1538420d3161e5b26baae02e63b10750f30da758818
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
Command:
cluster-network-check-target
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 10m
memory: 15Mi
Readiness: http-get http://:8080/ delay=30s timeout=10s period=10s #success=1 #failure=3
Environment:
K8S_NODE_NAME: (v1:spec.nodeName)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rkzsh (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-rkzsh:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 4m17s (x501 over 10h) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_network-check-target-jsnpx_openshift-network-diagnostics_7edf4ac7-239f-44c8-8761-7e77ad1ab0f4_0(c3533e05e600600ea0ca59ca5df4f30452d9f3f0f7e91186287cfea2b202fb13): error adding pod openshift-network-diagnostics_network-check-target-jsnpx to CNI network "multus-cni-network": [openshift-network-diagnostics/network-check-target-jsnpx:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-network-diagnostics/network-check-target-jsnpx c3533e05e600600ea0ca59ca5df4f30452d9f3f0f7e91186287cfea2b202fb13] [openshift-network-diagnostics/network-check-target-jsnpx c3533e05e600600ea0ca59ca5df4f30452d9f3f0f7e91186287cfea2b202fb13] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:83:00:04 [10.131.0.4/23]
from above the pod is binding to 0a:58:0a:83:00:04 [10.131.0.4/23]. However from the ovn-controller logs. it is binding Claiming 0a:58:0a:83:00:03 10.131.0.3
ovnkube-node-nd844-ovn-controller.log:2021-09-02T00:37:19Z|02417|binding|INFO|openshift-network-diagnostics_network-check-target-jsnpx: Claiming 0a:58:0a:83:00:03 10.131.0.3
must-gather logs in : http://file.apac.redhat.com/~zzhao/must-gather.local.203549028165079256.tar.gz *** Bug 2003558 has been marked as a duplicate of this bug. *** If we look at the logs on ovnkube-master:
2021-08-23T19:02:07.677909398Z I0823 19:02:07.677878 1 kube.go:67] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":["fd01:0:0:4::3/64"],"mac_address":"0a:58:92:02:16:1f","gateway_ips":["fd01:0:0:4::1"],"ip_address":"fd01:0:0:4::3/64","gateway_ip":"fd01:0:0:4::1"}}] on pod openshift-multus/network-metrics-daemon-8nn5z
2021-08-23T19:02:07.719510291Z I0823 19:02:07.719472 1 ovn.go:458] [b463fd6d-b041-478e-867a-c3c92244f8bf/openshift-multus/network-metrics-daemon-8nn5z] retry pod setup
2021-08-23T19:02:07.720064280Z I0823 19:02:07.720044 1 pods.go:251] [openshift-multus/network-metrics-daemon-8nn5z] addLogicalPort took 66.78163ms
2021-08-23T19:02:07.775485278Z I0823 19:02:07.775415 1 kube.go:67] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":["fd01:0:0:4::4/64"],"mac_address":"0a:58:50:2d:e9:e7","gateway_ips":["fd01:0:0:4::1"],"ip_address":"fd01:0:0:4::4/64","gateway_ip":"fd01:0:0:4::1"}}] on pod openshift-multus/network-metrics-daemon-8nn5z
2021-08-23T19:02:07.815623813Z I0823 19:02:07.810948 1 pods.go:251] [openshift-multus/network-metrics-daemon-8nn5z] addLogicalPort took 91.446109ms
2021-08-23T19:02:07.815623813Z E0823 19:02:07.810975 1 ovn.go:555] error while creating logical port openshift-multus_network-metrics-daemon-8nn5z error: Transaction Failed due to an error: constraint violation details: Transaction causes multiple rows in "Logical_Switch_Port" table to have identical values (openshift-multus_network-metrics-daemon-8nn5z) for index on column "name". First row, with UUID 356acf71-787c-4b49-a048-2c69b9a1ff63, existed in the database before this transaction, which modified some of the row's columns but not any columns in this index. Second row, with UUID 1d285325-fbeb-47ba-8338-bafbb2a49276, was inserted by this transaction. in committing transaction
2021-08-23T19:02:07.815623813Z I0823 19:02:07.811029 1 ovn.go:464] [b463fd6d-b041-478e-867a-c3c92244f8bf/openshift-multus/network-metrics-daemon-8nn5z] setup retry failed; will try again later
2021-08-23T19:02:07.815623813Z I0823 19:02:07.811249 1 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-multus", Name:"network-metrics-daemon-8nn5z", UID:"b463fd6d-b041-478e-867a-c3c92244f8bf", APIVersion:"v1", ResourceVersion:"26503", FieldPath:""}): type: 'Warning' reason: 'ErrorAddingLogicalPort' error while creating logical port openshift-multus_network-metrics-daemon-8nn5z error: Transaction Failed due to an error: constraint violation details: Transaction causes multiple rows in "Logical_Switch_Port" table to have identical values (openshift-multus_network-metrics-daemon-8nn5z) for index on column "name". First row, with UUID 356acf71-787c-4b49-a048-2c69b9a1ff63, existed in the database before this transaction, which modified some of the row's columns but not any columns in this index. Second row, with UUID 1d285325-fbeb-47ba-8338-bafbb2a49276, was inserted by this transaction. in committing transaction
You are most likely hitting either/both of:
https://bugzilla.redhat.com/show_bug.cgi?id=2004340
https://bugzilla.redhat.com/show_bug.cgi?id=1998614
Looks like this job has been mostly passing now for the last 4-5 days: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-ovn-ipv6 Dan, are we OK to close this bug? *** Bug 2006751 has been marked as a duplicate of this bug. *** Yeah, if you think it's fixed, then go ahead and close it. (This is just a recurring failure I noticed in the CI logs. I don't have any extra insight about it, or any way of reproducing it.) @Tim @Dan I reproduced this issue on the latest 4.9 build 4.9.0-0.nightly-2021-09-21-215600 see https://bugzilla.redhat.com/show_bug.cgi?id=2006751 @zhaozhanqi: https://bugzilla.redhat.com/show_bug.cgi?id=2006751 is not a bug actually. When you see the message "Reconnecting...Transaction Failed due to an error: constraint violation details:", and then subsequent logs show the pod creations happening successfully, its all good. https://bugzilla.redhat.com/show_bug.cgi?id=1998614 actually fixes the transaction error bug and the fix forces a reconnect from ovn-k to nbdb to solve it. If the transaction error persists that's when its a bug. Basically its what Tim said above. Closing this as dupe of 1998614 and 2004340. Please reopen in case you see pods are stuck and not getting created or something along those lines. Cheers, Surya. *** This bug has been marked as a duplicate of bug 1998614 *** *** Bug 2006751 has been marked as a duplicate of this bug. *** |