Bug 1908076
| Summary: | OVN installation degrades auth and network operator on s390x | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Tom Dale <tdale> |
| Component: | Networking | Assignee: | Peng Liu <pliu> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | high | | |
| Priority: | medium | CC: | aconstan, aos-bugs, danili, dcbw, jboxman, jokerman, krmoser, tdale, wvoesch |
| Version: | 4.7 | Keywords: | Reopened |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | s390x | | |
| OS: | Linux | | |
| Whiteboard: | multi-arch | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-01-07 14:17:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1903544 | | |
| Attachments: | | | |
Description: Tom Dale, 2020-12-15 19:46:03 UTC
After chatting with the bug creator: he was able to install OVN a week ago, but earlier this week he saw a degradation with installation on z/VM or KVM and is now encountering this bug. Therefore, I'm marking this bug as "Blocker+" for the moment. If the Networking team deems it not to be a blocker, please feel free to change the Blocker flag.

Hi Peng, FYI: we assigned this to you since it concerns an SDN -> OVN migration. Feel free to dispatch it back to anyone else in case it turns out to be unrelated to the migration procedure. Also, once we have a better picture of what is causing the issue, we can assess whether it is a blocker or not.
/Alex

Actually, taking a quick look at the attachment: could you please provide a description of which pod is failing, and in which networking namespace? Is it ovn-kubernetes? From the attachment it seems multus is crash-looping. Could you run:

oc get pod -A -owide
oc get co

and could you please get all logs for all pods in openshift-ovn-kubernetes?
/Alex

Created attachment 1739680 [details]
oc get pods -A -o wide
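For reference, the diagnostics requested in the comment above can be gathered with commands along these lines (a sketch; the per-pod log loop is a convenience addition and not part of the original request):

    # Cluster-wide pod and operator status
    oc get pod -A -owide
    oc get co

    # All pods in the OVN-Kubernetes namespace, plus their container logs
    oc get pods -n openshift-ovn-kubernetes -o wide
    for pod in $(oc get pods -n openshift-ovn-kubernetes -o name); do
        oc logs -n openshift-ovn-kubernetes --all-containers=true "$pod"
    done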
❯ oc get pods -n openshift-ovn-kubernetes
No resources found in openshift-ovn-kubernetes namespace.

❯ oc get pods -n openshift-sdn
No resources found in openshift-sdn namespace.

❯ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.7.0-0.nightly-s390x-2020-12-10-094353 False True True 24h
baremetal 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
cloud-credential 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
cluster-autoscaler 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
config-operator 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
console 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
csi-snapshot-controller 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 2d23h
dns 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 3d
etcd 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
image-registry 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 4d22h
ingress 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 3d7h
insights 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
kube-apiserver 4.7.0-0.nightly-s390x-2020-12-10-094353 True True False 5d1h
kube-controller-manager 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
kube-scheduler 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
kube-storage-version-migrator 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 3d
machine-api 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
machine-approver 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
machine-config 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 3d1h
marketplace 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 3d
monitoring 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 2d23h
network 4.7.0-0.nightly-s390x-2020-12-10-094353 True True True 5d1h
node-tuning 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
openshift-apiserver 4.7.0-0.nightly-s390x-2020-12-10-094353 False False False 24h
openshift-controller-manager 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 3d10h
openshift-samples 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
operator-lifecycle-manager 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
operator-lifecycle-manager-catalog 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
operator-lifecycle-manager-packageserver 4.7.0-0.nightly-s390x-2020-12-10-094353 False True False 24h
service-ca 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h
storage 4.7.0-0.nightly-s390x-2020-12-10-094353 True False False 5d1h

My apologies, there were no pods showing in the namespace because I was trying to recover to OpenShift SDN. A fresh migration to OVN-Kubernetes shows:

❯ oc get pods -n openshift-ovn-kubernetes
NAME READY STATUS RESTARTS AGE
ovnkube-master-ks54v 6/6 Running 0 15m
ovnkube-master-kt7zw 6/6 Running 1 15m
ovnkube-master-mqbv4 6/6 Running 2 15m
ovnkube-node-29fk8 3/3 Running 0 15m
ovnkube-node-2tksm 2/3 CrashLoopBackOff 5 15m
ovnkube-node-4b9wc 3/3 Running 3 15m
ovnkube-node-dfwf6 3/3 Running 4 15m
ovnkube-node-mt5mp 2/3 CrashLoopBackOff 5 15m
ovs-node-2ssqr 1/1 Running 0 15m
ovs-node-d5bnp 1/1 Running 0 15m
ovs-node-hctzw 1/1 Running 0 15m
ovs-node-jlbkz 1/1 Running 0 15m
ovs-node-q2nt6 1/1 Running 0 15m

Logs to follow.

Created attachment 1739698 [details]
oc logs -n openshift-ovn-kubernetes --all-containers=true pod/ovnkube-node-mt5mp
Logs for the failed ovnkube pod.
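When a container is crash-looping, the backtrace from the previous run is usually more informative than the live logs. A minimal sketch for pulling both, assuming the crashing pod above and the standard ovnkube-node container name:

    # All container logs from the current run
    oc logs -n openshift-ovn-kubernetes --all-containers=true pod/ovnkube-node-mt5mp

    # Backtrace from the previous (crashed) run of the ovnkube-node container
    oc logs -n openshift-ovn-kubernetes pod/ovnkube-node-mt5mp -c ovnkube-node --previous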
Tried the OVN migration on a fresh OCP 4.7.0-0.nightly-s390x-2020-12-15-081322 z/VM install and got the same issues, with the ovnkube pod logs showing the same errors as the attached KVM logs:

WARN|Bridge 'br-local' not found for network 'locnet'

Followed through the updated OVN documentation steps and am still getting degraded network, openshift-apiserver, and authentication operators. Hitting the issue at step 10c.

# oc get pod -n openshift-machine-config-operator
NAME READY STATUS RESTARTS AGE
machine-config-controller-7685b58b68-bv95p 0/1 ContainerCreating 2 42h
machine-config-daemon-4tdmt 2/2 Running 0 41h
machine-config-daemon-75vp8 2/2 Running 0 42h
machine-config-daemon-fxkt7 2/2 Running 0 42h
machine-config-daemon-gclhz 2/2 Running 0 41h
machine-config-daemon-q72vl 2/2 Running 0 42h
machine-config-operator-5ccbfcbdfd-b7r4b 0/1 ContainerCreating 1 42h
machine-config-server-b965g 1/1 Running 0 42h
machine-config-server-bqfzj 1/1 Running 0 42h
machine-config-server-r5gfq 1/1 Running 0 42h

However, even as system:admin I cannot read the logs from the two pods that are stuck in a ContainerCreating state (see the event-based debugging sketch after the attachment list below):

[root@ospamgr3 ovn-debug]# oc logs pod/machine-config-controller-7685b58b68-bv95p -n openshift-machine-config-operator
unable to retrieve container logs for cri-o://0ad931954727ba5d5e0def37a4c32e63d8c2a3d776d022ae2a552f49f26939ee

Created attachment 1744353 [details]
oc describe co network
Created attachment 1744354 [details]
oc describe co openshift-apiserver
Created attachment 1744357 [details]
oc describe pods -n openshift-ovn-kubernetes
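For the machine-config pods stuck in ContainerCreating (comment above), the container logs are not available because no container has started yet; the pod events are the place to look for the underlying sandbox/CNI error. A minimal sketch, using the pod name from that comment:

    # The events at the bottom of the describe output usually show the sandbox/CNI failure
    oc describe pod machine-config-controller-7685b58b68-bv95p \
        -n openshift-machine-config-operator

    # Recent events in the namespace, oldest first
    oc get events -n openshift-machine-config-operator --sort-by=.lastTimestamp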
Re-assigning this bug to the Networking team to get their input on Comment 12, as the creator followed the updated documentation and still observed the bug. Please re-assign if necessary.

ovnkube-node seems to be crashing with:

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Message:      hub.com/openshift/ovn-kubernetes/go-controller/pkg/node/startup-waiter.go:44 +0x7e
created by github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node.(*startupWaiter).Wait
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/startup-waiter.go:42 +0xde
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11f6c38]

goroutine 268 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x162
panic(0x13fff00, 0x2358ee0)
        /usr/lib/golang/src/runtime/panic.go:969 +0x16e
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node.(*startupWaiter).Wait.func1.1(0x0, 0x0, 0x0)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/startup-waiter.go:45 +0x28
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0xc000744fa0, 0x1497540, 0x0, 0x0)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:211 +0x66
k8s.io/apimachinery/pkg/util/wait.pollImmediateInternal(0xc000787f20, 0xc0001c6fa0, 0xc000787f20, 0x0)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:445 +0x2a
k8s.io/apimachinery/pkg/util/wait.PollImmediate(0x1dcd6500, 0x45d964b800, 0xc000744fa0, 0x0, 0x0)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:441 +0x48
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node.(*startupWaiter).Wait.func1(0xc000182a00, 0xc0004c7d40, 0xc0002bfa70)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/startup-waiter.go:44 +0x7e
created by github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node.(*startupWaiter).Wait
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/startup-waiter.go:42 +0xde

Is this a dupe of Bug 1908231? Same backtrace.

The multi-arch bug triage team looked through the bugs and we think that this bug is similar to BZ 1909187 found on Power.

Not sure if it's the same as 1909187, as I don't have any CSRs:

# oc get csr --all-namespaces
No resources found

@Tom it looks like a dupe of Bug 1908231. Could you test with the latest 4.7 build?

This was tested and failed with OCP 4.7.0-0.nightly-s390x-2020-12-21-160105, the latest build available on the public mirror.

Update: this issue looks fixed in the new build, Server Version: 4.7.0-0.nightly-s390x-2021-01-05-214454. Successfully installed OVN on z-KVM. Will close the issue once I verify there is no issue on z/VM as well.

Issue fixed on the z/VM cluster as well. Thanks for the help.

*** This bug has been marked as a duplicate of bug 1908231 ***
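For completeness, a quick way to confirm that a cluster really is running OVN-Kubernetes after such a migration (a sketch, not taken from this report):

    # The cluster network type reported by the network config
    oc get network.config cluster -o jsonpath='{.status.networkType}{"\n"}'

    # The operators that were degraded during the migration, and the OVN pods
    oc get co network authentication openshift-apiserver
    oc get pods -n openshift-ovn-kubernetes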