Bug 1920867
| Summary: | OCP 4.7 on Z fails to install for KVM when specifying networkType OVNKubernetes in the install-config.yaml file | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | krmoser | ||||||
| Component: | Multi-Arch | Assignee: | Dennis Gilmore <dgilmore> | ||||||
| Status: | CLOSED NOTABUG | QA Contact: | Barry Donahue <bdonahue> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 4.7 | CC: | amccrae, cbaus, chanphil, christian.lapolt, Holger.Wolf, krmoser, psundara, tdale, wvoesch | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2021-01-29 13:06:45 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1903544 | ||||||||
| Attachments: |
Can you provide the oc adm must-gather output?

Carvel,

Please find attached 2 oc adm must-gather tar.gz files:

1. 4.7.0-0.nightly-s390x-2021-01-27-164108.must-gather.local.2636577120531980503.tar.gz for OCP build 4.7.0-0.nightly-s390x-2021-01-27-164108 with networkType "OpenShiftSDN" value. This installation completes successfully.
2. 4.7.0-0.nightly-s390x-2021-01-27-164108.must-gather.local.5719890522243309048.OVNKubenetes.tar.gz for OCP build 4.7.0-0.nightly-s390x-2021-01-27-164108 with networkType "OVNKubernetes" value. This installation fails to install successfully with the issue described in my initial post.

Thank you,
Kyle

Created attachment 1751605 [details]
OCP 4.7 install with networkType "OpenShiftSDN"
oc adm must-gather for networkType "OpenShiftSDN": installs successfully
Created attachment 1751611 [details]
OCP 4.7 install with networkType "OVNKubernetes"
oc adm must-gather for networkType "OVNKubernetes": fails to complete installation
Summarizing the discussion with Kyle today:

This problem appears to have been caused by the master nodes being allocated only 8G of memory. We were able to reproduce the issue, and the problem did not occur when the memory was bumped to 16G. 16G is the recommended default memory setting for masters for OCP.

The symptom pointing to insufficient memory was log entries like this one:

Jan 28 15:10:34 root-ctlplane-2 hyperkube[1791]: W0128 15:10:34.934196 1791 predicate.go:113] Failed to admit pod kube-apiserver-root-ctlplane-2_openshift-kube-apiserver(dd3c4e54-b16e-4625-911d-a6fcb30e888d) - Unexpected error while attempting to recover from admission failure: preemption: error finding a set of pods to preempt: no set of running pods found to reclaim resources: [(res: memory, q: 124293632),

This indicates that Kubernetes tried to remove lower-priority pods in order to accommodate higher-priority pods. In this case it did not find any pods that it could evict, and thus the apiserver never started on some masters.

Kyle, do you agree that this bug can be closed?

Thanks,
Prashanth

Prashanth,

Thank you again to Andy and you for your assistance. Yes, please close this bug.

FYI: Using the appropriate master node memory size (16GB), we've successfully tested the following 10 OCP 4.7 on Z builds using the networkType: OVNKubernetes install option.

1. 4.7.0-0.nightly-s390x-2021-01-27-164108
2. 4.7.0-0.nightly-s390x-2021-01-28-005008
3. 4.7.0-0.nightly-s390x-2021-01-28-023813
4. 4.7.0-0.nightly-s390x-2021-01-28-052030
5. 4.7.0-0.nightly-s390x-2021-01-28-064716
6. 4.7.0-0.nightly-s390x-2021-01-28-084706
7. 4.7.0-0.nightly-s390x-2021-01-28-113553
8. 4.7.0-0.nightly-s390x-2021-01-28-140116
9. 4.7.0-0.nightly-s390x-2021-01-28-192809
10. 4.7.0-0.nightly-s390x-2021-01-28-220317

Thank you,
Kyle
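For anyone triaging a similar install failure, the kubelet message quoted above can be scanned for mechanically. The following is a minimal Python sketch, not part of the original analysis: the function name `parse_shortfalls` and the regular expression are hypothetical, and it assumes the `q:` value for the memory resource is a byte count, which would put the shortfall in this log entry at roughly 118 MiB.

```python
import re

# Hypothetical pattern for the "(res: <resource>, q: <quantity>)" fragment
# that appears in the kubelet preemption failure message quoted in this bug.
SHORTFALL_RE = re.compile(r"\(res: (?P<res>[\w.\-/]+), q: (?P<q>\d+)\)")


def parse_shortfalls(log_line):
    """Return a list of (resource, quantity) pairs found in one log line."""
    return [(m.group("res"), int(m.group("q")))
            for m in SHORTFALL_RE.finditer(log_line)]


# Log text copied from the comment above (truncated there as well).
LOG_LINE = (
    "predicate.go:113] Failed to admit pod "
    "kube-apiserver-root-ctlplane-2_openshift-kube-apiserver"
    "(dd3c4e54-b16e-4625-911d-a6fcb30e888d) - Unexpected error while "
    "attempting to recover from admission failure: preemption: error finding "
    "a set of pods to preempt: no set of running pods found to reclaim "
    "resources: [(res: memory, q: 124293632), "
)

shortfalls = parse_shortfalls(LOG_LINE)
for res, q in shortfalls:
    # Assumption: q is in bytes when the resource is memory.
    print(f"{res}: short by {q} bytes ({q / 2**20:.1f} MiB)")
```

A non-empty result for `memory` on a control plane node is a hint to check the memory allocated to the masters before suspecting the network plugin.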
Description of problem:
1. OCP 4.7 on Z fails to install for KVM when specifying networkType "OVNKubernetes" in the install-config.yaml file.
2. OCP 4.7 on Z successfully installs for zVM when specifying networkType "OVNKubernetes" in the install-config.yaml file.
3. The exact same OCP 4.7 on Z builds successfully install for KVM when specifying (the default) networkType "OpenShiftSDN" value in the install-config.yaml file.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-s390x-2021-01-24-004935 and 3-4 additional recent OCP 4.7 builds tested.

How reproducible:
Consistently reproducible

Steps to Reproduce:
1. Update the networkType value to OVNKubernetes in the install-config.yaml file.
2. Proceed with OCP 4.7 on Z KVM cluster installation.

Actual results:
1. The OCP 4.7 on Z build fails to complete the installation, including after 10+ hours tested/observed.
2. The authentication, console, and kube-apiserver cluster operators do not achieve AVAILABLE status.
3. The kube-controller-manager cluster operator is in a degraded state.
4. Actual OCP 4.7 on Z cluster install information after 10+ hours:

# ssh 192.168.79.1 oc get nodes
NAME                                           STATUS   ROLES    AGE   VERSION
master-0.pok-243.ocptest.pok.stglabs.ibm.com   Ready    master   10h   v1.20.0+70dd98e
master-1.pok-243.ocptest.pok.stglabs.ibm.com   Ready    master   10h   v1.20.0+70dd98e
master-2.pok-243.ocptest.pok.stglabs.ibm.com   Ready    master   10h   v1.20.0+70dd98e
worker-0.pok-243.ocptest.pok.stglabs.ibm.com   Ready    worker   10h   v1.20.0+70dd98e
worker-1.pok-243.ocptest.pok.stglabs.ibm.com   Ready    worker   10h   v1.20.0+70dd98e

[root@t90ocp3 ~]# ssh 192.168.79.1 oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          10h     Unable to apply 4.7.0-0.nightly-s390x-2021-01-24-004935: an unknown error has occurred: MultipleErrors

# ssh 192.168.79.1 oc get co
NAME                                       VERSION                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-s390x-2021-01-24-004935   False       False         True       10h
baremetal                                  4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
cloud-credential                           4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
cluster-autoscaler                         4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
config-operator                            4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
console                                    4.7.0-0.nightly-s390x-2021-01-24-004935   False       True          True       10h
csi-snapshot-controller                    4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
dns                                        4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
etcd                                       4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
image-registry                             4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
ingress                                    4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
insights                                   4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
kube-apiserver                             4.7.0-0.nightly-s390x-2021-01-24-004935   False       True          True       10h
kube-controller-manager                    4.7.0-0.nightly-s390x-2021-01-24-004935   True        True          True       10h
kube-scheduler                             4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
kube-storage-version-migrator              4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
machine-api                                4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
machine-approver                           4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
machine-config                             4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
marketplace                                4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
monitoring                                 4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
network                                    4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
node-tuning                                4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
openshift-apiserver                        4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
openshift-controller-manager               4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
openshift-samples                          4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
operator-lifecycle-manager                 4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
operator-lifecycle-manager-catalog         4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
service-ca                                 4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
storage                                    4.7.0-0.nightly-s390x-2021-01-24-004935   True        False         False      10h
#

5. [root@bastion ocp4-workdir]# cat install-config.copy.yaml
apiVersion: v1
baseDomain: "ocptest.pok.stglabs.ibm.com"
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: "pok-243"
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: <not included>

6. # virsh list
 Id   Name          State
-----------------------------
 2    bastion       running
 3    bootstrap-0   running
 4    master-0      running
 5    master-1      running
 6    master-2      running
 7    infnod-0      running
 8    infnod-1      running

Expected results:
1. OCP 4.7 on Z should successfully install for KVM when specifying networkType "OVNKubernetes" in the install-config.yaml file.

Additional info:

Thank you.
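
The cluster operator listing in the actual results can be filtered down to the operators that matter for triage. The following is a minimal Python sketch under stated assumptions: the function name `unhealthy_operators` is hypothetical, the input is the plain-text output of `oc get co` as captured in this report, and every row is assumed to have a non-empty VERSION column (a row with an empty VERSION would not split into columns correctly here).

```python
def unhealthy_operators(oc_get_co_output):
    """Return {name: (available, progressing, degraded)} for cluster operators
    that are not Available or that are Degraded.

    Assumes whitespace-separated columns in the order
    NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE.
    """
    result = {}
    lines = [ln for ln in oc_get_co_output.strip().splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete operator row (e.g. a stray prompt)
        name, _version, available, progressing, degraded = fields[:5]
        if available != "True" or degraded == "True":
            result[name] = (available, progressing, degraded)
    return result


# A subset of the rows reported in this bug.
SAMPLE = """\
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.7.0-0.nightly-s390x-2021-01-24-004935 False False True 10h
console 4.7.0-0.nightly-s390x-2021-01-24-004935 False True True 10h
etcd 4.7.0-0.nightly-s390x-2021-01-24-004935 True False False 10h
kube-apiserver 4.7.0-0.nightly-s390x-2021-01-24-004935 False True True 10h
kube-controller-manager 4.7.0-0.nightly-s390x-2021-01-24-004935 True True True 10h
network 4.7.0-0.nightly-s390x-2021-01-24-004935 True False False 10h
"""

report = unhealthy_operators(SAMPLE)
for name, (available, progressing, degraded) in sorted(report.items()):
    print(f"{name}: available={available} progressing={progressing} degraded={degraded}")
```

On the sample rows this flags authentication, console, kube-apiserver, and kube-controller-manager, matching items 2 and 3 of the actual results, while the network operator itself reports as healthy.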