Description of problem:
Trying to deploy OpenShift on a baremetal environment with jumbo frames enabled fails: the install never completes, with some pods (kube-apiserver etc.) never coming up. I believe it is strongly tied to the jumbo MTU configuration, as the install on the same hardware with the same OCP version and the default MTU succeeds flawlessly.

Version-Release number of selected component (if applicable): 4.4.6

How reproducible: 100%

Steps to Reproduce:
1. Follow the instructions at https://docs.openshift.com/container-platform/4.4/installing/installing_bare_metal/installing-bare-metal-network-customizations.html to set the MTU to 8950 in cluster-network-03-config.yml for OpenShiftSDN
2. Since the manifests do not configure the MTU on the baremetal interface (they configure only the MTU for the VXLAN interface and veths), use ignition files to create ifcfg files that set the MTU to 9000 on the baremetal interface
3. Run the openshift install

Actual results:
The MTU on the vxlan and veth interfaces is configured correctly to 8950 and the baremetal interface has an MTU of 9000 as expected, yet the install fails.

Expected results:

Additional info:
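Step 2 above amounts to dropping an ifcfg file like the one below onto each node (the exact payload appears percent-encoded in the ignition files later in this report). A sketch: it writes to the current directory for illustration; on a real node the file would live at /etc/sysconfig/network-scripts/ifcfg-ens2f1.

```shell
# ifcfg fragment that pins the baremetal NIC (ens2f1 here) to MTU 9000.
# Written to the current directory for illustration only; on a node the
# path would be /etc/sysconfig/network-scripts/ifcfg-ens2f1.
cat > ifcfg-ens2f1 <<'EOF'
DEVICE=ens2f1
BOOTPROTO=dhcp
ONBOOT=yes
MTU=9000
EOF
```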
It would be interesting to know what is different at a NIC/SDN level between this setup, and an AWS setup which also uses jumbo frames and which we know works correctly because we test it every commit in CI...
I initially thought this could be due to the bootstrap VM MTU and the baremetal bridge MTU not being set to 9000. On a subsequent deployment attempt, I set the bootstrap VM MTU as well as the baremetal bridge/interface MTU on the provisioning host to jumbo, yet the deploy hangs:

NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-apiserver-operator openshift-apiserver-operator-f79557665-pktwj 0/1 Pending 0 108m
openshift-authentication-operator authentication-operator-64d4ddc475-kgz6s 0/1 Pending 0 108m
openshift-cluster-machine-approver machine-approver-6d54996f4-6g8p6 0/2 Pending 0 109m
openshift-cluster-node-tuning-operator cluster-node-tuning-operator-6dcd5cbfcc-p4pt8 0/1 Pending 0 109m
openshift-cluster-storage-operator csi-snapshot-controller-operator-bf96f6cc7-852cv 0/1 Pending 0 108m
openshift-cluster-version cluster-version-operator-7c44bdbb69-ncgrp 0/1 Pending 0 109m
openshift-controller-manager-operator openshift-controller-manager-operator-7976ddf498-hxj9h 0/1 Pending 0 108m
openshift-dns-operator dns-operator-778fd8fbb5-g2ssg 0/2 Pending 0 109m
openshift-etcd-operator etcd-operator-5998db5474-c25rv 0/1 Pending 0 108m
openshift-kube-apiserver-operator kube-apiserver-operator-d4dfb74f8-lkhzd 0/1 Pending 0 108m
openshift-kube-controller-manager-operator kube-controller-manager-operator-787c59c5bf-qvttb 0/1 Pending 0 108m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-59d76b9498-w4ncg 0/1 Pending 0 108m
openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-55f49cff56-l68ng 0/1 Pending 0 109m
openshift-machine-config-operator machine-config-operator-54d7fd979-f8vcn 0/1 Pending 0 109m
openshift-network-operator network-operator-66ddfd8657-4hbnc 0/1 Pending 0 108m
openshift-operator-lifecycle-manager catalog-operator-565b8d557b-hpdj4 0/1 Pending 0 109m
openshift-operator-lifecycle-manager olm-operator-5fc685ddfd-l5bfh 0/1 Pending 0 109m
openshift-service-ca-operator service-ca-operator-5b9647cbd6-bl4pz 0/1 Pending 0 108m
openshift-service-catalog-apiserver-operator openshift-service-catalog-apiserver-operator-66c64c4f64-4qsqj 0/1 Pending 0 108m
openshift-service-catalog-controller-manager-operator openshift-service-catalog-controller-manager-operator-75c7kjxbl 0/1 Pending 0 108m
*** Bug 1846499 has been marked as a duplicate of this bug. ***
Sounds like this may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1846499? I think we need more information to proceed: can you provide the exact ignition customizations used, details of the provisioning host configuration, and the exact steps to reproduce? Is it possible the provisioning host networking (where the installer is run) was not correctly configured? That part is not automated by the installer at the moment.
Sai, out of curiosity, could you provide us the `ip link` output from the baremetal host and the bootstrap VM? I seem to remember a kernel behavior change within the last few years that started to truncate packets across bridges, so getting the ip link information would be super helpful for us to try and understand exactly what is occurring.
Steve, Julia, I will get you the information you need shortly and could possibly even give you access to the environment when/where this happens, if that is something you are interested in. However, I have a question here. This bug is about deployments that fail when a jumbo MTU is being used. However, https://bugzilla.redhat.com/show_bug.cgi?id=1846499, which has been marked as a duplicate of this bug, addresses a very specific case: the MTU in the manifests is not translated into the MTU on the baremetal interfaces, and a custom ignition file is needed to configure the MTU on baremetal. While these bugs are related, I don't think https://bugzilla.redhat.com/show_bug.cgi?id=1846499 is a duplicate, because that one is not about deployments failing. It's about the MTU on the baremetal interface not automatically being driven off the SDN MTU in the manifests.
So, here's the manifest I used to set the custom MTU (cluster-network-03-config.yml):

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  defaultNetwork:
    type: OpenShiftSDN
    openshiftSDNConfig:
      mode: NetworkPolicy
      mtu: 8950
      vxlanPort: 4789

I set the MTU on the baremetal interface to 9000 (the SDN MTU of 8950 + 50 bytes to account for VXLAN overhead) using ignition files.

master.ign

{"ignition": {"config": {"append": [{"source": "https://10.1.59.3:22623/config/master", "verification": {}}]}, "security": {"tls": {"certificateAuthorities": [{"source": "data:text/plain;charset=utf-8;base64,LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURFRENDQWZpZ0F3SUJBZ0lJUldDbDRhK1dLNXd3RFFZSktvWklodmNOQVFFTEJRQXdKakVTTUJBR0ExVUUKQ3hNSmIzQmxibk5vYVdaME1SQXdEZ1lEVlFRREV3ZHliMjkwTFdOaE1CNFhEVEl3TURZeE9EQXdOVFUwTWxvWApEVE13TURZeE5qQXdOVFUwTWxvd0pqRVNNQkFHQTFVRUN4TUpiM0JsYm5Ob2FXWjBNUkF3RGdZRFZRUURFd2R5CmIyOTBMV05oTUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUFvUVBId29LVGtXVy8KVnh2czdnKzB4THhjWTkwdlJvbmVVYlJDRm9PRk54WXVtODFoNUd6U0dQdkJRTEFPV3IzOG44MU9ZM2dUd2trYQpvNmRjd3VEZFFJMDZJUTJERlpocWwxRGZSbUxsd3Nyamd2S20zbVFrOGxUd2N1dDliRER3emFxaXI4R0FWbll5CnJ6MWJESWR6VWljc01WWCtkQmMzekpyZWhaczVCOVNvdkRpdTQySjloeXI4RlR5TldvSCsvV1RjNU1tVEM5cjYKODdUZnUvbFl5WWpwVTAwQmxOUVFVVEF5amNHOEV2YytOZnRLRVhHTk1PZmp5SWw3Y2NmZ1VnOE5vbnJ6bDJJMwo1TktPZkt0SXNPbnpqYkRqZWtkanFrQXlaNE9IUlVuclBmZTJCZVEyN1k5cVZYOFdXT2l6MG1BNWlhc25pL3p1CnJ0U0VCajVxcFFJREFRQUJvMEl3UURBT0JnTlZIUThCQWY4RUJBTUNBcVF3RHdZRFZSMFRBUUgvQkFVd0F3RUIKL3pBZEJnTlZIUTRFRmdRVThlWFBTY253QkxQcGxpcjF4QzRFeG1pNVdFTXdEUVlKS29aSWh2Y05BUUVMQlFBRApnZ0VCQUNWRHcxZWJZR3JiNmIyWVd6amNQcGt4OWcySCtqNGt4cndsU2Y5Ylg1TGpHOXFiMXU0WU9pTVpJMFp4CjRJdkZsRmErSDFONWxWUlBxemxIL3JHY3JtdWFzTDR5ZU55b0hzRHBsanVOVkZuRW12YVNUcEYrdXMzd09GQzQKUkdMMC9XZjZ2UVdjbEV3dDVKTHNRTVpMd3R6bExtd2gyOE9nZUsyU0xaanJnYlFvSURxUlNyM1NNRG8zVk9tVwoxd1Uxd3dPNEN2K2ZNMnhQMzNKQXFzdndPTHlHcElDNU91UHZpc3dRclNVemla
ZTh2UVRGWEVxTUlqUnhJWHZICk5QMXVDVm5SdmdjSU4yOXQ0UzQ4R0VwTjVwQzFQb1hPVE5KbzV4RW9oa05VU3ZNa1dKV2dNVS9nOVZBeFRvajQKd2NuU2owYmJWUmVLMWF2Vnd0Q3FydWZFalRzPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==", "verification": {}}]}}, "timeouts": {}, "version": "2.2.0"}, "networkd": {}, "passwd": {}, "storage": {"files": [{"path": "/etc/sysconfig/network-scripts/ifcfg-ens2f1", "filesystem": "root", "mode": 436, "contents": {"source": "data:,DEVICE%3Dens2f1%0ABOOTPROTO%3Ddhcp%0AONBOOT%3Dyes%0AMTU%3D9000%0A"}}, {"path": "/etc/sysconfig/network-scripts/ifcfg-eno1", "filesystem": "root", "mode": 436, "contents": {"source": "data:,DEVICE%3Deno1%0ABOOTPROTO%3Dnone%0AONBOOT%3Dno%0A"}}]}, "systemd": {}} worker.ign {"ignition": {"config": {"append": [{"source": "https://10.1.59.3:22623/config/worker", "verification": {}}]}, "security": {"tls": {"certificateAuthorities": [{"source": "data:text/plain;charset=utf-8;base64,LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURFRENDQWZpZ0F3SUJBZ0lJUldDbDRhK1dLNXd3RFFZSktvWklodmNOQVFFTEJRQXdKakVTTUJBR0ExVUUKQ3hNSmIzQmxibk5vYVdaME1SQXdEZ1lEVlFRREV3ZHliMjkwTFdOaE1CNFhEVEl3TURZeE9EQXdOVFUwTWxvWApEVE13TURZeE5qQXdOVFUwTWxvd0pqRVNNQkFHQTFVRUN4TUpiM0JsYm5Ob2FXWjBNUkF3RGdZRFZRUURFd2R5CmIyOTBMV05oTUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUFvUVBId29LVGtXVy8KVnh2czdnKzB4THhjWTkwdlJvbmVVYlJDRm9PRk54WXVtODFoNUd6U0dQdkJRTEFPV3IzOG44MU9ZM2dUd2trYQpvNmRjd3VEZFFJMDZJUTJERlpocWwxRGZSbUxsd3Nyamd2S20zbVFrOGxUd2N1dDliRER3emFxaXI4R0FWbll5CnJ6MWJESWR6VWljc01WWCtkQmMzekpyZWhaczVCOVNvdkRpdTQySjloeXI4RlR5TldvSCsvV1RjNU1tVEM5cjYKODdUZnUvbFl5WWpwVTAwQmxOUVFVVEF5amNHOEV2YytOZnRLRVhHTk1PZmp5SWw3Y2NmZ1VnOE5vbnJ6bDJJMwo1TktPZkt0SXNPbnpqYkRqZWtkanFrQXlaNE9IUlVuclBmZTJCZVEyN1k5cVZYOFdXT2l6MG1BNWlhc25pL3p1CnJ0U0VCajVxcFFJREFRQUJvMEl3UURBT0JnTlZIUThCQWY4RUJBTUNBcVF3RHdZRFZSMFRBUUgvQkFVd0F3RUIKL3pBZEJnTlZIUTRFRmdRVThlWFBTY253QkxQcGxpcjF4QzRFeG1pNVdFTXdEUVlKS29aSWh2Y05BUUVMQlFBRApnZ0VCQUNWRHcxZWJZR3JiNmIyWVd6amNQcGt4OWcySCtqNGt4cndsU2Y5Ylg1TGpHOXFiMXU0WU9pTVpJMFp4CjRJdkZsRmErSDFONWxW
UlBxemxIL3JHY3JtdWFzTDR5ZU55b0hzRHBsanVOVkZuRW12YVNUcEYrdXMzd09GQzQKUkdMMC9XZjZ2UVdjbEV3dDVKTHNRTVpMd3R6bExtd2gyOE9nZUsyU0xaanJnYlFvSURxUlNyM1NNRG8zVk9tVwoxd1Uxd3dPNEN2K2ZNMnhQMzNKQXFzdndPTHlHcElDNU91UHZpc3dRclNVemlaZTh2UVRGWEVxTUlqUnhJWHZICk5QMXVDVm5SdmdjSU4yOXQ0UzQ4R0VwTjVwQzFQb1hPVE5KbzV4RW9oa05VU3ZNa1dKV2dNVS9nOVZBeFRvajQKd2NuU2owYmJWUmVLMWF2Vnd0Q3FydWZFalRzPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==", "verification": {}}]}}, "timeouts": {}, "version": "2.2.0"}, "networkd": {}, "passwd": {}, "storage": {"files": [{"path": "/etc/sysconfig/network-scripts/ifcfg-ens2f1", "filesystem": "root", "mode": 436, "contents": {"source": "data:,DEVICE%3Dens2f1%0ABOOTPROTO%3Ddhcp%0AONBOOT%3Dyes%0AMTU%3D9000%0A"}}, {"path": "/etc/sysconfig/network-scripts/ifcfg-eno1", "filesystem": "root", "mode": 436, "contents": {"source": "data:,DEVICE%3Deno1%0ABOOTPROTO%3Dnone%0AONBOOT%3Dno%0A"}}]}, "systemd": {}}

Before kicking off the deploy, I changed the MTU of the baremetal interface on my provisioning host. Here's the output of ip a on the provisioning host:

[kni@e19-h24-b04-fc640 clusterconfigs]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:4e:01:3f:31:78 brd ff:ff:ff:ff:ff:ff
    inet 10.1.39.6/22 brd 10.1.39.255 scope global dynamic noprefixroute eno1
       valid_lft 231182sec preferred_lft 231182sec
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:4e:01:3f:31:79 brd ff:ff:ff:ff:ff:ff
4: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master provisioning state UP group default qlen 1000
    link/ether 3c:fd:fe:e7:94:50 brd ff:ff:ff:ff:ff:ff
5: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master baremetal state UP group default qlen 1000
    link/ether 3c:fd:fe:e7:94:51 brd ff:ff:ff:ff:ff:ff
9: baremetal: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 3c:fd:fe:e7:94:51 brd ff:ff:ff:ff:ff:ff
    inet 10.1.59.1/24 brd 10.1.59.255 scope global noprefixroute baremetal
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:13b:bbd2:3811:34b3:452a/64 scope global dynamic noprefixroute
       valid_lft 2591791sec preferred_lft 604591sec
    inet6 fe80::c8ce:23b7:4e83:2880/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
11: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:49:21:3f brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
12: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:49:21:3f brd ff:ff:ff:ff:ff:ff
14: provisioning: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 3c:fd:fe:e7:94:50 brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.1/24 brd 172.22.0.255 scope global noprefixroute provisioning
       valid_lft forever preferred_lft forever
15: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel master baremetal state UNKNOWN group default qlen 1000
    link/ether fe:54:00:da:5e:68 brd ff:ff:ff:ff:ff:ff
16: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master provisioning state UNKNOWN group default qlen 1000
    link/ether fe:54:00:82:32:24 brd ff:ff:ff:ff:ff:ff

I do not set the MTU to jumbo in my bootstrap ignition, but as soon as the bootstrap VM was spawned I logged into the VM and changed the MTU (before the pods were spawned). Here's the output of ip a inside the bootstrap VM:

[root@localhost core]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:da:5e:68 brd ff:ff:ff:ff:ff:ff
    inet 10.1.59.79/24 brd 10.1.59.255 scope global dynamic noprefixroute ens3
       valid_lft 3469sec preferred_lft 3469sec
    inet 10.1.59.2/24 scope global secondary ens3
       valid_lft forever preferred_lft forever
    inet 10.1.59.3/24 scope global secondary ens3
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:13b:13b3:8dad:7631:7615/64 scope global dynamic noprefixroute
       valid_lft 2591871sec preferred_lft 604671sec
    inet6 fe80::a335:27d0:a4db:8a24/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:82:32:24 brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.2/24 brd 172.22.0.255 scope global noprefixroute ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::9de8:19ba:dd86:48f8/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

Since the pods use host networking, they also have the jumbo MTU:

[root@localhost core]# podman exec -it mariadb bash
[root@localhost /]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:da:5e:68 brd ff:ff:ff:ff:ff:ff
    inet 10.1.59.79/24 brd 10.1.59.255 scope global dynamic noprefixroute ens3
       valid_lft 3462sec preferred_lft 3462sec
    inet 10.1.59.2/24 scope global secondary ens3
       valid_lft forever preferred_lft forever
    inet 10.1.59.3/24 scope global secondary ens3
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:13b:13b3:8dad:7631:7615/64 scope global dynamic noprefixroute
       valid_lft 2591864sec preferred_lft 604664sec
    inet6 fe80::a335:27d0:a4db:8a24/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:82:32:24 brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.2/24 brd 172.22.0.255 scope global noprefixroute ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::9de8:19ba:dd86:48f8/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
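For reference, the `data:,` sources in the ignition files above are just percent-encoded ifcfg text. A quick way to sanity-check what they will write (a sketch; it uses python3 for the URL-decoding):

```shell
# Percent-decode an ignition "data:," payload to confirm the ifcfg contents.
payload='DEVICE%3Dens2f1%0ABOOTPROTO%3Ddhcp%0AONBOOT%3Dyes%0AMTU%3D9000%0A'
python3 -c 'import sys, urllib.parse; sys.stdout.write(urllib.parse.unquote(sys.argv[1]))' "$payload"
# prints the four ifcfg lines, ending with MTU=9000
```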
Install keeps getting stuck and does not make progress.

No resources found in openshift-kni-infra namespace.

[kni@e19-h24-b04-fc640 clusterconfigs]$ oc get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-apiserver-operator openshift-apiserver-operator-8596449546-z54mc 0/1 Pending 0 41m
openshift-authentication-operator authentication-operator-66f85cff9-tj6bh 0/1 Pending 0 41m
openshift-cloud-credential-operator cloud-credential-operator-695f4895db-dvdjh 0/1 Pending 0 42m
openshift-cluster-machine-approver machine-approver-685c8468fb-7928j 0/2 Pending 0 41m
openshift-cluster-node-tuning-operator cluster-node-tuning-operator-6688b7b566-bp7nt 0/1 Pending 0 42m
openshift-cluster-storage-operator csi-snapshot-controller-operator-84dd5b859b-89dxz 0/1 Pending 0 41m
openshift-cluster-version cluster-version-operator-79bbd9b569-whbsl 0/1 Pending 0 42m
openshift-controller-manager-operator openshift-controller-manager-operator-7ff98b7969-7p9gf 0/1 Pending 0 41m
openshift-dns-operator dns-operator-7c947d89c6-mqrb6 0/2 Pending 0 42m
openshift-etcd-operator etcd-operator-5d97b6445f-cj7f4 0/1 Pending 0 41m
openshift-kube-apiserver-operator kube-apiserver-operator-8d9b94dbb-mht5m 0/1 Pending 0 41m
openshift-kube-controller-manager-operator kube-controller-manager-operator-6fdcc5987c-dsxn5 0/1 Pending 0 41m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-68c9564886-95vjp 0/1 Pending 0 41m
openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-5fd77bc4c8-4pjrx 0/1 Pending 0 41m
openshift-machine-api machine-api-operator-5c4dd5d794-484p7 0/2 Pending 0 41m
openshift-machine-config-operator machine-config-operator-78db57d645-m5qrm 0/1 Pending 0 42m
openshift-must-gather-6q6jv must-gather-dn7m7 0/1 Pending 0 12m
openshift-network-operator network-operator-7856c8dd68-4l56z 0/1 Pending 0 41m
openshift-operator-lifecycle-manager catalog-operator-59dc594f8f-9fxws 0/1 Pending 0 42m
openshift-operator-lifecycle-manager olm-operator-778c69c9f6-h9p22 0/1 Pending 0 42m
openshift-service-ca-operator service-ca-operator-5b4f8f7649-4x8xl 0/1 Pending 0 41m
openshift-service-catalog-apiserver-operator openshift-service-catalog-apiserver-operator-5f5f55469f-pclbl 0/1 Pending 0 41m
openshift-service-catalog-controller-manager-operator openshift-service-catalog-controller-manager-operator-79784tcm2 0/1 Pending 0 41m

[kni@e19-h24-b04-fc640 clusterconfigs]$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
cloud-credential True False False 41m

[kni@e19-h24-b04-fc640 clusterconfigs]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version False True 42m Working towards 4.4.4: 73% complete

Some pods seem to be running on the masters:

[core@master-0 ~]$ sudo su
[systemd] Failed Units: 1
  NetworkManager-wait-online.service
[root@master-0 core]# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
773b67ad92827 ee7065c322c2add50de27f32cc37656366c004cd5868b5993f50a37d9dea2a76 20 minutes ago Running coredns-monitor 0 4d23a14542b3a
86bf319780841 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6594664ba965e195e06a70814e88895b2e92dc4746bdb1ec17b068f082405baf 20 minutes ago Running mdns-publisher 0 f650eecb018e8
e68f6e275c2ba quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:29ddffd83d2035f76a649223a0fa850ad63a3ca441f6d217a721574465f47338 20 minutes ago Running coredns 0 4d23a14542b3a
ae6fd5b009327 ee7065c322c2add50de27f32cc37656366c004cd5868b5993f50a37d9dea2a76 20 minutes ago Running keepalived-monitor 0 e4d9465bda1df
33f9413033c84 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:78241892eb5ceb7e7c8f6d2b2f890b8a0514a94152ed81b2781024385d984b42 20 minutes ago Running keepalived 0 e4d9465bda1df
1703529fae9c3 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e0f9b3b61b5bfdc543373de25764958c4c1bbc639501924268c6cf4cd455f53e 20 minutes ago Running haproxy-monitor 0 9864c2194e4d7
3d0107ea55fcb quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d34b20bb3302bbd408a46002c47678f5bf613cf6c15966126967e7abd26c49d3 20 minutes ago Running haproxy 0 9864c2194e4d7

ip a output from a master during deploy (after the deploy stopped making progress):

[root@master-0 core]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:4e:01:40:b9:51 brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:4e:01:40:b9:52 brd ff:ff:ff:ff:ff:ff
4: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3c:fd:fe:e7:a3:00 brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.207/24 brd 172.22.0.255 scope global dynamic noprefixroute ens2f0
       valid_lft 2207sec preferred_lft 2207sec
    inet6 fe80::2420:3bd9:dc23:afdd/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
5: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 3c:fd:fe:e7:a3:01 brd ff:ff:ff:ff:ff:ff
    inet 10.1.59.10/24 brd 10.1.59.255 scope global dynamic noprefixroute ens2f1
       valid_lft 2207sec preferred_lft 2207sec
    inet6 fe80::3efd:feff:fee7:a301/64 scope link
       valid_lft forever preferred_lft forever
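One way to rule out an MTU blackhole between hosts in a stalled state like this is a Don't-Fragment ping sized to the jumbo MTU. A sketch: the ping command is only echoed here for reference rather than run, and 10.1.59.10 is the master address taken from the output above.

```shell
# Max ICMP payload for a 9000-byte MTU: 9000 - 20 (IPv4 header) - 8 (ICMP header).
MTU=9000
PAYLOAD=$((MTU - 28))
echo "payload size: ${PAYLOAD}"
# -M do sets the DF bit, so a path with a smaller MTU fails loudly
# instead of silently fragmenting; run this between cluster nodes.
echo "ping -M do -s ${PAYLOAD} -c 3 10.1.59.10"
```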
> 1. Follow the instructions at
> https://docs.openshift.com/container-platform/4.4/installing/installing_bare_metal/installing-bare-metal-network-customizations.html
> to set the MTU to 8950 in cluster-network-03-config.yml for OpenShiftSDN
> 2. Since the manifests do not configure the MTU on the baremetal interface
> (configures only MTU for VXLAN interface and veths), use ignition files to
> create ifcfg files to set the MTU to 9000 on the baremetal interface

You do not need to create a cluster-network-03-config.yml manifest. You only need to set the MTU manually if the MTU is not consistent across the cluster and therefore can't be autodetected. If the MTU is correctly configured on the baremetal interfaces, CNO will configure the VXLAN MTU correctly automatically.
(In reply to Dan Winship from comment #11)
> > 1. Follow the instructions at
> > https://docs.openshift.com/container-platform/4.4/installing/installing_bare_metal/installing-bare-metal-network-customizations.html
> > to set the MTU to 8950 in cluster-network-03-config.yml for OpenShiftSDN
> > 2. Since the manifests do not configure the MTU on the baremetal interface
> > (configures only MTU for VXLAN interface and veths), use ignition files to
> > create ifcfg files to set the MTU to 9000 on the baremetal interface
>
> You do not need to create a cluster-network-03-config.yml manifest. You only
> need to set the MTU manually if the MTU is not consistent across the cluster
> and therefore can't be autodetected. If the MTU is correctly configured on
> the baremetal interfaces, CNO will configure the VXLAN MTU correctly
> automatically.

So it looks like we need to fix our docs too, then...
I just did a deployment with 4.5.4 with OVNKubernetes, configured the MTU on the baremetal interface using DHCP option 26 (instead of ifcfg files through ignition like I previously did), and did not touch any OpenShift manifests. The install went through successfully. I'm going to try again with OpenShiftSDN and report back here, as the original bug was against OpenShiftSDN. BTW, I think we need to fix our docs as stated in my previous comment: we no longer need to modify the manifests to set the SDN MTU, as CNO should be able to set the appropriate MTU by reading the MTU of the baremetal interface.
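The DHCP-option-26 approach mentioned above could look like the following for a dnsmasq-based DHCP server. This is an assumed example: option 26 is the standard DHCP interface-MTU option, but the file name and placement here are illustrative (written to the current directory; on a real provisioning host it would typically go under /etc/dnsmasq.d/).

```shell
# dnsmasq fragment advertising an interface MTU of 9000 via DHCP option 26.
# Illustrative path only; adjust for your DHCP server and environment.
cat > jumbo-mtu.conf <<'EOF'
# DHCP option 26 = interface MTU
dhcp-option=26,9000
EOF
```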
This is working with OpenShiftSDN as well on 4.5.4. I think we can close this bug safely.