Bug 1846485
| Summary: | Deployment with Jumbo Frames (MTU 9000) on Baremetal fails | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Sai Sindhur Malleni <smalleni> |
| Component: | Installer | Assignee: | Antoni Segura Puimedon <asegurap> |
| Installer sub component: | OpenShift on Bare Metal IPI | QA Contact: | Nataf Sharabi <nsharabi> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | bfournie, danw, dblack, dcbw, racedoro, tsedovic, william.caban |
| Version: | 4.4 | Keywords: | Triaged |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-08-04 13:37:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Sai Sindhur Malleni
2020-06-11 16:58:06 UTC
It would be interesting to know what is different at a NIC/SDN level between this setup and an AWS setup, which also uses jumbo frames and which we know works correctly because we test it on every commit in CI.

I initially thought this could be due to the bootstrap VM MTU and the baremetal bridge MTU not being set to 9000. On a subsequent deployment attempt, I set the bootstrap VM MTU as well as the baremetal bridge/interface MTU on the provisioning host to jumbo, yet the deploy hangs:

NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-apiserver-operator openshift-apiserver-operator-f79557665-pktwj 0/1 Pending 0 108m
openshift-authentication-operator authentication-operator-64d4ddc475-kgz6s 0/1 Pending 0 108m
openshift-cluster-machine-approver machine-approver-6d54996f4-6g8p6 0/2 Pending 0 109m
openshift-cluster-node-tuning-operator cluster-node-tuning-operator-6dcd5cbfcc-p4pt8 0/1 Pending 0 109m
openshift-cluster-storage-operator csi-snapshot-controller-operator-bf96f6cc7-852cv 0/1 Pending 0 108m
openshift-cluster-version cluster-version-operator-7c44bdbb69-ncgrp 0/1 Pending 0 109m
openshift-controller-manager-operator openshift-controller-manager-operator-7976ddf498-hxj9h 0/1 Pending 0 108m
openshift-dns-operator dns-operator-778fd8fbb5-g2ssg 0/2 Pending 0 109m
openshift-etcd-operator etcd-operator-5998db5474-c25rv 0/1 Pending 0 108m
openshift-kube-apiserver-operator kube-apiserver-operator-d4dfb74f8-lkhzd 0/1 Pending 0 108m
openshift-kube-controller-manager-operator kube-controller-manager-operator-787c59c5bf-qvttb 0/1 Pending 0 108m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-59d76b9498-w4ncg 0/1 Pending 0 108m
openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-55f49cff56-l68ng 0/1 Pending 0 109m
openshift-machine-config-operator machine-config-operator-54d7fd979-f8vcn 0/1 Pending 0 109m
openshift-network-operator network-operator-66ddfd8657-4hbnc 0/1 Pending 0 108m
openshift-operator-lifecycle-manager catalog-operator-565b8d557b-hpdj4 0/1 Pending 0 109m
openshift-operator-lifecycle-manager olm-operator-5fc685ddfd-l5bfh 0/1 Pending 0 109m
openshift-service-ca-operator service-ca-operator-5b9647cbd6-bl4pz 0/1 Pending 0 108m
openshift-service-catalog-apiserver-operator openshift-service-catalog-apiserver-operator-66c64c4f64-4qsqj 0/1 Pending 0 108m
openshift-service-catalog-controller-manager-operator openshift-service-catalog-controller-manager-operator-75c7kjxbl 0/1 Pending 0 108m

*** Bug 1846499 has been marked as a duplicate of this bug. ***

Sounds like this may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1846499 ?

I think we need more information to proceed - can you provide the exact ignition customizations used, and also details of the provisioning host configuration? Can you provide more details on the exact steps to reproduce, please? Is it possible the provisioning host networking where the installer is run did not get correctly configured, as that is not automated by the installer at the moment?

Sai, out of curiosity, could you provide us the ip link output from the baremetal host and the bootstrap VM? I seem to remember there was a kernel behavior change within the last few years that started to truncate packets across bridges, so getting the ip link information would be super helpful for us to try and understand exactly what is occurring.

Steve, Julia, I will get you the information you need shortly and could possibly even give you access to the environment when/where this happens, if that is something you are interested in. However, I have a question here. This bug is about deployments that fail when a jumbo MTU is being used. However, https://bugzilla.redhat.com/show_bug.cgi?id=1846499, which has been marked as a duplicate of this bug, addresses a very specific case: the MTU in the manifests not being translated into the MTU on the baremetal interfaces, and a custom ignition file being needed to configure the MTU on baremetal.

While these bugs are related, I don't think https://bugzilla.redhat.com/show_bug.cgi?id=1846499 is a duplicate, because that one is not about deployments failing. It's about the MTU on the baremetal interface not automatically being driven off the SDN MTU in the manifests.

So, here's the manifest I used to set the custom MTU (cluster-network-03-config.yml):
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  defaultNetwork:
    type: OpenShiftSDN
    openshiftSDNConfig:
      mode: NetworkPolicy
      mtu: 8950
      vxlanPort: 4789
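The mtu: 8950 value can be sanity-checked against the usual VXLAN encapsulation overhead. A minimal sketch, assuming standard VXLAN-over-IPv4 framing with no extra VLAN tags:

```shell
# Per-packet VXLAN encapsulation overhead (assumption: IPv4 outer header,
# no VLAN tag): outer Ethernet + outer IP + UDP + VXLAN header.
ETH=14; IP=20; UDP=8; VXLAN=8
OVERHEAD=$((ETH + IP + UDP + VXLAN))   # 50 bytes
SDN_MTU=$((9000 - OVERHEAD))           # 8950, the value in the manifest
echo "overhead=${OVERHEAD} sdn_mtu=${SDN_MTU}"
```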
I also set the MTU on the baremetal interface to 9000 (the SDN MTU of 8950 plus 50 bytes to account for VXLAN overhead) using ignition files:
master.ign
{"ignition": {"config": {"append": [{"source": "https://10.1.59.3:22623/config/master", "verification": {}}]}, "security": {"tls": {"certificateAuthorities": [{"source": "data:text/plain;charset=utf-8;base64,LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURFRENDQWZpZ0F3SUJBZ0lJUldDbDRhK1dLNXd3RFFZSktvWklodmNOQVFFTEJRQXdKakVTTUJBR0ExVUUKQ3hNSmIzQmxibk5vYVdaME1SQXdEZ1lEVlFRREV3ZHliMjkwTFdOaE1CNFhEVEl3TURZeE9EQXdOVFUwTWxvWApEVE13TURZeE5qQXdOVFUwTWxvd0pqRVNNQkFHQTFVRUN4TUpiM0JsYm5Ob2FXWjBNUkF3RGdZRFZRUURFd2R5CmIyOTBMV05oTUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUFvUVBId29LVGtXVy8KVnh2czdnKzB4THhjWTkwdlJvbmVVYlJDRm9PRk54WXVtODFoNUd6U0dQdkJRTEFPV3IzOG44MU9ZM2dUd2trYQpvNmRjd3VEZFFJMDZJUTJERlpocWwxRGZSbUxsd3Nyamd2S20zbVFrOGxUd2N1dDliRER3emFxaXI4R0FWbll5CnJ6MWJESWR6VWljc01WWCtkQmMzekpyZWhaczVCOVNvdkRpdTQySjloeXI4RlR5TldvSCsvV1RjNU1tVEM5cjYKODdUZnUvbFl5WWpwVTAwQmxOUVFVVEF5amNHOEV2YytOZnRLRVhHTk1PZmp5SWw3Y2NmZ1VnOE5vbnJ6bDJJMwo1TktPZkt0SXNPbnpqYkRqZWtkanFrQXlaNE9IUlVuclBmZTJCZVEyN1k5cVZYOFdXT2l6MG1BNWlhc25pL3p1CnJ0U0VCajVxcFFJREFRQUJvMEl3UURBT0JnTlZIUThCQWY4RUJBTUNBcVF3RHdZRFZSMFRBUUgvQkFVd0F3RUIKL3pBZEJnTlZIUTRFRmdRVThlWFBTY253QkxQcGxpcjF4QzRFeG1pNVdFTXdEUVlKS29aSWh2Y05BUUVMQlFBRApnZ0VCQUNWRHcxZWJZR3JiNmIyWVd6amNQcGt4OWcySCtqNGt4cndsU2Y5Ylg1TGpHOXFiMXU0WU9pTVpJMFp4CjRJdkZsRmErSDFONWxWUlBxemxIL3JHY3JtdWFzTDR5ZU55b0hzRHBsanVOVkZuRW12YVNUcEYrdXMzd09GQzQKUkdMMC9XZjZ2UVdjbEV3dDVKTHNRTVpMd3R6bExtd2gyOE9nZUsyU0xaanJnYlFvSURxUlNyM1NNRG8zVk9tVwoxd1Uxd3dPNEN2K2ZNMnhQMzNKQXFzdndPTHlHcElDNU91UHZpc3dRclNVemlaZTh2UVRGWEVxTUlqUnhJWHZICk5QMXVDVm5SdmdjSU4yOXQ0UzQ4R0VwTjVwQzFQb1hPVE5KbzV4RW9oa05VU3ZNa1dKV2dNVS9nOVZBeFRvajQKd2NuU2owYmJWUmVLMWF2Vnd0Q3FydWZFalRzPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==", "verification": {}}]}}, "timeouts": {}, "version": "2.2.0"}, "networkd": {}, "passwd": {}, "storage": {"files": [{"path": "/etc/sysconfig/network-scripts/ifcfg-ens2f1", "filesystem": "root", "mode": 436, "contents": {"source": 
"data:,DEVICE%3Dens2f1%0ABOOTPROTO%3Ddhcp%0AONBOOT%3Dyes%0AMTU%3D9000%0A"}}, {"path": "/etc/sysconfig/network-scripts/ifcfg-eno1", "filesystem": "root", "mode": 436, "contents": {"source": "data:,DEVICE%3Deno1%0ABOOTPROTO%3Dnone%0AONBOOT%3Dno%0A"}}]}, "systemd": {}}
worker.ign
{"ignition": {"config": {"append": [{"source": "https://10.1.59.3:22623/config/worker", "verification": {}}]}, "security": {"tls": {"certificateAuthorities": [{"source": "data:text/plain;charset=utf-8;base64,LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURFRENDQWZpZ0F3SUJBZ0lJUldDbDRhK1dLNXd3RFFZSktvWklodmNOQVFFTEJRQXdKakVTTUJBR0ExVUUKQ3hNSmIzQmxibk5vYVdaME1SQXdEZ1lEVlFRREV3ZHliMjkwTFdOaE1CNFhEVEl3TURZeE9EQXdOVFUwTWxvWApEVE13TURZeE5qQXdOVFUwTWxvd0pqRVNNQkFHQTFVRUN4TUpiM0JsYm5Ob2FXWjBNUkF3RGdZRFZRUURFd2R5CmIyOTBMV05oTUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUFvUVBId29LVGtXVy8KVnh2czdnKzB4THhjWTkwdlJvbmVVYlJDRm9PRk54WXVtODFoNUd6U0dQdkJRTEFPV3IzOG44MU9ZM2dUd2trYQpvNmRjd3VEZFFJMDZJUTJERlpocWwxRGZSbUxsd3Nyamd2S20zbVFrOGxUd2N1dDliRER3emFxaXI4R0FWbll5CnJ6MWJESWR6VWljc01WWCtkQmMzekpyZWhaczVCOVNvdkRpdTQySjloeXI4RlR5TldvSCsvV1RjNU1tVEM5cjYKODdUZnUvbFl5WWpwVTAwQmxOUVFVVEF5amNHOEV2YytOZnRLRVhHTk1PZmp5SWw3Y2NmZ1VnOE5vbnJ6bDJJMwo1TktPZkt0SXNPbnpqYkRqZWtkanFrQXlaNE9IUlVuclBmZTJCZVEyN1k5cVZYOFdXT2l6MG1BNWlhc25pL3p1CnJ0U0VCajVxcFFJREFRQUJvMEl3UURBT0JnTlZIUThCQWY4RUJBTUNBcVF3RHdZRFZSMFRBUUgvQkFVd0F3RUIKL3pBZEJnTlZIUTRFRmdRVThlWFBTY253QkxQcGxpcjF4QzRFeG1pNVdFTXdEUVlKS29aSWh2Y05BUUVMQlFBRApnZ0VCQUNWRHcxZWJZR3JiNmIyWVd6amNQcGt4OWcySCtqNGt4cndsU2Y5Ylg1TGpHOXFiMXU0WU9pTVpJMFp4CjRJdkZsRmErSDFONWxWUlBxemxIL3JHY3JtdWFzTDR5ZU55b0hzRHBsanVOVkZuRW12YVNUcEYrdXMzd09GQzQKUkdMMC9XZjZ2UVdjbEV3dDVKTHNRTVpMd3R6bExtd2gyOE9nZUsyU0xaanJnYlFvSURxUlNyM1NNRG8zVk9tVwoxd1Uxd3dPNEN2K2ZNMnhQMzNKQXFzdndPTHlHcElDNU91UHZpc3dRclNVemlaZTh2UVRGWEVxTUlqUnhJWHZICk5QMXVDVm5SdmdjSU4yOXQ0UzQ4R0VwTjVwQzFQb1hPVE5KbzV4RW9oa05VU3ZNa1dKV2dNVS9nOVZBeFRvajQKd2NuU2owYmJWUmVLMWF2Vnd0Q3FydWZFalRzPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==", "verification": {}}]}}, "timeouts": {}, "version": "2.2.0"}, "networkd": {}, "passwd": {}, "storage": {"files": [{"path": "/etc/sysconfig/network-scripts/ifcfg-ens2f1", "filesystem": "root", "mode": 436, "contents": {"source": 
"data:,DEVICE%3Dens2f1%0ABOOTPROTO%3Ddhcp%0AONBOOT%3Dyes%0AMTU%3D9000%0A"}}, {"path": "/etc/sysconfig/network-scripts/ifcfg-eno1", "filesystem": "root", "mode": 436, "contents": {"source": "data:,DEVICE%3Deno1%0ABOOTPROTO%3Dnone%0AONBOOT%3Dno%0A"}}]}, "systemd": {}}
Before kicking off the deploy, I changed the MTU of the baremetal interface on my provisioning host. Here's the output of ip a on the provisioning host:
=======================================================================================================================================================
[kni@e19-h24-b04-fc640 clusterconfigs]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:4e:01:3f:31:78 brd ff:ff:ff:ff:ff:ff
inet 10.1.39.6/22 brd 10.1.39.255 scope global dynamic noprefixroute eno1
valid_lft 231182sec preferred_lft 231182sec
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:4e:01:3f:31:79 brd ff:ff:ff:ff:ff:ff
4: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master provisioning state UP group default qlen 1000
link/ether 3c:fd:fe:e7:94:50 brd ff:ff:ff:ff:ff:ff
5: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master baremetal state UP group default qlen 1000
link/ether 3c:fd:fe:e7:94:51 brd ff:ff:ff:ff:ff:ff
9: baremetal: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
link/ether 3c:fd:fe:e7:94:51 brd ff:ff:ff:ff:ff:ff
inet 10.1.59.1/24 brd 10.1.59.255 scope global noprefixroute baremetal
valid_lft forever preferred_lft forever
inet6 2620:52:0:13b:bbd2:3811:34b3:452a/64 scope global dynamic noprefixroute
valid_lft 2591791sec preferred_lft 604591sec
inet6 fe80::c8ce:23b7:4e83:2880/64 scope link noprefixroute
valid_lft forever preferred_lft forever
11: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 52:54:00:49:21:3f brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
12: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
link/ether 52:54:00:49:21:3f brd ff:ff:ff:ff:ff:ff
14: provisioning: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 3c:fd:fe:e7:94:50 brd ff:ff:ff:ff:ff:ff
inet 172.22.0.1/24 brd 172.22.0.255 scope global noprefixroute provisioning
valid_lft forever preferred_lft forever
15: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel master baremetal state UNKNOWN group default qlen 1000
link/ether fe:54:00:da:5e:68 brd ff:ff:ff:ff:ff:ff
16: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master provisioning state UNKNOWN group default qlen 1000
link/ether fe:54:00:82:32:24 brd ff:ff:ff:ff:ff:ff
=========================================================================================================================================================
I do not set the MTU to jumbo in my bootstrap ignition, but as soon as the bootstrap VM was spawned I logged into the VM and changed the MTU (before the pods got spawned).
==========================================================================================================================================================
Here's the output of ip a inside the bootstrap VM:
[root@localhost core]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:da:5e:68 brd ff:ff:ff:ff:ff:ff
inet 10.1.59.79/24 brd 10.1.59.255 scope global dynamic noprefixroute ens3
valid_lft 3469sec preferred_lft 3469sec
inet 10.1.59.2/24 scope global secondary ens3
valid_lft forever preferred_lft forever
inet 10.1.59.3/24 scope global secondary ens3
valid_lft forever preferred_lft forever
inet6 2620:52:0:13b:13b3:8dad:7631:7615/64 scope global dynamic noprefixroute
valid_lft 2591871sec preferred_lft 604671sec
inet6 fe80::a335:27d0:a4db:8a24/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:82:32:24 brd ff:ff:ff:ff:ff:ff
inet 172.22.0.2/24 brd 172.22.0.255 scope global noprefixroute ens4
valid_lft forever preferred_lft forever
inet6 fe80::9de8:19ba:dd86:48f8/64 scope link noprefixroute
valid_lft forever preferred_lft forever
=========================================================================================================================================================
Since the pods use host networking, they also have the jumbo MTU
============================================================================================================================================================
[root@localhost core]# podman exec -it mariadb bash
[root@localhost /]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:da:5e:68 brd ff:ff:ff:ff:ff:ff
inet 10.1.59.79/24 brd 10.1.59.255 scope global dynamic noprefixroute ens3
valid_lft 3462sec preferred_lft 3462sec
inet 10.1.59.2/24 scope global secondary ens3
valid_lft forever preferred_lft forever
inet 10.1.59.3/24 scope global secondary ens3
valid_lft forever preferred_lft forever
inet6 2620:52:0:13b:13b3:8dad:7631:7615/64 scope global dynamic noprefixroute
valid_lft 2591864sec preferred_lft 604664sec
inet6 fe80::a335:27d0:a4db:8a24/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:82:32:24 brd ff:ff:ff:ff:ff:ff
inet 172.22.0.2/24 brd 172.22.0.255 scope global noprefixroute ens4
valid_lft forever preferred_lft forever
inet6 fe80::9de8:19ba:dd86:48f8/64 scope link noprefixroute
valid_lft forever preferred_lft forever
===================================================================================================================================================
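Since every hop reports MTU 9000 here, one way to confirm that jumbo frames actually traverse the baremetal network end to end is a don't-fragment ping sized to fill a 9000-byte IP packet. A sketch (the target address below is taken from the bootstrap VM output above and is an assumption about the environment):

```shell
# ICMP payload that fills a 9000-byte IPv4 packet:
# 9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes
PAYLOAD=$((9000 - 20 - 8))
echo "payload=${PAYLOAD}"
# Then, from the provisioning host (10.1.59.2 is the bootstrap VM address
# seen above; -M do sets the DF bit, so a clamped path fails loudly
# instead of silently fragmenting):
#   ping -c 3 -M do -s "$PAYLOAD" 10.1.59.2
```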
The install keeps getting stuck and does not make progress:
No resources found in openshift-kni-infra namespace.
[kni@e19-h24-b04-fc640 clusterconfigs]$ oc get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-apiserver-operator openshift-apiserver-operator-8596449546-z54mc 0/1 Pending 0 41m
openshift-authentication-operator authentication-operator-66f85cff9-tj6bh 0/1 Pending 0 41m
openshift-cloud-credential-operator cloud-credential-operator-695f4895db-dvdjh 0/1 Pending 0 42m
openshift-cluster-machine-approver machine-approver-685c8468fb-7928j 0/2 Pending 0 41m
openshift-cluster-node-tuning-operator cluster-node-tuning-operator-6688b7b566-bp7nt 0/1 Pending 0 42m
openshift-cluster-storage-operator csi-snapshot-controller-operator-84dd5b859b-89dxz 0/1 Pending 0 41m
openshift-cluster-version cluster-version-operator-79bbd9b569-whbsl 0/1 Pending 0 42m
openshift-controller-manager-operator openshift-controller-manager-operator-7ff98b7969-7p9gf 0/1 Pending 0 41m
openshift-dns-operator dns-operator-7c947d89c6-mqrb6 0/2 Pending 0 42m
openshift-etcd-operator etcd-operator-5d97b6445f-cj7f4 0/1 Pending 0 41m
openshift-kube-apiserver-operator kube-apiserver-operator-8d9b94dbb-mht5m 0/1 Pending 0 41m
openshift-kube-controller-manager-operator kube-controller-manager-operator-6fdcc5987c-dsxn5 0/1 Pending 0 41m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-68c9564886-95vjp 0/1 Pending 0 41m
openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-5fd77bc4c8-4pjrx 0/1 Pending 0 41m
openshift-machine-api machine-api-operator-5c4dd5d794-484p7 0/2 Pending 0 41m
openshift-machine-config-operator machine-config-operator-78db57d645-m5qrm 0/1 Pending 0 42m
openshift-must-gather-6q6jv must-gather-dn7m7 0/1 Pending 0 12m
openshift-network-operator network-operator-7856c8dd68-4l56z 0/1 Pending 0 41m
openshift-operator-lifecycle-manager catalog-operator-59dc594f8f-9fxws 0/1 Pending 0 42m
openshift-operator-lifecycle-manager olm-operator-778c69c9f6-h9p22 0/1 Pending 0 42m
openshift-service-ca-operator service-ca-operator-5b4f8f7649-4x8xl 0/1 Pending 0 41m
openshift-service-catalog-apiserver-operator openshift-service-catalog-apiserver-operator-5f5f55469f-pclbl 0/1 Pending 0 41m
openshift-service-catalog-controller-manager-operator openshift-service-catalog-controller-manager-operator-79784tcm2 0/1 Pending 0 41m
[kni@e19-h24-b04-fc640 clusterconfigs]$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
cloud-credential True False False 41m
[kni@e19-h24-b04-fc640 clusterconfigs]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version False True 42m Working towards 4.4.4: 73% complete
Some pods seem to be running on the masters:
[core@master-0 ~]$ sudo su
[systemd]
Failed Units: 1
NetworkManager-wait-online.service
[root@master-0 core]# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
773b67ad92827 ee7065c322c2add50de27f32cc37656366c004cd5868b5993f50a37d9dea2a76 20 minutes ago Running coredns-monitor 0 4d23a14542b3a
86bf319780841 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6594664ba965e195e06a70814e88895b2e92dc4746bdb1ec17b068f082405baf 20 minutes ago Running mdns-publisher 0 f650eecb018e8
e68f6e275c2ba quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:29ddffd83d2035f76a649223a0fa850ad63a3ca441f6d217a721574465f47338 20 minutes ago Running coredns 0 4d23a14542b3a
ae6fd5b009327 ee7065c322c2add50de27f32cc37656366c004cd5868b5993f50a37d9dea2a76 20 minutes ago Running keepalived-monitor 0 e4d9465bda1df
33f9413033c84 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:78241892eb5ceb7e7c8f6d2b2f890b8a0514a94152ed81b2781024385d984b42 20 minutes ago Running keepalived 0 e4d9465bda1df
1703529fae9c3 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e0f9b3b61b5bfdc543373de25764958c4c1bbc639501924268c6cf4cd455f53e 20 minutes ago Running haproxy-monitor 0 9864c2194e4d7
3d0107ea55fcb quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d34b20bb3302bbd408a46002c47678f5bf613cf6c15966126967e7abd26c49d3 20 minutes ago Running haproxy 0 9864c2194e4d7
ip a output from the master during deploy (after the deploy stopped making progress):
[root@master-0 core]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:4e:01:40:b9:51 brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:4e:01:40:b9:52 brd ff:ff:ff:ff:ff:ff
4: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 3c:fd:fe:e7:a3:00 brd ff:ff:ff:ff:ff:ff
inet 172.22.0.207/24 brd 172.22.0.255 scope global dynamic noprefixroute ens2f0
valid_lft 2207sec preferred_lft 2207sec
inet6 fe80::2420:3bd9:dc23:afdd/64 scope link noprefixroute
valid_lft forever preferred_lft forever
5: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 3c:fd:fe:e7:a3:01 brd ff:ff:ff:ff:ff:ff
inet 10.1.59.10/24 brd 10.1.59.255 scope global dynamic noprefixroute ens2f1
valid_lft 2207sec preferred_lft 2207sec
inet6 fe80::3efd:feff:fee7:a301/64 scope link
valid_lft forever preferred_lft forever
> 1. Follow the instructions at
> https://docs.openshift.com/container-platform/4.4/installing/
> installing_bare_metal/installing-bare-metal-network-customizations.html to
> set the MTU to 8950 in cluster-network-03-config.yml for OpenShiftSDN
> 2. Since the manifests do not configure the MTU on the baremetal interface
> (configures only MTU for VXLAN interface and veths), use ignition files to
> create ifcfg files to set the MTU to 9000 on the baremetal interface
You do not need to create a cluster-network-03-config.yml manifest. You only need to set the MTU manually if the MTU is not consistent across the cluster and therefore can't be autodetected. If the MTU is correctly configured on the baremetal interfaces, CNO will configure the VXLAN MTU correctly automatically.
(In reply to Dan Winship from comment #11)

So it looks like we need to fix our docs too, then.

I just did a deployment with 4.5.4 with OVNKubernetes and configured the MTU on the baremetal interface using DHCP option 26 (instead of ifcfg files through ignition like I previously did), and did not muck with any OpenShift manifests. The install went through successfully. I'm going to try again with OpenShiftSDN and report back here, as the original bug was against OpenShiftSDN. BTW, I think we need to fix our docs as stated in my previous comment, as we no longer need to muck with the manifests to set the SDN MTU: CNO should be able to set the appropriate MTU by reading the MTU of the baremetal interface.

This is working with OpenShiftSDN as well on 4.5.4; I think we can close this bug safely.
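For reference, the DHCP option 26 approach could look like the following dnsmasq fragment. This is a sketch under assumptions not stated in the bug: that dnsmasq serves DHCP on the baremetal network, and the file path is illustrative only.

```shell
# Write a dnsmasq drop-in that advertises MTU 9000 to every DHCP client
# via option 26 (interface-mtu). /tmp is used here only for illustration;
# a real host would place this under /etc/dnsmasq.d/ and restart dnsmasq.
cat > /tmp/jumbo-mtu.conf <<'EOF'
dhcp-option=26,9000
EOF
cat /tmp/jumbo-mtu.conf
```

With the MTU delivered by DHCP, CNO detects it on the nodes and derives the VXLAN MTU itself, so no cluster-network-03-config.yml manifest is needed.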