Section Number and Name:
Describe the issue:
In 3.10+ the Open vSwitch CLI is not present on the node because OVS runs in a pod.
The command `ovs-vsctl del-br br0` does not work on the host without installing the openvswitch RPM.
Furthermore, restarting the atomic-openshift-node service does not restart the SDN.
Suggestions for improvement:
- Outline using `oc exec` to run commands in an ovs or sdn pod in the openshift-sdn project.
- To restart the SDN, the sdn pod needs to be restarted.
- To change the MTU, the node configmap in the openshift-node project needs to be changed. Then wait for the sync pod to sync the changes before restarting the sdn pod.
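The suggestions above could be sketched roughly as follows. This is only an illustration, not the documented procedure: the helper names (`pod_on_node`, `delete_br0`, `restart_sdn`) are hypothetical, and the `app=ovs` / `app=sdn` pod labels are an assumption about how the pods in the openshift-sdn project are labeled, so check them with `oc get pods -n openshift-sdn --show-labels` first.

```shell
# Hypothetical helpers; the namespace comes from the report, the labels are assumed.
SDN_NS="openshift-sdn"

# Print the name of the pod with the given app label (ovs or sdn) on one node.
pod_on_node() {
  local app="$1" node="$2"
  oc get pods -n "$SDN_NS" -l "app=$app" \
    --field-selector "spec.nodeName=$node" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Delete br0 from inside the ovs pod on a node, so no host-level RPM is needed.
delete_br0() {
  local node="$1" pod
  pod="$(pod_on_node ovs "$node")" || return 1
  [ -n "$pod" ] || return 1
  oc exec -n "$SDN_NS" "$pod" -- ovs-vsctl del-br br0
}

# Restart the SDN on a node by deleting its sdn pod; the pod is then recreated.
restart_sdn() {
  local node="$1" pod
  pod="$(pod_on_node sdn "$node")" || return 1
  [ -n "$pod" ] || return 1
  oc delete pod -n "$SDN_NS" "$pod"
}
```

Usage would look like `delete_br0 node1.example.com` followed by `restart_sdn node1.example.com`, where the node name is a placeholder.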
I just went through this process on OCP 3.11. I'm not sure if this came out of one of the support cases that I had open but here is what I found:
- The openvswitch RPM cannot be installed on the masters in 3.10 or 3.11. It was there in 3.9 but it isn't in the 3.10 or 3.11 yum repos. There is still an openvswitch RPM in Fedora but it has no RHEL build and the commands are different.
- You can run `ovs-vsctl del-br br0` inside the SDN pods. You don't need to install anything to use it inside the pods, which is where it needs to be run.
- The docs need to explain which SDN pods you need to run the command against. There appears to be an SDN pod for each host (master/infra/compute), but not all of the SDN pods let you `oc rsh` into them, so you end up trying to get into each one and blindly running the command.
- Once you update the config map with the new MTU and go into the SDN pods to run the `ovs-vsctl del-br br0` command, it doesn't appear that you need to restart anything. If you wait a minute or so you can go back into the SDN pods and the device will be there with the new MTU.
- It is hard to understand which one of the config maps to change for the desired MTU outcome. I was able to edit the compute one and get the MTU size to change for compute nodes. If I edit the all-in-one and run the same steps after that the MTU doesn't change. I don't know why that is, perhaps in that case things have to be restarted? If so then the docs have to clear up that issue as well.
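To compare what a config map says with what a node actually ended up with, something like the sketch below might help. It is an assumption-laden illustration: the helper names are hypothetical, `node-config-compute` is assumed to be the name of the compute config map mentioned above, and reading the MTU through the `mtu` column of the OVS Interface table for `br0` is my guess at a convenient check, not something the docs prescribe.

```shell
# Hypothetical helpers for checking the configured and the applied MTU.
NODE_NS="openshift-node"

# Print the mtu line from one node config map, e.g. node-config-compute.
configmap_mtu() {
  oc get configmap "$1" -n "$NODE_NS" -o yaml | grep 'mtu:'
}

# Print the MTU that OVS reports for br0, queried from inside the given pod.
bridge_mtu() {
  oc exec -n openshift-sdn "$1" -- ovs-vsctl get Interface br0 mtu
}
```

Running `configmap_mtu` against each config map listed by `oc get configmap -n openshift-node`, then `bridge_mtu <pod-name>` on a pod for the node in question, would show whether the edited config map is the one that node uses.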
The docs also leave you to essentially guess what to set the new MTU size to for IPSec, and they don't explain why things are the way they are unless you hunt through multiple docs. I ended up finding that eth0 is 9001 because that is the max MTU in AWS for jumbo frames. The SDN is set to 8951 because eth0 will add another header to packets that traverse that device. The IPSec docs say the default header for IPSec is 62 bytes, so you have to do 8951 - 62 (maybe a few more just in case). But if you look more closely at the docs you'll find that you can't set the MTU of the SDN "too low" or the kernel might reject packets where the MTU difference is too large. The docs don't explain what "too low" is. You just guess and hope the kernel doesn't squash your packets.
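The arithmetic described above, written out as a small sketch. The 9001, 8951 and 62 figures are the ones quoted in this comment; the variable names and the `MARGIN` knob are hypothetical, and this says nothing about where the "too low" limit is.

```shell
ETH0_MTU=9001       # AWS jumbo-frame maximum on eth0, as noted above
SDN_OVERHEAD=50     # 9001 - 8951, the gap between eth0 and the default SDN MTU
IPSEC_OVERHEAD=62   # default IPSec header size quoted from the IPSec docs
MARGIN=0            # hypothetical extra bytes "just in case"

# SDN MTU without IPSec, then with the IPSec header (and margin) subtracted.
sdn_mtu=$((ETH0_MTU - SDN_OVERHEAD))
ipsec_sdn_mtu=$((sdn_mtu - IPSEC_OVERHEAD - MARGIN))

echo "SDN MTU without IPSec: $sdn_mtu"
echo "SDN MTU with IPSec:    $ipsec_sdn_mtu"
```

With `MARGIN=0` this gives 8951 and 8889; raising `MARGIN` lowers the second value by the same amount.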
Made a small change. Checking with QA to make sure it is the right change or if there needs to be more.
I see the change in production. https://docs.openshift.com/container-platform/3.11/admin_guide/ipsec.html#admin-guide-ipsec-encrypting-hosts
Thanks for fixing this!
*** Bug 1652277 has been marked as a duplicate of this bug. ***