Description of problem:

It's possible to delete the clusternetwork object, and it does not recreate itself automatically. Since this can cause new pods to fail to start, the cluster becomes unresponsive once important pods are deleted.

I discovered this while poking around, trying to change the network plugin. Editing the default networkconfigs.networkoperator.openshift.io alone didn't seem to do it. As per the pastebins in [1], I ran:

  oc delete configmap applied-defaults -n openshift-network-operator

This also did not result in the network plugin changing. (Not sure why deleting a configmap would do that, but hey, I'm willing to try anything once.) So I then deleted the clusternetwork to see whether OCP would rebuild it or not. It doesn't. This comes with the side effect that pod networking breaks!

[1] https://mojo.redhat.com/docs/DOC-1185646

Version-Release number of selected component (if applicable):
4.0 HTB

How reproducible:
Easily

Steps to Reproduce:

$ oc edit networkconfigs.networkoperator.openshift.io
networkconfig "default" edited

$ oc get clusternetwork
NAME      NETWORK         HOST SUBNET LENGTH   SERVICE NETWORK   PLUGIN NAME
default   10.128.0.0/14   9                    172.30.0.0/16     redhat/openshift-ovs-networkpolicy

$ oc project openshift-cluster-network-operator
Now using project "openshift-cluster-network-operator" on server "https://stwalter-g5corp-api.rhcee.support:6443".

$ oc get cm
NAME                       DATA      AGE
applied-default            1         1h
cluster-network-operator   0         1h

$ oc delete cm applied-default
configmap "applied-default" deleted

$ oc get clusternetwork
NAME      NETWORK         HOST SUBNET LENGTH   SERVICE NETWORK   PLUGIN NAME
default   10.128.0.0/14   9                    172.30.0.0/16     redhat/openshift-ovs-networkpolicy

$ oc delete clusternetwork default
clusternetwork "default" deleted

$ oc get clusternetwork
No resources found.

$ oc get networkconfigs.networkoperator.openshift.io
NAME      KIND
default   NetworkConfig.v1.networkoperator.openshift.io

$ oc get netnamespace
NAME          NETID
default       0
kube-public   15255683
kube-system   3130926
. . .

$ oc get pod -n openshift-sdn
NAME        READY     STATUS    RESTARTS   AGE
ovs-69vj8   1/1       Running   0          1h
. . .

$ oc delete pod --all -n openshift-sdn
pod "ovs-69vj8" deleted
pod "ovs-bzzqr" deleted
pod "ovs-c8q69" deleted
. . .

$ oc get pod -n openshift-sdn
NAME                   READY     STATUS              RESTARTS   AGE
ovs-69vj8              1/1       Terminating         0          1h
ovs-bzzqr              1/1       Terminating         0          1h
ovs-c8q69              0/1       Terminating         0          56m
ovs-gl8jh              0/1       Pending             0          0s
ovs-nj7px              0/1       ContainerCreating   0          0s
ovs-p9qdf              1/1       Running             0          3s
sdn-bk5qx              0/1       CrashLoopBackOff    1          4s
sdn-controller-cq4hm   1/1       Terminating         0          1h
sdn-controller-ngvkf   1/1       Terminating         1          1h
sdn-controller-q6smm   1/1       Terminating         0          1h
sdn-dl562              0/1       Terminating         0          56m
sdn-ft2fh              0/1       CrashLoopBackOff    1          4s
sdn-jr57r              0/1       CrashLoopBackOff    1          8s
sdn-l7gn7              0/1       CrashLoopBackOff    1          5s
sdn-tqcgt              0/1       Error               1          6s

$ oc get pod -n openshift-sdn
oc pro^C
$ oc project openshift-sdn
^C
$ oc get node
Error from server (ServerTimeout): the server cannot complete the requested operation at this time, try again later (get nodes)
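Additional info: one possible workaround for a cluster already in this state is to recreate the deleted ClusterNetwork by hand from the values shown in the transcript above. The manifest below is only a sketch: the field names follow the network.openshift.io/v1 ClusterNetwork API as documented for 3.x, and the filename is made up. A safer template would be the output of "oc get clusternetwork default -o yaml" captured from a healthy cluster before anything is deleted.

  # clusternetwork-default.yaml (hypothetical file): recreates the object deleted above
  apiVersion: network.openshift.io/v1
  kind: ClusterNetwork
  metadata:
    name: default
  # values copied from the "oc get clusternetwork" output in the transcript
  network: 10.128.0.0/14
  hostsubnetlength: 9
  clusterNetworks:
  - CIDR: 10.128.0.0/14
    hostSubnetLength: 9
  serviceNetwork: 172.30.0.0/16
  pluginName: redhat/openshift-ovs-networkpolicy

  $ oc apply -f clusternetwork-default.yaml
  $ oc delete pod --all -n openshift-sdn    # let the sdn/ovs pods restart against the recreated object

Whether this fully restores pod networking without further intervention is untested here; it is meant only to illustrate what documented recreate steps might look like.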
Yup, we don't support changing the network mode on a running cluster, as you've found out. We should definitely look into reconciling cluster networks, but I suspect this has been this way since 3.0.
Hi, is that true? I've found this: https://docs.openshift.com/container-platform/3.9/install_config/configuring_sdn.html#migrating-between-sdn-plugins
Migrating between SDN providers or openshift-sdn modes won't work in 4.0. We intend to add it to the operator, but it's just a matter of time / prioritization.
Then we need to block people from being able to delete the network until such an operation is possible.
Only the admin user can delete this object... they can delete lots of objects that will break the cluster. I'm not sure that this is a bug.
Sure, but do we *want* people to be able to delete this -- even administrators? Is there a legitimate use case for doing so? (If there is, that's fine; we just need the steps to recreate it documented.)
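For the record, one possible mechanism for blocking deletion (as suggested above) would be a validating admission webhook registered against DELETE on clusternetworks. Nothing like this exists in the operator today; the registration below is only illustrative, and the clusternetwork-guard service it points at is hypothetical and would still need a server that actually rejects the request.

  # hypothetical sketch of a deletion guard; not part of the product
  apiVersion: admissionregistration.k8s.io/v1beta1
  kind: ValidatingWebhookConfiguration
  metadata:
    name: clusternetwork-delete-guard       # made-up name
  webhooks:
  - name: guard.clusternetwork.openshift.example
    failurePolicy: Fail
    rules:
    - apiGroups: ["network.openshift.io"]
      apiVersions: ["v1"]
      operations: ["DELETE"]
      resources: ["clusternetworks"]
    clientConfig:
      service:
        namespace: openshift-network-operator   # hypothetical; wherever the guard service would run
        name: clusternetwork-guard               # hypothetical service backing the webhook
        path: /validate
      caBundle: ""                               # CA bundle for the serving cert would go here

If deletion is to remain allowed, the alternative is simply documenting recreate steps along the lines of the manifest sketched in the description above.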