Description of problem:
We taint a node if its default interface MTU is less than the configured MTU, but we don't remove the taint when the MTU value is fixed. After fixing the MTU value, the node still has the NoSchedule taint:

$ oc describe nodes ip-10-0-137-240.us-east-2.compute.internal | grep -i sch
Taints: network.openshift.io/mtu-too-small=value:NoSchedule
Unschedulable: false

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-07-08-142835

How reproducible:
Always

Steps to Reproduce:
1. systemctl stop NetworkManager
2. ifconfig <interface> mtu 1300   <<< It was 9001
3. oc delete pod <sdn_pod>
4. systemctl start NetworkManager   <<< This should enable DHCP and restore the default MTU value

Actual results:
Step 4 sets the MTU value back to 9001, but the node remains tainted.

Expected results:
The node should be untainted when the MTU value is fixed.

Additional info:
Will be furnished if requested.
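The condition that triggers the taint can be illustrated with a small shell check. This is only a sketch: it assumes the interface MTU is read from sysfs (/sys/class/net/<iface>/mtu), and the function name mtu_too_small, the SYSFS_NET override, and the required-MTU argument are all hypothetical. openshift-sdn's actual check, and the exact value it compares against, may differ.

```shell
# Hypothetical helper: succeeds (exit 0) when the interface MTU is below the
# required value, i.e. the situation in which the node gets tainted.
# SYSFS_NET is a hypothetical override so the check can be tried against a
# fake directory tree; it defaults to the real sysfs location.
mtu_too_small() {
    iface=$1
    required=$2
    mtu_file="${SYSFS_NET:-/sys/class/net}/$iface/mtu"
    if [ ! -r "$mtu_file" ]; then
        echo "cannot read $mtu_file" >&2
        return 2
    fi
    current=$(cat "$mtu_file")
    # Numeric comparison: true only if current MTU < required MTU
    [ "$current" -lt "$required" ]
}
```

For example, after step 2 of the reproducer the interface reports 1300, so "mtu_too_small <interface> 9001" succeeds. 9001 is used here only because it is the original MTU in this report, not because it is the value openshift-sdn checks against.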
(In reply to Anurag saxena from comment #0)
> 4. Systemctl Start NetworkManager <<< This should enable dhcp and corrects the MTU value as default
>
> Actual results: 4th step corrects set the MTU value back to 9001 but node remains Tainted
>
> Expected results: Node should get un-taint when MTU value is fixed

The taint doesn't need to be removed until OpenShift SDN is restarted.
(In reply to Dan Winship from comment #1)
> (In reply to Anurag saxena from comment #0)
> > 4. Systemctl Start NetworkManager <<< This should enable dhcp and corrects the MTU value as default
> >
> > Actual results: 4th step corrects set the MTU value back to 9001 but node remains Tainted
> >
> > Expected results: Node should get un-taint when MTU value is fixed
>
> The taint doesn't need to be removed until OpenShift SDN is restarted.

Hmm... I tried killing the openshift-sdn process after step 4. openshift-sdn got restarted, but the following remained the same:

$ oc describe nodes ip-10-0-132-170.ap-northeast-1.compute.internal | grep -i sch
Taints: network.openshift.io/mtu-too-small=value:NoSchedule <<<<<<<<<<<<<<<<
Unschedulable: false
No, I mean, with the current state of the code, the taint never gets removed, but with the fixed version, the expected behavior will be that it gets removed after restart, not that it gets removed immediately.
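The expected behavior of the fixed version (re-evaluate the MTU when the SDN restarts, and remove the taint if the MTU is now fine) amounts to a reconcile step at startup. A minimal sketch of that decision logic follows, assuming the check runs once when the process starts; the function name mtu_taint_action and its yes/no arguments are hypothetical and are not taken from the openshift/sdn code.

```shell
# Hypothetical startup reconcile decision for the mtu-too-small taint.
# $1: "yes" if the default interface MTU is too small, otherwise "no"
# $2: "yes" if the node currently carries the taint, otherwise "no"
# Prints the action to take: "add", "remove", or "none".
mtu_taint_action() {
    too_small=$1
    tainted=$2
    if [ "$too_small" = yes ] && [ "$tainted" != yes ]; then
        echo add
    elif [ "$too_small" != yes ] && [ "$tainted" = yes ]; then
        # The case the unfixed code never handled: the MTU was fixed,
        # but the taint from an earlier run is still on the node.
        echo remove
    else
        echo none
    fi
}
```

Because this sketch only runs at startup, fixing the MTU has no effect on the taint until the process restarts, which matches the behavior described above rather than immediate removal.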
(In reply to Dan Winship from comment #3)
> No, I mean, with the current state of the code, the taint never gets removed, but with the fixed version, the expected behavior will be that it gets removed after restart, not that it gets removed immediately.

Got it. Thanks for the clarification, Dan.
https://github.com/openshift/sdn/pull/11/files
Thanks for the fix! This works now when following the steps in comment 0 and then restarting openshift-sdn.

The steps from comment 0 tainted the node with NoSchedule:

[core@ip-10-0-x-x ~]$ oc describe nodes ip-10-0-130-153.ap-northeast-1.compute.internal | grep -i sch
Taints: network.openshift.io/mtu-too-small=value:NoSchedule
Unschedulable: false

After the openshift-sdn restart, the node got untainted:

[core@ip-10-0-x-x ~]$ oc describe nodes ip-10-0-130-153.ap-northeast-1.compute.internal | grep -i sch
Unschedulable: false

Verifying based on the above checks.
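Grepping "oc describe" output works, but it matches on display text. A sketch of a check on the taint key itself, assuming oc's standard "-o jsonpath" output, which prints the keys from .spec.taints[*].key separated by spaces; the helper name has_mtu_taint is hypothetical.

```shell
# Hypothetical helper: reads taint keys (whitespace-separated) on stdin and
# succeeds if the mtu-too-small taint key is among them.
has_mtu_taint() {
    # Put one key per line, then look for an exact, fixed-string match
    tr -s ' \t' '\n\n' | grep -qxF 'network.openshift.io/mtu-too-small'
}

# Example usage against a live cluster (not run here):
#   oc get node ip-10-0-130-153.ap-northeast-1.compute.internal \
#       -o jsonpath='{.spec.taints[*].key}' | has_mtu_taint && echo "still tainted"
```

Matching the exact key avoids false positives from other lines that happen to contain "sch", and an empty taint list simply makes the check fail.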
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922