1728518 – [SDN] Node doesn't get un-taint when MTU value is fixed

Summary: [SDN] Node doesn't get un-taint when MTU value is fixed

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.2.0
Assignee:	Ricardo Carrillo Cruz
QA Contact:	Anurag saxena
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-07-10 06:12 UTC by Anurag saxena
Modified:	2019-10-16 06:33 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-16 06:33:23 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:2922	0	None	None	None	2019-10-16 06:33:36 UTC

Description Anurag saxena 2019-07-10 06:12:30 UTC

Description of problem: A node is tainted if its default interface MTU is less than the configured value, but the taint is not removed once the MTU value is fixed.

The following was observed after fixing the MTU value; the node still carries the NoSchedule taint:

$ oc describe nodes ip-10-0-137-240.us-east-2.compute.internal | grep -i sch
Taints:             network.openshift.io/mtu-too-small=value:NoSchedule
Unschedulable:      false
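
A narrower check than grepping the `oc describe` output is to look at the taint keys alone. The following is a minimal sketch, not part of the original report: the helper name `has_mtu_taint` is hypothetical, and the commented usage line assumes `oc` is logged in to the cluster and `$NODE` holds the node name.

```shell
# Hypothetical helper: succeeds if the taint keys on stdin (one per line)
# include the mtu-too-small taint key shown in the output above.
has_mtu_taint() {
  grep -qx 'network.openshift.io/mtu-too-small'
}

# Assumed usage against a live cluster:
#   oc get node "$NODE" \
#     -o jsonpath='{range .spec.taints[*]}{.key}{"\n"}{end}' | has_mtu_taint \
#     && echo "still tainted" || echo "not tainted"
```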


Version-Release number of selected component (if applicable): 4.2.0-0.nightly-2019-07-08-142835


How reproducible: Always


Steps to Reproduce:
1. systemctl stop NetworkManager
2. ifconfig <interface> mtu 1300   <<< It was 9001
3. oc delete pod <sdn_pod>
4. systemctl start NetworkManager  <<< This should re-enable DHCP and restore the
   default MTU value
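
To confirm the MTU before and after these steps, the value can be read from the link state. This is a minimal sketch under assumptions: the helper name `link_mtu` is hypothetical, and it expects the one-line output format of `ip -o link show dev <interface>` on stdin.

```shell
# Hypothetical helper: print the MTU found in `ip -o link show` output on stdin.
link_mtu() {
  sed -n 's/.* mtu \([0-9][0-9]*\) .*/\1/p'
}

# Assumed usage on the node:
#   ip -o link show dev <interface> | link_mtu
```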

Actual results: Step 4 sets the MTU value back to 9001, but the node remains tainted.


Expected results: The node should be untainted once the MTU value is fixed.


Additional info: Will be furnished if requested

Comment 1 Dan Winship 2019-07-10 12:29:06 UTC

(In reply to Anurag saxena from comment #0)
> 4. Systemctl Start NetworkManager  <<< This should enable dhcp and corrects
> the
>    MTU value as default
> 
> Actual results: 4th step corrects set the MTU value back to 9001 but node
> remains Tainted
> 
> 
> Expected results: Node should get un-taint when MTU value is fixed

The taint doesn't need to be removed until OpenShift SDN is restarted.

Comment 2 Anurag saxena 2019-07-10 14:20:37 UTC

(In reply to Dan Winship from comment #1)
> (In reply to Anurag saxena from comment #0)
> > 4. Systemctl Start NetworkManager  <<< This should enable dhcp and corrects
> > the
> >    MTU value as default
> > 
> > Actual results: 4th step corrects set the MTU value back to 9001 but node
> > remains Tainted
> > 
> > 
> > Expected results: Node should get un-taint when MTU value is fixed
> 
> The taint doesn't need to be removed until OpenShift SDN is restarted.

Hmm, I tried killing the openshift-sdn process after step 4. openshift-sdn restarted, but the following remained the same:

$ oc describe nodes ip-10-0-132-170.ap-northeast-1.compute.internal | grep -i sch
Taints:             network.openshift.io/mtu-too-small=value:NoSchedule            <<<<<<<<<<<<<<<<
Unschedulable:      false

Comment 3 Dan Winship 2019-07-10 14:57:05 UTC

No, I mean that with the current state of the code, the taint never gets removed. With the fixed version, the expected behavior will be that it gets removed after a restart, not that it gets removed immediately.
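
The expected behavior can be pictured as a one-shot decision made each time openshift-sdn starts, rather than a continuous watch on the interface. The sketch below is only an illustration of that decision and is not the code from the fix; the function name `mtu_taint_action` and its two arguments are assumptions.

```shell
# Hypothetical sketch of the decision made once at SDN startup.
#   $1 = MTU of the node's default interface
#   $2 = MTU the cluster network configuration requires
# Prints "taint" if the interface MTU is too small, otherwise "untaint".
mtu_taint_action() {
  if [ "$1" -lt "$2" ]; then
    echo taint
  else
    echo untaint
  fi
}
```

For example, `mtu_taint_action 1300 1450` prints `taint` and `mtu_taint_action 9001 1450` prints `untaint`; the value 1450 is only an example threshold. Because the check runs at startup, a corrected MTU takes effect on the taint only after the next restart.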

Comment 4 Anurag saxena 2019-07-16 09:04:43 UTC

(In reply to Dan Winship from comment #3)
> No, I mean, with the current state of the code, the taint never gets
> removed, but with the fixed version, the expected behavior will be that it
> gets removed after restart, not that it gets removed immediately.

Got it. Thanks for the clarification, Dan.

Comment 5 Ricardo Carrillo Cruz 2019-07-22 14:55:33 UTC

https://github.com/openshift/sdn/pull/11/files

Comment 7 Anurag saxena 2019-08-05 16:10:55 UTC

Thanks for the fix! This works now when following the steps mentioned in comment 1 and then restarting openshift-sdn.

The steps from comment 1 tainted the node with NoSchedule:

[core@ip-10-0-x-x ~]$ oc describe nodes ip-10-0-130-153.ap-northeast-1.compute.internal | grep -i sch
Taints:             network.openshift.io/mtu-too-small=value:NoSchedule
Unschedulable:      false

After the openshift-sdn restart, the node was untainted:

[core@ip-10-0-x-x ~]$ oc describe nodes ip-10-0-130-153.ap-northeast-1.compute.internal | grep -i sch
Unschedulable:      false


Verifying based on the above checks.

Comment 8 errata-xmlrpc 2019-10-16 06:33:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
