Bug 1868616

Summary:	[RFE] enable NFD operator to deploy nfd-worker pods on nodes labeled other than "node-role.kubernetes.io/worker"
Product:	OpenShift Container Platform	Reporter:	Andreas Bleischwitz <ableisch>
Component:	Node Feature Discovery Operator	Assignee:	Carlos Eduardo Arango Gutierrez <carangog>
Status:	CLOSED ERRATA	QA Contact:	Walid A. <wabouham>
Severity:	high	Docs Contact:
Priority:	medium
Version:	4.4	CC:	carangog, sejug
Target Milestone:	---
Target Release:	4.6.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-10-27 15:09:55 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Andreas Bleischwitz 2020-08-13 10:03:02 UTC

* Description of problem:

Current implementation of the NFD-operator create a daemonset which is only taking nodes with "node-role.kubernetes.io/worker" into accout. Customers which do have separated infra nodes, won't be able to have nfd-workers deployed on infra nodes.

* Version-Release number of selected component (if applicable):

{"level":"info","ts":1597222546.4806738,"logger":"cmd","msg":"Go Version: go1.13.4"}
{"level":"info","ts":1597222546.4808075,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1597222546.4808278,"logger":"cmd","msg":"Version of operator-sdk: v0.4.0+git"}

registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:c34f885bc15d76d45486b1a6fb41990d07896ff5e319905d937d6ccf6affb44c

* Proposed title of this feature request

Enable the NFD-operator to deploy NFD-workers on nodes other than the ones with a worker label.

* What is the nature and description of the request?

Currently the NFD operator deploys NFD-workers only on nodes which do have the node-role.kubernetes.io/worker label set. Customers who need to have separated infra nodes, are currently unable to have the NFD-workers deployed on those nodes.

* Why does the customer need this? (List the business requirements here)

For cost reasons, dedicated nodes for infrastructure pods have been created. Those nodes have a node-role.kubernetes.io/infra label set. Those nodes do have special hardware installed (f.e. special network-cards), which need to be identified by the scheduler.

* List any affected packages or components

registry.redhat.io/openshift4/ose-node-feature-discovery

Comment 2 Carlos Eduardo Arango Gutierrez 2020-09-08 21:52:16 UTC

Is deploying the operator from the Operator-Hub a constrain? 
if not, you could deploy the operator from the source and change the node selector in line 

https://github.com/openshift/cluster-nfd-operator/blob/release-4.4/assets/worker/0700_worker_daemonset.yaml#L17

```yaml
    spec:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
```

Follwing

git clone git:openshift/cluster-nfd-operator.git 
cd cluster-nfd-operator
git checkout release-4.4
sed -i '17 s/node-role.kubernetes.io\/worker/node-role.kubernetes.io\/infra/'  assets/worker/0700_worker_daemonset.yaml
make deploy

If this RFE is to get this change as a dinamic choice via Operator-Hub, then it must be against release 4.6 and we can then cherry pick it to the desired version.

Comment 3 Andreas Bleischwitz 2020-09-14 15:21:18 UTC

Hi Carlos,

while some customers may be able to deploy upstream images, most of mine are not. They have to fulfill compliance regulations and won't be able to do so. Some even don't have access to github.

So this request is to make the operator from Operator-Hub able to deploy NFD-worker nodes on "worker" and "infra" nodes. Infra nodes will be created as per the following KVS:

https://access.redhat.com/solutions/4287111

Hope that answers your question.
/Andreas

Comment 4 Carlos Eduardo Arango Gutierrez 2020-09-14 15:31:31 UTC

Hi Andreas

my question is, are customers removing the  node-role.kubernetes.io/worker label after labelling it as infra? that KVS only informs how to add labels to a node, but in a vanilla deployment, the label node-role.kubernetes.io/worker should be there.

Also the KVS mentions "app" and "infra", which are 2 cases. I think if we open the door for infra nodes, we should consider then both cases for customers. Or are we sure only worker and infra are going to be the supported use cases? 

Another thing, you have filed this BZ against 4.4, but this would be a new feature of NFD, so this will need to go to master (4.7) and be back ported. is that ok? or this is an urgent fix from the field?

Comment 5 Andreas Bleischwitz 2020-09-18 12:28:39 UTC

Hi Carlos,

when following the KCS mentioned in #3, the worker labels remain, but the default node selector is set to nodes with an "app" role.

% oc get scheduler/cluster -o json | jq ".spec.defaultNodeSelector"
"node-role.kubernetes.io/app="

That way one will have to either configure the namespace with a proper annotation or will have to override the selector in some other positions.

The NFD-operator does not allow such modification and also will have to deploy nodes on infra nodes (nfd-master, nfd-operators) but also on the worker nodes.
In order to separate the infra nodes, one will then have to follow the instructions of the KCS and will have to annotate the NFD-namespace with "openshift.io/node-selector": "node-role.kubernetes.io/infra=" to override the default node-selector.
This in turn will render the deployment of NFD-workers as impossible as this would require to either change the deployment (which is overwritten by the operator and would require a second deployment for the worker nodes) or one would have to deploy with the default nodeselector in place, which will render the NFD-master nodes to be stuck (as they are waiting for master nodes, which are not in available within the "app" labeled nodes)

So we have the following situation:
% oc get nodes -o wide
NAME                                    STATUS   ROLES       
master0                                 Ready    infra,master
master1                                 Ready    infra,master
master2                                 Ready    infra,master
worker0                                 Ready    infra,worker
worker1                                 Ready    infra,worker
worker2                                 Ready    app,worker  
worker3                                 Ready    app,worker  

% oc get ns/operator-nfd -o json | jq '.metadata.annotations."openshift.io/node-selector"'
"node-role.kubernetes.io/infra="

% oc get pods -o wide -n operator-nfd
NAME                            READY   STATUS    RESTARTS   AGE     IP             NODE   
nfd-master-5vn7d                1/1     Running   1          7d20h   10.128.4.98    master0
nfd-master-9xnrp                1/1     Running   0          7d20h   10.129.0.42    master2
nfd-master-mvlwv                1/1     Running   0          7d20h   10.129.2.86    master1
nfd-operator-5cf5c9b74d-hrnxd   1/1     Running   0          7d22h   10.129.0.30    master2
nfd-operator-d5bf59888-f9t8h    0/1     Running   0          3d      10.129.0.132   master2
nfd-worker-4k9f9                1/1     Running   10         7d20h   192.168.4.81   worker1
nfd-worker-76r26                1/1     Running   20         7d20h   192.168.4.80   worker0

As you can see, there are no NFD-workers running on the "app" labeled nodes because the annotation sets the node-selector to "infra" nodes. This would be required to have the NFD-master and operator pods started.

The operator now would require a separated daemonset for the "app" labeled nodes, as it now seem to combine the NS toleration and the node-selector "worker" into account and only creates 2 replicas:

% oc get ds -n operator-nfd
NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
nfd-master   3         3         3       3            3           node-role.kubernetes.io/master=   16d
nfd-worker   2         2         2       2            2           node-role.kubernetes.io/worker=   16d   <== there are actually 4 worker nodes in the cluster.

And this lack of functionality should be addressed with a 4.6.z release as this would be the one which is used by my customer. Currently there is no need for further backports other than this.

Comment 6 Carlos Eduardo Arango Gutierrez 2020-09-25 15:11:53 UTC

https://github.com/kubernetes-sigs/node-feature-discovery-operator/pull/31 adds this functionality. 

This bug is filed against 4.4, just to double check, we need this backported?

Comment 7 Andreas Bleischwitz 2020-09-29 07:07:19 UTC

Hi,

my customer is aiming for OCP-4.6 and are currently on 4.5. I don't think that we would need a backport to 4.4, but it would be great if we could test the change in a 4.5 release already.
/Andreas

Comment 10 Walid A. 2020-10-09 14:59:00 UTC

Verified on OCP 4.6.0-0.nightly-2020-10-08-182439  with NFD operator deployed from github master repo and also from OperatorHub.
Followed steps in https://access.redhat.com/solutions/4287111 to create infra nodes.
Infra nodes labeled by NFD

Comment 13 errata-xmlrpc 2020-10-27 15:09:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.1 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4198