Bug 1902546

Summary: Cinder csi driver node pod doesn't run on master node
Product: OpenShift Container Platform
Component: Storage
Storage sub component: OpenStack CSI Drivers
Reporter: Wei Duan <wduan>
Assignee: Martin André <m.andre>
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: aos-bugs, m.andre, pprinett
Version: 4.7
Keywords: UpcomingSprint
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Type: Bug
Last Closed: 2021-02-24 15:36:28 UTC

Description Wei Duan 2020-11-30 02:51:09 UTC
Description of problem:
Cinder-csi-driver-node pod doesn't run on master node

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-11-29-133728

Steps to Reproduce:
1. Install a cluster on OpenStack (OSP); the Cinder CSI driver is installed as part of the installation.

2. Check CSI driver pods:
   oc -n openshift-cluster-csi-drivers get pod -o wide

3. Create a pod on a master that uses PVC.
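For step 3, a minimal reproducer could look like the following. This is a sketch, not the exact manifests used in the report: the storage class name "standard-csi", the image, and the object names are assumptions; substitute values from your cluster.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mypvc
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
      storageClassName: standard-csi   # assumed Cinder CSI storage class name
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: mypod
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""   # pin the pod to a master node
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: app
        image: registry.access.redhat.com/ubi8/ubi-minimal   # assumed image
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: mypvc

Because the volume can only be attached and mounted by the CSI node plugin running on the target node, this pod stays unschedulable or stuck in ContainerCreating on a master that has no openstack-cinder-csi-driver-node pod.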

Actual results:
1. CSI driver node pods run only on worker nodes
$ oc -n openshift-cluster-csi-drivers get pod -o wide
...
openstack-cinder-csi-driver-node-42svt                    2/2     Running   0          49m   192.168.2.110   wduan-1130a-fcw45-worker-0-jvtg8   <none>           <none>
openstack-cinder-csi-driver-node-72flp                    2/2     Running   1          47m   192.168.3.54    wduan-1130a-fcw45-worker-0-vnqmn   <none>           <none>
openstack-cinder-csi-driver-node-mh9js                    2/2     Running   0          49m   192.168.0.14    wduan-1130a-fcw45-worker-0-qh5dx   <none>           <none>
...

2. Masters can't use a PVC provided by the CSI driver

Expected results:
Masters should run an openstack-cinder-csi-driver-node pod, so that pods on masters can use a PVC provided by the CSI driver.

Comment 1 Martin André 2020-12-10 09:18:05 UTC
We believe this might have been an infra issue. Mike to double check.

Comment 2 Martin André 2021-01-11 18:12:10 UTC
Hey Wei Duan, just to make sure that I understand the issue, do we expect the cinder-csi-driver-node pods to run on the master nodes all the time or only when they are schedulable?

If it's the former, I think we can add the following toleration to the DaemonSet's pod spec:

    tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: "NoSchedule"

And I suppose it's the same for the cinder-csi-driver-controller pods in https://bugzilla.redhat.com/show_bug.cgi?id=1902547 ?
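For context, the toleration above sits under the pod template of the node DaemonSet, roughly as follows. This is a sketch based on the standard apps/v1 DaemonSet schema; the object name and namespace are inferred from the pod listings above.

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: openstack-cinder-csi-driver-node
      namespace: openshift-cluster-csi-drivers
    spec:
      template:
        spec:
          tolerations:
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: "NoSchedule"

With operator: Exists and no value, the toleration matches the master taint regardless of its value, so the scheduler places a node pod on every master as well as every worker.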

Comment 3 Wei Duan 2021-01-12 02:27:09 UTC
Hi, I replied in Slack; let's discuss there and make a decision.

Comment 5 Wei Duan 2021-01-13 07:28:31 UTC
Verified pass on 4.7.0-0.nightly-2021-01-12-203716

$ oc -n openshift-cluster-csi-drivers get pod -o wide | grep "cinder-csi-driver-node"
openstack-cinder-csi-driver-node-8p9t6                    2/2     Running   0          20m   192.168.2.181   wduan-0113b-x98wd-master-2         <none>           <none>
openstack-cinder-csi-driver-node-fbmv9                    2/2     Running   0          20m   192.168.1.51    wduan-0113b-x98wd-master-1         <none>           <none>
openstack-cinder-csi-driver-node-lll9s                    2/2     Running   0          20m   192.168.2.208   wduan-0113b-x98wd-worker-0-nb4rq   <none>           <none>
openstack-cinder-csi-driver-node-nblb7                    2/2     Running   0          20m   192.168.1.22    wduan-0113b-x98wd-worker-0-q7xfx   <none>           <none>
openstack-cinder-csi-driver-node-nxkv7                    2/2     Running   0          20m   192.168.3.40    wduan-0113b-x98wd-worker-0-qjshv   <none>           <none>
openstack-cinder-csi-driver-node-pnddv                    2/2     Running   0          19m   192.168.3.129   wduan-0113b-x98wd-master-0         <none>           <none>

And a test pod can run on a master:
$ oc get pod -o wide -w
NAME    READY   STATUS              RESTARTS   AGE   IP       NODE                         NOMINATED NODE   READINESS GATES
mypod   1/1     Running             0          23s   10.128.0.93   wduan-0113b-x98wd-master-0   <none>           <none>

Comment 9 errata-xmlrpc 2021-02-24 15:36:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633