Description of problem:
Cinder-csi-driver-node pod doesn't run on master node
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install an OSP cluster; the Cinder CSI driver is installed by default.
2. Check CSI driver pods:
oc -n openshift-cluster-csi-drivers get pod -o wide
3. Create a pod on a master that uses PVC.
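Step 3 can be exercised with a manifest along these lines (a minimal sketch; the PVC name, storage class name, image, and nodeSelector value are assumptions and may need adjusting for your cluster):

```yaml
# Hypothetical example: a PVC backed by the Cinder CSI driver plus a pod
# pinned to a master node. storageClassName is an assumption -- check
# "oc get storageclass" for the actual Cinder CSI class on your cluster.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mypvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard-csi   # assumed Cinder CSI storage class
---
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  nodeSelector:
    node-role.kubernetes.io/master: ""
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
    volumeMounts:
    - mountPath: /data
      name: vol
  volumes:
  - name: vol
    persistentVolumeClaim:
      claimName: mypvc
```

Note the pod itself needs the master toleration to schedule there at all; the failure under test is that even a scheduled pod cannot attach the volume when no cinder-csi-driver-node pod runs on that master.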
Actual results:
1. CSI driver node pods run only on worker nodes:
$ oc -n openshift-cluster-csi-drivers get pod -o wide
openstack-cinder-csi-driver-node-42svt 2/2 Running 0 49m 192.168.2.110 wduan-1130a-fcw45-worker-0-jvtg8 <none> <none>
openstack-cinder-csi-driver-node-72flp 2/2 Running 1 47m 192.168.3.54 wduan-1130a-fcw45-worker-0-vnqmn <none> <none>
openstack-cinder-csi-driver-node-mh9js 2/2 Running 0 49m 192.168.0.14 wduan-1130a-fcw45-worker-0-qh5dx <none> <none>
2. Masters can't use a PVC provided by the CSI driver.

Expected results:
Masters should run the openstack-cinder-csi-driver-node pod so that they can use PVCs provided by the CSI driver.
We believe this might have been an infra issue. Mike to double check.
Hey Wei Duan, just to make sure I understand the issue: do we expect the cinder-csi-driver-node pods to run on the master nodes all the time, or only when the masters are schedulable?
If it's the former, I think we can add the following toleration to the DaemonSet spec:
tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule
And I suppose it's the same for the cinder-csi-driver-controller pods in https://bugzilla.redhat.com/show_bug.cgi?id=1902547 ?
Hi, I replied in Slack; let's discuss there and make a decision.
Verified pass on 4.7.0-0.nightly-2021-01-12-203716
$ oc -n openshift-cluster-csi-drivers get pod -o wide | grep "cinder-csi-driver-node"
openstack-cinder-csi-driver-node-8p9t6 2/2 Running 0 20m 192.168.2.181 wduan-0113b-x98wd-master-2 <none> <none>
openstack-cinder-csi-driver-node-fbmv9 2/2 Running 0 20m 192.168.1.51 wduan-0113b-x98wd-master-1 <none> <none>
openstack-cinder-csi-driver-node-lll9s 2/2 Running 0 20m 192.168.2.208 wduan-0113b-x98wd-worker-0-nb4rq <none> <none>
openstack-cinder-csi-driver-node-nblb7 2/2 Running 0 20m 192.168.1.22 wduan-0113b-x98wd-worker-0-q7xfx <none> <none>
openstack-cinder-csi-driver-node-nxkv7 2/2 Running 0 20m 192.168.3.40 wduan-0113b-x98wd-worker-0-qjshv <none> <none>
openstack-cinder-csi-driver-node-pnddv 2/2 Running 0 19m 192.168.3.129 wduan-0113b-x98wd-master-0 <none> <none>
And a test pod runs successfully on a master:
$ oc get pod -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mypod 1/1 Running 0 23s 10.128.0.93 wduan-0113b-x98wd-master-0 <none> <none>
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.