Description of problem:
The cinder-csi-driver-node pod doesn't run on master nodes.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-11-29-133728

Steps to Reproduce:
1. Install an OSP cluster; the Cinder CSI driver is installed.
2. Check the CSI driver pods: oc -n openshift-cluster-csi-drivers get pod -o wide
3. Create a pod on a master that uses a PVC (a sketch of such a manifest follows this report).

Actual results:
1. The CSI driver node pods run only on worker nodes:

$ oc -n openshift-cluster-csi-drivers get pod -o wide
...
openstack-cinder-csi-driver-node-42svt   2/2   Running   0   49m   192.168.2.110   wduan-1130a-fcw45-worker-0-jvtg8   <none>   <none>
openstack-cinder-csi-driver-node-72flp   2/2   Running   1   47m   192.168.3.54    wduan-1130a-fcw45-worker-0-vnqmn   <none>   <none>
openstack-cinder-csi-driver-node-mh9js   2/2   Running   0   49m   192.168.0.14    wduan-1130a-fcw45-worker-0-qh5dx   <none>   <none>
...

2. Masters can't use a PVC provided by the CSI driver.

Expected results:
The openstack-cinder-csi-driver-node pod should also run on masters, so that pods on masters can use a PVC provided by the CSI driver.
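For reference, a minimal sketch of the kind of manifest used in step 3. The PVC/pod names and the image are illustrative, the nodeSelector/toleration only pin the test pod to a (tainted) master, and standard-csi is assumed to be the Cinder CSI storage class on the cluster:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # Assumed name of the Cinder CSI storage class; adjust to the cluster.
  storageClassName: standard-csi
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  # Schedule the pod onto a master and tolerate the master NoSchedule taint.
  nodeSelector:
    node-role.kubernetes.io/master: ""
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: "NoSchedule"
  containers:
  - name: test
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /mnt/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mypvc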
We believe this might have been an infra issue. Mike to double check.
Hey Wei Duan, just to make sure that I understand the issue: do we expect the cinder-csi-driver-node pods to run on the master nodes all the time, or only when the masters are schedulable?

If it's the former, I think we can add the following toleration to the DaemonSet spec:

tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: "NoSchedule"

And I suppose it's the same for the cinder-csi-driver-controller pods in https://bugzilla.redhat.com/show_bug.cgi?id=1902547 ?
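For reference, a rough sketch of where that toleration would sit in the node DaemonSet's pod template; the DaemonSet name is assumed from the pod names above, and only the relevant fields are shown (the rest of the spec stays operator-managed):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: openstack-cinder-csi-driver-node
  namespace: openshift-cluster-csi-drivers
spec:
  template:
    spec:
      # Allow the node plugin pods to be scheduled onto tainted masters.
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: "NoSchedule"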
Hi, I replied in Slack; let's discuss there and make a decision.
Verified pass on 4.7.0-0.nightly-2021-01-12-203716

$ oc -n openshift-cluster-csi-drivers get pod -o wide | grep "cinder-csi-driver-node"
openstack-cinder-csi-driver-node-8p9t6   2/2   Running   0   20m   192.168.2.181   wduan-0113b-x98wd-master-2         <none>   <none>
openstack-cinder-csi-driver-node-fbmv9   2/2   Running   0   20m   192.168.1.51    wduan-0113b-x98wd-master-1         <none>   <none>
openstack-cinder-csi-driver-node-lll9s   2/2   Running   0   20m   192.168.2.208   wduan-0113b-x98wd-worker-0-nb4rq   <none>   <none>
openstack-cinder-csi-driver-node-nblb7   2/2   Running   0   20m   192.168.1.22    wduan-0113b-x98wd-worker-0-q7xfx   <none>   <none>
openstack-cinder-csi-driver-node-nxkv7   2/2   Running   0   20m   192.168.3.40    wduan-0113b-x98wd-worker-0-qjshv   <none>   <none>
openstack-cinder-csi-driver-node-pnddv   2/2   Running   0   19m   192.168.3.129   wduan-0113b-x98wd-master-0         <none>   <none>

And the test pod is running on a master:

$ oc get pod -o wide -w
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE                         NOMINATED NODE   READINESS GATES
mypod   1/1     Running   0          23s   10.128.0.93   wduan-0113b-x98wd-master-0   <none>           <none>
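For completeness, the master toleration can also be checked directly on the DaemonSet with something like the following (the DaemonSet name is assumed from the pod names above):

$ oc -n openshift-cluster-csi-drivers get daemonset openstack-cinder-csi-driver-node \
    -o jsonpath='{.spec.template.spec.tolerations}'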
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633