Description of problem:
Deleted the DaemonSet (aws-ebs-csi-driver-node) to see if the AWS EBS CSI driver operator would recreate it. The DaemonSet was recreated successfully, but one pod stayed in Pending status. Then uninstalled the AWS EBS CSI driver and reinstalled it; now all the pods created by the DaemonSet are in Pending status.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-13-030007   True        False         22h     Cluster version is 4.5.0-0.nightly-2020-05-13-030007

aws-ebs-csi-driver-operator version 554623c-554623c765661f1b4b1a488441424307c5759df9

How reproducible:
Hit once

Steps to Reproduce:
1. Install the aws ebs csi driver operator from the web console
2. Create an AWSEBSDriver to deploy the AWS EBS CSI driver
3. Create a block PVC and a deploymentconfig to test the driver
4. Set dc.replicas=0
5. Delete the aws-ebs-csi-driver-controller deployment; the deployment is recreated successfully.
6. Delete the aws-ebs-csi-driver-node DaemonSet; one of its pods stays in Pending status.
7. Uninstall the aws ebs csi driver
8. Reinstall the aws ebs csi driver

Actual results:
1. aws-ebs-csi-driver-controller installed successfully.
2. All the pods created by the aws-ebs-csi-driver-node DaemonSet are in Pending status.
3. Tried to create an unrelated pod; the newly created pod can be scheduled and runs successfully.

$ oc get pod -n openshift-aws-ebs-csi-driver
NAME                                             READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-5dc5dc55d7-lc9vv   5/5     Running   0          19m
aws-ebs-csi-driver-node-4lhzz                    0/3     Pending   0          19m
aws-ebs-csi-driver-node-6b8mp                    0/3     Pending   0          19m
aws-ebs-csi-driver-node-wxt2w                    0/3     Pending   0          19m

$ oc get pod
NAME                                           READY   STATUS      RESTARTS   AGE
aws-ebs-csi-driver-operator-66f948694b-7w9nr   1/1     Running     0          9h
hello-block-1-7tz25                            1/1     Running     0          15m
hello-block-1-deploy                           0/1     Completed   0          15m

Expected results:
The pods created by the DaemonSet are in "Running" status.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
$ oc describe pod -n openshift-aws-ebs-csi-driver
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  49s (x19 over 19m)  default-scheduler  0/6 nodes are available: 3 node(s) didn't have free ports for the requested pod ports, 3 node(s) didn't match node selector.
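For reference, a minimal sketch of the block PVC from step 3, assuming a storage class named gp2-csi backed by ebs.csi.aws.com (the class name is an assumption, and the PVC name is hypothetical, merely echoing the hello-block dc above):

$ oc apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hello-block          # hypothetical name, matching the dc above
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block          # raw block device, not a filesystem
  storageClassName: gp2-csi  # assumption: any class backed by ebs.csi.aws.com works
  resources:
    requests:
      storage: 1Gi
EOF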
On Slack we diagnosed that the upstream AWS EBS CSI driver was installed manually before our operator tried to install "our" AWS EBS CSI driver:

$ oc -n kube-system get pod
NAME                   READY   STATUS    RESTARTS   AGE
ebs-csi-controller-0   5/5     Running   0          7h24m
ebs-csi-node-99gnv     3/3     Running   0          7h24m
ebs-csi-node-k2gtt     3/3     Running   0          7h24m
ebs-csi-node-krr8p     3/3     Running   0          7h24m

IMO, there is some room for improvement in the operator: it should detect that another driver is already installed, report it, and refuse to install a duplicate driver. In this case the new driver merely collided on the node port and never actually ran, so nothing bad happened; however, running two instances of the same driver at the same time could have unpredictable results (especially the DaemonSet pods could be CrashLooping).

Straw-man design for detecting the presence of another AWS EBS CSI driver (a command-line sketch follows this list):
- Some CSINode has the driver.
- And the CSIDriver object either does not exist, or exists but does not have "our" label / annotation.
- And the Namespace openshift-aws-ebs-csi-driver does not exist or is empty?
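A minimal command-line sketch of those three checks, assuming the standard driver name ebs.csi.aws.com; the csi.openshift.io/managed annotation key is hypothetical, standing in for whatever "our" marker would be:

$ # 1) Is the driver registered on any node's CSINode object?
$ oc get csinode -o jsonpath='{range .items[*]}{range .spec.drivers[*]}{.name}{"\n"}{end}{end}' | grep ebs.csi.aws.com
$ # 2) Does the CSIDriver object exist, and does it carry "our" marker?
$ #    (the csi.openshift.io/managed annotation key is hypothetical)
$ oc get csidriver ebs.csi.aws.com -o jsonpath='{.metadata.annotations.csi\.openshift\.io/managed}'
$ # 3) Is the operator's namespace absent or empty?
$ oc get pod -n openshift-aws-ebs-csi-driver --no-headers 2>/dev/null | wc -l

If 1) finds the driver while 2) finds no marker and 3) finds nothing, the driver was most likely installed by something other than the operator.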
verified with:
aws-ebs-csi-driver-operator version 554623c-554623c765661f1b4b1a488441424307c5759df9

Status:
  Conditions:
    Last Transition Time:  2020-05-27T13:51:43Z
    Status:                False
    Type:                  ManagementStateDegraded
    Last Transition Time:  2020-05-27T13:51:43Z
    Message:               AWS EBS CSI driver is already installed on the cluster.
    Reason:                OtherDriverInstalled
    Status:                False
    Type:                  PrereqsSatisfied
    Last Transition Time:  2020-05-27T13:51:43Z
    Message:               AWS EBS CSI driver is already installed on the cluster.
    Reason:                OtherDriverInstalled
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2020-05-27T13:51:43Z
    Message:               CSIDriver "ebs.csi.aws.com" is already installed, please uninstall it first before using this operator
    Reason:                OperatorSync
    Status:                True
    Type:                  Degraded
  Ready Replicas:  0
Events:            <none>
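The new precondition can be exercised directly; a minimal sketch, where the awsebsdriver resource name below is an assumption derived from the CR kind used in the reproduction steps and may differ on your cluster:

$ # The pre-existing registration that trips OtherDriverInstalled:
$ oc get csidriver ebs.csi.aws.com
$ # The operator surfaces the conflict in its CR status conditions (resource name assumed):
$ oc get awsebsdriver -o jsonpath='{.items[*].status.conditions[?(@.type=="Degraded")].message}'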
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409