Bug 1835778 - [aws-ebs-csi-driver-operator] The pods created by ds aws-ebs-csi-driver-node always in Pending status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Jan Safranek
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-14 13:28 UTC by Qin Ping
Modified: 2020-07-13 17:39 UTC
CC: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:38:49 UTC
Target Upstream Version:
Embargoed:




Links:
- Github openshift/aws-ebs-csi-driver-operator pull 56 (closed): Bug 1835778: Detect CSI driver installed by cluster admin (last updated 2020-10-29 11:29:10 UTC)
- Red Hat Product Errata RHBA-2020:2409 (last updated 2020-07-13 17:39:01 UTC)

Description Qin Ping 2020-05-14 13:28:10 UTC
Description of problem:
Tried to delete the ds (aws-ebs-csi-driver-node) to see if the AWS EBS CSI driver operator would recreate it. The ds was recreated successfully, but one pod stayed in Pending status. Then tried to uninstall and reinstall the AWS EBS CSI driver; after the reinstall, all the pods created by the ds were in Pending status.
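
For reference, the ds deletion described here corresponds to a command like the following (the namespace matches the pod listing in the actual results below):

$ oc -n openshift-aws-ebs-csi-driver delete ds aws-ebs-csi-driver-node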

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-13-030007   True        False         22h     Cluster version is 4.5.0-0.nightly-2020-05-13-030007

aws-ebs-csi-driver-operator version 554623c-554623c765661f1b4b1a488441424307c5759df9

How reproducible:
Hit once

Steps to Reproduce:
1. Install aws ebs csi driver operator from the web console
2. Create an AWSEBSDriver to deploy the AWS EBS CSI driver
3. Create a block PVC and a DeploymentConfig to test the driver
4. Set dc.replicas=0
5. Delete the aws-ebs-csi-driver-controller deployment; the deployment is recreated successfully
6. Delete the aws-ebs-csi-driver-node ds; one of its pods stays in Pending status
7. Uninstall the AWS EBS CSI driver
8. Reinstall the AWS EBS CSI driver (see the sketch after this list)
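
A hedged sketch of steps 7-8 as CLI commands (the AWSEBSDriver instance name "cluster" and the manifest file name are assumptions, not taken from this report):

$ oc delete awsebsdriver cluster      # step 7: uninstall by deleting the CR
$ oc create -f awsebsdriver.yaml      # step 8: recreate the CR to reinstall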

Actual results:
1. aws-ebs-csi-driver-controller installed successfully
2. all the pods created by ds aws-ebs-csi-driver-node are in Pending status
3. tried to create a pod; the newly created pod can be scheduled and runs successfully
$ oc get pod  -n openshift-aws-ebs-csi-driver
NAME                                             READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-5dc5dc55d7-lc9vv   5/5     Running   0          19m
aws-ebs-csi-driver-node-4lhzz                    0/3     Pending   0          19m
aws-ebs-csi-driver-node-6b8mp                    0/3     Pending   0          19m
aws-ebs-csi-driver-node-wxt2w                    0/3     Pending   0          19m

$ oc get pod
NAME                                           READY   STATUS      RESTARTS   AGE
aws-ebs-csi-driver-operator-66f948694b-7w9nr   1/1     Running     0          9h
hello-block-1-7tz25                            1/1     Running     0          15m
hello-block-1-deploy                           0/1     Completed   0          15m


Expected results:
The pods created by the ds are in "Running" status.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
$ oc describe pod -n openshift-aws-ebs-csi-driver
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  49s (x19 over 19m)  default-scheduler  0/6 nodes are available: 3 node(s) didn't have free ports for the requested pod ports, 3 node(s) didn't match node selector.
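
The "didn't have free ports" part of this event points at a hostPort conflict: some other pod on those three worker nodes already binds the driver's host ports. A hedged jsonpath query (not from the original report) to list which pods claim host ports:

$ oc get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.spec.containers[*].ports[*].hostPort}{"\n"}{end}' | awk 'NF > 2'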

Comment 1 Jan Safranek 2020-05-14 16:01:42 UTC
On Slack we diagnosed that the upstream AWS EBS CSI driver had been installed manually before our operator tried to install "our" AWS EBS CSI driver:

$ oc -n kube-system get pod
NAME                   READY   STATUS    RESTARTS   AGE
ebs-csi-controller-0   5/5     Running   0          7h24m
ebs-csi-node-99gnv     3/3     Running   0          7h24m
ebs-csi-node-k2gtt     3/3     Running   0          7h24m
ebs-csi-node-krr8p     3/3     Running   0          7h24m

IMO, there is some room for improvement in the operator: it should detect that another driver is already installed, report it, and refuse to install a duplicate. In this case the duplicate only collided on a node port and never actually ran, so nothing bad happened; however, running two instances of the same driver at the same time could have unpredictable results (especially for the DaemonSet pods, which could end up CrashLooping).

Straw-man design for detecting the presence of another AWS EBS CSI driver (a CLI sketch of these checks follows the list):
- Some CSINode has the driver.
- And the CSIDriver object either does not exist, or exists but does not have "our" label / annotation.
- And the Namespace openshift-aws-ebs-csi-driver does not exist or is empty?
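
A hedged sketch of those checks from the CLI (which label key marks "our" CSIDriver is an assumption; the operator's actual implementation landed in the linked pull 56):

$ # is the driver already registered on any node?
$ oc get csinode -o jsonpath='{range .items[*]}{.spec.drivers[*].name}{"\n"}{end}' | grep ebs.csi.aws.com
$ # does a CSIDriver object exist, and does it carry "our" label?
$ oc get csidriver ebs.csi.aws.com --show-labels
$ # does the operator's namespace exist, and is it empty?
$ oc get all -n openshift-aws-ebs-csi-driver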

Comment 4 Qin Ping 2020-05-27 14:00:22 UTC
Verified with: aws-ebs-csi-driver-operator version 554623c-554623c765661f1b4b1a488441424307c5759df9

Status:
  Conditions:
    Last Transition Time:  2020-05-27T13:51:43Z
    Status:                False
    Type:                  ManagementStateDegraded
    Last Transition Time:  2020-05-27T13:51:43Z
    Message:               AWS EBS CSI driver is already installed on the cluster.
    Reason:                OtherDriverInstalled
    Status:                False
    Type:                  PrereqsSatisfied
    Last Transition Time:  2020-05-27T13:51:43Z
    Message:               AWS EBS CSI driver is already installed on the cluster.
    Reason:                OtherDriverInstalled
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2020-05-27T13:51:43Z
    Message:               CSIDriver "ebs.csi.aws.com" is already installed, please uninstall it first before using this operator
    Reason:                OperatorSync
    Status:                True
    Type:                  Degraded
  Ready Replicas:          0
Events:                    <none>
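
The conditions above can be read back with a command along these lines (hedged; the CR kind follows step 2 of the reproducer, and the instance name "cluster" is an assumption):

$ oc describe awsebsdriver cluster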

Comment 5 errata-xmlrpc 2020-07-13 17:38:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

