Bug 1835778 - [aws-ebs-csi-driver-operator] The pods created by ds aws-ebs-csi-driver-node always in Pending status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Jan Safranek
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-14 13:28 UTC by Qin Ping
Modified: 2020-07-13 17:39 UTC
CC: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:38:49 UTC
Target Upstream Version:
Embargoed:




Links:
- Github openshift/aws-ebs-csi-driver-operator pull 56 (closed): Bug 1835778: Detect CSI driver installed by cluster admin (last updated 2020-10-29 11:29:10 UTC)
- Red Hat Product Errata RHBA-2020:2409 (last updated 2020-07-13 17:39:01 UTC)

Description Qin Ping 2020-05-14 13:28:10 UTC
Description of problem:
Tried to delete the ds (aws-ebs-csi-driver-node) to see if the AWS EBS CSI driver operator would recreate it. The ds was recreated successfully, but one pod stayed in Pending status. Then tried to uninstall and reinstall the AWS EBS CSI driver; after the reinstall, all the pods created by the ds were in Pending status.
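
For reference, the ds deletion described here corresponds to a command like the following (the namespace matches the pod listing in the actual results below):

$ oc -n openshift-aws-ebs-csi-driver delete ds aws-ebs-csi-driver-node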

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-13-030007   True        False         22h     Cluster version is 4.5.0-0.nightly-2020-05-13-030007

aws-ebs-csi-driver-operator version 554623c-554623c765661f1b4b1a488441424307c5759df9

How reproducible:
Hit once

Steps to Reproduce:
1. Install aws ebs csi driver operator from the web console
2. Create an AWSEBSDriver to deploy the AWS EBS CSI driver
3. Create a block PVC and a DeploymentConfig to test the driver
4. Set dc.replicas=0
5. Delete the aws-ebs-csi-driver-controller deployment; the deployment is recreated successfully
6. Delete the aws-ebs-csi-driver-node ds; one of its pods stays in Pending status
7. Uninstall the AWS EBS CSI driver
8. Reinstall the AWS EBS CSI driver (see the sketch after this list)
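
A hedged sketch of steps 7-8 as CLI commands (the AWSEBSDriver instance name "cluster" and the manifest file name are assumptions, not taken from this report):

$ oc delete awsebsdriver cluster      # step 7: uninstall by deleting the CR
$ oc create -f awsebsdriver.yaml      # step 8: recreate the CR to reinstall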

Actual results:
1. aws-ebs-csi-driver-controller installed successfully
2. all the pods created by ds aws-ebs-csi-driver-node are in Pending status
3. tried to create a pod; the newly created pod can be scheduled and runs successfully
$ oc get pod  -n openshift-aws-ebs-csi-driver
NAME                                             READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-5dc5dc55d7-lc9vv   5/5     Running   0          19m
aws-ebs-csi-driver-node-4lhzz                    0/3     Pending   0          19m
aws-ebs-csi-driver-node-6b8mp                    0/3     Pending   0          19m
aws-ebs-csi-driver-node-wxt2w                    0/3     Pending   0          19m

$ oc get pod
NAME                                           READY   STATUS      RESTARTS   AGE
aws-ebs-csi-driver-operator-66f948694b-7w9nr   1/1     Running     0          9h
hello-block-1-7tz25                            1/1     Running     0          15m
hello-block-1-deploy                           0/1     Completed   0          15m


Expected results:
The pods created by the ds are in "Running" status.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
$ oc describe pod -n openshift-aws-ebs-csi-driver
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  49s (x19 over 19m)  default-scheduler  0/6 nodes are available: 3 node(s) didn't have free ports for the requested pod ports, 3 node(s) didn't match node selector.
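
The "didn't have free ports" part of this event points at a hostPort conflict: some other pod on those three worker nodes already binds the driver's host ports. A hedged jsonpath query (not from the original report) to list which pods claim host ports:

$ oc get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.spec.containers[*].ports[*].hostPort}{"\n"}{end}' | awk 'NF > 2'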

Comment 1 Jan Safranek 2020-05-14 16:01:42 UTC
On Slack we diagnosed that the upstream AWS EBS CSI driver had been installed manually before our operator tried to install "our" AWS EBS CSI driver:

$ oc -n kube-system get pod
NAME                   READY   STATUS    RESTARTS   AGE
ebs-csi-controller-0   5/5     Running   0          7h24m
ebs-csi-node-99gnv     3/3     Running   0          7h24m
ebs-csi-node-k2gtt     3/3     Running   0          7h24m
ebs-csi-node-krr8p     3/3     Running   0          7h24m

IMO, there is some room for improvement in the operator: it should detect that another driver is already installed, report it, and refuse to install a duplicate. In this case the duplicate only collided on a node port and never actually ran, so nothing bad happened; however, running two instances of the same driver at the same time could have unpredictable results (especially for the DaemonSet pods, which could end up CrashLooping).

Straw-man design for detecting the presence of another AWS EBS CSI driver (a CLI sketch of these checks follows the list):
- Some CSINode has the driver.
- And the CSIDriver object either does not exist, or exists but does not have "our" label / annotation.
- And the Namespace openshift-aws-ebs-csi-driver does not exist or is empty?
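
A hedged sketch of those checks from the CLI (which label key marks "our" CSIDriver is an assumption; the operator's actual implementation landed in the linked pull 56):

$ # is the driver already registered on any node?
$ oc get csinode -o jsonpath='{range .items[*]}{.spec.drivers[*].name}{"\n"}{end}' | grep ebs.csi.aws.com
$ # does a CSIDriver object exist, and does it carry "our" label?
$ oc get csidriver ebs.csi.aws.com --show-labels
$ # does the operator's namespace exist, and is it empty?
$ oc get all -n openshift-aws-ebs-csi-driver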

Comment 4 Qin Ping 2020-05-27 14:00:22 UTC
Verified with: aws-ebs-csi-driver-operator version 554623c-554623c765661f1b4b1a488441424307c5759df9

Status:
  Conditions:
    Last Transition Time:  2020-05-27T13:51:43Z
    Status:                False
    Type:                  ManagementStateDegraded
    Last Transition Time:  2020-05-27T13:51:43Z
    Message:               AWS EBS CSI driver is already installed on the cluster.
    Reason:                OtherDriverInstalled
    Status:                False
    Type:                  PrereqsSatisfied
    Last Transition Time:  2020-05-27T13:51:43Z
    Message:               AWS EBS CSI driver is already installed on the cluster.
    Reason:                OtherDriverInstalled
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2020-05-27T13:51:43Z
    Message:               CSIDriver "ebs.csi.aws.com" is already installed, please uninstall it first before using this operator
    Reason:                OperatorSync
    Status:                True
    Type:                  Degraded
  Ready Replicas:          0
Events:                    <none>
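
The conditions above can be read back with a command along these lines (hedged; the CR kind follows step 2 of the reproducer, and the instance name "cluster" is an assumption):

$ oc describe awsebsdriver cluster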

Comment 5 errata-xmlrpc 2020-07-13 17:38:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

