Bug 1947402 - Single Node cluster upgrade: AWS EBS CSI driver deployment is stuck on rollout
Summary: Single Node cluster upgrade: AWS EBS CSI driver deployment is stuck on rollout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Fabio Bertinatto
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-08 12:03 UTC by Vadim Rutkovsky
Modified: 2021-07-27 22:58 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:58:16 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Github openshift api pull 929 0 None open Bug 1947402: Add policyv1 scheme 2021-05-26 11:07:48 UTC
Github openshift aws-ebs-csi-driver-operator pull 122 0 None open Bug 1947402: Prevent deployment rollouts getting stuck in single-node clusters 2021-05-21 13:46:28 UTC
Github openshift cluster-storage-operator pull 171 0 None open Bug 1947402: Add permissions poddisruptionbudgets in AWS CSI operator' 2021-05-31 18:05:13 UTC
Github openshift library-go pull 1056 0 None open Bug 1947402: Prevent deployment rollouts getting stuck 2021-04-26 09:50:24 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:58:32 UTC

Description Vadim Rutkovsky 2021-04-08 12:03:40 UTC
Description of problem:
`aws-ebs-csi-driver-controller` uses the RollingUpdate strategy, so on a single-node cluster a new deployment rollout gets stuck:
```
'0/1 nodes are available: 1 node(s) didn''t have free ports for the requested
      pod ports.'
```
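
Why the rollout cannot make progress can be sketched with the rolling-update arithmetic. This is a minimal sketch, assuming the deployment runs one replica and keeps the Kubernetes defaults of 25% for both `maxSurge` and `maxUnavailable` (the report does not quote the manifest, so both values are assumptions). The function name `rolling_update_budget` is hypothetical.

```python
def rolling_update_budget(replicas, max_surge_pct=25, max_unavailable_pct=25):
    """Return (extra pods allowed, pods allowed to be down) during a rollout.

    Follows the Kubernetes rounding rule for percentage values:
    maxSurge rounds up, maxUnavailable rounds down.
    The 25% defaults are an assumption about this deployment.
    """
    surge = -(-replicas * max_surge_pct // 100)            # ceiling division
    unavailable = replicas * max_unavailable_pct // 100    # floor division
    return surge, unavailable


surge, unavailable = rolling_update_budget(replicas=1)
print(surge, unavailable)
```

With one replica this gives a surge of 1 and 0 unavailable pods: the new pod has to be scheduled while the old one is still running. On a single node the old pod still holds the requested host ports, which matches the scheduler message above.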

This is happening during AWS SNO upgrade tests - see https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/17519/rehearse-17519-periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-sno/1380088237434867712 for example.

A possible solution would be switching the deployment to the Recreate strategy if it's a single-node install.
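
The proposal can be illustrated roughly as follows. This is a sketch of the reporter's suggestion only, not of what the linked pull requests actually merged. It assumes the operator can read the cluster's control-plane topology and that single-node clusters report `SingleReplica`; the helper name `with_single_node_strategy` and the trimmed manifest are hypothetical.

```python
import copy

# Assumed topology value reported by single-node clusters.
SINGLE_REPLICA = "SingleReplica"


def with_single_node_strategy(deployment, control_plane_topology):
    """Return a copy of the manifest that uses Recreate on single-node clusters."""
    result = copy.deepcopy(deployment)
    if control_plane_topology == SINGLE_REPLICA:
        # Recreate removes the old pod first, so its host ports are free
        # before the replacement is scheduled. Any rollingUpdate settings
        # are dropped because they only apply to the RollingUpdate type.
        result.setdefault("spec", {})["strategy"] = {"type": "Recreate"}
    return result


# Trimmed, hypothetical manifest for illustration.
manifest = {
    "kind": "Deployment",
    "metadata": {"name": "aws-ebs-csi-driver-controller"},
    "spec": {"replicas": 1, "strategy": {"type": "RollingUpdate"}},
}
patched = with_single_node_strategy(manifest, "SingleReplica")
print(patched["spec"]["strategy"])
```

The trade-off is a short window with no controller pod during each rollout, which a single-node cluster cannot avoid anyway when the pod needs exclusive host ports.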

Comment 2 Jan Safranek 2021-04-20 14:43:01 UTC
We've been thinking about this for a while. Why do you run SNO clusters with the AWS cloud provider? I thought they were bare metal by definition.

Comment 4 Vadim Rutkovsky 2021-04-20 15:16:04 UTC
(In reply to Jan Safranek from comment #2)
> We've been thinking about this for a while. Why do you run SNO clusters with
> the AWS cloud provider? I thought they were bare metal by definition.

SNO would be supported (in tech preview) on BM UPI for customers, but in CI it's expensive. Making the upgrade work on AWS would enable us to add this test to many affected repos.

At the moment we're using Azure, but I assume it may eventually be affected by a similar bug.

Comment 5 Wei Duan 2021-04-26 09:26:59 UTC
Just noting here that we hit the same issue on a from-scratch install of an AWS cluster; we will follow up on this BZ.

Comment 6 Fabio Bertinatto 2021-06-04 16:22:43 UTC
Everything is merged, except for https://github.com/openshift/api/pull/929, which isn't really a blocker.

Moving to MODIFIED.

Comment 8 Chao Yang 2021-06-11 09:28:15 UTC
oc get pods
NAME                                             READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-8454cc878d-92vcg   11/11   Running   16         141m
aws-ebs-csi-driver-node-pxdst                    3/3     Running   0          141m
aws-ebs-csi-driver-operator-fdf948697-qpjmq      1/1     Running   2          142m

4.8.0-0.nightly-2021-06-10-224448

Comment 9 Chao Yang 2021-06-11 16:08:17 UTC
Upgrade to 4.8.0-0.nightly-2021-06-11-024306
oc get co storage
storage                                    4.8.0-0.nightly-2021-06-11-024306   True        False         False      24m

oc get pods -n openshift-cluster-csi-drivers
NAME                                            READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-dfdd67b49-jzgjv   11/11   Running   0          2m38s
aws-ebs-csi-driver-node-js8kk                   3/3     Running   0          14s
aws-ebs-csi-driver-operator-7fffb85749-qgvpd    1/1     Running   0          3m50s
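
A check like the one above can be scripted. The following is a minimal sketch, assuming the default column layout of `oc get pods` (NAME, READY, STATUS first); the helper name `not_ready_pods` is hypothetical, and the sample text is the output pasted in this comment.

```python
def not_ready_pods(oc_output):
    """Return names of pods that are not Running or not fully ready."""
    bad = []
    for line in oc_output.strip().splitlines()[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            bad.append(name)
    return bad


sample = """\
NAME                                            READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-dfdd67b49-jzgjv   11/11   Running   0          2m38s
aws-ebs-csi-driver-node-js8kk                   3/3     Running   0          14s
aws-ebs-csi-driver-operator-7fffb85749-qgvpd    1/1     Running   0          3m50s
"""
print(not_ready_pods(sample))
```

An empty list means every pod in the namespace is Running with all of its containers ready.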

Comment 12 errata-xmlrpc 2021-07-27 22:58:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

