1947402 – Single Node cluster upgrade: AWS EBS CSI driver deployment is stuck on rollout

Summary: Single Node cluster upgrade: AWS EBS CSI driver deployment is stuck on rollout

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Fabio Bertinatto
QA Contact:	Chao Yang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-04-08 12:03 UTC by Vadim Rutkovsky
Modified:	2021-07-27 22:58 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-27 22:58:16 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift api pull 929	None	open	Bug 1947402: Add policyv1 scheme	2021-05-26 11:07:48 UTC
Github	openshift aws-ebs-csi-driver-operator pull 122	None	open	Bug 1947402: Prevent deployment rollouts getting stuck in single-node clusters	2021-05-21 13:46:28 UTC
Github	openshift cluster-storage-operator pull 171	None	open	Bug 1947402: Add permissions poddisruptionbudgets in AWS CSI operator'	2021-05-31 18:05:13 UTC
Github	openshift library-go pull 1056	None	open	Bug 1947402: Prevent deployment rollouts getting stuck	2021-04-26 09:50:24 UTC
Red Hat Product Errata	RHSA-2021:2438	None	None	None	2021-07-27 22:58:32 UTC

Description Vadim Rutkovsky 2021-04-08 12:03:40 UTC

Description of problem:
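`aws-ebs-csi-driver-controller` uses the RollingUpdate strategy, so on a single-node cluster a new deployment rollout gets stuck: the replacement pod requests the same host ports as the running pod, and there is no other node to schedule it on:
```
0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
```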

This happens during AWS SNO upgrade tests; see https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/17519/rehearse-17519-periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-sno/1380088237434867712 for an example.
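A minimal Go sketch of why the rollout stalls, assuming the controller Deployment runs one replica with the Kubernetes default RollingUpdate parameters (maxSurge and maxUnavailable of 25%) and that every pod binds the same host ports, so at most one pod fits per node. The helpers `rollingParams` and `rolloutStuck` are hypothetical; they model the scheduling constraint, not the Deployment controller itself.

```go
package main

import (
	"fmt"
	"math"
)

// rollingParams resolves percentage-based maxSurge/maxUnavailable the way the
// Deployment controller does: surge rounds up, unavailable rounds down.
func rollingParams(replicas int, surgePct, unavailPct float64) (surge, unavailable int) {
	surge = int(math.Ceil(float64(replicas) * surgePct / 100))
	unavailable = int(math.Floor(float64(replicas) * unavailPct / 100))
	return surge, unavailable
}

// rolloutStuck reports whether a RollingUpdate can make no progress when each
// pod needs host ports that allow only one pod per node (an assumption).
func rolloutStuck(nodes, replicas, unavailable int) bool {
	freeNodes := nodes - replicas
	// The surge pod needs a node whose ports are not taken. If no node is free
	// and no old pod may be removed first, the new pod stays Pending forever.
	return freeNodes <= 0 && unavailable == 0
}

func main() {
	surge, unavailable := rollingParams(1, 25, 25)
	fmt.Println(surge, unavailable)              // 1 0
	fmt.Println(rolloutStuck(1, 1, unavailable)) // true: single node
	fmt.Println(rolloutStuck(3, 1, unavailable)) // false: spare nodes exist
}
```

With one replica the surge budget is 1 and the unavailable budget is 0, so the controller must start the new pod before removing the old one, and on the only node the ports are still held.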

A possible solution would be to switch the deployment to the Recreate strategy on single-node installs.
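The Recreate idea could look roughly like the following Go sketch. `strategyFor` and the local `StrategyType` constants are hypothetical stand-ins for the `apps/v1` strategy names, and deciding by node count is an assumption; a real operator might detect a single-node topology differently.

```go
package main

import "fmt"

// StrategyType mirrors the two Deployment strategy names in apps/v1.
// These are local stand-ins, not the real k8s.io/api types.
type StrategyType string

const (
	RecreateStrategy      StrategyType = "Recreate"
	RollingUpdateStrategy StrategyType = "RollingUpdate"
)

// strategyFor is a hypothetical helper: on a single-node cluster it picks
// Recreate, which deletes the old pod before creating the new one, so the
// host ports are free when the replacement is scheduled.
func strategyFor(nodeCount int) StrategyType {
	if nodeCount <= 1 {
		return RecreateStrategy
	}
	return RollingUpdateStrategy
}

func main() {
	fmt.Println(strategyFor(1)) // Recreate
	fmt.Println(strategyFor(3)) // RollingUpdate
}
```

Recreate trades a short gap in controller availability for guaranteed progress. A RollingUpdate with `maxSurge: 0` and `maxUnavailable: 1` would also free the ports before the replacement is scheduled.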

Comment 2 Jan Safranek 2021-04-20 14:43:01 UTC

We've been thinking about this for a while. Why do you run SNO clusters with the AWS cloud provider? I thought they were bare metal by definition.

Comment 4 Vadim Rutkovsky 2021-04-20 15:16:04 UTC

(In reply to Jan Safranek from comment #2)
> We've been thinking about this for a while. Why do you run SNO clusters with
> the AWS cloud provider? I thought they were bare metal by definition.

SNO would be supported (in tech preview) on bare-metal UPI for customers, but in CI it's expensive. Making upgrades work on AWS would let us add this test to many affected repos.

At the moment we're using Azure, but I assume it may eventually be affected by a similar bug.

Comment 5 Wei Duan 2021-04-26 09:26:59 UTC

Noting here that we hit the same issue on an AWS cluster installed from scratch; will follow up on this BZ.

Comment 6 Fabio Bertinatto 2021-06-04 16:22:43 UTC

Everything is merged, except for https://github.com/openshift/api/pull/929, which isn't really a blocker.

Moving to MODIFIED.

Comment 8 Chao Yang 2021-06-11 09:28:15 UTC

```
oc get pods
NAME                                             READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-8454cc878d-92vcg   11/11   Running   16         141m
aws-ebs-csi-driver-node-pxdst                    3/3     Running   0          141m
aws-ebs-csi-driver-operator-fdf948697-qpjmq      1/1     Running   2          142m
```

4.8.0-0.nightly-2021-06-10-224448

Comment 9 Chao Yang 2021-06-11 16:08:17 UTC

Upgrade to 4.8.0-0.nightly-2021-06-11-024306
```
oc get co storage
storage                                    4.8.0-0.nightly-2021-06-11-024306   True        False         False      24m

oc get pods -n openshift-cluster-csi-drivers
NAME                                            READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-dfdd67b49-jzgjv   11/11   Running   0          2m38s
aws-ebs-csi-driver-node-js8kk                   3/3     Running   0          14s
aws-ebs-csi-driver-operator-7fffb85749-qgvpd    1/1     Running   0          3m50s
```

Comment 12 errata-xmlrpc 2021-07-27 22:58:16 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438