Bug 1947402

Summary: Single Node cluster upgrade: AWS EBS CSI driver deployment is stuck on rollout
Product: OpenShift Container Platform
Reporter: Vadim Rutkovsky <vrutkovs>
Component: Storage
Assignee: Fabio Bertinatto <fbertina>
Storage sub component: Operators
QA Contact: Chao Yang <chaoyang>
Status: CLOSED ERRATA
Severity: medium
Priority: high
CC: aos-bugs, chaoyang, dhellmann, jsafrane, otuchfel, rfreiman, wduan
Version: 4.8
Keywords: Upgrades
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-07-27 22:58:16 UTC
Type: Bug

Description Vadim Rutkovsky 2021-04-08 12:03:40 UTC
Description of problem:
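`aws-ebs-csi-driver-controller` uses the RollingUpdate strategy, so on a single-node cluster the rollout of a new deployment revision gets stuck: the new pod requests the same host ports as the old pod, which is still running, and the scheduler reports:
'0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.'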
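The deadlock follows from the Deployment rolling-update arithmetic. Below is a minimal Python model, assuming the controller Deployment uses the Kubernetes defaults (maxSurge and maxUnavailable of 25%) and that its host ports allow at most one controller pod per node. `resolve_surge_and_unavailable` and `rollout_stuck` are illustrative helpers, a simplified sketch rather than the Kubernetes controller code.

```python
import math


def resolve_surge_and_unavailable(replicas, max_surge="25%", max_unavailable="25%"):
    """Turn maxSurge/maxUnavailable into absolute pod counts.

    Follows the documented Deployment rules: percentages round up for
    maxSurge and down for maxUnavailable; if both resolve to 0,
    maxUnavailable is bumped to 1 so the rollout can progress.
    """
    def to_count(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            raw = replicas * int(value[:-1]) / 100
            return math.ceil(raw) if round_up else math.floor(raw)
        return int(value)

    surge = to_count(max_surge, round_up=True)
    unavailable = to_count(max_unavailable, round_up=False)
    if surge == 0 and unavailable == 0:
        unavailable = 1
    return surge, unavailable


def rollout_stuck(nodes, replicas, max_surge="25%", max_unavailable="25%"):
    """Simplified model: host ports limit each node to one pod of this Deployment.

    The rollout can progress if it may delete an old pod first
    (unavailable >= 1) or if a free node can host the surge pod.
    """
    surge, unavailable = resolve_surge_and_unavailable(replicas, max_surge, max_unavailable)
    can_delete_first = unavailable >= 1
    surge_pod_fits = surge >= 1 and replicas < nodes
    return not (can_delete_first or surge_pod_fits)


print(resolve_surge_and_unavailable(1))   # (1, 0)
print(rollout_stuck(nodes=1, replicas=1))  # True: the single-node case
print(rollout_stuck(nodes=3, replicas=2))  # False: a free node takes the surge pod
```

With one replica, 25% resolves to a surge of 1 and an unavailability budget of 0, so the new pod must be scheduled before the old one is removed, and on a single node it never fits. On a multi-node control plane (for instance two replicas on three nodes, used here only as an illustration) the surge pod lands on a free node.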

This happens during the AWS SNO upgrade tests; see https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/17519/rehearse-17519-periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-sno/1380088237434867712 for an example.

A possible solution would be to switch the deployment to the Recreate strategy when it's a single-node install.
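The proposed fix amounts to picking the Deployment strategy from the cluster topology. A sketch, assuming the topology is read from the `controlPlaneTopology` value of the OpenShift Infrastructure status (`SingleReplica` on SNO). `deployment_strategy` is a hypothetical helper and not the change that was actually merged for this bug.

```python
def deployment_strategy(control_plane_topology):
    """Hypothetical helper: return a Deployment spec.strategy for the CSI controller.

    control_plane_topology is assumed to carry the Infrastructure status
    value, e.g. "HighlyAvailable" or "SingleReplica".
    """
    if control_plane_topology == "SingleReplica":
        # Recreate deletes the old pod before creating the new one, so the
        # host ports are free when the new pod is scheduled.
        return {"type": "Recreate"}
    # Multi-node clusters keep the default rolling update behaviour.
    return {
        "type": "RollingUpdate",
        "rollingUpdate": {"maxSurge": "25%", "maxUnavailable": "25%"},
    }


print(deployment_strategy("SingleReplica"))    # {'type': 'Recreate'}
print(deployment_strategy("HighlyAvailable")["type"])  # RollingUpdate
```

Kubernetes rejects a `rollingUpdate` block on a `Recreate` strategy, so the single-node branch returns only the type. Keeping RollingUpdate with `maxSurge: 0` and `maxUnavailable: 1` would also remove the old pod first; either way the controller is briefly down during the rollout on a single node.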

Comment 2 Jan Safranek 2021-04-20 14:43:01 UTC
We've been thinking about this for a while. Why do you run SNO clusters with the AWS cloud provider? I thought they were bare metal by definition.

Comment 4 Vadim Rutkovsky 2021-04-20 15:16:04 UTC
(In reply to Jan Safranek from comment #2)
> We've been thinking about this for a while. Why do you run SNO clusters with
> the AWS cloud provider? I thought they were bare metal by definition.

SNO will be supported (as a Tech Preview) on bare-metal UPI for customers, but bare metal is expensive in CI. Making upgrades work on AWS would let us add this test to many affected repos.

At the moment we're using Azure, but I assume it may eventually hit a similar bug.

Comment 5 Wei Duan 2021-04-26 09:26:59 UTC
Noting here that we hit the same issue on a freshly installed AWS cluster; we will follow up on this BZ.

Comment 6 Fabio Bertinatto 2021-06-04 16:22:43 UTC
Everything is merged, except for https://github.com/openshift/api/pull/929, which isn't really a blocker.

Moving to MODIFIED.

Comment 8 Chao Yang 2021-06-11 09:28:15 UTC
oc get pods
NAME                                             READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-8454cc878d-92vcg   11/11   Running   16         141m
aws-ebs-csi-driver-node-pxdst                    3/3     Running   0          141m
aws-ebs-csi-driver-operator-fdf948697-qpjmq      1/1     Running   2          142m


Comment 9 Chao Yang 2021-06-11 16:08:17 UTC
Upgraded to 4.8.0-0.nightly-2021-06-11-024306
oc get co storage
storage                                    4.8.0-0.nightly-2021-06-11-024306   True        False         False      24m

oc get pods -n openshift-cluster-csi-drivers
NAME                                            READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-dfdd67b49-jzgjv   11/11   Running   0          2m38s
aws-ebs-csi-driver-node-js8kk                   3/3     Running   0          14s
aws-ebs-csi-driver-operator-7fffb85749-qgvpd    1/1     Running   0          3m50s
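The verification in Comments 8 and 9 comes down to reading the READY and STATUS columns. A minimal sketch of that check, assuming plain `oc get pods` table output with no spaces inside the NAME, READY or STATUS values; `all_pods_ready` is a hypothetical helper, not part of any OpenShift tooling.

```python
def all_pods_ready(oc_output):
    """Return True if every pod row is Running with all containers ready (e.g. 11/11)."""
    lines = [line for line in oc_output.strip().splitlines() if line.strip()]
    header, rows = lines[0].split(), lines[1:]
    ready_col, status_col = header.index("READY"), header.index("STATUS")
    for row in rows:
        fields = row.split()
        ready, total = fields[ready_col].split("/")
        if fields[status_col] != "Running" or ready != total:
            return False
    return True


sample = """
NAME                                            READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-dfdd67b49-jzgjv   11/11   Running   0          2m38s
aws-ebs-csi-driver-node-js8kk                   3/3     Running   0          14s
aws-ebs-csi-driver-operator-7fffb85749-qgvpd    1/1     Running   0          3m50s
"""
print(all_pods_ready(sample))  # True
```

For scripted checks, `oc rollout status` on the controller Deployment is likely more robust than parsing table output, since it waits for the rollout itself to finish.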

Comment 12 errata-xmlrpc 2021-07-27 22:58:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.