Bug 1947402
| Summary: | Single Node cluster upgrade: AWS EBS CSI driver deployment is stuck on rollout | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vadim Rutkovsky <vrutkovs> |
| Component: | Storage | Assignee: | Fabio Bertinatto <fbertina> |
| Storage sub component: | Operators | QA Contact: | Chao Yang <chaoyang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | high | CC: | aos-bugs, chaoyang, dhellmann, jsafrane, otuchfel, rfreiman, wduan |
| Version: | 4.8 | Keywords: | Upgrades |
| Target Milestone: | --- | | |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 22:58:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
We've been thinking about this for a while. Why do you run SNO clusters with the AWS cloud provider? I thought they were bare metal by definition.

(In reply to Jan Safranek from comment #2)
> We've been thinking about this for a while. Why do you run SNO clusters with
> the AWS cloud provider? I thought they were bare metal by definition.

SNO would be supported (in Tech Preview) on bare metal UPI for customers, but in CI that's expensive. Making the upgrade work on AWS would enable us to add this test to many affected repos. At the moment we're using Azure, but I assume it may eventually be affected by a similar bug.

Just noting here that we hit the same issue on a from-scratch AWS install; we will follow up on this BZ.

Everything is merged, except for https://github.com/openshift/api/pull/929, which isn't really a blocker. Moving to MODIFIED.

```
oc get pods
NAME                                             READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-8454cc878d-92vcg   11/11   Running   16         141m
aws-ebs-csi-driver-node-pxdst                    3/3     Running   0          141m
aws-ebs-csi-driver-operator-fdf948697-qpjmq      1/1     Running   2          142m
```

Upgraded from 4.8.0-0.nightly-2021-06-10-224448 to 4.8.0-0.nightly-2021-06-11-024306:

```
oc get co storage
storage   4.8.0-0.nightly-2021-06-11-024306   True   False   False   24m

oc get pods -n openshift-cluster-csi-drivers
NAME                                            READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-dfdd67b49-jzgjv   11/11   Running   0          2m38s
aws-ebs-csi-driver-node-js8kk                   3/3     Running   0          14s
aws-ebs-csi-driver-operator-7fffb85749-qgvpd    1/1     Running   0          3m50s
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
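As context for the verification above: in OpenShift 4.8 the single-node case is surfaced through the `controlPlaneTopology` status field on the Infrastructure resource. Whether the merged operator change keys off exactly this field is an assumption here, but it is a quick way to confirm that a cluster under test is actually single-node:

```
# Query the cluster-scoped Infrastructure resource for its control plane topology
oc get infrastructure cluster -o jsonpath='{.status.controlPlaneTopology}'
# prints "SingleReplica" on a Single Node OpenShift cluster,
# "HighlyAvailable" on a standard multi-node cluster
```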
Description of problem:

`aws-ebs-csi-driver-controller` uses the RollingUpdate strategy, so on a single-node cluster a new deployment rollout will get stuck:

```
'0/1 nodes are available: 1 node(s) didn''t have free ports for the requested pod ports.'
```

This is happening during AWS SNO upgrade tests - see https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/17519/rehearse-17519-periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-sno/1380088237434867712 for example.

A possible solution would be switching the deployment to Recreate if it's a single-node install.
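A minimal sketch of what that proposed change looks like, if applied by hand (the operator owns this Deployment and would reconcile a manual edit away, so this only illustrates the shape of the change, not the shipped fix):

```
# Replace the whole strategy object in one JSON-patch operation, so the
# leftover rollingUpdate block does not conflict with type: Recreate
oc -n openshift-cluster-csi-drivers patch deployment/aws-ebs-csi-driver-controller \
  --type=json \
  -p='[{"op": "replace", "path": "/spec/strategy", "value": {"type": "Recreate"}}]'
```

With Recreate, the old controller pod is terminated before the replacement is scheduled, so the host ports it holds are freed and the single node can admit the new pod instead of reporting the "didn't have free ports" scheduling failure above.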