Bug 2077380
Summary: machine-api-provider-openstack does not clean up OSP ports after failed server provisioning
Product: OpenShift Container Platform
Component: Cloud Compute
Cloud Compute sub component: OpenStack Provider
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Assignee: Emilien Macchi <emacchi>
QA Contact: rlobillo
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: aos-bugs, gellner, m.andre, mbooth, mfedosin, pprinett, rlobillo
Version: 4.9
Keywords: Triaged
Target Release: 4.10.z
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2022-05-11 10:31:47 UTC
Bug Depends On: 2073398
Bug Blocks: 2077381
Description
OpenShift BugZilla Robot
2022-04-21 08:57:55 UTC
Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing

Verified on 4.10.13 on top of RHOS-16.2-RHEL-8-20220311.n.1.
On a running cluster with a single worker:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.13   True        False         140m    Cluster version is 4.10.13
$ openstack port list --network ostest-tq67q-openshift -c Name -c Status
+------------------------------------------------------------------+--------+
| Name | Status |
+------------------------------------------------------------------+--------+
| ostest-tq67q-master-2 | ACTIVE |
| ostest-tq67q-api-port | DOWN |
| ostest-tq67q-worker-0-zzs25-51fbcc18-5e0d-423f-8d62-8a1b2c561da5 | ACTIVE |
| ostest-tq67q-master-0 | ACTIVE |
| ostest-tq67q-master-1 | ACTIVE |
| | DOWN |
| ostest-tq67q-ingress-port | DOWN |
| | ACTIVE |
+------------------------------------------------------------------+--------+
Creating a new machine set with a bogus serverGroupID:
$ oc get machineset -n openshift-machine-api ostest-tq67q-worker-0 -o yaml > new_machineset.yaml
$ vi new_machineset.yaml
$ yq '.spec.template.spec.providerSpec.value.serverGroupID' < new_machineset.yaml
"abcd-1234"
Applying the change:
$ oc apply -f new_machineset.yaml
machineset.machine.openshift.io/ostest-tq67q-worker-0-bogus-servergroup created
$ oc get machineset -n openshift-machine-api
NAME                                      DESIRED   CURRENT   READY   AVAILABLE   AGE
ostest-tq67q-worker-0                     1         1         1       1           169m
ostest-tq67q-worker-0-bogus-servergroup   1         1                             2m49s
$ oc get machine -n openshift-machine-api
NAME                                            PHASE          TYPE        REGION      ZONE   AGE
ostest-tq67q-master-0                           Running                                       176m
ostest-tq67q-master-1                           Running                                       176m
ostest-tq67q-master-2                           Running                                       176m
ostest-tq67q-worker-0-bogus-servergroup-vng8p   Provisioning                                  76s
ostest-tq67q-worker-0-zzs25                     Running        m4.xlarge   regionOne   nova   171m
$ oc logs -n openshift-machine-api machine-api-controllers-55b5559cdb-zffn4 -c machine-controller
[...]
E0505 12:23:20.641945 1 controller.go:317] controller/machine_controller "msg"="Reconciler error" "error"="error creating Openstack instance: Group must be a UUID" "name"="ostest-tq67q-worker-0-bogus-servergroup-qsbtl" "namespace"="openshift-machine-api"
I0505 12:24:42.563174 1 controller.go:175] ostest-tq67q-worker-0-bogus-servergroup-qsbtl: reconciling Machine
I0505 12:24:43.081466 1 controller.go:386] ostest-tq67q-worker-0-bogus-servergroup-qsbtl: reconciling machine triggers idempotent create
>>> I0505 12:24:45.976603 1 machineservice.go:700] Deleted stale trunk "e1afa4ff-a0f6-487c-a36d-46257d405ea6"
>>> I0505 12:24:46.731079 1 machineservice.go:674] Deleted stale port "0f36f937-6ca6-42b1-8023-201a4b9854e2"
I0505 12:24:46.731644 1 logr.go:252] events "msg"="Warning" "message"="CreateError" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ostest-tq67q-worker-0-bogus-servergroup-qsbtl","uid":"3af985e4-51c9-4eff-8444-bd5afdc6aae8","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"120542"} "reason"="FailedCreate"
E0505 12:24:46.763590 1 actuator.go:415] Machine error ostest-tq67q-worker-0-bogus-servergroup-qsbtl: error creating Openstack instance: Group must be a UUID
W0505 12:24:46.763653 1 controller.go:388] ostest-tq67q-worker-0-bogus-servergroup-qsbtl: failed to create machine: error creating Openstack instance: Group must be a UUID
E0505 12:24:46.763781 1 controller.go:317] controller/machine_controller "msg"="Reconciler error" "error"="error creating Openstack instance: Group must be a UUID" "name"="ostest-tq67q-worker-0-bogus-servergroup-qsbtl" "namespace"="openshift-machine-api"
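The cleanup events marked with ">>>" above can be picked out of a long controller log automatically. The following is only a sketch: the helper name stale_deletions is hypothetical, and it assumes the message format stays exactly as in the two "Deleted stale ..." lines shown in this log.

```shell
# Hypothetical helper: read machine-controller log lines on stdin and print
# "<kind> <id>" for every stale port or trunk the controller reports deleting.
# Assumes the message format seen above: Deleted stale port "<uuid>"
stale_deletions() {
  sed -n -E 's/.*Deleted stale (port|trunk) "([^"]+)".*/\1 \2/p'
}

# Possible usage (pod name taken from the transcript above):
#   oc logs -n openshift-machine-api machine-api-controllers-55b5559cdb-zffn4 \
#     -c machine-controller | stale_deletions
```

An empty result after a FailedCreate event would suggest that the cleanup path did not run.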
The port for the bogus instance appears for a moment, but it is removed after the instance creation fails:
$ openstack port list --network ostest-tq67q-openshift -c Name -c Status
+------------------------------------------------------------------------------------+--------+
| Name | Status |
+------------------------------------------------------------------------------------+--------+
| ostest-tq67q-master-2 | ACTIVE |
| ostest-tq67q-api-port | DOWN |
| ostest-tq67q-worker-0-zzs25-51fbcc18-5e0d-423f-8d62-8a1b2c561da5 | ACTIVE |
| ostest-tq67q-worker-0-bogus-servergroup-qsbtl-51fbcc18-5e0d-423f-8d62-8a1b2c561da5 | DOWN |
| ostest-tq67q-master-0 | ACTIVE |
| ostest-tq67q-master-1 | ACTIVE |
| | DOWN |
| ostest-tq67q-ingress-port | DOWN |
| | ACTIVE |
+------------------------------------------------------------------------------------+--------+
...after a few seconds:
$ openstack port list --network ostest-tq67q-openshift -c Name -c Status
+------------------------------------------------------------------+--------+
| Name | Status |
+------------------------------------------------------------------+--------+
| ostest-tq67q-master-2 | ACTIVE |
| ostest-tq67q-api-port | DOWN |
| ostest-tq67q-worker-0-zzs25-51fbcc18-5e0d-423f-8d62-8a1b2c561da5 | ACTIVE |
| ostest-tq67q-master-0 | ACTIVE |
| ostest-tq67q-master-1 | ACTIVE |
| | DOWN |
| ostest-tq67q-ingress-port | DOWN |
| | ACTIVE |
+------------------------------------------------------------------+--------+
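The manual before/after comparison of the port listings could also be scripted. This is a minimal sketch under stated assumptions: the function names leftover_ports and ports_cleaned_up are hypothetical, and the check relies on port names containing the machine set name, as they do in the listings above.

```shell
# Hypothetical helper: read port names on stdin (one per line) and print
# those that contain the given machine set name.
leftover_ports() {
  # -F matches the name literally; "|| true" keeps an empty result from
  # counting as a failure when the caller runs under "set -e".
  grep -F -- "$1" || true
}

# Hypothetical helper: succeed when no port for the machine set remains,
# otherwise report the leftovers on stderr and fail.
ports_cleaned_up() {
  remaining="$(leftover_ports "$1")"
  if [ -n "$remaining" ]; then
    printf '%s\n' "$remaining" | sed 's/^/leftover port: /' >&2
    return 1
  fi
}

# Possible usage with the network and machine set from this report:
#   openstack port list --network ostest-tq67q-openshift -c Name -f value \
#     | ports_cleaned_up ostest-tq67q-worker-0-bogus-servergroup
```

Because the port only disappears after the failed create is reconciled, such a check would have to be retried for a short while before a leftover is treated as a failure.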
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.10.13 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:1690