Created attachment 1852582
must-gather

Description of problem:

Deploying a cluster with the following section in install-config.yaml:

    compute:
    - name: worker
      platform:
        openstack:
          zones: ['AZ-0', 'AZ-0', 'AZ-0']
          additionalNetworkIDs: ['ef087939-421b-4b4e-ba79-08406ec461b1']
          rootVolume:
            size: 25
            type: tripleo
            zones: ['cinderAZ0', 'cinderAZ1', 'cinderAZ0']
      replicas: 3
    controlPlane:
      name: master
      platform:
        openstack:
          zones: ['AZ-0', 'AZ-0', 'AZ-0']
          rootVolume:
            size: 25
            type: tripleo
            zones: ['cinderAZ0', 'cinderAZ1', 'cinderAZ0']
      replicas: 3

does not add any workers to the cluster. The worker machines are stuck in the Provisioning phase:

    $ oc get machines -n openshift-machine-api
    NAME                          PHASE          TYPE        REGION      ZONE   AGE
    ostest-5548v-master-0         Running        m4.xlarge   regionOne   AZ-0   63m
    ostest-5548v-master-1         Running        m4.xlarge   regionOne   AZ-0   63m
    ostest-5548v-master-2         Running        m4.xlarge   regionOne   AZ-0   63m
    ostest-5548v-worker-0-f2mmj   Provisioning                                  43m
    ostest-5548v-worker-1-pdthb   Provisioning                                  43m
    ostest-5548v-worker-2-xwv9d   Provisioning                                  43m

The machine-controller continuously logs the following errors:

    $ oc logs -n openshift-machine-api machine-api-controllers-77b5487964-mqpzp machine-controller -f
    [...]
    E0121 17:20:43.424817 1 instance.go:204] capo-compute "msg"="failed to clean up ports after failure" "error"="Expected HTTP response code [] when accessing [DELETE https://10.46.44.10:13696/v2.0/ports/5b2484b7-6552-4482-876d-f7d2cf1ec279], but got 409 instead\n{\"NeutronError\": {\"type\": \"PortInUseAsTrunkParent\", \"message\": \"Port 5b2484b7-6552-4482-876d-f7d2cf1ec279 is currently a parent port for trunk 395abf86-4b9b-41c9-9e40-def14af36324.\", \"detail\": \"\"}}" "cluster"="openshift-machine-api-ostest-5548v" "machine"="ostest-5548v-worker-1-pdthb"
    I0121 17:20:43.425514 1 logr.go:252] events "msg"="Warning" "message"="CreateError" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ostest-5548v-worker-1-pdthb","uid":"c90eb71e-260d-4358-b077-2583eacb0635","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"18311"} "reason"="FailedCreate"
    E0121 17:20:43.595313 1 actuator.go:441] Machine error ostest-5548v-worker-1-pdthb: error creating Openstack instance: error creating Openstack instance: Bad request with: [POST https://10.46.44.10:13774/v2.1/servers], error message: {"badRequest": {"code": 400, "message": "Block Device Mapping is Invalid: failed to get image ostest-5548v-rhcos."}}
    W0121 17:20:43.595359 1 controller.go:388] ostest-5548v-worker-1-pdthb: failed to create machine: error creating Openstack instance: error creating Openstack instance: Bad request with: [POST https://10.46.44.10:13774/v2.1/servers], error message: {"badRequest": {"code": 400, "message": "Block Device Mapping is Invalid: failed to get image ostest-5548v-rhcos."}}
    E0121 17:20:43.595419 1 controller.go:317] controller/machine_controller "msg"="Reconciler error" "error"="error creating Openstack instance: error creating Openstack instance: Bad request with: [POST https://10.46.44.10:13774/v2.1/servers], error message: {\"badRequest\": {\"code\": 400, \"message\": \"Block Device Mapping is Invalid: failed to get image ostest-5548v-rhcos.\"}}" "name"="ostest-5548v-worker-1-pdthb" "namespace"="openshift-machine-api"

The same works if the TechPreview feature gate is not enabled.

    $ oc get machines -n openshift-machine-api ostest-5548v-worker-0-f2mmj -o json | jq .spec.providerSpec.value.rootVolume
    {
      "availabilityZone": "cinderAZ0",
      "deviceType": "",
      "diskSize": 25,
      "sourceType": "image",
      "sourceUUID": "ostest-5548v-rhcos",
      "volumeType": "tripleo"
    }
    $ oc get machines -n openshift-machine-api ostest-5548v-worker-0-f2mmj -o json | jq .spec.providerSpec.value.image
    ""

Version-Release number of selected component (if applicable):

How reproducible: Always

Steps to Reproduce:
1. Install an OCP cluster with the TechPreview features enabled, including an install-config.yaml section as detailed above.

Actual results: Workers are not added to the cluster.

Expected results: Installation successful.

Additional info: must-gather attached
The volumes are created on OpenStack only for the masters:

    $ o volume list
    +--------------------------------------+-----------------------+--------+------+------------------------------------------------+
    | ID                                   | Name                  | Status | Size | Attached to                                    |
    +--------------------------------------+-----------------------+--------+------+------------------------------------------------+
    | a4c14d0f-c88c-4869-8e7d-cae18dd2ed49 | ostest-5548v-master-0 | in-use |   25 | Attached to ostest-5548v-master-0 on /dev/vda  |
    | 76e38ae9-4d1b-4ab9-97ff-dbe438aa3417 | ostest-5548v-master-1 | in-use |   25 | Attached to ostest-5548v-master-1 on /dev/vda  |
    | 90375fe7-9847-4d3e-b4b1-d63c4d04c98e | ostest-5548v-master-2 | in-use |   25 | Attached to ostest-5548v-master-2 on /dev/vda  |
    +--------------------------------------+-----------------------+--------+------+------------------------------------------------+
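The comparison between machines and volumes can be scripted. The following is a minimal sketch, not part of the original report. It assumes that each root volume carries the same name as its machine, which is only what the listing above shows for the masters. The function name `machines_without_volume` is hypothetical.

```python
def machines_without_volume(machine_names, volume_names):
    """Return the machine names that have no identically named volume.

    Assumption: a machine's root volume is named after the machine, as
    observed for the masters in the volume listing. The result is sorted
    so the output is stable.
    """
    return sorted(set(machine_names) - set(volume_names))


# Names taken from the `oc get machines` and volume listings in this report.
machines = [
    "ostest-5548v-master-0",
    "ostest-5548v-master-1",
    "ostest-5548v-master-2",
    "ostest-5548v-worker-0-f2mmj",
    "ostest-5548v-worker-1-pdthb",
    "ostest-5548v-worker-2-xwv9d",
]
volumes = [
    "ostest-5548v-master-0",
    "ostest-5548v-master-1",
    "ostest-5548v-master-2",
]

for name in machines_without_volume(machines, volumes):
    print(name)
```

With the names from this report, only the three workers are listed.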
OCP version: 4.10.0-0.nightly-2022-01-21-074618
This is likely caused by Matt's latest work on volume AZ [1] not yet being available downstream.

[1] https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1030
Yes, this is a known limitation. Already addressed upstream, we just need to integrate it in MAPO now.
Verified with OCP 4.11.0-0.nightly-2022-03-29-152521 on top of RHOS-16.1-RHEL-8-20220315.n.1. (MAPO is the default for OpenStack deployments on this version.)

Verification steps:

Deploying a cluster with AZ and root volumes in the install-config.yaml:

```
apiVersion: v1
baseDomain: "shiftstack.com"
compute:
- name: worker
  platform:
    openstack:
      zones: ['AZhci-0', 'AZhci-1', 'AZhci-2']
      additionalNetworkIDs: ['f8b46595-abf1-43cb-b8ca-fdb2aa531c07']
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0', 'cinderAZ1', 'cinderAZ0']
  replicas: 3
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZhci-0', 'AZhci-1', 'AZhci-2']
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0', 'cinderAZ1', 'cinderAZ0']
  replicas: 3
```

The OpenShift installer finished successfully, and the machines are running.
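The "machines are running" check can also be scripted against the JSON produced by `oc get machines -n openshift-machine-api -o json`. The following is a minimal sketch, not the procedure used for this verification. The function name `machines_not_running` is hypothetical. The field paths `.status.phase`, `.spec.providerSpec.value.rootVolume` and `.spec.providerSpec.value.image` are assumed to match the Machine objects shown earlier in this report.

```python
def machines_not_running(machine_list):
    """Summarize every Machine whose phase is not 'Running'.

    `machine_list` is the parsed JSON of `oc get machines -o json`,
    i.e. a dict with an "items" list. An empty result means that all
    machines report the Running phase.
    """
    summaries = []
    for item in machine_list.get("items", []):
        phase = item.get("status", {}).get("phase", "")
        if phase == "Running":
            continue
        value = item.get("spec", {}).get("providerSpec", {}).get("value", {})
        summaries.append({
            "name": item["metadata"]["name"],
            "phase": phase,
            # Fields the reporter inspected with jq for the stuck workers.
            "image": value.get("image"),
            "rootVolume": value.get("rootVolume"),
        })
    return summaries


# Hand-written sample that mirrors two machines from the original report.
sample = {
    "items": [
        {
            "metadata": {"name": "ostest-5548v-master-0"},
            "status": {"phase": "Running"},
        },
        {
            "metadata": {"name": "ostest-5548v-worker-0-f2mmj"},
            "spec": {"providerSpec": {"value": {
                "image": "",
                "rootVolume": {
                    "availabilityZone": "cinderAZ0",
                    "sourceUUID": "ostest-5548v-rhcos",
                },
            }}},
            "status": {"phase": "Provisioning"},
        },
    ]
}

for summary in machines_not_running(sample):
    print(summary["name"], summary["phase"])
```

To run it against a live cluster, the sample dict could be replaced with `json.load(sys.stdin)` and the `oc` output piped into the script.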
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069