Bug 1906742
Summary: | [gcp]Machine should be "Failed" when creating a machine with invalid zone | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | sunzhaohua <zhsun> |
Component: | Cloud Compute | Assignee: | Samuel Stuchly <sstuchly> |
Cloud Compute sub component: | Other Providers | QA Contact: | sunzhaohua <zhsun> |
Status: | CLOSED WONTFIX | Docs Contact: | |
Severity: | low | | |
Priority: | low | CC: | mimccune, sstuchly |
Version: | 4.7 | | |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-11-26 11:51:05 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
sunzhaohua
2020-12-11 10:49:05 UTC
I believe the logs pasted in this example are actually from the MachineSet controller rather than the Machine controller. We will need to try to reproduce this and grab logs from the Machine controller instead.

    $ oc logs -f machine-api-controllers-bdbc54576-dzjds -c machine-controller
    I1216 03:53:11.900999 1 controller.go:81] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsungcp16-dczjm-worker-c" "namespace"="openshift-machine-api"
    I1216 03:53:11.970222 1 controller.go:171] zhsungcp16-dczjm-worker-c-r88kz: reconciling Machine
    I1216 03:53:11.993600 1 controller.go:261] controller "msg"="Successfully Reconciled" "controller"="machine_controller" "name"="zhsungcp16-dczjm-worker-c-r88kz" "namespace"="openshift-machine-api"
    I1216 03:53:11.993677 1 controller.go:171] zhsungcp16-dczjm-worker-c-r88kz: reconciling Machine
    I1216 03:53:11.993690 1 actuator.go:84] zhsungcp16-dczjm-worker-c-r88kz: Checking if machine exists
    E1216 03:53:12.017657 1 controller.go:104] controllers/MachineSet "msg"="Failed to reconcile MachineSet" "error"="error fetching machine type \"n1-standard-4\": error fetching machine type \"n1-standard-4\" in zone \"us-central1-c-invalid\": googleapi: Error 400: Invalid value for field 'zone': 'us-central1-c-invalid'. Unknown zone., invalid" "machineset"="zhsungcp16-dczjm-worker-c" "namespace"="openshift-machine-api"
    I1216 03:53:12.019731 1 recorder.go:52] controller-runtime/manager/events "msg"="Warning" "message"="error fetching machine type \"n1-standard-4\": error fetching machine type \"n1-standard-4\" in zone \"us-central1-c-invalid\": googleapi: Error 400: Invalid value for field 'zone': 'us-central1-c-invalid'. Unknown zone., invalid" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"zhsungcp16-dczjm-worker-c","uid":"b599fc79-bdc4-4629-8e00-c0d9bfe9836f","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"66956"} "reason"="ReconcileError"
    I1216 03:53:12.039892 1 controller.go:261] controller "msg"="Successfully Reconciled" "controller"="machineset" "name"="zhsungcp16-dczjm-worker-c" "namespace"="openshift-machine-api" "reconcilerGroup"="machine.openshift.io" "reconcilerKind"="MachineSet"
    I1216 03:53:12.040124 1 controller.go:81] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsungcp16-dczjm-worker-c" "namespace"="openshift-machine-api"
    E1216 03:53:12.098246 1 controller.go:104] controllers/MachineSet "msg"="Failed to reconcile MachineSet" "error"="error fetching machine type \"n1-standard-4\": error fetching machine type \"n1-standard-4\" in zone \"us-central1-c-invalid\": googleapi: Error 400: Invalid value for field 'zone': 'us-central1-c-invalid'. Unknown zone., invalid" "machineset"="zhsungcp16-dczjm-worker-c" "namespace"="openshift-machine-api"
    I1216 03:53:12.098513 1 recorder.go:52] controller-runtime/manager/events "msg"="Warning" "message"="error fetching machine type \"n1-standard-4\": error fetching machine type \"n1-standard-4\" in zone \"us-central1-c-invalid\": googleapi: Error 400: Invalid value for field 'zone': 'us-central1-c-invalid'. Unknown zone., invalid" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"zhsungcp16-dczjm-worker-c","uid":"b599fc79-bdc4-4629-8e00-c0d9bfe9836f","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"66960"} "reason"="ReconcileError"
    E1216 03:53:12.117744 1 controller.go:274] zhsungcp16-dczjm-worker-c-r88kz: failed to check if machine exists: zhsungcp16-dczjm-worker-c-r88kz: Machine does not exist
    E1216 03:53:12.117836 1 controller.go:237] controller "msg"="Reconciler error" "error"="zhsungcp16-dczjm-worker-c-r88kz: Machine does not exist" "controller"="machine_controller" "name"="zhsungcp16-dczjm-worker-c-r88kz" "namespace"="openshift-machine-api"
    I1216 03:53:12.118632 1 controller.go:261] controller "msg"="Successfully Reconciled" "controller"="machineset" "name"="zhsungcp16-dczjm-worker-c" "namespace"="openshift-machine-api" "reconcilerGroup"="machine.openshift.io" "reconcilerKind"="MachineSet"
    I1216 03:53:13.118223 1 controller.go:171] zhsungcp16-dczjm-worker-c-r88kz: reconciling Machine
    I1216 03:53:13.118262 1 actuator.go:84] zhsungcp16-dczjm-worker-c-r88kz: Checking if machine exists
    E1216 03:53:13.309834 1 controller.go:274] zhsungcp16-dczjm-worker-c-r88kz: failed to check if machine exists: zhsungcp16-dczjm-worker-c-r88kz: Machine does not exist
    E1216 03:53:13.310026 1 controller.go:237] controller "msg"="Reconciler error" "error"="zhsungcp16-dczjm-worker-c-r88kz: Machine does not exist" "controller"="machine_controller" "name"="zhsungcp16-dczjm-worker-c-r88kz" "namespace"="openshift-machine-api"
    I1216 03:53:14.310447 1 controller.go:171] zhsungcp16-dczjm-worker-c-r88kz: reconciling Machine

We should be able to detect a broken zone based on the 400 response from the exists call.
Let's aim to fix this during the next release. Setting the target to --- until the 4.8 target is created.

Master is now open for 4.8 fixes, so we can start looking into this now.

We need some way to identify that the zone does not exist and to mark the machine as Failed only when the machine has not yet been created. The originally proposed solution was too broad: it would mark the machine as Failed whenever the exists check failed. Perhaps instead we can make sure that the create call fails correctly when there is an invalid zone; I believe this would be safer.

Sam explored the various ways we could potentially fix this issue, but each one carried a risk of leaking instances, which we cannot accept. Unfortunately, the safest route for now is to leave this bug as is: users will be warned when they have made a mistake and should be able to fix it. The only potential fix here would be to make the zone immutable once created, but that cannot be guaranteed, because it would have to be enforced via a webhook.
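The immutability idea can be sketched as the check a validating admission webhook might run on Machine updates. `gcpProviderSpec` and `validateZoneUpdate` are hypothetical names. A real webhook would first decode the old and new provider specs from the admission request. As noted above, this only protects the cluster while the webhook is actually running, so it cannot be the sole safeguard against leaked instances.

```go
package main

import "fmt"

// gcpProviderSpec is a hypothetical, trimmed-down view of the GCP provider
// spec; only the zone matters for this check.
type gcpProviderSpec struct {
	Zone string
}

// validateZoneUpdate rejects an update that changes the zone of an existing
// Machine. A validating webhook would call something like this with the old
// and new provider specs decoded from the admission request.
func validateZoneUpdate(oldSpec, newSpec gcpProviderSpec) error {
	if oldSpec.Zone != newSpec.Zone {
		return fmt.Errorf("providerSpec.zone is immutable: cannot change %q to %q",
			oldSpec.Zone, newSpec.Zone)
	}
	return nil
}

func main() {
	oldSpec := gcpProviderSpec{Zone: "us-central1-c"}

	// Unchanged zone: accepted, so a nil error is printed.
	fmt.Println(validateZoneUpdate(oldSpec, gcpProviderSpec{Zone: "us-central1-c"}))

	// Changed zone: rejected with an explanatory error.
	fmt.Println(validateZoneUpdate(oldSpec, gcpProviderSpec{Zone: "us-central1-c-invalid"}))
}
```

Blocking zone edits means a Machine never starts looking for its instance in a different zone from the one it was created in. Correcting a bad zone then requires deleting and recreating the Machine rather than editing it.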