Description of problem:
Create a machine with the invalid label "machine.openshift.io/cluster-api-cluster: zhsun3-8vcmx-invalid". The machine is created successfully and the instance joins the cluster.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-14-211610

How reproducible:
Always

Steps to Reproduce:
1. Create a machine with an invalid label:

apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: zhsun3-8vcmx-invalid
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: zhsun3-8vcmx-w-a-1
  namespace: openshift-machine-api
spec:
  metadata:
    creationTimestamp: null
  providerSpec:
    value:
      apiVersion: gcpprovider.openshift.io/v1beta1
      canIPForward: false
      credentialsSecret:
        name: gcp-cloud-credentials
      deletionProtection: false
      disks:
      - autoDelete: true
        boot: true
        image: zhsun3-8vcmx-rhcos-image
        labels: null
        sizeGb: 128
        type: pd-ssd
      kind: GCPMachineProviderSpec
      machineType: n1-standard-4
      metadata:
        creationTimestamp: null
      networkInterfaces:
      - network: zhsun3-8vcmx-network
        subnetwork: zhsun3-8vcmx-worker-subnet
      projectID: openshift-gce-devel
      region: us-central1
      serviceAccounts:
      - email: zhsun3-8vcmx-w.gserviceaccount.com
        scopes:
        - https://www.googleapis.com/auth/cloud-platform
      tags:
      - zhsun3-8vcmx-worker
      userDataSecret:
        name: worker-user-data
      zone: us-central1-a

2. Check the machine, node, and machine-controller logs.

Actual results:
The machine is created successfully and the instance joins the cluster.

$ oc get machine
NAME                     STATE   TYPE   REGION   ZONE   AGE
zhsun3-8vcmx-m-0                                        6h57m
zhsun3-8vcmx-m-1                                        6h57m
zhsun3-8vcmx-m-2                                        6h57m
zhsun3-8vcmx-w-a-1                                      27m
zhsun3-8vcmx-w-a-tdkpr                                  6h56m
zhsun3-8vcmx-w-b-8dh2m                                  4h34m
zhsun3-8vcmx-w-c-k5fvd                                  4h37m

$ oc get node
NAME                                                    STATUS   ROLES    AGE     VERSION
zhsun3-8vcmx-m-0.c.openshift-gce-devel.internal         Ready    master   6h39m   v1.14.0+24b552f85
zhsun3-8vcmx-m-1.c.openshift-gce-devel.internal         Ready    master   6h39m   v1.14.0+24b552f85
zhsun3-8vcmx-m-2.c.openshift-gce-devel.internal         Ready    master   6h40m   v1.14.0+24b552f85
zhsun3-8vcmx-w-a-1.c.openshift-gce-devel.internal       Ready    worker   3m6s    v1.14.0+24b552f85
zhsun3-8vcmx-w-a-tdkpr.c.openshift-gce-devel.internal   Ready    worker   6h31m   v1.14.0+24b552f85
zhsun3-8vcmx-w-b-8dh2m.c.openshift-gce-devel.internal   Ready    worker   4h10m   v1.14.0+24b552f85
zhsun3-8vcmx-w-c-k5fvd.c.openshift-gce-devel.internal   Ready    worker   4h13m   v1.14.0+24b552f85

I0815 07:41:00.579531 1 controller.go:141] Reconciling Machine "zhsun3-8vcmx-w-a-1"
I0815 07:41:00.579583 1 controller.go:310] Machine "zhsun3-8vcmx-w-a-1" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0815 07:41:00.579596 1 actuator.go:80] zhsun3-8vcmx-w-a-1: Checking if machine exists
I0815 07:41:01.154451 1 reconciler.go:258] zhsun3-8vcmx-w-a-1: Machine does not exist
I0815 07:41:01.154502 1 controller.go:259] Reconciling machine object zhsun3-8vcmx-w-a-1 triggers idempotent create.
I0815 07:41:01.154509 1 actuator.go:62] zhsun3-8vcmx-w-a-1: Creating machine
I0815 07:41:02.522064 1 reconciler.go:151] zhsun3-8vcmx-w-a-1: Reconciling machine object with cloud state
I0815 07:41:02.683244 1 reconciler.go:197] zhsun3-8vcmx-w-a-1: machine status is "PROVISIONING", requeuing...
W0815 07:41:02.683333 1 controller.go:261] Failed to create machine "zhsun3-8vcmx-w-a-1": requeue in: 20s
I0815 07:41:02.683346 1 controller.go:364] Actuator returned requeue-after error: requeue in: 20s
I0815 07:41:22.683673 1 controller.go:141] Reconciling Machine "zhsun3-8vcmx-w-a-1"
I0815 07:41:22.683711 1 controller.go:310] Machine "zhsun3-8vcmx-w-a-1" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0815 07:41:22.683722 1 actuator.go:80] zhsun3-8vcmx-w-a-1: Checking if machine exists
I0815 07:41:23.020515 1 reconciler.go:253] Machine "zhsun3-8vcmx-w-a-1" already exists
I0815 07:41:23.020666 1 controller.go:250] Reconciling machine "zhsun3-8vcmx-w-a-1" triggers idempotent update
I0815 07:41:23.020674 1 actuator.go:98] zhsun3-8vcmx-w-a-1: Updating machine
I0815 07:41:23.021227 1 reconciler.go:151] zhsun3-8vcmx-w-a-1: Reconciling machine object with cloud state
I0815 07:41:23.249948 1 controller.go:141] Reconciling Machine "zhsun3-8vcmx-w-a-1"
I0815 07:41:23.251611 1 controller.go:310] Machine "zhsun3-8vcmx-w-a-1" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0815 07:41:23.251756 1 actuator.go:80] zhsun3-8vcmx-w-a-1: Checking if machine exists
I0815 07:41:23.627532 1 reconciler.go:253] Machine "zhsun3-8vcmx-w-a-1" already exists
I0815 07:41:23.627710 1 controller.go:250] Reconciling machine "zhsun3-8vcmx-w-a-1" triggers idempotent update
I0815 07:41:23.627753 1 actuator.go:98] zhsun3-8vcmx-w-a-1: Updating machine
I0815 07:41:23.628281 1 reconciler.go:151] zhsun3-8vcmx-w-a-1: Reconciling machine object with cloud state
E0815 07:41:23.786336 1 controller.go:252] Error updating machine "openshift-machine-api/zhsun3-8vcmx-w-a-1": [machinescope] failed to update machine "zhsun3-8vcmx-w-a-1" in namespace "openshift-machine-api": Operation cannot be fulfilled on machines.machine.openshift.io "zhsun3-8vcmx-w-a-1": the object has been modified; please apply your changes to the latest version and try again
I0815 07:41:24.786765 1 controller.go:141] Reconciling Machine "zhsun3-8vcmx-w-a-1"
I0815 07:41:24.786822 1 controller.go:310] Machine "zhsun3-8vcmx-w-a-1" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0815 07:41:24.786839 1 actuator.go:80] zhsun3-8vcmx-w-a-1: Checking if machine exists
I0815 07:41:25.106501 1 reconciler.go:253] Machine "zhsun3-8vcmx-w-a-1" already exists
I0815 07:41:25.106542 1 controller.go:250] Reconciling machine "zhsun3-8vcmx-w-a-1" triggers idempotent update
I0815 07:41:25.106597 1 actuator.go:98] zhsun3-8vcmx-w-a-1: Updating machine
I0815 07:41:25.107248 1 reconciler.go:151] zhsun3-8vcmx-w-a-1: Reconciling machine object with cloud state

Expected results:
The machine-controller logs should report that the label is not correct.

Additional info:
The machine.openshift.io/cluster-api-cluster label is completely ignored by the GCP machine controller. On AWS, the label is used to tag instances to state cluster ownership. On GCP it seems we don't do this yet. We should do the same for GCP using instance labels: https://cloud.google.com/compute/docs/labeling-resources
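As a rough illustration of what labelling GCP instances for cluster ownership could look like, here is a minimal Go sketch. The key format "kubernetes-io-cluster-<id>", the value "owned", and the function name ownershipLabels are assumptions made for illustration (loosely modelled on the AWS "kubernetes.io/cluster/<id>: owned" tag, with the dots and slashes replaced because GCP label keys do not allow them). The regular expression only approximates GCP's documented label key rules; it is not taken from the provider code.

```go
package main

import (
	"fmt"
	"regexp"
)

// gcpLabelKeyRe approximates GCP's label key rules: start with a lowercase
// letter, then lowercase letters, digits, underscores, or dashes, for at
// most 63 characters in total.
var gcpLabelKeyRe = regexp.MustCompile(`^[a-z][a-z0-9_-]{0,62}$`)

// ownershipLabels returns a label set marking an instance as owned by the
// given cluster. The key format and the "owned" value are hypothetical.
func ownershipLabels(clusterID string) (map[string]string, error) {
	key := "kubernetes-io-cluster-" + clusterID
	if !gcpLabelKeyRe.MatchString(key) {
		return nil, fmt.Errorf("cluster ID %q does not yield a valid GCP label key", clusterID)
	}
	return map[string]string{key: "owned"}, nil
}

func main() {
	labels, err := ownershipLabels("zhsun3-8vcmx")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(labels)
}
```

The resulting map would be passed as the labels of the instance at creation time; how that is wired into the actuator is left out here.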
Another thing we need to address for this: openshift-install destroy cluster only deletes instances that have the right name prefix. We need to coordinate with the installer to ensure we clean up all instances with the appropriate tags, and those tags should come from a cluster-level resource rather than a machine-level resource.
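To make the difference between the two cleanup strategies concrete, the following Go sketch compares selecting instances by name prefix with selecting them by an ownership label. The instance type, the helper names, the label key, and the instance "custom-worker-1" are all hypothetical; this is not the installer's actual logic.

```go
package main

import (
	"fmt"
	"strings"
)

// instance is a simplified, hypothetical stand-in for a cloud instance.
type instance struct {
	Name   string
	Labels map[string]string
}

// byPrefix selects instances whose name starts with "<clusterID>-".
func byPrefix(all []instance, clusterID string) []string {
	var names []string
	for _, in := range all {
		if strings.HasPrefix(in.Name, clusterID+"-") {
			names = append(names, in.Name)
		}
	}
	return names
}

// byLabel selects instances that carry the given ownership label key.
func byLabel(all []instance, key string) []string {
	var names []string
	for _, in := range all {
		if _, ok := in.Labels[key]; ok {
			names = append(names, in.Name)
		}
	}
	return names
}

func main() {
	const ownerKey = "kubernetes-io-cluster-zhsun3-8vcmx" // hypothetical key
	all := []instance{
		{Name: "zhsun3-8vcmx-w-a-1", Labels: map[string]string{ownerKey: "owned"}},
		{Name: "custom-worker-1", Labels: map[string]string{ownerKey: "owned"}},
	}
	fmt.Println("by prefix:", byPrefix(all, "zhsun3-8vcmx"))
	fmt.Println("by label: ", byLabel(all, ownerKey))
}
```

In this sketch the prefix filter misses "custom-worker-1", while the label filter finds both instances, which is the gap described above.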
The bug expects that the actuator should have failed machine creation because of the invalid label. A few points:
1. This looks like a machine object validation issue (validation we don't do at the moment) rather than an actuator issue.
2. The behaviour is consistent with AWS, i.e. if a machine is created with an invalid cluster name label, the node still gets registered.
3. Tagging GCP instances, as Jan suggested, should be done orthogonally to this BZ and is unrelated.
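For reference, the kind of machine object validation point 1 refers to could be sketched in Go as below. The function name validateClusterLabel is hypothetical, and so is the assumption that the caller can obtain the expected cluster ID from some cluster-level source; no such check exists in the controller as described in this report.

```go
package main

import "fmt"

// clusterLabel is the label key used on the Machine in this report.
const clusterLabel = "machine.openshift.io/cluster-api-cluster"

// validateClusterLabel rejects a machine whose cluster label is missing or
// does not match the expected cluster ID. Where expectedClusterID comes from
// is an assumption left to the caller.
func validateClusterLabel(labels map[string]string, expectedClusterID string) error {
	got, ok := labels[clusterLabel]
	if !ok {
		return fmt.Errorf("machine is missing label %q", clusterLabel)
	}
	if got != expectedClusterID {
		return fmt.Errorf("label %q is %q, expected %q", clusterLabel, got, expectedClusterID)
	}
	return nil
}

func main() {
	labels := map[string]string{clusterLabel: "zhsun3-8vcmx-invalid"}
	if err := validateClusterLabel(labels, "zhsun3-8vcmx"); err != nil {
		fmt.Println("validation failed:", err)
	}
}
```

Run against the machine from the reproduction steps, a check like this would reject the object before any instance is created.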
Instances get tagged through https://github.com/openshift/cluster-api-provider-gcp/pull/57
s/tag/labelled