Bug 1741759 - [gcp] Machine could be created successfully even label is not correct
Summary: [gcp] Machine could be created successfully even label is not correct
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.3.0
Assignee: Vikas Choudhary
QA Contact: sunzhaohua
URL:
Whiteboard: gcp
Depends On:
Blocks: 1742227
Reported: 2019-08-16 05:56 UTC by sunzhaohua
Modified: 2019-09-04 11:27 UTC
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1742227
Environment:
Last Closed: 2019-08-29 11:55:32 UTC
Target Upstream Version:
Embargoed:


Description sunzhaohua 2019-08-16 05:56:02 UTC
Description of problem:
Create a machine with the invalid label "machine.openshift.io/cluster-api-cluster: zhsun3-8vcmx-invalid". The machine is created successfully and the instance joins the cluster.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-14-211610

How reproducible:
Always

Steps to Reproduce:
1.  Create a machine with invalid label
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: zhsun3-8vcmx-invalid
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: zhsun3-8vcmx-w-a-1
  namespace: openshift-machine-api

spec:
  metadata:
    creationTimestamp: null
  providerSpec:
    value:
      apiVersion: gcpprovider.openshift.io/v1beta1
      canIPForward: false
      credentialsSecret:
        name: gcp-cloud-credentials
      deletionProtection: false
      disks:
      - autoDelete: true
        boot: true
        image: zhsun3-8vcmx-rhcos-image
        labels: null
        sizeGb: 128
        type: pd-ssd
      kind: GCPMachineProviderSpec
      machineType: n1-standard-4
      metadata:
        creationTimestamp: null
      networkInterfaces:
      - network: zhsun3-8vcmx-network
        subnetwork: zhsun3-8vcmx-worker-subnet
      projectID: openshift-gce-devel
      region: us-central1
      serviceAccounts:
      - email: zhsun3-8vcmx-w.gserviceaccount.com
        scopes:
        - https://www.googleapis.com/auth/cloud-platform
      tags:
      - zhsun3-8vcmx-worker
      userDataSecret:
        name: worker-user-data
      zone: us-central1-a
      
2. Check machine, node and machine-controller logs


Actual results:
The machine is created successfully and the instance joins the cluster.

$ oc get machine
NAME                     STATE   TYPE   REGION   ZONE   AGE
zhsun3-8vcmx-m-0                                        6h57m
zhsun3-8vcmx-m-1                                        6h57m
zhsun3-8vcmx-m-2                                        6h57m
zhsun3-8vcmx-w-a-1                                      27m
zhsun3-8vcmx-w-a-tdkpr                                  6h56m
zhsun3-8vcmx-w-b-8dh2m                                  4h34m
zhsun3-8vcmx-w-c-k5fvd                                  4h37m

$ oc get node
NAME                                                    STATUS   ROLES    AGE     VERSION
zhsun3-8vcmx-m-0.c.openshift-gce-devel.internal         Ready    master   6h39m   v1.14.0+24b552f85
zhsun3-8vcmx-m-1.c.openshift-gce-devel.internal         Ready    master   6h39m   v1.14.0+24b552f85
zhsun3-8vcmx-m-2.c.openshift-gce-devel.internal         Ready    master   6h40m   v1.14.0+24b552f85
zhsun3-8vcmx-w-a-1.c.openshift-gce-devel.internal       Ready    worker   3m6s    v1.14.0+24b552f85
zhsun3-8vcmx-w-a-tdkpr.c.openshift-gce-devel.internal   Ready    worker   6h31m   v1.14.0+24b552f85
zhsun3-8vcmx-w-b-8dh2m.c.openshift-gce-devel.internal   Ready    worker   4h10m   v1.14.0+24b552f85
zhsun3-8vcmx-w-c-k5fvd.c.openshift-gce-devel.internal   Ready    worker   4h13m   v1.14.0+24b552f85

I0815 07:41:00.579531       1 controller.go:141] Reconciling Machine "zhsun3-8vcmx-w-a-1"
I0815 07:41:00.579583       1 controller.go:310] Machine "zhsun3-8vcmx-w-a-1" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0815 07:41:00.579596       1 actuator.go:80] zhsun3-8vcmx-w-a-1: Checking if machine exists
I0815 07:41:01.154451       1 reconciler.go:258] zhsun3-8vcmx-w-a-1: Machine does not exist
I0815 07:41:01.154502       1 controller.go:259] Reconciling machine object zhsun3-8vcmx-w-a-1 triggers idempotent create.
I0815 07:41:01.154509       1 actuator.go:62] zhsun3-8vcmx-w-a-1: Creating machine
I0815 07:41:02.522064       1 reconciler.go:151] zhsun3-8vcmx-w-a-1: Reconciling machine object with cloud state
I0815 07:41:02.683244       1 reconciler.go:197] zhsun3-8vcmx-w-a-1: machine status is "PROVISIONING", requeuing...
W0815 07:41:02.683333       1 controller.go:261] Failed to create machine "zhsun3-8vcmx-w-a-1": requeue in: 20s
I0815 07:41:02.683346       1 controller.go:364] Actuator returned requeue-after error: requeue in: 20s
I0815 07:41:22.683673       1 controller.go:141] Reconciling Machine "zhsun3-8vcmx-w-a-1"
I0815 07:41:22.683711       1 controller.go:310] Machine "zhsun3-8vcmx-w-a-1" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0815 07:41:22.683722       1 actuator.go:80] zhsun3-8vcmx-w-a-1: Checking if machine exists
I0815 07:41:23.020515       1 reconciler.go:253] Machine "zhsun3-8vcmx-w-a-1" already exists
I0815 07:41:23.020666       1 controller.go:250] Reconciling machine "zhsun3-8vcmx-w-a-1" triggers idempotent update
I0815 07:41:23.020674       1 actuator.go:98] zhsun3-8vcmx-w-a-1: Updating machine
I0815 07:41:23.021227       1 reconciler.go:151] zhsun3-8vcmx-w-a-1: Reconciling machine object with cloud state
I0815 07:41:23.249948       1 controller.go:141] Reconciling Machine "zhsun3-8vcmx-w-a-1"
I0815 07:41:23.251611       1 controller.go:310] Machine "zhsun3-8vcmx-w-a-1" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0815 07:41:23.251756       1 actuator.go:80] zhsun3-8vcmx-w-a-1: Checking if machine exists
I0815 07:41:23.627532       1 reconciler.go:253] Machine "zhsun3-8vcmx-w-a-1" already exists
I0815 07:41:23.627710       1 controller.go:250] Reconciling machine "zhsun3-8vcmx-w-a-1" triggers idempotent update
I0815 07:41:23.627753       1 actuator.go:98] zhsun3-8vcmx-w-a-1: Updating machine
I0815 07:41:23.628281       1 reconciler.go:151] zhsun3-8vcmx-w-a-1: Reconciling machine object with cloud state
E0815 07:41:23.786336       1 controller.go:252] Error updating machine "openshift-machine-api/zhsun3-8vcmx-w-a-1": [machinescope] failed to update machine "zhsun3-8vcmx-w-a-1" in namespace "openshift-machine-api": Operation cannot be fulfilled on machines.machine.openshift.io "zhsun3-8vcmx-w-a-1": the object has been modified; please apply your changes to the latest version and try again
I0815 07:41:24.786765       1 controller.go:141] Reconciling Machine "zhsun3-8vcmx-w-a-1"
I0815 07:41:24.786822       1 controller.go:310] Machine "zhsun3-8vcmx-w-a-1" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0815 07:41:24.786839       1 actuator.go:80] zhsun3-8vcmx-w-a-1: Checking if machine exists
I0815 07:41:25.106501       1 reconciler.go:253] Machine "zhsun3-8vcmx-w-a-1" already exists
I0815 07:41:25.106542       1 controller.go:250] Reconciling machine "zhsun3-8vcmx-w-a-1" triggers idempotent update
I0815 07:41:25.106597       1 actuator.go:98] zhsun3-8vcmx-w-a-1: Updating machine
I0815 07:41:25.107248       1 reconciler.go:151] zhsun3-8vcmx-w-a-1: Reconciling machine object with cloud state

Expected results:
The machine-controller logs should report that the label is not correct.
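
A minimal Go sketch of the check the expected result asks for, under stated assumptions: `validateClusterLabel` is a hypothetical helper, not part of the machine-api code, and the expected cluster name (`zhsun3-8vcmx` here, inferred from the resource names in this report) is passed in rather than looked up from the cluster. Where such a check would live (the actuator or Machine object validation) is left open.

```go
package main

import "fmt"

// clusterLabelKey is the label key used in the manifest from this report.
const clusterLabelKey = "machine.openshift.io/cluster-api-cluster"

// validateClusterLabel is hypothetical: it checks that a Machine's
// cluster-api-cluster label is present and equals the expected cluster name.
// How the expected name is obtained is an assumption left to the caller.
func validateClusterLabel(labels map[string]string, infraName string) error {
	value, ok := labels[clusterLabelKey]
	if !ok {
		return fmt.Errorf("label %q is missing", clusterLabelKey)
	}
	if value != infraName {
		return fmt.Errorf("label %q is %q, expected %q", clusterLabelKey, value, infraName)
	}
	return nil
}

func main() {
	// Labels as in the reproducer manifest.
	labels := map[string]string{clusterLabelKey: "zhsun3-8vcmx-invalid"}
	if err := validateClusterLabel(labels, "zhsun3-8vcmx"); err != nil {
		fmt.Println("invalid machine:", err)
	}
}
```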

Additional info:

Comment 1 Jan Chaloupka 2019-08-16 10:43:51 UTC
The machine.openshift.io/cluster-api-cluster label is completely ignored by the GCP machine controller. On AWS, the label is used to tag instances to record cluster ownership. On GCP, it seems we don't do this yet.

We should do the same for GCP: https://cloud.google.com/compute/docs/labeling-resources

Comment 2 Michael Gugino 2019-08-16 16:51:13 UTC
Another thing we need to address for this: openshift-install destroy cluster only deletes instances that have the right name prefix. We need to coordinate with the installer to ensure we clean up all instances with the appropriate tags, and those tags should come from a cluster-level resource rather than a machine-level resource.

Comment 3 Vikas Choudhary 2019-08-29 11:55:10 UTC
The bug expects that the actuator should have failed machine creation because of the invalid label. Three things:
1. This looks like a Machine object validation issue, which we don't do at the moment, rather than an actuator issue.
2. The behaviour is consistent with AWS, i.e. if a machine is created with an invalid cluster name label, the node gets registered.
3. Tagging GCP instances, as Jan suggested, should be done orthogonally to this BZ and is unrelated.

Comment 4 Jan Chaloupka 2019-09-04 11:27:10 UTC
Instances are tagged through https://github.com/openshift/cluster-api-provider-gcp/pull/57

Comment 5 Jan Chaloupka 2019-09-04 11:27:26 UTC
s/tag/labelled

