Description of problem:
While attempting to create an additional machineset on an Azure cluster for a node to host workload testing (infra nodes would be another example), the machine fails to provision.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-07-31-162901

How reproducible:
For this build it is reproducible

Steps to Reproduce:
1. Deploy Azure cluster via IPI installer
2. Create a machineset with https://gist.github.com/akrzos/24880453b050047e11723c28ad778154
3. View logs from machine-controller pod

Actual results:
Error in logs -
Machine error: failed to reconcile machine "akrzos-test-w5t9j-workload-centralus1-6kznm"s: failed to create nic akrzos-test-w5t9j-workload-centralus1-6kznm-nic for machine akrzos-test-w5t9j-workload-centralus1-6kznm: unable to create Public IP: cannot create public ip: network.PublicIPAddressesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidDomainNameLabel" Message="The domain name label akrzos-test-w5t9j-akrzos-test-w5t9j-workload-centralus1-6kznm-publicip is invalid. It must conform to the following regular expression: ^[a-z][a-z0-9-]{1,61}[a-z0-9]$." Details=[]

Expected results:
Machineset to be created, and machine to be provisioned and added to the OCP cluster

Additional info:
Afterwards, attempting to delete the machineset results in a panic in the pod - https://gist.github.com/akrzos/e3617c5bb8be4bb15c2c9521542613d9
The problem is the public IP resource name:
akrzos-test-w5t9j-akrzos-test-w5t9j-workload-centralus1-6kznm-publicip

It must conform to the following regular expression:
^[a-z][a-z0-9-]{1,61}[a-z0-9]$.

So it can be at most 63 characters long. In your case it's 70 characters long. Is there any way to make the name shorter? I.e. making the machineset name shorter? The public IP name is constructed as CLUSTERID+MACHINENAME+"publicip".
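The length arithmetic can be checked directly. The following is a minimal Python sketch, not the provider's code (the provider itself is written in Go). It assumes the three parts are joined with hyphens, which matches the name in the error message; the helper name `public_ip_name` is hypothetical.

```python
import re

# Pattern quoted in the Azure error message: 1 + up to 61 + 1 = at most 63 characters.
DOMAIN_LABEL = re.compile(r"^[a-z][a-z0-9-]{1,61}[a-z0-9]$")


def public_ip_name(cluster_id: str, machine_name: str) -> str:
    """Hypothetical helper: join CLUSTERID, MACHINENAME and "publicip" with hyphens."""
    return f"{cluster_id}-{machine_name}-publicip"


name = public_ip_name(
    "akrzos-test-w5t9j",
    "akrzos-test-w5t9j-workload-centralus1-6kznm",
)
print(len(name))                       # 70, i.e. 7 characters over the limit
print(bool(DOMAIN_LABEL.match(name)))  # False
```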
we should also probably drop "publicip" from the name
(In reply to Jan Chaloupka from comment #1)
> The problem is the public IP resource name:
> akrzos-test-w5t9j-akrzos-test-w5t9j-workload-centralus1-6kznm-publicip
>
> It must conform to the following regular expression:
> ^[a-z][a-z0-9-]{1,61}[a-z0-9]$.
>
> So it can be at most 63 characters long. In your case it's 70 characters
> long. Is there any way to make the name shorter? I.e. making the machineset
> name shorter? The public IP name is constructed as
> CLUSTERID+MACHINENAME+"publicip".

I will use a shorter machineset name; however, I was just "copying" the same names as the worker node machinesets.
> we should also probably drop "publicip" from the name

Even that will not help. I don't see any other way but to generate the name randomly. We can still use the CLUSTERID prefix at least, and maybe the first xxx characters of the machine name, so that the fixed part is 50 characters long, and then randomize the last 13 characters.
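The proposal above could look roughly like the following Python sketch. It is only an illustration of the idea, not code from the provider: the helper name `random_public_ip_name` is hypothetical, and it assumes the cluster ID starts with a lowercase letter and that both inputs contain only lowercase letters, digits and hyphens.

```python
import re
import secrets
import string

DOMAIN_LABEL = re.compile(r"^[a-z][a-z0-9-]{1,61}[a-z0-9]$")
PREFIX_LEN = 50  # fixed, readable part of the name
SUFFIX_LEN = 13  # random part; 50 + 13 = 63, the maximum allowed length


def random_public_ip_name(cluster_id: str, machine_name: str) -> str:
    """Hypothetical helper: truncated CLUSTERID-MACHINENAME prefix plus a random suffix."""
    prefix = f"{cluster_id}-{machine_name}"[:PREFIX_LEN]
    alphabet = string.ascii_lowercase + string.digits
    # The suffix is alphanumeric, so the name never ends with a hyphen.
    suffix = "".join(secrets.choice(alphabet) for _ in range(SUFFIX_LEN))
    return prefix + suffix


candidate = random_public_ip_name(
    "akrzos-test-w5t9j",
    "akrzos-test-w5t9j-workload-centralus1-6kznm",
)
print(len(candidate))                       # 63
print(bool(DOMAIN_LABEL.match(candidate)))  # True
```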
How would reducing the number of characters not help? Also, what's the value of having "publicip" in the name?
> How would reducing the number of characters not help? Also, what's the value of having "publicip" in the name?

"Even that will not help" = "Even that will not be sufficient"
PR: https://github.com/openshift/cluster-api-provider-azure/pull/69
Another PR related to the issue was merged: https://github.com/openshift/cluster-api-provider-azure/pull/72

Instead of generating random names, we now return an error when the name is too long. In that case either the machine name generated by the machineset needs to be made shorter, or the publicIP field needs to be set to false.
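The behaviour described here (fail early instead of sending an invalid name to Azure) can be sketched in Python as follows. This is an illustration under assumptions, not the merged Go code: the function name and its `public_ip` parameter are hypothetical, and the hyphen-joined naming scheme is inferred from the name in the original error message.

```python
MAX_PUBLIC_IP_NAME_LEN = 63


def public_ip_name_for_create(cluster_id: str, machine_name: str, public_ip: bool):
    """Hypothetical helper: return the public IP name, or None if no public IP is requested."""
    if not public_ip:
        # publicIP is false: no public IP resource is needed, so the length does not matter.
        return None
    name = f"{cluster_id}-{machine_name}-publicip"
    if len(name) > MAX_PUBLIC_IP_NAME_LEN:
        # Reject the request up front rather than letting the Azure API return a 400.
        raise ValueError("machine public IP name is longer than 63 characters")
    return name


try:
    public_ip_name_for_create(
        "akrzos-test-w5t9j",
        "akrzos-test-w5t9j-workload-centralus1-6kznm",
        public_ip=True,
    )
except ValueError as err:
    print(f"create rejected: {err}")
```

With `public_ip=False` the same machine name is accepted, which corresponds to the second workaround mentioned above.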
@Jan, I created a machine set with "publicIP: true" and "name: zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5". The logs output "unable to create Public IP: machine public IP name is longer than 63 characters".

Then I deleted the machine zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5, but the machine couldn't be deleted.

I0823 02:17:42.160824 1 controller.go:141] Reconciling Machine "zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5"
I0823 02:17:42.160861 1 controller.go:310] Machine "zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0823 02:17:42.160876 1 actuator.go:200] Checking if machine zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5 exists
I0823 02:17:42.354384 1 controller.go:259] Reconciling machine object zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5 triggers idempotent create.
I0823 02:17:42.354410 1 actuator.go:93] Creating machine zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5
E0823 02:17:42.366884 1 actuator.go:87] Machine error: failed to reconcile machine "zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5"s: failed to create nic zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5-nic for machine zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5: unable to create Public IP: machine public IP name is longer than 63 characters
W0823 02:17:42.366903 1 controller.go:261] Failed to create machine "zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5": requeue in: 1m0s

$ oc delete machine zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5
machine.machine.openshift.io "zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5" deleted
^C

I0823 03:34:52.378589 1 controller.go:205] Reconciling machine "zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5" triggers delete
I0823 03:34:52.378596 1 actuator.go:128] Deleting machine zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5
I0823 03:34:52.379291 1 virtualmachines.go:225] deleting vm zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5
I0823 03:34:52.614725 1 virtualmachines.go:242] successfully deleted vm zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5
I0823 03:34:52.614751 1 disks.go:49] deleting disk zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5_OSDisk
I0823 03:34:52.650286 1 disks.go:65] successfully deleted disk zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5_OSDisk
I0823 03:34:52.650319 1 networkinterfaces.go:178] deleting nic zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5-nic
I0823 03:34:52.712865 1 networkinterfaces.go:197] successfully deleted nic zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5-nic
E0823 03:34:52.727330 1 actuator.go:87] Machine error: failed to delete machine "zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5": unable to create Public IP: machine public IP name is longer than 63 characters
E0823 03:34:52.727348 1 controller.go:220] Failed to delete machine "zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5": requeue in: 1m0s
I0823 03:34:52.727360 1 controller.go:364] Actuator returned requeue-after error: requeue in: 1m0s
Fix for the deletion case: https://github.com/openshift/cluster-api-provider-azure/pull/75

The generated name is longer than allowed by the Azure portal under any of the following conditions:
- machine name is changed (can't happen without creating a new CR)
- cluster name is changed (could happen but then we will get a different name anyway)
- machine CR was created with a too long public IP name (in which case no instance was created)
- machine config was edited and the publicIP field was set to true (no public IP resource is created after an instance was created)

In all cases there is nothing to delete, so the deletion can be skipped.
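The reasoning above amounts to: if the generated public IP name is too long, no such resource can exist, so the delete path should skip it rather than fail. The following Python sketch illustrates that decision only and is not the code from the PR. The helper name is hypothetical, the resource suffixes (`_OSDisk`, `-nic`) are taken from the controller logs in this thread, and the cluster ID `zhsun4-5b994` is an assumption based on the machine name.

```python
MAX_PUBLIC_IP_NAME_LEN = 63


def resources_to_delete(cluster_id: str, machine_name: str) -> list:
    """Hypothetical helper: names of the Azure resources a machine delete would touch."""
    resources = [
        machine_name,              # the VM
        f"{machine_name}_OSDisk",  # its OS disk
        f"{machine_name}-nic",     # its network interface
    ]
    public_ip = f"{cluster_id}-{machine_name}-publicip"
    if len(public_ip) <= MAX_PUBLIC_IP_NAME_LEN:
        resources.append(public_ip)
    # Otherwise the name could never have been accepted by Azure, so there is
    # nothing to delete and the public IP is skipped instead of raising an error.
    return resources


long_machine = "zhsun4-5b994-worker-centralus1-test1-test2-test3-test4-test5"
for resource in resources_to_delete("zhsun4-5b994", long_machine):
    print("deleting", resource)
```

For this machine name only the VM, disk and NIC are listed, so the delete can complete.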
@sunzhaohua, https://github.com/openshift/cluster-api-provider-azure/pull/75 just merged.
Verified.

$ oc delete machine zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5
machine.machine.openshift.io "zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5" deleted

$ oc logs -f machine-api-controllers-7b97cbd9f4-h8mgj -c machine-controller
I0830 02:54:55.238249 1 controller.go:310] Machine "zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0830 02:54:55.238267 1 controller.go:205] Reconciling machine "zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5" triggers delete
I0830 02:54:55.238278 1 actuator.go:128] Deleting machine zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5
I0830 02:54:55.239184 1 virtualmachines.go:225] deleting vm zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5
I0830 02:54:55.525176 1 virtualmachines.go:242] successfully deleted vm zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5
I0830 02:54:55.525204 1 disks.go:49] deleting disk zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5_OSDisk
I0830 02:54:55.564315 1 disks.go:65] successfully deleted disk zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5_OSDisk
I0830 02:54:55.564392 1 networkinterfaces.go:178] deleting nic zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5-nic
I0830 02:54:55.696393 1 networkinterfaces.go:197] successfully deleted nic zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5-nic
I0830 02:54:55.696423 1 reconciler.go:466] Generated public IP name was too long, skipping deletion of the resource
E0830 02:54:55.736213 1 controller.go:235] Failed to remove finalizer from machine "zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5": Operation cannot be fulfilled on machines.machine.openshift.io "zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5": the object has been modified; please apply your changes to the latest version and try again
I0830 02:54:56.736498 1 controller.go:141] Reconciling Machine "zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5"
I0830 02:54:56.736531 1 controller.go:310] Machine "zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0830 02:54:56.736549 1 controller.go:205] Reconciling machine "zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5" triggers delete
I0830 02:54:56.736556 1 actuator.go:128] Deleting machine zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5
I0830 02:54:56.737503 1 virtualmachines.go:225] deleting vm zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5
I0830 02:54:56.975689 1 virtualmachines.go:242] successfully deleted vm zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5
I0830 02:54:56.975715 1 disks.go:49] deleting disk zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5_OSDisk
I0830 02:54:57.022344 1 disks.go:65] successfully deleted disk zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5_OSDisk
I0830 02:54:57.022380 1 networkinterfaces.go:178] deleting nic zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5-nic
I0830 02:54:57.147936 1 networkinterfaces.go:197] successfully deleted nic zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5-nic
I0830 02:54:57.147973 1 reconciler.go:466] Generated public IP name was too long, skipping deletion of the resource
I0830 02:54:57.184181 1 controller.go:239] Machine "zhsun5-swwlm-worker-centralus1-test1-test2-test3-test4-test5" deletion successful
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922