Bug 1889620 - [Azure] - Machineset not scaling when publicIP:true in disconnected Azure environment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.7.0
Assignee: Danil Grigorev
QA Contact: Milind Yadav
URL:
Whiteboard:
Depends On:
Blocks: 1933676
 
Reported: 2020-10-20 08:44 UTC by Milind Yadav
Modified: 2021-03-01 12:33 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Disconnected installs are not compatible with the publicIP option.
Consequence: When publicIP was set, machines failed to boot.
Fix: Prevent users from creating machinesets in disconnected environments that also set publicIP true.
Result: Users will no longer be able to create MachineSets with this invalid configuration.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:26:50 UTC
Target Upstream Version:
Embargoed:
jspeed: needinfo-




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-api-operator pull 746 0 None closed Bug 1889620: Azure disconnected reject publicIP setting 2021-02-16 17:08:39 UTC
Github openshift machine-api-operator pull 749 0 None closed Bug 1889620: Warn MachineSet with publicIp set in disconnected install 2021-02-16 17:08:40 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:27:08 UTC

Description Milind Yadav 2020-10-20 08:44:52 UTC
Description: Machines do not scale when publicIP: true is set in a disconnected Azure environment

Version: 4.6.0-0.nightly-2020-10-15-121733

Reproducible: Always

Steps:
1. Copy a valid MachineSet, set publicIP: true in machineset.yaml, then create it.
The MachineSet is created successfully.

2. Scale the MachineSet.
Expected: The MachineSet should scale successfully.
Actual: Machines move to the Failed state with the error below:

message: 'failed to create nic machineset-clone-27609-4ck7t-nic for machine machineset-clone-27609-4ck7t: unable to create VM network interface: failed to create network interface machineset-clone-27609-4ck7t-nic in resource group qeci-9755-t59xc-rg: network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=400 – Original Error: Code="NicWithPublicIpCannotReferencePoolWithOutboundRule" Message="OutboundRules for VMs with public IpConfigurations (instance level publicIPs) /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/qeci-9755-t59xc-rg/providers/Microsoft.Network/networkInterfaces/machineset-clone-27609-4ck7t-nic/ipConfigurations/pipConfig are not supported
 
Additional Info: This happens only on Azure disconnected private clusters. It works for the other configurations, e.g. AWS and GCP disconnected private clusters.

Comment 1 Joel Speed 2020-10-20 09:14:18 UTC
This seems like a reasonable case to add a validation in the webhooks that prevents users from committing this configuration.

I don't think I've ever set up a disconnected Azure install, not sure if we have the ability to do that, perhaps @miyadav might be able to suggest how we can do that?

Do we have anything in the cluster that tells us that the cluster is using a disconnected install that we could check to understand and inform the user of this misconfiguration?

Comment 3 Danil Grigorev 2020-11-02 11:44:54 UTC
Hey @miyadav. I can see the template, but the job was destroyed. Could you describe the main configuration differences between a standard and a disconnected IPI install on Azure? I can't find a documented path to install such a cluster. Does the disconnected installation set anything specific on the Infrastructure resource? Otherwise it would probably not be simple to identify this kind of setup. Is it enough to set `publish: Internal` in the install config to get such a cluster? Or does that only disable provisioning of public IPs for Routes?

Comment 4 Danil Grigorev 2020-11-02 12:17:00 UTC
Looking at the flexy template, a disconnected install always has a proxy setting. Could you please provide example Proxy, Infrastructure and Network resources from the disconnected cluster, if they look any different from the default configuration?

Comment 6 Danil Grigorev 2020-11-09 12:06:41 UTC
Looking into the cluster resources, there is one which differentiates disconnected install from a basic cluster.

```
apiVersion: config.openshift.io/v1
kind: DNS
metadata:
  name: cluster
spec:
  baseDomain: ci-ln-phy0rrb-002ac.ci.azure.devcluster.openshift.com
  privateZone:
    id: /subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-ln-phy0rrb-002ac-68m5d-rg/providers/Microsoft.Network/privateDnsZones/ci-ln-phy0rrb-002ac.ci.azure.devcluster.openshift.com
  publicZone: # this field is absent in a disconnected install
    id: /subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/os4-common/providers/Microsoft.Network/dnszones/ci.azure.devcluster.openshift.com
status: {}
```
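
The difference above suggests one way a webhook could detect the invalid configuration. The following is a minimal sketch, assuming the check is keyed on the absence of `publicZone` in the cluster DNS resource. The types `DNSZone`, `DNSSpec` and `AzureProviderSpec` and the functions `isDisconnected` and `validatePublicIP` are hypothetical stand-ins written for illustration; they are not the machine-api-operator code or the real OpenShift API types.

```go
package main

import (
	"errors"
	"fmt"
)

// DNSZone and DNSSpec are hypothetical stand-ins that mirror only the
// fields of the config.openshift.io/v1 DNS resource shown above.
type DNSZone struct {
	ID string
}

type DNSSpec struct {
	BaseDomain  string
	PrivateZone *DNSZone
	PublicZone  *DNSZone // nil when the field is absent
}

// AzureProviderSpec is a hypothetical, trimmed-down provider spec that
// carries only the publicIP flag.
type AzureProviderSpec struct {
	PublicIP bool
}

// errPublicIPDisconnected is an illustrative error for the rejected case.
var errPublicIPDisconnected = errors.New(
	"providerSpec.publicIP: Forbidden: publicIP is not allowed in Azure disconnected installation")

// isDisconnected assumes that a missing public zone identifies a
// disconnected install, as observed in the DNS resource above.
func isDisconnected(dns DNSSpec) bool {
	return dns.PublicZone == nil
}

// validatePublicIP rejects publicIP: true only on disconnected installs.
func validatePublicIP(dns DNSSpec, spec AzureProviderSpec) error {
	if spec.PublicIP && isDisconnected(dns) {
		return errPublicIPDisconnected
	}
	return nil
}

func main() {
	disconnected := DNSSpec{PrivateZone: &DNSZone{ID: "private-zone"}}
	connected := DNSSpec{
		PrivateZone: &DNSZone{ID: "private-zone"},
		PublicZone:  &DNSZone{ID: "public-zone"},
	}

	fmt.Println(validatePublicIP(disconnected, AzureProviderSpec{PublicIP: true}))
	fmt.Println(validatePublicIP(connected, AzureProviderSpec{PublicIP: true}))
}
```

Keeping the detection in its own function means the heuristic can be swapped if a more reliable signal for disconnected installs turns up, without touching the validation rule itself.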

The fix for this BZ is currently blocked on the implementation for checking resources in webhooks: https://github.com/openshift/machine-api-operator/pull/673. @jspeed, we need to merge it first.

Comment 8 Milind Yadav 2020-11-19 11:04:41 UTC
Validated at:

4.7.0-0.nightly-2020-11-18-203317

Steps:

Create a MachineSet with publicIP: true

Result:

      Error from server (providerSpec.publicIP: Forbidden: publicIP is not allowed in Azure disconnected installation): error when creating "STDIN": admission webhook "validation.machineset.machine.openshift.io" denied the request: providerSpec.publicIP: Forbidden: publicIP is not allowed in Azure disconnected installation


Additional Info:
https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Runner-v3/209885/console

Moving to VERIFIED

Comment 11 errata-xmlrpc 2021-02-24 15:26:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

