Bug 1889620

Summary: [Azure] - Machineset not scaling when publicIP:true in disconnected Azure environment
Product: OpenShift Container Platform Reporter: Milind Yadav <miyadav>
Component: Cloud Compute Assignee: Danil Grigorev <dgrigore>
Cloud Compute sub component: Other Providers QA Contact: Milind Yadav <miyadav>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: low CC: dgrigore, jspeed, zhsun
Version: 4.6 Flags: jspeed: needinfo-
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Disconnected installs are not compatible with the publicIP option.
Consequence: When publicIP was set, machines failed to boot.
Fix: Prevent users from creating machinesets in disconnected environments that also set publicIP true.
Result: Users will no longer be able to create MachineSets with this invalid configuration.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:26:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1933676    

Description Milind Yadav 2020-10-20 08:44:52 UTC
Description: Machine not scaling when publicIP:true in disconnected Azure environment

4.6.0-0.nightly-2020-10-15-121733

Always Reproducible:

Steps:
1. Copy a valid machineset; in machineset.yaml, set publicIp: true.
Then create it.
Machineset created successfully.

2. Scale the machineset.
Expected: Machineset should scale successfully.
Actual: Machines move to Failed state with the error below:

message: 'failed to create nic machineset-clone-27609-4ck7t-nic for machine machineset-clone-27609-4ck7t: unable to create VM network interface: failed to create network interface machineset-clone-27609-4ck7t-nic in resource group qeci-9755-t59xc-rg: network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=400 – Original Error: Code="NicWithPublicIpCannotReferencePoolWithOutboundRule" Message="OutboundRules for VMs with public IpConfigurations (instance level publicIPs) /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/qeci-9755-t59xc-rg/providers/Microsoft.Network/networkInterfaces/machineset-clone-27609-4ck7t-nic/ipConfigurations/pipConfig are not supported
 
Additional Info: This only happens on Azure disconnected private clusters; it works for the other configurations, e.g. AWS and GCP disconnected private clusters.

Comment 1 Joel Speed 2020-10-20 09:14:18 UTC
This seems like a reasonable case to add a validation in the webhooks that prevents users from committing this configuration.

I don't think I've ever set up a disconnected Azure install, not sure if we have the ability to do that, perhaps @miyadav might be able to suggest how we can do that?

Do we have anything in the cluster that tells us that the cluster is using a disconnected install that we could check to understand and inform the user of this misconfiguration?

Comment 3 Danil Grigorev 2020-11-02 11:44:54 UTC
Hey @miyadav. I can see the template, but the job was destroyed. Could you describe the main configuration differences between a standard and a disconnected IPI install on Azure? I can't find a documented path to install such a cluster. Does the disconnected installation set anything specific on the Infrastructure resource? Otherwise it would probably not be simple to identify this kind of setup. Is it enough to set `publish: Internal` in the Install Config to get such a cluster, or does that only disable provisioning of public IPs for Routes?

Comment 4 Danil Grigorev 2020-11-02 12:17:00 UTC
Looking at the flexy template, the disconnected install always has a proxy setting. Could you please provide example Proxy, Infrastructure and Network resources from the disconnected cluster, if they look any different from the default configuration?

Comment 6 Danil Grigorev 2020-11-09 12:06:41 UTC
Looking into the cluster resources, there is one that differentiates a disconnected install from a basic cluster.

```
apiVersion: config.openshift.io/v1
kind: DNS
metadata:
  name: cluster
spec:
  baseDomain: ci-ln-phy0rrb-002ac.ci.azure.devcluster.openshift.com
  privateZone:
    id: /subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-ln-phy0rrb-002ac-68m5d-rg/providers/Microsoft.Network/privateDnsZones/ci-ln-phy0rrb-002ac.ci.azure.devcluster.openshift.com
  publicZone: # this field is absent in a disconnected install
    id: /subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/os4-common/providers/Microsoft.Network/dnszones/ci.azure.devcluster.openshift.com
status: {}
```

The fix for this BZ is currently blocked on the implementation for checking resources in webhooks: https://github.com/openshift/machine-api-operator/pull/673. @jspeed, we need to merge it first.
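
For illustration only, the check discussed in this comment could look roughly like the Go sketch below. It assumes that a missing `spec.publicZone` on the cluster DNS resource is a sufficient marker of a disconnected install. The types `DNSSpec`, `DNSZone` and `AzureProviderSpec` and the functions `isDisconnected` and `validatePublicIP` are hypothetical stand-ins, not the real machine-api-operator or OpenShift API code.

```go
package main

import (
	"errors"
	"fmt"
)

// DNSZone and DNSSpec are simplified, hypothetical stand-ins for the
// relevant fields of the config.openshift.io/v1 DNS resource.
type DNSZone struct {
	ID string
}

type DNSSpec struct {
	BaseDomain  string
	PrivateZone *DNSZone
	PublicZone  *DNSZone // nil when the resource has no publicZone
}

// AzureProviderSpec is a hypothetical, trimmed-down provider spec that
// carries only the field relevant to this bug.
type AzureProviderSpec struct {
	PublicIP bool
}

// errPublicIPDisconnected is an illustrative rejection error.
var errPublicIPDisconnected = errors.New(
	"providerSpec.publicIP: Forbidden: publicIP is not allowed in Azure disconnected installation")

// isDisconnected treats an absent publicZone as the marker of a
// disconnected install (assumption taken from the DNS resource above).
func isDisconnected(dns DNSSpec) bool {
	return dns.PublicZone == nil
}

// validatePublicIP rejects publicIP: true on a disconnected cluster and
// accepts every other combination.
func validatePublicIP(spec AzureProviderSpec, dns DNSSpec) error {
	if spec.PublicIP && isDisconnected(dns) {
		return errPublicIPDisconnected
	}
	return nil
}

func main() {
	// A DNS spec without a public zone, as on the disconnected cluster.
	disconnected := DNSSpec{
		BaseDomain:  "example.com",
		PrivateZone: &DNSZone{ID: "private-zone-id"},
	}
	fmt.Println(validatePublicIP(AzureProviderSpec{PublicIP: true}, disconnected))
}
```

Rejecting the configuration at admission time means the user gets the error when creating the machineset, rather than a Failed machine after scaling.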

Comment 8 Milind Yadav 2020-11-19 11:04:41 UTC
Validated at:

4.7.0-0.nightly-2020-11-18-203317

Steps:

Create a machineset with publicIp: true

Result:

      Error from server (providerSpec.publicIP: Forbidden: publicIP is not allowed in Azure disconnected installation): error when creating "STDIN": admission webhook "validation.machineset.machine.openshift.io" denied the request: providerSpec.publicIP: Forbidden: publicIP is not allowed in Azure disconnected installation


Additional Info:
https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Runner-v3/209885/console

Moving to VERIFIED

Comment 11 errata-xmlrpc 2021-02-24 15:26:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633