Bug 2107303

Summary: Error message is misleading while creating a service type Load Balancer in Azure Cloud
Product: OpenShift Container Platform
Reporter: Himank <hchaturv>
Component: Cloud Compute
Assignee: dmoiseev
Cloud Compute sub component: Cloud Controller Manager
QA Contact: sunzhaohua <zhsun>
Status: CLOSED DEFERRED
Severity: low
Priority: low
CC: dmoiseev, mmasters, skharat
Version: 4.10   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2023-03-09 01:24:45 UTC
Type: Bug

Description Himank 2022-07-14 17:27:43 UTC
Description of problem:

When creating a Service of type LoadBalancer, the error message is misleading if a wrong (nonexistent) subnet name is provided.

OpenShift release version:
4.10

Cluster Platform:
Azure

How reproducible:
Always

Steps to Reproduce (in detail):

1. Create a Service using the YAML below, which specifies a nonexistent subnet name:

~~~
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: apps-subnet  
  name: test-internal
  namespace: case-03245390
spec:
  ports:
  - name: 8080-9090
    port: 8080
    protocol: TCP
    targetPort: 9090
  selector:
    app: test-internal
  type: LoadBalancer

~~~

2. Here, apps-subnet is a subnet name that does not exist.

3. In this case, the error message should indicate that apps-subnet does not exist, for example `failed to get subnet: VNET-XXX/apps-subnet`.

Actual results:

The error message always looks something like the following:

~~~
failed to get subnet: VNET-XXX/SN-XXXX
~~~

Here, SN-XXXX is the subnet name fetched from the cloud-config ConfigMap, not the one given in the Service annotation.


Expected results:

The error message should highlight the name of the wrong subnet that is being used in the service definition for ease of troubleshooting.

Impact of the problem:

Low

Additional info:

When kube-controller-manager (KCM) debug logs are enabled, this information is visible in the logs.

Comment 1 Miciah Dashiel Butler Masters 2022-07-20 19:07:26 UTC
Currently, OpenShift uses the legacy, in-tree cloud provider implementation for Azure.  This appears to be the code that emits the log message of concern: https://github.com/openshift/kubernetes/blob/c3ad486c907cdf30ee97c9a7e7052a823a49c34b/staging/src/k8s.io/legacy-cloud-providers/azure/azure_loadbalancer.go#L924

Upstream is only accepting critical fixes for the legacy cloud provider implementations.

However, OpenShift will use the out-of-tree cloud provider implementation in the future.  Here is the equivalent code in the out-of-tree implementation: https://github.com/kubernetes-sigs/cloud-provider-azure/blob/addec968471666e946c55675ec2bf4deae073384/pkg/provider/azure_loadbalancer.go#L1276

Because this logic is in the cloud provider code, I'm reassigning this BZ to the Cloud Infrastructure team.

Comment 7 Shiftzilla 2023-03-09 01:24:45 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9396