Bug 1943376

Summary: ingress-operator doesn't always send helpful error messages to install-log when it fails to come up
Product: OpenShift Container Platform Reporter: Chris Collins <chris.collins>
Component: Networking Assignee: Ryan Fredette <rfredette>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED EOL Docs Contact:
Severity: medium    
Priority: high CC: amcdermo, eparis, mfisher, mmasters, wking
Version: 4.7 Keywords: ServiceDeliveryImpact
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-11-04 15:19:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chris Collins 2021-03-25 22:27:26 UTC
Description of problem:

Unsure of component to route to: ingress would be most appropriate, but I was unable to find that specifically.  Perhaps cloud provider would be next most appropriate.

OSD ROSA cluster w/BYOC VPC install failed. The cause appears to trace back to a lack of available IPs in the AWS subnet.

> 42m        Warning   SyncLoadBalancerFailed   service/router-default   (combined from similar events): Error syncing load balancer: failed to ensure load balancer: InvalidSubnet: Not enough IP space available in subnet-xxxxxxxx. ELB requires at least 8 free IP addresses in each subnet.

This causes the ingress and console clusteroperators to fail.

Discussion within the team suggested filing this BZ with a request for a check of available IP space and a message printed clearly to the install log if not enough space is available.

Version-Release number of selected component (if applicable): OSD OCP 4.7.2 on AWS

How reproducible: Have not reproduced, but presumably would be possible with a subnet lacking available IPs

Steps to Reproduce: N/A

Actual results: Cluster failed to complete install. API server up and available, but ingress and console operators degraded.  External access to cluster unavailable.

Expected results: Identify lack of available IPs for ingress ELB and halt (or enter pending state) install, printing results to the log.
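
To illustrate the kind of pre-check requested here, a minimal sketch follows. It is not taken from the installer or the ingress operator. It assumes the subnet data has already been fetched in the shape returned by the EC2 DescribeSubnets API (the `SubnetId` and `AvailableIpAddressCount` fields), and the function name and message wording are hypothetical. The threshold of 8 comes from the AWS error quoted above.

```python
# Threshold taken from the AWS error message quoted in this bug:
# "ELB requires at least 8 free IP addresses in each subnet."
MIN_FREE_IPS_FOR_ELB = 8


def find_subnets_without_elb_capacity(subnets, min_free=MIN_FREE_IPS_FOR_ELB):
    """Return one warning string per subnet with fewer than min_free free IPs.

    `subnets` is a list of dicts shaped like the "Subnets" entries of an
    EC2 DescribeSubnets response. Fetching them (for example with boto3)
    is assumed to happen elsewhere and is not shown here.
    """
    warnings = []
    for subnet in subnets:
        free = subnet["AvailableIpAddressCount"]
        if free < min_free:
            warnings.append(
                f"{subnet['SubnetId']}: {free} free IP addresses, "
                f"ELB requires at least {min_free}"
            )
    return warnings


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        {"SubnetId": "subnet-aaaa", "AvailableIpAddressCount": 250},
        {"SubnetId": "subnet-bbbb", "AvailableIpAddressCount": 3},
    ]
    for warning in find_subnets_without_elb_capacity(sample):
        print(warning)
```

An installer could run a check like this before creating the ingress load balancer and print any returned warnings to the install log, halting or pausing if the list is non-empty.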

Additional info:

Comment 1 Greg Sheremeta 2021-03-26 12:08:39 UTC
@mmasters @sgreene this is another case where we need this exact log message printed right in the openshift-install log.

Error syncing load balancer: failed to ensure load balancer: InvalidSubnet: Not enough IP space available in subnet-xxxxxxxx. ELB requires at least 8 free IP addresses in each subnet.

That is a message we want to (1) make super obvious to the users of ROSA and OpenShift Dedicated, and (2) use in Hive to transform into a nice error code for both the users and Red Hat SRE.

Comment 5 Greg Sheremeta 2021-08-12 00:10:38 UTC
We had a similar problem today where we had to dig this out of a must-gather:
`failed to describe elb load balancers: InvalidClientTokenId: The security token included in the request is invalid\n\tstatus code: 403`

@mmasters @sgreene this is another case where we need this exact log message printed right in the openshift-install log.

Can we please prioritize this?
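
As a rough illustration of the "transform into a nice error code" idea from comment 1, the sketch below maps the two messages quoted in this bug to error codes with regular expressions. It is only a sketch: the error code names and the function are hypothetical, and it does not reflect how Hive is actually configured. It only shows why having the exact message in the install log makes this kind of matching possible.

```python
import re

# Hypothetical error codes paired with patterns built from the two
# messages quoted in this bug. The code names are illustrative only.
KNOWN_FAILURES = [
    (
        "IngressSubnetOutOfIPs",
        re.compile(r"InvalidSubnet: Not enough IP space available in subnet-"),
    ),
    (
        "InvalidAWSCredentials",
        re.compile(
            r"InvalidClientTokenId: The security token included in the "
            r"request is invalid"
        ),
    ),
]


def classify_log_line(line):
    """Return the first matching error code for a log line, or None."""
    for code, pattern in KNOWN_FAILURES:
        if pattern.search(line):
            return code
    return None


if __name__ == "__main__":
    line = (
        "Error syncing load balancer: failed to ensure load balancer: "
        "InvalidSubnet: Not enough IP space available in subnet-xxxxxxxx. "
        "ELB requires at least 8 free IP addresses in each subnet."
    )
    print(classify_log_line(line))
```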

Comment 8 Miciah Dashiel Butler Masters 2022-01-27 17:20:08 UTC
Moving off of 4.10.0; we'll get this in a later release.

Comment 10 mfisher 2022-11-04 15:19:03 UTC
This issue is stale and closed because it has no activity for a significant amount of time and is reported on a version no longer in maintenance.  If this issue should not be closed please verify the condition still exists on a supported release and submit an updated bug.