Bug 1978396
| Summary: | Cannot trace cloud provider calls and it looks like ocp is executing "DetachLoadBalancerFromSubnets" in aws causing outage | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Anand Paladugu <apaladug> |
| Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> |
| Cloud Compute sub component: | Other Providers | QA Contact: | sunzhaohua <zhsun> |
| Status: | CLOSED INSUFFICIENT_DATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | low | CC: | jspeed, mfedosin, mimccune |
| Version: | 4.7 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-09 12:53:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Anand Paladugu
2021-07-01 17:40:27 UTC
Any chance we could get a must-gather, or the logs from the machine-api controllers?

This bug seems to be a request for information about tracing AWS calls. At the moment, load balancer attachment for ingress is handled by the Kube Controller Manager (KCM). If there are issues, or the customer wants to know more about what's happening there, they should review the KCM logs. Eventually this will move into a dedicated cloud controller manager (approx. 4.11), which might make it easier to trace in the longer term. Anand, is that sufficient for the customer?

@jspeed we could not find any info in the default controller manager logs. Do you think trace logs will contain the calls we are issuing to AWS?

Thanks,
Anand

Yes, I think they may. By the time you get to about `-v=8` on the logging within KCM, it should log every single network request and response that it makes. You could then filter the logs to determine which calls to AWS are being made.

OK. Let me try to check the debug logs then. In the future controller, is there a plan to expose the calls at a high level in the normal logs, so that customers need not turn on debug logs to trace the calls?

I'm aware that it does log some of the calls it makes, or at least it seems to, but I don't know if it explicitly logs all calls.
Something we can look into, though. Some notes on what's happening here:

- The detach call is here [1]; there is only one log at level 2, and it doesn't give specific details.
- The detach call is intended to remove any subnets that are currently attached to the load balancer but are no longer desired [2].
- The subnet IDs come from `getLoadBalancerSubnets` [3], which either reads the subnet IDs from an annotation [4] or works them out from the subnets available in the VPC [5].
- Based on the original description, and looking at the code in [5], I can confirm that the lack of the cluster tag on the subnets is what caused them to be removed [6].

In terms of increasing visibility into this issue, all of the relevant logging is at level 2 (`-v=2`), so going any higher than this doesn't help. Otherwise, I don't think there's much I can recommend. Kubernetes relies on cloud resources being appropriately tagged across many components; admins should not remove these tags, as doing so interferes with ownership. I'm not sure if there's anything we can explicitly recommend to the customer here.
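To make the causal chain in the notes above concrete, here is a simplified Go sketch of the selection-and-detach logic. The `Subnet` type and the helpers `desiredSubnets`, `subnetsToDetach`, and `clusterTagKey` are hypothetical and do not exist in cloud-provider-aws. When no annotation is set, the sketch keeps only subnets carrying the `kubernetes.io/cluster/<cluster-id>` tag, then computes which attached subnets are no longer desired; those are the ones a `DetachLoadBalancerFromSubnets` call would remove. The real code linked in [3] to [6] does more (for example, per-availability-zone selection), and treating the annotation as a comma-separated list of subnet IDs is an assumption here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Subnet is a hypothetical, simplified view of an EC2 subnet and its tags.
type Subnet struct {
	ID   string
	Tags map[string]string
}

// clusterTagKey builds the per-cluster ownership tag key.
func clusterTagKey(clusterID string) string {
	return "kubernetes.io/cluster/" + clusterID
}

// desiredSubnets approximates the selection step: use the explicit list from
// the annotation when present, otherwise keep only subnets tagged for this cluster.
func desiredSubnets(annotation string, vpcSubnets []Subnet, clusterID string) []string {
	var ids []string
	if annotation != "" {
		// Assumption: the annotation is a comma-separated list of subnet IDs.
		for _, id := range strings.Split(annotation, ",") {
			if id = strings.TrimSpace(id); id != "" {
				ids = append(ids, id)
			}
		}
		return ids
	}
	key := clusterTagKey(clusterID)
	for _, s := range vpcSubnets {
		// Simplification: only the presence of the cluster tag key is checked here.
		if _, ok := s.Tags[key]; ok {
			ids = append(ids, s.ID)
		}
	}
	return ids
}

// subnetsToDetach returns the attached subnets that are not in the desired set, sorted.
func subnetsToDetach(attached, desired []string) []string {
	want := make(map[string]bool, len(desired))
	for _, id := range desired {
		want[id] = true
	}
	var remove []string
	for _, id := range attached {
		if !want[id] {
			remove = append(remove, id)
		}
	}
	sort.Strings(remove)
	return remove
}

func main() {
	clusterID := "mycluster-abc12" // hypothetical cluster ID
	vpc := []Subnet{
		{ID: "subnet-a", Tags: map[string]string{clusterTagKey(clusterID): "shared"}},
		// subnet-b has had its cluster tag removed by an admin.
		{ID: "subnet-b", Tags: map[string]string{"Name": "public-b"}},
	}
	attached := []string{"subnet-a", "subnet-b"}
	desired := desiredSubnets("", vpc, clusterID)
	fmt.Println("detach:", subnetsToDetach(attached, desired))
}
```

Running it prints `detach: [subnet-b]`: once the cluster tag is stripped from `subnet-b`, it drops out of the desired set and becomes a detach candidate, which is the behavior identified above as the cause of the removal.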
- [1] https://github.com/kubernetes/cloud-provider-aws/blob/59ae724ba8a09ca5b6266a8452e937e3e99a6953/pkg/providers/v1/aws_loadbalancer.go#L1049
- [2] https://github.com/kubernetes/cloud-provider-aws/blob/59ae724ba8a09ca5b6266a8452e937e3e99a6953/pkg/providers/v1/aws_loadbalancer.go#L1042
- [3] https://github.com/kubernetes/cloud-provider-aws/blob/a1590733fac851b3a27d351c9c80e9b2bf8d6f7e/pkg/providers/v1/aws.go#L4404
- [4] https://github.com/kubernetes/cloud-provider-aws/blob/a1590733fac851b3a27d351c9c80e9b2bf8d6f7e/pkg/providers/v1/aws.go#L3801-L3803
- [5] https://github.com/kubernetes/cloud-provider-aws/blob/a1590733fac851b3a27d351c9c80e9b2bf8d6f7e/pkg/providers/v1/aws.go#L3683
- [6] https://github.com/kubernetes/cloud-provider-aws/blob/a1590733fac851b3a27d351c9c80e9b2bf8d6f7e/pkg/providers/v1/aws.go#L3655-L3656

Customer case was closed.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days. |