Bug 2081788
| Summary: | MetalLB: the crds are not validated until metallb is deployed | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Arti Sood <asood> |
| Component: | Networking | Assignee: | Federico Paolinelli <fpaoline> |
| Networking sub component: | Metal LB | QA Contact: | Arti Sood <asood> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | fpaoline |
| Version: | 4.11 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 11:10:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 1
Federico Paolinelli
2022-05-05 07:45:53 UTC
I am having the opposite issue: the webhooks are created together with the deployment, while the CRDs are created together with the operator. So there are no checks on the configuration unless the MetalLB CR is created (which is just as wrong; we'll need to find a way to fix it). @asood maybe the scenario came from an upgrade where the webhook was already in place? Would you mind trying to reproduce again?
Comment 7
Arti Sood

@fpaoline This was not the upgrade case, as I have not had a successful upgrade from 4.10->4.11 so far.

I cannot reproduce with the current version, metallb-operator.4.11.0-202205242136, or a prior version such as metallb-operator.4.11.0-202205131159.

The bug was filed against metallb-operator.4.11.0-202205021102.

I am observing that validation may not be kicking in at all, because I was able to create the AddressPool below:
apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
  name: example
  namespace: metallb-system
spec:
  addresses:
  - 172.31.249.75/33
  autoAssign: true
  avoidBuggyIPs: false
  protocol: layer2
With the 4.10 released version of the operator I get an error: admission webhook "addresspoolvalidationwebhook.metallb.io" denied the request: Failed to parse addresses for example: invalid CIDR "172.31.249.228/33" in pool example: invalid CIDR "172.31.249.228/33"
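For comparison, here is a sketch of the same pool with an address range the webhook should accept when it is active; the range is illustrative, the point being that an IPv4 prefix length can be at most /32, which is why /33 is rejected:

apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
  name: example
  namespace: metallb-system
spec:
  addresses:
  - 172.31.249.0/24          # illustrative range with a valid IPv4 prefix length (at most /32)
  autoAssign: true
  avoidBuggyIPs: false
  protocol: layer2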
Federico Paolinelli

(In reply to Arti Sood from comment #7)
> @fpaoline This was not the upgrade case, as I have not had a successful
> upgrade from 4.10->4.11 so far.
>
> I cannot reproduce with the current version, metallb-operator.4.11.0-202205242136,
> or a prior version such as metallb-operator.4.11.0-202205131159.
>
> The bug was filed against metallb-operator.4.11.0-202205021102.
>
> I am observing that validation may not be kicking in at all, because I was
> able to create the AddressPool below:
>
> apiVersion: metallb.io/v1beta1
> kind: AddressPool
> metadata:
>   name: example
>   namespace: metallb-system
> spec:
>   addresses:
>   - 172.31.249.75/33
>   autoAssign: true
>   avoidBuggyIPs: false
>   protocol: layer2
>
> With the 4.10 released version of the operator I get an error: admission
> webhook "addresspoolvalidationwebhook.metallb.io" denied the request:
> Failed to parse addresses for example: invalid CIDR "172.31.249.228/33" in
> pool example: invalid CIDR "172.31.249.228/33"

Right, which is what I meant in https://bugzilla.redhat.com/show_bug.cgi?id=2081788#c6. I'd suggest closing this and filing a new one (which I am kind of already working on) to track the lack-of-check issue.
Arti Sood

@fpaoline I could reproduce it. The root cause is still the MetalLB CR not yet being created.
1. Create the MetalLB CR, which deploys the pods on all the worker nodes.
2. While all the pods are coming up with all the required containers, try to create an IPAddressPool to see the error (sketches of the resources involved are shown below).
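For reference, minimal sketches of the resources used in these steps, assuming the default metallb-system namespace; the pool name and address range are illustrative:

apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb              # the operator watches for a MetalLB CR; this exact name is an assumption
  namespace: metallb-system
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: example              # illustrative name
  namespace: metallb-system
spec:
  addresses:
  - 172.31.249.0/24          # illustrative range with a valid prefix length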
The console reports the following error:
Error "failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": no endpoints available for service "webhook-service"" for field "undefined".
Right, but that is normal: when deploying both the webhooks AND the endpoints, the webhook will be created immediately, while it may take some time to pull the image for the endpoints. We can't do anything about that. What we need to avoid is having the webhook deployed at one time and the endpoints deployed at another, which is what I originally thought this bug was about.

Changing to ON_QA as all the PRs related to this change were merged yesterday.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069