Bug 2081788
| Summary: | MetalLB: the crds are not validated until metallb is deployed | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Arti Sood <asood> |
| Component: | Networking | Assignee: | Federico Paolinelli <fpaoline> |
| Networking sub component: | Metal LB | QA Contact: | Arti Sood <asood> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | fpaoline |
| Version: | 4.11 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 11:10:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 1
Federico Paolinelli
2022-05-05 07:45:53 UTC
I am having the opposite issue: the webhooks are created together with the deployment, while the CRDs are created together with the operator. So there are no checks on the configuration unless the MetalLB CR is created (which is just as wrong; we'll need to find a way to fix it). @asood maybe the scenario came from an upgrade where the webhook was already in place? Would you mind trying to reproduce again?
Comment 7
Arti Sood

@fpaoline This was not the upgrade case, as I have not had a successful upgrade from 4.10->4.11 so far.

I cannot reproduce with the current version, metallb-operator.4.11.0-202205242136, or a prior version such as metallb-operator.4.11.0-202205131159.

The bug was filed against metallb-operator.4.11.0-202205021102.

I am observing that validation may not be kicking in at all, because I was able to create the AddressPool below:
apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
  name: example
  namespace: metallb-system
spec:
  addresses:
  - 172.31.249.75/33
  autoAssign: true
  avoidBuggyIPs: false
  protocol: layer2
With the 4.10 released version of the operator I get an error: admission webhook "addresspoolvalidationwebhook.metallb.io" denied the request: Failed to parse addresses for example: invalid CIDR "172.31.249.228/33" in pool example: invalid CIDR "172.31.249.228/33"
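For comparison, here is a sketch of the same pool with an address range the webhook should accept when it is active; the range is illustrative, the point being that an IPv4 prefix length can be at most /32, which is why /33 is rejected:

apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
  name: example
  namespace: metallb-system
spec:
  addresses:
  - 172.31.249.0/24          # illustrative range with a valid IPv4 prefix length (at most /32)
  autoAssign: true
  avoidBuggyIPs: false
  protocol: layer2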
Federico Paolinelli

(In reply to Arti Sood from comment #7)
> @fpaoline This was not the upgrade case, as I have not had a successful
> upgrade from 4.10->4.11 so far.
>
> I cannot reproduce with the current version, metallb-operator.4.11.0-202205242136,
> or a prior version such as metallb-operator.4.11.0-202205131159.
>
> The bug was filed against metallb-operator.4.11.0-202205021102.
>
> I am observing that validation may not be kicking in at all, because I was
> able to create the AddressPool below:
>
> apiVersion: metallb.io/v1beta1
> kind: AddressPool
> metadata:
>   name: example
>   namespace: metallb-system
> spec:
>   addresses:
>   - 172.31.249.75/33
>   autoAssign: true
>   avoidBuggyIPs: false
>   protocol: layer2
>
> With the 4.10 released version of the operator I get an error: admission
> webhook "addresspoolvalidationwebhook.metallb.io" denied the request:
> Failed to parse addresses for example: invalid CIDR "172.31.249.228/33" in
> pool example: invalid CIDR "172.31.249.228/33"

Right, which is what I meant in https://bugzilla.redhat.com/show_bug.cgi?id=2081788#c6. I'd suggest closing this and filing a new one (which I am kind of already working on) to track the lack-of-check issue.
Arti Sood

@fpaoline I could reproduce it. The root cause is still the MetalLB CR not yet being created.
1. Create the MetalLB CR, which deploys the pods on all the worker nodes.
2. While all the pods are coming up with all the required containers, try to create an IPAddressPool to see the error (sketches of the resources involved are shown below).
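For reference, minimal sketches of the resources used in these steps, assuming the default metallb-system namespace; the pool name and address range are illustrative:

apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb              # the operator watches for a MetalLB CR; this exact name is an assumption
  namespace: metallb-system
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: example              # illustrative name
  namespace: metallb-system
spec:
  addresses:
  - 172.31.249.0/24          # illustrative range with a valid prefix length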
The console reports the following error:
Error "failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": no endpoints available for service "webhook-service"" for field "undefined".
Right, but that is normal: when deploying both the webhooks AND the endpoints, the webhook will be created immediately, while it may take some time to pull the image for the endpoints. We can't do anything about that. What we need to avoid is having the webhook deployed at one time and the endpoints deployed at another, which is what I originally thought this bug was about.

Changing to ON_QA as all the PRs related to this change were merged yesterday.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069