Bug 2081788 - MetalLB: the crds are not validated until metallb is deployed
Summary: MetalLB: the crds are not validated until metallb is deployed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Federico Paolinelli
QA Contact: Arti Sood
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-04 15:51 UTC by Arti Sood
Modified: 2022-08-10 11:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:10:18 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift metallb-operator pull 90 0 None open Bug 2081788: Ocp/align150622 2022-06-15 11:56:13 UTC
Github openshift metallb-operator pull 91 0 None open Bug 2081788: CSV: fix the webhook type 2022-06-16 12:24:32 UTC
Github openshift metallb pull 64 0 None open Bug 2081788: Ocp/align160622 2022-06-16 10:44:21 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:10:42 UTC

Comment 1 Federico Paolinelli 2022-05-05 07:45:53 UTC
Is this with or without deploying the metallb crd first?
Looks like the controller was not running

Comment 6 Federico Paolinelli 2022-05-25 10:24:59 UTC
I am having the opposite issue: the webhooks are created together with the deployment, while CRDs are created together with the operator.
So the configuration is not validated at all unless metallb is created (which is just as wrong; we'll need to find a way to fix it).

@asood maybe the scenario came from an upgrade where the webhook was already in place?
Would you mind trying to reproduce again?

Comment 7 Arti Sood 2022-05-25 15:21:49 UTC
@fpaoline This was not the upgrade case, as I have not had a successful upgrade from 4.10 to 4.11 so far.

I cannot reproduce it with the current version, metallb-operator.4.11.0-202205242136, or with a prior version such as metallb-operator.4.11.0-202205131159.

The bug was filed against metallb-operator.4.11.0-202205021102.

I am observing that validation may not be kicking in at all, because I was able to create the AddressPool below:


apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
  name: example
  namespace: metallb-system
spec:
  addresses:
    - 172.31.249.75/33
  autoAssign: true
  avoidBuggyIPs: false
  protocol: layer2


With the released 4.10 version of the operator, I get the error: admission webhook "addresspoolvalidationwebhook.metallb.io" denied the request: Failed to parse addresses for example: invalid CIDR "172.31.249.228/33" in pool example: invalid CIDR "172.31.249.228/33"
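
For illustration only, here is a minimal Python sketch of the kind of check the validation webhook is expected to perform on each entry under `spec.addresses`. It is not MetalLB's code (the real webhook is written in Go). The function names `validate_pool_entry` and `is_valid` are hypothetical, and the handling of "start-end" ranges is an assumption about what a pool entry may look like. It shows why a `/33` prefix must be rejected for an IPv4 address.

```python
import ipaddress


def validate_pool_entry(entry: str) -> None:
    """Raise ValueError unless entry is a valid CIDR or a "start-end" IP range.

    Hypothetical helper for illustration; not part of MetalLB.
    """
    if "-" in entry:
        # Range form, e.g. "192.168.1.10-192.168.1.20" (assumed syntax).
        start_text, end_text = (part.strip() for part in entry.split("-", 1))
        start = ipaddress.ip_address(start_text)
        end = ipaddress.ip_address(end_text)
        if start.version != end.version or start > end:
            raise ValueError(f"invalid IP range {entry!r}")
        return
    try:
        # strict=False tolerates host bits being set, e.g. 172.31.249.75/32.
        ipaddress.ip_network(entry, strict=False)
    except ValueError as err:
        # An IPv4 prefix length above 32 ends up here.
        raise ValueError(f"invalid CIDR {entry!r}") from err


def is_valid(entry: str) -> bool:
    """Return True if the entry passes validate_pool_entry."""
    try:
        validate_pool_entry(entry)
    except ValueError:
        return False
    return True
```

With this sketch, `is_valid("172.31.249.75/33")` is false, so a pool like the one above would be refused if the check ran at admission time.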

Comment 8 Federico Paolinelli 2022-05-26 10:28:19 UTC
(In reply to Arti Sood from comment #7)

Right, which is what I meant in https://bugzilla.redhat.com/show_bug.cgi?id=2081788#c6 
I'd suggest closing this and filing a new one (which I am kind of already working on) to track the missing validation.

Comment 9 Arti Sood 2022-06-07 20:52:01 UTC
@fpaoline

I could reproduce it. The root cause is still that the metallb CR is not yet created.

1. Create the metallb CR on all the worker nodes.
2. While all the pods are coming up with all the required containers, try to create an IPAddressPool; the error below appears.




Danger alert: Error
Fix the following errors:

    Error "failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": no endpoints available for service "webhook-service"" for field "undefined".

Comment 10 Federico Paolinelli 2022-06-08 07:40:45 UTC
Right, but that is normal: when both the webhooks AND the endpoints are deployed together, the webhook is created immediately, while it may take some time to pull the image for the endpoints.
We can't do anything about that. What we need to avoid is having the webhook deployed at one time and the endpoints at another, which is what I originally thought this bug was about.
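
As a hedged illustration of how a client could cope with that short window, the Python sketch below retries a create call while the API server reports that the webhook service has no endpoints yet. Everything in it is an assumption made for the example: the helper name `retry_until_webhook_ready`, the use of `RuntimeError` to carry the API error text, and the retry counts. It is not something the operator or MetalLB provides.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Substring taken from the error text reported in comment 9.
NO_ENDPOINTS = "no endpoints available for service"


def retry_until_webhook_ready(
    create: Callable[[], T],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call create() until it succeeds or the attempts are used up.

    Only errors mentioning missing webhook endpoints are retried; any
    other error (for example a real validation failure) is re-raised
    immediately. Hypothetical helper for illustration.
    """
    for attempt in range(1, attempts + 1):
        try:
            return create()
        except RuntimeError as err:
            if NO_ENDPOINTS not in str(err) or attempt == attempts:
                raise
            sleep(delay_seconds)
    raise ValueError("attempts must be at least 1")
```

Here `create` stands for whatever call submits the IPAddressPool. A genuine webhook denial, such as an invalid CIDR, is not retried and surfaces right away.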

Comment 11 Federico Paolinelli 2022-06-17 09:13:42 UTC
Changing to ON_QA as all the PRs related to this change were merged yesterday.

Comment 14 errata-xmlrpc 2022-08-10 11:10:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

