Attempt to re-create a load balancer svc with a different IP fails. The endpoint remains exposed via the previously set IP.

Scenario: I successfully used metallb to expose the cluster's API via some IP. Later, I wanted to change the IP for the API, so I deleted the created service and created a new one (from the same subnet). The endpoint doesn't become reachable via the new IP and remains reachable via the old IP.

oc get addresspool -n metallb-system api-addresspool -o yaml
apiVersion: metallb.io/v1alpha1
kind: AddressPool
metadata:
  creationTimestamp: "2022-01-11T04:07:19Z"
  generation: 3
  name: api-addresspool
  namespace: metallb-system
  resourceVersion: "657147"
  uid: 7d24174b-6681-454c-ac26-f8e4a22736c9
spec:
  addresses:
  - 172.22.0.230-172.22.0.230
  autoAssign: true
  protocol: layer2

############################################################

oc get network cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2022-01-11T02:57:17Z"
  generation: 4
  name: cluster
  resourceVersion: "59121"
  uid: a7a62170-f7e5-40b3-b153-3c18b0634a15
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy:
      allowedCIDRs:
      - 172.22.0.0/24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
status:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  clusterNetworkMTU: 1450
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16

############################################################

oc get svc -n openshift-kube-apiserver metallb-api-service
NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP                 PORT(S)          AGE
metallb-api-service   LoadBalancer   172.30.86.108   172.22.0.200,172.22.0.230   6443:32589/TCP   11m

[root@sealusa34 ansible_tests]# oc get svc -n openshift-kube-apiserver metallb-api-service -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    metallb.universe.tf/address-pool: api-addresspool
  creationTimestamp: "2022-01-11T19:29:40Z"
  name: metallb-api-service
  namespace: openshift-kube-apiserver
  resourceVersion: "657223"
  uid: 2999c4c8-21e8-4b78-9c03-f7ff91e640cb
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.30.86.108
  clusterIPs:
  - 172.30.86.108
  externalIPs:
  - 172.22.0.230
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    nodePort: 32589
    port: 6443
    protocol: TCP
    targetPort: 6443
  selector:
    app: openshift-kube-apiserver
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 172.22.0.200

Previously the service was created for IP 172.22.0.200 (and it worked). Then I deleted the service and created a new one with 172.22.0.230. The new endpoint cannot be reached.

curl 172.22.0.200:6443   # original endpoint - works
Client sent an HTTP request to an HTTPS server.

curl 172.22.0.230:6443   # new endpoint - doesn't work
curl: (7) Failed to connect to 172.22.0.230 port 6443: No route to host
Version:
OCP 4.9.12
metallb: metallb-operator.4.9.0-202112142229
When you say you change the IP of the service, how do you do that? I don't see the spec.loadBalancerIP field, so metallb will take a random IP from the address pool and use it. Can you share the manifests?
I see externalIPs: - 172.22.0.230 being used here as well. The right field to use is spec.loadBalancerIP.
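As a rough illustration of the point above, here is a minimal sketch that builds the Service manifest with spec.loadBalancerIP instead of spec.externalIPs. The name, namespace, annotation, port and selector are copied from the manifest posted in this report; the helper name build_service is hypothetical, and server-populated fields (clusterIP, nodePort, uid and so on) are left out on the assumption that the API server fills them in.

```python
import json


def build_service(load_balancer_ip: str) -> dict:
    """Return a LoadBalancer Service manifest that requests a specific IP.

    Hypothetical helper: the values mirror the manifest in this report, but
    the IP is requested via spec.loadBalancerIP and externalIPs is not set.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": "metallb-api-service",
            "namespace": "openshift-kube-apiserver",
            "annotations": {
                "metallb.universe.tf/address-pool": "api-addresspool",
            },
        },
        "spec": {
            "type": "LoadBalancer",
            # The field suggested in the comment above; no externalIPs here.
            "loadBalancerIP": load_balancer_ip,
            "ports": [
                {"name": "http", "port": 6443, "protocol": "TCP", "targetPort": 6443},
            ],
            "selector": {"app": "openshift-kube-apiserver"},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be inspected or applied.
    print(json.dumps(build_service("172.22.0.230"), indent=2))
```

The JSON output could be piped to `oc apply -f -`, since the API accepts JSON manifests as well as YAML; this sketch was not run against a cluster.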
Just tried to create the service and addresspool and then patch both with a new IP several times.

Observations:
1. Upon creation - everything works as expected. (172.22.0.200)
2. Attempted to change both the svc and the addresspool to a new IP - didn't work. (172.22.0.210)
3. Attempted to change both again to a new IP - worked as expected. (172.22.0.220)
4. Attempted to change both to a new IP - didn't work. (172.22.0.230)
5. Attempted to change both again to a new IP - worked as expected. (172.22.0.240)
6. Attempted to change both to a new IP - didn't work. (172.22.0.250)

oc get svc -n openshift-kube-apiserver metallb-api-service -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    metallb.universe.tf/address-pool: api-addresspool
  creationTimestamp: "2022-01-12T17:33:40Z"
  name: metallb-api-service
  namespace: openshift-kube-apiserver
  resourceVersion: "813537"
  uid: 86fdd641-6f45-4d02-8ade-05f8a80b89a8
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.30.4.43
  clusterIPs:
  - 172.30.4.43
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  loadBalancerIP: 172.22.0.250
  ports:
  - name: http
    nodePort: 31115
    port: 6443
    protocol: TCP
    targetPort: 6443
  selector:
    app: openshift-kube-apiserver
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}

oc get addresspools.metallb.io -o yaml
apiVersion: v1
items:
- apiVersion: metallb.io/v1alpha1
  kind: AddressPool
  metadata:
    creationTimestamp: "2022-01-12T17:33:39Z"
    generation: 8
    name: api-addresspool
    namespace: metallb-system
    resourceVersion: "813489"
    uid: 2f68235f-3e82-4f56-8128-ae401b9329bb
  spec:
    addresses:
    - 172.22.0.250-172.22.0.250
    autoAssign: true
    protocol: layer2
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

7. Attempted to change both again to a new IP - worked as expected. (172.22.0.150)

oc get addresspools.metallb.io -o yaml
apiVersion: v1
items:
- apiVersion: metallb.io/v1alpha1
  kind: AddressPool
  metadata:
    creationTimestamp: "2022-01-12T17:33:39Z"
    generation: 8
    name: api-addresspool
    namespace: metallb-system
    resourceVersion: "813489"
    uid: 2f68235f-3e82-4f56-8128-ae401b9329bb
  spec:
    addresses:
    - 172.22.0.250-172.22.0.250
    autoAssign: true
    protocol: layer2
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

[root@sealusa34 ansible_tests]# oc get addresspools.metallb.io -o yaml
apiVersion: v1
items:
- apiVersion: metallb.io/v1alpha1
  kind: AddressPool
  metadata:
    creationTimestamp: "2022-01-12T17:33:39Z"
    generation: 9
    name: api-addresspool
    namespace: metallb-system
    resourceVersion: "816879"
    uid: 2f68235f-3e82-4f56-8128-ae401b9329bb
  spec:
    addresses:
    - 172.22.0.150-172.22.0.150
    autoAssign: true
    protocol: layer2
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

oc get svc -n openshift-kube-apiserver metallb-api-service -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    metallb.universe.tf/address-pool: api-addresspool
  creationTimestamp: "2022-01-12T17:33:40Z"
  name: metallb-api-service
  namespace: openshift-kube-apiserver
  resourceVersion: "816933"
  uid: 86fdd641-6f45-4d02-8ade-05f8a80b89a8
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.30.4.43
  clusterIPs:
  - 172.30.4.43
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  loadBalancerIP: 172.22.0.150
  ports:
  - name: http
    nodePort: 31115
    port: 6443
    protocol: TCP
    targetPort: 6443
  selector:
    app: openshift-kube-apiserver
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 172.22.0.150

So it works every second time.
You realize that your address pool range is just one IP, so every time you change the svc IP you also need to update the pool range.

If I take one of the failing runs, I see this error in the speaker log:

{"caller":"main.go:237","error":"assigned IP not allowed by config","ip":"172.22.0.220","msg":"IP allocated by controller not allowed by config","op":"setBalancer","service":"openshift-kube-apiserver/metallb-api-service","ts":"2022-01-12T17:40:26.3772146Z"}

There should also be an info event showing this error. It comes from the speaker main loop when it can't find a pool for the loadBalancerIP.
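To make the single-IP pool constraint concrete, here is a minimal sketch that checks whether a requested loadBalancerIP falls inside any "start-end" range from the AddressPool spec. The function names ip_in_range and ip_allowed are hypothetical and this is not MetalLB's own check; it only handles the "start-end" form used in this report, not CIDR notation.

```python
import ipaddress


def ip_in_range(ip: str, pool_range: str) -> bool:
    """Return True if ip lies within a 'start-end' range (inclusive)."""
    start_text, end_text = pool_range.split("-")
    start = ipaddress.ip_address(start_text.strip())
    end = ipaddress.ip_address(end_text.strip())
    return start <= ipaddress.ip_address(ip) <= end


def ip_allowed(ip: str, addresses: list[str]) -> bool:
    """Return True if ip is covered by at least one range in the pool."""
    return any(ip_in_range(ip, r) for r in addresses)


if __name__ == "__main__":
    pool = ["172.22.0.230-172.22.0.230"]  # single-IP pool from this report
    for candidate in ("172.22.0.230", "172.22.0.220"):
        print(candidate, "allowed" if ip_allowed(candidate, pool) else "not allowed")
```

With a one-address pool, any loadBalancerIP other than that exact address is outside the range, which is why the pool and the service have to be changed together.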
Exactly the same method (automated: first we create/change the addresspool, then the service) was used 6 times with the following IPs:
172.22.0.210
172.22.0.220
172.22.0.230
172.22.0.240
172.22.0.250
172.22.0.150

The log entry is for 172.22.0.220, but that run actually worked as expected (same for 172.22.0.240 and 172.22.0.150).
Here is what happens. We have an addresspool with just one IP, and we apply a svc with loadBalancerIP set to that IP, which ends up allocating it. For example:

address-pool:
- 172.22.0.250-172.22.0.250

The svc then gets this IP and everything is good.

update addresspool 1st then svc
===============================
Now we edit the same addresspool and change it to 172.22.0.251-172.22.0.251. The metallb controller invokes SetConfig()->SetPools(). Because it is the same addresspool, the controller treats the service as already allocated from it, while the pool now contains only 172.22.0.251. We then try to find the pool whose CIDR definition contains the already allocated IP 172.22.0.250. The pool now has 172.22.0.251, so the lookup returns "" (no such pool) and this error comes up:

{"caller":"level.go:63","configmap":"metallb-system/config","error":"new config not compatible with assigned IPs: service \"default/nginx\" cannot own [\"172.22.0.250\"] under new config","level":"error","msg":"applying new configuration failed","op":"setConfig","ts":"2022-01-12T21:30:37.873683013Z"}

The svc stays in pending state and after ~3sec k8s will delete the service:

{"caller":"level.go:63","event":"serviceDeleted","level":"info","msg":"service deleted","service":"default/nginx","ts":"2022-01-12T21:32:40.212428145Z"}

update svc 1st then addresspool
===============================
However, if we update the svc first, it invokes SetBalancer(), which triggers IPAM address allocation. That allocation fails and gets rid of the previously allocated IP. When we then update the addresspool to match the svc IP, the controller goes ahead and allocates it:

{"caller":"level.go:63","error":"[\"172.22.0.251\"] is not allowed in config","level":"error","msg":"IP allocation failed","op":"allocateIPs","service":"default/nginx","ts":"2022-01-12T22:29:03.854318793Z"}
{"caller":"level.go:63","configmap":"metallb-system/config","event":"configLoaded","level":"info","msg":"config (re)loaded","ts":"2022-01-12T22:29:41.381103058Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":["172.22.0.251"],"level":"info","msg":"IP address assigned by controller","service":"default/nginx","ts":"2022-01-12T22:29:41.386954354Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"default/nginx","ts":"2022-01-12T22:29:41.395010516Z"}
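The two orderings described above can be sketched with a toy model. This is not MetalLB's code: the ToyAllocator class and its set_pool / set_balancer methods are hypothetical names that only loosely mirror SetPools() and SetBalancer(), and the model rests on two assumptions taken from the description above: a pool change is rejected while any assigned IP falls outside the new range, and a failed allocation releases the previously assigned IP.

```python
import ipaddress


class ToyAllocator:
    """Toy model of the described behaviour; NOT the real MetalLB controller."""

    def __init__(self, pool_range: str):
        self.pool_range = pool_range
        self.assigned = {}  # service name -> currently allocated IP

    @staticmethod
    def _contains(pool_range: str, ip: str) -> bool:
        start, end = (ipaddress.ip_address(p) for p in pool_range.split("-"))
        return start <= ipaddress.ip_address(ip) <= end

    def set_pool(self, new_range: str) -> bool:
        """Apply a new pool range unless an assigned IP would fall outside it."""
        if any(not self._contains(new_range, ip) for ip in self.assigned.values()):
            return False  # assumption: the old config stays in effect
        self.pool_range = new_range
        return True

    def set_balancer(self, svc: str, requested_ip: str) -> bool:
        """Try to allocate requested_ip; on failure drop the old allocation."""
        if self._contains(self.pool_range, requested_ip):
            self.assigned[svc] = requested_ip
            return True
        self.assigned.pop(svc, None)  # assumption: failed allocation releases the IP
        return False


def pool_first(ips: list[str]) -> list[bool]:
    """Change the pool, then the service, once per IP (the automated order)."""
    alloc = ToyAllocator("172.22.0.200-172.22.0.200")
    alloc.set_balancer("default/nginx", "172.22.0.200")
    results = []
    for ip in ips:
        alloc.set_pool(f"{ip}-{ip}")
        results.append(alloc.set_balancer("default/nginx", ip))
    return results


def svc_first() -> bool:
    """Change the service first, then the pool, then let allocation retry."""
    alloc = ToyAllocator("172.22.0.250-172.22.0.250")
    alloc.set_balancer("default/nginx", "172.22.0.250")
    alloc.set_balancer("default/nginx", "172.22.0.251")  # fails, releases .250
    alloc.set_pool("172.22.0.251-172.22.0.251")          # accepted: nothing assigned
    return alloc.set_balancer("default/nginx", "172.22.0.251")


if __name__ == "__main__":
    print(pool_first(["172.22.0.210", "172.22.0.220", "172.22.0.230", "172.22.0.240"]))
    print(svc_first())
```

In this toy, repeating the pool-first order alternates between failure and success, which resembles the every-second-time pattern reported earlier in the thread. Whether the real controller follows exactly this path is not confirmed here.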
It looks like there's a fix for this upstream: https://github.com/metallb/metallb/pull/1028

Is there a chance for this fix to be backported to 4.9?
If we first reconfigure the service and then the addresspool - then the issue is not observed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.9.17 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0195