Bug 1813062 - DNS service IP already allocated [NEEDINFO]
Summary: DNS service IP already allocated
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.2.z
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Daneyon Hansen
QA Contact: Hongan Li
URL:
Whiteboard: SDN-CUST-IMPACT
Depends On:
Blocks:
Reported: 2020-03-12 20:29 UTC by John Coleman
Modified: 2020-09-11 15:19 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-06 20:09:04 UTC
Target Upstream Version:
fhirtz: needinfo? (dhansen)




Links
System ID Priority Status Summary Last Updated
Github openshift cluster-dns-operator pull 182 None closed Bug 1813062: ensures service IP is allocatable 2020-09-14 18:13:07 UTC
Github openshift cluster-dns-operator pull 187 None closed Bug 1813062: Updates Status Reconciliation to Support DNS Service 2020-09-14 18:13:08 UTC

Description John Coleman 2020-03-12 20:29:04 UTC
Description of problem:

The apiserver pods were in a CrashLoopBackOff state:

$ oc get pods -n openshift-apiserver
NAME              READY   STATUS             RESTARTS   AGE
apiserver-78vlp   0/1     CrashLoopBackOff   11         36m
apiserver-kvmq6   0/1     CrashLoopBackOff   11         36m
apiserver-zw4kx   0/1     CrashLoopBackOff   11         36m

Logs for the apiserver pods:

W0226 21:42:15.234536       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {etcd.openshift-etcd.svc:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: operation was canceled". Reconnecting...
W0226 21:42:15.234556       1 asm_amd64.s:1337] Failed to dial etcd.openshift-etcd.svc:2379: grpc: the connection is closing; please retry.
F0226 21:42:15.234539       1 storage_decorator.go:57] Unable to create storage backend: config (&{etcd3 openshift.io {[https://etcd.openshift-etcd.svc:2379] /var/run/secrets/etcd-client/tls.key /var/run/secrets/etcd-client/tls.crt /var/run/configmaps/etcd-serving-ca/ca-bundle.crt} false true {0xc000ebfe60 0xc000ebfef0} {{apps.openshift.io v1} [{apps.openshift.io } {apps.openshift.io }] false} <nil> 5m0s 1m0s}), err (context deadline exceeded)

$ oc4 get events -n openshift-apiserver
LAST SEEN   TYPE      REASON             OBJECT                MESSAGE
43m         Normal    Scheduled          pod/apiserver-78vlp   Successfully assigned openshift-apiserver/apiserver-78vlp to domain-name
43m         Warning   FailedScheduling   pod/apiserver-78vlp   Binding rejected: Operation cannot be fulfilled on pods/binding "apiserver-78vlp": pod apiserver-78vlp is already assigned to node "domain-name"

We created a debug pod and attempted to curl etcd and could not reach it:

  $ oc4 debug ds/apiserver -n openshift-apiserver
  ...
  sh-4.2# curl -vk https://etcd.openshift-etcd.svc:2379
  * Could not resolve host: etcd.openshift-etcd.svc; Unknown error
  * Closing connection 0
  curl: (6) Could not resolve host: etcd.openshift-etcd.svc; Unknown error
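
The curl exit code 6 above indicates a name-resolution failure rather than a refused or timed-out connection. A minimal Python sketch of the same distinction, assuming only the standard library; the function name `can_resolve` and its injectable `resolver` parameter are illustrative and not part of any OpenShift tooling:

```python
import socket


def can_resolve(host, port, resolver=socket.getaddrinfo):
    """Return True if the name resolves, False if resolution fails.

    `resolver` is injectable only so the check can be exercised without
    a live cluster; by default the system resolver is used.
    """
    try:
        resolver(host, port)
    except socket.gaierror:
        # Name resolution failed: the equivalent of curl exit code 6.
        return False
    return True


if __name__ == "__main__":
    # Host and port taken from the report; meaningful only inside a pod.
    print(can_resolve("etcd.openshift-etcd.svc", 2379))
```

A False result from inside a pod points at cluster DNS rather than at etcd itself, which matches where the investigation went next.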

We diagnosed the issue by inspecting the DNS operator logs:

openshift-dns-operator
----------------------
2020-02-27T21:24:19.449200823Z time="2020-02-27T21:24:19Z" level=error msg="failed to reconcile request /default: failed to ensure dns default: failed to create service for dns default: failed to create dns service: Service \"dns-default\" is invalid: spec.clusterIP: Invalid value: \"10.253.159.10\": provided IP is already allocated"

We then had the customer check whether that IP was already in use, and it was. We had the customer remove the conflicting service, and the apiserver pods went into a Running state.

We believe this was either a timing issue during the original installation, or the DNS Service was deleted at some point and another Service pending creation quickly took the IP.
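
For anyone repeating the "is the IP already in use" check, the following is a minimal sketch that scans a Service list for a given cluster IP. It assumes the input is the parsed JSON of a Service list (for example from `oc get svc --all-namespaces -o json`); the function name and the sample data are hypothetical and are not taken from the customer cluster:

```python
def services_with_cluster_ip(service_list, ip):
    """Return (namespace, name) pairs for Services whose clusterIP equals ip."""
    matches = []
    for item in service_list.get("items", []):
        if item.get("spec", {}).get("clusterIP") == ip:
            meta = item.get("metadata", {})
            matches.append((meta.get("namespace"), meta.get("name")))
    return matches


# Made-up sample data for illustration only.
SAMPLE = {
    "kind": "List",
    "items": [
        {
            "metadata": {"namespace": "example-ns", "name": "example-svc"},
            "spec": {"clusterIP": "10.253.159.10"},
        },
        {
            "metadata": {"namespace": "default", "name": "kubernetes"},
            "spec": {"clusterIP": "10.253.158.1"},
        },
    ],
}

if __name__ == "__main__":
    print(services_with_cluster_ip(SAMPLE, "10.253.159.10"))
```

Against a real cluster, the saved command output could be loaded with `json.load` and passed in place of `SAMPLE`.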

Version-Release number of selected component (if applicable):

OpenShift 4.2.z

How reproducible:

This occurred on 2 out of 5 clusters within the customer's infrastructure.

Additional info:

One thing to note is that the customer did tweak the SDN and Service CIDRs, and they also installed with the Calico plugin.  See below:

oc3 describe network.config/cluster
-----------------------------------
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2020-02-28T02:30:18Z
  Generation:          2
  Resource Version:    2345
  Self Link:           /apis/config.openshift.io/v1/networks/cluster
  UID:                 4281aa32-59d2-11ea-98dc-005056bfc1a6
Spec:
  Cluster Network:
    Cidr:         10.253.144.0/23
    Host Prefix:  26
  External IP:
    Policy:
  Network Type:  Calico
  Service Network:
    10.253.158.0/24
Status:
  Cluster Network:
    Cidr:               10.253.144.0/23
    Host Prefix:        26
  Cluster Network MTU:  1410
  Network Type:         Calico
  Service Network:
    10.253.158.0/24
Events:  <none>

Comment 1 Dan Mace 2020-03-13 12:11:22 UTC
Seems unlikely this is a regression, so I'm moving it out of the 4.4 release.

Comment 2 Daneyon Hansen 2020-03-30 21:21:47 UTC
network.config/cluster shows a service cidr of 10.253.158.0/24. This means available host addresses are 10.253.158.1-.254. The service IP assigned to the default cluster dns service is 10.253.159.10, which is outside of the service cidr scope. This appears to be the result of a degraded state caused by a network.config/cluster misconfiguration.
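
The range comparison in this comment can be reproduced with Python's standard `ipaddress` module. This is only a sketch of the arithmetic using the values quoted in the report; the helper name `in_service_cidr` is illustrative:

```python
import ipaddress


def in_service_cidr(ip, cidr):
    """Return True if ip falls inside the given service CIDR."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    # Values taken from the report above.
    print(in_service_cidr("10.253.159.10", "10.253.158.0/24"))  # False
    print(in_service_cidr("10.253.158.10", "10.253.158.0/24"))  # True
```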

Comment 5 Daneyon Hansen 2020-04-03 22:18:19 UTC
SDN IPAM should not be allocating the <SERVICE_CIDR>.10 address. This address is reserved for the DNS service IP. I see the Calico plugin is in play, so maybe a static IP was assigned [1] to the openshift-marketplace service? Reassigning to the SDN team to provide input.

[1] https://docs.projectcalico.org/v3.10/networking/use-specific-ip

Comment 6 Alexander Constantinescu 2020-04-06 20:09:04 UTC
Hi

> One thing to note is that the customer did tweak the SDN and Service CIDR's, they also installed with Calico plugin.

Tweaking the service CIDR is not supported post cluster installation. This could be the reason for the IP collision.

I will have to close this bug as WONTFIX. 

-Alex

Comment 11 Maciej Szulik 2020-05-05 14:56:14 UTC
NodeIPAMController, although it's part of kube-controller-manager, is under the networking team, so moving accordingly.

Comment 16 Daneyon Hansen 2020-05-14 19:58:09 UTC
The dns operator allocates the 10th IP from the network config serviceCIDR [1] to the DNS service cluster IP [2].
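
As an illustration of the "10th IP of the service CIDR" rule described here, the sketch below derives that address with Python's `ipaddress` module. It shows the arithmetic only and is not the operator's Go implementation; the function name `dns_service_ip` and its `offset` parameter are hypothetical:

```python
import ipaddress


def dns_service_ip(service_cidr, offset=10):
    """Return the address `offset` positions above the network address."""
    network = ipaddress.ip_network(service_cidr)
    if offset >= network.num_addresses:
        raise ValueError(f"{service_cidr} has fewer than {offset + 1} addresses")
    return str(network[offset])


if __name__ == "__main__":
    print(dns_service_ip("10.253.158.0/24"))  # 10.253.158.10
    print(dns_service_ip("172.30.0.0/16"))    # 172.30.0.10
```

Because the address is derived from the CIDR rather than requested from the allocator, any other Service that is assigned that IP first causes the "provided IP is already allocated" error shown in the description.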

The marketplace operator should not be getting installed before the dns operator, see [3][4][5] for details. It does not appear that the marketplace operator follows the run level schema identified in [3]. However, I followed the Calico install guide [6] for OCP and had no issue installing a cluster using 4.5.0-0.nightly-2020-05-04-113741:

$ openshift-install create cluster
INFO Consuming OpenShift Install (Manifests) from target directory 
<SNIP>
INFO Install complete! 

$ oc get all -n calico-system
NAME                                           READY   STATUS    RESTARTS   AGE
pod/calico-kube-controllers-558b5bb4fc-n82wz   1/1     Running   0          22m
pod/calico-node-2m4n8                          1/1     Running   0          22m
pod/calico-node-55wld                          1/1     Running   0          11m
pod/calico-node-6ddqf                          1/1     Running   0          22m
pod/calico-node-fzxfj                          1/1     Running   0          22m
pod/calico-node-p7wlt                          1/1     Running   0          11m
pod/calico-node-zjtpg                          1/1     Running   0          11m
pod/calico-typha-759d74b7d9-cf7td              1/1     Running   0          20m
pod/calico-typha-759d74b7d9-f6rd5              1/1     Running   0          22m
pod/calico-typha-759d74b7d9-j28ff              1/1     Running   0          10m
pod/calico-typha-759d74b7d9-rh6hf              1/1     Running   0          20m

NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/calico-typha   ClusterIP   172.30.250.232   <none>        5473/TCP   22m

NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/calico-node   6         6         6       6            6           <none>          22m

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/calico-kube-controllers   1/1     1            1           22m
deployment.apps/calico-typha              4/4     4            4           22m

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/calico-kube-controllers-558b5bb4fc   1         1         1       22m
replicaset.apps/calico-typha-759d74b7d9              4         4         4       22m

Note: I did not select the optional step in the Calico install guide.

Steps for reproducing the BZ are needed.

[1] https://github.com/openshift/cluster-dns-operator/blob/master/pkg/operator/controller/controller.go#L378-L399
[2] https://github.com/openshift/cluster-dns-operator/blob/master/pkg/operator/controller/controller_dns_service.go#L72-L74
[3] https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#how-do-i-get-added-as-a-special-run-level
[4] https://github.com/openshift/cluster-dns-operator/tree/master/manifests
[5] https://github.com/operator-framework/operator-marketplace/tree/master/manifests
[6] https://docs.projectcalico.org/getting-started/openshift/installation

Comment 18 Daneyon Hansen 2020-05-15 17:16:13 UTC
> Could you please revise the dns-operator to handle such a case?

If we allow the DNS service IP to be arbitrary, I'm confident cluster operators (the human kind), app devs, etc are going to be very unhappy. <service_cidr>.10 has been used for DNS a long time by OCP and upstream k8s installs (kubeadm). Having a consistent IP for key infra endpoints (apiserver, default gateways, DNS, etc.) simplifies management and troubleshooting. I think we need to figure out a way to continue using .10. Since k8s does not provide a mechanism for reserving an IP from the service cidr, let me see if we can create the DNS service as part of the initial install.

Comment 19 Daneyon Hansen 2020-05-18 19:03:04 UTC
Blocked waiting for feedback [1] regarding the installer's ability to create the DNS Service early during the install process to ensure DNS always gets the <service_cidr>.10 address.

[1] https://coreos.slack.com/archives/C68TNFWA2/p1589563877199300

Comment 20 Andrew McDermott 2020-05-19 15:06:20 UTC
Moving to 4.6.

Comment 21 Daneyon Hansen 2020-06-16 16:53:42 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 22 Andrew McDermott 2020-07-09 12:14:10 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 23 Andrew McDermott 2020-07-30 10:17:15 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 27 Daneyon Hansen 2020-08-06 18:05:36 UTC
@Frank,

The dns operator reserves the <SERVICE_CIDR>.10 address (see https://bugzilla.redhat.com/show_bug.cgi?id=1813062#c16 [1] for details). This address is reserved because the kubelet is configured by MCO with the same <SERVICE_CIDR>.10. Making this address configurable in the dns operator is a breaking change. Please create an RFE or open an MCO bug that references this BZ for additional background.


[1] https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/network/dns/dns.go
[2] https://github.com/openshift/machine-config-operator/blob/master/pkg/operator/render.go#L114
[3] https://github.com/openshift/machine-config-operator/blob/master/templates/master/01-master-kubelet/_base/files/kubelet.yaml#L15

Comment 35 Brandon Anderson 2020-08-27 17:43:25 UTC
Hi, I've got another case, 02666014, that appears to be related to this issue. Are there any updates on the QA process that I can relay to the customer?

Comment 36 Daneyon Hansen 2020-08-27 18:05:24 UTC
Brandon,

Note that the PR waiting to be QE'd is to surface dns operator status conditions for this use case and does not fix the underlying issue. As previously mentioned, the <service_cidr>.10 IP is required for the DNS Service IP. For standard installations this is not an issue due to components following CVO run-levels:

https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#how-do-i-get-added-as-a-special-run-level

Comment 38 Hongan Li 2020-08-28 02:15:59 UTC
Verified with 4.6.0-0.nightly-2020-08-27-005538; the test passed.

Test steps:
1. Disable the dns operator temporarily.
   $ oc -n openshift-dns-operator scale deploy/dns-operator --replicas=0
2. Delete the existing dns-default service that is taking the IP 172.30.0.10.
   $ oc -n openshift-dns delete svc dns-default
3. Create another test service that takes the IP 172.30.0.10.
   $ oc create service clusterip test-dns --tcp=53:53 --clusterip="172.30.0.10"
4. Enable the dns operator.
   $ oc -n openshift-dns-operator scale deploy/dns-operator --replicas=1
5. Check the dns operator; it should be degraded and show the message "No IP assigned to DNS service".

$ oc get co/dns
NAME   VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
dns    4.6.0-0.nightly-2020-08-27-005538   False       True          True       95s

$ oc get dnses.operator.openshift.io default -oyaml
apiVersion: operator.openshift.io/v1
kind: DNS
<---snip--->
spec: {}
status:
  clusterDomain: cluster.local
  clusterIP: ""
  conditions:
  - lastTransitionTime: "2020-08-28T01:54:47Z"
    message: No IP assigned to DNS service
    reason: NoServiceIP
    status: "True"
    type: Degraded
  - lastTransitionTime: "2020-08-28T01:54:47Z"
    message: No IP assigned to DNS service
    reason: Reconciling
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-08-28T01:54:47Z"
    message: No IP assigned to DNS service
    reason: NoServiceIP
    status: "False"
    type: Available
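
The conditions above can also be checked programmatically. A minimal sketch, assuming the status has already been parsed into Python dictionaries (for example from `-o json` output); the helper name `degraded_reason` is hypothetical, and the sample list mirrors the conditions shown in this comment:

```python
def degraded_reason(conditions):
    """Return the reason of a Degraded=True condition, or None if not degraded."""
    for cond in conditions:
        if cond.get("type") == "Degraded" and cond.get("status") == "True":
            return cond.get("reason")
    return None


# Mirrors the status.conditions shown above.
CONDITIONS = [
    {"type": "Degraded", "status": "True", "reason": "NoServiceIP",
     "message": "No IP assigned to DNS service"},
    {"type": "Progressing", "status": "True", "reason": "Reconciling",
     "message": "No IP assigned to DNS service"},
    {"type": "Available", "status": "False", "reason": "NoServiceIP",
     "message": "No IP assigned to DNS service"},
]

if __name__ == "__main__":
    print(degraded_reason(CONDITIONS))
```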

Comment 43 Daneyon Hansen 2020-09-08 16:16:47 UTC
> To note, I *do* see the upstream discussion on this as having the "fix" option just be to allow the DNS operator to fallback and use a different address if .10 is taken (I presume that's behind the direction that you were going with this), but is that the only realistic option? I'd think that it'd be a difficult case to make since our concern is a bit nebulous (we don't know what/who having a non-.10 DNS address might break).

Other options exist, but require an RFE. At this point, .10 is required for the DNS service IP.

> As noted, we can split this to a new bz/issue but the client hits this with some regularity and this bz shifted tack about mid-way from "making it work" to fixing the error handling (which is useful, but the cluster still doesn't reliably build). I'm trying to see how to best go about solving that.

We need to be provided a reproducer to consider other alternatives. It would be helpful to get Calico involved since the issue is related to running OCP with their networking plugin.

