Bug 1718389
| Summary: | [Doc] Can't connect to link-local addresses from cri-o container | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jan Safranek <jsafrane> | |
| Component: | Documentation | Assignee: | Jason Boxman <jboxman> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Sunil Choudhary <schoudha> | |
| Severity: | high | Docs Contact: | Vikram Goyal <vigoyal> | |
| Priority: | high | |||
| Version: | 4.4 | CC: | aos-bugs, bbennett, chaoyang, danclark, danw, dcbw, ddelcian, dwalsh, jcall, jhocutt, jokerman, mmccomas, tsmetana, wsun | |
| Target Milestone: | --- | Keywords: | Reopened | |
| Target Release: | 4.5.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1734600 (view as bug list) | Environment: | ||
| Last Closed: | 2020-06-02 18:52:40 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1734600 | |||
Description
Jan Safranek
2019-06-07 15:47:07 UTC
The cloud metadata IP is blocked in OpenShift 4. What is your use case for accessing it?

> The cloud metadata IP is blocked in OpenShift 4. What is your use case for accessing it?
The AWS EBS CSI driver reads topology information (region and zone) for each node from the link-local metadata address. The usual zone/region labels on the nodes cannot be used, because CSI is independent of Kubernetes and cannot rely on those labels.
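For context, the kind of link-local request involved looks like the following. This is an illustrative sketch only (the pod name and image are placeholders, not part of the driver): it queries the EC2 instance metadata service on 169.254.169.254 for the availability zone, which times out from a regular pod-network pod when the metadata IP is blocked.

```yaml
# Illustrative only: a throwaway pod that queries the EC2 instance metadata
# service the same way the EBS CSI driver resolves its topology (zone/region).
apiVersion: v1
kind: Pod
metadata:
  name: metadata-probe               # placeholder name
spec:
  # hostNetwork: true                # uncomment to apply the workaround discussed below
  restartPolicy: Never
  containers:
  - name: probe
    image: curlimages/curl:latest    # any image that contains curl works
    command:
    - curl
    - --max-time
    - "10"
    - http://169.254.169.254/latest/meta-data/placement/availability-zone
```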
Gotcha. For the time being, is it possible for you to run the EBS CSI driver in HostNetwork? It has no restrictions.

For the time being, yes, hostNetwork is a usable workaround. We don't ship the AWS CSI driver in 4.2; we only use it there for testing the CSI bits in Kubernetes, so for 4.2 we seem to be OK with hostNetwork.

Updating this bug's priority because we need to ship CSI in 4.2.

*** Bug 1734600 has been marked as a duplicate of this bug. ***

(In reply to Chao Yang from comment #6)
> we need to ship csi in 4.2

We were told before that we didn't need to address this for 4.2. We are now past feature freeze and it is too late to design a proper fix. Simply reverting the change and going back to insecure-by-default is not a good option. I've temporarily moved this to target 4.2.0 so Casey will look at it when he gets back from PTO, but it will probably just get moved back out of the 4.2 blockers.

Correct, we're long past any reasonable timeframe for 4.2. IIRC, the proposed workaround (running the cloud-provider-specific components on the host network) is acceptable. If we need to re-think this, we'll do it as part of the 4.3 planning cycle. I'm considering closing this and moving it to the RFE board.

My customer hit this issue. I had them apply the "hostNetwork: true" workaround to their PV snapshot-controller deployment config. In the meantime, I've also filed the following DOC FIX bug: https://bugzilla.redhat.com/show_bug.cgi?id=1760123

There isn't currently a desire to change these things; we'll revisit this as needed.

I created a PR[0] that states that setting `hostNetwork: true` is necessary to use link-local addresses. Can someone verify whether my explanation is correct, or if not, what a better approach to this might be? Thanks!

[0] https://github.com/openshift/openshift-docs/pull/21508

Hi Tomas, I've created a PR[0] with a docs update for this. Can you take a look? Thanks!

[0] https://github.com/openshift/openshift-docs/pull/21508

Because this applies to every release, I'm re-opening this.

Has anyone actually made this workaround work? I'm trying to make it work now but don't quite see how to do it. By default, hostNetwork is true for the ebs-csi-node daemonset, but the request to the metadata endpoint is coming from the ebs-csi-controller pod, which does not have hostNetwork on by default. I tried to set it and redeploy the Helm chart for the AWS CSI driver. The controller pod gets stuck at Pending because the scheduler cannot find a node with the port available; I guess it is trying to bind to the host network on port 9808. The pod description below shows the failure, and a trimmed manifest sketch follows it.
oc describe pod ebs-csi-controller-65bfc5497-2txhl
Name: ebs-csi-controller-65bfc5497-2txhl
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: <none>
Labels: app=ebs-csi-controller
app.kubernetes.io/instance=aws-ebs-csi-driver-helm-chart-1589984717
app.kubernetes.io/name=aws-ebs-csi-driver
pod-template-hash=65bfc5497
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/ebs-csi-controller-65bfc5497
Containers:
ebs-plugin:
Image: openshift4-registry.redhatgovsa.io:5000/amazon/aws-ebs-csi-driver:v0.5.0
Port: 9808/TCP
Host Port: 9808/TCP
Args:
controller
--endpoint=$(CSI_ENDPOINT)
--logtostderr
--v=5
Liveness: http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
Environment:
CSI_ENDPOINT: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
AWS_ACCESS_KEY_ID: <set to the key 'key_id' in secret 'aws-secret'> Optional: true
AWS_SECRET_ACCESS_KEY: <set to the key 'access_key' in secret 'aws-secret'> Optional: true
AWS_REGION: us-iso-east-1
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from ebs-csi-controller-sa-token-bkpmd (ro)
csi-provisioner:
Image: openshift4-registry.redhatgovsa.io:5000/k8scsi/csi-provisioner:v1.5.0
Port: <none>
Host Port: <none>
Args:
--csi-address=$(ADDRESS)
--v=5
--feature-gates=Topology=true
--enable-leader-election
--leader-election-type=leases
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from ebs-csi-controller-sa-token-bkpmd (ro)
csi-attacher:
Image: openshift4-registry.redhatgovsa.io:5000/k8scsi/csi-attacher:v1.2.0
Port: <none>
Host Port: <none>
Args:
--csi-address=$(ADDRESS)
--v=5
--leader-election=true
--leader-election-type=leases
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from ebs-csi-controller-sa-token-bkpmd (ro)
csi-snapshotter:
Image: openshift4-registry.redhatgovsa.io:5000/k8scsi/csi-snapshotter:v2.0.1
Port: <none>
Host Port: <none>
Args:
--csi-address=$(ADDRESS)
--leader-election=true
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from ebs-csi-controller-sa-token-bkpmd (ro)
csi-resizer:
Image: openshift4-registry.redhatgovsa.io:5000/k8scsi/csi-resizer:v0.3.0
Port: <none>
Host Port: <none>
Args:
--csi-address=$(ADDRESS)
--v=5
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from ebs-csi-controller-sa-token-bkpmd (ro)
liveness-probe:
Image: openshift4-registry.redhatgovsa.io:5000/k8scsi/livenessprobe:v1.1.0
Port: <none>
Host Port: <none>
Args:
--csi-address=/csi/csi.sock
Environment: <none>
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from ebs-csi-controller-sa-token-bkpmd (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
socket-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
ebs-csi-controller-sa-token-bkpmd:
Type: Secret (a volume populated by a Secret)
SecretName: ebs-csi-controller-sa-token-bkpmd
Optional: false
QoS Class: BestEffort
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/6 nodes are available: 6 node(s) didn't have free ports for the requested pod ports.
Warning FailedScheduling <unknown> default-scheduler 0/6 nodes are available: 6 node(s) didn't have free ports for the requested pod ports.
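For reference, the relevant part of the controller Deployment's pod template after naively setting hostNetwork looks roughly like this (container and field names follow the upstream aws-ebs-csi-driver chart; everything else is trimmed). On the host network the `healthz` container port effectively becomes a host port, and because the ebs-csi-node DaemonSet also runs on the host network and presumably already holds 9808 on every node, the scheduler reports the "didn't have free ports" failure shown above.

```yaml
# Trimmed excerpt of the ebs-csi-controller Deployment pod template with
# hostNetwork enabled; only the fields relevant to the port conflict are shown.
spec:
  template:
    spec:
      hostNetwork: true          # added so the pod can reach 169.254.169.254
      containers:
      - name: ebs-plugin
        ports:
        - name: healthz
          containerPort: 9808    # becomes a host port under hostNetwork and
          protocol: TCP          # collides with port 9808 already in use on the nodes
      - name: liveness-probe
        args:
        - --csi-address=/csi/csi.sock
      # ... csi-provisioner, csi-attacher, csi-snapshotter, csi-resizer
      #     sidecars unchanged from the describe output above ...
```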
OK, I made a little "progress", if you want to call it that. With hostNetwork enabled, the controller pod cannot be scheduled because of the liveness probe on port 9808. I went into the daemonset.yaml and the deployment.yaml and commented out those ports, along with the liveness container inside the pod itself. I'm now able to deploy, and it appears to be making requests to the metadata endpoint. I'm in an AWS private region, so it now fails because of the SSL certificates for the custom API endpoint, but that is a different problem. A trimmed sketch of the resulting manifest is at the end of this report.

Hi all, I've merged a PR[0] that updates the docs to reflect the current limitations from the networking side. As such, I am going to close this issue, as it is assigned to the "Documentation" component. Thanks.

[0] https://github.com/openshift/openshift-docs/pull/21508

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
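Putting the comments above together, a trimmed sketch of the controller pod template that does schedule, following the upstream chart's names: hostNetwork stays enabled, while the `healthz` port and the liveness-probe sidecar are removed so nothing competes for host port 9808.

```yaml
# Trimmed sketch of the working variant described above: hostNetwork enabled,
# healthz port and liveness-probe sidecar removed so no host port is requested.
spec:
  template:
    spec:
      hostNetwork: true
      serviceAccountName: ebs-csi-controller-sa
      containers:
      - name: ebs-plugin
        image: amazon/aws-ebs-csi-driver:v0.5.0
        args:
        - controller
        - --endpoint=$(CSI_ENDPOINT)
        - --logtostderr
        - --v=5
        env:
        - name: CSI_ENDPOINT
          value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
        # AWS credential env vars from the aws-secret omitted for brevity;
        # no ports or livenessProbe declared here
        volumeMounts:
        - name: socket-dir
          mountPath: /var/lib/csi/sockets/pluginproxy/
      # ... csi-provisioner, csi-attacher, csi-snapshotter and csi-resizer
      #     sidecars as before; the liveness-probe container is dropped ...
      volumes:
      - name: socket-dir
        emptyDir: {}
```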