| Summary: | oc cluster up is hard-coded to check the docker config for 172.30/16, breaking many people | | |
|---|---|---|---|
| Product: | OKD | Reporter: | Grant Shipley <gshipley> |
| Component: | oc | Assignee: | Cesar Wong <cewong> |
| Status: | CLOSED NOTABUG | QA Contact: | Xingxing Xia <xxia> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 3.x | CC: | aos-bugs, bbennett, ccoleman, ffranz, gshipley, jvallejo, mmccomas, wewang, wmeng |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-10-18 14:58:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Grant Shipley
2016-10-02 13:18:48 UTC
This will most likely require a backport to 3.3.1/1.3.1.

So we do pass a default CIDR for services to the network plugin here: https://github.com/openshift/origin/blob/master/pkg/cmd/server/start/network_args.go#L35 However, from what I can tell, only our own SDN pays attention to that. Grant, is this something you can reproduce at will, or does it only happen sometimes? What platform? I'm not sure that it's Docker that decides what CIDR to use for services, but rather the kube proxy. It's also not clear from the code how it decides that. Adding Ben from the networking team to help out.

Yes, I can reproduce this every single time. I don't know why I always get 172.17 as my docker0, and all of the docs say this as well. However, it seems most people I see doing demos have 172.30. I have tried this on 10 VMs, all using Fedora 24. The VMs I have created are on Windows and Linux. I have also tried before and after a yum update with the same results.

My docker0 is 172.17.0.1; however, all services are created on 172.30.*. This is on a RHEL machine.

Ok, did a little bit more digging. I don't see how the services subnet can be anything other than 172.30.0.0/16 with a config created by cluster up:
1) As mentioned earlier, the default network args are initialized here: https://github.com/openshift/origin/blob/master/pkg/cmd/server/start/network_args.go#L35
2) That same value is used to initialize the ServiceSubnet in the Kubernetes master config: https://github.com/openshift/origin/blob/master/pkg/cmd/server/start/master_args.go#L451
3) Which is then used as the Kubernetes master ServiceClusterIPRange: https://github.com/openshift/origin/blob/master/pkg/cmd/server/kubernetes/master_config.go#L79
4) And used to create the Kube master IP allocator (which is what assigns service IPs): https://github.com/openshift/origin/blob/ffdeb1bb546339f62722f507e1a12bdb9701c4c2/vendor/k8s.io/kubernetes/pkg/master/master.go#L368-L386
Grant, are you actually seeing services that are not in the 172.30.0.0/16 range, or just the docker0 interface? The reason we need the --insecure-registry parameter is that the registry service (just like any other service) will be created with a cluster IP in the 172.30.0.0/16 range.

Yes, if I `docker exec -ti REGISTRY_POD bash` and check the network, it has a 172.17 address. I can repeat this every time on Fedora 24, but it works on CentOS. I can do a BlueJeans later today / this week if you want.

The pod itself will not have the same IP as the service. Looking at a cluster I just brought up locally, the docker-registry pod itself has an IP of 172.17.0.5. However, the service has an IP of 172.30.179.130. The service IP is what matters when pushing/pulling images.

I can confirm cewong's comment (https://bugzilla.redhat.com/show_bug.cgi?id=1381025#c7): I tried printing all available interfaces using the net package (net.Interfaces()), and looping through I see an IP of `172.17.0.1` for the "docker0" interface. When doing `oc describe` on my docker-registry pod, it shows an IP address of `172.17.0.5`. My docker-registry service also has an IP in the 172.30 range. With all of this information in mind, I am still able to do `oc cluster up` successfully after starting the docker daemon with the following options:
--exec-opt native.cgroupdriver=systemd \
--insecure-registry=172.30.0.0/16 \
--insecure-registry=ci.dev.openshift.redhat.com:5000 \
--selinux-enabled &> /tmp/docker &
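For reference, here is a minimal Go sketch of the interface check described above. It uses only the standard library net package and simply lists every interface address, so the docker0 entry (172.17.0.1/16 in this case) can be compared with the 172.30.0.0/16 service range. This is an illustration, not the actual oc cluster up code:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Enumerate all network interfaces on the host.
	ifaces, err := net.Interfaces()
	if err != nil {
		panic(err)
	}
	for _, iface := range ifaces {
		addrs, err := iface.Addrs()
		if err != nil {
			continue
		}
		for _, addr := range addrs {
			// Each addr is typically a *net.IPNet, e.g. 172.17.0.1/16 for docker0.
			fmt.Printf("%s: %s\n", iface.Name, addr.String())
		}
	}
}
```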
Grant, can you confirm that you are getting service IPs outside of the 172.30.0.0/16 range? If not, I think both OpenShift and cluster up are working as designed.

You are correct:
hostname -i on the docker container for the registry shows a 172.17 address,
but oc get svc shows the registry service at 172.30.130.253.
So it must be something else going on with Fedora: I keep getting ErrImagePull errors because it can't connect to the registry for pulling, but it can push just fine.
It works 100% of the time for me on CentOS and fails 100% of the time for me on F24 installs. So I think we can close this as not a bug, as it must be something else.
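Since pushes succeed but pulls fail only on Fedora, one quick check from the host is whether the registry's service IP and port are reachable at all. A minimal Go sketch, assuming the service address 172.30.130.253:5000 reported by `oc get svc` above (substitute your own):

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Cluster IP and port of the docker-registry service, as reported by `oc get svc`.
	addr := "172.30.130.253:5000"

	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		fmt.Printf("cannot reach registry service at %s: %v\n", addr, err)
		return
	}
	defer conn.Close()
	fmt.Printf("registry service at %s is reachable\n", addr)
}
```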
Here is my flow:
install: sudo yum install wget docker git
uncomment in /etc/sysconfig/docker: INSECURE_REGISTRY='--insecure-registry 172.30.0.0/16'
systemctl stop firewalld
systemctl start docker
oc cluster up
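The check the bug summary refers to, oc cluster up verifying that the docker config allows 172.30/16 as an insecure registry, presumably amounts to a CIDR-containment test against the daemon's insecure-registry list. A rough Go sketch of that kind of check, as an illustration of the idea only and not the actual oc cluster up code; the registry list mirrors the INSECURE_REGISTRY setting in the flow above:

```go
package main

import (
	"fmt"
	"net"
)

// coversServiceNetwork reports whether any configured insecure-registry CIDR
// fully contains the cluster's service network (172.30.0.0/16 by default).
func coversServiceNetwork(insecureCIDRs []string, serviceCIDR string) bool {
	_, svcNet, err := net.ParseCIDR(serviceCIDR)
	if err != nil {
		return false
	}
	for _, c := range insecureCIDRs {
		_, regNet, err := net.ParseCIDR(c)
		if err != nil {
			// Skip entries that are hostnames, e.g. ci.dev.openshift.redhat.com:5000.
			continue
		}
		regOnes, _ := regNet.Mask.Size()
		svcOnes, _ := svcNet.Mask.Size()
		// The registry CIDR covers the service network if it contains the
		// network address and its prefix is no longer than the service prefix.
		if regNet.Contains(svcNet.IP) && regOnes <= svcOnes {
			return true
		}
	}
	return false
}

func main() {
	// Mirrors INSECURE_REGISTRY='--insecure-registry 172.30.0.0/16' above.
	insecure := []string{"172.30.0.0/16"}
	fmt.Println(coversServiceNetwork(insecure, "172.30.0.0/16")) // true
}
```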
when pulling an image:
------------
Oct 04 14:22:26 localhost.localdomain NetworkManager[745]: <info> [1475605346.8776] device (veth7526ee7): link connected
Oct 04 14:22:26 localhost.localdomain docker[7141]: --> Waiting up to 10m0s for pods in deployment test-1 to become ready
Oct 04 14:22:26 localhost.localdomain audit: SELINUX_ERR op=security_compute_av reason=bounds scontext=system_u:system_r:svirt_lxc_net_t:s0:c5,c6 tcontext=system_u:system_r:docker_t:s0 tclass=process perms=getattr
Oct 04 14:22:27 localhost.localdomain docker[7141]: E1004 18:22:26.966331 8297 docker_manager.go:1537] Failed to create symbolic link to the log file of pod "test-1-56xs4_myproject(7fff201b-8a5f-11e6-8747-5254008aa548)" container "POD": symlink /var/log/containers/test-1-56xs4_myproject_POD-8340d12d7464fc5c14b3586d6d69d13c51ec49ca0b8655312c360670e75f2b02.log: no such file or directory
Oct 04 14:22:27 localhost.localdomain docker[7141]: W1004 18:22:27.025940 8297 docker_manager.go:1999] Hairpin setup failed for pod "test-1-56xs4_myproject(7fff201b-8a5f-11e6-8747-5254008aa548)": open /sys/devices/virtual/net/veth7526ee7/brport/hairpin_mode: read-only file system
Oct 04 14:22:27 localhost.localdomain docker[7141]: time="2016-10-04T14:22:27.051019894-04:00" level=info msg="{Action=create, LoginUID=4294967295, PID=8297}"
Oct 04 14:22:27 localhost.localdomain audit[7141]: VIRT_CONTROL pid=7141 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:docker_t:s0 msg='vm-pid=? user=? auid=4294967295 exe=? hostname=? reason=api op=create vm=? exe="/usr/bin/docker" hostname=? addr=? terminal=? res=success'
Oct 04 14:22:27 localhost.localdomain docker[7141]: time="2016-10-04T14:22:27.099069609-04:00" level=warning msg="Error getting v2 registry: Get https://172.30.112.56:5000/v2/: http: server gave HTTP response to HTTPS client"
Oct 04 14:22:27 localhost.localdomain docker[7141]: E1004 18:22:27.146416 8297 handler.go:278] unable to get fs usage from thin pool for device 212: no cached value for usage of device 212
Oct 04 14:22:27 localhost.localdomain docker[7141]: time="2016-10-04T18:22:27.191169505Z" level=debug msg="authorizing request" go.version=go1.6.3 http.request.host="172.30.112.56:5000" http.request.id=cab2661a-ae10-4b01-a195-b04f0652729a http.request.method=GET http.request.remoteaddr="192.168.0.203:41940" http.request.uri="/v2/" http.request.useragent="docker/1.10.3 go/go1.6.3 kernel/4.5.5-300.fc24.x86_64 os/linux arch/amd64" instance.id=b5052801-6f88-40af-a943-5a54984a57a2
Oct 04 14:22:27 localhost.localdomain docker[7141]: time="2016-10-04T18:22:27.193733919Z" level=error msg="error authorizing context: authorization header required" go.version=go1.6.3 http.request.host="172.30.112.56:5000" http.request.id=cab2661a-ae10-4b01-a195-b04f0652729a http.request.method=GET http.request.remoteaddr="192.168.0.203:41940" http.request.uri="/v2/" http.request.useragent="docker/1.10.3 go/go1.6.3 kernel/4.5.5-300.fc24.x86_64 os/linux arch/amd64" instance.id=b5052801-6f88-40af-a943-5a54984a57a2
Grant, did you try running 'iptables -F' on your Fedora box? It just occurred to me you may have been running into this. For now closing this bug though, as we already have an issue for that: https://github.com/openshift/origin/issues/10139