Created attachment 1116337 [details]
Build log

Description of problem:
On a new cluster, a build of the sample dancer application takes about 1 hour.

Version-Release number of selected component (if applicable):
3.1.0.4-1.git.15.5e061c3.el7aos.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Spin up a new native HA cluster on AWS (m4.xlarge)
2. Set up the docker registry to use S3
3. Build the sample dancer application

Actual results:
[0]Pluto:~ $ oc get builds -n wesleyhearn
NAME               TYPE      FROM      STATUS     STARTED       DURATION
dancer-example-1   Source    Git       Complete   3 hours ago   56m25s

Expected results:
[0]Pluto:~ $ oc get builds -n wesleyhearn
NAME               TYPE      FROM      STATUS     STARTED       DURATION
dancer-example-1   Source    Git       Complete   3 hours ago   3m25s

Additional info:
Attached are the build log and iostat output from the time window in question.
Created attachment 1116338 [details] iostat output
Both the build and the push seem to be taking an extremely long time. I wonder if it has something to do with the network. Unfortunately there's nothing in the assemble script to indicate how long each step takes. The assemble script starts at 11:18 and the container exits at 11:44 (26 min). The image push then starts at 11:44 and finishes at 12:13 (29 min). Ben, sending this your way.
Both likely have the same root cause, but I'd suggest starting by analyzing the push side of the equation, since we can at least look at the docker registry logs to see if there's anything interesting there. And I agree, the network seems like the probable cause, since all the assemble script is doing is pulling down dependencies. Andy, can you suggest some debug steps for why the push would be taking 30 minutes?
What is the MTU setting (https://docs.openshift.com/enterprise/3.1/install_config/configuring_sdn.html#configuring-the-pod-network-on-nodes)? How does it compare to the NIC's MTU?
A difference of 50.

[root@ip-172-31-0-58 ~]# grep mtu /etc/origin/node/node-config.yaml
  mtu: 8951
[root@ip-172-31-0-58 ~]# ip a l eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast state UP qlen 1000
    link/ether 0a:20:b0:f1:d7:23 brd ff:ff:ff:ff:ff:ff
    inet 172.31.0.58/20 brd 172.31.15.255 scope global dynamic eth0
       valid_lft 2850sec preferred_lft 2850sec
    inet6 fe80::820:b0ff:fef1:d723/64 scope link
       valid_lft forever preferred_lft forever
[root@ip-172-31-0-58 ~]#
Would like the networking team to look into whether the network is performing correctly, since both operations have networking as a common denominator (pulling dependencies from the perl dependency server, and pushing an image to the docker registry).
Is that with log level 5? If not, could you get logs at that level? Also, some tcpdump output captured during the pull and/or push operations would help narrow things down.
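For reference, a capture along these lines on the node should be enough to compare the two phases (the interface name, registry service address, and registry port below are assumptions, not values from this cluster):

# sketch only: capture the push to the registry (assumed to be <registry-service-ip>:5000) on the node's eth0
tcpdump -i eth0 -s 0 -w /tmp/push.pcap host <registry-service-ip> and port 5000
# and a separate capture while the assemble script pulls dependencies (DNS plus HTTP to the cpan.org mirrors)
tcpdump -i eth0 -s 0 -w /tmp/pull.pcap port 53 or port 80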
Running an iperf3 server inside the registry pod and testing against it from a remote pod and from a node showed throughput between 580Mbps and 1Gbps depending on where I tested from... so the raw connectivity seems correct. I'll try a build next and see what happens from those pods...
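For the record, the test was roughly of this shape (the pod name is a placeholder, and it assumes iperf3 is available inside the registry image):

# start an iperf3 server inside the registry pod (placeholder pod name)
oc exec docker-registry-1-xxxxx -n default -- iperf3 -s
# then from a remote pod or directly from a node, run a 30-second client test against the pod IP
iperf3 -c <registry-pod-ip> -t 30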
Those builds are really slow. The download portion is the fastest step below:

# oc logs dancer-mysql-example-1-build -n bb-test -f | awk '{now=strftime("%F %T%z\t");sub(/^/, now);print}'
...
2016-01-20 09:31:40-0500 --> Working on Plack
2016-01-20 09:31:51-0500 Fetching http://www.cpan.org/authors/id/M/MI/MIYAGAWA/Plack-1.0039.tar.gz ... OK
2016-01-20 09:31:51-0500 Configuring Plack-1.0039 ... OK
2016-01-20 09:31:51-0500 ==> Found dependencies: Cookie::Baker, Filesys::Notify::Simple, Hash::MultiValue, Apache::LogFormat::Compiler, Devel::StackTrace::AsHTML, HTTP::Message, Stream::Buffered, HTTP::Body, File::ShareDir, Test::TCP
2016-01-20 09:31:56-0500 --> Working on Cookie::Baker
2016-01-20 09:32:06-0500 Fetching http://www.cpan.org/authors/id/K/KA/KAZEBURO/Cookie-Baker-0.06.tar.gz ... OK
2016-01-20 09:32:07-0500 Configuring Cookie-Baker-0.06 ... OK
2016-01-20 09:32:07-0500 Building Cookie-Baker-0.06 ... OK
2016-01-20 09:32:07-0500 Successfully installed Cookie-Baker-0.06
2016-01-20 09:32:17-0500 --> Working on Filesys::Notify::Simple
2016-01-20 09:32:22-0500 Fetching http://www.cpan.org/authors/id/M/MI/MIYAGAWA/Filesys-Notify-Simple-0.12.tar.gz ... OK
2016-01-20 09:32:22-0500 Configuring Filesys-Notify-Simple-0.12 ... OK
2016-01-20 09:32:23-0500 Building Filesys-Notify-Simple-0.12 ... OK
2016-01-20 09:32:23-0500 Successfully installed Filesys-Notify-Simple-0.12
2016-01-20 09:32:33-0500 --> Working on Hash::MultiValue
2016-01-20 09:32:43-0500 Fetching http://www.cpan.org/authors/id/A/AR/ARISTOTLE/Hash-MultiValue-0.16.tar.gz ... OK
2016-01-20 09:32:43-0500 Configuring Hash-MultiValue-0.16 ... OK
2016-01-20 09:32:44-0500 Building Hash-MultiValue-0.16 ... OK
2016-01-20 09:32:44-0500 Successfully installed Hash-MultiValue-0.16
2016-01-20 09:32:49-0500 --> Working on Apache::LogFormat::Compiler
2016-01-20 09:32:59-0500 Fetching http://www.cpan.org/authors/id/K/KA/KAZEBURO/Apache-LogFormat-Compiler-0.32.tar.gz ... OK
2016-01-20 09:32:59-0500 Configuring Apache-LogFormat-Compiler-0.32 ... OK
2016-01-20 09:32:59-0500 ==> Found dependencies: POSIX::strftime::Compiler
2016-01-20 09:33:09-0500 --> Working on POSIX::strftime::Compiler
2016-01-20 09:33:15-0500 Fetching http://www.cpan.org/authors/id/K/KA/KAZEBURO/POSIX-strftime-Compiler-0.41.tar.gz ... OK
2016-01-20 09:33:15-0500 Configuring POSIX-strftime-Compiler-0.41 ... OK
2016-01-20 09:33:15-0500 Building POSIX-strftime-Compiler-0.41 ... OK
2016-01-20 09:33:15-0500 Successfully installed POSIX-strftime-Compiler-0.41
2016-01-20 09:33:16-0500 Building Apache-LogFormat-Compiler-0.32 ... OK
2016-01-20 09:33:16-0500 Successfully installed Apache-LogFormat-Compiler-0.32
OK, the root cause turned out to be terribly slow DNS performance on the pulls. Then I noticed that only one in three lookups was slow... and then we found that an API server had stopped responding. That was also why the pulls were slow: one in three API calls was timing out.
More details:

- pods are configured to use the `kubernetes` service IP address for DNS
- in this case, the `kubernetes` service is backed by 3 endpoints: the 3 masters
- there is no health check on the masters, so there will always be 3 endpoints in the list, regardless of up/down status
- the userspace proxy round-robins each request
- the proxy tries 4 times to connect to a backend endpoint, waiting 0.25, 0.5, 1, and 2 seconds respectively before trying the next endpoint

The question at hand is how we can avoid these delays. Also, what does the behavior look like with the iptables proxy?
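To put rough numbers on that: a request that lands on the dead endpoint burns 0.25 + 0.5 + 1 + 2 = 3.75 seconds of retry backoff before the proxy moves on, and with round-robin over 3 endpoints roughly one lookup in three takes that hit, which matches what we saw. A quick way to see it (a sketch; run the loop from inside any pod):

# time repeated lookups against the cluster DNS (the kubernetes service VIP);
# with one dead master behind the service, roughly 1 in 3 should stall for several seconds
for i in 1 2 3 4 5 6; do time getent hosts kubernetes.default.svc.cluster.local; done

# confirm that all 3 masters are still listed as endpoints even though one is down
oc get endpoints kubernetes -n default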
Solly - can you investigate while Andy is out?
It seems like we should have the active controller-manager master run a health check on the other masters. A control loop could check the other apiservers and remove the relevant endpoints from the "default/kubernetes" service if the health checks fail.
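The check itself could be as simple as probing each apiserver's /healthz endpoint, which returns "ok" when the apiserver is healthy. A minimal sketch of what that loop would do (the master hostnames and API port are placeholders):

# placeholder hostnames/port; a failing check means the corresponding endpoint should be dropped from default/kubernetes
for master in master1.example.com master2.example.com master3.example.com; do
  if ! curl -sk --max-time 5 https://$master:8443/healthz | grep -q ok; then
    echo "$master failed its health check"
  fi
done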
Alternatively, an external monitoring tool could be taught how to remove the endpoints for the `default/kubernetes` service, and then if/when it restarted the master, the master would re-add itself.
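For illustration, such a tool could rewrite the endpoints object directly, along the lines of the sketch below (the addresses are placeholders, the ports list is trimmed to https for brevity, and as noted the master re-adds itself once restarted):

# sketch: drop a dead master's address from the kubernetes endpoints via a merge patch
oc patch endpoints kubernetes -n default --type=merge \
  -p '{"subsets":[{"addresses":[{"ip":"172.31.0.10"},{"ip":"172.31.0.11"}],"ports":[{"name":"https","port":443,"protocol":"TCP"}]}]}'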
I would prefer that the masters check each other, IF we can guarantee that the masters have reachability to each other.
Solly is working on this. Setting to assigned.
I've opened a PR against upstream kube (which can then be backported to Origin) which adds an optional controller loop to periodically check the health of the API servers: https://github.com/kubernetes/kubernetes/pull/20975
Hi Solly and Wesley, can you change the subject of this Bugzilla? The "Steps to Reproduce" in the first comment differs from the issue currently being worked on, so it is really difficult to find this bug ticket...
*** Bug 1321093 has been marked as a duplicate of this bug. ***
The current workaround is to ensure that you have a monitoring tool (e.g. monit or equivalent) that constantly watches the master API processes and restarts them if they ever exit.
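If monit is not available, a systemd drop-in that restarts the master API service whenever it exits is one possible equivalent; the snippet below is a sketch, not a tested recommendation (the service name matches the native HA setup used here):

# create a drop-in for the master API unit so systemd restarts it on exit
mkdir -p /etc/systemd/system/atomic-openshift-master-api.service.d
cat > /etc/systemd/system/atomic-openshift-master-api.service.d/restart.conf <<'EOF'
[Service]
Restart=always
RestartSec=5
EOF
systemctl daemon-reload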
I do not think this should be solved only through monitoring or by fixing Kubernetes bugs... Instead, the resolv.conf for infra nodes and pods should point at all available DNS instances rather than at the service. That might not fully solve the issue, but at least applications would continue to work. Currently this bug makes OpenShift unusable in prod: you simply can't maintain the masters.
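For illustration, the kind of resolv.conf being described would list each master's DNS address directly instead of the single service VIP; the addresses and search domains below are placeholders, and rotate/timeout/attempts are standard glibc resolver options that spread queries across the servers and shorten the stall when one is dead:

# example /etc/resolv.conf sketch (placeholder addresses)
search cluster.local example.com
nameserver 172.31.0.10
nameserver 172.31.0.11
nameserver 172.31.0.12
options rotate timeout:1 attempts:2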
*** Bug 1342933 has been marked as a duplicate of this bug. ***
We have had success getting movement on a kube 1.3 PR that will be backported to OSE 3.2.1.z between July and September.
PR in progress: https://github.com/openshift/origin/pull/9469
PR merged
The PR is in OSE 3.3 now. QE tested on openshift v3.3.0.4, kubernetes v1.3.0+57fb9ac, etcd 2.3.0+git, and the bug has been fixed. Once it is ON_QA, we can verify it.
Here are the test records:

1. [root@ip-172-18-15-105 ~]# oc describe endpoints kubernetes -n default
   Name:       kubernetes
   Namespace:  default
   Labels:     <none>
   Subsets:
     Addresses:          172.18.15.104,172.18.15.105
     NotReadyAddresses:  <none>
     Ports:
       Name     Port   Protocol
       ----     ----   --------
       https    443    TCP
       dns-tcp  8053   TCP
       dns      8053   UDP
   No events.

2. [root@ip-172-18-15-105 ~]# service atomic-openshift-master-api stop
   Redirecting to /bin/systemctl stop atomic-openshift-master-api.service

3. One of the endpoints is removed after one API server goes down.
   [root@ip-172-18-15-105 ~]# oc describe endpoints kubernetes -n default
   Name:       kubernetes
   Namespace:  default
   Labels:     <none>
   Subsets:
     Addresses:          172.18.15.104
     NotReadyAddresses:  <none>
     Ports:
       Name     Port   Protocol
       ----     ----   --------
       https    443    TCP
       dns-tcp  8053   TCP
       dns      8053   UDP
   No events.

4. One master being down does not affect pod build/deploy (even stopping/starting a master during the build/deploy).
   [root@ip-172-18-15-105 ~]# oc new-app --template=dancer-example -n qwang1
   [root@ip-172-18-15-105 ~]# oc get build -n qwang1
   NAME               TYPE      FROM          STATUS     STARTED         DURATION
   dancer-example-1   Source    Git@11c93c3   Complete   4 minutes ago   4m51s
   [root@ip-172-18-15-105 ~]# oc get pod -n qwang1
   NAME                     READY     STATUS      RESTARTS   AGE
   dancer-example-1-build   0/1       Completed   0          6m
   dancer-example-1-oddpt   1/1       Running     0          28s
*** Bug 1286513 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1933