| Summary: | It is taking 30+ seconds for build pods to start running | ||
|---|---|---|---|
| Product: | OpenShift Online | Reporter: | Steve Speicher <sspeiche> |
| Component: | Build | Assignee: | Stefanie Forrester <dakini> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Wang Haoran <haowang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.x | CC: | aos-bugs, bparees, cewong, dakini, sspeiche |
| Target Milestone: | --- | Flags: | sspeiche:
needinfo+
|
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-09-08 16:58:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Steve Speicher
2016-08-26 17:10:40 UTC
Every time a build pod is created, it is taking 20 to 30 secs to pull the builder image:
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
43s 43s 1 {default-scheduler } Normal Scheduled Successfully assigned simple-ruby-6-build to ip-172-31-9-165.ec2.internal
41s 41s 1 {kubelet ip-172-31-9-165.ec2.internal} spec.containers{sti-build} Normal Pulling pulling image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.1.13"
11s 11s 1 {kubelet ip-172-31-9-165.ec2.internal} spec.containers{sti-build} Normal Pulled Successfully pulled image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.1.13"
10s 10s 1 {kubelet ip-172-31-9-165.ec2.internal} spec.containers{sti-build} Normal Created Created container with docker id 4cfc652667fb
9s 9s 1 {kubelet ip-172-31-9-165.ec2.internal} spec.containers{sti-build} Normal Started Started container with docker id 4cfc652667fb
I don't see an imagePullPolicy of always set on the pod, but that's what it looks like is in effect.
registry.qe.openshift.com was taking a long time to respond. registry.qe.openshift.com was taking a long time to respond. May I ask why we are pulling from registry.qe.openshift.com and why the pull times should affect dev preview users? Why aren't the builder images already pre-pulled/cached on the builder nodes? We should be monitoring for this situation and kicking the registry again when it happens they are pre-pulled, it was a no-op pull that still took 20-30s to compare the layers even though they already existed. (after kicking the registry it's down to 0.5s) Just took 30s again, 2 attempts in a row :( Yup, it's slow for me as well again. What is the plan/process to get the registry slowness fixed? Is it just replacing the qe registry with the ops one? If so, when do we plan to have this rolled out or a shorter term solution. Replacing the QE registry is our best option. We were waiting until the prod release, but I can probably get it done today instead. The issue should be resolved since we're using the Ops Registry now. If not, let me know and I'll scale it up. Ack - builds starting much faster now. Averages about 6 seconds from time build is started until it is running. It would be nicer if it was less than 6, 3.3 might give us that lightning speed. Are pulls going through registry API calls or directly from S3? Got one build that took over a minute to start running, he's the build pod details
oc describe pod php-32-build
Name: php-32-build
Namespace: sspeiche-test3
Node: ip-172-31-8-200.ec2.internal/172.31.8.200
Start Time: Tue, 30 Aug 2016 08:46:42 -0400
Labels: openshift.io/build.name=php-32
Status: Running
IP: 10.1.58.9
Controllers: <none>
Containers:
sti-build:
Container ID: docker://c06e3c817e9a405765ba93f97e379e61496959fb3eb9d32cb9e453d62e5b783c
Image: registry.ops.openshift.com/openshift3/ose-sti-builder:v3.2.1.13
Image ID: docker://de13933fd4e9bdbe127dd0fdc1bac3a5a3acbe73faa96c56aaf539d93c432645
Port:
Args:
--loglevel=2
QoS Tier:
memory: Burstable
cpu: Burstable
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 60m
memory: 307Mi
State: Running
Started: Tue, 30 Aug 2016 08:46:45 -0400
Ready: True
Restart Count: 0
Environment Variables:
BUILD: {"kind":"Build","apiVersion":"v1","metadata":{"name":"php-32","namespace":"sspeiche-test3","selfLink":"/oapi/v1/namespaces/sspeiche-test3/builds/php-32","uid":"c2b394b6-6eaf-11e6-ad33-0e3d364e19a5","resourceVersion":"138275425","creationTimestamp":"2016-08-30T12:46:24Z","labels":{"app":"php","buildconfig":"php","openshift.io/build-config.name":"php"},"annotations":{"openshift.io/build-config.name":"php","openshift.io/build.number":"32"}},"spec":{"serviceAccount":"builder","source":{"type":"Git","git":{"uri":"https://github.com/openshift/cakephp-ex.git","ref":"master"},"secrets":null},"strategy":{"type":"Source","sourceStrategy":{"from":{"kind":"DockerImage","name":"registry.access.redhat.com/rhscl/php-56-rhel7:latest"}}},"output":{"to":{"kind":"DockerImage","name":"172.30.47.227:5000/sspeiche-test3/php:latest"},"pushSecret":{"name":"builder-dockercfg-y42jt"}},"resources":{},"postCommit":{}},"status":{"phase":"New","outputDockerImageReference":"172.30.47.227:5000/sspeiche-test3/php:latest","config":{"kind":"BuildConfig","namespace":"sspeiche-test3","name":"php"}}}
BUILD_LOGLEVEL: 2
SOURCE_REPOSITORY: https://github.com/openshift/cakephp-ex.git
SOURCE_URI: https://github.com/openshift/cakephp-ex.git
SOURCE_REF: master
ORIGIN_VERSION: v3.2.1.13-5-gddf7d17
ALLOWED_UIDS: 1-
DROP_CAPS: KILL,MKNOD,SETGID,SETUID,SYS_CHROOT
PUSH_DOCKERCFG_PATH: /var/run/secrets/openshift.io/push
Conditions:
Type Status
Ready True
Volumes:
docker-socket:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
builder-dockercfg-y42jt-push:
Type: Secret (a volume populated by a Secret)
SecretName: builder-dockercfg-y42jt
builder-token-8v11g:
Type: Secret (a volume populated by a Secret)
SecretName: builder-token-8v11g
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
50s 50s 1 {kubelet ip-172-31-8-200.ec2.internal} spec.containers{sti-build} Normal Pulling pulling image "registry.ops.openshift.com/openshift3/ose-sti-builder:v3.2.1.13"
49s 49s 1 {kubelet ip-172-31-8-200.ec2.internal} spec.containers{sti-build} Normal Pulled Successfully pulled image "registry.ops.openshift.com/openshift3/ose-sti-builder:v3.2.1.13"
48s 48s 1 {kubelet ip-172-31-8-200.ec2.internal} spec.containers{sti-build} Normal Created Created container with docker id c06e3c817e9a
48s 48s 1 {kubelet ip-172-31-8-200.ec2.internal} spec.containers{sti-build} Normal Started Started container with docker id c06e3c817e9a
39s 39s 1 {default-scheduler } Normal FailedScheduling Binding rejected: Timeout: request did not complete within allowed duration
This latest one doesn't look like a registry issue anymore. It looks more like an issue on the node with Docker:
39s 39s 1 {default-scheduler } Normal FailedScheduling Binding rejected: Timeout: request did not complete within allowed duration
NOC looked into this today, but they didn't see any issues with docker on the node, and they didn't see any pods in FailedScheduling. I'm unable to reproduce the issue. I tried 7 new builds in a row and they all started up within about 5-6 seconds. I didn't hit any timeouts. This is what I used: oadm new-project dakinitest oc new-app https://github.com/openshift/cakephp-ex.git -n dakinitest [root@preview-master-e69da ~]# oc get pods -n dakinitest NAME READY STATUS RESTARTS AGE cakephp-ex-1-build 1/1 Running 0 5s I looked at events across all namespaces and I don't see any mention of "timeout" in the logs. So I think the original issue is no longer occurring. Steve, can you confirm? Stefanie, I have not seen this issue since my comment #12. Perhaps the iptables sync interval change fixed it. This can be closed. |