Hide Forgot
Description of problem: On preview (prod) system, whenever I start a build (either automatically when adding to project or select "start build"), the build stays in "Pending" state for over 30 seconds. This happens fairly consistently, close to 100% of the time. Version-Release number of selected component (if applicable): OpenShift Master: v3.2.1.13-5-gddf7d17 Kubernetes Master: v1.2.0-36-g4a3f9c5 https://console.preview.openshift.com/console/about How reproducible: Create a Steps to Reproduce: 1. Create a project 2. Add to project node.js builder image, v4 with sample is fine 3. Watch how long it takes to start 4. once finished, selected 'start build' and observe time for build pod to be running. Expected results: Additional info:
Every time a build pod is created, it is taking 20 to 30 secs to pull the builder image: Events: FirstSeen LastSeen Count From SubobjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 43s 43s 1 {default-scheduler } Normal Scheduled Successfully assigned simple-ruby-6-build to ip-172-31-9-165.ec2.internal 41s 41s 1 {kubelet ip-172-31-9-165.ec2.internal} spec.containers{sti-build} Normal Pulling pulling image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.1.13" 11s 11s 1 {kubelet ip-172-31-9-165.ec2.internal} spec.containers{sti-build} Normal Pulled Successfully pulled image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.1.13" 10s 10s 1 {kubelet ip-172-31-9-165.ec2.internal} spec.containers{sti-build} Normal Created Created container with docker id 4cfc652667fb 9s 9s 1 {kubelet ip-172-31-9-165.ec2.internal} spec.containers{sti-build} Normal Started Started container with docker id 4cfc652667fb I don't see an imagePullPolicy of always set on the pod, but that's what it looks like is in effect.
registry.qe.openshift.com was taking a long time to respond.
May I ask why we are pulling from registry.qe.openshift.com and why the pull times should affect dev preview users? Why aren't the builder images already pre-pulled/cached on the builder nodes? We should be monitoring for this situation and kicking the registry again when it happens
they are pre-pulled, it was a no-op pull that still took 20-30s to compare the layers even though they already existed. (after kicking the registry it's down to 0.5s)
Just took 30s again, 2 attempts in a row :(
Yup, it's slow for me as well again.
What is the plan/process to get the registry slowness fixed? Is it just replacing the qe registry with the ops one? If so, when do we plan to have this rolled out or a shorter term solution.
Replacing the QE registry is our best option. We were waiting until the prod release, but I can probably get it done today instead.
The issue should be resolved since we're using the Ops Registry now. If not, let me know and I'll scale it up.
Ack - builds starting much faster now. Averages about 6 seconds from time build is started until it is running. It would be nicer if it was less than 6, 3.3 might give us that lightning speed. Are pulls going through registry API calls or directly from S3?
Got one build that took over a minute to start running, he's the build pod details oc describe pod php-32-build Name: php-32-build Namespace: sspeiche-test3 Node: ip-172-31-8-200.ec2.internal/172.31.8.200 Start Time: Tue, 30 Aug 2016 08:46:42 -0400 Labels: openshift.io/build.name=php-32 Status: Running IP: 10.1.58.9 Controllers: <none> Containers: sti-build: Container ID: docker://c06e3c817e9a405765ba93f97e379e61496959fb3eb9d32cb9e453d62e5b783c Image: registry.ops.openshift.com/openshift3/ose-sti-builder:v3.2.1.13 Image ID: docker://de13933fd4e9bdbe127dd0fdc1bac3a5a3acbe73faa96c56aaf539d93c432645 Port: Args: --loglevel=2 QoS Tier: memory: Burstable cpu: Burstable Limits: cpu: 1 memory: 512Mi Requests: cpu: 60m memory: 307Mi State: Running Started: Tue, 30 Aug 2016 08:46:45 -0400 Ready: True Restart Count: 0 Environment Variables: BUILD: {"kind":"Build","apiVersion":"v1","metadata":{"name":"php-32","namespace":"sspeiche-test3","selfLink":"/oapi/v1/namespaces/sspeiche-test3/builds/php-32","uid":"c2b394b6-6eaf-11e6-ad33-0e3d364e19a5","resourceVersion":"138275425","creationTimestamp":"2016-08-30T12:46:24Z","labels":{"app":"php","buildconfig":"php","openshift.io/build-config.name":"php"},"annotations":{"openshift.io/build-config.name":"php","openshift.io/build.number":"32"}},"spec":{"serviceAccount":"builder","source":{"type":"Git","git":{"uri":"https://github.com/openshift/cakephp-ex.git","ref":"master"},"secrets":null},"strategy":{"type":"Source","sourceStrategy":{"from":{"kind":"DockerImage","name":"registry.access.redhat.com/rhscl/php-56-rhel7:latest"}}},"output":{"to":{"kind":"DockerImage","name":"172.30.47.227:5000/sspeiche-test3/php:latest"},"pushSecret":{"name":"builder-dockercfg-y42jt"}},"resources":{},"postCommit":{}},"status":{"phase":"New","outputDockerImageReference":"172.30.47.227:5000/sspeiche-test3/php:latest","config":{"kind":"BuildConfig","namespace":"sspeiche-test3","name":"php"}}} BUILD_LOGLEVEL: 2 SOURCE_REPOSITORY: https://github.com/openshift/cakephp-ex.git SOURCE_URI: https://github.com/openshift/cakephp-ex.git SOURCE_REF: master ORIGIN_VERSION: v3.2.1.13-5-gddf7d17 ALLOWED_UIDS: 1- DROP_CAPS: KILL,MKNOD,SETGID,SETUID,SYS_CHROOT PUSH_DOCKERCFG_PATH: /var/run/secrets/openshift.io/push Conditions: Type Status Ready True Volumes: docker-socket: Type: HostPath (bare host directory volume) Path: /var/run/docker.sock builder-dockercfg-y42jt-push: Type: Secret (a volume populated by a Secret) SecretName: builder-dockercfg-y42jt builder-token-8v11g: Type: Secret (a volume populated by a Secret) SecretName: builder-token-8v11g Events: FirstSeen LastSeen Count From SubobjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 50s 50s 1 {kubelet ip-172-31-8-200.ec2.internal} spec.containers{sti-build} Normal Pulling pulling image "registry.ops.openshift.com/openshift3/ose-sti-builder:v3.2.1.13" 49s 49s 1 {kubelet ip-172-31-8-200.ec2.internal} spec.containers{sti-build} Normal Pulled Successfully pulled image "registry.ops.openshift.com/openshift3/ose-sti-builder:v3.2.1.13" 48s 48s 1 {kubelet ip-172-31-8-200.ec2.internal} spec.containers{sti-build} Normal Created Created container with docker id c06e3c817e9a 48s 48s 1 {kubelet ip-172-31-8-200.ec2.internal} spec.containers{sti-build} Normal Started Started container with docker id c06e3c817e9a 39s 39s 1 {default-scheduler } Normal FailedScheduling Binding rejected: Timeout: request did not complete within allowed duration
This latest one doesn't look like a registry issue anymore. It looks more like an issue on the node with Docker: 39s 39s 1 {default-scheduler } Normal FailedScheduling Binding rejected: Timeout: request did not complete within allowed duration
NOC looked into this today, but they didn't see any issues with docker on the node, and they didn't see any pods in FailedScheduling.
I'm unable to reproduce the issue. I tried 7 new builds in a row and they all started up within about 5-6 seconds. I didn't hit any timeouts. This is what I used: oadm new-project dakinitest oc new-app https://github.com/openshift/cakephp-ex.git -n dakinitest [root@preview-master-e69da ~]# oc get pods -n dakinitest NAME READY STATUS RESTARTS AGE cakephp-ex-1-build 1/1 Running 0 5s I looked at events across all namespaces and I don't see any mention of "timeout" in the logs. So I think the original issue is no longer occurring. Steve, can you confirm?
Stefanie, I have not seen this issue since my comment #12. Perhaps the iptables sync interval change fixed it. This can be closed.