Description of problem:
Add an option to start OpenShift metrics from preloaded "latest" images on the OpenShift nodes, and use that as the first choice when metrics starts. Currently OpenShift metrics tries to download the "latest" metrics images and fails to start if they cannot be downloaded. This happens even if the "latest" metrics images were preloaded in advance on the nodes where the metrics pods are scheduled to start. It does not check whether the "latest" metrics images are present; it tries to download them as its first option.

Version-Release number of selected component (if applicable):
OpenShift v3.2

How reproducible:
Preload "latest" images on the OpenShift nodes and try to deploy OpenShift metrics with a broken network.

Steps to Reproduce:
Preload "latest" images on the OpenShift nodes and try to deploy OpenShift metrics with a broken network.

Actual results:
Starting openshift-metrics fails.

Expected results:
Starting openshift-metrics succeeds.
This needs to be handled at the OpenShift level, to allow containers to use locally available images if the remote registry is not accessible.
Elvir, what pull policy are the metrics pods using? (https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L1069)
I have this issue reproduced on my system. I don't see that a policy is specified: https://github.com/openshift/origin-metrics/pull/9/files

    root@mvirt2-j: ~/origin-metrics # oc describe pod metrics-deployer-3n92o
    Name:             metrics-deployer-3n92o
    Namespace:        openshift-infra
    Security Policy:  anyuid
    Node:             192.2.11.8/192.2.11.8
    Start Time:       Thu, 18 Aug 2016 21:15:27 -0400
    Labels:           component=deployer
                      metrics-infra=deployer
                      provider=openshift
    Status:           Pending
    IP:               10.129.6.2
    Controllers:      <none>
    Containers:
      deployer:
        Container ID:
        Image:          registry.qe.openshift.com/openshift3/metrics-deployer:latest
        Image ID:
        Port:
        State:          Waiting
          Reason:       ErrImagePull
        Ready:          False
        Restart Count:  0
        Volume Mounts:
          /etc/deploy from empty (rw)
          /secret from secret (ro)
          /var/run/secrets/kubernetes.io/serviceaccount from metrics-deployer-token-icsi3 (ro)
        Environment Variables:
          PROJECT:                        openshift-infra (v1:metadata.namespace)
          POD_NAME:                       metrics-deployer-3n92o (v1:metadata.name)
          IMAGE_PREFIX:                   registry.qe.openshift.com/openshift3/
          IMAGE_VERSION:                  latest
          MASTER_URL:                     https://kubernetes.default.svc:443
          MODE:                           deploy
          REDEPLOY:                       false
          IGNORE_PREFLIGHT:               false
          USE_PERSISTENT_STORAGE:         false
          DYNAMICALLY_PROVISION_STORAGE:  false
          HAWKULAR_METRICS_HOSTNAME:      192.2.8.32
          CASSANDRA_NODES:                1
          CASSANDRA_PV_SIZE:              10Gi
          METRIC_DURATION:                7
          USER_WRITE_ACCESS:              false
          HEAPSTER_NODE_ID:               nodename
          METRIC_RESOLUTION:              10s
    Conditions:
      Type          Status
      Initialized   True
      Ready         False
      PodScheduled  True
    Volumes:
      empty:
        Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
        Medium:
      secret:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  metrics-deployer
      metrics-deployer-token-icsi3:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  metrics-deployer-token-icsi3
    QoS Tier:  BestEffort
    Events:
      FirstSeen  LastSeen  Count  From  SubobjectPath  Type  Reason  Message
      ---------  --------  -----  ----  -------------  --------  ------  -------
      10m  10m  1  {default-scheduler }  Normal  Scheduled  Successfully assigned metrics-deployer-3n92o to 192.2.11.8
      10m  27s  43  {kubelet 192.2.11.8}  spec.containers{deployer}  Normal  BackOff  Back-off pulling image "registry.qe.openshift.com/openshift3/metrics-deployer:latest"
      10m  27s  43  {kubelet 192.2.11.8}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "deployer" with ImagePullBackOff: "Back-off pulling image \"registry.qe.openshift.com/openshift3/metrics-deployer:latest\""
      10m  13s  7  {kubelet 192.2.11.8}  spec.containers{deployer}  Normal  Pulling  pulling image "registry.qe.openshift.com/openshift3/metrics-deployer:latest"
      10m  13s  7  {kubelet 192.2.11.8}  spec.containers{deployer}  Warning  Failed  Failed to pull image "registry.qe.openshift.com/openshift3/metrics-deployer:latest": unable to ping registry endpoint https://registry.qe.openshift.com/v0/
        v2 ping attempt failed with error: Get https://registry.qe.openshift.com/v2/: dial tcp: lookup registry.qe.openshift.com on 192.2.0.2:53: server misbehaving
        v1 ping attempt failed with error: Get https://registry.qe.openshift.com/v1/_ping: dial tcp: lookup registry.qe.openshift.com on 192.2.0.2:53: server misbehaving
      10m  13s  7  {kubelet 192.2.11.8}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "deployer" with ErrImagePull: "unable to ping registry endpoint https://registry.qe.openshift.com/v0/\nv2 ping attempt failed with error: Get https://registry.qe.openshift.com/v2/: dial tcp: lookup registry.qe.openshift.com on 192.2.0.2:53: server misbehaving\n v1 ping attempt failed with error: Get https://registry.qe.openshift.com/v1/_ping: dial tcp: lookup registry.qe.openshift.com on 192.2.0.2:53: server misbehaving"

    root@mvirt2-j: ~/origin-metrics # ssh root.11.8 docker images|grep metrics
    registry.qe.openshift.com/openshift3/metrics-hawkular-metrics  latest  3ab97c3f7395  2 weeks ago  1.663 GB
    registry.qe.openshift.com/openshift3/metrics-cassandra         latest  f460976d4f99  2 weeks ago  837.8 MB
    registry.qe.openshift.com/openshift3/metrics-deployer          latest  91a831b58627  2 weeks ago  754.4 MB
    registry.qe.openshift.com/openshift3/metrics-heapster          latest  179e3ed5c3b2  2 weeks ago  288.4 MB
(In reply to Michal Fojtik from comment #2)
> Elvir, what pull policy are the metrics pods using?
> (https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L1069)

Was on PTO, sorry for the delay.

No pull policy is set for the metrics pods, which leads us to:

"If a container's imagePullPolicy parameter is not specified, OpenShift Origin sets it based on the image's tag: If the tag is latest, OpenShift Origin defaults imagePullPolicy to Always."

from: https://docs.openshift.org/latest/dev_guide/managing_images.html#image-pull-policy

Metrics will by default try to pull "latest" images for the metrics pods:
https://github.com/openshift/origin-metrics/blob/master/metrics.yaml#L104

The particular issue I see here: if one gets these images onto the machines in advance (e.g. prebuilt KVM images for OpenStack installations, or prebuilt AMI instances), metrics will by default still try to pull the "latest" images, even though they are already preloaded on the machines and tagged "latest".

Intention of this bug is to change this, if specific image with tag "latest" is present on system where pod is scheduled to start - then use it as first option and do not try to pull it again.

Metrics, on the other hand, supports the IMAGE_VERSION option (default "latest"):
https://github.com/openshift/origin-metrics/blob/master/docs/deployer_configuration.adoc#deployer-template-parameters

In my test case, I used the latest bits with the workaround below:
1) get the latest images before starting metrics, e.g. preload them on the nodes
2) tag them with a new tag, e.g. "latest_local"
3) start metrics with --IMAGE_VERSION=latest_local
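The defaulting rule quoted above can be illustrated with a small sketch. This is not the OpenShift or Kubernetes implementation; the function names are hypothetical, and two points go beyond the quoted text and are stated here as assumptions: an image reference with no tag is treated as "latest", and any other tag defaults to IfNotPresent.

```python
def image_tag(image: str) -> str:
    """Return the tag of an image reference, or "latest" if it has none.

    Hypothetical helper for illustration; digest references are not handled.
    """
    # Only the last path segment can carry a tag. A colon in an earlier
    # segment belongs to a registry port (e.g. "localhost:5000/name").
    last_segment = image.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return last_segment.rsplit(":", 1)[1]
    # Assumption: an untagged reference means "latest".
    return "latest"


def default_pull_policy(image: str) -> str:
    """Pull policy applied when imagePullPolicy is not set on the container."""
    # From the quoted docs: tag "latest" defaults to Always.
    # Assumption: every other tag defaults to IfNotPresent.
    return "Always" if image_tag(image) == "latest" else "IfNotPresent"


if __name__ == "__main__":
    prefix = "registry.qe.openshift.com/openshift3/"
    for ref in (prefix + "metrics-deployer:latest",
                prefix + "metrics-deployer:latest_local"):
        print(ref, "->", default_pull_policy(ref))
```

Under these assumptions, retagging a preloaded image as "latest_local" is enough to move the pod off the Always default, which is why the workaround in this comment avoids the failed pull.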
"Intention of this bug is to change this, if specific image with tag "latest" is present on system where pod is scheduled to start - then use it as first option and do not try to pull it again." No, this should not be the intention of this bug, this breaks a lot of functionality as it will never bring in the latest images as intended. If we wanted this behaviour, we would use the 'IfNotPresent' policy. But we don't want it to act this way. The issue here is more along the lines of what to do if you cannot connect to the docker registry, or there was an error connecting to it, but the images are available locally. It looks like @ekuric would like the system to be able to use the local images instead of failing.
It is common best practice to pre-pull your images, especially in immutable VM infrastructure or "air-gapped" deployments. Operators will ensure that the images associated with the infrastructure are available locally on the VMs, and only on the VMs, because they have curated and locked down their infrastructure. Forcing a pull when the operator knows what they want and has pre-baked their images for security purposes is a bug. We need to support this mode of operation, as it's a best practice in large deployments.
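To make the trade-off in this discussion concrete, here is a simplified sketch of what happens to a pod's image under each policy, depending on whether the image is already on the node and whether the registry can be reached. "Always", "IfNotPresent" and "Never" are the existing imagePullPolicy values; "PullWithLocalFallback" is a hypothetical name for the behaviour requested in this bug and does not exist in Kubernetes or OpenShift. The model ignores details such as digest comparison and retries.

```python
def resolve_image(policy: str, present_locally: bool, registry_reachable: bool) -> str:
    """Return "pull", "use-local" or "fail" for one container start attempt.

    Simplified model for illustration only, not kubelet code.
    """
    if policy == "Always":
        # A pull is always attempted; a local copy does not help if it fails.
        return "pull" if registry_reachable else "fail"
    if policy == "IfNotPresent":
        # A local copy wins, so a newer remote image is never fetched.
        if present_locally:
            return "use-local"
        return "pull" if registry_reachable else "fail"
    if policy == "Never":
        return "use-local" if present_locally else "fail"
    if policy == "PullWithLocalFallback":
        # Hypothetical: prefer a fresh pull, fall back to the local copy.
        if registry_reachable:
            return "pull"
        return "use-local" if present_locally else "fail"
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # The situation reported in this bug: image preloaded, registry unreachable.
    for policy in ("Always", "IfNotPresent", "Never", "PullWithLocalFallback"):
        print(policy, "->", resolve_image(policy, True, False))
```

In this model, IfNotPresent starts the pod in the reported situation but never refreshes an image that is already present, while the hypothetical fallback refreshes whenever the registry is reachable.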
ok, ignore my comment. We can't pre-pull "latest" images, they need to be versioned.
OpenShift metrics starts fine with a specific image version when it is started with the --IMAGE_VERSION option. If the images are preloaded on the OpenShift nodes and --IMAGE_VERSION is specified at startup, metrics uses the preloaded images with that version as its first choice. Closing this BZ.
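For anyone preparing nodes this way, a minimal sketch of the retagging step follows. It only builds the `docker tag` command lines for the four metrics images listed earlier in this bug; it does not run them. The helper name and the "latest_local" tag are illustrative, and the image prefix is the one from the reproduction above, so adjust both for your environment.

```python
# Image names taken from the "docker images" output earlier in this bug.
METRICS_IMAGES = [
    "metrics-hawkular-metrics",
    "metrics-cassandra",
    "metrics-deployer",
    "metrics-heapster",
]


def retag_commands(prefix, source_tag, target_tag, images=METRICS_IMAGES):
    """Build one "docker tag SOURCE TARGET" command line per metrics image.

    Hypothetical helper: it returns strings and executes nothing.
    """
    return [
        f"docker tag {prefix}{name}:{source_tag} {prefix}{name}:{target_tag}"
        for name in images
    ]


if __name__ == "__main__":
    # Run the printed commands on every node that may host a metrics pod,
    # then deploy metrics with IMAGE_VERSION set to the new tag.
    for command in retag_commands(
        "registry.qe.openshift.com/openshift3/", "latest", "latest_local"
    ):
        print(command)
```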