Bug 1452939
| Summary: | [3.5] Should use "imagePullPolicy: IfNotPresent" instead of "imagePullPolicy: Always" in logging and metrics deployer images | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Antonio Gallego <agallego> |
| Component: | Installer | Assignee: | Jan Wozniak <jwozniak> |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.5.0 | CC: | aos-bugs, bmcelvee, jokerman, mmccomas, sdodson, tatanaka, xiazhao |
| Target Milestone: | --- | | |
| Target Release: | 3.5.z | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | With this bug fix, the `imagePullPolicy` for logging and metrics images is now set to `IfNotPresent` rather than `Always`, which prevents unnecessary image pulls. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-11-21 05:41:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Antonio Gallego
2017-05-20 18:59:23 UTC
It appears the document has changed to "v3.5". However, the customer fails to deploy metrics in a disconnected environment even though he follows the document. He's installing OpenShift 3.5 and pulled these images.

************************:
# docker images | grep metrics
registry.access.redhat.com/openshift3/metrics-hawkular-metrics  3.5.0   e0d108bd9b0c   6 weeks ago   1.27 GB
registry.access.redhat.com/openshift3/metrics-hawkular-metrics  v3.5    e0d108bd9b0c   6 weeks ago   1.27 GB
registry.access.redhat.com/openshift3/metrics-cassandra         3.5.0   042236fd907e   6 weeks ago   540.6 MB
registry.access.redhat.com/openshift3/metrics-cassandra         v3.5    042236fd907e   6 weeks ago   540.6 MB
registry.access.redhat.com/openshift3/metrics-heapster          3.5.0   4e29df6bda85   8 weeks ago   318.5 MB
registry.access.redhat.com/openshift3/metrics-heapster          v3.5    4e29df6bda85   8 weeks ago   318.5 MB
registry.access.redhat.com/openshift3/metrics-deployer          v3.5    f5c500d7a624   8 weeks ago   892.9 MB
#
************************:

Here are the parameters.

openshift_hosted_metrics_deploy=true
openshift_hosted_metrics_storage_kind=nfs
openshift_hosted_metrics_storage_access_modes=['ReadWriteOnce']
openshift_hosted_metrics_storage_host=XXX.XXX.XXX.XXX
openshift_hosted_metrics_storage_nfs_directory=/metrics
openshift_hosted_metrics_storage_volume_name=metrics
openshift_hosted_metrics_storage_volume_size=10Gi

Then he got the errors below.

************************:
# oc get pod --all-namespaces
NAMESPACE         NAME                         READY   STATUS             RESTARTS   AGE
default           docker-registry-2-x6tr8      1/1     Running            2          2d
default           registry-console-1-wbnsp     1/1     Running            1          2d
default           router-1-1d3wq               1/1     Running            1          2d
default           router-1-c4wmq               1/1     Running            1          2d
default           router-1-p8s0n               1/1     Running            2          2d
openshift-infra   hawkular-cassandra-1-d8mrk   0/1     ImagePullBackOff   0          2d
openshift-infra   hawkular-metrics-29tpv       0/1     ImagePullBackOff   0          2d
openshift-infra   heapster-2m7rc               0/1     ImagePullBackOff   0          2d
#
************************:

After he connected his environment to the Internet, metrics was installed successfully.
After that, he found that a newer image had been pulled: the 3.5.0 tag of metrics-hawkular-metrics now points to b12e45828aad instead of e0d108bd9b0c.

[root@master-02-XXX ~]# docker images | grep metrics
registry.access.redhat.com/openshift3/metrics-hawkular-metrics  3.5.0   b12e45828aad   4 weeks ago   1.456 GB
registry.access.redhat.com/openshift3/metrics-hawkular-metrics  v3.5    e0d108bd9b0c   6 weeks ago   1.27 GB
registry.access.redhat.com/openshift3/metrics-cassandra         3.5.0   042236fd907e   6 weeks ago   540.6 MB
registry.access.redhat.com/openshift3/metrics-cassandra         v3.5    042236fd907e   6 weeks ago   540.6 MB
registry.access.redhat.com/openshift3/metrics-heapster          3.5.0   4e29df6bda85   8 weeks ago   318.5 MB
registry.access.redhat.com/openshift3/metrics-heapster          v3.5    4e29df6bda85   8 weeks ago   318.5 MB
registry.access.redhat.com/openshift3/metrics-deployer          v3.5    f5c500d7a624   8 weeks ago   892.9 MB

The customer wants to know the correct procedure for a disconnected installation, and the case severity is now Sev1. Could you check the document and provide a workaround soon?

The metrics template has an imagePullPolicy: Always. Is it related to this issue? Also, neither the customer nor I can set up a disconnected installation right now. However, the customer's severity is Sev1. I'll escalate this case.
# cat /usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_metrics/templates/hawkular_metrics_rc.j2 | grep imagePull
imagePullPolicy: Always
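Why the policy matters offline: with `Always`, the kubelet contacts the registry every time it starts a container, even when the image is already cached on the node, so the pod fails with ImagePullBackOff when the registry is unreachable. With `IfNotPresent`, a cached image is reused, but only if that exact repository:tag is already on the node. A minimal sketch for checking a node before a disconnected install, assuming the `docker images` column layout shown above; `missing_images` is a hypothetical helper name, and the tags the templates actually reference depend on the inventory (the ones in the example comment are simply taken from the listing in the description).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-ansible): read `docker images`
# output on stdin and print every required image reference, given as
# REPOSITORY:TAG arguments, that is not present locally.
# Prints nothing when all references are present.
missing_images() {
  local present ref
  # In `docker images` output, column 1 is REPOSITORY and column 2 is TAG.
  present=$(awk '{ print $1 ":" $2 }')
  for ref in "$@"; do
    # -x: whole-line match; -F: fixed string, so dots in names are literal.
    grep -qxF -- "$ref" <<<"$present" || printf '%s\n' "$ref"
  done
}

# Example (run on every node; the tags are examples from this bug):
#   docker images | missing_images \
#     registry.access.redhat.com/openshift3/metrics-cassandra:3.5.0 \
#     registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.5.0 \
#     registry.access.redhat.com/openshift3/metrics-heapster:3.5.0
```

Reading `docker images` output rather than calling the registry keeps the check usable on a node with no network access at all.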
(In reply to Takayoshi Tanaka from comment #2)
> The metrics template has a imagePullPolicy: Always. Is it related to this
> issue? Also, the customer as well as I can't set up disconnected
> installation right now. However, the customer's Severity is Sev1. I'll
> escalate this case.
>
> # cat
> /usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_metrics/
> templates/hawkular_metrics_rc.j2 | grep imagePull
> imagePullPolicy: Always

Yes, exactly. Tested on openshift v3.5.5.31.19; the imagePullPolicy matters:

1. First, reproduced the original issue: deployed the metrics stack with "imagePullPolicy: Always", which is the default setting in the template, then cut the network connection by changing to the "no-internet" security group. The metrics pods turned to "ImagePullBackOff" status while the other infra pods (e.g. the router and registry pods) stayed in "Running" status.
2. Edited the rc for each metrics pod, changing to "imagePullPolicy: IfNotPresent", then redeployed them; the metrics pods became "Running".
3. Edited the rc for each metrics pod, changing to "imagePullPolicy: Never", then redeployed them; the metrics pods became "Running".

Thanks a lot! I'll reply to the customer.

Thanks Xia! Much appreciated!! Takayoshi - please let us know the outcome with the customer and if we need to put this fix in the docs.

Can I confirm one thing? Is the workaround to modify all the imagePullPolicy settings in the logging and metrics templates?
# grep imagePullPolicy /usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_metrics/templates/*
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_metrics/templates/hawkular_cassandra_rc.j2: imagePullPolicy: Always
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_metrics/templates/hawkular_metrics_rc.j2: imagePullPolicy: Always
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_metrics/templates/hawkular_openshift_agent_ds.j2: imagePullPolicy: Always
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_metrics/templates/heapster.j2: imagePullPolicy: Always
# grep imagePullPolicy /usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_logging*/*/*
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_logging_curator/templates/curator.j2: imagePullPolicy: Always
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_logging_elasticsearch/templates/es.j2: imagePullPolicy: Always
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_logging_fluentd/templates/fluentd.j2: imagePullPolicy: Always
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_logging_kibana/templates/kibana.j2: imagePullPolicy: Always
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_logging_kibana/templates/kibana.j2: imagePullPolicy: Always
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_logging_mux/templates/mux.j2: imagePullPolicy: Always
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_logging/templates/jks_pod.j2: imagePullPolicy: Always

The customer wants Red Hat to provide a patch command until we fix the issue. Is this command enough?
# grep -l 'imagePullPolicy: Always' /usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_metrics/templates/* | xargs sed -i.bak -e 's/imagePullPolicy: Always/imagePullPolicy: IfNotPresent/g'
# grep -l 'imagePullPolicy: Always' /usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_logging*/*/* | xargs sed -i.bak -e 's/imagePullPolicy: Always/imagePullPolicy: IfNotPresent/g'

Also, I found that the same issue exists in OCP 3.6. Could you confirm whether the workaround is enough?

Confirmed that the workaround steps in comment #7 worked fine in a disconnected installation environment for logging and metrics deployments on openshift v3.5.5.31.19. All logging and metrics pods are in Running status after deployment. Some more details:

openshift-infra   hawkular-cassandra-1-rpxj9       1/1   Running   0   4m
openshift-infra   hawkular-metrics-7d006           1/1   Running   0   4m
openshift-infra   heapster-x5bgm                   1/1   Running   0   4m
default           hawkular-openshift-agent-7jm6h   1/1   Running   0   2m
logging           logging-curator-1-wpv32          1/1   Running   1   31m
logging           logging-es-oe4fy2fg-1-kxkbh      1/1   Running   0   15m
logging           logging-fluentd-8x746            1/1   Running   0   15m
logging           logging-fluentd-z3vjz            1/1   Running   0   15m
logging           logging-kibana-1-fwcn5           2/2   Running   0   31m

One more thing: for hawkular-openshift-agent, you have to modify the imagePullPolicy here: https://github.com/openshift/origin-metrics/blob/enterprise/hawkular-openshift-agent/hawkular-openshift-agent.yaml#L67, and make sure the hawkular-openshift-agent image is ready on all the OpenShift nodes.

The customer confirmed the workaround went fine.
As the customer doesn't use hawkular-openshift-agent, the executed steps were as follows:

# grep -l 'imagePullPolicy: Always' /usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_metrics/templates/* | xargs sed -i.bak -e 's/imagePullPolicy: Always/imagePullPolicy: IfNotPresent/g'
# grep -l 'imagePullPolicy: Always' /usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_logging*/*/* | xargs sed -i.bak -e 's/imagePullPolicy: Always/imagePullPolicy: IfNotPresent/g'

Also, note that I recommended executing these two commands on all masters. As far as the playbook is concerned, running them on the first master seems to be enough, but I suggested all masters just in case.

The only recommended approach to installing metrics and logging is to use Ansible. If we want a disconnected installation process, it should be handled by the playbooks, which should configure the pull policy.

The fix is not in the latest package, openshift-ansible:v3.5.132. Waiting for new packages.

@scott, 1) I think the correct errata for this bug should be https://errata.devel.redhat.com/advisory/30242. 2) The fix is in openshift-ansible-roles-3.5.134-1.git.0.e5f4029.el7.noarch, but the latest package, openshift-ansible-roles-3.5.132, is not in 30242 [1].

[1] with openshift-ansible-roles-3.5.134-1.git.0.e5f4029.el7.noarch
[root@131c8e9a37a7 roles]# grep -r imagePullPolicy |grep openshift_logging
openshift_logging/README.md:- Default imagePullPolicy changed from Always to IfNotPresent
openshift_logging/templates/curator.j2: imagePullPolicy: IfNotPresent
openshift_logging/templates/es.j2: imagePullPolicy: IfNotPresent
openshift_logging/templates/fluentd.j2: imagePullPolicy: IfNotPresent
openshift_logging/templates/jks_pod.j2: imagePullPolicy: IfNotPresent
openshift_logging/templates/kibana.j2: imagePullPolicy: IfNotPresent
openshift_logging/templates/kibana.j2: imagePullPolicy: IfNotPresent

I've moved it to the next errata.
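The two grep -l ... | xargs sed workaround commands can be folded into one re-runnable function. A sketch, assuming GNU grep, xargs and sed (as on RHEL 7); `set_ifnotpresent` is a hypothetical name, not something shipped in openshift-ansible. It deliberately differs from the commands in this bug in two ways: it only edits `*.j2` templates, so the `.bak` copies left by an earlier run are not rewritten on a second run, and `xargs -r` skips sed entirely when no file matches.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the workaround from this bug: in every *.j2
# template under the given directories, replace "imagePullPolicy: Always"
# with "imagePullPolicy: IfNotPresent", keeping a .bak copy of each file
# it changes. Requires GNU grep, xargs and sed.
set_ifnotpresent() {
  # -r: recurse; -l: print file names only; -Z: NUL-terminate the names.
  # --include='*.j2' keeps earlier *.j2.bak copies out of the match list.
  grep -rlZ --include='*.j2' 'imagePullPolicy: Always' "$@" \
    | xargs -0 -r sed -i.bak -e 's/imagePullPolicy: Always/imagePullPolicy: IfNotPresent/g'
}

# Example, using the roles path from this bug (run on each master):
#   roles=/usr/share/ansible/openshift-ansible/playbooks/byo/roles
#   set_ifnotpresent "$roles"/openshift_metrics/templates "$roles"/openshift_logging*
```

The `.bak` files keep the original templates, so reverting is a matter of moving them back once a fixed openshift-ansible package is installed.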
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/862f50ff66324d7d1f23fe9bedd5d9d664578302
Bug 1452939 - change Logging & Metrics imagePullPolicy
- all images logging and metrics change their default imagePullPolicy from Always to IfNotPresent

https://github.com/openshift/openshift-ansible/commit/f0da12b7292cddabfc7c33206cabf0ff34aa9852
Merge pull request #5700 from wozniakjan/bz_1452939

Automatic merge from submit-queue.

Bug 1452939 - change imagePullPolicy in logging and metrics

cc: @jcantrill

Verified and passed with openshift-ansible-roles-3.5.137.
Once deployed by openshift-ansible-roles-3.5.137, the imagePullPolicy is IfNotPresent.
# oc get ds -n logging -o yaml |grep imagePullPolicy
imagePullPolicy: IfNotPresent
# oc get dc -n logging -o yaml |grep imagePullPolicy
imagePullPolicy: IfNotPresent
imagePullPolicy: IfNotPresent
imagePullPolicy: IfNotPresent
imagePullPolicy: IfNotPresent
[root@host-8-241-7 ~]#
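The verification above relies on reading the grep output by eye. A small sketch that turns it into a pass/fail check; `check_pull_policy` is a hypothetical helper that reads `oc get ... -o yaml` output on stdin, prints any imagePullPolicy line whose value is not IfNotPresent, and returns non-zero in that case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `oc get ... -o yaml` output on stdin and print
# every imagePullPolicy line whose value is not IfNotPresent.
# Returns 1 if any such line exists, 0 otherwise (including when the input
# has no imagePullPolicy lines at all).
check_pull_policy() {
  local bad
  bad=$(grep 'imagePullPolicy:' \
    | grep -Ev 'imagePullPolicy:[[:space:]]*IfNotPresent[[:space:]]*$' || true)
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
  return 0
}

# Example, mirroring the verification in this bug:
#   oc get ds -n logging -o yaml | check_pull_policy
#   oc get dc -n logging -o yaml | check_pull_policy
```

Because it only inspects the rendered objects, the same check applies to the metrics RCs, e.g. by piping `oc get rc -n openshift-infra -o yaml` into it.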
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3255