Bug 1670390

Summary: Openshift node problem detector ignores internal registries
Product: OpenShift Container Platform Reporter: German Montalvo <gmontalv>
Component: Installer Assignee: Scott Dodson <sdodson>
Status: CLOSED ERRATA QA Contact: Weinan Liu <weinliu>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.10.0 CC: acomabon, aos-bugs, jokerman, mmccomas, sdodson, wzheng
Target Milestone: ---   
Target Release: 3.10.z   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
The Node Problem Detector now pulls its images from the registry defined by the oreg_url variable, like most other components.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-20 10:11:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1633504    
Bug Blocks:    

Description German Montalvo 2019-01-29 13:23:27 UTC
Description of problem:

Node Problem Detector tries to pull images from registry.access.redhat.com even when that registry is blocked in /etc/containers/registries.conf and /etc/sysconfig/docker, on an OpenShift installation that uses only an internal registry through oreg_url.

Version-Release number of selected component (if applicable):

v3.10.72

How reproducible:
Install OpenShift using an external registry, then use the playbook to deploy node-problem-detector.

You will see an image pull error on each pod once deployed:

NAME                          READY     STATUS             RESTARTS   AGE
node-problem-detector-dqtwn   0/1       ImagePullBackOff   0          19h
node-problem-detector-phpw8   0/1       ImagePullBackOff   0          19h
node-problem-detector-qh249   0/1       ImagePullBackOff   0          19h
node-problem-detector-t5b6t   0/1       ImagePullBackOff   0          19h
node-problem-detector-t5p4b   0/1       ImagePullBackOff   0          19h


Steps to Reproduce:
1. Install OpenShift using an external registry:
# Registries
my_registry=registry.mytest.internal
oreg_url={{ my_registry }}/openshift3/ose-${component}:${version}
openshift_examples_modify_imagestreams=true

In addition, configure the following:
#Docker
openshift_docker_additional_registries={{ my_registry }}
openshift_docker_blocked_registries=["registry.access.redhat.com","docker.io"]
openshift_docker_disable_push_dockerhub=True

2. Ensure /etc/containers/registries.conf has the following format:
[registries.block]
registries = ['docker.io','registry.access.redhat.com']

Also ensure /etc/sysconfig/docker contains:
BLOCK_REGISTRY='--block-registry registry.access.redhat.com --block-registry docker.io'

3. Once installed, using the same inventory, add the node-problem-detector section:
# Node Problem Detector
openshift_node_problem_detector_install=true

4. Run ps -ef | grep docker and look for the following:
--block-registry docker.io --block-registry registry.access.redhat.com
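
The check in the last step can also be scripted. The following is a minimal sketch, not part of the original report: it extracts the values of the --block-registry flags from a daemon command line. The sample string, including the dockerd-current path, is an assumption modeled on the ps output above.

```python
import shlex


def blocked_registries(cmdline: str) -> list:
    """Return the registries named by --block-registry flags in a command line."""
    args = shlex.split(cmdline)
    blocked = []
    for i, arg in enumerate(args):
        # Form 1: "--block-registry <name>" (flag and value as separate tokens)
        if arg == "--block-registry" and i + 1 < len(args):
            blocked.append(args[i + 1])
        # Form 2: "--block-registry=<name>"
        elif arg.startswith("--block-registry="):
            blocked.append(arg.split("=", 1)[1])
    return blocked


# Hypothetical sample line; the binary path is an assumption for illustration.
sample = (
    "/usr/bin/dockerd-current "
    "--block-registry docker.io --block-registry registry.access.redhat.com"
)
print(blocked_registries(sample))
```

If the returned list does not include both registries, the docker daemon was not started with the blocking options from the inventory.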

Actual results:
The node-problem-detector pods cannot pull their image because they point to registry.access.redhat.com, even though that registry is blocked:

Normal   BackOff  25m (x5130 over 20h)  kubelet, hostname1  Back-off pulling image "registry.access.redhat.com/openshift3/ose-node-problem-detector:v3.10.72"
Warning  Failed   18s (x5240 over 20h)  kubelet, hostname1  Error: ImagePullBackOff
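
To illustrate why this pull can never succeed in this setup, here is a minimal sketch (an illustration only, not code from openshift-ansible or docker) that extracts the registry host from an image reference and compares it with the blocked list from the inventory above. The helper names registry_of and is_blocked are hypothetical.

```python
# Blocked registries, taken from openshift_docker_blocked_registries above.
BLOCKED = {"registry.access.redhat.com", "docker.io"}


def registry_of(image: str) -> str:
    """Return the registry host of an image reference, defaulting to docker.io."""
    first, sep, _rest = image.partition("/")
    # The first path component counts as a registry host only if it
    # contains a dot or a port, or is exactly "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_blocked(image: str) -> bool:
    """True if the image would be pulled from a blocked registry."""
    return registry_of(image) in BLOCKED


failing = "registry.access.redhat.com/openshift3/ose-node-problem-detector:v3.10.72"
internal = "registry.mytest.internal/openshift3/ose-node-problem-detector:v3.10.72"
print(is_blocked(failing), is_blocked(internal))
```

The image named in the kubelet event resolves to a blocked registry, while the same image under the internal registry from oreg_url would not.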

Expected results:
OpenShift recognizes that the external registries are blocked and looks for the problem detector image in the internal registry.


Additional info:
After taking a deeper look into the playbooks, I found the following in:
/usr/share/ansible/openshift-ansible/roles/openshift_node_problem_detector/defaults/main.yaml

openshift_node_problem_detector_image_dict:
  origin:
    prefix: "docker.io/openshift/"
    version: "{{ openshift_image_tag }}"
  openshift-enterprise:
    prefix: "registry.access.redhat.com/openshift3/ose-"
    version: "{{ openshift_image_tag }}"

openshift_node_problem_detector_image_prefix: "{{ openshift_node_problem_detector_image_dict[openshift_deployment_type]['prefix'] }}"
openshift_node_problem_detector_image_version: "{{ openshift_node_problem_detector_image_dict[openshift_deployment_type]['version'] }}"

As you can see, the registry is hardcoded in this defaults file; maybe it should be changed to use oreg_url or even l_osm_registry_url.
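
As a rough illustration of that suggestion, the sketch below expands the ${component} and ${version} placeholders of the oreg_url value from the inventory above to get the image name the role would need. It is plain string substitution written for this report, not the actual change in openshift-ansible; the function name image_from_oreg_url is hypothetical.

```python
def image_from_oreg_url(oreg_url: str, component: str, version: str) -> str:
    """Expand the ${component} and ${version} placeholders in an oreg_url template."""
    return (
        oreg_url
        .replace("${component}", component)
        .replace("${version}", version)
    )


# oreg_url as set in the inventory, with my_registry already substituted.
oreg_url = "registry.mytest.internal/openshift3/ose-${component}:${version}"

image = image_from_oreg_url(oreg_url, "node-problem-detector", "v3.10.72")
print(image)
# registry.mytest.internal/openshift3/ose-node-problem-detector:v3.10.72
```

With the image derived this way, the pod would pull from the internal registry instead of the hardcoded registry.access.redhat.com prefix.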


Thank you.
Regards,

Comment 2 Ben Parees 2019-01-29 14:30:52 UTC
Looks like this was fixed via an ansible installer change.  Moving to installer component.

Comment 3 Scott Dodson 2019-01-29 14:35:53 UTC
Fixed in openshift-ansible-3.11.23-1 and later. This is not fixed in 3.10 yet, though there is an outstanding pull request for it.

release-3.11 https://github.com/openshift/openshift-ansible/pull/10335
release-3.10 https://github.com/openshift/openshift-ansible/pull/10336

Setting target release 3.11.z and ON_QA for verification. If your customer needs this backported to 3.10, please clone this and assign it to me.

Comment 7 Scott Dodson 2019-01-29 15:42:28 UTC
Actually, I see that https://bugzilla.redhat.com/show_bug.cgi?id=1633504 was used to track 3.11 and that followed proper process. We'll track the fix for 3.10 here.

Comment 8 German Montalvo 2019-01-30 18:06:30 UTC
Confirmed this is still failing on openshift-ansible-playbooks-3.10.73-1.git.0.8b65cea.el7.noarch

Initially tested on openshift-ansible-playbooks-3.10.47-1.git.0.95bc2d2.el7_5.noarch

Comment 13 errata-xmlrpc 2019-02-20 10:11:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0328