Bug 1670390 - Openshift node problem detector ignores internal registries
Summary: Openshift node problem detector ignores internal registries
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.10.z
Assignee: Scott Dodson
QA Contact: Weinan Liu
URL:
Whiteboard:
Depends On: 1633504
Blocks:
Reported: 2019-01-29 13:23 UTC by German Montalvo
Modified: 2019-02-20 10:11 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The Node Problem Detector now pulls its images from the registry defined by the oreg_url variable, like most other components.
Clone Of:
Environment:
Last Closed: 2019-02-20 10:11:10 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0328 0 None None None 2019-02-20 10:11:13 UTC

Description German Montalvo 2019-01-29 13:23:27 UTC
Description of problem:

Node Problem Detector tries to pull images from registry.access.redhat.com even when that registry is blocked in /etc/containers/registries.conf and /etc/sysconfig/docker, on an OpenShift installation that uses only an internal registry through oreg_url.

Version-Release number of selected component (if applicable):

v3.10.72

How reproducible:
Install OpenShift using an external registry, then use the playbook to deploy node-problem-detector.

Once deployed, each pod shows an image pull error:

NAME                          READY     STATUS             RESTARTS   AGE
node-problem-detector-dqtwn   0/1       ImagePullBackOff   0          19h
node-problem-detector-phpw8   0/1       ImagePullBackOff   0          19h
node-problem-detector-qh249   0/1       ImagePullBackOff   0          19h
node-problem-detector-t5b6t   0/1       ImagePullBackOff   0          19h
node-problem-detector-t5p4b   0/1       ImagePullBackOff   0          19h


Steps to Reproduce:
1. Install OpenShift using an external registry:
# Registries
my_registry=registry.mytest.internal
oreg_url={{ my_registry }}/openshift3/ose-${component}:${version}
openshift_examples_modify_imagestreams=true

In addition, configure the following:
#Docker
openshift_docker_additional_registries={{ my_registry }}
openshift_docker_blocked_registries=["registry.access.redhat.com","docker.io"]
openshift_docker_disable_push_dockerhub=True

2. Ensure /etc/containers/registries.conf has the following format:
[registries.block]
registries = ['docker.io','registry.access.redhat.com']

Also ensure /etc/sysconfig/docker contains:
BLOCK_REGISTRY='--block-registry registry.access.redhat.com --block-registry docker.io'

3. Once installed, using the same inventory, add the node-problem-detector section:
# Node Problem Detector
openshift_node_problem_detector_install=true

4. Run ps -ef | grep docker and look for the following:
--block-registry docker.io --block-registry registry.access.redhat.com
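
The check in step 4 can also be scripted. The following is a minimal Python sketch, not part of the original report, that extracts the blocked registries from a docker option string. The function name blocked_registries is hypothetical, and the sketch assumes each flag is written as "--block-registry <host>" with a space, as in the output above.

```python
import shlex


def blocked_registries(docker_args: str) -> list:
    """Collect the value that follows each --block-registry flag."""
    tokens = shlex.split(docker_args)
    # Pair every token with its successor and keep the successors of the flag.
    return [
        value
        for flag, value in zip(tokens, tokens[1:])
        if flag == "--block-registry"
    ]


if __name__ == "__main__":
    args = "--block-registry docker.io --block-registry registry.access.redhat.com"
    print(blocked_registries(args))
```

Both docker.io and registry.access.redhat.com should appear in the result for the reproduction to be valid.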

Actual results:
The Node Problem Detector pods cannot pull their image because they point to the wrong registry, even though that registry is blocked:

Normal   BackOff  25m (x5130 over 20h)  kubelet, hostname1  Back-off pulling image "registry.access.redhat.com/openshift3/ose-node-problem-detector:v3.10.72"
Warning  Failed   18s (x5240 over 20h)  kubelet, hostname1  Error: ImagePullBackOff
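
To illustrate why this pull can never succeed, here is a minimal Python sketch that takes the registry host from an image reference and compares it with the blocked list from the steps above. The helper names registry_of and is_blocked are hypothetical, and the host detection is a simplified assumption (a first path component containing a dot or a colon, or "localhost"), not the exact logic docker uses.

```python
# Registries blocked in the reproduction steps above.
BLOCKED_REGISTRIES = {"registry.access.redhat.com", "docker.io"}


def registry_of(image: str) -> str:
    """Return the registry host of an image reference, or "" if none is given."""
    first, sep, _rest = image.partition("/")
    # Simplified assumption: the first component is a host only if it
    # looks like one (contains a dot or a port, or is localhost).
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return ""


def is_blocked(image: str, blocked=BLOCKED_REGISTRIES) -> bool:
    """True when the image names a registry that is on the blocked list."""
    return registry_of(image) in blocked


if __name__ == "__main__":
    failing = "registry.access.redhat.com/openshift3/ose-node-problem-detector:v3.10.72"
    print(registry_of(failing), is_blocked(failing))
```

The image from the events above resolves to a blocked host, while the same image path under registry.mytest.internal would not.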

Expected results:
OpenShift should recognize that the external registries are blocked and pull the Node Problem Detector image from the internal registry.


Additional info:
After taking a deeper look into the playbooks, I found the following:
/usr/share/ansible/openshift-ansible/roles/openshift_node_problem_detector/defaults/main.yaml

openshift_node_problem_detector_image_dict:
  origin:
    prefix: "docker.io/openshift/"
    version: "{{ openshift_image_tag }}"
  openshift-enterprise:
    prefix: "registry.access.redhat.com/openshift3/ose-"
    version: "{{ openshift_image_tag }}"

openshift_node_problem_detector_image_prefix: "{{ openshift_node_problem_detector_image_dict[openshift_deployment_type]['prefix'] }}"                                                                          
openshift_node_problem_detector_image_version: "{{ openshift_node_problem_detector_image_dict[openshift_deployment_type]['version'] }}"

As you can see, the registry is hardcoded in main.yaml; maybe it should be changed to use oreg_url or even l_osm_registry_url.
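
For illustration only, this is roughly what deriving the image from oreg_url would produce. It is a minimal Python sketch of the template expansion, not the actual change in the playbooks. The function name expand_oreg_url is hypothetical, and the sketch assumes the {{ my_registry }} part of the inventory value has already been resolved by Ansible.

```python
from string import Template


def expand_oreg_url(oreg_url: str, component: str, version: str) -> str:
    """Fill the ${component} and ${version} placeholders of an oreg_url value."""
    return Template(oreg_url).substitute(component=component, version=version)


if __name__ == "__main__":
    # oreg_url from the inventory above, with my_registry already substituted.
    oreg_url = "registry.mytest.internal/openshift3/ose-${component}:${version}"
    print(expand_oreg_url(oreg_url, "node-problem-detector", "v3.10.72"))
```

With the inventory from the reproduction steps, the result points at registry.mytest.internal instead of the hardcoded registry.access.redhat.com prefix.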


Thank you.
Regards,

Comment 2 Ben Parees 2019-01-29 14:30:52 UTC
Looks like this was fixed via an ansible installer change.  Moving to installer component.

Comment 3 Scott Dodson 2019-01-29 14:35:53 UTC
Fixed in openshift-ansible-3.11.23-1 and later. This is not fixed in 3.10 yet, though there is an outstanding pull request for it.

release-3.11 https://github.com/openshift/openshift-ansible/pull/10335
release-3.10 https://github.com/openshift/openshift-ansible/pull/10336

Setting target release 3.11.z and ON_QA for verification. If your customer needs this backported to 3.10, please clone this and assign it to me.

Comment 7 Scott Dodson 2019-01-29 15:42:28 UTC
Actually, I see that https://bugzilla.redhat.com/show_bug.cgi?id=1633504 was used to track 3.11 and that followed proper process. We'll track the fix for 3.10 here.

Comment 8 German Montalvo 2019-01-30 18:06:30 UTC
Confirmed that it is still failing on openshift-ansible-playbooks-3.10.73-1.git.0.8b65cea.el7.noarch

Initially tested on openshift-ansible-playbooks-3.10.47-1.git.0.95bc2d2.el7_5.noarch

Comment 13 errata-xmlrpc 2019-02-20 10:11:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0328

