Bug 1633923

Summary: openshift-autoheal fails to install in disconnected install
Product: OpenShift Container Platform Reporter: Suresh <sgaikwad>
Component: Installer Assignee: Juan Hernández <juan.hernandez>
Status: CLOSED ERRATA QA Contact: Gaoyun Pei <gpei>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.11.0 CC: aos-bugs, gpei, jokerman, juan.hernandez, lmeyer, mmccomas, sdodson, vrutkovs
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-20 03:10:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Suresh 2018-09-28 06:39:55 UTC
Description of problem:
Installing openshift-autoheal fails because the installer tries to pull the image from registry.redhat.io and does not honor oreg_url.

As a result, the installation always fails when the registry in use is not registry.redhat.io.
 

Version-Release number of selected component (if applicable):
openshift-ansible-3.11.14-1.git.0.65a0c0c.el7.noarch
ansible-2.6.4-1.el7ae.noarch

How reproducible:
Every time, when registry.redhat.io is not accessible.

Steps to Reproduce:
1.
2.
3.

Actual results:
The openshift-autoheal installation fails.

Expected results:
Should succeed without any issue.

Additional info:



Comment 1 Suresh 2018-09-28 06:42:06 UTC
The changes below to the /usr/share/ansible/openshift-ansible/roles/openshift_autoheal/defaults/main.yml defaults file work for openshift-enterprise.


<snip>

l_osm_registry_url: "{{ oreg_url | default(l_osm_registry_url_default) | regex_replace('${version}' | regex_escape, openshift_image_tag | default('${version}')) }}"

#openshift_autoheal_image_dict:
#  origin:
#    prefix: "docker.io/openshift/"
#    version: v0.0.1
#  openshift-enterprise:
#    prefix: "registry.redhat.io/openshift3/ose-"
#    version: "{{ openshift_image_tag }}"
#openshift_autoheal_image_prefix: "{{ openshift_autoheal_image_dict[openshift_deployment_type]['prefix'] }}"
#openshift_autoheal_image_version: "{{ openshift_autoheal_image_dict[openshift_deployment_type]['version'] }}"
openshift_autoheal_image: "{{ l_osm_registry_url | regex_replace('${component}' | regex_escape, 'autoheal') }}"

</snip>

This needs to be tested with origin though.
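
The substitution that the two regex_replace filters in the snippet above perform can be sketched in Python. This is only an illustration of the placeholder expansion, not the installer's code: the function name expand_oreg_url and the registry host registry.example.com are made up for the example.

```python
def expand_oreg_url(oreg_url: str, component: str, version: str) -> str:
    """Fill the literal ${component} and ${version} placeholders of an oreg_url-style template."""
    # str.replace matches the placeholders literally, which mirrors applying
    # regex_replace to a pattern that was first passed through regex_escape.
    return oreg_url.replace("${version}", version).replace("${component}", component)


# Hypothetical disconnected registry; not taken from any real environment.
template = "registry.example.com:5000/openshift3/ose-${component}:${version}"
image = expand_oreg_url(template, "autoheal", "v3.11")
print(image)
```

With a template like this, the autoheal image follows whatever registry oreg_url names, instead of the fixed registry.redhat.io prefix from the commented-out dictionary.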

Comment 2 Suresh 2018-09-28 07:01:08 UTC
The proper fix would be:

<snip>

[root@master-0 ~]# cat /usr/share/ansible/openshift-ansible/roles/openshift_autoheal/defaults/main.yml 
---

#
# Image name:
#

#openshift_autoheal_image_dict:
#  origin:
#    prefix: "docker.io/openshift/"
#    version: v0.0.1
#  openshift-enterprise:
#    prefix: "registry.redhat.io/openshift3/ose-"
#    version: "{{ openshift_image_tag }}"
#openshift_autoheal_image_prefix: "{{ openshift_autoheal_image_dict[openshift_deployment_type]['prefix'] }}"
#openshift_autoheal_image_version: "{{ openshift_autoheal_image_dict[openshift_deployment_type]['version'] }}"

openshift_autoheal_image: "{{ l_osm_registry_url | regex_replace('${component}' | regex_escape, 'autoheal') }}"

#
# Content of the configuration file of the auto-heal service. Note that this is
# a minimal example configuration. For more details and examples see the
# documentation of the auto-heal service:
#
#   https://github.com/openshift/autoheal
#
# In particular see the example configuration file:
#
#   https://github.com/openshift/autoheal/blob/master/autoheal.yml
#
openshift_autoheal_config: |+
  awx:
    address: "https://myawx.example.com/api"
    credentials:
      username: "autoheal"
      password: "..."
    project: "Auto-heal"
[root@master-0 ~]# 
[root@master-0 ~]# 
[root@master-0 ~]# 
[root@master-0 ~]# cat /usr/share/ansible/openshift-ansible/playbooks/openshift-autoheal/private/config.yml 
---
- name: Auto-heal Install Checkpoint Start
  hosts: all
  gather_facts: false
  tasks:
  - name: Set Auto-heal install 'In Progress'
    run_once: true
    set_stats:
      data:
        installer_phase_autoheal:
          title: "Auto-heal Install"
          playbook: "playbooks/openshift-autoheal/config.yml"
          status: "In Progress"
          start: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"

- name: Auto-heal
  hosts: oo_first_master
  roles:
  - role: openshift_facts
  - role: openshift_autoheal

- name: Auto-heal Install Checkpoint End
  hosts: all
  gather_facts: false
  tasks:
  - name: Auto-heal install 'Complete'
    run_once: true
    set_stats:
      data:
        installer_phase_autoheal:
          status: "Complete"
          end: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"


</snip>

The "- role: openshift_facts" entry is missing as well. Including it would be the better option.

Comment 7 Luke Meyer 2018-10-01 15:35:25 UTC
This image is introduced in 3.11 which is not yet GA. So of course you won't find it at registry.access.redhat.com yet. The bug is that the installer didn't respect oreg_url making it hard to test before release. Looks like that's fixed. The build is available internally so... CLOSED CURRENTRELEASE?

Comment 8 Scott Dodson 2018-10-02 12:11:23 UTC
The pull request from comment #3 is still relevant, we'll treat this as a 3.11.z bug.

My questions arose because there is no origin image, and since the code was added during 3.10, I looked for a 3.10 OCP image and found none.

Comment 9 Scott Dodson 2018-10-11 20:27:52 UTC
https://github.com/openshift/openshift-ansible/pull/10388 release-3.11 backport

Comment 11 Gaoyun Pei 2018-11-06 06:48:06 UTC
Verified this bug with openshift-ansible-3.11.38-1.git.0.d146f83.el7; the openshift-autoheal installation respects oreg_url now.

1. Set oreg_url=host-x.x.redhat.com:5000/test/ose-${component}:${version}

2. Deploy openshift-autoheal on ocp-3.11 cluster:
ansible-playbook -v playbooks/openshift-autoheal/config.yml

3. After the autoheal pod is running, check the image used by the autoheal deployment; it uses the registry set in oreg_url.
[root@qe-gpei-testmaster-etcd-1 ~]# oc describe pod autoheal-6ccdb8f5bf-xjv74 -n openshift-autoheal 
<--snip-->
  receiver:
    Container ID:  docker://30ea5dd6d5f0510792f13b3f84c2b9061b2bd0e580087c597056bcc131dca219
    Image:         host-x.x.redhat.com:5000/test/ose-autoheal:v3.11.39
<--snip-->
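
The check in step 3 can also be expressed as a small script. The following is a minimal sketch, assuming the common convention that the first path component of an image reference names a registry only if it contains a dot or a colon, or is "localhost". The helper name registry_of is hypothetical; the two values are the ones from this comment.

```python
def registry_of(image_ref: str) -> str:
    """Return the registry host[:port] of an image reference, or '' if it has none."""
    first, sep, _ = image_ref.partition("/")
    # Assumed convention: the leading component is a registry only when it
    # looks like a host name (contains '.' or ':') or is exactly 'localhost'.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return ""


# Values taken from the verification steps in this comment.
oreg_url = "host-x.x.redhat.com:5000/test/ose-${component}:${version}"
pod_image = "host-x.x.redhat.com:5000/test/ose-autoheal:v3.11.39"

# The pod image should come from the registry configured in oreg_url.
assert registry_of(pod_image) == registry_of(oreg_url)
print(registry_of(pod_image))
```

An image reference without a registry part yields an empty string here, so it would not match a registry configured through oreg_url.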



During testing, I found that the oauth-proxy image used for the autoheal deployment is hard-coded as
image: openshift/oauth-proxy:v1.1.0

We should instead use the oauth-proxy image shipped with OpenShift:
openshift3/oauth-proxy:v3.11

Filed BZ#1646844 to track this issue.

Comment 13 errata-xmlrpc 2018-11-20 03:10:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3537