Bug 1766830 - Installer failed to get OCP version of nightly build when Scaling up RHEL nodes
Summary: Installer failed to get OCP version of nightly build when Scaling up RHEL nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.2.z
Assignee: Russell Teague
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-30 03:13 UTC by sheng.lao
Modified: 2019-11-26 18:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-26 18:46:13 UTC
Target Upstream Version:
Embargoed:




Links
System                   ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHBA-2019:3919  0        None      None    None     2019-11-26 18:46:18 UTC

Description sheng.lao 2019-10-30 03:13:50 UTC
Description of problem:
QE has to test nightly builds before OCP is released to customers, but openshift-ansible derives the wrong OCP version for them.

Ansible (/usr/share/ansible/openshift-ansible/playbooks/scaleup.yml) determines the OCP version as follows:

# cat openshift-ansible/roles/openshift_node/defaults/main.yml
openshift_node_packages:
  - openshift-clients{{ l_cluster_version }}
  - openshift-hyperkube{{ l_cluster_version }} 

# cat openshift-ansible/roles/openshift_node/tasks/install.yml
- name: Get cluster version
  command: >
    oc get clusterversion
  register: oc_get

- name: Set fact l_cluster_version
  set_fact:
    l_cluster_version: "-{{ oc_get.stdout | regex_search('^\\d+\\.\\d+\\.\\d+') }}"

For example, when testing 4.2.0-0.nightly-2019-10-28-140411, we get this:
TASK [openshift_node : Install openshift packages] *****************************
Wednesday 30 October 2019  00:27:52 +0800 (0:00:00.057)       0:03:40.003 ***** 

changed: [10.0.32.5] => {"ansible_job_id": "183378387509.16194", "attempts": 1, "changed": true, "changes": {"installed": ["cri-o", "openshift-clients-4.2.0", "openshift-hyperkube-4.2.0"], 

However, 4.2.0-0.nightly-2019-10-28-140411 is the same as stable v4.2.2:
# oc image info registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2019-10-28-140411 |grep BUILD_VERSION
BUILD_VERSION=v4.2.2
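
The mismatch can be reproduced outside Ansible. The following is a minimal Python sketch, not the playbook itself. It applies the same pattern as the set_fact task to the nightly version string and compares the result with BUILD_VERSION. It assumes the registered output is the bare version string (the command shown in this report is abbreviated), and package_suffix is a hypothetical helper name used only for illustration.

```python
import re

# Same pattern as the set_fact task in roles/openshift_node/tasks/install.yml.
VERSION_PATTERN = r"^\d+\.\d+\.\d+"


def package_suffix(cluster_version: str) -> str:
    """Mimic l_cluster_version: a dash plus the leading x.y.z of the version.

    Hypothetical helper. Returning an empty suffix when nothing matches is an
    assumption of this sketch, not behavior taken from the playbook.
    """
    match = re.search(VERSION_PATTERN, cluster_version)
    return "-" + match.group(0) if match else ""


nightly = "4.2.0-0.nightly-2019-10-28-140411"
build_version = "v4.2.2"  # BUILD_VERSION reported by `oc image info`

suffix = package_suffix(nightly)
print("openshift-hyperkube" + suffix)
print("matches BUILD_VERSION:", suffix.lstrip("-") == build_version.lstrip("v"))
```

The suffix comes out as -4.2.0, so the package name is openshift-hyperkube-4.2.0 even though the payload was built as 4.2.2. Everything after the third numeric component of the nightly tag is discarded by the pattern.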

This is a potential problem for future QE testing.

So QE wants an Ansible variable that can be set explicitly to the OCP version.


Version-Release number of the following components:
# rpm -q openshift-ansible
openshift-ansible-4.2.0-201910111434.git.190.85c9108.el7

# rpm -q ansible
ansible-2.8.6-1.el7ae.noarch

# ansible --version
ansible 2.8.6
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Mar 26 2019, 22:13:06) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

How reproducible:
Always

Steps to Reproduce:
1. Scale up a RHEL node in OCP 4.

Actual results:
Only the leading x.y.z (4.2.0) of the nightly version is used, so the wrong packages are installed.

Expected results:
The correct packages for the nightly build are installed.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Scott Dodson 2019-10-30 18:02:58 UTC
If the nightly release reported the correct version then this would work properly. Moving to Release component, I don't think we want to special case this in openshift-ansible.

Comment 2 Johnny Liu 2019-11-04 01:01:41 UTC
It seems someone has already hit a similar issue: https://bugzilla.redhat.com/show_bug.cgi?id=1768098

Comment 3 Johnny Liu 2019-11-04 07:18:36 UTC
This is blocking QE's downstream CI testing; I suggest fixing it in 4.2.z.

Comment 4 Russell Teague 2019-11-05 13:54:46 UTC
FYI: Relaxing RPM version requirement to major.minor
https://github.com/openshift/openshift-ansible/pull/11997
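
The diff of that PR is not reproduced in this report. As a rough illustration only, relaxing the requirement to major.minor could look like the Python sketch below. The helper name relaxed_package_spec and the trailing ".*" glob are assumptions made for this example, not the PR's actual change.

```python
import re


def relaxed_package_spec(package: str, cluster_version: str) -> str:
    """Build a package spec pinned only to major.minor (hypothetical helper)."""
    match = re.search(r"^\d+\.\d+", cluster_version)
    if match is None:
        raise ValueError(f"unrecognized cluster version: {cluster_version!r}")
    # Assumed glob: leaves the patch level and release open, so any
    # 4.2.* build of the package satisfies the spec.
    return f"{package}-{match.group(0)}.*"


for version in ("4.2.0-0.nightly-2019-10-28-140411", "4.2.2"):
    print(relaxed_package_spec("openshift-clients", version))
```

With this approach the nightly tag and the stable 4.2.2 release resolve to the same spec, so the z-stream mismatch between the release name and the RPM version no longer matters.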

Comment 5 Luke Meyer 2019-11-15 15:02:13 UTC
Does Russell's PR above address this then? I'm having trouble understanding the problem, but it seems related to the mismatch we had (in 4.2.1/4.2.2 and again now in 4.2.5/4.2.6) between the component versions and the release version. If so, relaxing it so it always selects the latest minor version RPM seems like it might handle it. If not, can you please give me some idea of what we should be doing differently in the packaging?

BTW at some point we do expect to stop trying to match the component versions to the release version, and it'll be openshift-ansible-4.2-<release> or maybe openshift-ansible-4.2.20191110123-<release>.

Comment 6 Russell Teague 2019-11-15 15:07:52 UTC
With PR [1] merged, is this still an issue?

[1] https://github.com/openshift/openshift-ansible/pull/11997

Comment 7 Scott Dodson 2019-11-15 16:17:19 UTC
Please close this as a dupe of Bug 1768098 if it's confirmed to no longer be a problem.

Comment 8 Johnny Liu 2019-11-18 01:32:24 UTC
Yeah, it should be the same root cause. This bug was reported before the customer bug (Bug 1768098), so per https://bugzilla.redhat.com/show_bug.cgi?id=1768098#c6, I am moving this bug to VERIFIED.

Comment 10 errata-xmlrpc 2019-11-26 18:46:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3919

