Bug 1581140 - Should break upgrade earlier when fail to download new node packages
Summary: Should break upgrade earlier when fail to download new node packages
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 3.10.0
Assignee: Tim Bielawa
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-22 08:42 UTC by liujia
Modified: 2018-07-30 19:16 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The default failure conditions for an upgrade task were not broad enough to catch all possible failures. Consequence: The failed task would continue and produce a broken installation. Fix: The scope of considered failures has been increased to include additional failure indicators. Result: The install will fail earlier, as expected, if packages can not be downloaded for the upgrade.
Clone Of:
Environment:
Last Closed: 2018-07-30 19:16:14 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:16:51 UTC

Description liujia 2018-05-22 08:42:09 UTC
Description of problem:
Upgrading OCP v3.9 to v3.10 failed at task [openshift_node : Wait for node to be ready] because /usr/bin/openshift-node-config was not available. The file was missing because the atomic-openshift-node package had not been installed successfully (this is a negative scenario, so the upgrade failure itself was expected).

However, the upgrade log shows that an earlier task, TASK [openshift_node : download new node packages], had already failed before any fatal or unnecessary changes were made, and the installer did not catch that failure.

TASK [openshift_node : download new node packages] *****************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/rpm_upgrade.yml:9
Tuesday 22 May 2018  06:45:33 +0000 (0:00:00.036)       0:08:51.121 *********** 
changed: [x.x.x.x] => {"attempts": 1, "changed": true, "cmd": ["yum", "install", "-y", "--downloadonly", "atomic-openshift-node-3.10*", "atomic-openshift-clients-3.10*", "PyYAML"], "delta": "0:00:05.438444", "end": "2018-05-22 02:46:45.628826", "failed": false, "rc": 0, "start": "2018-05-22 02:46:40.190382", "stderr": "", "stderr_lines": [], "stdout": "Loaded plugins: product-id, search-disabled-repos, subscription-manager\nThis system is not registered with an entitlement server. You can use subscription-manager to register.\nNo package atomic-openshift-node-3.10* available.\nNo package atomic-openshift-clients-3.10* available.\nPackage PyYAML-3.10-11.el7.x86_64 already installed and latest version\nNothing to do", "stdout_lines": ["Loaded plugins: product-id, search-disabled-repos, subscription-manager", "This system is not registered with an entitlement server. You can use subscription-manager to register.", "No package atomic-openshift-node-3.10* available.", "No package atomic-openshift-clients-3.10* available.", "Package PyYAML-3.10-11.el7.x86_64 already installed and latest version", "Nothing to do"]}

Too many unnecessary or fatal changes were made after the task openshift_node : download new node packages failed, which left the original OCP installation unable to work.

/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/rpm_upgrade_install.yml:11 
openshift_node : download new node packages ----------------------------- 7.46s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/rpm_upgrade.yml:9 
openshift_node : Remove old service information ------------------------- 6.52s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/config_changes.yml:48 
openshift_node : Uninstall openvswitch ---------------------------------- 5.69s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/config_changes.yml:42 
openshift_node : Configure Node settings -------------------------------- 5.27s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/config/configure-node-settings.yml:2 
openshift_node_group : create node config template ---------------------- 4.02s
/usr/share/ansible/openshift-ansible/roles/openshift_node_group/tasks/create_config.yml:22 
openshift_excluder : Get available excluder version --------------------- 3.99s
/usr/share/ansible/openshift-ansible/roles/openshift_excluder/tasks/verify_excluder.yml:4 
openshift_node : Install Node service file ------------------------------ 3.93s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/systemd_units.yml:8 
openshift_node_group : create node-config.yaml configmap ---------------- 3.92s
/usr/share/ansible/openshift-ansible/roles/openshift_node_group/tasks/create_config.yml:50 



Version-Release number of the following components:
openshift-ansible-3.10.0-0.50.0.git.0.bd68ade.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Run the upgrade against an RPM-based OCP installation with openshift_enable_openshift_excluder=false (negative scenario)

Actual results:
Too many unnecessary or fatal configuration changes were made before the upgrade stopped, even though an earlier task had already hit an error.

Expected results:
The upgrade should stop earlier, as soon as downloading the new node packages fails.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Scott Dodson 2018-05-22 12:03:30 UTC
The problem here is that the yum module is not returning a failure. I guess we need to add a failed_when that checks for "No package (.*) available" in the output.

Tim do you mind taking a look at this?
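
(Editorial sketch, not part of the original comment.) The check proposed above can be illustrated outside Ansible with a few lines of Python. This is only a minimal sketch of the matching logic, assuming the pattern quoted in the comment; the function name download_failed is hypothetical, and the actual failed_when expression used in the eventual fix is not shown in this report.

```python
import re

# Pattern quoted in the comment above; yum prints one such line
# for every package spec it cannot match in the enabled repos.
NO_PACKAGE = re.compile(r"No package (.*) available")


def download_failed(rc: int, stdout: str) -> bool:
    """Hypothetical helper: fail on a non-zero exit code or on any
    'No package ... available' line in the command output."""
    return rc != 0 or NO_PACKAGE.search(stdout) is not None


# stdout reported by the failing run in the description (rc was 0).
stdout = (
    "No package atomic-openshift-node-3.10* available.\n"
    "No package atomic-openshift-clients-3.10* available.\n"
    "Package PyYAML-3.10-11.el7.x86_64 already installed and latest version\n"
    "Nothing to do"
)

print(download_failed(0, stdout))  # True
```

Checking the output text in addition to the return code matters here because, as the log in the description shows, yum exited with rc 0 even though none of the requested node packages were available.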

Comment 2 Tim Bielawa 2018-05-22 17:06:35 UTC
PR created for this https://github.com/openshift/openshift-ansible/pull/8481

Comment 3 Tim Bielawa 2018-05-23 14:56:04 UTC
Scott, the PR was against master, does this need to be backported to any branches?

Comment 4 Scott Dodson 2018-05-25 17:53:09 UTC
No, master is fine.

Comment 5 liujia 2018-05-31 08:30:39 UTC
Verified on openshift-ansible-3.10.0-0.54.0.git.0.537c485.el7.noarch

TASK [openshift_node : download new node packages] *****************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/rpm_upgrade.yml:16
Thursday 31 May 2018  08:23:27 +0000 (0:00:00.037)       0:05:59.890 ********** 
fatal: [qx]: FAILED! => {"attempts": 1, "changed": true, "cmd": ["yum", "install", "-y", "--downloadonly", "atomic-openshift-3.10*", "atomic-openshift-hyperkube-3.10*", "atomic-openshift-node-3.10*", "atomic-openshift-clients-3.10*"], "delta": "0:00:10.078409", "end": "2018-05-31 04:25:44.047840", "failed": true, "failed_when_result": true, "rc": 0, "start": "2018-05-31 04:25:33.969431", "stderr": "", "stderr_lines": [], "stdout": "Loaded plugins: product-id, search-disabled-repos, subscription-manager\nThis system is not registered with an entitlement server. You can use subscription-manager to register.\nNo package atomic-openshift-3.10* available.\nNo package atomic-openshift-node-3.10* available.\nNo package atomic-openshift-clients-3.10* available.\nResolving Dependencies\n--> Running transaction check\n---> Package atomic-openshift-hyperkube.x86_64 0:3.10.0-0.54.0.git.0.00a8b84.el7 will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package             Arch   Version                         Repository     Size\n================================================================================\nInstalling:\n atomic-openshift-hyperkube\n                     x86_64 3.10.0-0.54.0.git.0.00a8b84.el7 aos_addon3_10  33 M\n\nTransaction Summary\n================================================================================\nInstall  1 Package\n\nTotal download size: 33 M\nInstalled size: 229 M\nBackground downloading packages, then exiting:\nexiting because \"Download Only\" specified", "stdout_lines": ["Loaded plugins: product-id, search-disabled-repos, subscription-manager", "This system is not registered with an entitlement server. You can use subscription-manager to register.", "No package atomic-openshift-3.10* available.", "No package atomic-openshift-node-3.10* available.", "No package atomic-openshift-clients-3.10* available.", "Resolving Dependencies", "--> Running transaction check", "---> Package atomic-openshift-hyperkube.x86_64 0:3.10.0-0.54.0.git.0.00a8b84.el7 will be installed", "--> Finished Dependency Resolution", "", "Dependencies Resolved", "", "================================================================================", " Package             Arch   Version                         Repository     Size", "================================================================================", "Installing:", " atomic-openshift-hyperkube", "                     x86_64 3.10.0-0.54.0.git.0.00a8b84.el7 aos_addon3_10  33 M", "", "Transaction Summary", "================================================================================", "Install  1 Package", "", "Total download size: 33 M", "Installed size: 229 M", "Background downloading packages, then exiting:", "exiting because \"Download Only\" specified"]}
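
(Editorial sketch, not part of the original comment.) The verification output above is a partial-availability case: one package resolved while three did not, and the task is now reported as failed even though rc is 0. When triaging such a result, it can help to list exactly which package specs were missing. The Python below is a minimal sketch under that assumption; the helper name missing_packages is hypothetical and it only parses a subset of the stdout_lines shown above.

```python
import re

# One line per unmatched package spec, as seen in the stdout_lines above.
NO_PACKAGE_LINE = re.compile(r"^No package (\S+) available\.?$")


def missing_packages(stdout_lines):
    """Hypothetical helper: return the package specs yum could not find."""
    missing = []
    for line in stdout_lines:
        match = NO_PACKAGE_LINE.match(line.strip())
        if match:
            missing.append(match.group(1))
    return missing


# Subset of the stdout_lines from the verification run.
lines = [
    "No package atomic-openshift-3.10* available.",
    "No package atomic-openshift-node-3.10* available.",
    "No package atomic-openshift-clients-3.10* available.",
    "Resolving Dependencies",
    "---> Package atomic-openshift-hyperkube.x86_64 0:3.10.0-0.54.0.git.0.00a8b84.el7 will be installed",
]

print(missing_packages(lines))
# ['atomic-openshift-3.10*', 'atomic-openshift-node-3.10*', 'atomic-openshift-clients-3.10*']
```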

Comment 7 errata-xmlrpc 2018-07-30 19:16:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

