Bug 1581140
| Summary: | Should break upgrade earlier when fail to download new node packages | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | liujia <jiajliu> |
| Component: | Cluster Version Operator | Assignee: | Tim Bielawa <tbielawa> |
| Status: | CLOSED ERRATA | QA Contact: | liujia <jiajliu> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.10.0 | CC: | aos-bugs, jokerman, mmccomas, sdodson |
| Target Milestone: | --- | ||
| Target Release: | 3.10.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: The default failure conditions for an upgrade task were not broad enough to catch all possible failures.
Consequence: The failed task would continue and produce a broken installation.
Fix: The scope of considered failures has been increased to include additional failure indicators.
Result: The install will fail earlier, as expected, if packages can not be downloaded for the upgrade.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-07-30 19:16:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
The problem here is that the yum module is not returning a failure. I guess we need to add a failed_when that checks for "No package (.*) available" in the output. Tim, do you mind taking a look at this?

PR created for this: https://github.com/openshift/openshift-ansible/pull/8481

Scott, the PR was against master. Does this need to be backported to any branches?

No, master is fine.

Verified on openshift-ansible-3.10.0-0.54.0.git.0.537c485.el7.noarch
TASK [openshift_node : download new node packages] *****************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/rpm_upgrade.yml:16
Thursday 31 May 2018 08:23:27 +0000 (0:00:00.037) 0:05:59.890 **********
fatal: [qx]: FAILED! => {"attempts": 1, "changed": true, "cmd": ["yum", "install", "-y", "--downloadonly", "atomic-openshift-3.10*", "atomic-openshift-hyperkube-3.10*", "atomic-openshift-node-3.10*", "atomic-openshift-clients-3.10*"], "delta": "0:00:10.078409", "end": "2018-05-31 04:25:44.047840", "failed": true, "failed_when_result": true, "rc": 0, "start": "2018-05-31 04:25:33.969431", "stderr": "", "stderr_lines": [], "stdout": "Loaded plugins: product-id, search-disabled-repos, subscription-manager\nThis system is not registered with an entitlement server. You can use subscription-manager to register.\nNo package atomic-openshift-3.10* available.\nNo package atomic-openshift-node-3.10* available.\nNo package atomic-openshift-clients-3.10* available.\nResolving Dependencies\n--> Running transaction check\n---> Package atomic-openshift-hyperkube.x86_64 0:3.10.0-0.54.0.git.0.00a8b84.el7 will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package Arch Version Repository Size\n================================================================================\nInstalling:\n atomic-openshift-hyperkube\n x86_64 3.10.0-0.54.0.git.0.00a8b84.el7 aos_addon3_10 33 M\n\nTransaction Summary\n================================================================================\nInstall 1 Package\n\nTotal download size: 33 M\nInstalled size: 229 M\nBackground downloading packages, then exiting:\nexiting because \"Download Only\" specified", "stdout_lines": ["Loaded plugins: product-id, search-disabled-repos, subscription-manager", "This system is not registered with an entitlement server. You can use subscription-manager to register.", "No package atomic-openshift-3.10* available.", "No package atomic-openshift-node-3.10* available.", "No package atomic-openshift-clients-3.10* available.", "Resolving Dependencies", "--> Running transaction check", "---> Package atomic-openshift-hyperkube.x86_64 0:3.10.0-0.54.0.git.0.00a8b84.el7 will be installed", "--> Finished Dependency Resolution", "", "Dependencies Resolved", "", "================================================================================", " Package Arch Version Repository Size", "================================================================================", "Installing:", " atomic-openshift-hyperkube", " x86_64 3.10.0-0.54.0.git.0.00a8b84.el7 aos_addon3_10 33 M", "", "Transaction Summary", "================================================================================", "Install 1 Package", "", "Total download size: 33 M", "Installed size: 229 M", "Background downloading packages, then exiting:", "exiting because \"Download Only\" specified"]}
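The check suggested in the comments (treat the task as failed when the output contains "No package (.*) available", even though yum exits with rc 0) can be illustrated outside Ansible. The following is only a rough Python sketch of that idea, not the code from the PR: the function names `missing_packages` and `download_failed` are hypothetical, and the sample output is abbreviated from the log in the original report.

```python
import re
from typing import List

# Pattern quoted in the comment thread. yum prints one such line for each
# package spec it cannot match, yet still exits with return code 0.
NO_PACKAGE_RE = re.compile(r"^No package (.*) available\.?$", re.MULTILINE)


def missing_packages(stdout: str) -> List[str]:
    """Return the package specs that yum reported as unavailable (hypothetical helper)."""
    return [match.group(1) for match in NO_PACKAGE_RE.finditer(stdout)]


def download_failed(rc: int, stdout: str) -> bool:
    """Treat the download as failed on a non-zero rc OR any 'No package' line."""
    return rc != 0 or bool(missing_packages(stdout))


# Abbreviated stdout from the original report (rc was 0 for this run).
SAMPLE_STDOUT = (
    "Loaded plugins: product-id, search-disabled-repos, subscription-manager\n"
    "No package atomic-openshift-node-3.10* available.\n"
    "No package atomic-openshift-clients-3.10* available.\n"
    "Package PyYAML-3.10-11.el7.x86_64 already installed and latest version\n"
    "Nothing to do"
)

if __name__ == "__main__":
    print(missing_packages(SAMPLE_STDOUT))
    print(download_failed(0, SAMPLE_STDOUT))
```

Checking the return code alone would report success for this sample, which is the behavior the bug describes. Combining it with the output check is what lets the run stop at the download step.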
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:1816
Description of problem:
Upgrade OCP v3.9 to v3.10. The upgrade failed at task [openshift_node : Wait for node to be ready] because /usr/bin/openshift-node-config was not available, which was caused by the atomic-openshift-node package not being installed successfully (a negative scenario, so the upgrade failure was expected). However, the upgrade log shows that an earlier task, [openshift_node : download new node packages], had already failed before any fatal or unnecessary changes happened, and the installer did not catch that failure.

TASK [openshift_node : download new node packages] *****************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/rpm_upgrade.yml:9
Tuesday 22 May 2018 06:45:33 +0000 (0:00:00.036) 0:08:51.121 ***********
changed: [x.x.x.x] => {"attempts": 1, "changed": true, "cmd": ["yum", "install", "-y", "--downloadonly", "atomic-openshift-node-3.10*", "atomic-openshift-clients-3.10*", "PyYAML"], "delta": "0:00:05.438444", "end": "2018-05-22 02:46:45.628826", "failed": false, "rc": 0, "start": "2018-05-22 02:46:40.190382", "stderr": "", "stderr_lines": [], "stdout": "Loaded plugins: product-id, search-disabled-repos, subscription-manager\nThis system is not registered with an entitlement server. You can use subscription-manager to register.\nNo package atomic-openshift-node-3.10* available.\nNo package atomic-openshift-clients-3.10* available.\nPackage PyYAML-3.10-11.el7.x86_64 already installed and latest version\nNothing to do", "stdout_lines": ["Loaded plugins: product-id, search-disabled-repos, subscription-manager", "This system is not registered with an entitlement server. You can use subscription-manager to register.", "No package atomic-openshift-node-3.10* available.", "No package atomic-openshift-clients-3.10* available.", "Package PyYAML-3.10-11.el7.x86_64 already installed and latest version", "Nothing to do"]}

Too many unnecessary/fatal changes were made after the task openshift_node : download new node packages failed, which left the original OCP installation unable to work.

/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/rpm_upgrade_install.yml:11
openshift_node : download new node packages ----------------------------- 7.46s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/rpm_upgrade.yml:9
openshift_node : Remove old service information ------------------------- 6.52s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/config_changes.yml:48
openshift_node : Uninstall openvswitch ---------------------------------- 5.69s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/config_changes.yml:42
openshift_node : Configure Node settings -------------------------------- 5.27s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/config/configure-node-settings.yml:2
openshift_node_group : create node config template ---------------------- 4.02s
/usr/share/ansible/openshift-ansible/roles/openshift_node_group/tasks/create_config.yml:22
openshift_excluder : Get available excluder version --------------------- 3.99s
/usr/share/ansible/openshift-ansible/roles/openshift_excluder/tasks/verify_excluder.yml:4
openshift_node : Install Node service file ------------------------------ 3.93s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/systemd_units.yml:8
openshift_node_group : create node-config.yaml configmap ---------------- 3.92s
/usr/share/ansible/openshift-ansible/roles/openshift_node_group/tasks/create_config.yml:50

Version-Release number of the following components:
openshift-ansible-3.10.0-0.50.0.git.0.bd68ade.el7.noarch

How reproducible: always

Steps to Reproduce:
1. Run the upgrade against an RPM-based OCP install with openshift_enable_openshift_excluder=false (negative scenario)

Actual results:
Too many unnecessary/fatal config changes were made before the upgrade stopped, even though an earlier task had already hit errors.

Expected results:
Should break upgrade earlier when fail to download new node packages

Additional info:
Please attach logs from ansible-playbook with the -vvv flag