Bug 1853433
| Summary: | [OSP16][FFU] To allow system_upgrade step, we need to unmount NFS / iSCSI filesystems in the overcloud nodes (and clear fstab) | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Mauro Oddi <moddi> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Giulio Fidente <gfidente> |
| Status: | CLOSED ERRATA | QA Contact: | Mike Abrams <mabrams> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 16.1 (Train) | CC: | amcleod, ekuvaja, fj-lsoft-ofuku, fpantano, gfidente, johfulto, jpretori, jritenou, lbezdick, lmarsh, mabrams, mburns, pgrist, spower, tshefi |
| Target Milestone: | z1 | Keywords: | Triaged |
| Target Release: | 16.1 (Train on RHEL 8.2) | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-heat-templates-11.3.2-0.20200616081533.396affd | Doc Type: | Bug Fix |
| Doc Text: |
Before this update, the Leapp upgrade could fail if you had any NFS shares mounted. Specifically, the nodes that run the Compute service (nova) or the Image service (glance) hung if they used an NFS mount.
+
With this update, before the Leapp upgrade, director unmounts `/var/lib/nova/instances`, `/var/lib/glance/images`, and any Image service staging area that you define with the `GlanceNodeStagingUri` parameter.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-08-27 15:19:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1768952, 1847108, 1872255 | ||
Not sure if we can have a step which clears fstab from *every* nfs fstab entry? Sounds like Nova/NFS would hit this issue too if ephemeral storage is configured with NFS [1]

1. https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/nova/nova-compute-container-puppet.yaml#L909

*** Bug 1847108 has been marked as a duplicate of this bug. ***

Just remember if such NFS mount is being used as a filesystem store for Glance, The glance-api process' in that node must be drained of in flight requests and stopped first! This is very important to avoid potential data loss.

(In reply to Erno Kuvaja from comment #6)
> Just remember if such NFS mount is being used as a filesystem store for
> Glance, The glance-api process' in that node must be drained of in flight
> requests and stopped first! This is very important to avoid potential data
> loss.

ack, the proposed change tries to unmount the share when we're at a stage of the upgrade process in which immediately after the nodes baseos will be upgraded and the hardware rebooted ... there shouldn't be traffic going through the glance-api node but we don't have anything enforcing a service stop in the submission at the moment; let's continue this conversation directly in the proposed change https://review.opendev.org/739219

https://review.opendev.org/#/c/739219/ merged in train

Verified on: openstack-tripleo-heat-templates-11.3.2-0.20200616081539.396affd.el8ost.noarch

Successfully upgraded a Glance NFS backed OSP13 to OSP16.1.1. No issues were hit along the way, Glance data uploaded before upgrade is available after the upgrade. Uploading\consuming an additional new image after upgrade also works. Good to verify.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1 director bug fix advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3542

*** Bug 1872162 has been marked as a duplicate of this bug. ***
Description of problem:
During the leapp run on the overcloud nodes, the upgrade is inhibited by NFS mount points. To work around the issue, one may need to unmount the share manually on the overcloud node. A better approach would be to automate the unmount and, if possible, remount the share automatically after the upgrade.

Version-Release number of selected component (if applicable):
16.1

How reproducible:
always

Steps to Reproduce:
1. Run the FFU process in 13 overcloud with mounted NFS

Actual results:
leapp upgrade inhibited

Expected results:
automatically proceed with the node upgrade

Additional info:
- Piece of code for the step: /usr/share/openstack-tripleo-heat-templates/deployment/tripleo-packages/tripleo-packages-baremetal-puppet.yaml:

~~~
- name: system_upgrade_prepare step 4
  tags:
    - never
    - system_upgrade
    - system_upgrade_prepare
  when: step|int == 4
  block:
    - name: set leapp options
      shell: >
        leapp answer --section remove_pam_pkcs11_module_check.confirm=True --add
      when: upgrade_leapp_enabled
    - name: run leapp upgrade (download packages)
      shell: >
        {% if upgrade_leapp_devel_skip|default(false) %}{{ upgrade_leapp_devel_skip }}{% endif %}
        leapp upgrade
        {% if upgrade_leapp_debug|default(true) %}--debug{% endif %}
        {% if upgrade_leapp_command_options|default(false) %}{{ upgrade_leapp_command_options }}{% endif %}
      when: upgrade_leapp_enabled
- name: system_upgrade_run step 4
  tags:
    - never
    - system_upgrade
    - system_upgrade_run
    # In case someone needs to re-run system_upgrade_run post-tasks
    # but doesn't want to reboot, they can run with
    # `--skip-tags system_upgrade_reboot`.
    - system_upgrade_reboot
  when:
    - step|int == 4
    - upgrade_leapp_enabled
~~~
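
For illustration only, the automated unmount requested here could look roughly like the task below. This is a minimal sketch and not the change that was merged in https://review.opendev.org/739219: the task names, the choice of step 3 (picked only so that it runs before the step 4 leapp tasks quoted above), and the hard-coded path list are assumptions. It uses the Ansible `mount` module with `state: unmounted`, which unmounts the path without touching /etc/fstab.

```yaml
# Hypothetical sketch: unmount NFS-backed service directories before leapp runs.
# Task names, the step number, and the path list are assumptions for illustration.
- name: system_upgrade_prepare unmount NFS shares (sketch)
  tags:
    - never
    - system_upgrade
    - system_upgrade_prepare
  when:
    - step|int == 3            # assumed: any step earlier than the leapp run at step 4
    - upgrade_leapp_enabled
  block:
    - name: unmount share without editing fstab
      mount:
        path: "{{ item }}"
        state: unmounted       # leaves /etc/fstab unchanged
      loop:
        - /var/lib/nova/instances
        - /var/lib/glance/images
```

If the fstab entry also has to be cleared, as the summary suggests, `state: absent` is the documented alternative, but note that it also removes the mount point directory. As pointed out in the comments, any glance-api process using the share should be drained and stopped before the unmount; this sketch does not enforce that.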