
Bug 1853433

Summary: [OSP16][FFU] To allow system_upgrade step, we need to unmount NFS / iSCSI filesystems in the overcloud nodes (and clear fstab)
Product: Red Hat OpenStack
Reporter: Mauro Oddi <moddi>
Component: openstack-tripleo-heat-templates
Assignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA
QA Contact: Mike Abrams <mabrams>
Severity: high
Priority: high
Version: 16.1 (Train)
CC: amcleod, ekuvaja, fj-lsoft-ofuku, fpantano, gfidente, johfulto, jpretori, jritenou, lbezdick, lmarsh, mabrams, mburns, pgrist, spower, tshefi
Target Milestone: z1
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200616081533.396affd
Doc Type: Bug Fix
Doc Text:
Before this update, the Leapp upgrade could fail if you had any NFS shares mounted. Specifically, the nodes that run the Compute service (nova) or the Image service (glance) hung if they used an NFS mount. With this update, before the Leapp upgrade, director unmounts `/var/lib/nova/instances`, `/var/lib/glance/images`, and any Image service staging area that you define with the `GlanceNodeStagingUri` parameter.
Last Closed: 2020-08-27 15:19:10 UTC
Type: Bug
Bug Blocks: 1768952, 1847108, 1872255

Description Mauro Oddi 2020-07-02 16:24:23 UTC
Description of problem:
When leapp runs on the overcloud nodes, the upgrade is inhibited because of mounted NFS filesystems.

To work around the issue, one may need to unmount the share manually on the overcloud node. A better approach would be to automate the unmount and, if possible, mount the share again automatically after the upgrade.
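
A minimal sketch of what automating the manual workaround could look like, assuming Ansible's `mount` module and the two default paths named in this bug's Doc Text. The play layout and `hosts: all` are illustrative assumptions, not the change that was eventually merged.

```yaml
# Sketch only: unmount the NFS-backed directories before leapp runs.
# state: unmounted leaves /etc/fstab untouched, so the share can be
# mounted again after the upgrade.
- hosts: all            # assumption: limit this to the affected overcloud nodes
  become: true
  tasks:
    - name: Unmount NFS-backed service directories (illustrative)
      mount:
        path: "{{ item }}"
        state: unmounted
      loop:
        - /var/lib/nova/instances
        - /var/lib/glance/images
```

Note that the bug summary also asks for fstab to be cleared. With the same module, `state: absent` unmounts the path and removes its fstab entry, but it also removes the mount point directory, so the entry and the directory would have to be restored after the upgrade.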

Version-Release number of selected component (if applicable):
16.1

How reproducible:
always

Steps to Reproduce:
1. Run the FFU process on an OSP 13 overcloud with mounted NFS shares

Actual results:
leapp upgrade inhibited

Expected results:
automatically proceed with the node upgrade




Additional info:

 - Piece of code for the step:
/usr/share/openstack-tripleo-heat-templates/deployment/tripleo-packages/tripleo-packages-baremetal-puppet.yaml:

~~~
        - name: system_upgrade_prepare step 4
          tags:
            - never
            - system_upgrade
            - system_upgrade_prepare
          when: step|int == 4
          block:
            - name: set leapp options
              shell: >
                leapp answer --section remove_pam_pkcs11_module_check.confirm=True --add
              when: upgrade_leapp_enabled

            - name: run leapp upgrade (download packages)
              shell: >
                {% if upgrade_leapp_devel_skip|default(false) %}{{ upgrade_leapp_devel_skip }}{% endif %}
                leapp upgrade
                {% if upgrade_leapp_debug|default(true) %}--debug{% endif %}
                {% if upgrade_leapp_command_options|default(false) %}{{ upgrade_leapp_command_options }}{% endif %}
              when: upgrade_leapp_enabled

        - name: system_upgrade_run step 4
          tags:
            - never
            - system_upgrade
            - system_upgrade_run
            # In case someone needs to re-run system_upgrade_run post-tasks
            # but doesn't want to reboot, they can run with
            # `--skip-tags system_upgrade_reboot`.
            - system_upgrade_reboot
          when:
            - step|int == 4
            - upgrade_leapp_enabled

~~~

Comment 1 Giulio Fidente 2020-07-02 17:47:43 UTC
Not sure whether we can have a step that removes *every* NFS entry from fstab?
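
For illustration only, a step that drops every NFS entry from fstab could be sketched with Ansible's `lineinfile` module, which removes all lines matching `regexp` when `state: absent` is set. The regular expression and the play layout are assumptions made for this sketch; it is not the approach taken in the merged change, and it would also remove NFS entries that are unrelated to OpenStack.

```yaml
# Sketch only: remove every non-comment fstab line whose third field
# (the filesystem type) is nfs or nfs4.
- hosts: all            # assumption: limit this to the affected overcloud nodes
  become: true
  tasks:
    - name: Remove all NFS entries from /etc/fstab (illustrative)
      lineinfile:
        path: /etc/fstab
        # fields: <device> <mount point> <fstype> ...
        regexp: '^\s*[^#\s]\S*\s+\S+\s+nfs4?\s'
        state: absent
        backup: true    # keep a timestamped copy so entries can be restored
```

This only edits fstab. Filesystems that are currently mounted stay mounted and would still need to be unmounted separately.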

Comment 2 Giulio Fidente 2020-07-03 09:59:42 UTC
Sounds like Nova/NFS would hit this issue too if ephemeral storage is configured with NFS [1]

1. https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/nova/nova-compute-container-puppet.yaml#L909

Comment 4 Giulio Fidente 2020-07-03 12:15:06 UTC
*** Bug 1847108 has been marked as a duplicate of this bug. ***

Comment 6 Erno Kuvaja 2020-07-07 16:48:00 UTC
Just remember: if such an NFS mount is being used as a filesystem store for Glance, the glance-api processes on that node must be drained of in-flight requests and stopped first! This is very important to avoid potential data loss.

Comment 7 Giulio Fidente 2020-07-08 10:22:23 UTC
(In reply to Erno Kuvaja from comment #6)
> Just remember: if such an NFS mount is being used as a filesystem store for
> Glance, the glance-api processes on that node must be drained of in-flight
> requests and stopped first! This is very important to avoid potential data
> loss.

Ack. The proposed change tries to unmount the share at the stage of the upgrade process immediately before the node's base OS is upgraded and the hardware is rebooted. There shouldn't be traffic going through the glance-api node at that point, but nothing in the submission currently enforces a service stop. Let's continue this conversation directly in the proposed change: https://review.opendev.org/739219
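
The ordering discussed above (stop the service first, unmount second) could be enforced along these lines. This is a hedged sketch, not part of the submission: the container name `glance_api`, the use of the `docker` CLI (as on an OSP 13 node), and both variable names are assumptions.

```yaml
# Sketch only: make the unmount depend on glance-api being stopped.
# If the stop task fails, the play aborts and the share is never unmounted.
- hosts: all            # assumption: limit this to nodes running glance-api
  become: true
  vars:
    container_cli: docker               # assumption: docker on OSP 13
    glance_api_container: glance_api    # assumption: container name
  tasks:
    - name: Stop glance-api so nothing can write to the share
      command: "{{ container_cli }} stop {{ glance_api_container }}"

    - name: Unmount the Glance image store only after the service is stopped
      mount:
        path: /var/lib/glance/images
        state: unmounted
```

Stopping the container does not by itself drain in-flight requests. Traffic would still have to be moved away from the node beforehand.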

Comment 9 John Fulton 2020-07-14 21:22:16 UTC
https://review.opendev.org/#/c/739219/ merged in train

Comment 12 Tzach Shefi 2020-08-19 04:13:43 UTC
Verified on:
openstack-tripleo-heat-templates-11.3.2-0.20200616081539.396affd.el8ost.noarch

Successfully upgraded a Glance NFS-backed OSP13 to OSP16.1.1.
No issues were hit along the way. Glance data uploaded before the upgrade is available after the upgrade.
Uploading and consuming an additional new image after the upgrade also works.

Good to verify.

Comment 14 errata-xmlrpc 2020-08-27 15:19:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3542

Comment 15 John Fulton 2020-08-31 13:31:22 UTC
*** Bug 1872162 has been marked as a duplicate of this bug. ***