Bug 1853433 - [OSP16][FFU] To allow system_upgrade step, we need to unmount NFS / iSCSI filesystems in the overcloud nodes (and clear fstab)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z1
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Giulio Fidente
QA Contact: Mike Abrams
URL:
Whiteboard:
Duplicates: 1847108 1872162
Depends On:
Blocks: 1768952 1847108 1872255
Reported: 2020-07-02 16:24 UTC by Mauro Oddi
Modified: 2022-08-08 13:22 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200616081533.396affd
Doc Type: Bug Fix
Doc Text:
Before this update, the Leapp upgrade could fail if you had any NFS shares mounted. Specifically, the nodes that run the Compute service (nova) or the Image service (glance) hung if they used an NFS mount. With this update, before the Leapp upgrade, director unmounts `/var/lib/nova/instances`, `/var/lib/glance/images`, and any Image service staging area that you define with the `GlanceNodeStagingUri` parameter.
Clone Of:
Environment:
Last Closed: 2020-08-27 15:19:10 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary                                    Last Updated
Launchpad               1884926         0        None      None    None                                       2020-07-03 12:14:45 UTC
OpenStack gerrit        739219          0        None      MERGED  Unmount NFS shares before launching LEAPP  2021-01-20 08:12:05 UTC
Red Hat Issue Tracker   OSP-10400       0        None      None    None                                       2022-08-08 12:26:54 UTC
Red Hat Product Errata  RHBA-2020:3542  0        None      None    None                                       2020-08-27 15:19:28 UTC

Description Mauro Oddi 2020-07-02 16:24:23 UTC
Description of problem:
While leapp runs on the overcloud nodes, the upgrade is inhibited by mounted NFS shares.

As a workaround, the share can be unmounted manually on the overcloud node. A better approach would be to automate the unmount and, if possible, remount the share automatically after the upgrade.
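
For anyone applying the manual workaround, a rough sketch of how the affected shares could be identified from the mount table. It assumes the usual `/proc/mounts` column layout (device, mount point, filesystem type, ...); the function name `nfs_mount_points` is hypothetical and is not part of leapp or tripleo-heat-templates.

```python
def nfs_mount_points(mounts_text):
    """Return NFS mount points from /proc/mounts-style text, deepest first."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    # Nested mounts must be unmounted before their parents,
    # so order by path depth, deepest first.
    return sorted(points, key=lambda p: p.count("/"), reverse=True)
```

On a node, the input would be the contents of `/proc/mounts`; each returned path is a candidate for `umount` before running leapp.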

Version-Release number of selected component (if applicable):
16.1

How reproducible:
always

Steps to Reproduce:
1. Run the FFU process on an OSP 13 overcloud with NFS shares mounted

Actual results:
leapp upgrade inhibited

Expected results:
automatically proceed with the node upgrade




Additional info:

 - Piece of code for the step:
/usr/share/openstack-tripleo-heat-templates/deployment/tripleo-packages/tripleo-packages-baremetal-puppet.yaml:

~~~
        - name: system_upgrade_prepare step 4
          tags:
            - never
            - system_upgrade
            - system_upgrade_prepare
          when: step|int == 4
          block:
            - name: set leapp options
              shell: >
                leapp answer --section remove_pam_pkcs11_module_check.confirm=True --add
              when: upgrade_leapp_enabled

            - name: run leapp upgrade (download packages)
              shell: >
                {% if upgrade_leapp_devel_skip|default(false) %}{{ upgrade_leapp_devel_skip }}{% endif %}
                leapp upgrade
                {% if upgrade_leapp_debug|default(true) %}--debug{% endif %}
                {% if upgrade_leapp_command_options|default(false) %}{{ upgrade_leapp_command_options }}{% endif %}
              when: upgrade_leapp_enabled

        - name: system_upgrade_run step 4
          tags:
            - never
            - system_upgrade
            - system_upgrade_run
            # In case someone needs to re-run system_upgrade_run post-tasks
            # but doesn't want to reboot, they can run with
            # `--skip-tags system_upgrade_reboot`.
            - system_upgrade_reboot
          when:
            - step|int == 4
            - upgrade_leapp_enabled

~~~

Comment 1 Giulio Fidente 2020-07-02 17:47:43 UTC
Not sure whether we can have a step which clears *every* NFS entry from fstab?
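
To make the idea concrete, a minimal sketch of what "clear every NFS entry" could mean, operating on fstab-style text only. The function name `strip_nfs_entries` is hypothetical, and treating only the `nfs` and `nfs4` filesystem types as NFS is an assumption; this is an illustration, not a proposed patch.

```python
def strip_nfs_entries(fstab_text):
    """Return fstab text with every nfs/nfs4 entry removed."""
    kept = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        # fstab columns: device, mount point, fstype, options, dump, pass
        if not is_comment and len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            continue  # drop the NFS entry
        kept.append(line)  # comments, blanks, and other filesystems stay
    return "\n".join(kept) + ("\n" if kept else "")
```

Comments and non-NFS lines pass through untouched, so a commented-out NFS entry is preserved rather than deleted.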

Comment 2 Giulio Fidente 2020-07-03 09:59:42 UTC
Sounds like Nova/NFS would hit this issue too if ephemeral storage is configured with NFS [1]

1. https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/nova/nova-compute-container-puppet.yaml#L909

Comment 4 Giulio Fidente 2020-07-03 12:15:06 UTC
*** Bug 1847108 has been marked as a duplicate of this bug. ***

Comment 6 Erno Kuvaja 2020-07-07 16:48:00 UTC
Just remember: if such an NFS mount is being used as a filesystem store for Glance, the glance-api processes on that node must be drained of in-flight requests and stopped first! This is very important to avoid potential data loss.

Comment 7 Giulio Fidente 2020-07-08 10:22:23 UTC
(In reply to Erno Kuvaja from comment #6)
> Just remember: if such an NFS mount is being used as a filesystem store for
> Glance, the glance-api processes on that node must be drained of in-flight
> requests and stopped first! This is very important to avoid potential data
> loss.

Ack. The proposed change tries to unmount the share at the stage of the upgrade process immediately before the node's base OS is upgraded and the hardware rebooted. There shouldn't be traffic going through the glance-api node at that point, but the submission currently has nothing that enforces a service stop. Let's continue this conversation directly in the proposed change: https://review.opendev.org/739219
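
For reference, the ordering concern from comment #6 can be sketched as "stop first, unmount only if the stop succeeded". This is a hedged illustration, not what the submission does: the function name `stop_then_unmount` is hypothetical, the unit name passed in is an assumption (on a containerized node the service would be stopped differently), and stopping a unit does not by itself prove that in-flight requests were drained.

```python
import subprocess


def stop_then_unmount(service_unit, mount_point, run=subprocess.run):
    """Stop the service first; unmount only if the stop succeeded."""
    commands = [
        ["systemctl", "stop", service_unit],  # unit name is an assumption
        ["umount", mount_point],
    ]
    for argv in commands:
        # check=True raises CalledProcessError on a non-zero exit,
        # so a failed stop prevents the unmount from running.
        run(argv, check=True)
    return commands
```

The `run` parameter exists only so the ordering can be exercised without touching a real system; by default the commands are executed through `subprocess.run`.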

Comment 9 John Fulton 2020-07-14 21:22:16 UTC
https://review.opendev.org/#/c/739219/ merged in Train

Comment 12 Tzach Shefi 2020-08-19 04:13:43 UTC
Verified on:
openstack-tripleo-heat-templates-11.3.2-0.20200616081539.396affd.el8ost.noarch

Successfully upgraded an NFS-backed Glance deployment from OSP13 to OSP16.1.1.
No issues were hit along the way; Glance data uploaded before the upgrade is available after the upgrade.
Uploading/consuming an additional new image after the upgrade also works.

Good to verify.

Comment 14 errata-xmlrpc 2020-08-27 15:19:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3542

Comment 15 John Fulton 2020-08-31 13:31:22 UTC
*** Bug 1872162 has been marked as a duplicate of this bug. ***

