Bug 1854950
| Summary: | Instances can't access their volumes during FFU OSP10->OSP13 | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Ganesh Kadam <gkadam> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Alan Bishop <abishop> |
| Status: | CLOSED ERRATA | QA Contact: | Tzach Shefi <tshefi> |
| Severity: | high | Docs Contact: | Chuck Copello <ccopello> |
| Priority: | high | ||
| Version: | 13.0 (Queens) | CC: | abishop, igallagh, mburns, nbourgeo |
| Target Milestone: | z13 | Keywords: | Triaged, ZStream |
| Target Release: | 13.0 (Queens) | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-heat-templates-8.4.1-63.el7ost | Doc Type: | Bug Fix |
| Doc Text: |
Before this update, instances were unable to access their volumes after upgrading from RHOSP 10 to RHOSP 13, because the NFS share used as a backend for OpenStack Block Storage (cinder) was not unmounted before the OpenStack Block Storage services were migrated from the host to containers. As a result, when the containerized service started up and changed the ownership of all files in the OpenStack Block Storage service directory, it also changed the ownership of the files on the NFS share.
With this update, OpenStack Block Storage NFS shares are unmounted prior to upgrading the services to run in containers. This resolves the issue, and instances can now access their volumes after upgrading to RHOSP 13.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-10-28 18:23:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
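To illustrate the behavior the Doc Text describes, here is a minimal shell sketch of unmounting leftover Block Storage NFS shares before the containerized service takes over. This is not the actual tripleo-heat-templates upgrade task; the /var/lib/cinder/mnt mount-point layout is an assumption based on the cinder NFS driver's default behavior.

```bash
# Sketch only -- not the actual tripleo-heat-templates fix.
# Assumes the cinder NFS driver's default layout, where shares are
# mounted under /var/lib/cinder/mnt/<hash> on the controller host.
awk '$3 ~ /^nfs/ && $2 ~ /^\/var\/lib\/cinder\// {print $2}' /proc/mounts |
while read -r mnt; do
    echo "Unmounting leftover cinder NFS share: $mnt"
    umount "$mnt"
done
```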
Comment 5
Alan Bishop
2020-07-10 18:41:05 UTC
It's definitely not a NetApp issue, and because the problem is specific to cinder I plan to fix it. That's why I assigned the bz to myself, and I've already started working on it. Kolla executes the recursive chown only when the top level /var/lib/cinder directory's ownership isn't cinder:kolla. Kolla should only need to execute the chown once, and so the customer shouldn't experience any more problems. Unless, of course, the customer has additional clouds that are scheduled for FFU to OSP-13.

Tzach Shefi

Alan,

As this involves FFU, which is a long and tedious process, I'd like to confirm my verification steps before I take a stab at this. My plan of action:

1. Deploy an OSP10 system, with cinder using NetApp NFS as a backend.
2. Boot up an instance or two with volumes attached, write to the volumes.
3. Start the FFU upgrade to OSP13, reach the controller upgrade step.
4. Verify that I still have access to the volumes from inside the instances.
5. Complete the FFU and recheck instance/volume access.

Sounds easy enough. The only bit that worries me is your comment #5 -> "Under normal circumstances, there won't be any active NFS mounts inside the cinder-volume container prior to when the service starts. However, in a FFU scenario, there may be an NFS mount on the host leftover from when cinder ran on the host"

Is there a way I can trigger this? Should I manually create a mount on the host just to test comment #5?

Thanks

Alan Bishop

Sorry Tzach, I can see how that statement is concerning, but your plan of action looks fine. What I meant is that in a fully containerized deployment, at the time kolla executes the recursive chown there will not be any active NFS mounts associated with the cinder-volume service. That's because kolla hasn't started c-vol yet! That's what I meant by "under normal circumstances."

Your steps 1 and 2 will create the FFU situation where there -are- NFS mounts (the ones left over from OSP-10). The fix ensures these mounts are removed during the FFU process, so they're torn down prior to kolla executing the chown.
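As an aside, here is a minimal sketch of the conditional recursive chown described above. It is an illustration only, not kolla's actual start script, and assumes /var/lib/cinder is the state directory bind-mounted into the cinder-volume container.

```bash
# Illustration of the ownership check described in comment 5 -- not
# kolla's actual start script. Assumes /var/lib/cinder is bind-mounted
# into the cinder-volume container.
CINDER_DIR=/var/lib/cinder
if [ "$(stat -c '%U:%G' "$CINDER_DIR")" != "cinder:kolla" ]; then
    # If an NFS share is still mounted under $CINDER_DIR at this point,
    # the recursion also rewrites ownership of the files on the share,
    # which is what broke volume access during the FFU.
    chown -R cinder:kolla "$CINDER_DIR"
fi
```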
Tzach Shefi

Verified on: openstack-tripleo-heat-templates-8.4.1-68.el7ost.noarch

Installed an OSP10 system with cinder backed by NetApp NFS. Created two NFS-backed volumes attached to two separate instances, one on each of the two compute nodes. Created filesystems and mounted the volumes, wrote a test file on both volumes. Used a watch -n command to re-read both test files every 5 seconds.

Started the FFU process:

```
(undercloud) [stack@undercloud-0 ~]$ openstack overcloud upgrade run --roles Controller --skip-tags validation
..
..
PLAY RECAP *********************************************************************
controller-0 : ok=21 changed=4 unreachable=0 failed=0
controller-1 : ok=21 changed=4 unreachable=0 failed=0
controller-2 : ok=21 changed=4 unreachable=0 failed=0

Thursday 15 October 2020 08:00:29 -0400 (0:00:00.389) 0:00:34.897 ******
===============================================================================
Updated nodes - Controller
Success
Completed Overcloud Upgrade Run for Controller with playbooks ['upgrade_steps_playbook.yaml', 'deploy_steps_playbook.yaml', 'post_upgrade_steps_playbook.yaml']
```

Up to this point there was no issue; both instances' volumes and files remained accessible during the controller upgrade. BZ verified as working properly: before this fix the volumes would disconnect, which didn't happen in my case.

For anyone doing this upgrade: during the undercloud upgrade I had to bump OSP10 to 13.0-RHEL-7/7.7-latest/ (2020-03-10.1); I don't recall which 13z this is. As OSP10 is RHEL 7.7 and OSP13 z13 is RHEL 7.9, without this temporary upgrade step I hit dependency issues. With this workaround I was able to upgrade the undercloud from OSP10 to OSP13 z13 (RHEL 7.9) and then start the overcloud upgrade.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 13.0 director bug fix advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4388
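For anyone reproducing the verification above, a minimal sketch of the in-guest check (format the attached volume, write a test file, and re-read it during the controller upgrade). The /dev/vdb device name and the mount point are assumptions; adjust for your environment.

```bash
# Run inside each guest after the cinder volume is attached.
# Assumes the volume appears as /dev/vdb (hypothetical device name).
mkfs.ext4 /dev/vdb
mkdir -p /mnt/vol
mount /dev/vdb /mnt/vol
echo "pre-upgrade data" > /mnt/vol/testfile
# Re-read the file every 5 seconds; a loss of access to the volume
# during the controller upgrade shows up here as read/I-O errors.
watch -n 5 cat /mnt/vol/testfile
```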