Bug 2158437 - [OVN migration] Can't backup controller nodes if BACKUP_MIGRATION_IP does not belong 192.168.24.0/24
Summary: [OVN migration] Can't backup controller nodes if BACKUP_MIGRATION_IP does not...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Target Release: 17.1
Assignee: Miro Tomaska
QA Contact: Roman Safronov
URL:
Whiteboard:
Depends On:
Blocks: 2019745
Reported: 2023-01-05 12:57 UTC by Roman Safronov
Modified: 2023-09-22 17:08 UTC
CC List: 5 users

Fixed In Version: openstack-neutron-18.6.1-1.20230518200963.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:13:10 UTC
Target Upstream Version:
Embargoed:
mtomaska: needinfo-




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 869613 | 0 | None | MERGED | [OVN][Migration] Enable settings backup subnet for NFS clients | 2023-06-01 14:47:07 UTC
Red Hat Issue Tracker | OSP-21204 | 0 | None | None | None | 2023-01-05 13:44:49 UTC
Red Hat Product Errata | RHEA-2023:4577 | 0 | None | None | None | 2023-08-16 01:13:31 UTC

Description Roman Safronov 2023-01-05 12:57:46 UTC
Description of problem:

When the undercloud (or any other node used to back up the control plane nodes) is on a subnet other than those listed in tripleo_backup_and_restore_clients_nets (see [1]), `openstack overcloud backup` fails to mount the NFS share for the backup.
The problem is that specifying BACKUP_MIGRATION_IP/backup_migration_ip is not enough: the corresponding network must also be added to tripleo_backup_and_restore_clients_nets. Currently, the OVN migration script does not allow specifying custom values for tripleo_backup_and_restore_clients_nets.

[1] https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/backup_and_restore/defaults/main.yml#L47


Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20221130.n.1

How reproducible:
100%

Steps to Reproduce:
1. Deploy undercloud with local_ip = 192.168.25.1/24
2. If your environment uses SR-IOV, make sure you have renamed the ControllerSriov role to Controller to work around BZ2158396 (see that BZ for details)
3. Deploy overcloud with ml2ovs backend, 3 controllers and 2 compute nodes
4. Run the OVN migration according to the official documentation [1]. Make sure backup is enabled by setting the following environment variables:
export CREATE_BACKUP=True
export BACKUP_MIGRATION_IP=192.168.25.1

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.0/html/testing_migration_of_the_networking_service_to_the_ml2ovn_mechanism_driver/migrating-ml2ovs-to-ovn#doc-wrapper

Actual results:
Backup of the controller nodes failed with the error: Mount command 'mount -v -t nfs -o rw,noatime 192.168.25.1:/ctl_plane_backups /var/tmp/rear.ooULEDUn2lGsOCY/outputfs' failed.
See the full log in the Additional info section below.

Expected results:
Backup of the controller nodes succeeds.


Additional info:

From ovn_migration_ansible.log:

FATAL | Create the node backup | controller-0 | error={\"changed\": true, \"cmd\": [\"rear\", \"-d\", \"-v\", \"mkbackup\"], \"delta\": \"0:00:17.861449\", \"end\": \"2023-01-05 01:44:30.534047\", \"msg\": \"non-zero return code\", \"rc\": -15, \"start\": \"2023-01-05 01:44:12.672598\", \"stderr\": \"ERROR: Mount command 'mount -v -t nfs -o rw,noatime 192.168.25.1:/ctl_plane_backups /var/tmp/rear.ooULEDUn2lGsOCY/outputfs' failed.\\nSome latest log messages since the last called script 060_mount_NETFS_path.sh:\\n  mount.nfs: timeout set for Thu Jan  5 01:46:13 2023\\n  mount.nfs: trying text-based options 'vers=4.2,addr=192.168.25.1,clientaddr=192.168.25.11'\\n  mount.nfs: trying text-based options 'vers=4,minorversion=1,addr=192.168.25.1,clientaddr=192.168.25.11'\\n  mount.nfs: trying text-based options 'vers=4,addr=192.168.25.1,clientaddr=192.168.25.11'\\n  mount.nfs: trying text-based options 'addr=192.168.25.1'\\n  mount.nfs: prog 100003, trying vers=3, prot=6\\n  mount.nfs: prog 100005, trying vers=3, prot=17\\n  mount.nfs: prog 100005, trying vers=3, prot=6\\nAborting due to an error, check /var/log/rear/rear-controller-0.log for details\", \"stderr_lines\": [\"ERROR: Mount command 'mount -v -t nfs -o rw,noatime 192.168.25.1:/ctl_plane_backups /var/tmp/rear.ooULEDUn2lGsOCY/outputfs' failed.\", \"Some latest log messages since the last called script 060_mount_NETFS_path.sh:\", \"  mount.nfs: timeout set for Thu Jan  5 01:46:13 2023\", \"  mount.nfs: trying text-based options 'vers=4.2,addr=192.168.25.1,clientaddr=192.168.25.11'\", \"  mount.nfs: trying text-based options 'vers=4,minorversion=1,addr=192.168.25.1,clientaddr=192.168.25.11'\", \"  mount.nfs: trying text-based options 'vers=4,addr=192.168.25.1,clientaddr=192.168.25.11'\", \"  mount.nfs: trying text-based options 'addr=192.168.25.1'\", \"  mount.nfs: prog 100003, trying vers=3, prot=6\", \"  mount.nfs: prog 100005, trying vers=3, prot=17\", \"  mount.nfs: prog 100005, trying vers=3, prot=6\", \"Aborting due to an error, check /var/log/rear/rear-controller-0.log for details\"],


(undercloud) [stack@undercloud-0 ~]$ showmount --exports
Export list for undercloud-0.redhat.local:
/ctl_plane_backups 172.16.0.0/24,10.0.0.0/24,192.168.24.0/24

cat /etc/exports
# BEGIN ANSIBLE MANAGED BLOCK /ctl_plane_backups
/ctl_plane_backups 192.168.24.0/24(rw,sync,no_root_squash,no_subtree_check)
/ctl_plane_backups 10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)
/ctl_plane_backups 172.16.0.0/24(rw,sync,no_root_squash,no_subtree_check)
# END ANSIBLE MANAGED BLOCK /ctl_plane_backups
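The export list admits only 192.168.24.0/24, 10.0.0.0/24 and 172.16.0.0/24. The mount.nfs log, however, shows the controller connecting as clientaddr=192.168.25.11, which falls outside all three, and that explains why the mount fails. The bash sketch below reproduces that address-to-CIDR check. The helper names (ip_to_int, in_cidr, allowed_by_exports) and the 192.168.24.11 comparison address are hypothetical. The sketch handles plain dotted-quad IPv4 only; real exports also accept hostnames, wildcards and netgroups.

```bash
# Hypothetical helpers: check whether an IPv4 client address falls inside
# any CIDR an NFS export admits. IPv4 dotted-quad only, no input validation.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (status 0) if IP $1 is inside CIDR $2, e.g. 192.168.24.0/24.
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # Build the netmask; a /0 prefix matches everything.
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Succeed if IP $1 is inside any CIDR given in the remaining arguments.
allowed_by_exports() {
  local ip=$1 cidr
  shift
  for cidr in "$@"; do
    in_cidr "$ip" "$cidr" && return 0
  done
  return 1
}

# CIDRs from the export list above; 192.168.25.11 is the controller from the log.
exports=(192.168.24.0/24 10.0.0.0/24 172.16.0.0/24)
for client in 192.168.24.11 192.168.25.11; do
  if allowed_by_exports "$client" "${exports[@]}"; then
    echo "$client: allowed"
  else
    echo "$client: rejected"
  fi
done
```

With the default export list, 192.168.24.11 is reported as allowed and 192.168.25.11 as rejected.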

After I changed 192.168.24.0 to 192.168.25.0 in /etc/exports and restarted the NFS server with "sudo systemctl restart nfs-server", I was able to mount 192.168.25.1:/ctl_plane_backups from a controller node by running:
mount -v -t nfs -o rw,noatime 192.168.25.1:/ctl_plane_backups /tmp/my_share_name
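The manual workaround amounts to regenerating the export entries for the subnet the controllers actually use. Here is a minimal sketch, assuming the same export path and options as the Ansible-managed block shown earlier. render_exports is a hypothetical helper that only prints the lines and does not modify /etc/exports.

```bash
# Hypothetical helper: print one /etc/exports line per client CIDR, using the
# export path and options from the Ansible-managed block in this report.
# It writes to stdout only; it does not modify /etc/exports.
render_exports() {
  local path=$1 cidr
  shift
  for cidr in "$@"; do
    printf '%s %s(rw,sync,no_root_squash,no_subtree_check)\n' "$path" "$cidr"
  done
}

# Workaround from this report: admit 192.168.25.0/24 instead of 192.168.24.0/24.
render_exports /ctl_plane_backups 192.168.25.0/24 10.0.0.0/24 172.16.0.0/24
```

After placing such lines in /etc/exports, `sudo exportfs -ra` re-reads the file without a full service restart. Restarting nfs-server, as done above, also works. Because the block is Ansible-managed, a later backup run may rewrite it.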

Comment 7 Roman Safronov 2023-06-11 14:40:19 UTC
Verified on RHOS-17.1-RHEL-9-20230607.n.2 puddle with openstack-neutron-ovn-migration-tool-18.6.1-1.20230518200966.el9ost.noarch
Verified that the default control plane CIDR(s) used for the backup can be overridden with the BACKUP_MIGRATION_CTL_PLANE_CIDRS environment variable:
- Deployed an overcloud with the control plane on a custom CIDR
- Ran ovn_migration.sh backup with BACKUP_MIGRATION_IP and BACKUP_MIGRATION_CTL_PLANE_CIDRS set
- Confirmed that /etc/exports on the NFS server contained the correct settings
- The Ansible backup tasks completed successfully
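Below is a sketch of the environment for this kind of verification, plus a hypothetical pre-flight check. It assumes BACKUP_MIGRATION_CTL_PLANE_CIDRS accepts a comma-separated list of CIDRs; the "CIDR(s)" wording above suggests more than one is allowed, but confirm the format against the documentation for your release. It uses 192.168.25.0/24 to match the reproduction environment. valid_cidr_list is not part of ovn_migration.sh and checks syntax only.

```bash
# Environment for the backup step. The comma-separated format of
# BACKUP_MIGRATION_CTL_PLANE_CIDRS is an assumption in this sketch.
export CREATE_BACKUP=True
export BACKUP_MIGRATION_IP=192.168.25.1
export BACKUP_MIGRATION_CTL_PLANE_CIDRS=192.168.25.0/24

# Hypothetical pre-flight check: succeed only if every comma-separated entry
# looks like an IPv4 CIDR (syntax only; octet values are not range-checked).
valid_cidr_list() {
  local entry re='^([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]|[12][0-9]|3[0-2])$'
  local -a entries
  IFS=, read -r -a entries <<< "$1"
  (( ${#entries[@]} > 0 )) || return 1
  for entry in "${entries[@]}"; do
    [[ $entry =~ $re ]] || return 1
  done
  return 0
}

if valid_cidr_list "$BACKUP_MIGRATION_CTL_PLANE_CIDRS"; then
  echo "CIDR list OK: $BACKUP_MIGRATION_CTL_PLANE_CIDRS"
else
  echo "Invalid BACKUP_MIGRATION_CTL_PLANE_CIDRS: $BACKUP_MIGRATION_CTL_PLANE_CIDRS" >&2
fi
# Then run the backup step, e.g.: ovn_migration.sh backup
```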

Comment 15 errata-xmlrpc 2023-08-16 01:13:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

