Bug 1980829 - problem with kernelargs after upgrade
Summary: problem with kernelargs after upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z7
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Saravanan KR
QA Contact: Jason Grosso
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-09 15:52 UTC by Maciej Relewicz
Modified: 2021-12-09 22:07 UTC
CC List: 12 users

Fixed In Version: tripleo-ansible-0.5.1-1.20210713143308.el8ost
Doc Type: Bug Fix
Doc Text:
Before this update, changes to `KernelArgs` parameters caused errors in the Red Hat OpenStack Platform (RHOSP) fast forward upgrade (FFU) process for version 13 to version 16:

* Duplicate entries appeared in `/etc/default/grub`.
* Duplicate entries appeared in the kernel command line.
* Nodes rebooted during the RHOSP upgrade.
+
These errors occurred when the `KernelArgs` parameter or the order of values in its string changed, or when a `KernelArgs` parameter was added.
+
With this update, TripleO has added upgrade tasks in `kernel-boot-params-baremetal-ansible.yaml` to migrate from `TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS` to `GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS`.
+
This change accommodates the Red Hat Enterprise Linux (RHEL) in-place upgrade tool, LEAPP, which is used to upgrade RHEL from version 7 to version 8 during the RHOSP version 13 to version 16 FFU process. LEAPP recognizes GRUB parameters in `/etc/default/grub` only when their names start with `GRUB_`.
+
Despite this update, you must manually inspect each `KernelArgs` value to ensure that it matches the value on all hosts in the corresponding role.
+
The `KernelArgs` value may come from the `PreNetworkConfig` implementation in either the default tripleo-heat-templates or third-party heat templates.
+
If you find any mismatches, change the value of the `KernelArgs` parameter in the corresponding role to match the value of `KernelArgs` on the hosts. Perform these checks before running the `openstack overcloud upgrade prepare` command.
+
You can use the following script to check `KernelArgs` values. Replace `<KernelArgs_FROM_THT>` with the `KernelArgs` value from your templates, and `ComputeSriov` with the role that you are checking:
+
----
tripleo-ansible-inventory --static-yaml-inventory inventory.yaml
KernelArgs='<KernelArgs_FROM_THT>'
ansible -i inventory.yaml ComputeSriov -m shell -b -a "cat /proc/cmdline | grep '${KernelArgs}'"
----
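When the grep in the check script finds no match on a host, it does not say which parameter differs. The sketch below is a minimal, hypothetical helper (missing_kernel_args is not part of TripleO), assuming bash on the host: it prints each `KernelArgs` parameter that does not appear as a whole word on a kernel command line. If it prints nothing while the grep still fails, the values differ only in order, which, as described above, is also enough to trigger the problem.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: print every KernelArgs parameter that does not
# appear as a whole word on the given kernel command line.
missing_kernel_args() {
  local kernel_args=$1 cmdline=$2 arg
  local -a params
  # Split KernelArgs on whitespace without glob expansion.
  read -r -a params <<< "$kernel_args"
  for arg in "${params[@]}"; do
    # Pad with spaces so that "hugepages=4" does not match "hugepages=48".
    case " $cmdline " in
      *" $arg "*) ;;
      *) printf '%s\n' "$arg" ;;
    esac
  done
}

# Example on a host:
#   missing_kernel_args "$KernelArgs" "$(cat /proc/cmdline)"
```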
Clone Of:
Environment:
Last Closed: 2021-12-09 20:20:15 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Last Updated
OpenStack gerrit        753207          2021-08-06 11:40:06 UTC
OpenStack gerrit        775216          2021-08-06 11:40:06 UTC
Red Hat Issue Tracker   OSP-6091        2021-11-18 11:34:39 UTC
Red Hat Issue Tracker   UPG-3168        2021-08-09 14:29:58 UTC
Red Hat Product Errata  RHBA-2021:3762  2021-12-09 20:20:45 UTC

Description Maciej Relewicz 2021-07-09 15:52:34 UTC
Description of problem:
kernel-boot-params-baremetal-ansible.yaml contains:

block:
  - name: fix grub entries to have name start with GRUB_
    replace:
      path: '/etc/default/grub'
      regexp: '^(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)(.*)'
      replace: 'GRUB_\1\2'
  - name: fix grub entries in append statement
    replace:
      path: '/etc/default/grub'
      regexp: '(.*){(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)}(.*)'
      replace: '\1{GRUB_\2}\3'
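To see what the two tasks above do to the file, the sketch below reproduces them with `sed -E`. It is an approximation: the real tasks use the Ansible replace module, which applies Python regular expressions, and migrate_grub_kernel_args is a hypothetical name. It reads a grub defaults file on stdin and prints the rewritten file, so it works as a dry run, and a second pass changes nothing.

```shell
#!/usr/bin/env bash
# Approximate the two "fix grub entries" tasks with sed ERE.
migrate_grub_kernel_args() {
  sed -E \
    -e 's/^(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)(.*)/GRUB_\1\2/' \
    -e 's/(.*)\{(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)\}(.*)/\1{GRUB_\2}\3/'
  # Expression 1 renames the variable definition at the start of a line.
  # Expression 2 renames the ${...} reference in the GRUB_CMDLINE_LINUX line;
  # it no longer matches once the name is prefixed, so it is idempotent.
}

# Dry run against the real file:
#   migrate_grub_kernel_args < /etc/default/grub | diff /etc/default/grub -
```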

Before the upgrade, these tasks rename the entries to add the GRUB_ prefix, which LEAPP upgrade preparation requires.

However, all THT (tripleo-heat-templates) templates still modify TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.

As a result, you can end up with both TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS and GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS, and /etc/default/grub looks like this:

# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet"
GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" default_hugepagesz=1GB hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 "
GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"
TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=8 "
GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"
GRUB_DISABLE_RECOVERY="true"
GRUB_INITRD_OVERLAY="${GRUB_INITRD_OVERLAY:+$GRUB_INITRD_OVERLAY }\$tuned_initrd"
GRUB_ENABLE_BLSCFG=true
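A quick way to spot the state shown above is to check whether both variables are defined. The helper below is a hypothetical sketch, assuming bash and grep; it returns success only when the file defines both the unprefixed and the GRUB_-prefixed variable.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the grub defaults file defines both
# TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS and GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.
has_both_kernel_args_vars() {
  local file=${1:-/etc/default/grub}
  grep -q '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=' "$file" &&
    grep -q '^GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=' "$file"
}

# Example:
#   if has_both_kernel_args_vars; then
#     echo "duplicate KernelArgs definitions in /etc/default/grub"
#   fi
```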

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Jesse Pretorius 2021-07-09 15:58:31 UTC
The change above comes from https://review.opendev.org/c/openstack/tripleo-heat-templates/+/745059, so perhaps its author can help explain the context here.

Comment 2 Saravanan KR 2021-07-13 04:52:01 UTC
This is the scenario that has been handled:
1) OSP 13 deployed with TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS (at this point only TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS is present)
2) Run FFU scripts
3) Check if there is an entry that starts with TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS; if so, rename it to GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
4) Continue upgrade

Can you explain the scenario of these duplicate entries?

Comment 3 Maciej Relewicz 2021-07-13 11:32:08 UTC
1) I ran the stack upgrade. The stack contains DPDK nodes configured with KernelArgs that set hugepages.
2) The upgrade failed because my nodes need a different DPDK driver on RHOSP 16 than on RHOSP 13: the driver had to be switched from uio_pci_generic to vfio-pci.
3) I changed the driver and KernelArgs to the correct values, and ran prepare again to generate a new configuration.
4) I ran the upgrade again; it failed due to duplicated hugepages entries.

This is the upgrade scenario.

IMHO (please correct me if I am wrong), it looks like the playbooks/templates configure only `TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS` by default. So after the upgrade is done, if I try to change KernelArgs, can I hit the same situation?

--
Maciej

Comment 4 Saravanan KR 2021-07-14 04:35:06 UTC
Had a chat with Maciej to understand the problem. The 16.x templates should not reference TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS, as it has been changed to GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS. Below is a snippet that Maciej shared from his templates, which still reference TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS. These old references and files come from the OSP13 templates and are not present in the OSP16 Contrail templates. It could be a problem with copying and merging templates in the FFU preparation step, which still refer to the old grub file entries. Maciej will check and come back once the templates are fixed.

This problem should not occur if the templates use the corrected grub entry, as the OSP16 templates do. IMO, this bug is not valid; let's wait for Maciej to confirm, and then we can close it.



On a different note, I suggested that copying and merging the default THT templates into the user directory leads to such problems. There is no need to copy the templates; all the functionality can be achieved by keeping the user templates separate from the default THT templates. It would be good to take this direction for future Contrail deployments so that the upgrade experience improves.



~~~~~~
(undercloud) [stack@undercloud ~]$ grep -rl TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS tripleo-heat-templates/
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml
tripleo-heat-templates/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml
(undercloud) [stack@undercloud ~]$ rpm -qa | grep -i heat-templ
openstack-tripleo-heat-templates-11.3.2-1.20210408163453.el8ost.noarch
tf-tripleo-heat-templates-16.2.0_dev6-bb48da7.el8.noarch
(undercloud) [stack@undercloud ~]$ grep -r TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS tripleo-heat-templates/
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml:            - name: Ensure the kernel args ( {{ kernel_arg_list }} ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml:                regexp: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml:                line: 'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" {{ kernel_arg_list }} "'
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml:            - name: Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml:                line: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"'
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml:                insertafter: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml:        - name: Ensure the kernel args ( {{ _KERNEL_ARGS_ }} ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml:            regexp: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml:            line: 'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" {{ _KERNEL_ARGS_ }} "'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml:        - name: Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml:            line: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml:            insertafter: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml:                regexp: '^(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)(.*)'
tripleo-heat-templates/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml:                regexp: '(.*){(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)}(.*)'
~~~~~
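The grep above also lists files that already use the GRUB_-prefixed name. A narrower search, sketched below under the assumption of GNU or BSD grep, reports only references to the unprefixed name (find_unprefixed_kernel_args_refs is a hypothetical helper). In a template tree that follows the OSP16 layout, the expected hits are the two migration regexps in deployment/kernel/kernel-boot-params-baremetal-ansible.yaml; anything else is a leftover OSP13 reference.

```shell
#!/usr/bin/env bash
# List lines that reference TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS without the
# GRUB_ prefix: the name must not be preceded by "_" or an alphanumeric.
find_unprefixed_kernel_args_refs() {
  local dir=${1:-tripleo-heat-templates}
  grep -rnE '(^|[^_A-Za-z0-9])TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS' "$dir"
}

# Example:
#   find_unprefixed_kernel_args_refs ~/tripleo-heat-templates
```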

Comment 5 Saravanan KR 2021-07-15 03:59:36 UTC
As per comment #4, the issue is with the merging of templates (OSP13 and OSP16). Feel free to reopen the issue if you still think it is valid after fixing the templates.

Comment 6 Lukasz Pelczyk 2021-08-04 10:06:38 UTC
Hello Team,

The templates were fixed as suggested, but /etc/default/grub is still generated incorrectly. There are duplicated entries: one with the GRUB_ prefix and one without.
The result is:

GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 "
GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"
TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 "
GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"

As a result, both entries are combined on the kernel command line:
[heat-admin@overcloudcy4-compdpdk-0 ~]$ cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.60.2.el8_2.x86_64 root=UUID=09c8d80a-5200-4137-9384-641e2d5709ee ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4
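Duplicates like these can be listed mechanically. The hypothetical helper below, assuming bash with tr, sort and uniq, prints each parameter that appears verbatim more than once on a command line. Keys repeated with different values, such as hugepagesz=1G and hugepagesz=2M, are not reported, because they are legitimate.

```shell
#!/usr/bin/env bash
# Print kernel parameters that occur more than once with identical values.
# LC_ALL=C gives a stable byte-wise sort order.
duplicate_kernel_params() {
  printf '%s\n' "$1" | tr -s ' ' '\n' | LC_ALL=C sort | LC_ALL=C uniq -d
}

# Example:
#   duplicate_kernel_params "$(cat /proc/cmdline)"
```

On the command line shown above, it reports default_hugepagesz=1GB, hugepages=4, hugepagesz=1G, intel_iommu=on and iommu=pt.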



In the logs I found the following tasks run on the DPDK node (the number at the beginning of each line is its line number in the log):
268:TASK [fix grub entries to have name start with GRUB_] **************************
1890:TASK [tripleo-kernel : Ensure the kernel args ( intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***
3472:TASK [fix grub entries to have name start with GRUB_] **************************
5141:TASK [tripleo-kernel : Ensure the kernel args ( intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***
6228:TASK [fix grub entries to have name start with GRUB_] **************************
7896:TASK [tripleo-kernel : Ensure the kernel args ( intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***

This means that the entries are first renamed with the GRUB_ prefix, but the unprefixed entry is later added again by:
[stack@undercloud ~]$ cat /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml
...
# Kernel Args Configuration
- block:
    - name: Ensure the kernel args ( {{ tripleo_kernel_args }} ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
      lineinfile:
        dest: /etc/default/grub
        regexp: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
        insertafter: '^GRUB_CMDLINE_LINUX.*'
        line: 'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" {{ tripleo_kernel_args }} "'
    - name: Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter
      lineinfile:
        dest: /etc/default/grub
        line: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"'
        insertafter: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
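The first task looks for `^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*`. Once the upgrade task has renamed that line to GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS, the regexp no longer matches, so lineinfile falls back to `insertafter` and adds a second, unprefixed definition. The hypothetical helper below mirrors only that regexp decision, ignoring other lineinfile details; it prints "replace" when the regexp matches a line and "insert" otherwise.

```shell
#!/usr/bin/env bash
# Mirror the regexp decision of the "Ensure the kernel args ... is present"
# lineinfile task: replace a matching line, or insert a new one if none matches.
lineinfile_action() {
  local file=$1
  if grep -q '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS' "$file"; then
    echo replace
  else
    echo insert
  fi
}

# Example: after the GRUB_ migration this prints "insert".
#   lineinfile_action /etc/default/grub
```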

The file above is provided by the following package:
[stack@undercloud ~]$ rpm -q --whatprovides /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml
tripleo-ansible-0.5.1-1.20210323173506.el8ost.noarch

IMHO /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml should regex-replace/insert GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS instead of TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
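The proposed behaviour can be sketched in shell to show why it stays idempotent when KernelArgs change. This is not the tripleo-ansible change itself (that is review 753207); set_grub_kernel_args is a hypothetical name, and for simplicity it appends new lines at the end of the file, whereas the role inserts after GRUB_CMDLINE_LINUX. It assumes the values contain no backslashes, since awk -v interprets escape sequences.

```shell
#!/usr/bin/env bash
# Keep exactly one GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS definition and one
# reference to it in GRUB_CMDLINE_LINUX, whatever the previous value was.
set_grub_kernel_args() {
  local file=$1 args=$2 tmp
  local ref='GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"'
  if grep -q '^GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=' "$file"; then
    # Rewrite the existing definition in place instead of adding a new line.
    tmp=$(mktemp) || return 1
    awk -v args="$args" '
      /^GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=/ {
        print "GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=\" " args " \""; next
      }
      { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    # No definition yet: add one.
    printf 'GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" %s "\n' "$args" >> "$file"
  fi
  # Add the reference line only if that exact line is not already present.
  grep -qxF -- "$ref" "$file" || printf '%s\n' "$ref" >> "$file"
}
```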

Comment 7 Saravanan KR 2021-08-04 10:19:16 UTC
Yes, this patch https://review.opendev.org/c/openstack/tripleo-ansible/+/753207, which makes this modification, is not present in the 16.1 version, so the issue is valid. I will check and update.

Comment 8 Jesse Pretorius 2021-08-04 10:37:37 UTC
This appears to be specific to 16.1 only, so removing the 13.x and 16.2 flags.

Comment 9 Saravanan KR 2021-08-04 10:38:25 UTC
This happens only when the kernel args are changed during the upgrade.

Existing (OSP13)   >>>>>>>>>>>>> " iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 "
New (as part of 16.1 upgrade) >> " intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 "
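To see what actually changed between two KernelArgs strings apart from ordering, the hypothetical helpers below (assuming bash for process substitution, plus comm and sort) print parameters dropped from the old string with "-" and added in the new one with "+". An empty result does not make the change safe: per the Doc Text, a pure reordering also triggers the problem.

```shell
#!/usr/bin/env bash
# One parameter per line, sorted and de-duplicated, ready for comm.
kernelargs_tokens() {
  printf '%s\n' "$1" | tr -s ' ' '\n' | grep -v '^$' | LC_ALL=C sort -u
}

# Parameters only in the old string (-) or only in the new string (+).
kernelargs_diff() {
  LC_ALL=C comm -23 <(kernelargs_tokens "$1") <(kernelargs_tokens "$2") | sed 's/^/-/'
  LC_ALL=C comm -13 <(kernelargs_tokens "$1") <(kernelargs_tokens "$2") | sed 's/^/+/'
}
```

For the two strings in this comment, it prints -hugepages=2048 and -hugepagesz=2M, so besides the reordering, the 2M hugepages were dropped.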

Comment 10 Maciej Relewicz 2021-08-05 11:49:22 UTC
I took a look at https://review.opendev.org/c/openstack/tripleo-ansible/+/753207/, which contains the fixes.
I see that the directory is different in our environment (ansible -> tripleo_ansible).


In case the fixed package arrives too late for us, is there an ETA for porting this patch to 16.1?

Packages on our lab:

[stack@undercloud ~]$ rpm -qa | egrep "tripleo|openstack" | sort
ansible-role-openstack-operations-0.0.1-0.20200311080930.274739e.el8ost.noarch
ansible-role-tripleo-modify-image-1.2.1-1.20201114004656.1dffa21.el8ost.noarch
ansible-tripleo-ipa-0.2.1-1.20210407143429.3bb3c53.el8ost.noarch
ansible-tripleo-ipsec-9.3.0-1.20201113193132.0c8693c.el8ost.noarch
contrail_cloud-openstack-containers-rhosp16-20210630035944.el8.x86_64
openstack-heat-agents-1.10.1-1.20201113195131.96b819c.el8ost.noarch
openstack-heat-api-13.1.0-1.20210406084933.48b730a.el8ost.noarch
openstack-heat-common-13.1.0-1.20210406084933.48b730a.el8ost.noarch
openstack-heat-engine-13.1.0-1.20210406084933.48b730a.el8ost.noarch
openstack-heat-monolith-13.1.0-1.20210406084933.48b730a.el8ost.noarch
openstack-ironic-python-agent-builder-2.2.0-1.20201114043345.69e41ff.el8ost.noarch
openstack-selinux-0.8.24-1.20210407093456.26243bf.el8ost.noarch
openstack-tripleo-common-11.4.1-1.20210407183435.el8ost.noarch
openstack-tripleo-common-containers-11.4.1-1.20210407183435.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-1.20210408163453.el8ost.noarch
openstack-tripleo-image-elements-10.6.2-1.20201113215051.7dc0fa1.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.2-1.20201114042506.f061f90.el8ost.noarch
openstack-tripleo-validations-11.3.2-1.20210408103438.el8ost.noarch
puppet-openstack_extras-15.4.1-1.20201113215732.371931c.el8ost.noarch
puppet-openstacklib-15.4.1-1.20201113204514.5fdf43c.el8ost.noarch
puppet-tripleo-11.5.0-1.20210406223722.f716ef5.el8ost.noarch
python3-openstackclient-4.0.1-1.20210310102048.bff556c.el8ost.noarch
python3-openstacksdk-0.36.4-1.20210310100715.76d3b29.el8ost.noarch
python3-tripleoclient-12.3.2-1.20210407123431.ae58329.el8ost.noarch
python3-tripleoclient-heat-installer-12.3.2-1.20210407123431.ae58329.el8ost.noarch
python3-tripleo-common-11.4.1-1.20210407183435.el8ost.noarch
python-openstackclient-lang-4.0.1-1.20210310102048.bff556c.el8ost.noarch
tf-tripleo-heat-templates-16.2.0_dev11-ef5928d.el8.noarch
tripleo-ansible-0.5.1-1.20210323173506.el8ost.noarch

Comment 11 Saravanan KR 2021-08-05 12:25:53 UTC
(In reply to Maciej Relewicz from comment #10)
> I took a look on
> https://review.opendev.org/c/openstack/tripleo-ansible/+/753207/ which
> contains the fixes.
> I see that dir is different on our environment (ansible -> tripleo_ansible). 

In 16.1, when the package is installed, the file should be at /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml

> In case that fixed package will arrive too late for us are there any ETAs
> regarding porting this patch to 16.1?

I don't have an ETA; I need to discuss it with release planning first. I will update once confirmed.

This issue will not happen if you keep the same kernel args during the upgrade. That's an alternative that could be looked into.
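To follow this workaround, the value currently recorded on a node can be compared with the KernelArgs planned for its role. The hypothetical helper below, assuming bash and grep, prints the value from GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS if present, otherwise from the OSP 13 TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS, with the surrounding quotes and spaces removed. It is only a comparison aid; it does not show how the tripleo-kernel role itself decides whether to apply changes.

```shell
#!/usr/bin/env bash
# Print the KernelArgs value currently stored in a grub defaults file.
current_grub_kernel_args() {
  local file=${1:-/etc/default/grub} line
  line=$(grep -m1 '^GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=' "$file") ||
    line=$(grep -m1 '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=' "$file") || return 1
  line=${line#*=}   # drop the variable name and "="
  line=${line#\"}   # drop the opening quote
  line=${line%\"}   # drop the closing quote
  line=${line#"${line%%[! ]*}"}   # trim leading spaces
  line=${line%"${line##*[! ]}"}   # trim trailing spaces
  printf '%s\n' "$line"
}

# Example:
#   [ "$(current_grub_kernel_args)" = "$PLANNED_KERNEL_ARGS" ] && echo unchanged
```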

Comment 14 mgeary 2021-08-10 16:55:58 UTC
Note added to release notes at the end of this section:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/release_notes/index#known_issues

Comment 50 errata-xmlrpc 2021-12-09 20:20:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762

