Bug 1980829
Summary: | problem with kernelargs after upgrade | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Maciej Relewicz <mrelewicz> |
Component: | tripleo-ansible | Assignee: | Saravanan KR <skramaja> |
Status: | CLOSED ERRATA | QA Contact: | Jason Grosso <jgrosso> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 16.1 (Train) | CC: | gregraka, hakhande, jpalanis, jpretori, lpelczyk, mburns, mgeary, omcgonag, skramaja, spower, supadhya, tmurray |
Target Milestone: | z7 | Keywords: | Reopened, Triaged, ZStream |
Target Release: | 16.1 (Train on RHEL 8.2) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | tripleo-ansible-0.5.1-1.20210713143308.el8ost | Doc Type: | Bug Fix |
Doc Text: |
Before this update, changes to `KernelArgs` parameters caused errors in the Red Hat OpenStack Platform (RHOSP) fast forward upgrade (FFU) process for version 13 to version 16:
* Duplicate entries appeared in `/etc/default/grub`.
* Duplicate entries appeared in the kernel command line.
* Nodes rebooted during the RHOSP upgrade.
+
These errors occurred when the value of the `KernelArgs` parameter changed (including a change only in the order of the values in the string), or when a `KernelArgs` parameter was added.
+
With this update, TripleO has added upgrade tasks in `kernel-boot-params-baremetal-ansible.yaml` to migrate from `TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS` to `GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS`.
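+
The rename can be pictured with a small filter. This is a minimal POSIX shell sketch, not the actual task in `kernel-boot-params-baremetal-ansible.yaml`. The name `migrate_kernel_args` is hypothetical, and the sketch assumes the two-line layout that the OSP13 templates write to `/etc/default/grub`: a `TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=` assignment, followed by a `GRUB_CMDLINE_LINUX` line that appends `${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}`.
+
```shell
# Hypothetical helper (not part of TripleO): read grub defaults on stdin and
# write them to stdout with the TripleO kernel-args variable renamed to the
# GRUB_-prefixed name that LEAPP can see.
migrate_kernel_args() {
  # 1) Rename the assignment at the start of a line.
  # 2) Rename the ${...} reference appended to GRUB_CMDLINE_LINUX.
  # Names that already carry the GRUB_ prefix do not match either pattern,
  # so running the filter twice gives the same output.
  sed -E \
    -e 's/^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=/GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=/' \
    -e 's/\$\{TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS\}/${GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}/g'
}
```
+
For example, `migrate_kernel_args < /etc/default/grub > /tmp/grub.migrated` lets you review the result before replacing the file. A one-time rename does not help if a later task writes the unprefixed entry again, which produces the duplicate `/etc/default/grub` entries described above.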
+
This change was made to accommodate the Red Hat Enterprise Linux (RHEL) in-place upgrade tool, LEAPP, which upgrades RHEL from version 7 to version 8 during the RHOSP version 13 to version 16 FFU process. LEAPP recognizes GRUB parameters in `/etc/default/grub` only when their names start with `GRUB_`.
+
Despite this update, you must manually inspect each `KernelArgs` value to ensure that it matches the value for all hosts in the corresponding role.
+
The `KernelArgs` value may come from the `PreNetworkConfig` implementation from either the default tripleo-heat-templates or third-party heat templates.
+
If you find any mismatches, change the value of the `KernelArgs` parameter in the corresponding role to match the value of `KernelArgs` on the hosts. Perform these checks before running the `openstack overcloud upgrade prepare` command.
+
You can use the following script to check `KernelArgs` values:
+
----
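tripleo-ansible-inventory --static-yaml-inventory inventory.yaml
# Set to the role's KernelArgs value from your heat templates
KernelArgs='< KernelArgs_ FROM_THT >'
# Replace ComputeSriov with the role to check; hosts without an exact match fail with rc=1
ansible -i inventory.yaml ComputeSriov -m shell -b -a "grep -F -- '${KernelArgs}' /proc/cmdline"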
----
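+
The script above reports only whether each host contains the exact string. To also tell apart a host whose arguments are all present but in a different order (one of the triggers listed above), a POSIX shell sketch such as the following can help. `compare_kernel_args` is a hypothetical helper, not part of TripleO; it takes the `KernelArgs` value from your templates and the contents of a host's `/proc/cmdline`, and prints `match`, `order-differs`, or `mismatch`.
+
```shell
# Hypothetical helper (not part of TripleO): classify a host's kernel
# command line against a role's KernelArgs value.
#   match         - the KernelArgs string appears as-is in the command line
#   order-differs - every argument is present, but not as that exact string
#   mismatch      - at least one argument is missing
# The body runs in a subshell, so "exit" only leaves the function and
# "set -f" does not leak into the caller.
compare_kernel_args() (
  set -f  # no pathname expansion during the unquoted word splitting below
  # Turn tabs into spaces, squeeze repeats, and trim both ends, so the
  # padding used in /etc/default/grub values does not matter.
  tht=$(printf '%s\n' "$1" | tr '\t' ' ' | tr -s ' ' | sed 's/^ //; s/ $//')
  cmdline=$(printf '%s\n' "$2" | tr '\t' ' ' | tr -s ' ' | sed 's/^ //; s/ $//')
  case " $cmdline " in
    *" $tht "*) echo match; exit 0 ;;
  esac
  for arg in $tht; do
    case " $cmdline " in
      *" $arg "*) ;;
      *) echo mismatch; exit 0 ;;
    esac
  done
  echo order-differs
)
```
+
For example, on a host: `compare_kernel_args "$KernelArgs" "$(cat /proc/cmdline)"`. Any result other than `match` means that the role's `KernelArgs` value should be adjusted before you run `openstack overcloud upgrade prepare`.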
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-12-09 20:20:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Maciej Relewicz
2021-07-09 15:52:34 UTC
The change above comes from https://review.opendev.org/c/openstack/tripleo-heat-templates/+/745059, so perhaps the author can help explain the context here. This is the scenario that has been handled:
1) OSP 13 is deployed with TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS (at this point, only TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS is present).
2) Run the FFU scripts.
3) Check whether there is an entry that starts with TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS; if so, rename it to GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.
4) Continue the upgrade.
Can you explain the scenario that produces these duplicate entries?

1) I ran the stack upgrade. The stack contains DPDK nodes whose KernelArgs configure hugepages.
2) The upgrade failed because my nodes need a different DPDK driver in RHOSP 16 than in RHOSP 13: the driver had to be switched from uio_pci_generic to vif-io.
3) I changed the driver and the KernelArgs to the proper values, and ran prepare again to generate the new configuration.
4) I ran the upgrade, which failed due to duplicated hugepages entries.

That is the upgrade scenario. IMHO (please correct me if I am wrong), it looks like the playbooks/templates configure only `TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS` by default. So once the upgrade is done, if I change KernelArgs, can I hit the same situation?
-- Maciej

Had a chat with Maciej to understand the problem. The 16.x templates should not reference TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS, because it has been renamed to GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS. Below is a snippet that Maciej shared from his templates, which still reference TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS. These old references and files come from the OSP13 templates; they are not present in the OSP16 Contrail templates. The cause could be the copying and merging of templates in the FFU preparation step, which still refers to the old grub file entries. Maciej will check and come back once the templates are fixed. This problem should not occur if the templates use the corrected grub entry, as in the OSP16 templates.
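A plain `grep -r TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS` also matches the new `GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS` name, because the old name is a substring of the new one. As a minimal POSIX shell sketch (the helper name `find_stale_kernel_args_refs` is hypothetical), stale OSP13-style references in a copied template directory can be listed by requiring that the old name is not preceded by an identifier character:

```shell
# Hypothetical helper (not part of TripleO): list files under a template
# directory that still reference the unprefixed OSP13 variable name.
# The old name must not be preceded by a letter, digit, or underscore, so a
# file that only uses GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS is not listed.
# Like grep itself, it exits with status 1 when no file matches.
find_stale_kernel_args_refs() {
  grep -rlE -e '(^|[^A-Za-z0-9_])TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS' -- "$1"
}
```

For example, `find_stale_kernel_args_refs tripleo-heat-templates/`. Files that mention the old name only inside migration regexps, such as `deployment/kernel/kernel-boot-params-baremetal-ansible.yaml`, are still listed and need to be reviewed by hand.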
IMO, this bug is not valid; let's wait for Maciej to confirm, and then we can close it.

On a different note, I pointed out that copying the default THT templates into the user directory and merging them leads to this kind of problem. There is no need to copy the templates: all the functionality can be achieved by keeping the user templates separate from the default THT templates. It would be good to take this direction for future Contrail deployments so that the upgrade experience improves.

~~~~~~
(undercloud) [stack@undercloud ~]$ grep -rl TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS tripleo-heat-templates/
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml
tripleo-heat-templates/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml
(undercloud) [stack@undercloud ~]$ rpm -qa | grep -i heat-templ
openstack-tripleo-heat-templates-11.3.2-1.20210408163453.el8ost.noarch
tf-tripleo-heat-templates-16.2.0_dev6-bb48da7.el8.noarch
(undercloud) [stack@undercloud ~]$ grep -r TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS tripleo-heat-templates/
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: - name: Ensure the kernel args ( {{ kernel_arg_list }} ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: regexp: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: line: 'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" {{ kernel_arg_list }} "'
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: - name: Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: line: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"'
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: insertafter: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: - name: Ensure the kernel args ( {{ _KERNEL_ARGS_ }} ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: regexp: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: line: 'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" {{ _KERNEL_ARGS_ }} "'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: - name: Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: line: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: insertafter: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml: regexp: '^(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)(.*)'
tripleo-heat-templates/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml: regexp: '(.*){(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)}(.*)'
~~~~~~

As per comment #4, the issue is with the merging of the OSP13 and OSP16 templates. Feel free to reopen the issue if you still think it is valid after fixing the templates.

Hello Team,
The templates were fixed as suggested, but /etc/default/grub is still generated incorrectly. There are duplicated entries: one with the GRUB_ prefix and one without it.
The result is:
~~~~~~
GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 "
GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"
TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 "
GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"
~~~~~~
As a result, both entries are combined:
~~~~~~
[heat-admin@overcloudcy4-compdpdk-0 ~]$ cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.60.2.el8_2.x86_64 root=UUID=09c8d80a-5200-4137-9384-641e2d5709ee ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4
~~~~~~
In the logs I found the following tasks run for the DPDK node (the number at the beginning of each line is its line number in the log):
~~~~~~
268:TASK [fix grub entries to have name start with GRUB_] **************************
1890:TASK [tripleo-kernel : Ensure the kernel args ( intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***
3472:TASK [fix grub entries to have name start with GRUB_] **************************
5141:TASK [tripleo-kernel : Ensure the kernel args ( intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***
6228:TASK [fix grub entries to have name start with GRUB_] **************************
7896:TASK [tripleo-kernel : Ensure the kernel args ( intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***
~~~~~~
This means that the entries are first fixed with the GRUB_ prefix, but later the wrong entry is added again, due to:
~~~~~~
[stack@undercloud ~]$ cat /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml
...
# Kernel Args Configuration
- block:
  - name: Ensure the kernel args ( {{ tripleo_kernel_args }} ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
    lineinfile:
      dest: /etc/default/grub
      regexp: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
      insertafter: '^GRUB_CMDLINE_LINUX.*'
      line: 'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" {{ tripleo_kernel_args }} "'
  - name: Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter
    lineinfile:
      dest: /etc/default/grub
      line: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"'
      insertafter: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
~~~~~~
This file is provided by the following package:
~~~~~~
[stack@undercloud ~]$ rpm -q --whatprovides /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml
tripleo-ansible-0.5.1-1.20210323173506.el8ost.noarch
~~~~~~
IMHO, /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml should regex-replace/insert GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS instead of TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.

Yes, the patch https://review.opendev.org/c/openstack/tripleo-ansible/+/753207, which makes this modification, is not present in the 16.1 version, so the issue is valid. I will check and update.

This appears to be specific to 16.1 only, so I am removing the 13.x and 16.2 flags. It happens only when the kernel args are changed during the upgrade:
~~~~~~
Existing (OSP13):              " iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 "
New (as part of 16.1 upgrade): " intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 "
~~~~~~

I took a look at https://review.opendev.org/c/openstack/tripleo-ansible/+/753207/, which contains the fixes. I see that the directory is different in our environment (ansible -> tripleo_ansible). In case the fixed package arrives too late for us, is there an ETA for porting this patch to 16.1?
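The two symptoms reported above, both variable names assigned in `/etc/default/grub` and arguments repeated in `/proc/cmdline`, can be checked on a node with a short POSIX shell sketch. The function names are hypothetical and are not part of TripleO; both read standard input, so they can also be tried on a copy of the files.

```shell
# Hypothetical checks (not part of TripleO or tripleo-ansible).

# Succeeds (exit status 0) only when the grub defaults read on stdin assign
# the kernel args under both the old name and the GRUB_-prefixed name.
has_duplicate_kernel_args() {
  awk '
    /^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=/      { unprefixed = 1 }
    /^GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=/ { prefixed = 1 }
    END { exit !(unprefixed && prefixed) }
  '
}

# Prints, one per line and in byte order, every kernel argument that occurs
# more than once in the command line read on stdin. Only exact repeats are
# reported, so hugepagesz=1G next to hugepagesz=2M is not flagged.
duplicate_cmdline_args() {
  tr ' \t' '\n\n' | sed '/^$/d' | LC_ALL=C sort | uniq -d
}
```

For example, `has_duplicate_kernel_args < /etc/default/grub && echo 'both names present'` and `duplicate_cmdline_args < /proc/cmdline`. For the combined command line shown above, the second command lists `default_hugepagesz=1GB`, `hugepages=4`, `hugepagesz=1G`, `intel_iommu=on`, and `iommu=pt`.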
Packages in our lab:
~~~~~~
[stack@undercloud ~]$ rpm -qa | egrep "tripleo|openstack" | sort
ansible-role-openstack-operations-0.0.1-0.20200311080930.274739e.el8ost.noarch
ansible-role-tripleo-modify-image-1.2.1-1.20201114004656.1dffa21.el8ost.noarch
ansible-tripleo-ipa-0.2.1-1.20210407143429.3bb3c53.el8ost.noarch
ansible-tripleo-ipsec-9.3.0-1.20201113193132.0c8693c.el8ost.noarch
contrail_cloud-openstack-containers-rhosp16-20210630035944.el8.x86_64
openstack-heat-agents-1.10.1-1.20201113195131.96b819c.el8ost.noarch
openstack-heat-api-13.1.0-1.20210406084933.48b730a.el8ost.noarch
openstack-heat-common-13.1.0-1.20210406084933.48b730a.el8ost.noarch
openstack-heat-engine-13.1.0-1.20210406084933.48b730a.el8ost.noarch
openstack-heat-monolith-13.1.0-1.20210406084933.48b730a.el8ost.noarch
openstack-ironic-python-agent-builder-2.2.0-1.20201114043345.69e41ff.el8ost.noarch
openstack-selinux-0.8.24-1.20210407093456.26243bf.el8ost.noarch
openstack-tripleo-common-11.4.1-1.20210407183435.el8ost.noarch
openstack-tripleo-common-containers-11.4.1-1.20210407183435.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-1.20210408163453.el8ost.noarch
openstack-tripleo-image-elements-10.6.2-1.20201113215051.7dc0fa1.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.2-1.20201114042506.f061f90.el8ost.noarch
openstack-tripleo-validations-11.3.2-1.20210408103438.el8ost.noarch
puppet-openstack_extras-15.4.1-1.20201113215732.371931c.el8ost.noarch
puppet-openstacklib-15.4.1-1.20201113204514.5fdf43c.el8ost.noarch
puppet-tripleo-11.5.0-1.20210406223722.f716ef5.el8ost.noarch
python3-openstackclient-4.0.1-1.20210310102048.bff556c.el8ost.noarch
python3-openstacksdk-0.36.4-1.20210310100715.76d3b29.el8ost.noarch
python3-tripleoclient-12.3.2-1.20210407123431.ae58329.el8ost.noarch
python3-tripleoclient-heat-installer-12.3.2-1.20210407123431.ae58329.el8ost.noarch
python3-tripleo-common-11.4.1-1.20210407183435.el8ost.noarch
python-openstackclient-lang-4.0.1-1.20210310102048.bff556c.el8ost.noarch
tf-tripleo-heat-templates-16.2.0_dev11-ef5928d.el8.noarch
tripleo-ansible-0.5.1-1.20210323173506.el8ost.noarch
~~~~~~

(In reply to Maciej Relewicz from comment #10)
> I took a look on
> https://review.opendev.org/c/openstack/tripleo-ansible/+/753207/ which
> contains the fixes.
> I see that dir is different on our environment (ansible -> tripleo_ansible).

In 16.1, when the package is installed, the file should be at /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml

> In case that fixed package will arrive too late for us are there any ETAs
> regarding porting this patch to 16.1?

I don't have an ETA; I need to discuss it with release planning first. I will update once it is confirmed. This issue does not occur if you retain the same kernel args during the upgrade. That is an alternative that could be looked into.

Note added to the release notes at the end of this section: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/release_notes/index#known_issues

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762 |
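To check whether an undercloud already has the build listed under Fixed In Version (`tripleo-ansible-0.5.1-1.20210713143308.el8ost`), the package name printed by `rpm -q tripleo-ansible` can be compared with it. This is a minimal POSIX shell sketch with a hypothetical helper name; it assumes version 0.5.1 and the `1.<14-digit build timestamp>.el8ost` release format seen in this report, and compares only the timestamps. For anything else it returns 2, and rpm's own version comparison (for example `rpmdev-vercmp` from rpmdevtools) is the safer tool.

```shell
# Hypothetical helper (not part of TripleO). Build timestamp taken from the
# "Fixed In Version" field of this bug.
FIXED_BUILD=20210713143308

# Returns 0 if the given tripleo-ansible 0.5.1 package name is at least the
# fixed build, 1 if it is older, and 2 if this sketch cannot decide.
has_kernelargs_fix() {
  case "$1" in
    tripleo-ansible-0.5.1-*) ;;
    *) return 2 ;;
  esac
  # Extract the 14-digit timestamp from a release such as 1.20210323173506.el8ost
  build=$(printf '%s\n' "$1" |
    sed -n 's/^tripleo-ansible-0\.5\.1-[0-9]*\.\([0-9]\{14\}\)\..*$/\1/p')
  [ -n "$build" ] || return 2
  [ "$build" -ge "$FIXED_BUILD" ]
}
```

For example, `has_kernelargs_fix "$(rpm -q tripleo-ansible)" && echo fixed`. The lab package listed above, `tripleo-ansible-0.5.1-1.20210323173506.el8ost.noarch`, predates the fixed build, so the function returns 1 for it.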