Description of problem:

kernel-boot-params-baremetal-ansible.yaml contains:

    block:
      - name: fix grub entries to have name start with GRUB_
        replace:
          path: '/etc/default/grub'
          regexp: '^(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)(.*)'
          replace: 'GRUB_\1\2'
      - name: fix grub entries in append statement
        replace:
          path: '/etc/default/grub'
          regexp: '(.*){(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)}(.*)'
          replace: '\1{GRUB_\2}\3'

Before the upgrade, these tasks add the GRUB_ prefix to the entries because of the leapp upgrade preparation. However, all THT templates still modify TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS. You can therefore end up with both TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS and GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS, and /etc/default/grub will look like this:

    # cat /etc/default/grub
    GRUB_TIMEOUT=5
    GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
    GRUB_DEFAULT=saved
    GRUB_DISABLE_SUBMENU=true
    GRUB_TERMINAL_OUTPUT="console"
    GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet"
    GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" default_hugepagesz=1GB hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 "
    GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"
    TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=8 "
    GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"
    GRUB_DISABLE_RECOVERY="true"
    GRUB_INITRD_OVERLAY="${GRUB_INITRD_OVERLAY:+$GRUB_INITRD_OVERLAY }\$tuned_initrd"
    GRUB_ENABLE_BLSCFG=true
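To make the effect of the two `replace` tasks quoted above concrete, here is a minimal Python sketch that applies the same two regular expressions to an abbreviated sample of /etc/default/grub. It assumes the Ansible `replace` module compiles its pattern in multiline mode and substitutes over the whole file; the helper name `fix_grub_entries` and the shortened sample are illustrative and not part of any template.

```python
import re

# Abbreviated OSP13-style sample; the values are shortened from the report.
GRUB_BEFORE = (
    'GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto rhgb quiet"\n'
    'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" intel_iommu=on iommu=pt "\n'
    'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }'
    '${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"\n'
)


def fix_grub_entries(text: str) -> str:
    """Apply the two substitutions from the template, in the same order."""
    # Task 1: rename the variable assignment at the start of a line.
    text = re.sub(r'^(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)(.*)',
                  r'GRUB_\1\2', text, flags=re.MULTILINE)
    # Task 2: rename the ${...} reference in the append statement.
    text = re.sub(r'(.*){(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)}(.*)',
                  r'\1{GRUB_\2}\3', text, flags=re.MULTILINE)
    return text


fixed = fix_grub_entries(GRUB_BEFORE)
print(fixed)
```

Running the function a second time on its own output changes nothing, so the rename itself is idempotent. The duplication described in this report can only arise if something writes the unprefixed name again afterwards.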
The change above comes from https://review.opendev.org/c/openstack/tripleo-heat-templates/+/745059, so perhaps the author can help us understand the context here.
This is the scenario that has been handled:

1) OSP 13 is deployed with TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS (at this point only TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS is present)
2) Run the FFU scripts
3) Check whether there is an entry that starts with TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS, and if so rename it to GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
4) Continue the upgrade

Can you explain the scenario that produces these duplicate entries?
1) I ran the stack upgrade. The stack contains DPDK nodes configured with kernel args for hugepages.
2) The upgrade failed because my nodes need a different DPDK driver on RHOSP 16 than on RHOSP 13. The driver had to be switched from uio_pci_generic to vfio-pci.
3) I changed the driver and the kernel args to the proper values, and ran prepare again to generate the new configuration.
4) I ran the upgrade again, and it failed due to duplicated hugepages entries.

This is an upgrade scenario. IMHO, and please correct me if I am wrong, it looks like the playbooks/templates configure only `TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS` by default. So once the upgrade is done, if I try to change the kernel args, can I hit the same situation?

-- Maciej
Had a chat with Maciej to understand the problem. The 16.x templates should not reference TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS, as it has been changed to GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS. Below is a snippet that Maciej shared from his templates, which still reference TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS. These old references and files come from the OSP13 templates and are not present in the OSP16 Contrail templates. The cause could be the copying and merging of templates in the FFU preparation step, which leaves references to the old grub file entries.

Maciej will check and come back once the templates are fixed. This problem should not occur if the templates use the corrected grub entry as per the OSP16 templates. IMO, this bug is not valid. Let's wait for Maciej to confirm, and then we can close it.

On a different note, I pointed out that copying and merging the default THT templates into the user directory leads to this kind of problem. There is no need to copy the templates: all the functionality can be achieved by keeping the user templates separate from the default THT templates. It would be good to take this direction for future Contrail deployments so that the upgrade experience improves.
~~~~~~
(undercloud) [stack@undercloud ~]$ grep -rl TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS tripleo-heat-templates/
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml
tripleo-heat-templates/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml

(undercloud) [stack@undercloud ~]$ rpm -qa | grep -i heat-templ
openstack-tripleo-heat-templates-11.3.2-1.20210408163453.el8ost.noarch
tf-tripleo-heat-templates-16.2.0_dev6-bb48da7.el8.noarch

(undercloud) [stack@undercloud ~]$ grep -r TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS tripleo-heat-templates/
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: - name: Ensure the kernel args ( {{ kernel_arg_list }} ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: regexp: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: line: 'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" {{ kernel_arg_list }} "'
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: - name: Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: line: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"'
tripleo-heat-templates/extraconfig/pre_network/contrail/contrail_ansible_kernel_config.yaml: insertafter: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: - name: Ensure the kernel args ( {{ _KERNEL_ARGS_ }} ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: regexp: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: line: 'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" {{ _KERNEL_ARGS_ }} "'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: - name: Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: line: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"'
tripleo-heat-templates/extraconfig/pre_network/boot_param_tasks.yaml: insertafter: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
tripleo-heat-templates/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml: regexp: '^(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)(.*)'
tripleo-heat-templates/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml: regexp: '(.*){(TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS)}(.*)'
~~~~~~
As per comment #4, the issue is with the merging of templates (OSP13 and OSP16). Feel free to reopen the bug if you still think it is valid after fixing the templates.
Hello Team,

The templates were fixed as suggested, but /etc/default/grub is still generated incorrectly. There are duplicated entries: one has the GRUB_ prefix and the other does not. The result is:

    GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 "
    GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"
    TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 "
    GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"

As a result, both entries are combined on the kernel command line:

    [heat-admin@overcloudcy4-compdpdk-0 ~]$ cat /proc/cmdline
    BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.60.2.el8_2.x86_64 root=UUID=09c8d80a-5200-4137-9384-641e2d5709ee ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4

In the logs I found the following actions taken for the DPDK node (the integer at the beginning is the line number in the log):

    268:TASK [fix grub entries to have name start with GRUB_] **************************
    1890:TASK [tripleo-kernel : Ensure the kernel args ( intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***
    3472:TASK [fix grub entries to have name start with GRUB_] **************************
    5141:TASK [tripleo-kernel : Ensure the kernel args ( intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***
    6228:TASK [fix grub entries to have name start with GRUB_] **************************
    7896:TASK [tripleo-kernel : Ensure the kernel args ( intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***

This means that the entries are first fixed to carry the GRUB_ prefix, but later the unprefixed entry is added once again, due to:

    [stack@undercloud ~]$ cat /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml
    ...
    # Kernel Args Configuration
    - block:
        - name: Ensure the kernel args ( {{ tripleo_kernel_args }} ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS
          lineinfile:
            dest: /etc/default/grub
            regexp: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'
            insertafter: '^GRUB_CMDLINE_LINUX.*'
            line: 'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" {{ tripleo_kernel_args }} "'
        - name: Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter
          lineinfile:
            dest: /etc/default/grub
            line: 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS}"'
            insertafter: '^TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.*'

The file above is provided by the following package:

    [stack@undercloud ~]$ rpm -q --whatprovides /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml
    tripleo-ansible-0.5.1-1.20210323173506.el8ost.noarch

IMHO, /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml should regex-replace/insert GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS instead of TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS.
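The interaction between the GRUB_ rename and the two `lineinfile` tasks in kernelargs.yml can be reproduced with a rough Python model. This is a sketch under stated assumptions, not the Ansible implementation: the `lineinfile` function below is a simplified, hypothetical stand-in that covers only `state: present` with `regexp`, `line`, and `insertafter` (last match wins, exact-line matches count as present), and `tripleo_kernel_role` is an illustrative name for the two tasks quoted above.

```python
import re
from typing import List, Optional


def lineinfile(lines: List[str], line: str, insertafter: str,
               regexp: Optional[str] = None) -> List[str]:
    """Simplified model of lineinfile (state=present); an assumption, not Ansible code."""
    out = list(lines)
    match_idx = -1  # last line matching `regexp`, or identical to `line`
    after_idx = -1  # position just after the last line matching `insertafter`
    for i, cur in enumerate(out):
        if (regexp is not None and re.search(regexp, cur)) or cur == line:
            match_idx = i
        elif re.search(insertafter, cur):
            after_idx = i + 1
    if match_idx != -1:
        out[match_idx] = line        # replace the existing entry in place
    elif after_idx != -1:
        out.insert(after_idx, line)  # no entry yet: insert after the anchor
    else:
        out.append(line)             # no anchor either: append at end of file
    return out


VAR = 'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS'
APPEND = 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:+$GRUB_CMDLINE_LINUX }${%s}"'


def tripleo_kernel_role(lines: List[str], kernel_args: str) -> List[str]:
    """The two tasks from kernelargs.yml, which still use the unprefixed name."""
    lines = lineinfile(lines, f'{VAR}=" {kernel_args} "',
                       insertafter=r'^GRUB_CMDLINE_LINUX.*', regexp=rf'^{VAR}.*')
    return lineinfile(lines, APPEND % VAR, insertafter=rf'^{VAR}.*')


# State of the file after "fix grub entries to have name start with GRUB_".
grub = [
    'GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet"',
    'GRUB_' + VAR + '=" iommu=pt intel_iommu=on default_hugepagesz=1GB '
    'hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 "',
    APPEND % ('GRUB_' + VAR),
    'GRUB_DISABLE_RECOVERY="true"',
]
new_args = 'intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4'
result = tripleo_kernel_role(grub, new_args)
print('\n'.join(result))
```

Because the role's `regexp` is anchored on the unprefixed name, it no longer finds the renamed entry and inserts a fresh one after the last GRUB_CMDLINE_LINUX line. In this model that yields the same four-line ordering as the grub file shown in this comment.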
Yes, the patch https://review.opendev.org/c/openstack/tripleo-ansible/+/753207, which makes this modification, is not present in the 16.1 version, so the issue is valid. I will check and update.
This appears to be specific to 16.1 only, so removing the 13.x and 16.2 flags.
This happens only when the kernel args are changed during the upgrade.

Existing (OSP13):
    " iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=2048 "

New (as part of the 16.1 upgrade):
    " intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4 "
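To see exactly how the two argument strings differ and which parameters end up on the kernel command line twice once both variables are appended to GRUB_CMDLINE_LINUX, a small comparison helps. This is only an illustrative sketch; the helper name `repeated_args` is hypothetical.

```python
from collections import Counter

existing = ("iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G "
            "hugepages=4 hugepagesz=2M hugepages=2048")
new = "intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=4"


def repeated_args(*arg_strings: str) -> list:
    """Return the parameters that occur more than once across all strings."""
    counts = Counter(tok for s in arg_strings for tok in s.split())
    return sorted(tok for tok, n in counts.items() if n > 1)


# Parameters present only in the OSP13 value.
only_existing = sorted(set(existing.split()) - set(new.split()))
# Parameters that appear twice when both values are concatenated.
duplicated = repeated_args(existing, new)

print("only in existing:", only_existing)
print("duplicated on cmdline:", duplicated)
```

The new value drops the 2M hugepage settings and reorders the IOMMU options, while every parameter in the new value is also present in the old one.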
I took a look at https://review.opendev.org/c/openstack/tripleo-ansible/+/753207/, which contains the fixes. I see that the directory is different in our environment (ansible -> tripleo_ansible). In case the fixed package arrives too late for us, is there an ETA for porting this patch to 16.1?

Packages on our lab:

    [stack@undercloud ~]$ rpm -qa | egrep "tripleo|openstack" | sort
    ansible-role-openstack-operations-0.0.1-0.20200311080930.274739e.el8ost.noarch
    ansible-role-tripleo-modify-image-1.2.1-1.20201114004656.1dffa21.el8ost.noarch
    ansible-tripleo-ipa-0.2.1-1.20210407143429.3bb3c53.el8ost.noarch
    ansible-tripleo-ipsec-9.3.0-1.20201113193132.0c8693c.el8ost.noarch
    contrail_cloud-openstack-containers-rhosp16-20210630035944.el8.x86_64
    openstack-heat-agents-1.10.1-1.20201113195131.96b819c.el8ost.noarch
    openstack-heat-api-13.1.0-1.20210406084933.48b730a.el8ost.noarch
    openstack-heat-common-13.1.0-1.20210406084933.48b730a.el8ost.noarch
    openstack-heat-engine-13.1.0-1.20210406084933.48b730a.el8ost.noarch
    openstack-heat-monolith-13.1.0-1.20210406084933.48b730a.el8ost.noarch
    openstack-ironic-python-agent-builder-2.2.0-1.20201114043345.69e41ff.el8ost.noarch
    openstack-selinux-0.8.24-1.20210407093456.26243bf.el8ost.noarch
    openstack-tripleo-common-11.4.1-1.20210407183435.el8ost.noarch
    openstack-tripleo-common-containers-11.4.1-1.20210407183435.el8ost.noarch
    openstack-tripleo-heat-templates-11.3.2-1.20210408163453.el8ost.noarch
    openstack-tripleo-image-elements-10.6.2-1.20201113215051.7dc0fa1.el8ost.noarch
    openstack-tripleo-puppet-elements-11.2.2-1.20201114042506.f061f90.el8ost.noarch
    openstack-tripleo-validations-11.3.2-1.20210408103438.el8ost.noarch
    puppet-openstack_extras-15.4.1-1.20201113215732.371931c.el8ost.noarch
    puppet-openstacklib-15.4.1-1.20201113204514.5fdf43c.el8ost.noarch
    puppet-tripleo-11.5.0-1.20210406223722.f716ef5.el8ost.noarch
    python3-openstackclient-4.0.1-1.20210310102048.bff556c.el8ost.noarch
    python3-openstacksdk-0.36.4-1.20210310100715.76d3b29.el8ost.noarch
    python3-tripleoclient-12.3.2-1.20210407123431.ae58329.el8ost.noarch
    python3-tripleoclient-heat-installer-12.3.2-1.20210407123431.ae58329.el8ost.noarch
    python3-tripleo-common-11.4.1-1.20210407183435.el8ost.noarch
    python-openstackclient-lang-4.0.1-1.20210310102048.bff556c.el8ost.noarch
    tf-tripleo-heat-templates-16.2.0_dev11-ef5928d.el8.noarch
    tripleo-ansible-0.5.1-1.20210323173506.el8ost.noarch
(In reply to Maciej Relewicz from comment #10)
> I took a look on
> https://review.opendev.org/c/openstack/tripleo-ansible/+/753207/ which
> contains the fixes.
> I see that dir is different on our environment (ansible -> tripleo_ansible).

In 16.1, when the package is installed, the file should be at /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml

> In case that fixed package will arrive too late for us are there any ETAs
> regarding porting this patch to 16.1?

I don't have an ETA; I need to discuss it with release planning before I can provide one. I will update once it is confirmed. This issue will not happen if you retain the same kernel args during the upgrade. That is an alternative that could be looked into.
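Until a fixed package is available, a node can be checked for the mixed state discussed in this bug by looking for both spellings of the variable in /etc/default/grub. The following Python sketch is one way to do that; the function name `has_mixed_prefix` is hypothetical, and the check only inspects assignments, not the GRUB_CMDLINE_LINUX append lines.

```python
import re

# Matches an assignment of the variable with or without the GRUB_ prefix.
ASSIGN = re.compile(r'^(GRUB_)?TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=', re.MULTILINE)


def has_mixed_prefix(grub_text: str) -> bool:
    """True if both the prefixed and the unprefixed variable are assigned."""
    prefixes = {m.group(1) or '' for m in ASSIGN.finditer(grub_text)}
    return prefixes == {'GRUB_', ''}


sample_bad = (
    'GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" iommu=pt intel_iommu=on "\n'
    'TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" intel_iommu=on iommu=pt "\n'
)
sample_ok = 'GRUB_TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS=" iommu=pt intel_iommu=on "\n'

print(has_mixed_prefix(sample_bad), has_mixed_prefix(sample_ok))
```

On a real node the text would come from reading /etc/default/grub, for example with `open('/etc/default/grub').read()`.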
Note added to release notes at the end of this section: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/release_notes/index#known_issues
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3762