Description of problem:
I deployed RHOSP10 with RHEL 7.6 using our public CDN. As the overcloud images have not been updated since 10.z9, the overcloud deploy is fine. Then I ran "yum update" on all of my (overcloud) nodes and rebooted them (into RHEL 7.6), and now I cannot start any new VM, for an obvious reason:

openstack server show my_failed_vm
| fault | {u'message': u'Exceeded maximum number of retries. Exceeded max |
|       | scheduling attempts 3 for instance e1f3a644-c9de-4526-aee4-4f0a9743f721. |
|       | Last exception: invalid argument: could not find capabilities for |
|       | domaintype=kvm ', u'code': 500, u'details': u' File "/usr/lib/python2.7 |
|       | /site-packages/nova/conductor/manager.py", line 493, in |
|       | build_instances\n filter_properties, instances[0].uuid)\n File |
|       | "/usr/lib/python2.7/site-packages/nova/scheduler/utils.py", line 184, in |
|       | populate_retry\n raise exception.MaxRetriesExceeded(reason=msg)\n', |
|       | u'created': u'2018-10-31T20:44:00Z'} |

On a compute node, kvm_intel is not loaded!!

[root@overcloud-compute-0 etc]# modprobe kvm_intel
modprobe: ERROR: could not insert 'kvm_intel': Unknown symbol in module, or unknown parameter (see dmesg)
[root@overcloud-compute-0 etc]# dmesg
[ 8923.582176] kvm_intel: Unknown parameter `ple_gap'
[root@overcloud-compute-0 etc]# grep -r ple_gap /etc/
/etc/modprobe.d/kvm.rt.tuned.conf:options kvm_intel ple_gap=0

I removed that line from kvm.rt.tuned.conf, rebooted (as I'm lazy) and now I can start VMs. This is not an outstanding bug, but unless some update script that I skipped is supposed to remove that line, we have a regression. Can someone check urgently if we have a regression?

Version-Release number of selected component (if applicable):
RHEL 7.6
Openstack 10 - z9

How reproducible:
Always

Steps to Reproduce:
1. Deploy RHOSP 10 z9 with the base undercloud image at 7.6.
2. Deploy the overcloud with the default overcloud image (which is still 7.5).
3. On the overcloud nodes, do a yum update to 7.6; the issue is then seen.

Actual results:

Expected results:

Additional info:
We have also tested it via 'overcloud stack update', i.e.:
1. Deployed the undercloud (RHEL 7.6), then deployed the overcloud (OSP version 10).
2. Pointed the repositories to 7.6.
3. Ran: openstack overcloud deploy --update-plan-only ...
4. Ran: openstack overcloud update stack -i overcloud

Still the same issue: the kvm_intel module is not loaded.
This might be better directed to the RHEL team (either kernel or virt or RT team).

> /etc/modprobe.d/kvm.rt.tuned.conf

The RT makes me think this is RealTime, is that correct? Did you have the RT kernel running on 7.6 but update to the non-RT kernel?
(In reply to Mike Burns from comment #2)
> This might be better directed to the RHEL team (either kernel or virt or RT
> team).
>
> > /etc/modprobe.d/kvm.rt.tuned.conf
>
> The RT makes me think this is RealTime, is that correct? Did you have the
> RT kernel running on 7.6 but update to the non-RT kernel?

These overcloud nodes are not RT-KVM images. The naming is certainly confusing; however, these are normal compute nodes deployed with 7.5 images. After deployment and spawning VMs, the overcloud nodes were updated (via update stack). After the update, a reboot was done, after which we see that the kvm_intel module is not loaded.

I guess this bug should first be investigated by the OSP update folks?
This was verified. We did an upgrade and reboot of the overcloud nodes, and the VMs did come up. The issue with loading the kvm_intel module is not seen with an update from 7.5 to 7.6.
Hi all,

My customer just hit this issue while upgrading their RHOSP 10 update 5 environment to RHOSP 10 update 10 (RHEL 7.4 -> RHEL 7.6). To perform the update they followed the standard overcloud update procedure.

The workaround of removing the option:

options kvm_intel ple_gap=0

from /etc/modprobe.d/kvm.rt.tuned.conf resolved their issue. They've since put a post-install script step in their deployment templates to work around this issue.

As far as I can tell this issue is not yet resolved. For reference, this is the version they ended up on post-upgrade:

tuned-2.10.0-6.el7.noarch                            Wed Jan 23 01:45:49 2019
tuned-profiles-cpu-partitioning-2.10.0-6.el7.noarch  Wed Jan 23 02:02:27 2019

Let me know if you need any additional data to troubleshoot further.
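For anyone who wants to script the same workaround, here is a minimal sketch of what such a post-install step might look like. It is not the customer's actual script. The function names (drop_module_option, fix_conf) are made up for illustration; only the file path, module name and parameter name come from this report. It assumes the usual "options <module> <param>=<value> ..." layout of a modprobe.d file and removes just the rejected parameter, leaving any other options on the line alone.

```python
from pathlib import Path

# Path, module and parameter are the ones quoted in this bug report.
CONF_PATH = Path("/etc/modprobe.d/kvm.rt.tuned.conf")


def drop_module_option(text, module, param):
    """Return `text` with `param` (or `param=value`) removed from
    every `options <module> ...` line. Hypothetical helper."""
    kept = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "options" and fields[1] == module:
            rest = [f for f in fields[2:] if f.split("=", 1)[0] != param]
            if not rest:
                # No parameters left for this module: drop the whole line.
                continue
            line = " ".join(fields[:2] + rest)
        kept.append(line)
    result = "\n".join(kept)
    if kept and text.endswith("\n"):
        result += "\n"
    return result


def fix_conf(path=CONF_PATH, module="kvm_intel", param="ple_gap"):
    """Rewrite the file only if something changed; return True on change."""
    original = path.read_text()
    updated = drop_module_option(original, module, param)
    if updated != original:
        path.write_text(updated)
        return True
    return False
```

Calling fix_conf() as root on a compute node would edit the file in place; kvm_intel still has to be loaded afterwards (modprobe kvm_intel, or a reboot as in the original report). This only treats the symptom, and whatever generated the file may well write the option back.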
Since the problem described in this bug report should be resolved in this build tuned-2.10.0-6.el7_6.3 which shipped 13-Mar-19, it has been closed with a resolution of CURRENTRELEASE. For information, please reference https://bugzilla.redhat.com/show_bug.cgi?id=1653767