Bug 1826972 - tuned-adm profile realtime-virtual-host fails in RHOS 13 deployment with containerized services
Keywords:
Status: CLOSED DUPLICATE of bug 1852253
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Marcelo Tosatti
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On:
Blocks: 1672377
 
Reported: 2020-04-22 22:11 UTC by James Parker
Modified: 2020-10-28 03:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-05 23:23:48 UTC
Target Upstream Version:
Embargoed:



Description James Parker 2020-04-22 22:11:45 UTC
Description of problem:
TripleO deployment of a real-time compute role in RHOS 13 fails during the PreNetworkConfig stage when it attempts to apply the realtime-virtual-host profile via 'tuned-adm profile realtime-virtual-host'. From TripleO's perspective, applying the profile reaches its 600-second timeout and fails. On the compute host, the profile hangs while calculating the value for lapic_timer_advance_ns: tuned/profiles/realtime-virtual-host/script.sh calls /usr/libexec/qemu-kvm when executing run_tsc_deadline_latency() and simply hangs, with nothing written to the tmp/out files. At the time of execution the libvirtd service is loaded but inactive. The current workaround is to start the libvirtd service and rerun the deployment.
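The workaround could be scripted roughly as follows. This is only a sketch: the helper names (run, apply_realtime_profile) and the DRY_RUN switch are illustrative and not part of tuned or TripleO, and it does not address why qemu-kvm hangs while libvirtd is inactive.

```shell
#!/bin/sh
# Sketch of the workaround: make sure libvirtd is active before applying
# the tuned profile. Helper names and DRY_RUN are illustrative assumptions.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_realtime_profile() {
    # "systemctl is-active --quiet" exits 0 only when the unit is active.
    # In dry-run mode the start command is always printed.
    if [ "${DRY_RUN:-0}" = "1" ] || ! systemctl is-active --quiet libvirtd; then
        run systemctl start libvirtd || return 1
    fi
    run tuned-adm profile realtime-virtual-host
}
```

On the affected compute host this would be called as root (apply_realtime_profile) before rerunning the deployment. Setting DRY_RUN=1 prints the two commands instead of executing them.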

Version-Release number of selected component (if applicable):
RHEL 7.8

How reproducible:
100%


Steps to Reproduce:
1. Create a ComputeRealTime role that uses the realtime-virtual-host profile
parameter_defaults:
...
  ComputeRealTimeParameters:
...
    TunedProfileName: "realtime-virtual-host"

2. Use TripleO to deploy RHOS 13 with containerized services and compute realtime role.
 

Actual results:
TripleO deployment fails:
[jparker@localhost stack]$ cat openstack_failures_long.log 
overcloud.ComputeRealTime.0.PreNetworkConfig.HostParametersDeployment:
  resource_type: OS::TripleO::Reboot::SoftwareDeployment
  physical_resource_id: 9be772cf-a0f1-449b-9e0e-6aa43bd452de
  status: CREATE_FAILED
  status_reason: |
    Error: resources.HostParametersDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2


Expected results:
Deployment completes successfully.


Additional info:
# Overcloud Task Failures
[jparker@localhost stack]$ cat overcloud_install.log | grep -Eo '\[overcloud\..*_FAILED.*'
[overcloud.ComputeRealTime.0.PreNetworkConfig]: CREATE_FAILED  Error: resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime.0]: CREATE_FAILED  Resource CREATE failed: Error: resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime.0]: CREATE_FAILED  Error: resources[0].resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime]: UPDATE_FAILED  Resource CREATE failed: Error: resources[0].resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime]: CREATE_FAILED  resources.ComputeRealTime: Resource CREATE failed: Error: resources[0].resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2

# Compute realtime messages
[jparker@localhost computerealtime-0]$ grep -E 'stderr|FAILED' var/log/messages 
Apr 15 16:34:35 computerealtime-0 cloud-init: 2020-04-15 20:34:35,523 - main.py[WARNING]: Stdout, stderr changing to (| tee -a /var/log/cloud-init-output.log, | tee -a /var/log/cloud-init-output.log)
Apr 15 16:45:36 computerealtime-0 os-collect-config: [2020-04-15 16:45:36,063] (heat-config) [INFO] {"deploy_stdout": "\nPLAY [Configuration to be applied before rebooting the node] *******************\n\nTASK [Gathering Facts] *********************************************************\nok: [localhost]\n\nTASK [Get the command line args of the node] ***********************************\nchanged: [localhost]\n\nTASK [Get the active tuned profile] ********************************************\nchanged: [localhost]\n\nTASK [Ensure the kernel args ( default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=2,3,4,5,6,7,8,9,10,11,12,13 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***\nchanged: [localhost]\n\nTASK [Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter] ***\nchanged: [localhost]\n\nTASK [Generate grub config file] ***********************************************\nchanged: [localhost]\n\nTASK [Set reboot required fact] ************************************************\nok: [localhost]\n\nTASK [Check Tune-d Configuration file exists] **********************************\nok: [localhost]\n\nTASK [Tune-d Configuration] ****************************************************\nchanged: [localhost]\n\nTASK [Tune-d profile activation] ***********************************************\nfatal: [localhost]: FAILED! 
=> {\"changed\": true, \"cmd\": \"tuned-adm profile realtime-virtual-host\", \"delta\": \"0:10:01.359620\", \"end\": \"2020-04-15 16:45:35.993044\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2020-04-15 16:35:34.633424\", \"stderr\": \"\", \"stderr_lines\": [], \"stdout\": \"Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.\", \"stdout_lines\": [\"Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.\"]}\n\tto retry, use: --limit @/var/lib/heat-config/heat-config-ansible/80160902-ebc7-496f-9cf0-6c2ff66f8f99_playbook.retry\n\nPLAY RECAP *********************************************************************\nlocalhost                  : ok=9    changed=6    unreachable=0    failed=1   \n\n", "deploy_stderr": "", "deploy_status_code": 2}
Apr 15 16:45:36 computerealtime-0 os-collect-config: fatal: [localhost]: FAILED! => {"changed": true, "cmd": "tuned-adm profile realtime-virtual-host", "delta": "0:10:01.359620", "end": "2020-04-15 16:45:35.993044", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 16:35:34.633424", "stderr": "", "stderr_lines": [], "stdout": "Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.", "stdout_lines": ["Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async."]}
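To pull the relevant fields out of such a record, something like the following could be used. It is a sketch that relies on plain grep/sed pattern matching rather than a JSON parser, and it assumes the field order seen in the log above (cmd, then delta, then rc). The function name summarize_failure is hypothetical. Records embedded as escaped JSON (with \" quoting) are deliberately not matched.

```shell
#!/bin/sh
# Print cmd, rc and delta for each unescaped Ansible 'FAILED! => {...}'
# record read on stdin. Assumes the cmd/delta/rc field order shown in the log.
summarize_failure() {
    grep 'FAILED! => {' |
    sed -n 's/.*"cmd": "\([^"]*\)".*"delta": "\([^"]*\)".*"rc": \([0-9]*\).*/cmd=\1 rc=\3 delta=\2/p'
}
```

For example, summarize_failure < var/log/messages would report the tuned-adm command with a delta of just over ten minutes, which matches the 600-second timeout in the message.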

# QEMU Process Remains up
[root@computerealtime-0 heat-admin]# ps -efwww | grep qemu
root     16208 15846 82 16:49 ?        00:00:01 /usr/libexec/qemu-kvm -S -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel /usr/share/qemu-kvm/tscdeadline_latency.flat -cpu host -mon chardev=char0,mode=readline -chardev socket,id=char0,nowait,path=/tmp/tmp.NAK4S0QVsU,server
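A small filter along these lines could be used to spot the lingering latency-test process on other hosts. It is a sketch that assumes "ps -ef" column layout (PID in column 2, start time in column 5). The function name find_stuck_probe is hypothetical.

```shell
#!/bin/sh
# Read "ps -ef" style lines on stdin and print "PID STIME" for every
# qemu-kvm process that is running the tscdeadline_latency.flat test kernel.
# The escaped dot in the pattern also keeps awk's own command line
# (which contains "latency\.flat") from matching.
find_stuck_probe() {
    awk '/qemu-kvm/ && /tscdeadline_latency\.flat/ { print $2, $5 }'
}
```

Typical use would be ps -ef | find_stuck_probe. An entry that is still listed well after tuned-adm was started suggests the latency test never completed.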

Specific Ansible Play:
https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/queens/extraconfig/pre_network/boot_param_tasks.yaml#L47


Packages Used:
[heat-admin@computerealtime-0 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

[heat-admin@computerealtime-0 ~]$ uname -a
Linux computerealtime-0 3.10.0-1127.rt56.1093.el7.x86_64 #1 SMP PREEMPT RT Wed Feb 19 11:36:25 EST 2020 x86_64 x86_64 x86_64 GNU/Linux

[heat-admin@computerealtime-0 ~]$ rpm -qa | grep tuned
tuned-2.11.0-8.el7.noarch
tuned-profiles-nfv-host-2.11.0-8.el7.noarch
tuned-profiles-realtime-2.11.0-8.el7.noarch
tuned-profiles-cpu-partitioning-2.11.0-8.el7.noarch

[heat-admin@computerealtime-0 ~]$ rpm -qa | grep rt-tests
rt-tests-1.5-9.el7.x86_64

