Bug 1826972

Summary: tuned-adm profile realtime-virtual-host fails in RHOS 13 deployment with containerized services
Product: Red Hat Enterprise Linux 7
Reporter: James Parker <jparker>
Component: kernel-rt
Assignee: Marcelo Tosatti <mtosatti>
kernel-rt sub component: Other
QA Contact: Pei Zhang <pezhang>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: high
CC: bhu, chayang, jinzhao, juzhang, lcapitulino, mtosatti, nilal, pezhang, qzhao, ribarry, williams
Version: 7.8
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-10-05 23:23:48 UTC
Type: Bug
Bug Blocks: 1672377    

Description James Parker 2020-04-22 22:11:45 UTC
Description of problem:
TripleO deployment of a real-time compute role in RHOS 13 fails during the PreNetworkConfig stage when it attempts to apply the realtime-virtual-host profile via 'tuned-adm profile realtime-virtual-host'. From TripleO's perspective, applying the profile reaches the 600-second timeout and fails. On the compute host, the profile hangs while calculating the value for lapic_timer_advance_ns: tuned/profiles/realtime-virtual-host/script.sh calls /usr/libexec/qemu-kvm from run_tsc_deadline_latency(), and that process hangs without writing anything to the tmp/out files. At the time of execution the libvirtd service is loaded but inactive. The current workaround is to start the libvirtd service and rerun the deployment.
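
For anyone reproducing this by hand on the compute host, the workaround can be sketched roughly as below. This is only an illustration of "start libvirtd, then apply the profile", not what TripleO itself runs. The function name apply_rt_host_profile and the SYSTEMCTL/TUNED_ADM override variables are hypothetical and exist only so the logic can be exercised without touching a real system.

```sh
#!/bin/sh
# Hypothetical override hooks (not part of tuned or TripleO); default to the real tools.
SYSTEMCTL=${SYSTEMCTL:-systemctl}
TUNED_ADM=${TUNED_ADM:-tuned-adm}

# Make sure libvirtd is running before applying the profile, since the
# hang described above was observed while libvirtd was loaded but inactive.
apply_rt_host_profile() {
    # "is-active --quiet" exits non-zero when the unit is not active.
    if ! "$SYSTEMCTL" is-active --quiet libvirtd; then
        echo "libvirtd is inactive; starting it" >&2
        "$SYSTEMCTL" start libvirtd || return 1
    fi
    "$TUNED_ADM" profile realtime-virtual-host
}
```

Calling apply_rt_host_profile as root would start libvirtd only if needed and then apply the profile. Whether this avoids the hang on other setups is untested here.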

Version-Release number of selected component (if applicable):
RHEL 7.8

How reproducible:
100%


Steps to Reproduce:
1. Create a ComputeRealTime role that uses the realtime-virtual-host profile
parameter_defaults:
...
  ComputeRealTimeParameters:
...
    TunedProfileName: "realtime-virtual-host"

2. Use TripleO to deploy RHOS 13 with containerized services and the ComputeRealTime role.
 

Actual results:
TripleO deployment fails:
[jparker@localhost stack]$ cat openstack_failures_long.log 
overcloud.ComputeRealTime.0.PreNetworkConfig.HostParametersDeployment:
  resource_type: OS::TripleO::Reboot::SoftwareDeployment
  physical_resource_id: 9be772cf-a0f1-449b-9e0e-6aa43bd452de
  status: CREATE_FAILED
  status_reason: |
    Error: resources.HostParametersDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2


Expected results:
Deployment completes successfully.


Additional info:
# Overcloud Task Failures
[jparker@localhost stack]$ cat overcloud_install.log | grep -Eo '\[overcloud\..*_FAILED.*'
[overcloud.ComputeRealTime.0.PreNetworkConfig]: CREATE_FAILED  Error: resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime.0]: CREATE_FAILED  Resource CREATE failed: Error: resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime.0]: CREATE_FAILED  Error: resources[0].resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime]: UPDATE_FAILED  Resource CREATE failed: Error: resources[0].resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime]: CREATE_FAILED  resources.ComputeRealTime: Resource CREATE failed: Error: resources[0].resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2

# Compute realtime messages
[jparker@localhost computerealtime-0]$ grep -E 'stderr|FAILED' var/log/messages 
Apr 15 16:34:35 computerealtime-0 cloud-init: 2020-04-15 20:34:35,523 - main.py[WARNING]: Stdout, stderr changing to (| tee -a /var/log/cloud-init-output.log, | tee -a /var/log/cloud-init-output.log)
Apr 15 16:45:36 computerealtime-0 os-collect-config: [2020-04-15 16:45:36,063] (heat-config) [INFO] {"deploy_stdout": "\nPLAY [Configuration to be applied before rebooting the node] *******************\n\nTASK [Gathering Facts] *********************************************************\nok: [localhost]\n\nTASK [Get the command line args of the node] ***********************************\nchanged: [localhost]\n\nTASK [Get the active tuned profile] ********************************************\nchanged: [localhost]\n\nTASK [Ensure the kernel args ( default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=2,3,4,5,6,7,8,9,10,11,12,13 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***\nchanged: [localhost]\n\nTASK [Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter] ***\nchanged: [localhost]\n\nTASK [Generate grub config file] ***********************************************\nchanged: [localhost]\n\nTASK [Set reboot required fact] ************************************************\nok: [localhost]\n\nTASK [Check Tune-d Configuration file exists] **********************************\nok: [localhost]\n\nTASK [Tune-d Configuration] ****************************************************\nchanged: [localhost]\n\nTASK [Tune-d profile activation] ***********************************************\nfatal: [localhost]: FAILED! 
=> {\"changed\": true, \"cmd\": \"tuned-adm profile realtime-virtual-host\", \"delta\": \"0:10:01.359620\", \"end\": \"2020-04-15 16:45:35.993044\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2020-04-15 16:35:34.633424\", \"stderr\": \"\", \"stderr_lines\": [], \"stdout\": \"Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.\", \"stdout_lines\": [\"Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.\"]}\n\tto retry, use: --limit @/var/lib/heat-config/heat-config-ansible/80160902-ebc7-496f-9cf0-6c2ff66f8f99_playbook.retry\n\nPLAY RECAP *********************************************************************\nlocalhost                  : ok=9    changed=6    unreachable=0    failed=1   \n\n", "deploy_stderr": "", "deploy_status_code": 2}
Apr 15 16:45:36 computerealtime-0 os-collect-config: fatal: [localhost]: FAILED! => {"changed": true, "cmd": "tuned-adm profile realtime-virtual-host", "delta": "0:10:01.359620", "end": "2020-04-15 16:45:35.993044", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 16:35:34.633424", "stderr": "", "stderr_lines": [], "stdout": "Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.", "stdout_lines": ["Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async."]}
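
The failing command and its run time are buried in the JSON above. A small helper along these lines could pull them out of var/log/messages. It is a sketch that assumes the unescaped "fatal: ... FAILED! =>" line format shown above; the function name failed_task_summary is hypothetical.

```sh
#!/bin/sh
# Print "<cmd> (ran for <delta>)" for each unescaped Ansible failure line in a log file.
# Lines where the JSON is embedded with escaped quotes (\"cmd\") are intentionally skipped.
failed_task_summary() {
    sed -n '/FAILED! =>/s/.*"cmd": "\([^"]*\)".*"delta": "\([^"]*\)".*/\1 (ran for \2)/p' "$1"
}
```

For example, failed_task_summary var/log/messages would report the tuned-adm invocation together with its delta of roughly ten minutes, which matches the 600-second timeout.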

# QEMU Process Remains up
[root@computerealtime-0 heat-admin]# ps -efwww | grep qemu
root     16208 15846 82 16:49 ?        00:00:01 /usr/libexec/qemu-kvm -S -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel /usr/share/qemu-kvm/tscdeadline_latency.flat -cpu host -mon chardev=char0,mode=readline -chardev socket,id=char0,nowait,path=/tmp/tmp.NAK4S0QVsU,server
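
A quick way to check whether the latency test process is still around after the timeout is sketched below. The function name leftover_tsc_latency_pids and the PGREP override variable are hypothetical; the pattern is taken from the command line shown above.

```sh
#!/bin/sh
# Hypothetical override hook so the function can be exercised without pgrep.
PGREP=${PGREP:-pgrep}

# Print the PIDs of qemu-kvm processes still running the TSC deadline latency test.
# Exits non-zero when no such process is found.
leftover_tsc_latency_pids() {
    # -f matches the pattern against the full command line, not just the process name.
    "$PGREP" -f 'qemu-kvm.*tscdeadline_latency\.flat'
}
```

Running leftover_tsc_latency_pids on the compute host prints nothing and returns non-zero when the test process has exited.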

Specific Ansible Play:
https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/queens/extraconfig/pre_network/boot_param_tasks.yaml#L47


Packages Used:
[heat-admin@computerealtime-0 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

[heat-admin@computerealtime-0 ~]$ uname -a
Linux computerealtime-0 3.10.0-1127.rt56.1093.el7.x86_64 #1 SMP PREEMPT RT Wed Feb 19 11:36:25 EST 2020 x86_64 x86_64 x86_64 GNU/Linux

[heat-admin@computerealtime-0 ~]$ rpm -qa | grep tuned
tuned-2.11.0-8.el7.noarch
tuned-profiles-nfv-host-2.11.0-8.el7.noarch
tuned-profiles-realtime-2.11.0-8.el7.noarch
tuned-profiles-cpu-partitioning-2.11.0-8.el7.noarch

[heat-admin@computerealtime-0 ~]$ rpm -qa | grep rt-tests
rt-tests-1.5-9.el7.x86_64