Description of problem:
TripleO deployment of a real-time compute role in RHOS 13 fails during the PreNetworkConfig stage when it attempts to apply the realtime-virtual-host profile via 'tuned-adm profile realtime-virtual-host'. From TripleO's perspective, applying the profile reaches the 600 second timeout and fails. On the compute host, the profile hangs while calculating the value for lapic_timer_advance_ns: tuned/profiles/realtime-virtual-host/script.sh calls /usr/libexec/qemu-kvm from run_tsc_deadline_latency(), and that process hangs with nothing being written to the tmp/out files. At the time of execution the libvirtd service is loaded but inactive. The current workaround is to start the libvirtd service and rerun the deployment.
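The workaround described above could be scripted roughly as follows. This is only a sketch, not part of tuned or TripleO: the function name ensure_libvirtd_active is made up for illustration, and it assumes it is run as root on the affected compute node.

```shell
# Sketch of the workaround: make sure libvirtd is running before the
# profile is applied. ensure_libvirtd_active is a hypothetical helper name.
ensure_libvirtd_active() {
    # 'systemctl is-active --quiet' exits 0 only when the unit is active.
    if systemctl is-active --quiet libvirtd; then
        echo "libvirtd already active"
    else
        echo "starting libvirtd"
        systemctl start libvirtd
    fi
}

# Usage (as root on the compute node), then rerun the deployment:
#   ensure_libvirtd_active && tuned-adm profile realtime-virtual-host
```

This only works around the hang; it does not explain why the qemu-kvm test guest blocks when libvirtd is inactive.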
Version-Release number of selected component (if applicable):
RHEL 7.8
How reproducible:
100%
Steps to Reproduce:
1. Create a ComputeRealTime role that uses the realtime-virtual-host profile
   parameter_defaults:
     ...
     ComputeRealTimeParameters:
       ...
       TunedProfileName: "realtime-virtual-host"
2. Use TripleO to deploy RHOS 13 with containerized services and the ComputeRealTime role.
Actual results:
TripleO deployment fails:
[jparker@localhost stack]$ cat openstack_failures_long.log
overcloud.ComputeRealTime.0.PreNetworkConfig.HostParametersDeployment:
resource_type: OS::TripleO::Reboot::SoftwareDeployment
physical_resource_id: 9be772cf-a0f1-449b-9e0e-6aa43bd452de
status: CREATE_FAILED
status_reason: |
Error: resources.HostParametersDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
Expected results:
Deployment completes successfully.
Additional info:
# Overcloud Task Failures
[jparker@localhost stack]$ cat overcloud_install.log | grep -Eo '\[overcloud\..*_FAILED.*'
[overcloud.ComputeRealTime.0.PreNetworkConfig]: CREATE_FAILED Error: resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime.0]: CREATE_FAILED Resource CREATE failed: Error: resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime.0]: CREATE_FAILED Error: resources[0].resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime]: UPDATE_FAILED Resource CREATE failed: Error: resources[0].resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[overcloud.ComputeRealTime]: CREATE_FAILED resources.ComputeRealTime: Resource CREATE failed: Error: resources[0].resources.PreNetworkConfig.resources.HostParametersDeployment: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
# Compute realtime messages
[jparker@localhost computerealtime-0]$ grep -E 'stderr|FAILED' var/log/messages
Apr 15 16:34:35 computerealtime-0 cloud-init: 2020-04-15 20:34:35,523 - main.py[WARNING]: Stdout, stderr changing to (| tee -a /var/log/cloud-init-output.log, | tee -a /var/log/cloud-init-output.log)
Apr 15 16:45:36 computerealtime-0 os-collect-config: [2020-04-15 16:45:36,063] (heat-config) [INFO] {"deploy_stdout": "\nPLAY [Configuration to be applied before rebooting the node] *******************\n\nTASK [Gathering Facts] *********************************************************\nok: [localhost]\n\nTASK [Get the command line args of the node] ***********************************\nchanged: [localhost]\n\nTASK [Get the active tuned profile] ********************************************\nchanged: [localhost]\n\nTASK [Ensure the kernel args ( default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=2,3,4,5,6,7,8,9,10,11,12,13 ) is present as TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS] ***\nchanged: [localhost]\n\nTASK [Add TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS to the GRUB_CMDLINE_LINUX parameter] ***\nchanged: [localhost]\n\nTASK [Generate grub config file] ***********************************************\nchanged: [localhost]\n\nTASK [Set reboot required fact] ************************************************\nok: [localhost]\n\nTASK [Check Tune-d Configuration file exists] **********************************\nok: [localhost]\n\nTASK [Tune-d Configuration] ****************************************************\nchanged: [localhost]\n\nTASK [Tune-d profile activation] ***********************************************\nfatal: [localhost]: FAILED! 
=> {\"changed\": true, \"cmd\": \"tuned-adm profile realtime-virtual-host\", \"delta\": \"0:10:01.359620\", \"end\": \"2020-04-15 16:45:35.993044\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2020-04-15 16:35:34.633424\", \"stderr\": \"\", \"stderr_lines\": [], \"stdout\": \"Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.\", \"stdout_lines\": [\"Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.\"]}\n\tto retry, use: --limit @/var/lib/heat-config/heat-config-ansible/80160902-ebc7-496f-9cf0-6c2ff66f8f99_playbook.retry\n\nPLAY RECAP *********************************************************************\nlocalhost : ok=9 changed=6 unreachable=0 failed=1 \n\n", "deploy_stderr": "", "deploy_status_code": 2}
Apr 15 16:45:36 computerealtime-0 os-collect-config: fatal: [localhost]: FAILED! => {"changed": true, "cmd": "tuned-adm profile realtime-virtual-host", "delta": "0:10:01.359620", "end": "2020-04-15 16:45:35.993044", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 16:35:34.633424", "stderr": "", "stderr_lines": [], "stdout": "Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.", "stdout_lines": ["Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async."]}
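The failing command, its elapsed time and its return code are easy to miss in the long lines above. A small filter along these lines pulls them out; this is a sketch that assumes the log format shown here, and summarize_failed_task is a hypothetical name. It only matches the unescaped JSON keys, so the escaped copy embedded in deploy_stdout is skipped.

```shell
# Print the cmd, delta and rc fields from Ansible "FAILED! =>" log lines.
# Reads log text on stdin. summarize_failed_task is a hypothetical helper name.
summarize_failed_task() {
    grep 'FAILED! =>' | grep -oE '"(cmd|delta|rc)": ("[^"]*"|[0-9]+)'
}

# Usage, from the directory holding the collected node logs:
#   summarize_failed_task < var/log/messages
```

On the messages above, the delta of 0:10:01 is the 600 second tuned-adm timeout plus about a second of overhead.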
# QEMU Process Remains up
[root@computerealtime-0 heat-admin]# ps -efwww | grep qemu
root 16208 15846 82 16:49 ? 00:00:01 /usr/libexec/qemu-kvm -S -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel /usr/share/qemu-kvm/tscdeadline_latency.flat -cpu host -mon chardev=char0,mode=readline -chardev socket,id=char0,nowait,path=/tmp/tmp.NAK4S0QVsU,server
Specific Ansible Play:
https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/queens/extraconfig/pre_network/boot_param_tasks.yaml#L47
Packages Used:
[heat-admin@computerealtime-0 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)
[heat-admin@computerealtime-0 ~]$ uname -a
Linux computerealtime-0 3.10.0-1127.rt56.1093.el7.x86_64 #1 SMP PREEMPT RT Wed Feb 19 11:36:25 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
[heat-admin@computerealtime-0 ~]$ rpm -qa | grep tuned
tuned-2.11.0-8.el7.noarch
tuned-profiles-nfv-host-2.11.0-8.el7.noarch
tuned-profiles-realtime-2.11.0-8.el7.noarch
tuned-profiles-cpu-partitioning-2.11.0-8.el7.noarch
[heat-admin@computerealtime-0 ~]$ rpm -qa | grep rt-tests
rt-tests-1.5-9.el7.x86_64