Description of problem:
I tried deploying OSP 12 with 7.5 and it's unable to complete. It is failing with an error on docker:

overcloud.AllNodesDeploySteps.ComputeDeployment_Step4.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 3d87c07c-ae48-44b6-b2f6-b47bcf6350dc
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    "stderr: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 29\\\"\".",
    "stdout: 3297a0da1f515b589d875a21c70cbc148e5b6da8f4ae0a61576d2e963ced6776",
    "stdout: e0f349bd96d1a30cd0135870d8dd1c7be7c724f5e1e2e850c33165e5fe5d045b"
    ]
    }
    to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/00cce50b-25c4-4ac6-85a3-7709aece00db_playbook.retry

    PLAY RECAP *********************************************************************
    localhost : ok=7 changed=2 unreachable=0 failed=1
    (truncated, view all with --long)
  deploy_stderr: |

Those are my docker packages:
python-docker-pycreds-1.10.6-3.el7.noarch
python-docker-py-1.10.6-3.el7.noarch
docker-client-1.13.1-58.git87f2fab.el7.x86_64
docker-common-1.13.1-58.git87f2fab.el7.x86_64
python-heat-agent-docker-cmd-1.4.0-1.el7ost.noarch
docker-rhel-push-plugin-1.13.1-58.git87f2fab.el7.x86_64
docker-1.13.1-58.git87f2fab.el7.x86_64
Created attachment 1420875 [details] sosreport on the controller that fails
Can you please provide the full output of 'openstack stack failures list overcloud -f yaml'? I'm not seeing the error in the sosreport, and I'm not sure the node that failed is the node provided in the sosreport.
Actual error: "Error running ['docker', 'run', '--name', 'ceilometer_agent_compute', '--label', 'config_id=tripleo_step4', '--label', 'container_name=ceilometer_agent_compute', '--label', 'managed_by=paunch', '--label', 'config_data={\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=3b9ff94d55e51b37915cd2e78aea42de\"], \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/ceilometer_agent_compute.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/ceilometer/:/var/lib/kolla/config_files/src:ro\", \"/var/run/libvirt:/var/run/libvirt:ro\", \"/var/log/containers/ceilometer:/var/log/ceilometer\"], \"image\": \"registry.access.redhat.com/rhosp12/openstack-ceilometer-compute:latest\", \"net\": \"host\", \"restart\": \"always\", \"privileged\": false}', '--detach=true', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=TRIPLEO_CONFIG_HASH=3b9ff94d55e51b37915cd2e78aea42de', '--net=host', '--privileged=false', '--restart=always', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', 
'--volume=/var/lib/kolla/config_files/ceilometer_agent_compute.json:/var/lib/kolla/config_files/config.json:ro', '--volume=/var/lib/config-data/puppet-generated/ceilometer/:/var/lib/kolla/config_files/src:ro', '--volume=/var/run/libvirt:/var/run/libvirt:ro', '--volume=/var/log/containers/ceilometer:/var/log/ceilometer', 'registry.access.redhat.com/rhosp12/openstack-ceilometer-compute:latest']. [125]
Created attachment 1420917 [details] sosreport on the compute-0 that fails
So this is likely the same issue being reported in Bug 1562035. This seems to be related to the sriov configuration being attempted.
Apr 12 13:42:35 overcloud-compute-0 dockerd-current[17824]: nsenter: failed to unshare namespaces: Invalid argument
Apr 12 13:42:35 overcloud-compute-0 oci-systemd-hook[28408]: systemdhook <debug>: 8ae658bd92fc: Skipping as container command is kolla_start, not init or systemd
Apr 12 13:42:35 overcloud-compute-0 oci-umount[28410]: umounthook <debug>: 8ae658bd92fc: only runs in prestart stage, ignoring
Apr 12 13:42:35 overcloud-compute-0 dockerd-current[17824]: container_linux.go:247: starting container process caused "process_linux.go:245: running exec setns process for init caused \"exit status 29\""
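To check whether another node is hitting the same failure, the journal can be searched for the nsenter message shown in the excerpt above. This is only a minimal sketch: `count_unshare_failures` is a hypothetical helper name, and reading the daemon log through the `docker` systemd unit is an assumption that is not confirmed in this report.

```shell
#!/bin/bash
# Hypothetical helper (not part of the report): count log lines on stdin
# that contain the nsenter failure message quoted in the journal excerpt.
count_unshare_failures() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the helper's exit status clean.
    grep -c 'nsenter: failed to unshare namespaces' || true
}

# Possible usage on an affected node (assumes the unit is named "docker"):
#   journalctl -u docker --no-pager | count_unshare_failures
```

A non-zero count would suggest the node is failing in the same way as overcloud-compute-0; it does not by itself identify the cause.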
Could you please try the suggestion in this comment and reply with your results? https://bugzilla.redhat.com/show_bug.cgi?id=1562035#c5
I tried, but it still fails in the same way
I can confirm it's an interaction with tuned. If I deploy without my tuned profile, the deploy succeeds. The code I used for tuned is:

    yum install -y tuned-profiles-cpu-partitioning
    tuned_conf_path="/etc/tuned/cpu-partitioning-variables.conf"
    if [ -n "$TUNED_CORES" ]; then
      grep -q "^isolated_cores" $tuned_conf_path
      if [ "$?" -eq 0 ]; then
        sed -i 's/^isolated_cores=.*/isolated_cores=$TUNED_CORES/' $tuned_conf_path
      else
        echo "isolated_cores=$TUNED_CORES" >> $tuned_conf_path
      fi
      tuned-adm profile cpu-partitioning
    fi

As soon as I disabled that change, the deploy works.
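One detail in the script above may be worth checking: the sed expression is in single quotes, so the shell does not expand $TUNED_CORES, and the literal text `$TUNED_CORES` would be written whenever an isolated_cores line already exists in the file. Whether this plays any part in the failure is not established in this report. A minimal sketch of the same update with expansion enabled, where `set_isolated_cores` is a hypothetical helper name introduced only for illustration:

```shell
#!/bin/bash
# Hypothetical helper (not from the report): set isolated_cores in a tuned
# variables file, replacing an existing entry or appending a new one.
set_isolated_cores() {
    local conf_path="$1" cores="$2"
    if grep -q '^isolated_cores=' "$conf_path"; then
        # Double quotes let the shell expand ${cores} before sed runs.
        sed -i "s/^isolated_cores=.*/isolated_cores=${cores}/" "$conf_path"
    else
        echo "isolated_cores=${cores}" >> "$conf_path"
    fi
}

# Possible usage, mirroring the original script:
#   set_isolated_cores /etc/tuned/cpu-partitioning-variables.conf "$TUNED_CORES"
#   tuned-adm profile cpu-partitioning
```

The sketch assumes the core list contains no `/` characters, which holds for lists such as `2-31` or `2,34,18,50`.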
As with the related and referenced BZ, assigning to DF:NFV dradaz and marking as untriaged. Once the issue is diagnosed, it can be reassigned back to us if the fix is a change to the container config.
I faced the same issue with the nova-compute container during Step4 on a DPDK node. In my case, if you deploy again without changing anything, the deployment gets past that step (I had two DPDK compute nodes, so I had to run overcloud deploy 3 times to complete it). I'm using RHEL 7.4 but the latest container image tags (12.0-20180405.1). As additional info, I don't use the mistral workflow to calculate derived params (I did it manually). Also, I've seen above that in the KernelArgs you are including isolcpus instead of using the IsolCpusList param (I use it in my deployment).

Parameters used:
TunedProfileName: "cpu-partitioning"
OvsPmdCoreList: 2,34,18,50
NovaVcpuPinSet: 3,35,4,36,5,37,6,38,7,39,8,40,9,41,10,42,11,43,12,44,13,45,14,46,15,47,19,51,20,52,21,53,22,54,23,55,24,56,25,57,26,58,27,59,28,60,29,61,30,62,31,63
IsolCpusList: 2,34,3,35,4,36,5,37,6,38,7,39,8,40,9,41,10,42,11,43,12,44,13,45,14,46,15,47,18,50,19,51,20,52,21,53,22,54,23,55,24,56,25,57,26,58,27,59,28,60,29,61,30,62,31,63
OvsDpdkCoreList: 0,32,16,48
OvsDpdkMemoryChannels: 4
NovaReservedHostMemory: 28672
OvsDpdkSocketMemory: 6144,1024
KernelArgs: default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on
NeutronDatapathType: "netdev"

Error log:
"outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": [
"Error running ['docker', 'run', '--name', 'nova_compute', '--label', 'config_id=tripleo_step4', '--label', 'container_name=nova_compute', '--label', 'managed_by=paunch', '--label', 'config_data={\"ipc\": \"host\", \"image\": \"192.168.128.11:8787/rhosp12/openstack-nova-compute:12.0-20180405.1\", \"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=424fde46f2697dc3bb7f50f6a8ad3689\"], \"user\": \"nova\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/nova_compute.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/nova_libvirt/:/var/lib/kolla/config_files/src:ro\", \"/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro\", \"/dev:/dev\", \"/lib/modules:/lib/modules:ro\", \"/etc/iscsi:/etc/iscsi\", \"/run:/run\", \"/var/lib/nova:/var/lib/nova:shared\", \"/var/lib/libvirt:/var/lib/libvirt\", \"/var/log/containers/nova:/var/log/nova\", \"/sys/class/net:/sys/class/net\", \"/sys/bus/pci:/sys/bus/pci\"], \"net\": \"host\", \"privileged\": true, \"restart\": \"always\"}', '--detach=true', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=TRIPLEO_CONFIG_HASH=424fde46f2697dc3bb7f50f6a8ad3689', '--net=host', '--ipc=host', '--privileged=true', '--restart=always', '--user=nova', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/kolla/config_files/nova_compute.json:/var/lib/kolla/config_files/config.json:ro', '--volume=/var/lib/config-data/puppet-generated/nova_libvirt/:/var/lib/kolla/config_files/src:ro', '--volume=/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro', '--volume=/dev:/dev', '--volume=/lib/modules:/lib/modules:ro', '--volume=/etc/iscsi:/etc/iscsi', '--volume=/run:/run', '--volume=/var/lib/nova:/var/lib/nova:shared', '--volume=/var/lib/libvirt:/var/lib/libvirt', '--volume=/var/log/containers/nova:/var/log/nova', '--volume=/sys/class/net:/sys/class/net', '--volume=/sys/bus/pci:/sys/bus/pci', '192.168.128.11:8787/rhosp12/openstack-nova-compute:12.0-20180405.1']. [125]",
"",
"stdout: f2b2e112081e5c4a0e25c6b12ef3a75c4091126e2e29e470ab930e4537891c74",
"stderr: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 29\\\"\".",
"stdout: 0bdb69b5c147966400d9a12757da3e5148eedc5e7eac6238ba639b29d889fc2e",
"stderr: ",
"stdout: 44535769e4197b6634834584fd3abb12bff7e103c27ec52f0ab4a870d5a9961f"
]
}
This blocks fast forward upgrade for telcos as well. These telcos have had tuned profiles on their deployments from the start, and when I execute a fast forward upgrade on my computes, I cannot upgrade them.
Today I was also able to hit it with tuned disabled. It seems to happen randomly across containers, but I mostly found failures on crond, iscsid, nova-libvirt and neutron.
*** This bug has been marked as a duplicate of bug 1562035 ***