| Summary: | Newton (HA) Deployments failing w/ httpd service restart timeout | ||
|---|---|---|---|
| Product: | [Community] RDO | Reporter: | Matt Young <matyoung> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Tomas Sedovic <tsedovic> |
| Status: | CLOSED ERRATA | QA Contact: | Arik Chernetsky <achernet> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | trunk | CC: | apevec, chris.brown, jschluet, matyoung, mburns, whayutin |
| Target Milestone: | --- | Keywords: | Automation, AutomationBlocker |
| Target Release: | trunk | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-06-22 15:23:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
> Error: /Stage[main]/Apache::Service/Service[httpd]: Could not restart Service[httpd]: Execution of '/usr/bin/systemctl restart httpd' returned 1: Job for httpd.service failed because a timeout was exceeded. See "systemctl status httpd.service" and "journalctl -xe" for details. Timeout was increased to 5min in https://review.rdoproject.org/r/1822 Is it really taking that much, is this emulated qemu VM? @Alan, If httpd fails to start 3/10 times which is about what we've seen, then yes. Ideally, this setting should be configurable from the overcloud deployment. I need to check on that. It's not 100% clear yet the issue would be resolved w/ a longer timeout. @Alan, this is running in nested KVM, with --libvirt-type kvm being passed to overcloud deploy, so not qemu mode. I'm pretty sure these were resolved a while back with various fixes. I'll close but please re-open if not the case. |
Description of problem: We have seen this issue multiple times in the internal newton tripleo-quickstart based pipelines when deploying an overcloud in virt HA configuration. What we observe is thus: deploy command -------------- openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates --libvirt-type kvm --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /home/stack/network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml --control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server pool.ntp.org deploy output ------------- Started Mistral Workflow. Execution ID: a00f2423-5c6d-4957-94c8-a853457abbe9 Plan updated Deploying templates in the directory /tmp/tripleoclient-PD8sog/tripleo-heat-templates Started Mistral Workflow. Execution ID: f800a2b6-6353-4798-9971-b10d9a42194c 2016-10-18 18:22:22Z [overcloud]: CREATE_IN_PROGRESS Stack CREATE started 2016-10-18 18:22:22Z [overcloud.Networks]: CREATE_IN_PROGRESS state changed 2016-10-18 18:22:23Z [overcloud.HorizonSecret]: CREATE_IN_PROGRESS state changed 2016-10-18 18:22:23Z [overcloud.MysqlRootPassword]: CREATE_IN_PROGRESS state changed <snip>many CREATE_IN_PROGRESS && CREATE_COMPLETE</snip> 2016-10-18 19:04:21Z [overcloud.Heat Stack create failed. AllNodesDeploySteps.ComputeDeployment_Step5.0]: SIGNAL_IN_PROGRESS Signal: deployment 4e020e01-acd5-45f3-ad29-be122f27bcb5 succeeded 2016-10-18 19:04:21Z [overcloud.AllNodesDeploySteps.ComputeDeployment_Step5.0]: CREATE_COMPLETE state changed 2016-10-18 19:04:21Z [overcloud.AllNodesDeploySteps.ComputeDeployment_Step5]: CREATE_COMPLETE Stack CREATE completed successfully 2016-10-18 19:04:22Z [overcloud.AllNodesDeploySteps.ComputeDeployment_Step5]: CREATE_COMPLETE state changed 2016-10-18 19:24:18Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.2]: SIGNAL_IN_PROGRESS Signal: deployment 48ddd689-3651-47cd-b0cd-e7da00cc70b5 succeeded 2016-10-18 19:24:19Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.2]: CREATE_COMPLETE state changed 2016-10-18 19:27:18Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.1]: SIGNAL_IN_PROGRESS Signal: deployment 81a8ffaf-07a2-4b59-9e48-1ccb89335f55 failed (6) 2016-10-18 19:27:18Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.1]: CREATE_FAILED Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6 2016-10-18 19:31:18Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.0]: CREATE_FAILED CREATE aborted 2016-10-18 19:31:18Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step5]: CREATE_FAILED Resource CREATE failed: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6 2016-10-18 19:31:19Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step5]: CREATE_FAILED Error: resources.ControllerDeployment_Step5.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6 2016-10-18 19:31:19Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerDeployment_Step5.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6 2016-10-18 19:31:20Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step5.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6 2016-10-18 19:31:20Z [overcloud]: CREATE_FAILED Resource CREATE failed: Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step5.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6 failed_deployment_list.log -------------------------- (see private comments for links to full logs) overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.1: resource_type: OS::Heat::StructuredDeployment physical_resource_id: 81a8ffaf-07a2-4b59-9e48-1ccb89335f55 status: CREATE_FAILED status_reason: | Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6 deploy_stdout: | ... [mNotice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::end]: Triggered 'refresh' from 6 events[0m [mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::config::end]: Triggered 'refresh' from 30 events[0m [mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::service::begin]: Triggered 'refresh' from 1 events[0m [mNotice: /Stage[main]/Neutron::Server/Service[neutron-server]/ensure: ensure changed 'stopped' to 'running'[0m [mNotice: /Stage[main]/Neutron::Agents::Ml2::Ovs/Service[neutron-ovs-agent-service]: Triggered 'refresh' from 1 events[0m [mNotice: /Stage[main]/Neutron::Agents::Dhcp/Service[neutron-dhcp-service]: Triggered 'refresh' from 1 events[0m [mNotice: /Stage[main]/Neutron::Agents::L3/Service[neutron-l3]: Triggered 'refresh' from 1 events[0m [mNotice: /Stage[main]/Neutron::Agents::Metadata/Service[neutron-metadata]: Triggered 'refresh' from 1 events[0m [mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::service::end]: Triggered 'refresh' from 5 events[0m [mNotice: Finished catalog run in 1081.00 seconds[0m (truncated, view all with --long) deploy_stderr: | ... [1;31mWarning: Scope(Class[Gnocchi::Api]): gnocchi:api::keystone_identity_uri is deprecated, use gnocchi::keystone::authtoken::auth_url instead[0m [1;31mWarning: Scope(Class[Gnocchi::Api]): gnocchi::api::keystone_auth_uri is deprecated, use gnocchi::keystone::authtoken::auth_uri instead[0m [1;31mWarning: Not collecting exported resources without storeconfigs[0m [1;31mWarning: Not collecting exported resources without storeconfigs[0m [1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.[0m [1;31mWarning: Not collecting exported resources without storeconfigs[0m [1;31mWarning: Not collecting exported resources without storeconfigs[0m [1;31mWarning: Not collecting exported resources without storeconfigs[0m [1;31mError: /Stage[main]/Apache::Service/Service[httpd]: Failed to call refresh: Could not restart Service[httpd]: Execution of '/usr/bin/systemctl restart httpd' returned 1: Job for httpd.service failed because a timeout was exceeded. See "systemctl status httpd.service" and "journalctl -xe" for details.[0m [1;31mError: /Stage[main]/Apache::Service/Service[httpd]: Could not restart Service[httpd]: Execution of '/usr/bin/systemctl restart httpd' returned 1: Job for httpd.service failed because a timeout was exceeded. See "systemctl status httpd.service" and "journalctl -xe" for details.[0m (truncated, view all with --long) overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.0: resource_type: OS::Heat::StructuredDeployment physical_resource_id: d55a81aa-3454-4eda-b164-f5b323416c08 status: CREATE_FAILED status_reason: | CREATE aborted deploy_stdout: | None deploy_stderr: | None --- Version-Release number of selected component (if applicable): We started observing this at: - https://trunk.rdoproject.org/centos7-newton/b0/bc/b0bca42c26080066f5f649c5cb76746e242f0c23_0dcd1644 Currently (tonight, 10/18) RDO just promoted: - https://trunk.rdoproject.org/centos7-newton/b2/6c/b26caaaf6e975588be93dbd873f4d995159d9905_70164fd8 which is currently in test/CI (awaiting results) How reproducible: Reproduced in CI, and we have successfully reproduced this in isolation as well (see comments below for internal RH links to artifacts Steps to Reproduce: We are using tripleo-quickstart to deploy HA --> virt, and CI jobs are reproducing. If we need to set up a separate reproducer box we can address. We can also pull a CI node in the failed state out of rotation to debug, if the artifacts are not enough to go on (for now). Actual results: overcloud deploy fails Expected results: overcloud deploys in HA configuration Additional info: https://review.rdoproject.org/etherpad/p/rdo-internal-issues #38