Created attachment 1262845 [details]
logs

Description of problem:
OSP10 -> OSP11 upgrade fails when Ironic services are enabled on the overcloud and placed on a custom role. The failure occurs during major-upgrade-composable-steps:

stdout: overcloud.AllNodesDeploySteps.AllNodesPostUpgradeSteps.ServiceApiDeployment_Step4.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 74406536-d626-4937-a98c-c8f28e8ce15e
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
  deploy_stdout: |
    ...
    Notice: /Stage[main]/Apache/Concat[/etc/httpd/conf/ports.conf]/File[/etc/httpd/conf/ports.conf]/content: content changed '{md5}99cf4d0a57605985d8bb74bd78f75b93' to '{md5}c1854096e2bbbe8099d98583abc9b843'
    Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::config::end]: Triggered 'refresh' from 15 events
    Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::begin]: Triggered 'refresh' from 1 events
    Notice: /Stage[main]/Apache::Service/Service[httpd]: Triggered 'refresh' from 4 events
    Notice: /Stage[main]/Tripleo::Firewall::Post/Firewall[998 log all]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv4]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Dependency Service[nova-compute] has failures: true
    Notice: Applied catalog in 112.79 seconds
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Mar 13 19:02:05 overcloud-serviceapi-1.localdomain systemd[1]: Unit openstack-nova-compute.service entered failed state.
    Mar 13 19:02:05 overcloud-serviceapi-1.localdomain systemd[1]: openstack-nova-compute.service failed.
    Warning: /Stage[main]/Nova::Deps/Anchor[nova::service::end]: Skipping because of failed dependencies
    Warning: /Stage[main]/Nova/Exec[networking-refresh]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Firewall[998 log all]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv4]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Skipping because of failed dependencies
    Warning: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Skipping because of failed dependencies
    Warning: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Skipping because of failed dependencies
    (truncated, view all with --long)
overcloud.AllNodesDeploySteps.AllNodesPostUpgradeSteps.ServiceApiDeployment_Step4.2:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 3d95a4d4-bc6c-421c-99a1-75f94e6c9fa0
  status: CREATE_FAILED
  status_reason: |
    Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
  deploy_stdout: |
    ...
    Notice: /Stage[main]/Apache/Concat[/etc/httpd/conf/ports.conf]/File[/etc/httpd/conf/ports.conf]/content: content changed '{md5}bce868ac866ac85400659c81cb490b12' to '{md5}14a8ce9b875c82effc34ecffc067926b'
    Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::config::end]: Triggered 'refresh' from 15 events
    Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::begin]: Triggered 'refresh' from 1 events
    Notice: /Stage[main]/Apache::Service/Service[httpd]: Triggered 'refresh' from 4 events
    Notice: /Stage[main]/Tripleo::Firewall::Post/Firewall[998 log all]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv4]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Dependency Service[nova-compute] has failures: true
    Notice: Applied catalog in 109.05 seconds
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Mar 13 19:02:06 overcloud-serviceapi-2.localdomain systemd[1]: Unit openstack-nova-compute.service entered failed state.
    Mar 13 19:02:06 overcloud-serviceapi-2.localdomain systemd[1]: openstack-nova-compute.service failed.
    Warning: /Stage[main]/Nova::Deps/Anchor[nova::service::end]: Skipping because of failed dependencies
    Warning: /Stage[main]/Nova/Exec[networking-refresh]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Firewall[998 log all]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv4]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Skipping because of failed dependencies
    Warning: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Skipping because of failed dependencies
    Warning: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Skipping because of failed dependencies
    (truncated, view all with --long)

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.0.0-0.20170303152752.0rc1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates/
openstack overcloud deploy --templates $THT \
-r ~/openstack_deployment/roles/roles_data.yaml \
-e $THT/environments/network-isolation.yaml \
-e $THT/environments/network-management.yaml \
-e $THT/environments/storage-environment.yaml \
-e $THT/environments/services/ironic.yaml \
-e $THT/environments/tls-endpoints-public-ip.yaml \
-e ~/openstack_deployment/environments/nodes.yaml \
-e ~/openstack_deployment/environments/network-environment.yaml \
-e ~/openstack_deployment/environments/disk-layout.yaml \
-e ~/openstack_deployment/environments/public_vip.yaml \
-e ~/openstack_deployment/environments/enable-tls.yaml \
-e ~/openstack_deployment/environments/inject-trust-anchor.yaml \
-e ~/openstack_deployment/environments/neutron-settings.yaml \
--log-file overcloud_deployment.log &> overcloud_install.log

roles_data.yaml:
http://paste.openstack.org/show/602634/

2. Run major-upgrade-composable-steps:

openstack overcloud deploy --templates $THT \
-r ~/openstack_deployment/roles/roles_data.yaml \
-e $THT/environments/network-isolation.yaml \
-e $THT/environments/network-management.yaml \
-e $THT/environments/storage-environment.yaml \
-e $THT/environments/services/ironic.yaml \
-e $THT/environments/tls-endpoints-public-ip.yaml \
-e ~/openstack_deployment/environments/nodes.yaml \
-e ~/openstack_deployment/environments/network-environment.yaml \
-e ~/openstack_deployment/environments/disk-layout.yaml \
-e ~/openstack_deployment/environments/public_vip.yaml \
-e ~/openstack_deployment/environments/enable-tls.yaml \
-e ~/openstack_deployment/environments/inject-trust-anchor.yaml \
-e ~/openstack_deployment/environments/neutron-settings.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps.yaml \
-e ~/repo.yaml \
--log-file overcloud_deployment.log &> overcloud_install.log

Actual results:
This step fails.

Expected results:
Upgrade proceeds.

Additional info:
The deployment contains a custom role called ServiceApi, which holds all the systemd-managed services. The failure is caused by nova-compute failing to start on these nodes:

Mar 13 19:02:05 overcloud-serviceapi-1.localdomain systemd[1]: Unit openstack-nova-compute.service entered failed state.
Mar 13 19:02:05 overcloud-serviceapi-1.localdomain systemd[1]: openstack-nova-compute.service failed.
Mar 13 19:02:06 overcloud-serviceapi-2.localdomain systemd[1]: Unit openstack-nova-compute.service entered failed state.
Mar 13 19:02:06 overcloud-serviceapi-2.localdomain systemd[1]: openstack-nova-compute.service failed.

However, when I logged in to the nodes after the failure, the nova-compute service was running, although its log did show failures.

Attaching the nova-compute log and the os-collect-config log from one of the nodes.

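For anyone triaging a similar failure, the affected host/unit pairs can be pulled out of saved deployment output with standard tools. This is only a rough sketch: failed_units is a hypothetical helper name, not part of any OpenStack tooling, and it assumes the journal-style "<host> systemd[N]: <unit>.service failed." line format seen in the deploy_stderr above.

```shell
# Hypothetical helper (not part of TripleO): print unique "host unit" pairs
# for systemd units reported as failed. Reads journal-style text on stdin,
# assuming lines of the form "<date> <host> systemd[N]: <unit>.service failed."
failed_units() {
    # -o emits each match on its own line, so flattened input also works.
    grep -Eo '[^ ]+ systemd\[[0-9]+\]: [^ ]+\.service failed\.' |
        awk '{ print $1, $3 }' |   # $1 = host, $3 = unit name
        sort -u
}
```

It can be fed any text containing those lines, for example the saved output of the failed deployment step piped in as `failed_units < saved_output.txt` (the file name here is a placeholder).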
*** This bug has been marked as a duplicate of bug 1432879 ***