Bug 1431988 - OSP10 -> OSP11 upgrade fails when Ironic services are enabled on the overcloud and placed on a custom role
Keywords:
Status: CLOSED DUPLICATE of bug 1432879
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ga
Target Release: 11.0 (Ocata)
Assignee: Lucas Alvares Gomes
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-03-14 08:46 UTC by Marius Cornea
Modified: 2017-03-27 13:24 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-27 13:24:05 UTC
Target Upstream Version:
Embargoed:


Attachments
logs (367.06 KB, application/x-gzip)
2017-03-14 08:46 UTC, Marius Cornea


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1675732 | 0 | None | None | None | 2017-03-24 11:56:19 UTC
OpenStack gerrit | 449587 | 0 | None | ABANDONED | [Ironic] Do not hard fail when Ironic is not available | 2020-03-07 19:59:29 UTC

Description Marius Cornea 2017-03-14 08:46:51 UTC
Created attachment 1262845
logs

Description of problem:
OSP10 -> OSP11 upgrade fails when Ironic services are enabled on the overcloud and placed on a custom role. The failure occurs during major-upgrade-composable-steps:

stdout: overcloud.AllNodesDeploySteps.AllNodesPostUpgradeSteps.ServiceApiDeployment_Step4.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 74406536-d626-4937-a98c-c8f28e8ce15e
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
  deploy_stdout: |
    ...
    Notice: /Stage[main]/Apache/Concat[/etc/httpd/conf/ports.conf]/File[/etc/httpd/conf/ports.conf]/content: content changed '{md5}99cf4d0a57605985d8bb74bd78f75b93' to '{md5}c1854096e2bbbe8099d98583abc9b843'
    Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::config::end]: Triggered 'refresh' from 15 events
    Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::begin]: Triggered 'refresh' from 1 events
    Notice: /Stage[main]/Apache::Service/Service[httpd]: Triggered 'refresh' from 4 events
    Notice: /Stage[main]/Tripleo::Firewall::Post/Firewall[998 log all]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv4]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Dependency Service[nova-compute] has failures: true
    Notice: Applied catalog in 112.79 seconds
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Mar 13 19:02:05 overcloud-serviceapi-1.localdomain systemd[1]: Unit openstack-nova-compute.service entered failed state.
    Mar 13 19:02:05 overcloud-serviceapi-1.localdomain systemd[1]: openstack-nova-compute.service failed.
    
    Warning: /Stage[main]/Nova::Deps/Anchor[nova::service::end]: Skipping because of failed dependencies
    Warning: /Stage[main]/Nova/Exec[networking-refresh]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Firewall[998 log all]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv4]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Skipping because of failed dependencies
    Warning: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Skipping because of failed dependencies
    Warning: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Skipping because of failed dependencies
    (truncated, view all with --long)
overcloud.AllNodesDeploySteps.AllNodesPostUpgradeSteps.ServiceApiDeployment_Step4.2:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 3d95a4d4-bc6c-421c-99a1-75f94e6c9fa0
  status: CREATE_FAILED
  status_reason: |
    Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
  deploy_stdout: |
    ...
    Notice: /Stage[main]/Apache/Concat[/etc/httpd/conf/ports.conf]/File[/etc/httpd/conf/ports.conf]/content: content changed '{md5}bce868ac866ac85400659c81cb490b12' to '{md5}14a8ce9b875c82effc34ecffc067926b'
    Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::config::end]: Triggered 'refresh' from 15 events
    Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::begin]: Triggered 'refresh' from 1 events
    Notice: /Stage[main]/Apache::Service/Service[httpd]: Triggered 'refresh' from 4 events
    Notice: /Stage[main]/Tripleo::Firewall::Post/Firewall[998 log all]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv4]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Dependency Service[nova-compute] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Dependency Service[nova-compute] has failures: true
    Notice: Applied catalog in 109.05 seconds
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Mar 13 19:02:06 overcloud-serviceapi-2.localdomain systemd[1]: Unit openstack-nova-compute.service entered failed state.
    Mar 13 19:02:06 overcloud-serviceapi-2.localdomain systemd[1]: openstack-nova-compute.service failed.
    
    Warning: /Stage[main]/Nova::Deps/Anchor[nova::service::end]: Skipping because of failed dependencies
    Warning: /Stage[main]/Nova/Exec[networking-refresh]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Firewall[998 log all]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv4]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Skipping because of failed dependencies
    Warning: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Skipping because of failed dependencies
    Warning: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Skipping because of failed dependencies
    (truncated, view all with --long)
 

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.0.0-0.20170303152752.0rc1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates/

openstack overcloud deploy --templates $THT \
-r ~/openstack_deployment/roles/roles_data.yaml \
-e $THT/environments/network-isolation.yaml \
-e $THT/environments/network-management.yaml \
-e $THT/environments/storage-environment.yaml \
-e $THT/environments/services/ironic.yaml \
-e $THT/environments/tls-endpoints-public-ip.yaml \
-e ~/openstack_deployment/environments/nodes.yaml \
-e ~/openstack_deployment/environments/network-environment.yaml \
-e ~/openstack_deployment/environments/disk-layout.yaml \
-e ~/openstack_deployment/environments/public_vip.yaml \
-e ~/openstack_deployment/environments/enable-tls.yaml \
-e ~/openstack_deployment/environments/inject-trust-anchor.yaml \
-e ~/openstack_deployment/environments/neutron-settings.yaml \
--log-file overcloud_deployment.log &> overcloud_install.log

roles_data.yaml:
http://paste.openstack.org/show/602634/

2. Run major-upgrade-composable-steps:
openstack overcloud deploy --templates $THT \
-r ~/openstack_deployment/roles/roles_data.yaml \
-e $THT/environments/network-isolation.yaml \
-e $THT/environments/network-management.yaml \
-e $THT/environments/storage-environment.yaml \
-e $THT/environments/services/ironic.yaml \
-e $THT/environments/tls-endpoints-public-ip.yaml \
-e ~/openstack_deployment/environments/nodes.yaml \
-e ~/openstack_deployment/environments/network-environment.yaml \
-e ~/openstack_deployment/environments/disk-layout.yaml \
-e ~/openstack_deployment/environments/public_vip.yaml \
-e ~/openstack_deployment/environments/enable-tls.yaml \
-e ~/openstack_deployment/environments/inject-trust-anchor.yaml \
-e ~/openstack_deployment/environments/neutron-settings.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps.yaml \
-e ~/repo.yaml \
--log-file overcloud_deployment.log &> overcloud_install.log

Actual results:
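The major-upgrade-composable-steps run fails: ServiceApiDeployment_Step4 ends in CREATE_FAILED on the ServiceApi nodes (deployment exit status 6) after openstack-nova-compute.service enters a failed state.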

Expected results:
The major-upgrade-composable-steps run completes and the upgrade proceeds.

Additional info:

The deployment contains a custom role called ServiceApi, which hosts all the systemd-managed services. The failure is caused by nova-compute failing to start on these nodes:

    Mar 13 19:02:05 overcloud-serviceapi-1.localdomain systemd[1]: Unit openstack-nova-compute.service entered failed state.
    Mar 13 19:02:05 overcloud-serviceapi-1.localdomain systemd[1]: openstack-nova-compute.service failed.
    Mar 13 19:02:06 overcloud-serviceapi-2.localdomain systemd[1]: Unit openstack-nova-compute.service entered failed state.
    Mar 13 19:02:06 overcloud-serviceapi-2.localdomain systemd[1]: openstack-nova-compute.service failed.

However, when I logged in to the nodes after the failure, the nova-compute service was running, although failures did show up in its logs. I am attaching the nova-compute log and the os-collect-config log from one of the nodes.
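
For triage, the failed units can be pulled out of the deploy_stderr text rather than read off by eye. This is a minimal sketch, not part of the original report: the function name failed_units is hypothetical, and it assumes the syslog-style line layout shown in the excerpts above (month, day, time, host, "systemd[PID]:", "Unit", unit name).

```shell
# Sketch only: failed_units is an illustrative helper, not a TripleO tool.
# Reads log text on stdin and prints "host unit" for every line where
# systemd reports that a unit entered the failed state.
failed_units() {
  awk '/systemd\[[0-9]+\]: Unit .* entered failed state\./ { print $4, $7 }' \
    | sort -u
}

# Example (assumes the failure output was saved to a file first):
#   failed_units < failures.txt
```

On an affected node, the usual follow-up checks would be `systemctl status openstack-nova-compute` and `journalctl -u openstack-nova-compute`, to compare the state at failure time with the running state observed later.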

Comment 1 Marius Cornea 2017-03-27 13:24:05 UTC

*** This bug has been marked as a duplicate of bug 1432879 ***

