Bug 1575784 - FFU: ceph upgrade fails during tripleo.access.v1.enable_ssh_admin workflow on environments with predictable IPs and node placement
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: RHOS Maint
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-07 23:50 UTC by Marius Cornea
Modified: 2023-02-22 23:02 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-08 19:03:23 UTC
Target Upstream Version:
Embargoed:


Attachments

Description Marius Cornea 2018-05-07 23:50:45 UTC
Description of problem:

FFU: ceph upgrade fails during the tripleo.access.v1.enable_ssh_admin workflow on environments with predictable IPs and node placement. The mistral engine log shows that the create_admin_via_nova task failed.


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-8.0.2-11.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controllers + 2 computes + 3 ceph OSD nodes with predictable IPs and node placement/custom names

2. Run the fast forward upgrade procedure until the ceph upgrade step:

openstack overcloud ceph-upgrade run  --templates /usr/share/openstack-tripleo-heat-templates \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-management.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e ~/openstack_deployment/environments/enable-cpu-pinning.yaml \
-e ~/openstack_deployment/environments/nodes.yaml \
-e ~/openstack_deployment/environments/network-environment.yaml \
-e ~/openstack_deployment/environments/disk-layout.yaml \
-e ~/openstack_deployment/environments/public_vip.yaml \
-e ~/openstack_deployment/environments/enable-tls.yaml \
-e ~/openstack_deployment/environments/inject-trust-anchor.yaml \
-e ~/openstack_deployment/environments/scheduler_hints_env.yaml \
-e ~/openstack_deployment/environments/ips-from-pool-all.yaml \
-e ~/openstack_deployment/environments/neutron-settings.yaml \
-e ~/openstack_deployment/environments/custom_hiera.yaml \
-e /home/stack/ceph-ansible-env.yaml \
--container-registry-file docker-images.yaml \
--ceph-ansible-playbook '/usr/share/ceph-ansible/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml,/usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml'


Actual results:

2018-05-07 22:44:50Z [overcloud-AllNodesDeploySteps-agsulepbjlef.WorkflowTasks_Step2_Execution]: UPDATE_FAILED  resources.WorkflowTasks_Step2_Execution: ERROR
2018-05-07 22:44:50Z [overcloud-AllNodesDeploySteps-agsulepbjlef]: UPDATE_FAILED  Resource UPDATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2018-05-07 22:44:50Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.AllNodesDeploySteps: Resource UPDATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2018-05-07 22:44:50Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: resources.AllNodesDeploySteps: Resource UPDATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

 Stack overcloud UPDATE_FAILED 

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::TripleO::WorkflowSteps
  physical_resource_id: e9cc04e0-3d8a-4479-9b34-a547961cc17d
  status: UPDATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
Heat Stack update failed.
Heat Stack update failed.


Expected results:
No failure.

Additional info:

Attaching the sosreport.

Comment 2 Marius Cornea 2018-05-08 19:03:23 UTC
Closing this as not a bug: the failure was caused by the overcloud nodes being unable to validate the SSL connection to the undercloud public endpoint.
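
The failure mode is generic to TLS-enabled underclouds: the overcloud nodes must trust the CA that signed the undercloud's public endpoint certificate, otherwise any workflow reaching that endpoint fails SSL verification. The check can be sketched locally with a throwaway CA standing in for the undercloud CA (all file names below are illustrative, not paths from this deployment):

```shell
# Minimal local demonstration of why certificate verification fails when the
# issuing CA is absent from the trust store. All names are illustrative.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Create a throwaway CA (standing in for the undercloud's signing CA).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=test-undercloud-ca" -keyout ca.key -out ca.crt 2>/dev/null

# Issue a server certificate signed by that CA (standing in for the
# undercloud public endpoint certificate).
openssl req -newkey rsa:2048 -nodes \
    -subj "/CN=undercloud.example" -keyout server.key -out server.csr 2>/dev/null
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -days 1 -out server.crt 2>/dev/null

# With the CA available, verification succeeds and prints "server.crt: OK" ...
openssl verify -CAfile ca.crt server.crt

# ... without it, verification fails, which is what the overcloud nodes hit.
openssl verify server.crt || echo "verification failed as expected"
```

On real nodes the equivalent check runs against the actual endpoint certificate and the system trust store; the usual remedy is to ensure the undercloud CA is injected into the overcloud trust store (e.g. via the inject-trust-anchor.yaml environment already included in this deployment's command line) and refreshed with update-ca-trust.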

