Bug 1312930 - rhel-osp-director: 8.0 nodes with several disks don't boot with the overcloud image.
Summary: rhel-osp-director: 8.0 nodes with several disks don't boot with the overcloud...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: Dan Macpherson
QA Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-02-29 14:45 UTC by Alexander Chuzhoy
Modified: 2016-09-20 01:41 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-20 01:41:14 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1282897 0 high CLOSED [Docs] [Director] Include disk mapping for Ceph Deployments 2021-02-22 00:41:40 UTC

Internal Links: 1282897

Description Alexander Chuzhoy 2016-02-29 14:45:59 UTC
rhel-osp-director: 8.0 deployment fails with "Error: Could not find class ::tripleo::firewall for overcloud-controller-2.localdomain"


Environment:
openstack-tripleo-heat-templates-0.8.8-2.el7ost.noarch
instack-undercloud-2.2.3-1.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.2-3.el7ost.noarch
openstack-puppet-modules-7.0.9-1.el7ost.noarch


Steps to reproduce:
Attempt to deploy an overcloud on BM with:

export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud deploy --templates $THT \
-e $THT/environments/storage-environment.yaml \
-e $THT/environments/network-isolation.yaml \
-e /home/stack/network-environment.yaml \
--control-scale 3 \
--ceph-storage-scale 3 \
--compute-scale 2 \
--neutron-disable-tunneling \
--neutron-network-type vlan \
--neutron-network-vlan-ranges tenantvlan:18:43 \
--neutron-bridge-mappings datacentre:br-ex,tenantvlan:br-nic4 \
--ntp-server clock.redhat.com \
--timeout 180



Result:
The deployment fails.


[stack@undercloud ~]$ heat resource-list -n5 overcloud|grep -v COMPLE                                                                                                                                             
+----------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+                                                                                            
| resource_name                                | physical_resource_id                          | resource_type                                     | resource_status | updated_time        | stack_name                                                                                                                                      |                                                                                            
+----------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+                                                                                            
| CephStorageAllNodesValidationDeployment      | 6e1fa4d7-e975-4b44-b8df-02bb95998c64          | OS::Heat::StructuredDeployments                   | CREATE_FAILED   | 2016-02-27T04:52:25 | overcloud                                                                                                                                       |                                                                                            
| ComputeNodesPostDeployment                   | 67627839-e962-41aa-be91-6655a8d01158          | OS::TripleO::ComputePostDeployment                | CREATE_FAILED   | 2016-02-27T04:52:26 | overcloud                                                                                                                                       |                                                                                            
| ControllerNodesPostDeployment                | 0fea6772-7a65-4c3b-8fd3-99c1205ca5c4          | OS::TripleO::ControllerPostDeployment             | CREATE_FAILED   | 2016-02-27T04:52:26 | overcloud                                                                                                                                       |                                                                                            
| 0                                            | 33190079-6aa2-4baa-a21b-ab652de4d2f9          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:37:45 | overcloud-CephStorageAllNodesValidationDeployment-g76mbxc4vxw4                                                                                  |                                                                                            
| ComputePuppetDeployment                      | df1f22f5-3986-4781-95a7-aa8e0a775516          | OS::Heat::StructuredDeployments                   | CREATE_FAILED   | 2016-02-27T05:37:57 | overcloud-ComputeNodesPostDeployment-4as4cpmwxywq                                                                                               |                                                                                            
| 1                                            | 77b41b6e-2dbc-4aa2-b73b-97f24d0957bd          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:38:01 | overcloud-ComputeNodesPostDeployment-4as4cpmwxywq-ComputePuppetDeployment-naa47n3rvsma                                                          |                                                                                            
| 0                                            | fbb569e9-fe38-4628-bfca-f0e9aef7b824          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:38:02 | overcloud-ComputeNodesPostDeployment-4as4cpmwxywq-ComputePuppetDeployment-naa47n3rvsma                                                          |                                                                                            
| ControllerLoadBalancerDeployment_Step1       | 2d5010ff-ce1e-40f1-93a0-f04b01afb758          | OS::Heat::StructuredDeployments                   | CREATE_FAILED   | 2016-02-27T05:38:11 | overcloud-ControllerNodesPostDeployment-5by2nnzblmxy                                                                                            |                                                                                            
| 1                                            | c2b012c0-599f-470c-be31-91c3b4d05152          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:39:24 | overcloud-ControllerNodesPostDeployment-5by2nnzblmxy-ControllerLoadBalancerDeployment_Step1-p5peyf3x7sof                                        |                                                                                            
| 0                                            | c81dc204-8249-4bdb-85d2-7d5dbc8a313f          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:39:26 | overcloud-ControllerNodesPostDeployment-5by2nnzblmxy-ControllerLoadBalancerDeployment_Step1-p5peyf3x7sof                                        |                                                                                            
| 2                                            | b8b614c2-deea-4738-a4fc-30ae0eef3383          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:39:27 | overcloud-ControllerNodesPostDeployment-5by2nnzblmxy-ControllerLoadBalancerDeployment_Step1-p5peyf3x7sof                                        |                                                                                            
+----------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+                                                                                            
[stack@undercloud ~]$ heat deployment-show b8b614c2-deea-4738-a4fc-30ae0eef3383                                                                                                                                      
{                                                                                                                                                                                                                    
  "status": "FAILED",                                                                                                                                                                                                
  "server_id": "1c2f8d4d-e31e-4bbc-bc2f-d71f4c79981e",                                                                                                                                                               
  "config_id": "04d24542-3a6e-488b-ae43-34554e780743",                                                                                                                                                               
  "output_values": {                                                                                                                                                                                                 
    "deploy_stdout": "",                                                                                                                                                                                             
    "deploy_stderr": "Device \"br_ex\" does not exist.\nDevice \"br_nic2\" does not exist.\nDevice \"br_nic4\" does not exist.\nDevice \"ovs_system\" does not exist.\n\u001b[1;31mError: Could not find class ::tripleo::firewall for overcloud-controller-2.localdomain on node overcloud-controller-2.localdomain\u001b[0m\n\u001b[1;31mError: Could not find class ::tripleo::firewall for overcloud-controller-2.localdomain on node overcloud-controller-2.localdomain\u001b[0m\n",                                                                                                                                                                      
    "deploy_status_code": 1                                                                                                                                                                                          
  },                                                                                                                                                                                                                 
  "creation_time": "2016-02-27T05:39:28",                                                                                                                                                                            
  "updated_time": "2016-02-27T05:40:26",                                                                                                                                                                             
  "input_values": {},                                                                                                                                                                                                
  "action": "CREATE",                                                                                                                                                                                                
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1",                                                                                                                            
  "id": "b8b614c2-deea-4738-a4fc-30ae0eef3383"                                                                                                                                                                       
}

Comment 4 Steve Baker 2016-03-01 23:57:11 UTC
If this is happening in a baremetal environment but not a virtual one, my only suggestion is that maybe the disk on the baremetal still has the image from an older install, or the overcloud image is out of date for some other reason.

Comment 5 Alexander Chuzhoy 2016-03-02 00:18:52 UTC
Indeed, the opm version differs among the OC nodes.

The introspection completed successfully on all.
The assigned deploy image is the same on all in ironic db.

So, apparently some nodes still booted with the old image.
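
A minimal sketch of how such a mismatch could be spotted once the package version has been collected from each node, for example by running `rpm -q openstack-puppet-modules` on every overcloud node. The node names and the version strings in `collected` are hypothetical sample data, not output from this environment.

```shell
# Hypothetical sample: one "<node> <package version>" pair per line.
collected='overcloud-controller-0 version-A
overcloud-controller-1 version-A
overcloud-controller-2 version-B'

# Count the distinct version strings in the second column.
distinct=$(printf '%s\n' "$collected" | awk '{print $2}' | sort -u | awk 'END {print NR}')

if [ "$distinct" -gt 1 ]; then
    echo "package version differs across nodes; some may have booted a stale image"
fi
```

More than one distinct version points to nodes that did not boot the freshly deployed image.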

Comment 6 Shinobu KINJO 2016-03-02 07:15:17 UTC
(In reply to Steve Baker from comment #4)
> If this is happening in a baremetal environment but not a virtual one, my
> only suggestion is that maybe the disk on the baremetal still has the image
> from an older install, or the overcloud image is out of date for some other
> reason.

@Steve,

Where does your suggestion come from? If you could elaborate on this, it would be much appreciated.

I'm just quite interested in the difference of the behaviour between BMs and VMs.

Comment 7 Steve Baker 2016-03-02 19:58:45 UTC
Baremetal has a real disk which the OS image gets copied to on each run, whereas a VM has a virtual disk which always starts out empty.

If the image copying fails, the baremetal node risks booting a previously copied image, whereas the VM won't boot at all. Also, image copying is more likely to fail on baremetal, since there are any number of storage configurations which haven't had as much testing as the simple single disk you would get with a VM.

Comment 8 James Slagle 2016-03-03 16:46:14 UTC
sasha, i think you're working with lucas on this one. so i'm assigning it to him to see if there's a BM provisioning issue here.

Comment 9 Lucas Alvares Gomes 2016-03-03 17:10:44 UTC
Ironic does have a mechanism called "cleaning" which can erase the disks before the node becomes available to nova and after the instance is torn down.

Unfortunately, cleaning only arrived in the iSCSI drivers (pxe_*) in Mitaka [0], and, as it is a feature, it wasn't backported to stable/liberty upstream.

[0] https://review.openstack.org/#/c/220898/

Comment 10 Alexander Chuzhoy 2016-03-03 17:13:05 UTC
The issue reproduces on nodes with more than 1 disk (4 disks).

Comment 11 Alexander Chuzhoy 2016-03-03 23:37:41 UTC
So I initialized all disks on the nodes with multiple disks. Upon re-deployment, these nodes failed to boot due to a missing bootloader, with a message suggesting choosing another boot method.

Comment 12 Alexander Chuzhoy 2016-03-04 21:22:51 UTC
To overcome the issue:
1. Add a property to the node specifying the disk size:
   ironic node-update <node_ID> add properties/root_device='{"size": <size>}'

  Note:
  A node’s ‘local_gb’ property is often set to a value 1 GiB less than the actual disk size to account for partitioning.
  However, in this case, size should be the actual disk size. For example, for a 128 GiB disk, local_gb will be 127, but the size hint will be 128.

2. In the BIOS or controller configuration menu, select the same disk as the boot device.

http://docs.openstack.org/developer/ironic/deploy/install-guide.html?highlight=wwn#specifying-the-disk-for-deployment
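
The size hint from step 1 can be sketched as below. This is only an illustration: `root_device_hint` is a hypothetical helper, not part of the ironic CLI, the 128 GiB value is the example from the note, and the final command is printed rather than executed, so `<node_ID>` stays a placeholder.

```shell
# Hypothetical helper: build the JSON value for properties/root_device.
# $1 is the actual disk size in GiB (not local_gb, which is often 1 less).
root_device_hint() {
    printf '{"size": %d}' "$1"
}

size=128                           # actual size of the disk to deploy to
hint=$(root_device_hint "$size")

# Print the command from step 1 with the hint filled in.
echo "ironic node-update <node_ID> add properties/root_device='$hint'"
```

Printing the command first makes it easy to review the hint before applying it to a real node.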

Comment 13 Dan Macpherson 2016-03-07 15:43:44 UTC
Related to this issue with Ceph:
https://bugzilla.redhat.com/show_bug.cgi?id=1282897

Comment 15 Alexander Chuzhoy 2016-03-24 23:04:32 UTC
I can now reproduce it with 7.3 too.
The weird part is that I am using the same setup on which I used to deploy 7.3 without any issue.

Comment 18 Alexander Chuzhoy 2016-05-24 20:37:13 UTC
Hi Dan,
Suggestions:
1)
Instead of:
export IRONIC_DISCOVERD_PASSWORD=`sudo grep admin_password /etc/ironic-inspector/inspector.conf | egrep -v '^#'  | awk '{print $NF}'`
use:
export IRONIC_DISCOVERD_PASSWORD=`sudo grep admin_password /etc/ironic-inspector/inspector.conf | awk '! /^#/ {print $NF}'`

2)
Similarly, instead of:
for node in $(ironic node-list | grep -v UUID| awk '{print $2}');
use:
for node in $(ironic node-list |awk '!/UUID/ {print $2}');

3)
Add a note about configuring the BIOS to boot from the respective disk or controller.
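
The equivalence of the longer and shorter pipelines in suggestions 1 and 2 can be checked on sample input. In this sketch the table and the config lines are made-up stand-ins shaped like `ironic node-list` output and `inspector.conf`, and `grep -Ev` is used in place of `egrep -v`.

```shell
# Made-up table shaped like `ironic node-list` output.
table='+------+------+
| UUID | Name |
+------+------+
| aaaa-1111 | None |
| bbbb-2222 | None |
+------+------+'

old_nodes=$(printf '%s\n' "$table" | grep -v UUID | awk '{print $2}')
new_nodes=$(printf '%s\n' "$table" | awk '!/UUID/ {print $2}')

# Made-up config lines: one commented-out entry and one active entry.
conf='#admin_password = <None>
admin_password = secret'

old_pw=$(printf '%s\n' "$conf" | grep admin_password | grep -Ev '^#' | awk '{print $NF}')
new_pw=$(printf '%s\n' "$conf" | grep admin_password | awk '! /^#/ {print $NF}')

# Word splitting in the for loop drops the empty fields from border lines.
for node in $new_nodes; do
    echo "$node"
done
```

Both forms select the same rows; the awk pattern simply folds the filtering step into the command that prints the field.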

Comment 20 Dan Macpherson 2016-08-19 04:57:44 UTC
Sasha, the feedback from comment #18 is now live:

https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/director-installation-and-usage/#sect-Defining_the_Root_Disk_for_Nodes

How does it look? Anything further we should add/modify?

Comment 21 Alexander Chuzhoy 2016-09-19 20:19:39 UTC
Verified:
Looks good.
Thanks.

Comment 22 Dan Macpherson 2016-09-20 01:41:14 UTC
Cool. Closing.

