Bug 1312930 - rhel-osp-director: 8.0 nodes with several disks don't boot with the overcloud image.
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 8.0 (Liberty)
Hardware: Unspecified  OS: Unspecified
Priority: high  Severity: unspecified
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assigned To: Dan Macpherson
QA Contact: RHOS Documentation Team
Keywords: Documentation
Depends On:
Reported: 2016-02-29 09:45 EST by Alexander Chuzhoy
Modified: 2016-09-19 21:41 EDT (History)
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-09-19 21:41:14 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Alexander Chuzhoy 2016-02-29 09:45:59 EST
rhel-osp-director: 8.0 deployment fails with "Error: Could not find class ::tripleo::firewall for overcloud-controller-2.localdomain"


Steps to reproduce:
Attempt to deploy an overcloud on BM with:

export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud deploy --templates $THT \
-e $THT/environments/storage-environment.yaml \
-e $THT/environments/network-isolation.yaml \
-e /home/stack/network-environment.yaml \
--control-scale 3 \
--ceph-storage-scale 3 \
--compute-scale 2 \
--neutron-disable-tunneling \
--neutron-network-type vlan \
--neutron-network-vlan-ranges tenantvlan:18:43 \
--neutron-bridge-mappings datacentre:br-ex,tenantvlan:br-nic4 \
--ntp-server clock.redhat.com \
--timeout 180

The deployment fails.

[stack@undercloud ~]$ heat resource-list -n5 overcloud|grep -v COMPLE                                                                                                                                             
| resource_name                                | physical_resource_id                          | resource_type                                     | resource_status | updated_time        | stack_name                                                                                                                                      |                                                                                            
| CephStorageAllNodesValidationDeployment      | 6e1fa4d7-e975-4b44-b8df-02bb95998c64          | OS::Heat::StructuredDeployments                   | CREATE_FAILED   | 2016-02-27T04:52:25 | overcloud                                                                                                                                       |                                                                                            
| ComputeNodesPostDeployment                   | 67627839-e962-41aa-be91-6655a8d01158          | OS::TripleO::ComputePostDeployment                | CREATE_FAILED   | 2016-02-27T04:52:26 | overcloud                                                                                                                                       |                                                                                            
| ControllerNodesPostDeployment                | 0fea6772-7a65-4c3b-8fd3-99c1205ca5c4          | OS::TripleO::ControllerPostDeployment             | CREATE_FAILED   | 2016-02-27T04:52:26 | overcloud                                                                                                                                       |                                                                                            
| 0                                            | 33190079-6aa2-4baa-a21b-ab652de4d2f9          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:37:45 | overcloud-CephStorageAllNodesValidationDeployment-g76mbxc4vxw4                                                                                  |                                                                                            
| ComputePuppetDeployment                      | df1f22f5-3986-4781-95a7-aa8e0a775516          | OS::Heat::StructuredDeployments                   | CREATE_FAILED   | 2016-02-27T05:37:57 | overcloud-ComputeNodesPostDeployment-4as4cpmwxywq                                                                                               |                                                                                            
| 1                                            | 77b41b6e-2dbc-4aa2-b73b-97f24d0957bd          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:38:01 | overcloud-ComputeNodesPostDeployment-4as4cpmwxywq-ComputePuppetDeployment-naa47n3rvsma                                                          |                                                                                            
| 0                                            | fbb569e9-fe38-4628-bfca-f0e9aef7b824          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:38:02 | overcloud-ComputeNodesPostDeployment-4as4cpmwxywq-ComputePuppetDeployment-naa47n3rvsma                                                          |                                                                                            
| ControllerLoadBalancerDeployment_Step1       | 2d5010ff-ce1e-40f1-93a0-f04b01afb758          | OS::Heat::StructuredDeployments                   | CREATE_FAILED   | 2016-02-27T05:38:11 | overcloud-ControllerNodesPostDeployment-5by2nnzblmxy                                                                                            |                                                                                            
| 1                                            | c2b012c0-599f-470c-be31-91c3b4d05152          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:39:24 | overcloud-ControllerNodesPostDeployment-5by2nnzblmxy-ControllerLoadBalancerDeployment_Step1-p5peyf3x7sof                                        |                                                                                            
| 0                                            | c81dc204-8249-4bdb-85d2-7d5dbc8a313f          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:39:26 | overcloud-ControllerNodesPostDeployment-5by2nnzblmxy-ControllerLoadBalancerDeployment_Step1-p5peyf3x7sof                                        |                                                                                            
| 2                                            | b8b614c2-deea-4738-a4fc-30ae0eef3383          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2016-02-27T05:39:27 | overcloud-ControllerNodesPostDeployment-5by2nnzblmxy-ControllerLoadBalancerDeployment_Step1-p5peyf3x7sof                                        |                                                                                            
[stack@undercloud ~]$ heat deployment-show b8b614c2-deea-4738-a4fc-30ae0eef3383                                                                                                                                      
  "status": "FAILED",                                                                                                                                                                                                
  "server_id": "1c2f8d4d-e31e-4bbc-bc2f-d71f4c79981e",                                                                                                                                                               
  "config_id": "04d24542-3a6e-488b-ae43-34554e780743",                                                                                                                                                               
  "output_values": {                                                                                                                                                                                                 
    "deploy_stdout": "",                                                                                                                                                                                             
    "deploy_stderr": "Device \"br_ex\" does not exist.\nDevice \"br_nic2\" does not exist.\nDevice \"br_nic4\" does not exist.\nDevice \"ovs_system\" does not exist.\n\u001b[1;31mError: Could not find class ::tripleo::firewall for overcloud-controller-2.localdomain on node overcloud-controller-2.localdomain\u001b[0m\n\u001b[1;31mError: Could not find class ::tripleo::firewall for overcloud-controller-2.localdomain on node overcloud-controller-2.localdomain\u001b[0m\n",                                                                                                                                                                      
    "deploy_status_code": 1                                                                                                                                                                                          
  "creation_time": "2016-02-27T05:39:28",                                                                                                                                                                            
  "updated_time": "2016-02-27T05:40:26",                                                                                                                                                                             
  "input_values": {},                                                                                                                                                                                                
  "action": "CREATE",                                                                                                                                                                                                
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1",                                                                                                                            
  "id": "b8b614c2-deea-4738-a4fc-30ae0eef3383"                                                                                                                                                                       
Comment 4 Steve Baker 2016-03-01 18:57:11 EST
If this is happening in a baremetal environment but not a virtual one, my only suggestion is that maybe the disk on the baremetal still has the image from an older install, or the overcloud image is out of date for some other reason.
Comment 5 Alexander Chuzhoy 2016-03-01 19:18:52 EST
Indeed, the opm version differs among the OC nodes.

The introspection completed successfully on all.
The assigned deploy image is the same on all in ironic db.

So, apparently some nodes still booted with the old image.
Comment 6 Shinobu KINJO 2016-03-02 02:15:17 EST
(In reply to Steve Baker from comment #4)
> If this is happening in a baremetal environment but not a virtual one, my
> only suggestion is that maybe the disk on the baremetal still has the image
> from an older install, or the overcloud image is out of date for some other
> reason.


Where does your suggestion come from? If you could elaborate on this, it would be much appreciated.

I'm just quite interested in the difference of the behaviour between BMs and VMs.
Comment 7 Steve Baker 2016-03-02 14:58:45 EST
Baremetal has a real disk which the OS image gets copied to on each run, whereas a VM has a virtual disk which always starts out empty.

If the image copying fails, the baremetal risks booting a previously copied image, whereas the VM won't boot at all. Also image copying is more likely to fail on baremetal since there are any number of storage configurations which haven't had as much testing as the simple single disk you would get with a VM.
Comment 8 James Slagle 2016-03-03 11:46:14 EST
sasha, i think you're working with lucas on this one. so i'm assigning it to him to see if there's a BM provisioning issue here.
Comment 9 Lucas Alvares Gomes 2016-03-03 12:10:44 EST
Ironic does have a mechanism called "cleaning" which can erase the disks before the node becomes available to nova and after the instance is torn down.

Unfortunately, cleaning only arrived in the iSCSI drivers (pxe_*) in Mitaka [0], and since it's a feature it wasn't backported to stable/liberty upstream.

[0] https://review.openstack.org/#/c/220898/
Comment 10 Alexander Chuzhoy 2016-03-03 12:13:05 EST
The issue reproduces on nodes with more than 1 disk (4 disks).
Comment 11 Alexander Chuzhoy 2016-03-03 18:37:41 EST
So I initialized all disks on the nodes with multiple disks, and upon re-deployment these nodes failed to boot due to a missing bootloader, suggesting to choose another boot method.
Comment 12 Alexander Chuzhoy 2016-03-04 16:22:51 EST
To overcome the issue:
1. Add a property to the node specifying the disk size:
   ironic node-update <node_ID> add properties/root_device='{"size": <size>}'

  A node’s ‘local_gb’ property is often set to a value 1 GiB less than the actual disk size to account for partitioning.
  However, in this case size should be the actual size. For example, for a 128 GiB disk local_gb will be 127, but size hint will be 128.

2. In the BIOS or controller configuration menu, select the same disk as the one to boot from.
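
The size hint in step 1 can be worked out from the disk size in bytes. The following is only a sketch: `size_hint_gib` and `print_root_device_update` are hypothetical helper names, the byte count is assumed to come from something like `lsblk -b` on the node, and truncating to whole GiB is an assumption that should be checked for disks that are not an exact GiB multiple. It prints the `ironic node-update` command from step 1 rather than running it.

```shell
#!/bin/sh
# Sketch: derive the root_device size hint (whole GiB) from a disk size in bytes.
# size_hint_gib and print_root_device_update are hypothetical helper names.

size_hint_gib() {
    # $1 = disk size in bytes. Integer division truncates to whole GiB.
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

print_root_device_update() {
    # $1 = node UUID, $2 = disk size in bytes (assumed to be read from the node).
    size=$(size_hint_gib "$2")
    # Only print the command so the value can be reviewed before it is applied.
    printf "ironic node-update %s add properties/root_device='{\"size\": %s}'\n" "$1" "$size"
}

# A 128 GiB disk is 137438953472 bytes, so the hint is 128 (not local_gb's 127).
print_root_device_update "<node_ID>" 137438953472
```

Printing instead of executing keeps the sketch safe to run on the undercloud; the printed line can be pasted once the size has been confirmed.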

Comment 13 Dan Macpherson 2016-03-07 10:43:44 EST
Related to this issue with Ceph:
Comment 15 Alexander Chuzhoy 2016-03-24 19:04:32 EDT
I can now reproduce it with 7.3 too.
The weird part is that I use the same setup where I used to deploy 7.3 without any issue.
Comment 18 Alexander Chuzhoy 2016-05-24 16:37:13 EDT
Hi Dan,
Instead of:
export IRONIC_DISCOVERD_PASSWORD=`sudo grep admin_password /etc/ironic-inspector/inspector.conf | egrep -v '^#'  | awk '{print $NF}'`
use:
export IRONIC_DISCOVERD_PASSWORD=`sudo grep admin_password /etc/ironic-inspector/inspector.conf | awk '! /^#/ {print $NF}'`

Similarly, instead of:
for node in $(ironic node-list | grep -v UUID| awk '{print $2}');
use:
for node in $(ironic node-list |awk '!/UUID/ {print $2}'); 

Also, add a note to configure the BIOS to boot from the respective disk or controller.
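
The awk rewrites in this comment can be sanity-checked without an undercloud. The sketch below feeds both the longer and the shorter pipelines the same made-up input and compares the results; the sample table and the sample config lines are invented for illustration and are not real `ironic node-list` or `inspector.conf` contents, and `grep -Ev` stands in for `egrep -v`.

```shell
#!/bin/sh
# Made-up sample input; real `ironic node-list` output and
# inspector.conf contents will differ.
sample_nodes() {
cat <<'EOF'
+------+-------+
| UUID | Name  |
+------+-------+
| aaa1 | node1 |
| bbb2 | node2 |
+------+-------+
EOF
}

sample_conf() {
cat <<'EOF'
#admin_password = example
admin_password = secret
EOF
}

# Longer form: grep -v drops the header row, awk prints the second field.
old_nodes=$(sample_nodes | grep -v UUID | awk '{print $2}')
# Shorter form: awk filters out the header row and prints the field itself.
new_nodes=$(sample_nodes | awk '!/UUID/ {print $2}')

# Longer form: a second grep drops commented-out lines (egrep -v in the comment).
old_pw=$(sample_conf | grep admin_password | grep -Ev '^#' | awk '{print $NF}')
# Shorter form: awk skips lines starting with '#'.
new_pw=$(sample_conf | grep admin_password | awk '! /^#/ {print $NF}')

[ "$old_nodes" = "$new_nodes" ] && echo "node lists match"
[ "$old_pw" = "$new_pw" ] && echo "passwords match"
```

The table border rows have no second field, so both forms emit empty lines for them; an unquoted `$(...)` in a `for` loop discards those during word splitting.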
Comment 20 Dan Macpherson 2016-08-19 00:57:44 EDT
Sasha, the feedback from comment #18 is now live:


How does it look? Anything further we should add/modify?
Comment 21 Alexander Chuzhoy 2016-09-19 16:19:39 EDT
Looks good.
Comment 22 Dan Macpherson 2016-09-19 21:41:14 EDT
Cool. Closing.
