Bug 1226097 - rhel-osp-director: The overcloud deployment times out.
Summary: rhel-osp-director: The overcloud deployment times out.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ga
Target Release: Director
Assignee: Lucas Alvares Gomes
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-05-29 02:13 UTC by Alexander Chuzhoy
Modified: 2015-08-05 13:52 UTC (History)

Fixed In Version: overcloud-full-7.0-18
Doc Type: Bug Fix
Doc Text:
The grub configuration set the kernel parameters to redirect the console to a serial port that might not be present. As a result, the node failed to boot. This fix disables console redirection to the serial port by default. The node now boots successfully.
Clone Of:
Environment:
Last Closed: 2015-08-05 13:52:10 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHEA-2015:1549
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise Linux OpenStack Platform director Release
Last Updated: 2015-08-05 17:49:10 UTC

Description Alexander Chuzhoy 2015-05-29 02:13:12 UTC
rhel-osp-director: The overcloud deployment times out.
Environment:
instack-undercloud-2.1.0-3.el7ost.noarch

Steps to reproduce:
Attempt to deploy overcloud with "instack-deploy-overcloud --tuskar".

Result:
The deployment times out:
tuskar_templates/net-config-noop.yaml
tuskar_templates/puppet/ceph-cluster-config.yaml
tuskar_templates/extraconfig/post_deploy/default.yaml
+ OVERCLOUD_YAML_PATH=tuskar_templates/plan.yaml
+ ENVIRONMENT_YAML_PATH=tuskar_templates/environment.yaml
+ '[' ''\'''\''' = satellite ']'
+ heat stack-create -t 240 -f tuskar_templates/plan.yaml -e tuskar_templates/environment.yaml overcloud
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 1a8dd9ee-fc87-4fc6-935a-d436ff5de9e3 | overcloud  | CREATE_IN_PROGRESS | 2015-05-28T22:22:31Z |
+--------------------------------------+------------+--------------------+----------------------+
+ tripleo wait_for_stack_ready 220 10 overcloud
Timing out after 2200 seconds:
COMMAND=heat stack-show overcloud | awk '/stack_status / { print $4 }'
OUTPUT=CREATE_IN_PROGRESS
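
The 2200-second figure appears to follow from the two numeric arguments in the trace: 220 polls at 10-second intervals give 220 x 10 = 2200 seconds. Below is a minimal Python sketch of that kind of polling loop. It assumes the arguments mean "loop count" and "sleep interval", which the timeout message suggests but the trace does not state. The names `wait_for_status`, `max_wait_seconds`, and `get_status` are hypothetical and are not part of tripleo, and the sketch only checks for a single success status.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], loops: int, sleep_s: float,
                    done: str = "CREATE_COMPLETE",
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_status() up to `loops` times, sleeping `sleep_s` between polls.

    Returns True as soon as the status equals `done`, False on timeout.
    """
    for _ in range(loops):
        if get_status() == done:
            return True
        sleep(sleep_s)
    return False


def max_wait_seconds(loops: int, sleep_s: float) -> float:
    """Upper bound on the time spent sleeping before the loop gives up."""
    return loops * sleep_s


# Hypothetical check mirroring the trace: 220 polls, 10 seconds apart.
print(max_wait_seconds(220, 10))
```

A stack that never leaves CREATE_IN_PROGRESS, as in the output above, would exhaust all polls and the function would return False.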


I can't collect logs from the controller/compute nodes deployed by the undercloud because the public key for logging in as heat-admin doesn't work.


Expected result:
The overcloud deployment should complete.

Comment 4 Jan Provaznik 2015-05-29 14:52:41 UTC
I'm hitting the same issue when using prebuilt images:
http://download.devel.redhat.com/brewroot/work/tasks/5732/9275732/overcloud-full.tar

SSH to the overcloud (OC) nodes doesn't work because cloud-init failed. Here is the suspect part of /var/log/messages from the OC controller node:
May 29 08:42:22 localhost systemd: Started D-Bus System Message Bus.
May 29 08:42:22 localhost systemd: cloud-init-local.service: main process exited, code=exited, status=209/STDOUT
May 29 08:42:22 localhost systemd: Failed to start Initial cloud-init job (pre-networking).
May 29 08:42:22 localhost systemd: Dependency failed for Cloud-config availability.
May 29 08:42:22 localhost systemd: Dependency failed for Execute cloud user/final scripts.
May 29 08:42:22 localhost systemd: 
May 29 08:42:22 localhost systemd: Dependency failed for Apply the settings specified in cloud-config.
May 29 08:42:22 localhost systemd: 
May 29 08:42:22 localhost systemd: 
May 29 08:42:22 localhost systemd: Unit cloud-init-local.service entered failed state.

Comment 5 Jan Provaznik 2015-05-29 15:03:55 UTC
Related BZ can be found here:
https://bugzilla.redhat.com/show_bug.cgi?id=974285

If I compare /etc/default/grub on an older VM that works with the one on the new VM that fails, I can see that:
GRUB_CMDLINE_LINUX="crashkernel=auto console=tty0 no_timer_check net.ifnames=0 console=ttyS0,115200n8"

is now:

GRUB_CMDLINE_LINUX="crashkernel=auto console=tty0 console=ttyS0,115200 rhgb quiet"

I'm not saying this causes the problem, but https://bugzilla.redhat.com/show_bug.cgi?id=974285#c2 suggests the issue might be related to the console setting, so this path is worth investigating further.
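
To make the difference between the two GRUB_CMDLINE_LINUX values easier to see, here is a small Python sketch that lists which kernel parameters were dropped and which were added. The function name `cmdline_diff` is hypothetical; the two strings are copied from the lines quoted above.

```python
from typing import List, Tuple


def cmdline_diff(old: str, new: str) -> Tuple[List[str], List[str]]:
    """Return (removed, added) kernel parameters, preserving original order."""
    old_params, new_params = old.split(), new.split()
    removed = [p for p in old_params if p not in new_params]
    added = [p for p in new_params if p not in old_params]
    return removed, added


# Values quoted in this comment (older working VM vs. new failing VM).
OLD = "crashkernel=auto console=tty0 no_timer_check net.ifnames=0 console=ttyS0,115200n8"
NEW = "crashkernel=auto console=tty0 console=ttyS0,115200 rhgb quiet"

removed, added = cmdline_diff(OLD, NEW)
print("removed:", removed)
print("added:  ", added)
```

Both lines redirect a console to ttyS0; the comparison only shows that the serial settings and a few other parameters differ, not which of them is responsible for the failure.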

Comment 6 Ben Nemec 2015-05-29 19:45:03 UTC
This can be worked around by disabling localboot, which results in a kernel cmdline of:

[root@ov-xz5nsxuty5-0-wnvnwultiimx-novacompute-t4ezwe664dah ~]# cat /proc/cmdline
root=UUID=79711e89-b049-4d78-bd01-2b4f5042af08 ro text nofb nomodeset vga=normal

I also confirmed that just manually removing all of the console params from the grub cmdline at boot fixes the problem.  This resulted in the following cmdline:

[root@ov-bh6esbx7vo-0-7gcluaz4jocg-novacompute-cmi5uggriiux heat-admin]# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-229.4.2.el7.x86_64 root=UUID=79711e89-b049-4d78-bd01-2b4f5042af08 ro crashkernel=auto

I don't know what implications that would have for baremetal though.

Comment 7 Jan Provaznik 2015-06-02 12:06:08 UTC
As Ben wrote, updating the kernel command line in /etc/default/grub fixes the issue, specifically removing "console=ttyS0,115200" from the line.
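
As an illustration of that workaround, the following Python sketch removes serial console parameters from a GRUB_CMDLINE_LINUX line. It is only a sketch under stated assumptions: the function name `drop_serial_console` is hypothetical, it handles only the double-quoted single-line form shown in this report, and it is not the fix that shipped in the erratum.

```python
import re


def drop_serial_console(grub_line: str) -> str:
    """Remove console=ttyS* parameters from a GRUB_CMDLINE_LINUX="..." line."""
    match = re.fullmatch(r'(GRUB_CMDLINE_LINUX=")(.*)(")', grub_line.strip())
    if match is None:
        return grub_line  # not the line we are looking for; leave untouched
    prefix, params, suffix = match.groups()
    # Keep every parameter except serial console redirections such as
    # console=ttyS0,115200; console=tty0 is left in place.
    kept = [p for p in params.split() if not p.startswith("console=ttyS")]
    return prefix + " ".join(kept) + suffix


line = 'GRUB_CMDLINE_LINUX="crashkernel=auto console=tty0 console=ttyS0,115200 rhgb quiet"'
print(drop_serial_console(line))
```

On RHEL 7, a change to /etc/default/grub only takes effect after the grub configuration is regenerated (for example with grub2-mkconfig) and the node is rebooted.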

It seems that the content of /etc/default/grub depends on (is generated from) the host where the overcloud images are built, so this file can look different on different machines.

In tripleo a "vm" element is used when building VM images to make sure that working params are set:
https://github.com/openstack/diskimage-builder/blob/master/elements/vm/finalise.d/51-bootloader#L146

But I can't confirm these params work for baremetal too. After discussing this with ironic folks I re-assigned the BZ to Lucas because this is ironic-related (localboot option).

Comment 8 chris alfonso 2015-06-19 17:04:03 UTC
Can you please retest this now? Also, is this only on virtual machines? If so, I don't think this would be a blocker. Can you confirm?

Comment 9 Alexander Chuzhoy 2015-06-19 17:14:45 UTC
The issue didn't reproduce for me on the last puddle.
Also, the command to deploy the overcloud has changed to:
openstack overcloud postconfig "[Overcloud IP]"

Comment 11 Alexander Chuzhoy 2015-06-19 20:45:56 UTC
Verified:
Environment:
instack-undercloud-2.1.2-1.el7ost.noarch

Resolving based on comment #9.

Comment 13 errata-xmlrpc 2015-08-05 13:52:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549

