rhel-osp-director: The overcloud deployment times out.

Environment:
instack-undercloud-2.1.0-3.el7ost.noarch

Steps to reproduce:
Attempt to deploy the overcloud with "instack-deploy-overcloud --tuskar".

Result:
The deployment times out:
tuskar_templates/net-config-noop.yaml
tuskar_templates/puppet/ceph-cluster-config.yaml
tuskar_templates/extraconfig/post_deploy/default.yaml
+ OVERCLOUD_YAML_PATH=tuskar_templates/plan.yaml
+ ENVIRONMENT_YAML_PATH=tuskar_templates/environment.yaml
+ '[' ''\'''\''' = satellite ']'
+ heat stack-create -t 240 -f tuskar_templates/plan.yaml -e tuskar_templates/environment.yaml overcloud
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 1a8dd9ee-fc87-4fc6-935a-d436ff5de9e3 | overcloud  | CREATE_IN_PROGRESS | 2015-05-28T22:22:31Z |
+--------------------------------------+------------+--------------------+----------------------+
+ tripleo wait_for_stack_ready 220 10 overcloud
Timing out after 2200 seconds:
COMMAND=heat stack-show overcloud | awk '/stack_status / { print $4 }'
OUTPUT=CREATE_IN_PROGRESS

I can't collect logs from the overcloud controller/compute nodes because the public key for logging in as heat-admin doesn't work.

Expected result:
The overcloud deployment should complete.
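Judging from the trace, the 2200-second limit is simply the two numeric arguments of "tripleo wait_for_stack_ready 220 10" multiplied together (220 polls, 10 seconds apart), which is much shorter than the 240-minute Heat-side timeout passed to "heat stack-create -t 240". The sketch below illustrates that kind of poll loop. It is a hypothetical re-implementation for illustration only, not the actual tripleo code; the names wait_for_status and overcloud_status are made up here.

```sh
# Hypothetical poll loop, NOT the real tripleo wait_for_stack_ready.
# Usage: wait_for_status LOOPS SLEEP EXPECTED CMD [ARGS...]
# Runs CMD up to LOOPS times, SLEEP seconds apart, until its stdout equals
# EXPECTED, so it gives up after roughly LOOPS * SLEEP seconds.
wait_for_status() {
    local loops="$1" sleep_s="$2" expected="$3" i=0 output=""
    shift 3
    while [ "$i" -lt "$loops" ]; do
        output=$("$@")
        if [ "$output" = "$expected" ]; then
            return 0
        fi
        # Stop early on a terminal failure state instead of waiting it out.
        case "$output" in
            *_FAILED) echo "Stack entered $output" >&2; return 1 ;;
        esac
        sleep "$sleep_s"
        i=$((i + 1))
    done
    echo "Timing out after $((loops * sleep_s)) seconds: OUTPUT=$output" >&2
    return 1
}

# Hypothetical helper wrapping the status command shown in the trace.
overcloud_status() {
    heat stack-show overcloud | awk '/stack_status / { print $4 }'
}
```

For example, "wait_for_status 220 10 CREATE_COMPLETE overcloud_status" would poll for about the same 2200 seconds as the trace above; raising the loop count is one way to wait longer on slow hardware, though it would not help if the nodes never finish booting.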
I'm hitting the same issue when using prebuilt images: http://download.devel.redhat.com/brewroot/work/tasks/5732/9275732/overcloud-full.tar

SSH to the overcloud nodes doesn't work because cloud-init failed. Here is a suspicious part of /var/log/messages from the overcloud controller node:

May 29 08:42:22 localhost systemd: Started D-Bus System Message Bus.
May 29 08:42:22 localhost systemd: cloud-init-local.service: main process exited, code=exited, status=209/STDOUT
May 29 08:42:22 localhost systemd: Failed to start Initial cloud-init job (pre-networking).
May 29 08:42:22 localhost systemd: Dependency failed for Cloud-config availability.
May 29 08:42:22 localhost systemd: Dependency failed for Execute cloud user/final scripts.
May 29 08:42:22 localhost systemd:
May 29 08:42:22 localhost systemd: Dependency failed for Apply the settings specified in cloud-config.
May 29 08:42:22 localhost systemd:
May 29 08:42:22 localhost systemd:
May 29 08:42:22 localhost systemd: Unit cloud-init-local.service entered failed state.
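In systemd's table of service exit codes (see systemd.exec(5)), 209/STDOUT means systemd failed to set up the unit's standard output before starting the process, which is at least consistent with a console problem rather than a bug in cloud-init itself. A minimal sketch for pulling the failing units and their exit statuses out of a syslog-format log; the function names are hypothetical and the sed patterns assume exactly the message formats quoted above.

```sh
# Print "<unit> <status>" for every "main process exited" message in a
# syslog-format log read from stdin (e.g. /var/log/messages).
unit_exit_status() {
    sed -n 's/.*systemd: \([^:]*\): main process exited, code=exited, status=\([^ ]*\)$/\1 \2/p'
}

# Print the name of every unit that "entered failed state".
failed_units() {
    sed -n 's/.*systemd: Unit \(.*\) entered failed state\./\1/p'
}
```

On an affected node, "unit_exit_status < /var/log/messages" should report "cloud-init-local.service 209/STDOUT" for the messages quoted above.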
Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=974285

If I compare /etc/default/grub on an older VM that works with the new one that fails, I can see that:
GRUB_CMDLINE_LINUX="crashkernel=auto console=tty0 no_timer_check net.ifnames=0 console=ttyS0,115200n8"
is now:
GRUB_CMDLINE_LINUX="crashkernel=auto console=tty0 console=ttyS0,115200 rhgb quiet"

I'm not saying this causes the problem, but https://bugzilla.redhat.com/show_bug.cgi?id=974285#c2 suggests it might be an issue related to the console setting, so this path is worth investigating further.
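To see exactly which kernel arguments differ between the working and failing GRUB_CMDLINE_LINUX values, the two strings can be compared word by word. A minimal sketch, assuming the arguments are passed as the quoted values only (without the GRUB_CMDLINE_LINUX=" wrapper); args_added is a hypothetical helper name.

```sh
# Print (one per line, in order) the arguments present in $2 but not in $1.
# Swap the arguments to list the ones that were removed instead.
args_added() {
    printf '%s\n' "$2" | tr ' ' '\n' | awk -v old="$1" '
        BEGIN { n = split(old, a, " "); for (i = 1; i <= n; i++) seen[a[i]] = 1 }
        NF && !($0 in seen)'
}
```

Applied to the two lines above, the new image gains console=ttyS0,115200, rhgb and quiet, and loses no_timer_check, net.ifnames=0 and console=ttyS0,115200n8.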
This can be worked around by disabling localboot, which results in a kernel cmdline of:
[root@ov-xz5nsxuty5-0-wnvnwultiimx-novacompute-t4ezwe664dah ~]# cat /proc/cmdline
root=UUID=79711e89-b049-4d78-bd01-2b4f5042af08 ro text nofb nomodeset vga=normal

I also confirmed that manually removing all of the console params from the grub cmdline at boot fixes the problem. This resulted in the following cmdline:
[root@ov-bh6esbx7vo-0-7gcluaz4jocg-novacompute-cmi5uggriiux heat-admin]# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-229.4.2.el7.x86_64 root=UUID=79711e89-b049-4d78-bd01-2b4f5042af08 ro crashkernel=auto

I don't know what implications that would have for baremetal, though.
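A quick way to tell whether a booted node still carries a serial console argument is to scan its command line for console=ttyS entries. A minimal sketch; serial_console_args is a hypothetical helper, and it only looks for console=ttyS*, not every console= argument.

```sh
# Print each console=ttyS* argument in a kernel command line, one per line.
# Exit status is 0 if at least one was found, 1 otherwise (grep semantics).
serial_console_args() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep '^console=ttyS'
}
```

On a node, "serial_console_args "$(cat /proc/cmdline)"" prints nothing and returns 1 for both working cmdlines above, but prints console=ttyS0,115200 for the failing GRUB_CMDLINE_LINUX value.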
As Ben wrote, updating GRUB_CMDLINE_LINUX in /etc/default/grub fixes the issue, in particular removing "console=ttyS0,115200" from the line. It seems that the content of /etc/default/grub depends on (is generated from) the host where the overcloud images are built, so this file can differ from machine to machine. In TripleO, the "vm" element is used when building VM images to make sure that working params are set: https://github.com/openstack/diskimage-builder/blob/master/elements/vm/finalise.d/51-bootloader#L146 However, I can't confirm that these params also work on baremetal. After discussing this with the Ironic folks, I reassigned the BZ to Lucas because this is Ironic-related (localboot option).
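The manual fix described above can be sketched as a sed edit restricted to the GRUB_CMDLINE_LINUX line. This is an illustration under assumptions, not the fix that was shipped: strip_serial_console is a hypothetical name, it assumes GNU sed (for -i), and it removes only console=ttyS* arguments, leaving console=tty0 in place. The file path is a parameter so it can be tried on a copy first.

```sh
# Remove console=ttyS<N>[,<options>] arguments from the GRUB_CMDLINE_LINUX
# line of a grub defaults file, keeping a backup as <file>.bak.
# Other lines (including GRUB_CMDLINE_LINUX_DEFAULT) are left untouched.
strip_serial_console() {
    sed -i.bak '/^GRUB_CMDLINE_LINUX=/s/ *console=ttyS[^ "]*//g' "$1"
}
```

Editing /etc/default/grub alone does not change how the node boots; the GRUB configuration still has to be regenerated, e.g. with "grub2-mkconfig -o /boot/grub2/grub.cfg" on a BIOS-booted RHEL 7 image (path assumed, it differs on UEFI systems).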
Can you please retest this now? Also, is this only on virtual machines? If so, I don't think this would be a blocker. Can you confirm?
The issue didn't reproduce for me on the last puddle. Also, the command to deploy the overcloud has changed to: openstack overcloud postconfig "[Overcloud IP]"
Verified:

Environment:
instack-undercloud-2.1.2-1.el7ost.noarch

Resolving based on comment #9.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2015:1549