Description of problem:
Was doing a RHV (self-hosted, 1H) + CFME deployment. The deployment failed with the following error on the CFME deployment task:

  Failed to update root password on appliance. Retries exhausted. Error: execution expired

I tried to ping the CFME VM, but it would not respond. I then went into the RHV engine's UI and saw that the VM existed, so I connected to its console and found that it was in emergency mode. Unfortunately I hit CTRL-D by accident and then it rebooted.

Version-Release number of selected component (if applicable):
QCI-1.1-RHEL-7-20170130.t.0

How reproducible:
First time seeing this.

Steps to Reproduce:
1. Do a RHV (self-hosted, 1H) + CFME deployment.

Actual results:
CFME failed to deploy.

Expected results:
CFME deploys successfully.
Can you run journalctl and try to determine why it went into emergency mode? You should be able to use -b 1 to look at the first boot, or -b -1 to look at the previous boot (if you've only rebooted once, that will be the boot in question). Without more info there isn't much we can do to troubleshoot this.
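For reference, a hedged sketch of the journalctl invocations (this assumes systemd with a persistent journal; without one, only the current boot is recorded and earlier boots can't be inspected):

```shell
# Sketch: inspect earlier boots with journalctl. Assumes a
# persistent journal (Storage=persistent in journald.conf).
inspect_boots() {
    if command -v journalctl >/dev/null 2>&1; then
        # List all recorded boots with their relative offsets.
        journalctl --list-boots --no-pager 2>/dev/null \
            || echo "no persistent journal found"
        # -b -1 selects the previous boot; -p err limits output to
        # error-priority messages, which should include whatever
        # dropped the system into emergency mode.
        journalctl -b -1 -p err --no-pager 2>/dev/null \
            || echo "previous boot not recorded"
    else
        echo "journalctl not available on this system"
    fi
}
inspect_boots
```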
Unfortunately, I could not log into the system to run journalctl. At this point I've already recycled the nodes and am trying to recreate this bug. I'll be sure to capture the journalctl output if it recurs.
When I did this deployment again I got the following error:

  Failed to update admin password on appliance. Error message: Request Timeout

Note that this time the error is about the admin password, not the root password. This time I can log in as root, but I can't log in at the console.
I did notice that this is in the production.log file multiple times:

2017-01-31 10:01:00 [app] [F]
 | ActionView::MissingTemplate (Missing template common/500 with {:locale=>[:en], :formats=>[:json], :variants=>[], :handlers=>[:erb, :builder, :raw, :ruby, :rabl]}. Searched in:
 |   * "/usr/share/foreman/app/views"
 |   * "/opt/theforeman/tfm/root/usr/share/gems/gems/redhat_access-1.0.13/app/views"
 |   * "/opt/theforeman/tfm/root/usr/share/gems/gems/fusor_ui-1.1.20/app/views"
 |   * "/opt/theforeman/tfm/root/usr/share/gems/gems/fusor_server-1.1.34/app/views"
 |   * "/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.90/app/views"
 |   * "/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.7.14.11/app/views"
 |   * "/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_discovery-5.0.0.9/app/views"
 |   * "/opt/theforeman/tfm/root/usr/share/gems/gems/bastion-3.2.0.10/app/views"
 |   * "/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_bootdisk-6.1.0.3/app/views"
 |   * "/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_docker-2.0.1.11/app/views"
 |   * "/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_theme_satellite-0.1.36/app/views"
 |   * "/opt/theforeman/tfm/root/usr/share/gems/gems/apipie-rails-0.3.6/app/views"
 | ):
 | app/controllers/application_controller.rb:295:in `generic_exception'
 | lib/middleware/catch_json_parse_errors.rb:9:in `call'
 |
In the deployment.log this message repeats over and over at the end:

D, [2017-01-31T12:49:19.249164 #19861] DEBUG -- : ================ CloudForms::WaitForConsole poll_external_task method ====================
D, [2017-01-31T12:49:19.250946 #19861] DEBUG -- : ================ CloudForms::WaitForConsole is_up method ====================
W, [2017-01-31T12:51:21.874424 #19861] WARN -- : Net::ReadTimeout
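The Net::ReadTimeout suggests the poll connects but the appliance UI never answers in time. A hedged way to reproduce that check by hand (the hostname below is a placeholder; substitute your appliance's address):

```shell
# Sketch: probe the CFME appliance roughly the way the
# WaitForConsole poll does, to separate "VM unreachable"
# from "web UI slow to answer".
probe_cfme() {
    host="${1:-cfme.example.com}"   # placeholder hostname
    if command -v ping >/dev/null 2>&1; then
        ping -c 1 -W 2 "$host" >/dev/null 2>&1 \
            && echo "ping: host is up" \
            || echo "ping: no reply"
    fi
    if command -v curl >/dev/null 2>&1; then
        # -k: the appliance typically ships a self-signed cert.
        curl -k --connect-timeout 5 --max-time 30 -o /dev/null -s \
            -w "HTTP %{http_code} in %{time_total}s\n" "https://$host/" \
            || echo "console did not respond"
    else
        echo "curl not available"
    fi
}
probe_cfme
```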
I was able to successfully deploy this scenario with the same compose. However, one thing I did differently was to place the Satellite VM's libvirt image file on a separate physical disk. That is, the Satellite VM's image is on one physical disk, and the hypervisor VMs' images are on another physical disk mounted on the same host. Also, please note that I used the Satellite VM's NFS-exported folders for storage.

The reason I separated the Satellite VM's libvirt image file from the rest of the VMs' images is that we've seen intermittent deployment failures for scenarios configured with MORE than just RHV (i.e. RHV+CFME, RHV+OCP, RHV+CFME+OCP), because those seem to cause high disk I/O load. Moving the Satellite VM image to its own physical disk seems to resolve those intermittent failures.
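The I/O-contention theory can be checked directly during a deployment; a hedged sketch using iostat from the sysstat package (the interval/count and the "%util near 100" threshold are illustrative):

```shell
# Sketch: snapshot per-device I/O load during a deployment run.
# Sustained %util near 100 on the disk holding the VM images while
# multiple appliances install would support the contention theory.
io_snapshot() {
    interval="${1:-5}"
    count="${2:-3}"
    if command -v iostat >/dev/null 2>&1; then
        # -x: extended statistics, including %util per device.
        iostat -x "$interval" "$count"
    else
        # Fallback: raw kernel counters (iostat needs sysstat).
        cat /proc/diskstats
    fi
}
io_snapshot 1 2
```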
James, we believe the issue you saw is caused by the environment lacking sufficient resources.
Verified on QCI-1.1-RHEL-7-20170202.t.2. Deploying on nested virt with insufficient CPU resources on the RHV host to cover the host OS + CFME can result in the CFME VM dropping into emergency mode.
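A quick hedged sanity check of what the host has to work with before deploying (in nested virt, the CFME appliance's vCPUs and memory come out of the same pool as the RHV host OS; actual appliance requirements are per its documentation):

```shell
# Sketch: report the host's CPU and memory so they can be compared
# against what the host OS plus the CFME appliance will need.
host_resources() {
    echo "cpus: $(nproc)"
    grep -m1 'model name' /proc/cpuinfo 2>/dev/null
    if command -v free >/dev/null 2>&1; then
        free -h
    else
        grep MemTotal /proc/meminfo
    fi
}
host_resources
```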
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:0335