Description of problem:

Version-Release number of selected component (if applicable):

How reproducible: Always

Steps to Reproduce:
1. Provision a test CF appliance with a reasonable amount of memory (12000 MB)
2. Enable all roles
3. Find that /var/log/tower/setup-<timestamp>.log shows the install failed: "This machine does not have sufficient RAM to run Ansible Tower."
4. Find no errors in CF
5. Give the appliance more memory, then disable and re-enable the role

Actual results:
See in evm.log "ERROR -- : AwesomeSpawn: Failed to start postgresql-9.4.service: Unit not found."

Expected results:
Any/all of:
* Failure of the install is shown in the UI
* Failure of the install prevents attempting to start the service
* Disabling and re-enabling the role after a failed install triggers a reinstall
To repair a wedged CF, I've figured out a way to restart this process:
* Disable the Ansible role (and perhaps a few others to reduce memory usage)
* systemctl restart evmserverd    # to be sure memory is freed up
* rm /etc/tower/SECRET_KEY
* vmdb; bin/rails c
  > db = MiqDatabase.first; db.ansible_secret_key = nil; db.save
* Enable the Ansible role
* Check /var/log/tower/setup-<timestamp>.log for status
This issue is very similar to the findings from https://bugzilla.redhat.com/show_bug.cgi?id=1439783
I'll leave this open, as it is a clearer way to test the fix.

https://github.com/ManageIQ/manageiq/pull/15313
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/42eb2f8deefcf4b5390d7ac31dbaa195f289afcf

commit 42eb2f8deefcf4b5390d7ac31dbaa195f289afcf
Author:     Nick Carboni <ncarboni>
AuthorDate: Mon Jun 5 17:49:01 2017 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Tue Jun 6 10:09:27 2017 -0400

    Handle additional case for /etc/tower/SECRET_KEY

    Previously if we had a value in the database, but the file didn't
    exist on the filesystem .configured? would raise an error when it
    should really just return false and we will write out the value
    from the database to the filesystem.

    https://bugzilla.redhat.com/show_bug.cgi?id=1439783
    https://bugzilla.redhat.com/show_bug.cgi?id=1458886

 lib/embedded_ansible.rb           | 1 +
 spec/lib/embedded_ansible_spec.rb | 6 ++++++
 2 files changed, 7 insertions(+)
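
For anyone following along, the behaviour this commit message describes can be sketched roughly as below. This is only an illustration, not the code from lib/embedded_ansible.rb: the method signature, the `db_key` and `key_path` parameters, and the final comparison of the file contents against the database value are all assumptions made for the example.

```ruby
require "tmpdir"

# Hypothetical stand-in for the check described in the commit message.
# db_key   - the secret key stored in the database (nil if none)
# key_path - path to the key file, e.g. /etc/tower/SECRET_KEY
def configured?(db_key, key_path)
  # No key in the database: setup has never completed.
  return false if db_key.nil?

  # Key in the database but no file on disk: answer "not configured"
  # instead of letting File.read raise Errno::ENOENT.
  return false unless File.exist?(key_path)

  # Assumption: configured means the file matches the database value.
  File.read(key_path).strip == db_key
end
```

With a check shaped like this, a missing key file yields false, so the caller can write the database value back out to the filesystem.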
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/f80e6dd559994125bce144f1ddcd6753324dcc69

commit f80e6dd559994125bce144f1ddcd6753324dcc69
Author:     Nick Carboni <ncarboni>
AuthorDate: Tue Jun 6 08:55:59 2017 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Tue Jun 6 10:09:31 2017 -0400

    Remove the secret key from the database when the setup fails

    This will force `.configured?` to false the next time `.start` is
    run allowing us to retry the configuration.

    Before this change, users would have to blank the SECRET_KEY file
    on the filesystem to force a retry.

    https://bugzilla.redhat.com/show_bug.cgi?id=1439783
    https://bugzilla.redhat.com/show_bug.cgi?id=1458886

 lib/embedded_ansible.rb           | 4 ++++
 spec/lib/embedded_ansible_spec.rb | 12 +++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)
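
The retry behaviour described in this commit could look something like the sketch below. It is a guess at the shape of the change, not the actual diff: `FakeDatabase` and `configure_with_retry_marker` are hypothetical names invented for the example, and the real code may re-raise the error rather than return false.

```ruby
# Hypothetical in-memory stand-in for the MiqDatabase record.
FakeDatabase = Struct.new(:ansible_secret_key)

# Stores the key, then runs the setup step passed as a block.
# If setup raises, the stored key is cleared so that a later
# "configured?" check is false and setup is attempted again.
def configure_with_retry_marker(db, new_key)
  db.ansible_secret_key = new_key
  yield
  true
rescue StandardError
  # Setup failed: drop the key so the next start retries the configuration.
  db.ansible_secret_key = nil
  false
end
```

For example, calling it with a block that raises (say, the RAM check from the original report) leaves `ansible_secret_key` as nil, whereas a successful block leaves the new key in place.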
Verified in 5.9.0.5