Bug 1302868

Summary: Silent crash when creating Database
Product: Red Hat CloudForms Management Engine
Reporter: luke couzens <lcouzens>
Component: Appliance
Assignee: Joe Vlcek <jvlcek>
Status: CLOSED ERRATA
QA Contact: luke couzens <lcouzens>
Severity: medium
Docs Contact:
Priority: medium
Version: 5.5.0
CC: abellott, jhardy, jprause, lcouzens, obarenbo, psavage
Target Milestone: GA
Target Release: 5.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 5.6.0.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-29 15:34:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description luke couzens 2016-01-28 20:15:16 UTC
Description of problem: Running the appliance_console_cli command with the options shown below fails without configuring or setting up the database. Unless postgres is passed to the --username option, the database and its settings won't be configured. The command reports that it is configuring the database and then returns to the shell, but the log file (appliance_console.log) contains errors showing that it did not complete.

Version-Release number of selected component (if applicable): 5.5.2.2


How reproducible: 100%


Steps to Reproduce:
1. Connect to an appliance that has no database set up.
2. Set up the database with the appliance_console_cli command.
3. Check whether the database has been created.
4. Check appliance_console.log.

Actual results: The command reports that it is configuring the database, then fails silently.


Expected results: The command fails with an error message that explains why.


Additional info: 

command used to error:

appliance_console_cli --region 11 --internal --username admin --password test --force-key

command used to create database:

appliance_console_cli --region 11 --internal --username postgress --password test --force-key

Comment 2 Shveta 2016-02-01 21:30:26 UTC
Assigning to add test case

Comment 3 Joe Rafaniello 2016-02-04 15:37:28 UTC
Hi Luke,

Is postgress a typo? (two s's)

"appliance_console_cli --region 11 --internal --username postgress --password test --force-key"

Are you saying that it works with a valid 'postgres' user but silently fails with a non-existent user?

Comment 4 luke couzens 2016-02-05 10:57:54 UTC
See the logs below. We ran the command twice on each of two different hosts. In both cases, the command fails with no error message. From the logs it seems that once a "bad" command is run, it pollutes the filesystem to a point where the database cannot be configured any more until some kind of cleanup is run. Either way, there is no feedback to the user.

ADMIN
[root@host1 ~]# appliance_console_cli --region 11 --internal --username admin --password test
create encryption key
configuring internal database
[root@host1 ~]# appliance_console_cli --region 11 --internal --username admin --password test
configuring internal database
[root@host1 ~]#

=============== /var/www/miq/vmdb/log/appliance_console.log =================
# Logfile created on 2016-02-05 05:30:32 -0500 by logger.rb/47272
I,  [2016-02-05T05:30:32.915752 #10285]  INFO -- :  MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql)  : starting
I,  [2016-02-05T05:30:59.340967 #10285]  INFO -- :  MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql)  : complete
E,  [2016-02-05T05:30:59.626222 #10285] ERROR -- :  MIQ(ApplianceConsole::InternalDatabaseConfiguration#validated)  FATAL:   Peer authentication failed for user "admin"

I,  [2016-02-05T05:38:48.060119 #12862]  INFO -- :  MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql)  : starting
E,  [2016-02-05T05:38:48.157397 #12862] ERROR -- :  MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql)   Command failed: service exit code: 1. Error: Hint: the preferred way to  do this is now "/opt/rh/rh-postgresql94/root/usr/bin/postgresql-setup  --initdb --unit rh-postgresql94-postgresql"
 * Initializing database in '/var/opt/rh/rh-postgresql94/lib/pgsql/data'
ERROR: Data directory /var/opt/rh/rh-postgresql94/lib/pgsql/data is not empty!
ERROR: Initializing database failed, possibly see /var/lib/pgsql/initdb_rh-postgresql94-postgresql.log
. Output: . At: /var/www/miq/vmdb/gems/pending/appliance_console/internal_database_configuration.rb:160:in `run_initdb'
E,  [2016-02-05T05:38:48.249635 #12862] ERROR -- :  MIQ(ApplianceConsole::InternalDatabaseConfiguration#validated)  FATAL:   Peer authentication failed for user "admin"


POSTGRES
[root@host2 ~]# appliance_console_cli --region 11 --internal --username postgres --password test
create encryption key
configuring internal database
[root@host2 ~]# appliance_console_cli --region 11 --internal --username postgres --password test
configuring internal database
[root@host2 ~]# 

=============== /var/www/miq/vmdb/log/appliance_console.log =================
# Logfile created on 2016-02-05 05:34:21 -0500 by logger.rb/47272
I, [2016-02-05T05:34:21.174970 #11657]  INFO -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) : starting
E, [2016-02-05T05:34:38.796596 #11657] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql)  Error: PG::DuplicateObject with message: ERROR:  role "postgres" already exists
. Failed at: /var/www/miq/vmdb/gems/pending/appliance_console/internal_database_configuration.rb:177:in `exec'
E, [2016-02-05T05:34:39.050627 #11657] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#validated)  FATAL:  database "vmdb_production" does not exist

I, [2016-02-05T05:38:31.862734 #12438]  INFO -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) : starting
E, [2016-02-05T05:38:31.942307 #12438] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql)  Command failed: service exit code: 1. Error: Hint: the preferred way to do this is now "/opt/rh/rh-postgresql94/root/usr/bin/postgresql-setup --initdb --unit rh-postgresql94-postgresql"
 * Initializing database in '/var/opt/rh/rh-postgresql94/lib/pgsql/data'
ERROR: Data directory /var/opt/rh/rh-postgresql94/lib/pgsql/data is not empty!
ERROR: Initializing database failed, possibly see /var/lib/pgsql/initdb_rh-postgresql94-postgresql.log
. Output: . At: /var/www/miq/vmdb/gems/pending/appliance_console/internal_database_configuration.rb:160:in `run_initdb'
E, [2016-02-05T05:38:32.025072 #12438] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#validated)  FATAL:  database "vmdb_production" does not exist
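
Since the console shows nothing, the failures above are only visible in appliance_console.log. As a minimal sketch (not part of the appliance tooling), the ERROR entries can be pulled out of such a log with a short Python script. The function name `error_entries` and the regular expression are illustrative assumptions, based only on the line format visible in the excerpts above.

```python
import re

# Assumed line shape, taken from the log excerpts in this report, e.g.
# E, [2016-02-05T05:34:39.050627 #11657] ERROR -- : MIQ(...#validated)  FATAL: ...
LOG_LINE = re.compile(
    r"^(?P<sev>[A-Z]),\s+\[(?P<ts>\S+)\s+#(?P<pid>\d+)\]\s+"
    r"(?P<level>\w+)\s+--\s+:\s+(?P<msg>.*)$"
)


def error_entries(lines):
    """Return (timestamp, pid, message) for every ERROR-level log line.

    Continuation lines (such as the multi-line initdb output) do not match
    the pattern and are skipped.
    """
    found = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") == "ERROR":
            found.append(
                (match.group("ts"), int(match.group("pid")), match.group("msg").strip())
            )
    return found


# Two lines copied from the host2 log above, used as sample input.
SAMPLE = [
    'I, [2016-02-05T05:34:21.174970 #11657]  INFO -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) : starting',
    'E, [2016-02-05T05:34:39.050627 #11657] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#validated)  FATAL:  database "vmdb_production" does not exist',
]

if __name__ == "__main__":
    for ts, pid, msg in error_entries(SAMPLE):
        print(ts, pid, msg)
```

To check a real appliance, the same function could be fed the lines of /var/www/miq/vmdb/log/appliance_console.log instead of `SAMPLE`.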

How is this command supposed to work?

Comment 5 Joe Rafaniello 2016-02-05 15:09:33 UTC
Thanks for the details Luke and I'm sorry it's not working for you.

> From the logs it seems that once a "bad" command is run, it pollutes the filesystem to a point where the database cannot be configured any more until some kind of cleanup is run. Either way there is no feedback to the user.

That is correct. The setup of the volume group, physical volume, logical volume, formatting, pg init, database migration, etc. is not atomic, so if it fails partway through, it cannot easily be re-attempted.
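
To illustrate what "not atomic" means here, the following is a minimal Python sketch, not the appliance console code. It runs a list of setup steps in order and, on failure, reports which step failed and which earlier steps had already completed (and so are left in place). The names `run_steps` and `StepFailed`, and the step labels in the demo, are hypothetical.

```python
class StepFailed(Exception):
    """Raised when one step of a multi-step setup fails."""

    def __init__(self, step, completed, cause):
        super().__init__(f"step '{step}' failed: {cause}")
        self.step = step            # name of the step that failed
        self.completed = completed  # steps that had already run and are NOT undone
        self.cause = cause


def run_steps(steps):
    """Run (name, callable) pairs in order; stop at the first failure.

    There is no rollback: anything done by earlier steps stays on disk,
    which is why a second attempt can fail in a different place.
    """
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            raise StepFailed(name, list(completed), exc) from exc
        completed.append(name)
    return completed


def ok():
    """Stand-in for a step that succeeds."""


def fail():
    """Stand-in for a step that fails (message is illustrative)."""
    raise RuntimeError("data directory is not empty")


if __name__ == "__main__":
    try:
        run_steps([("logical volume", ok), ("pg init", fail), ("database migration", ok)])
    except StepFailed as err:
        print(err)
        print("left in place:", err.completed)
```

Making the whole process retryable would need either an undo action per step or steps that detect and skip work already done; this sketch only makes the partial state visible.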

I'm trying to understand if it always fails initially.

Could you help me understand what this means:

>command used to error:
>
>appliance_console_cli --region 11 --internal --username admin --password test --force-key
>
>command used to create database:
>
>appliance_console_cli --region 11 --internal --username postgress --password test --force-key

1) Are you saying that the "admin" failed initially and "postgress" was successful initially?

2) Did this work differently with prior versions?  I don't recall much changing in this area since 5.4 but I could be wrong.

3) Finally, is there a use case driving the need to use the CLI over the user interface?  Is there something we are missing in the UI?

Thank you for the information!
Joe

Comment 7 Joe Rafaniello 2016-02-05 18:43:46 UTC
Great, thanks Luke.

We'll try to recreate it and see what we can do about it.

My hunch is we can fix the "silently failing" thing in this BZ, which will probably lead us to fixing why this is happening.  The latter appears to be a change in behavior in the scl postgresql 94 command line options.

We will probably have to do much more work to get the whole process retryable on error.  It's a valid problem, but there haven't been many requests for this.
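
As a rough illustration of the "silently failing" part, here is a minimal Python sketch of one way to stop errors from reaching only the log file: attach two handlers to the same logger, so every message goes to the log and errors are also printed to the console. This is an assumption about the general approach, not the actual change made in the appliance console; `build_logger` and the logger name are hypothetical.

```python
import io
import logging
import sys


def build_logger(log_stream, console_stream=sys.stderr, name="appliance_console_sketch"):
    """Return a logger that writes everything to log_stream and errors to console_stream.

    In real use log_stream would be an open log file; plain streams are used
    here so that the sketch is easy to run and inspect.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep messages out of the root logger

    # Full record of the run, INFO and above.
    file_handler = logging.StreamHandler(log_stream)
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(logging.Formatter("%(levelname)s -- : %(message)s"))

    # Only errors reach the user's terminal.
    console_handler = logging.StreamHandler(console_stream)
    console_handler.setLevel(logging.ERROR)
    console_handler.setFormatter(logging.Formatter("%(message)s"))

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger


if __name__ == "__main__":
    log_buffer = io.StringIO()
    log = build_logger(log_buffer)
    log.info("initialize_postgresql : starting")
    log.error('Peer authentication failed for user "admin"')
```

With this arrangement the user who ran the "admin" command above would have seen the authentication failure on the terminal instead of a bare prompt.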

Comment 9 CFME Bot 2016-02-12 19:20:47 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/c8dafe4c726bed39057a521a2c9122703d1f2c6f

commit c8dafe4c726bed39057a521a2c9122703d1f2c6f
Author:     Joe VLcek <jvlcek>
AuthorDate: Thu Feb 11 17:05:28 2016 -0500
Commit:     Joe VLcek <jvlcek>
CommitDate: Thu Feb 11 17:09:20 2016 -0500

    Report info and errors messages from appliance_console_cli
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1302868

 gems/pending/appliance_console/logging.rb                  | 14 ++++++--------
 gems/pending/appliance_console/utilities.rb                |  1 -
 .../spec/appliance_console/database_configuration_spec.rb  |  5 +++--
 gems/pending/spec/appliance_console/logging_spec.rb        |  6 +++++-
 4 files changed, 14 insertions(+), 12 deletions(-)

Comment 11 CFME Bot 2016-02-15 15:44:09 UTC
Detected commit referencing this ticket while ticket status is MODIFIED.

Comment 12 luke couzens 2016-04-27 20:54:08 UTC
Verified in 5.6.0.4-beta2.3

Comment 14 errata-xmlrpc 2016-06-29 15:34:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348