Bug 1302868
Summary: | Silent crash when creating Database | | |
---|---|---|---|
Product: | Red Hat CloudForms Management Engine | Reporter: | luke couzens <lcouzens> |
Component: | Appliance | Assignee: | Joe Vlcek <jvlcek> |
Status: | CLOSED ERRATA | QA Contact: | luke couzens <lcouzens> |
Severity: | medium | Priority: | medium |
Version: | 5.5.0 | CC: | abellott, jhardy, jprause, lcouzens, obarenbo, psavage |
Target Milestone: | GA | Target Release: | 5.6.0 |
Hardware: | Unspecified | OS: | Unspecified |
Fixed In Version: | 5.6.0.0 | Doc Type: | Bug Fix |
Last Closed: | 2016-06-29 15:34:57 UTC | Type: | Bug |
Description
luke couzens
2016-01-28 20:15:16 UTC
Assigning to add test case

Hi Luke,

Is postgress a typo? (two s's)

    "appliance_console_cli --region 11 --internal --username postgress --password test --force-key"

Are you saying that it works with a valid 'postgres' user but fails silently with a non-existent user?

See the logs below. We tried running the command twice on each of two different hosts. In both cases, the command fails with no error message. From the logs it seems that once a "bad" command is run, it pollutes the filesystem to the point where the database cannot be configured any more until some kind of cleanup is run. Either way there is no feedback to the user.

ADMIN

    [root@host1 ~]# appliance_console_cli --region 11 --internal --username admin --password test
    create encryption key
    configuring internal database
    [root@host1 ~]# appliance_console_cli --region 11 --internal --username admin --password test
    configuring internal database
    [root@host1 ~]#

    =============== /var/www/miq/vmdb/log/appliance_console.log =================
    # Logfile created on 2016-02-05 05:30:32 -0500 by logger.rb/47272
    I, [2016-02-05T05:30:32.915752 #10285] INFO -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) : starting
    I, [2016-02-05T05:30:59.340967 #10285] INFO -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) : complete
    E, [2016-02-05T05:30:59.626222 #10285] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#validated) FATAL: Peer authentication failed for user "admin"
    I, [2016-02-05T05:38:48.060119 #12862] INFO -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) : starting
    E, [2016-02-05T05:38:48.157397 #12862] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) Command failed: service exit code: 1.
    Error: Hint: the preferred way to do this is now "/opt/rh/rh-postgresql94/root/usr/bin/postgresql-setup --initdb --unit rh-postgresql94-postgresql"
     * Initializing database in '/var/opt/rh/rh-postgresql94/lib/pgsql/data'
    ERROR: Data directory /var/opt/rh/rh-postgresql94/lib/pgsql/data is not empty!
    ERROR: Initializing database failed, possibly see /var/lib/pgsql/initdb_rh-postgresql94-postgresql.log .
    Output: . At: /var/www/miq/vmdb/gems/pending/appliance_console/internal_database_configuration.rb:160:in `run_initdb'
    E, [2016-02-05T05:38:48.249635 #12862] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#validated) FATAL: Peer authentication failed for user "admin"

POSTGRES

    [root@host2 ~]# appliance_console_cli --region 11 --internal --username postgres --password test
    create encryption key
    configuring internal database
    [root@host2 ~]# appliance_console_cli --region 11 --internal --username postgres --password test
    configuring internal database
    [root@host2 ~]#

    =============== /var/www/miq/vmdb/log/appliance_console.log =================
    # Logfile created on 2016-02-05 05:34:21 -0500 by logger.rb/47272
    I, [2016-02-05T05:34:21.174970 #11657] INFO -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) : starting
    E, [2016-02-05T05:34:38.796596 #11657] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) Error: PG::DuplicateObject with message: ERROR: role "postgres" already exists .
    Failed at: /var/www/miq/vmdb/gems/pending/appliance_console/internal_database_configuration.rb:177:in `exec'
    E, [2016-02-05T05:34:39.050627 #11657] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#validated) FATAL: database "vmdb_production" does not exist
    I, [2016-02-05T05:38:31.862734 #12438] INFO -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) : starting
    E, [2016-02-05T05:38:31.942307 #12438] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) Command failed: service exit code: 1.
    Error: Hint: the preferred way to do this is now "/opt/rh/rh-postgresql94/root/usr/bin/postgresql-setup --initdb --unit rh-postgresql94-postgresql"
     * Initializing database in '/var/opt/rh/rh-postgresql94/lib/pgsql/data'
    ERROR: Data directory /var/opt/rh/rh-postgresql94/lib/pgsql/data is not empty!
    ERROR: Initializing database failed, possibly see /var/lib/pgsql/initdb_rh-postgresql94-postgresql.log .
    Output: . At: /var/www/miq/vmdb/gems/pending/appliance_console/internal_database_configuration.rb:160:in `run_initdb'
    E, [2016-02-05T05:38:32.025072 #12438] ERROR -- : MIQ(ApplianceConsole::InternalDatabaseConfiguration#validated) FATAL: database "vmdb_production" does not exist

How is this command supposed to work?

Thanks for the details, Luke, and I'm sorry it's not working for you.

> From the logs it seems that once a "bad" command is run, it pollutes the
> filesystem to a point where the database cannot be configured any more until
> some kind of cleanup is run. Either way there is no feedback to the user.

That is correct. The setup of the volume group, physical volume, logical volume, formatting, pg init, database migration, etc. is not atomic, so if it fails partway through, it cannot easily be re-attempted.

I'm trying to understand if it always fails initially.
Could you help me understand what this means:

> command used to error:
>
> appliance_console_cli --region 11 --internal --username admin --password test --force-key
>
> command used to create database:
>
> appliance_console_cli --region 11 --internal --username postgress --password test --force-key

1) Are you saying that "admin" failed initially and "postgress" succeeded initially?

2) Did this work differently with prior versions? I don't recall much changing in this area since 5.4 but I could be wrong.

3) Finally, is there a use case driving the need to use the CLI over the user interface? Is there something we are missing in the UI?

Thank you for the information!

Joe

Great, thanks Luke. We'll try to recreate it and see what we can do about it. My hunch is we can fix the "silently failing" thing in this BZ, which will probably lead us to fixing why this is happening. The latter appears to be a change in behavior in the scl postgresql 94 command line options.

We will probably have to do much more work to get the whole process retryable on error. It's a valid problem, but there haven't been many requests to do this.

New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/c8dafe4c726bed39057a521a2c9122703d1f2c6f

    commit c8dafe4c726bed39057a521a2c9122703d1f2c6f
    Author: Joe VLcek <jvlcek>
    AuthorDate: Thu Feb 11 17:05:28 2016 -0500
    Commit: Joe VLcek <jvlcek>
    CommitDate: Thu Feb 11 17:09:20 2016 -0500

    Report info and errors messages from appliance_console_cli

    https://bugzilla.redhat.com/show_bug.cgi?id=1302868

    gems/pending/appliance_console/logging.rb | 14 ++++++--------
    gems/pending/appliance_console/utilities.rb | 1 -
    .../spec/appliance_console/database_configuration_spec.rb | 5 +++--
    gems/pending/spec/appliance_console/logging_spec.rb | 6 +++++-
    4 files changed, 14 insertions(+), 12 deletions(-)

Detected commit referencing this ticket while ticket status is MODIFIED.
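
For readers following along, here is a rough illustration of the "silently failing" problem and the general shape of a fix: every message that goes to the log file is also shown on the terminal, and a failed step yields a non-zero exit code. This is only a sketch in Python, not the code from the commit above (which changed the Ruby file logging.rb). The names `Reporter`, `run_step` and `failing_initdb` are hypothetical, and the non-zero exit code is an assumption on my part, since the thread only mentions the missing messages.

```python
import io
import sys


class Reporter:
    """Hypothetical helper: each message is written to the log AND shown to the user."""

    def __init__(self, log_stream, out=None, err=None):
        self.log_stream = log_stream  # stands in for appliance_console.log
        self.out = out if out is not None else sys.stdout
        self.err = err if err is not None else sys.stderr

    def info(self, message):
        self.log_stream.write(f"INFO -- : {message}\n")
        print(message, file=self.out)

    def error(self, message):
        self.log_stream.write(f"ERROR -- : {message}\n")
        # Without this line the failure would be visible only in the log file.
        print(f"ERROR: {message}", file=self.err)


def run_step(reporter, name, step):
    """Run one setup step and return an exit code (0 = success, 1 = failure)."""
    reporter.info(f"{name}: starting")
    try:
        step()
    except Exception as exc:
        reporter.error(f"{name}: {exc}")
        return 1
    reporter.info(f"{name}: complete")
    return 0


def failing_initdb():
    # Stand-in for the initdb failure recorded in the logs earlier in this bug.
    raise RuntimeError("Data directory is not empty!")


log = io.StringIO()
exit_code = run_step(Reporter(log), "initialize_postgresql", failing_initdb)
# A real CLI would finish with sys.exit(exit_code) so that callers see the failure.
```

With something along these lines, the second run in the logs above would have printed the "Data directory ... is not empty!" error on the terminal instead of returning to the prompt with no feedback.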
Verified in 5.6.0.4-beta2.3

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348