Bug 1696237

Summary:

unable to run fix_auth on database with "stack too deep"

Product:

Red Hat CloudForms Management Engine

Reporter:

Felix Dewaleyne <fdewaley>

Component:

Appliance

Assignee:

Keenan Brock <kbrock>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Jaroslav Henner <jhenner>

Severity:

high

Docs Contact:

Red Hat CloudForms Documentation <cloudforms-docs>

Priority:

high

Version:

5.9.7

CC:

abellott, dmetzger, fdewaley, kbrock, mshriver, ncarboni, obarenbo

Keywords:

TestOnly, ZStream

Target Release:

5.11.0

Hardware:

All

OS:

All

Fixed In Version:

5.11.0.1

Clones:

1702072

Last Closed:

2019-12-13 14:54:30 UTC

Type:

Bug

Category:

Bug

Cloudforms Team:

CFME Core

Bug Blocks:

1702072

Attachments:

Description	Flags
fix_auth output	none

Description Felix Dewaleyne 2019-04-04 11:51:37 UTC

Description of problem:
While importing a database to investigate an issue where objects couldn't be deleted, I ran into a "stack level too deep" trace when running fix_auth.

Version-Release number of selected component (if applicable):
5.9.7

How reproducible:
Always

Steps to Reproduce:
1. Create an appliance based on 5.9.7.
2. Create the database using appliance_console and /dev/vdb.
3. Destroy and recreate the database to prepare for importing the dump.
4. Import the database.
5. Fix authentication in database.yml.
6. Fix authentication in the database.

Actual results:
fixing authentications.password, auth_key
fixing miq_databases.registration_http_proxy_server, session_secret_token, csrf_secret_token
fixing miq_ae_values.value
fixing miq_ae_fields.default_value
fixing miq_requests.options
fixing miq_request_tasks.options
bundler: failed to load command: tools/fix_auth.rb (tools/fix_auth.rb)
SystemStackError: stack level too deep
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `block in walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
[...]
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:29:in `block (2 levels) in walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:28:in `each'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:28:in `each_with_index'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:28:in `block in walk'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
  /var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
  /var/www/miq/vmdb/tools/fix_auth/auth_config_model.rb:38:in `recrypt'
  /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:46:in `block in fix_passwords'
  /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:44:in `each'
  /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:44:in `fix_passwords'
  /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:85:in `block in run'
  /opt/rh/cfme-gemset/gems/activerecord-5.0.7.1/lib/active_record/relation/delegation.rb:38:in `each'
  /opt/rh/cfme-gemset/gems/activerecord-5.0.7.1/lib/active_record/relation/delegation.rb:38:in `each'
  /var/www/miq/vmdb/tools/fix_auth/auth_model.rb:84:in `run'
  /var/www/miq/vmdb/tools/fix_auth/fix_auth.rb:63:in `block in fix_database_passwords'
  /var/www/miq/vmdb/tools/fix_auth/fix_auth.rb:62:in `each'
  /var/www/miq/vmdb/tools/fix_auth/fix_auth.rb:62:in `fix_database_passwords'
  /var/www/miq/vmdb/tools/fix_auth/fix_auth.rb:86:in `run'
  /var/www/miq/vmdb/tools/fix_auth/cli.rb:37:in `run'
  /var/www/miq/vmdb/tools/fix_auth/cli.rb:41:in `run'
  tools/fix_auth.rb:26:in `<top (required)>'
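
For illustration, a minimal Ruby sketch of why a recursive walk over an options hash ends in `SystemStackError` once the hash contains itself. This is hypothetical code, not the walker in `lib/vmdb/settings_walker.rb`; `naive_walk` and the sample `options` data are invented for this example.

```ruby
# Hypothetical naive walker: yields every key/value pair in nested Hashes and
# Arrays. It has no cycle detection, so a self-referencing structure makes it
# recurse until Ruby raises SystemStackError.
def naive_walk(obj, &block)
  case obj
  when Hash
    obj.each do |key, value|
      yield key, value
      naive_walk(value, &block)
    end
  when Array
    obj.each { |value| naive_walk(value, &block) }
  end
end

# Invented sample data: a credentials hash that points back at its parent.
options = { "aws" => { "userid" => "admin", "password" => "secret" } }
options["aws"]["parent"] = options

result =
  begin
    naive_walk(options) { |_key, _value| }
    :finished
  rescue SystemStackError
    :stack_level_too_deep
  end
puts result # prints stack_level_too_deep
```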


Expected results:
Authentication is fixed as expected.

Additional info:
I used the sbr-cfme lab to create the appliance every time.
rake db:migrate doesn't seem to think the DB needs any migration.
Trying again with 5.9.9 does not change the observed behaviour.
Trying with a new private key doesn't change the behaviour either.
Exact commands are in the private notes.
The original customer issue is that they cannot delete a container provider; after the first attempt, it cannot be edited either.

Comment 3 Felix Dewaleyne 2019-04-04 12:06:25 UTC

Created attachment 1551834
fix_auth output

Output of `bundle exec tools/fix_auth.rb --v2 --invalid bogus`

Comment 4 Felix Dewaleyne 2019-04-04 12:09:40 UTC

Note: the region number is 34. Usually this is resolved by running fix_auth, but running fix_auth with the correct region doesn't change anything.

Comment 5 CFME Bot 2019-04-04 19:36:07 UTC

https://github.com/ManageIQ/manageiq/pull/18631

Comment 6 Keenan Brock 2019-04-05 12:58:50 UTC

This was caused by a bad miq_request_tasks record.

The customer inserted Amazon credentials with a self-reference (YAML allows this), which resulted in infinite recursion.
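
A short Ruby sketch of how YAML can express such a self-reference: an anchor (`&creds`) on a mapping plus an alias (`*creds`) inside that same mapping produces a Hash that contains itself once loaded. The keys and values are invented, and the sketch assumes Ruby 2.6 or later, where Psych's `safe_load` accepts the `aliases:` keyword.

```ruby
require "yaml"

# Invented example data: the "self" key aliases the mapping it belongs to.
yaml = <<~DOC
  aws: &creds
    userid: admin
    password: secret
    self: *creds
DOC

# safe_load rejects aliases unless aliases: true is passed.
options = Psych.safe_load(yaml, aliases: true)

creds = options["aws"]
puts creds["self"].equal?(creds) # prints true
```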

I have added code to detect one or two forms of this recursion.
I have also patched fix_auth on this appliance so Felix can continue with the problem he was originally investigating.


Just waiting for a merge and backport.

Comment 7 CFME Bot 2019-04-05 15:25:58 UTC

New commit detected on ManageIQ/manageiq/master:

https://github.com/ManageIQ/manageiq/commit/2eabc44f1a1874c3707964279db8aa7c3793bf1e
commit 2eabc44f1a1874c3707964279db8aa7c3793bf1e
Author:     Keenan Brock <keenan>
AuthorDate: Thu Apr  4 15:21:20 2019 -0400
Commit:     Keenan Brock <keenan>
CommitDate: Thu Apr  4 15:21:20 2019 -0400

    fix_auth now handles recursive settings

    situation:

    1. For one customer, miq_request_tasks has an options hash with
    recursive values.
    2. fix_auth recurses all the options looking for passwords to convert

    before:
    it recurses forever

    after:
    it now detects the recursion and does not go forever

    NOTE: this only detects very simple recursive cases.

    https://bugzilla.redhat.com/show_bug.cgi?id=1696237
 lib/vmdb/settings_walker.rb | 3 +-
 spec/lib/vmdb/settings_spec.rb | 77 +-
 2 files changed, 53 insertions(+), 27 deletions(-)
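
The commit message notes that only very simple recursive cases are detected. One more general approach, sketched here in Ruby under the assumption that the options are plain nested Hashes and Arrays, is to remember which containers are on the current path (compared by object identity) and never descend into one that is already being walked. `safe_walk` is a hypothetical name; this is not the code from the commit.

```ruby
# Hypothetical cycle-safe walker: yields every key/value pair in nested Hashes
# and Arrays, but skips any container that is already on the current path.
def safe_walk(obj, ancestors = {}.compare_by_identity, &block)
  return unless obj.is_a?(Hash) || obj.is_a?(Array)
  return if ancestors.key?(obj) # cycle: obj is already being walked

  ancestors[obj] = true
  begin
    if obj.is_a?(Hash)
      obj.each do |key, value|
        yield key, value
        safe_walk(value, ancestors, &block)
      end
    else
      obj.each { |value| safe_walk(value, ancestors, &block) }
    end
  ensure
    ancestors.delete(obj) # leaving obj: it is no longer an ancestor
  end
end

# Invented sample data with a self-reference, as in the bad record.
options = { "aws" => { "userid" => "admin", "password" => "secret" } }
options["aws"]["self"] = options["aws"]

keys = []
safe_walk(options) { |key, _value| keys << key }
p keys # prints ["aws", "userid", "password", "self"]
```

Tracking only the current path means a sub-hash shared by two keys without forming a cycle is still walked under each key; a single visited set would walk each container only once, at the cost of skipping its later occurrences.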

Comment 8 Keenan Brock 2019-04-05 17:27:38 UTC

Merged.


Of note: to "fix" region 34, just delete the REGION file and everything will work.

Comment 11 Jaroslav Henner 2019-06-13 13:47:50 UTC

The DB dump I have seen in the Red Hat Customer Portal is 1 GB. Verifying this, and automating a test for it, will certainly take considerable resources. Can I get reproduction steps that don't require a database that large?

Comment 12 Jaroslav Henner 2019-07-23 15:49:26 UTC

OK, I tried to restore the DB on a larger VM I made on 5.11, and it failed as follows:
[root@dhcp-8-198-123 vmdb]#  pg_restore --no-password --dbname vmdb_production --verbose --exit-on-error /net/$DB_DUMPS_IP/srv/export/customer_db_dump_migrated 
pg_restore: connecting to database for restore
pg_restore: creating SCHEMA "public"
pg_restore: creating COMMENT "SCHEMA public"
pg_restore: creating SCHEMA "repmgr_miq_region_34_cluster"
pg_restore: creating EXTENSION "plpgsql"
pg_restore: creating COMMENT "EXTENSION plpgsql"
pg_restore: creating FUNCTION "public.metric_rollups_inheritance_after()"
pg_restore: creating FUNCTION "public.metric_rollups_inheritance_before()"
pg_restore: creating FUNCTION "public.metrics_inheritance_after()"
pg_restore: creating FUNCTION "public.metrics_inheritance_before()"
pg_restore: creating FUNCTION "repmgr_miq_region_34_cluster.repmgr_get_last_standby_location()"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 825; 1255 29576 FUNCTION repmgr_get_last_standby_location() root
pg_restore: [archiver (db)] could not execute query: ERROR:  could not access file "$libdir/repmgr_funcs": No such file or directory
    Command was: CREATE FUNCTION repmgr_miq_region_34_cluster.repmgr_get_last_standby_location() RETURNS text
    LANGUAGE c STRICT
    AS '$libdir/repmgr_funcs', 'repmgr_get_last_standby_location';

Comment 13 Jaroslav Henner 2019-07-24 11:38:18 UTC

I restored the DB without --exit-on-error; there were 10 errors. Then I ran fix_auth. It didn't reproduce on 5.11.0.15, but it also didn't reproduce on cfme-5.10.3.3. I don't know what I am doing wrong.

# pg_restore -U root -j 4 -d vmdb_production /net/$NFS_SHARE/srv/export/customer_db_dump
# fix_auth --databaseyml
# fix_auth --v2 --invalid bogus

Comment 14 Jaroslav Henner 2019-07-24 13:38:14 UTC

I couldn't reproduce it. The fix_auth tool does work, so hopefully it is OK to mark this VERIFIED.

Comment 15 Jaroslav Henner 2019-11-01 16:56:51 UTC

Setting qe_test_coverage- because I was unable to reproduce this.