Description of problem:
While importing a database to look into an issue where objects couldn't be deleted, I ran into a "stack too deep" trace running fix_auth.

Version-Release number of selected component (if applicable):
5.9.7

How reproducible:
All the time

Steps to Reproduce:
1. Create an appliance based on 5.9.7
2. Create a database using appliance_console and /dev/vdb
3. Destroy and recreate the database to get ready for importing the dump
4. Import the database
5. Fix authentication in database.yml
6. Fix authentication in the database

Actual results:
fixing authentications.password, auth_key
fixing miq_databases.registration_http_proxy_server, session_secret_token, csrf_secret_token
fixing miq_ae_values.value
fixing miq_ae_fields.default_value
fixing miq_requests.options
fixing miq_request_tasks.options
bundler: failed to load command: tools/fix_auth.rb (tools/fix_auth.rb)
SystemStackError: stack level too deep
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `block in walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
[...]
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:26:in `block in walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:29:in `block (2 levels) in walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:28:in `each'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:28:in `each_with_index'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:28:in `block in walk'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `each'
/var/www/miq/vmdb/lib/vmdb/settings_walker.rb:19:in `walk'
/var/www/miq/vmdb/tools/fix_auth/auth_config_model.rb:38:in `recrypt'
/var/www/miq/vmdb/tools/fix_auth/auth_model.rb:46:in `block in fix_passwords'
/var/www/miq/vmdb/tools/fix_auth/auth_model.rb:44:in `each'
/var/www/miq/vmdb/tools/fix_auth/auth_model.rb:44:in `fix_passwords'
/var/www/miq/vmdb/tools/fix_auth/auth_model.rb:85:in `block in run'
/opt/rh/cfme-gemset/gems/activerecord-5.0.7.1/lib/active_record/relation/delegation.rb:38:in `each'
/opt/rh/cfme-gemset/gems/activerecord-5.0.7.1/lib/active_record/relation/delegation.rb:38:in `each'
/var/www/miq/vmdb/tools/fix_auth/auth_model.rb:84:in `run'
/var/www/miq/vmdb/tools/fix_auth/fix_auth.rb:63:in `block in fix_database_passwords'
/var/www/miq/vmdb/tools/fix_auth/fix_auth.rb:62:in `each'
/var/www/miq/vmdb/tools/fix_auth/fix_auth.rb:62:in `fix_database_passwords'
/var/www/miq/vmdb/tools/fix_auth/fix_auth.rb:86:in `run'
/var/www/miq/vmdb/tools/fix_auth/cli.rb:37:in `run'
/var/www/miq/vmdb/tools/fix_auth/cli.rb:41:in `run'
tools/fix_auth.rb:26:in `<top (required)>'

Expected results:
Authentication is fixed as expected.

Additional info:
I used the sbr-cfme lab to create the appliance every time. rake db:migrate doesn't seem to think the DB needs any migrations. Trying again with 5.9.9 does not change the observed behaviour. Trying with a new private key doesn't change the behaviour either. Exact commands are in private notes.

The original customer issue is that they cannot delete a container provider; after the first attempt it cannot be edited either.
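To make the trace easier to read: the walk/each/"block in walk" frames repeat because the walker keeps re-entering the same hash. A minimal illustrative Ruby sketch (not the actual SettingsWalker code) of how a depth-first walk with no cycle detection blows the stack on a self-referencing hash:

def walk(settings, &block)
  settings.each do |key, value|
    yield key, value
    walk(value, &block) if value.kind_of?(Hash)  # descends forever once a cycle is reached
  end
end

options = {"credentials" => {"userid" => "admin"}}
options["credentials"]["parent"] = options       # the hash now (indirectly) contains itself

begin
  walk(options) { |key, _value| key }
rescue SystemStackError => err
  puts err.message                               # "stack level too deep"
end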
Created attachment 1551834 [details]
fix_auth output

Output of `bundle exec tools/fix_auth.rb --v2 --invalid bogus`
Note: the region is number 34. Usually that gets fixed after fix_auth, but running fix_auth with the correct region doesn't change anything.
https://github.com/ManageIQ/manageiq/pull/18631
This was caused by a bad miq_request_tasks record. The customer inserted Amazon credentials with a self reference (you can do this in YAML), which resulted in infinite recursion. I have added code to detect one or two cases of this recursion. I have also patched fix_auth on this appliance so Felix can continue with the problem he was originally investigating. Just waiting for a merge and backport.
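For illustration, a hedged sketch of the kind of self-referencing YAML involved (the field names and values are made up, not the customer's data): an anchor aliased back into itself loads as a hash that contains itself.

require "yaml"

doc = <<~YAML
  --- &creds
  userid: admin
  auth_key: "v2:{REDACTED}"
  nested: *creds
YAML

# Psych 4+ rejects aliases unless asked; the older Psych shipped on these
# appliances accepted them by default, so a plain YAML.load(doc) was enough.
creds = YAML.load(doc, aliases: true)
puts creds["nested"].equal?(creds)   # true: the structure references itself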
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/2eabc44f1a1874c3707964279db8aa7c3793bf1e

commit 2eabc44f1a1874c3707964279db8aa7c3793bf1e
Author:     Keenan Brock <keenan>
AuthorDate: Thu Apr 4 15:21:20 2019 -0400
Commit:     Keenan Brock <keenan>
CommitDate: Thu Apr 4 15:21:20 2019 -0400

    fix_auth now handles recursive settings

    Situation:
    1. For one customer, miq_request_tasks has an options hash with recursive values.
    2. fix_auth recurses all the options looking for passwords to convert.

    Before: it recurses forever.
    After: it now detects the recursion and does not go forever.

    NOTE: this only detects very simple recursive cases.

    https://bugzilla.redhat.com/show_bug.cgi?id=1696237

 lib/vmdb/settings_walker.rb    |  3 +-
 spec/lib/vmdb/settings_spec.rb | 77 +-
 2 files changed, 53 insertions(+), 27 deletions(-)
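Conceptually, the fix amounts to remembering which collections the walker has already entered and refusing to descend into them a second time. A simplified Ruby sketch of that idea (illustrative only, not the patched SettingsWalker, which also walks arrays and tracks key paths):

def walk(settings, seen = [], &block)
  return if seen.any? { |visited| visited.equal?(settings) }  # already walking this object: stop
  seen << settings

  settings.each do |key, value|
    yield key, value
    walk(value, seen, &block) if value.kind_of?(Hash)
  end
end

options = {"credentials" => {"userid" => "admin"}}
options["credentials"]["parent"] = options

walk(options) { |key, _value| puts key }   # terminates instead of raising SystemStackError

With a guard like this, the self-referencing options hash is walked once and the walk stops, rather than recursing until the stack is exhausted.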
Merged. Of note: to "fix" a region of 34, just delete the REGION file and everything will work great.
The DB dump I have seen in the Red Hat Customer Portal is 1G. It will certainly take a lot of resources for verification, as well as for automating a test for this. Can I get reproduction steps that do not involve a DB that big?
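One possible way to avoid the 1G dump would be to seed a single bad record by hand and run fix_auth against it. This is an untested, hypothetical sketch for a Rails console on the appliance (it assumes MiqRequestTask serializes its options column, as the backtrace suggests; all attribute values here are invented):

opts = {"credentials" => {"userid" => "admin", "password" => "v2:{REDACTED}"}}
opts["credentials"]["self"] = opts             # the self reference described above

task = MiqRequestTask.new(:state => "finished", :status => "Ok")  # invented attributes
task.options = opts
task.save!(:validate => false)                 # skip validations for the synthetic record

# Then run the same command as before:
#   bundle exec tools/fix_auth.rb --v2 --invalid bogus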
Ok, I tried to restore the DB on a larger VM I made on 5.11 and it failed like this:

[root@dhcp-8-198-123 vmdb]# pg_restore --no-password --dbname vmdb_production --verbose --exit-on-error /net/$DB_DUMPS_IP/srv/export/customer_db_dump_migrated
pg_restore: connecting to database for restore
pg_restore: creating SCHEMA "public"
pg_restore: creating COMMENT "SCHEMA public"
pg_restore: creating SCHEMA "repmgr_miq_region_34_cluster"
pg_restore: creating EXTENSION "plpgsql"
pg_restore: creating COMMENT "EXTENSION plpgsql"
pg_restore: creating FUNCTION "public.metric_rollups_inheritance_after()"
pg_restore: creating FUNCTION "public.metric_rollups_inheritance_before()"
pg_restore: creating FUNCTION "public.metrics_inheritance_after()"
pg_restore: creating FUNCTION "public.metrics_inheritance_before()"
pg_restore: creating FUNCTION "repmgr_miq_region_34_cluster.repmgr_get_last_standby_location()"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 825; 1255 29576 FUNCTION repmgr_get_last_standby_location() root
pg_restore: [archiver (db)] could not execute query: ERROR: could not access file "$libdir/repmgr_funcs": No such file or directory
    Command was: CREATE FUNCTION repmgr_miq_region_34_cluster.repmgr_get_last_standby_location() RETURNS text
    LANGUAGE c STRICT
    AS '$libdir/repmgr_funcs', 'repmgr_get_last_standby_location';
I restored the DB without --exit-on-error; there were 10 errors. Then I tried fix_auth. It didn't reproduce on 5.11.0.15, but it also didn't reproduce on cfme-5.10.3.3. I don't know what I am doing wrong.

# pg_restore -U root -j 4 -d vmdb_production /net/$NFS_SHARE/srv/export/customer_db_dump
# fix_auth --databaseyml
# fix_auth --v2 --invalid bogus
I couldn't reproduce. The fix_auth tool does work, so hopefully it is OK to make this VERIFIED.
Setting the qe_test_coverage- flag as I was unable to reproduce.