Bug 1696237
Summary: | unable to run fix_auth on database with "stack too deep" | |
---|---|---|---
Product: | Red Hat CloudForms Management Engine | Reporter: | Felix Dewaleyne <fdewaley>
Component: | Appliance | Assignee: | Keenan Brock <kbrock>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Jaroslav Henner <jhenner>
Severity: | high | Docs Contact: | Red Hat CloudForms Documentation <cloudforms-docs>
Priority: | high | |
Version: | 5.9.7 | CC: | abellott, dmetzger, fdewaley, kbrock, mshriver, ncarboni, obarenbo
Target Milestone: | GA | Keywords: | TestOnly, ZStream
Target Release: | 5.11.0 | |
Hardware: | All | |
OS: | All | |
Whiteboard: | | |
Fixed In Version: | 5.11.0.1 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1702072 (view as bug list) | Environment: |
Last Closed: | 2019-12-13 14:54:30 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | Bug
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | CFME Core | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1702072 | |
Attachments: | | |
Description
Felix Dewaleyne
2019-04-04 11:51:37 UTC
Created attachment 1551834: fix_auth output
Output of `bundle exec tools/fix_auth.rb --v2 --invalid bogus`
Note: the region is number 34. Usually that gets fixed by fix_auth, but running fix_auth with the correct region doesn't change anything.

This was caused by a bad miq_request_tasks record. The customer inserted amazon credentials with a self reference (you can do this in YAML), which resulted in infinite recursion. I have added code to detect one or two cases of this recursion. I have also patched fix_auth on this appliance so Felix can continue with the problem he was originally investigating. Just waiting for a merge and backport.

New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/2eabc44f1a1874c3707964279db8aa7c3793bf1e

commit 2eabc44f1a1874c3707964279db8aa7c3793bf1e
Author: Keenan Brock <keenan>
AuthorDate: Thu Apr 4 15:21:20 2019 -0400
Commit: Keenan Brock <keenan>
CommitDate: Thu Apr 4 15:21:20 2019 -0400

fix_auth now handles recursive settings

situation:
1. For one customer, miq_request_tasks has an options hash with recursive values.
2. fix_auth recurses all the options looking for passwords to convert

before: it recurses forever
after: it now detects the recursion and does not go forever

NOTE: this only detects very simple recursive cases.

https://bugzilla.redhat.com/show_bug.cgi?id=1696237

lib/vmdb/settings_walker.rb | 3 +-
spec/lib/vmdb/settings_spec.rb | 77 +-
2 files changed, 53 insertions(+), 27 deletions(-)

Merged.

Of note: to "fix" a region of 34, just delete the REGION file and everything will work great.

The DB dump I have seen in the Red Hat Customer Portal is 1G. Verifying this, and automating a test for it, will certainly take a lot of resources. Can I get reproduction steps that do not involve a DB as big as that?
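For anyone looking for a small illustration that does not need the customer dump: the self-referencing options hash described above can be imitated in plain Ruby. The sketch below is hypothetical and is not the code from lib/vmdb/settings_walker.rb; the `walk` helper, the key names, and the sample values are all made up for illustration. It tracks every container it has already visited by `object_id`, which is broader than the "very simple recursive cases" the commit says it detects. It has not been checked against fix_auth itself.

```ruby
require "set"

# Hypothetical illustration only; NOT the ManageIQ settings walker.
# Build an options hash that contains itself, similar in spirit to a YAML
# document whose alias points back at its own anchor.
options = { :name => "amazon", :password => "v1:{old}" }
options[:credentials] = options # self reference

# Walk every key/value pair and yield (key, value, path, owner).
# `seen` holds the object_ids of containers already visited, so a container
# that shows up a second time is skipped instead of being walked forever.
def walk(node, path = [], seen = Set.new, &block)
  return unless node.kind_of?(Hash) || node.kind_of?(Array)
  return unless seen.add?(node.object_id) # add? returns nil if already present

  pairs = node.kind_of?(Hash) ? node.to_a : node.each_with_index.map { |v, i| [i, v] }
  pairs.each do |key, value|
    new_path = path + [key]
    yield key, value, new_path, node
    walk(value, new_path, seen, &block)
  end
end

visited = []
walk(options) { |_key, _value, path, _owner| visited << path }
```

Without the `seen` check, the same walk over `options` never terminates on its own and ends in a SystemStackError ("stack level too deep"). One trade-off of this approach is that a container shared by two different parents (not a cycle) is only walked the first time it is reached.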
OK, I tried to restore the DB on a larger VM I created on 5.11, and it failed like this:

[root@dhcp-8-198-123 vmdb]# pg_restore --no-password --dbname vmdb_production --verbose --exit-on-error /net/$DB_DUMPS_IP/srv/export/customer_db_dump_migrated
pg_restore: connecting to database for restore
pg_restore: creating SCHEMA "public"
pg_restore: creating COMMENT "SCHEMA public"
pg_restore: creating SCHEMA "repmgr_miq_region_34_cluster"
pg_restore: creating EXTENSION "plpgsql"
pg_restore: creating COMMENT "EXTENSION plpgsql"
pg_restore: creating FUNCTION "public.metric_rollups_inheritance_after()"
pg_restore: creating FUNCTION "public.metric_rollups_inheritance_before()"
pg_restore: creating FUNCTION "public.metrics_inheritance_after()"
pg_restore: creating FUNCTION "public.metrics_inheritance_before()"
pg_restore: creating FUNCTION "repmgr_miq_region_34_cluster.repmgr_get_last_standby_location()"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 825; 1255 29576 FUNCTION repmgr_get_last_standby_location() root
pg_restore: [archiver (db)] could not execute query: ERROR: could not access file "$libdir/repmgr_funcs": No such file or directory
Command was: CREATE FUNCTION repmgr_miq_region_34_cluster.repmgr_get_last_standby_location() RETURNS text LANGUAGE c STRICT AS '$libdir/repmgr_funcs', 'repmgr_get_last_standby_location';

I restored the DB without --exit-on-error. There were 10 errors. Then I tried fix_auth. It didn't reproduce on 5.11.0.15, but it also didn't reproduce on cfme-5.10.3.3. I don't know what I am doing wrong.

# pg_restore -U root -j 4 -d vmdb_production /net/$NFS_SHARE/srv/export/customer_db_dump
# fix_auth --databaseyml
# fix_auth --v2 --invalid bogus

I couldn't reproduce. The fix_auth tool does work, so hopefully it is OK to make this VERIFIED.

Setting qe_test_coverage- as I was unable to reproduce.