Description of problem:

engine-setup spends hours checking database consistency when the "audit_log" and "event_notification_hist" tables are big.

Version-Release number of selected component (if applicable):
4.4.3

How reproducible:
Always

Steps to Reproduce:
1. Run "engine-setup --offline --accept-defaults" on an installation where the "audit_log" and "event_notification_hist" tables are big.

Actual results:
engine-setup is stuck for hours checking the consistency:
~~~
[ INFO  ] Cleaning async tasks and compensations
[ INFO  ] Unlocking existing entities
[ INFO  ] Checking the Engine database consistency
~~~
CPU usage is close to 100% and the following SQL queries can be found in the database:
~~~
  pid   | usename | datname | query                                                | client_addr | application_name
--------+---------+---------+------------------------------------------------------+-------------+------------------
 184378 | engine  | engine  | copy (                                              +| ::1         | psql
        |         |         |   select fk_violation, fk_status                    +|             |
        |         |         |   from fn_db_validate_fks(false, 0 != 0)            +|             |
        |         |         |   where fk_status=1                                 +|             |
        |         |         | ) to stdout with delimiter as '|';                   |             |
 184379 | engine  | engine  | select (audit_log_id) from public.event_notification_hist where (audit_log_id) IS NOT NULL and (audit_log_id) not in (select audit_log_id from public.audit_log); | | psql
 184380 | engine  | engine  | select (audit_log_id) from public.event_notification_hist where (audit_log_id) IS NOT NULL and (audit_log_id) not in (select audit_log_id from public.audit_log); | | psql
~~~

Expected results:
The engine-setup command finishes in a reasonable amount of time.
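For illustration, the long-running queries above are the standard "orphan check" that a per-FK validator emits: select child keys that have no matching parent row. A minimal sketch of that pattern using Python's sqlite3 as a stand-in for PostgreSQL (only the two columns involved; not the real engine schema):

```python
import sqlite3

# Toy reproduction of the orphan-check query pattern from pg_stat_activity.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE audit_log (audit_log_id INTEGER PRIMARY KEY);
    CREATE TABLE event_notification_hist (audit_log_id INTEGER NOT NULL);
    INSERT INTO audit_log VALUES (1), (2), (3);
    -- row 99 has no parent in audit_log, i.e. a would-be FK violation
    INSERT INTO event_notification_hist VALUES (1), (2), (99);
""")

# The NOT IN form seen in the bug: compare every child key
# against the full set of parent keys.
orphans = con.execute("""
    SELECT audit_log_id FROM event_notification_hist
    WHERE audit_log_id IS NOT NULL
      AND audit_log_id NOT IN (SELECT audit_log_id FROM audit_log)
""").fetchall()
print(orphans)  # [(99,)]
```

With hundreds of thousands of rows on both sides, this per-row set comparison is where the reported time goes; a `NOT EXISTS` anti-join form is often cheaper for the planner, though that depends on the database.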
Additional info:

Table sizes:
~~~
engine=# select count(*) from audit_log;
 count
--------
 660608
(1 row)

engine=# select count(*) from event_notification_hist;
 count
--------
 635110
(1 row)
~~~

Truncating the "event_notification_hist" table as advised in [1] makes the setup finish in seconds:
~~~
su - postgres -c "pg_dump -Fc --data-only -d engine -t event_notification_hist > /tmp/event_notification_hist.sql"
engine-setup
su - postgres -c "psql engine -c 'truncate table event_notification_hist;'"
su - postgres -c "pg_restore --data-only -d engine -t event_notification_hist /tmp/event_notification_hist.sql"
~~~

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1323826#c13
Can you please attach a dump of the tested database?
Well, the verification process simply verifies that when we remove an entity, it will also be removed from all dependent tables via the DB FK definition on each dependent table. This will be time-consuming when dealing with large tables.

Looking at the problematic table I see:
~~~
\d event_notification_hist
                    Table "public.event_notification_hist"
    Column    |           Type           | Collation | Nullable | Default
--------------+--------------------------+-----------+----------+---------
 event_name   | character varying(100)   |           | not null |
 audit_log_id | bigint                   |           | not null |
 method_type  | character(10)            |           | not null |
 sent_at      | timestamp with time zone |           | not null |
 status       | boolean                  |           | not null |
 reason       | character(255)           |           |          |
Indexes:
    "idx_event_notification_hist" btree (audit_log_id)
Foreign-key constraints:
    "fk_event_notification_hist_audit_log" FOREIGN KEY (audit_log_id) REFERENCES audit_log(audit_log_id) ON DELETE CASCADE
~~~

I came to the conclusion that there is no point in removing history data when an entity is removed; doing so may also violate the meaning of keeping history in the first place.

So, a simple solution would be, IMO, removing fk_event_notification_hist_audit_log from the event_notification_hist table. Since all insertions to this table are programmatic and not manual, I see no problem with not checking the passed audit_log_id.
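To make the effect of that constraint concrete, here is a hedged sketch (again using sqlite3 as a stand-in for PostgreSQL, with FK enforcement switched on) showing how ON DELETE CASCADE silently erases history rows when their parent audit_log row is deleted, which is exactly what dropping the FK avoids:

```python
import sqlite3

# sqlite3 stand-in for the engine schema: a parent audit_log table and
# a history table whose FK cascades deletes, as in the \d output above.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # sqlite disables FKs by default
con.executescript("""
    CREATE TABLE audit_log (audit_log_id INTEGER PRIMARY KEY);
    CREATE TABLE event_notification_hist (
        audit_log_id INTEGER NOT NULL
            REFERENCES audit_log(audit_log_id) ON DELETE CASCADE
    );
    INSERT INTO audit_log VALUES (1), (2);
    INSERT INTO event_notification_hist VALUES (1), (2);
""")

# Deleting the parent row also deletes its history row via the cascade.
con.execute("DELETE FROM audit_log WHERE audit_log_id = 1")
remaining = con.execute(
    "SELECT COUNT(*) FROM event_notification_hist").fetchone()[0]
print(remaining)  # 1 -- the history row for audit_log_id=1 is gone
```

On the real PostgreSQL schema, the proposed fix would presumably be along the lines of `ALTER TABLE event_notification_hist DROP CONSTRAINT fk_event_notification_hist_audit_log;`, after which history rows survive deletion of their audit_log parent.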
Connecting to the database from comment 2, I found the following:
~~~
engine=# select count(*) from event_notification_hist;
 635110
~~~
So, the history table is quite big. Looking at the dates it keeps:
~~~
engine=# select sent_at from event_notification_hist order by sent_at desc limit 1;
 2021-01-27 14:08:49.183+00
engine=# select sent_at from event_notification_hist order by sent_at asc limit 1;
 2020-12-28 14:16:11.666+00
~~~
So, this table really keeps 30 days of history.

It seems that the audit_log and event_notification_hist tables have independent truncation mechanisms, each set for 30 days. In event_notification_hist it is controlled by DAYS_TO_KEEP_HISTORY, while in audit_log it is controlled by AuditLogAgingThreshold.

However, it seems we have another issue here regarding audit_log table truncation:
~~~
engine=# select * from vdc_options where option_name= 'AuditLogAgingThreshold';
 13 | AuditLogAgingThreshold | 30 | general | 30
engine=# select log_time from audit_log order by log_time asc limit 1;
 2020-12-28 00:38:13.964+00
engine=# select log_time from audit_log order by log_time desc limit 1;
 2021-02-03 09:57:31.272+00
~~~
It seems that audit_log truncation is not working for some reason here and should be investigated separately.

Anyway, since the audit_log and event_notification_hist tables have separate truncation mechanisms and configuration, it does not make sense to tie them with an FK that cascades deletes from audit_log to event_notification_hist.
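The sanity check performed above can be sketched as a short calculation (dates taken verbatim from the queries in this comment; the 30-day threshold is the AuditLogAgingThreshold value shown):

```python
from datetime import datetime, timedelta, timezone

# With AuditLogAgingThreshold = 30, the oldest audit_log row should be
# no more than ~30 days older than the newest one.
threshold = timedelta(days=30)
oldest = datetime(2020, 12, 28, 0, 38, 13, tzinfo=timezone.utc)
newest = datetime(2021, 2, 3, 9, 57, 31, tzinfo=timezone.utc)

age = newest - oldest
print(age.days)            # 37
print(age > threshold)     # True -> aging is not trimming the table
```

A 37-day spread against a 30-day threshold is what suggests the audit_log aging job was not running on this setup.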
Verified on: ovirt-engine-4.4.5.9-0.1.el8ev.noarch

Steps:
1. Run engine-setup with very large audit_log table data:
~~~
engine=# select count(*) from audit_log;
 count
--------
 673335
(1 row)

# time engine-setup --offline --accept-defaults
(...)
real    3m0.328s
user    1m19.096s
sys     0m29.415s
~~~
2. Run engine-setup with small/no audit_log table data:
~~~
engine=# delete from audit_log;
engine=# select count(*) from audit_log;
 count
--------
 0
(1 row)

# time engine-setup --offline --accept-defaults
(...)
real    2m41.927s
user    1m12.190s
sys     0m27.500s
~~~

Results:
Both executions finished in a reasonable and similar time.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) 4.4.z [ovirt-4.4.5] security, bug fix, enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1169