Bug 1381857
Summary: | createOrUpdatePath API creates duplicated entries in rhnConfigInfo table | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Satellite 5 | Reporter: | Ales Dujicek <adujicek> | ||||||||||
Component: | API | Assignee: | Grant Gainey <ggainey> | ||||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Ales Dujicek <adujicek> | ||||||||||
Severity: | unspecified | Docs Contact: | |||||||||||
Priority: | unspecified | ||||||||||||
Version: | 570 | CC: | eherget, ggainey, pstudeni, shughes, tkasparek, tlestach | ||||||||||
Target Milestone: | --- | ||||||||||||
Target Release: | --- | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
Whiteboard: | |||||||||||||
Fixed In Version: | spacewalk-java-2.5.14-80-sat spacewalk-schema-2.5.1-49-sat satellite-schema-5.8.0.31-1-sat | Doc Type: | If docs needed, set a value | ||||||||||
Doc Text: | Story Points: | --- | |||||||||||
Clone Of: | Environment: | ||||||||||||
Last Closed: | 2017-06-21 12:17:38 UTC | Type: | Bug | ||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||
Documentation: | --- | CRM: | |||||||||||
Verified Versions: | Category: | --- | |||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
Embargoed: | |||||||||||||
Bug Depends On: | |||||||||||||
Bug Blocks: | 1340444, 1410775 | ||||||||||||
Attachments: |
|
Description
Ales Dujicek
2016-10-05 08:34:29 UTC
Created attachment 1207484 [details]
python api reproducer
adding reproducer to duplicate rhnconfiginfo entries
This is going to be non-trivial to fix, due to the difference between the way Oracle and Postgres treat NULL and uniqueness, and the overloaded use of the rhnconfiginfo table. * rhnconfiginfo's unique-contraint is in place to avoid duplicate entries for common information (e.g root/wheel/744) * rhnconfiginfo is being overloaded to *also* track the filename of a symlink. * in the latter context, 'NULL' means 'is not a symlink' - *not* 'this value is unknown at this time', which is what Postgres (and SQL) assume NULL means. A better choice back in the mists of time would have been to add the symlink_target_filename_id to *rhnConfigFile*, rather than rhnConfigInfo. That ship, alas, has sailed. The way to avoid duplication in Postgres is one of the following: * Have two indices, one where the column is not-null and one where it is: === CREATE UNIQUE INDEX rhn_confinfo_ugf5_uq ON rhnConfigInfo (username, groupname, filemode, selinux_ctx, symlink_target_filename_id) WHERE symlink_target_filename_id IS NOT NULL; CREATE UNIQUE INDEX rhn_confinfo_ugf4_uq ON rhnConfigInfo (username, groupname, filemode, selinux_ctx) WHERE symlink_target_filename_id IS NULL; === * store a known-not-used value into a column, rather than using NULL for that purpose (which loses the FK-ness of that column): === CREATE UNIQUE INDEX rhn_confinfo_ugf5_uq ON rhnConfigInfo (username, groupname, filemode, selinux_ctx, COALESCE(symlink_target_filename_id, 0)); === Clearly, the first option is preferable. In fact, we already do exactly the first-option described for tables like rhnDistChannelMap - see spacewalk/postgres/tables/rhnDistChannelMap_index.sql for an example. Adding discussion from email, so it's all in one place:
=== ggainey: ===
Basically, Postgres has been allowing duplicates in rhnConfigInfo when it shouldn't, which makes migrating to Oracle impossible (since Oracle correctly refuses the dups). The cause is that a) Postgres treats NULL-in-unique-constraints differently than Oracle, and b) the constraint on this table doesn't "do the right thing" to get the desired behavior.
I have a fix for the schema, and a fix to the Java code that allows the dups to happen in the first place, and an upgrade script - yay! HOWEVER - we need a way to fix existing DBs that have the problem, as part of the upgrade. And "just delete the dups" won't do, as those rows are referenced as foreign-keys by rhnConfigRevision.
As an example - 'symlink-id' and 'selinux-ctx' are the columns that can be null, and Postgres says that all nulls are unique. Therefore, we might have data that looks like this:
rhnConfigInfo:
id username group perms symlink-id selinux-ctx
1 auser agrp aperm null null
2 auser agrp aperm null null
3 buser bgrp bperm null null
4 auser agrp aperm null null
5 buser bgrp bperm null null
rhnConfigRevision:
id config_info_id
187 1
215 2
385 3
433 4
772 5
What we want to end up with, is
rhnConfigInfo
1 auser agrp aperm null null
3 buser bgrp bperm null null
rhnConfigRevision
187 1
215 1
385 3
433 1
772 3
with the other rows gone. The hard part, of course, is updating the *rows in rhnConfigRevision* that have an affected config_info_id, to point at the appropriate *remaining* config_info_id. Ay de mi.
=== tlestach: ===
We may try to combine the duplicated rows. The question is, whether we
can be sure, what rows can be combined.
Risks:
1. Combining rows from different orgs (since neither rhnConfigInfo,
nor rhnConfigRevision contains org_id)
- e.g. same symlink is configured in different orgs
2. Combining incorrectly lines with real changes (group, ctx, content, ...)
Just thoughts, not sure, if these are valid.
Do you see any other options than to combine the duplicated lines?
=== ggainey: ===
rhnConfigInfo data isn't org-specific - if two orgs have config-files with the same username/group/permissions/symlink/selinux, it's represented as one row. Basically, this entire constraint exists only to minimize the size of this table.
> Do you see any other options than to combine the duplicated lines?
Remove the constraint. Without it, the migration would Just Work even in the presence of duplicate rows. With the change I have to the Java side, the duplication is prevented on the app-side (at least in the API call context), so the table is unlikely to grow out of bounds. Otherwise, having dups in the table appears to be harmless (although I would need a bunch of testing to confirm).
I don't like having tables with duplicated data in them :( However, we've been allowing it to happen in postgres since 2010, apparently without harm. Because of the foreign-key constraints, fixing the data will definitely be...exciting.
Created attachment 1246892 [details]
Testcase - sample-data-creation
Created attachment 1246893 [details]
Testcase - sample-data-drop
Created attachment 1246894 [details]
Testcase - check-for-broken and fix-it functions
As noted in #c6, fixing the data in cases like this is...exciting. It is not, however, impossible. Attached find three testing files: * creates.sql - creates 2 tables, one with unwanted dups and one that points at them * drop.sql - blows away the tables from creates.sql * fix_t2.sql - contains 2 postgresql functions: * is_t1_broken() - returns true if t1 has dups * fix_t2() - removes dups from t1 and repoints t2 to retained rows We can use these functions as either templates for per-table fixes, or perhaps someone can figure out how to generalize code 'like this' to handle arbitrary tables. * Fixed the API hole that allowed the dup-creation to happen in the first place * Added (grossly ugly) upgrade script to remove dups when it's already happened * Added corrected indices to make it impossible to create dups at all spacewalk.github: 7923fef31657243adc77f723f9d8dbedff4c1fd3 Added empty .ORA to make upgrade happy spacewalk.github: 09122968f2a2c945028120a519896efd341fca80 Upgrade of Satellite 5.7 with duplicates in the database failed: /var/log/spacewalk/schema-upgrade/20170524-043747-to-satellite-schema-5.8.log: --------------------------------------------------------------------------------------- satellite-schema-5.7-to-satellite-schema-5.8/012-rhnConfigInfo_indexes.sql.postgresql (1 row) CREATE FUNCTION CREATE FUNCTION CREATE FUNCTION CREATE FUNCTION CREATE FUNCTION CREATE FUNCTION psql:/var/log/spacewalk/schema-upgrade/20170524-043747-script.sql:16819: INFO: DUP FILE INFO found, selinux NULL CONTEXT: SQL statement "select is_cfginfo_broken()" PL/pgSQL function fix_cfgrev_files() line 6 at SQL statement psql:/var/log/spacewalk/schema-upgrade/20170524-043747-script.sql:16819: INFO: ...REPOINTING RECORDS, rhnConfigInfo = 2 CONTEXT: SQL statement "select fix_cfgrev_files_SELINUX_NULL()" PL/pgSQL function fix_cfgrev_files() line 9 at SQL statement psql:/var/log/spacewalk/schema-upgrade/20170524-043747-script.sql:16819: ERROR: update or delete on table "rhnconfiginfo" violates foreign key constraint "rhn_confrevision_ciid_fk" on table "rhnconfigrevision" DETAIL: Key (id)=(4) is still referenced from table "rhnconfigrevision". CONTEXT: SQL statement "DELETE from rhnConfigInfo ci where 1=1 and ci.username = good_username and ci.groupname = good_groupname and ci.filemode = good_filemode and ci.selinux_ctx is null and ci.id <> good_cfginfo_id" PL/pgSQL function fix_cfgrev_files_selinux_null() line 56 at SQL statement SQL statement "select fix_cfgrev_files_SELINUX_NULL()" PL/pgSQL function fix_cfgrev_files() line 9 at SQL statement what was in the database: # select id, username, groupname, filemode, symlink_target_filename_id, selinux_ctx from rhnconfiginfo; id | username | groupname | filemode | symlink_target_filename_id | selinux_ctx ----+----------+-----------+----------+----------------------------+------------ - 1 | zroot0 | root | 644 | | 2 | zroot0 | root | 644 | | 3 | zroot0 | root | 644 | | 4 | zroot0 | root | 644 | | 5 | zroot1 | root | 644 | | 6 | zroot1 | root | 644 | | 7 | zroot1 | root | 644 | | 8 | zroot2 | root | 644 | | 9 | zroot3 | root | 644 | | 10 | zroot3 | root | 644 | | (10 rows) # select id, config_file_id, config_info_id from rhnconfigrevision order by id; id | config_file_id | config_info_id ----+----------------+---------------- 1 | 4 | 3 2 | 3 | 2 3 | 2 | 4 4 | 1 | 1 5 | 5 | 5 6 | 8 | 5 7 | 7 | 7 8 | 6 | 6 9 | 9 | 8 10 | 11 | 8 11 | 12 | 8 12 | 10 | 8 13 | 13 | 9 14 | 14 | 10 15 | 15 | 10 16 | 16 | 10 (16 rows) one more note 4528a69c0ed542924e4d3f9ebf724f9c52c24e90 has changed unique index to unique constraint on Oracle: schema/spacewalk/common/tables/rhnConfigInfo.sql -CREATE UNIQUE INDEX rhn_confinfo_ugf_uq - ON rhnConfigInfo (username, groupname, filemode, selinux_ctx, symlink_target_filename_id) - TABLESPACE [[4m_tbs]]; - schema/spacewalk/oracle/tables/rhnConfigInfo_index.sql +ALTER TABLE rhnConfigInfo + ADD CONSTRAINT rhn_confinfo_ugf_uq UNIQUE (username, groupname, filemode, selinux_ctx, symlink_target_filename_id); I believe it is OK, just to be sure Ugh, nice catch - a last-minute decision that I didn't test correctly/adequately. Also, the ORA index-to-constraint change was deliberate, and it works - but it's still suboptimal, it's better to stay consistent, and while the index doesn't gain us much, it doesn't cost much either. Setting it back to index. spacewalk.github: bf018be3d5c25f0ce9f993fe69b6a90dad2deb7d now schema upgrade works even with duplicates in rhnConfigInfo duplicates like this # select id, username, groupname, filemode, symlink_target_filename_id, selinux_ctx from rhnconfiginfo; id | username | groupname | filemode | symlink_target_filename_id | selinux_ct x -----+----------+-----------+----------+----------------------------+----------- -- 99 | 1-root0 | 1-root | 644 | | 100 | 1-root0 | 1-root | 644 | | 97 | 1-root0 | 1-root | 644 | | 98 | 1-root0 | 1-root | 644 | | 119 | 2-root0 | 2-root | 644 | | sys_t 120 | 2-root0 | 2-root | 644 | | sys_t 121 | 2-root0 | 2-root | 644 | | sys_t 122 | 2-root0 | 2-root | 644 | | sys_t 147 | | | | 263 | 148 | | | | 263 | 146 | | | | 263 | 149 | | | | 263 | 176 | | | | 316 | sys_t 177 | | | | 316 | sys_t 178 | | | | 316 | sys_t ... (110 rows) were reduced to # select id, username, groupname, filemode, symlink_target_filename_id, selinux_ctx from rhnconfiginfo; id | username | groupname | filemode | symlink_target_filename_id | selinux_ct x -----+----------+-----------+----------+----------------------------+----------- -- 97 | 1-root0 | 1-root | 644 | | 119 | 2-root0 | 2-root | 644 | | sys_t 146 | | | | 263 | 176 | | | | 316 | sys_t ... (32 rows) it covers all possible options: username+groupname, symlink_target_filename_id, selinux_ctx notnull, null, null notnull, null, notnull null, notnull, null null, notnull, notnull Verified spacewalk-schema-2.5.1-49.el6sat.noarch spacewalk-java-2.5.14-89.el6sat.noarch satellite-schema-5.8.0.31-1.el6sat.noarch |