Bug 1381857

Summary: createOrUpdatePath API creates duplicated entries in rhnConfigInfo table
Product: Red Hat Satellite 5 Reporter: Ales Dujicek <adujicek>
Component: APIAssignee: Grant Gainey <ggainey>
Status: CLOSED CURRENTRELEASE QA Contact: Ales Dujicek <adujicek>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 570CC: eherget, ggainey, pstudeni, shughes, tkasparek, tlestach
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: spacewalk-java-2.5.14-80-sat spacewalk-schema-2.5.1-49-sat satellite-schema-5.8.0.31-1-sat Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-06-21 12:17:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1340444, 1410775    
Attachments:
Description Flags
python api reproducer
none
Testcase - sample-data-creation
none
Testcase - sample-data-drop
none
Testcase - check-for-broken and fix-it functions none

Description Ales Dujicek 2016-10-05 08:34:29 UTC
Description of problem:

if you create configuration files using configchannel.createOrUpdatePath or system.config.createOrUpdatePath
in parallel, then duplicated entries in rhnConfigInfo table can be created


rhnConfigInfo contains multicolumn uniq index

rhnschema=# \d rhnconfiginfo
Indexes:
    "rhn_confinfo_ugf_uq" UNIQUE, btree (username, groupname, filemode, selinux_ctx, symlink_target_filename_id)

but in Postgres NULL is always uniq, so it can contain duplicated entries, e.g.
 id | username | groupname | filemode | symlink_target_filename_id | selinux_ctx 
----+----------+-----------+----------+----------------------------+-------------
rhnschema=# select id, username, groupname, filemode, symlink_target_filename_id, selinux_ctx from rhnconfiginfo;
  4 | xroot1   | root      |      644 |                     [NULL] | [NULL]
  6 | xroot1   | root      |      644 |                     [NULL] | [NULL]


which can cause error
*
ERROR at line 1:
ORA-00001: unique constraint (RHNSAT.RHN_CONFINFO_UGF_UQ) violated
during Postgres to Oracle migration


Version-Release number of selected component (if applicable):
spacewalk-java-2.3.8-153.el6sat.noarch
satellite-schema-5.7.0.24-1.el6sat.noarch

How reproducible:
always

Steps to Reproduce:
1. create several configuration files in parallel
2. check rhnconfiginfo content
select id, username, groupname, filemode, symlink_target_filename_id, selinux_ctx from rhnconfiginfo;

Actual results:
rhnconfiginfo allows duplicated entries

Expected results:
rhnconfiginfo contains really unique entries

Additional info:

Oracle:
SQL> insert into rhnconfiginfo (id, username, groupname, filemode) values (31, 'x', 'x', 644);
1 row created.
SQL> insert into rhnconfiginfo (id, username, groupname, filemode) values (32, 'x', 'x', 644);
insert into rhnconfiginfo (id, username, groupname, filemode) values (32, 'x', 'x', 644)
*
ERROR at line 1:
ORA-00001: unique constraint (RHNSAT.RHN_CONFINFO_UGF_UQ) violated

Postgres:
rhnschema=# insert into rhnconfiginfo (id, username, groupname, filemode) values (30, 'x', 'x', 644);
INSERT 0 1
rhnschema=# insert into rhnconfiginfo (id, username, groupname, filemode) values (31, 'x', 'x', 644);
INSERT 0 1

Comment 1 Ales Dujicek 2016-10-05 08:45:49 UTC
Created attachment 1207484 [details]
python api reproducer

adding reproducer to duplicate rhnconfiginfo entries

Comment 3 Grant Gainey 2017-01-05 15:49:55 UTC
This is going to be non-trivial to fix, due to the difference between the way Oracle and Postgres treat NULL and uniqueness, and the overloaded use of the rhnconfiginfo table.

* rhnconfiginfo's unique-contraint is in place to avoid duplicate entries for common information (e.g root/wheel/744)
* rhnconfiginfo is being overloaded to *also* track the filename of a symlink.
* in the latter context, 'NULL' means 'is not a symlink' - *not* 'this value is unknown at this time', which is what Postgres (and SQL) assume NULL means.

A better choice back in the mists of time would have been to add the symlink_target_filename_id to *rhnConfigFile*, rather than rhnConfigInfo. That ship, alas, has sailed.

The way to avoid duplication in Postgres is one of the following:

* Have two indices, one where the column is not-null and one where it is:

===
CREATE UNIQUE INDEX 
rhn_confinfo_ugf5_uq ON rhnConfigInfo
 (username, groupname, filemode, selinux_ctx, symlink_target_filename_id)
WHERE symlink_target_filename_id IS NOT NULL;

CREATE UNIQUE INDEX 
rhn_confinfo_ugf4_uq ON rhnConfigInfo
 (username, groupname, filemode, selinux_ctx)
WHERE symlink_target_filename_id IS NULL;
===

* store a known-not-used value into a column, rather than using NULL for that purpose (which loses the FK-ness of that column):

===
CREATE UNIQUE INDEX
rhn_confinfo_ugf5_uq ON rhnConfigInfo
 (username, groupname, filemode, selinux_ctx, 
  COALESCE(symlink_target_filename_id, 0));
===

Clearly, the first option is preferable.

Comment 4 Grant Gainey 2017-01-05 15:58:07 UTC
In fact, we already do exactly the first-option described for tables like rhnDistChannelMap - see spacewalk/postgres/tables/rhnDistChannelMap_index.sql for an example.

Comment 6 Grant Gainey 2017-01-06 13:05:26 UTC
Adding discussion from email, so it's all in one place:

=== ggainey: ===
Basically, Postgres has been allowing duplicates in rhnConfigInfo when it shouldn't, which makes migrating to Oracle impossible (since Oracle correctly refuses the dups). The cause is that a) Postgres treats NULL-in-unique-constraints differently than Oracle, and b) the constraint on this table doesn't "do the right thing" to get the desired behavior.

I have a fix for the schema, and a fix to the Java code that allows the dups to happen in the first place, and an upgrade script - yay!  HOWEVER - we need a way to fix existing DBs that have the problem, as part of the upgrade. And "just delete the dups" won't do, as those rows are referenced as foreign-keys by rhnConfigRevision.

As an example - 'symlink-id' and 'selinux-ctx' are the columns that can be null, and Postgres says that all nulls are unique. Therefore, we might have data that looks like this:

rhnConfigInfo:
id username group perms symlink-id selinux-ctx
1  auser    agrp  aperm null       null
2  auser    agrp  aperm null       null
3  buser    bgrp  bperm null       null
4  auser    agrp  aperm null       null
5  buser    bgrp  bperm null       null

rhnConfigRevision:
id  config_info_id
187   1
215   2
385   3
433   4
772   5

What we want to end up with, is

rhnConfigInfo
1  auser    agrp  aperm null       null
3  buser    bgrp  bperm null       null

rhnConfigRevision
187   1
215   1
385   3
433   1
772   3

with the other rows gone.  The hard part, of course, is updating the *rows in rhnConfigRevision* that have an affected config_info_id, to point at the appropriate *remaining* config_info_id. Ay de mi.

=== tlestach: ===
We may try to combine the duplicated rows. The question is, whether we
can be sure, what rows can be combined.

Risks:
1. Combining rows from different orgs (since neither rhnConfigInfo,
nor rhnConfigRevision contains org_id)
 - e.g. same symlink is configured in different orgs
2. Combining incorrectly lines with real changes (group, ctx, content, ...)

Just thoughts, not sure, if these are valid.

Do you see any other options than to combine the duplicated lines?


=== ggainey: ===

rhnConfigInfo data isn't org-specific - if two orgs have config-files with the same username/group/permissions/symlink/selinux, it's represented as one row. Basically, this entire constraint exists only to minimize the size of this table.

> Do you see any other options than to combine the duplicated lines?

Remove the constraint. Without it, the migration would Just Work even in the presence of duplicate rows. With the change I have to the Java side, the duplication is prevented on the app-side (at least in the API call context), so the table is unlikely to grow out of bounds. Otherwise, having dups in the table appears to be harmless (although I would need a bunch of testing to confirm).

I don't like having tables with duplicated data in them :( However, we've been allowing it to happen in postgres since 2010, apparently without harm. Because of the foreign-key constraints, fixing the data will definitely be...exciting.

Comment 8 Grant Gainey 2017-02-01 21:33:52 UTC
Created attachment 1246892 [details]
Testcase - sample-data-creation

Comment 9 Grant Gainey 2017-02-01 21:34:22 UTC
Created attachment 1246893 [details]
Testcase - sample-data-drop

Comment 10 Grant Gainey 2017-02-01 21:34:57 UTC
Created attachment 1246894 [details]
Testcase - check-for-broken and fix-it functions

Comment 11 Grant Gainey 2017-02-01 21:38:42 UTC
As noted in #c6, fixing the data in cases like this is...exciting. It is not, however, impossible.

Attached find three testing files:

* creates.sql - creates 2 tables, one with unwanted dups and one that points at them

* drop.sql - blows away the tables from creates.sql

* fix_t2.sql - contains 2 postgresql functions:
  * is_t1_broken() - returns true if t1 has dups
  * fix_t2() - removes dups from t1 and repoints t2 to retained rows

We can use these functions as either templates for per-table fixes, or perhaps someone can figure out how to generalize code 'like this' to handle arbitrary tables.

Comment 12 Grant Gainey 2017-05-08 17:42:39 UTC
* Fixed the API hole that allowed the dup-creation to happen in the first place
* Added (grossly ugly) upgrade script to remove dups when it's already happened
* Added corrected indices to make it impossible to create dups at all

spacewalk.github:
7923fef31657243adc77f723f9d8dbedff4c1fd3

Comment 15 Grant Gainey 2017-05-11 14:10:40 UTC
Added empty .ORA to make upgrade happy

spacewalk.github:
09122968f2a2c945028120a519896efd341fca80

Comment 17 Ales Dujicek 2017-05-24 08:47:10 UTC
Upgrade of Satellite 5.7 with duplicates in the database failed:

/var/log/spacewalk/schema-upgrade/20170524-043747-to-satellite-schema-5.8.log:
---------------------------------------------------------------------------------------
 satellite-schema-5.7-to-satellite-schema-5.8/012-rhnConfigInfo_indexes.sql.postgresql
(1 row)

CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
psql:/var/log/spacewalk/schema-upgrade/20170524-043747-script.sql:16819: INFO:  DUP FILE INFO found, selinux NULL
CONTEXT:  SQL statement "select is_cfginfo_broken()"
PL/pgSQL function fix_cfgrev_files() line 6 at SQL statement
psql:/var/log/spacewalk/schema-upgrade/20170524-043747-script.sql:16819: INFO:  ...REPOINTING RECORDS, rhnConfigInfo = 2
CONTEXT:  SQL statement "select fix_cfgrev_files_SELINUX_NULL()"
PL/pgSQL function fix_cfgrev_files() line 9 at SQL statement
psql:/var/log/spacewalk/schema-upgrade/20170524-043747-script.sql:16819: ERROR:  update or delete on table "rhnconfiginfo" violates foreign key constraint "rhn_confrevision_ciid_fk" on table "rhnconfigrevision"
DETAIL:  Key (id)=(4) is still referenced from table "rhnconfigrevision".
CONTEXT:  SQL statement "DELETE from rhnConfigInfo ci
             where 1=1
               and ci.username = good_username
               and ci.groupname = good_groupname
               and ci.filemode = good_filemode
               and ci.selinux_ctx is null
               and ci.id <> good_cfginfo_id"
PL/pgSQL function fix_cfgrev_files_selinux_null() line 56 at SQL statement
SQL statement "select fix_cfgrev_files_SELINUX_NULL()"
PL/pgSQL function fix_cfgrev_files() line 9 at SQL statement


what was in the database:

# select id, username, groupname, filemode, symlink_target_filename_id, selinux_ctx from rhnconfiginfo;
 id | username | groupname | filemode | symlink_target_filename_id | selinux_ctx
 
----+----------+-----------+----------+----------------------------+------------
-
  1 | zroot0   | root      |      644 |                            | 
  2 | zroot0   | root      |      644 |                            | 
  3 | zroot0   | root      |      644 |                            | 
  4 | zroot0   | root      |      644 |                            | 
  5 | zroot1   | root      |      644 |                            | 
  6 | zroot1   | root      |      644 |                            | 
  7 | zroot1   | root      |      644 |                            | 
  8 | zroot2   | root      |      644 |                            | 
  9 | zroot3   | root      |      644 |                            | 
 10 | zroot3   | root      |      644 |                            | 
(10 rows)

# select id, config_file_id, config_info_id  from rhnconfigrevision order by id;
 id | config_file_id | config_info_id 
----+----------------+----------------
  1 |              4 |              3
  2 |              3 |              2
  3 |              2 |              4
  4 |              1 |              1
  5 |              5 |              5
  6 |              8 |              5
  7 |              7 |              7
  8 |              6 |              6
  9 |              9 |              8
 10 |             11 |              8
 11 |             12 |              8
 12 |             10 |              8
 13 |             13 |              9
 14 |             14 |             10
 15 |             15 |             10
 16 |             16 |             10
(16 rows)

Comment 18 Ales Dujicek 2017-05-26 06:38:51 UTC
one more note

4528a69c0ed542924e4d3f9ebf724f9c52c24e90 has changed unique index to unique constraint on Oracle:

schema/spacewalk/common/tables/rhnConfigInfo.sql
-CREATE UNIQUE INDEX rhn_confinfo_ugf_uq
-    ON rhnConfigInfo (username, groupname, filemode, selinux_ctx, symlink_target_filename_id)
-    TABLESPACE [[4m_tbs]];
-

schema/spacewalk/oracle/tables/rhnConfigInfo_index.sql
+ALTER TABLE rhnConfigInfo
+    ADD CONSTRAINT rhn_confinfo_ugf_uq UNIQUE (username, groupname, filemode, selinux_ctx, symlink_target_filename_id);

I believe it is OK, just to be sure

Comment 19 Grant Gainey 2017-05-31 13:14:08 UTC
Ugh, nice catch - a last-minute decision that I didn't test correctly/adequately.

Also, the ORA index-to-constraint change was deliberate, and it works -
 but it's still suboptimal, it's better to stay consistent, and while the index doesn't gain us much, it doesn't cost much either. Setting it back to index.

spacewalk.github:
bf018be3d5c25f0ce9f993fe69b6a90dad2deb7d

Comment 22 Ales Dujicek 2017-06-02 12:13:36 UTC
now schema upgrade works even with duplicates in rhnConfigInfo

duplicates like this
# select id, username, groupname, filemode, symlink_target_filename_id, selinux_ctx from rhnconfiginfo;
 id  | username | groupname | filemode | symlink_target_filename_id | selinux_ct
x 
-----+----------+-----------+----------+----------------------------+-----------
--
  99 | 1-root0  | 1-root    |      644 |                            | 
 100 | 1-root0  | 1-root    |      644 |                            | 
  97 | 1-root0  | 1-root    |      644 |                            | 
  98 | 1-root0  | 1-root    |      644 |                            | 
 119 | 2-root0  | 2-root    |      644 |                            | sys_t
 120 | 2-root0  | 2-root    |      644 |                            | sys_t
 121 | 2-root0  | 2-root    |      644 |                            | sys_t
 122 | 2-root0  | 2-root    |      644 |                            | sys_t
 147 |          |           |          |                        263 | 
 148 |          |           |          |                        263 | 
 146 |          |           |          |                        263 | 
 149 |          |           |          |                        263 | 
 176 |          |           |          |                        316 | sys_t
 177 |          |           |          |                        316 | sys_t
 178 |          |           |          |                        316 | sys_t
...
(110 rows)

were reduced to
# select id, username, groupname, filemode, symlink_target_filename_id, selinux_ctx from rhnconfiginfo;
 id  | username | groupname | filemode | symlink_target_filename_id | selinux_ct
x 
-----+----------+-----------+----------+----------------------------+-----------
--
  97 | 1-root0  | 1-root    |      644 |                            | 
 119 | 2-root0  | 2-root    |      644 |                            | sys_t
 146 |          |           |          |                        263 | 
 176 |          |           |          |                        316 | sys_t
...
(32 rows)

it covers all possible options:
username+groupname, symlink_target_filename_id, selinux_ctx 
notnull, null, null
notnull, null, notnull
null, notnull, null
null, notnull, notnull



Verified

spacewalk-schema-2.5.1-49.el6sat.noarch
spacewalk-java-2.5.14-89.el6sat.noarch
satellite-schema-5.8.0.31-1.el6sat.noarch