Bug 1474293 - duplicate key value violates unique constraint "index_fact_names_on_name_and_type" on simulatenous host register
duplicate key value violates unique constraint "index_fact_names_on_name_and_...
Status: CLOSED ERRATA
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Registration (Show other bugs)
6.2.11
Unspecified Unspecified
medium Severity medium (vote)
: GA
: --
Assigned To: Lukas Zapletal
Roman Plevka
: Regression, Triaged
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2017-07-24 06:31 EDT by Roman Plevka
Modified: 2018-02-21 11:54 EST (History)
16 users (show)

See Also:
Fixed In Version: foreman-1.15.6.31-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-21 11:54:17 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Foreman Issue Tracker 21599 None None None 2017-11-08 05:33 EST
Foreman Issue Tracker 21776 None None None 2018-01-18 07:06 EST

  None (edit)
Description Roman Plevka 2017-07-24 06:31:15 EDT
Description of problem:
Our automation hit yet another race condition on registering the hosts simultaneously (2 hosts).
See attached foreman-debug:
production.log - see requests: 4c6a15a5 and 64762ef4 (which failed) (2017-07-20 21:05:54)


2017-07-20 21:05:51 64762ef4 [app] [I] Started POST "/rhsm/consumers?owner=Ag72NG&activation_keys=IsRwhiODa7" for 10.8.30.229 at 2017-07-20 21:05:51 -0400
2017-07-20 21:05:51 64762ef4 [app] [I] Processing by Katello::Api::Rhsm::CandlepinProxiesController#consumer_activate as JSON
...
...
2017-07-20 21:05:54  [audit] [I] [140318963445328.lab.eng.rdu2.redhat.com] added 153 (3963.0ms)
2017-07-20 21:05:54  [app] [I] Import facts for '140318963445328.lab.eng.rdu2.redhat.com' completed. Added: 153, Updated: 0, Deleted 0 facts
2017-07-20 21:05:54  [app] [E] Fact dmi::system::uuid could not be imported because of PG::Error: ERROR:  duplicate key value violates unique constraint "index_fact_names_on_name_and_type"
 | : INSERT INTO "fact_names" ("compose", "created_at", "name", "short_name", "type", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id"
2017-07-20 21:05:54  [app] [E] Fact distribution::name could not be imported because of PG::Error: ERROR:  current transaction is aborted, commands ignored until end of transaction block
 | : SELECT  "fact_names".* FROM "fact_names"  WHERE "fact_names"."type" IN ('Katello::RhsmFactName') AND "fact_names"."name" = 'distribution' AND "fact_names"."type" = 'Katello::RhsmFactName'  ORDER BY fact_names.name LIMIT 1
...
...
2017-07-20 21:05:55  [audit] [I] [140599713026320.lab.eng.rdu2.redhat.com] added 166 (1966.6ms)
2017-07-20 21:05:55  [app] [W] Error during fact import for 140599713026320.lab.eng.rdu2.redhat.com
 | ActiveRecord::StatementInvalid: PG::Error: ERROR:  current transaction is aborted, commands ignored until end of transaction block
 | : SELECT  "fact_names".* FROM "fact_names"  WHERE "fact_names"."type" IN ('Katello::RhsmFactName') AND "fact_names"."name" = 'memor
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/postgresql_adapter.rb:822:in `ex
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/postgresql_adapter.rb:822:in `bl
...
...
2017-07-20 21:05:55  [foreman-tasks/action] [E] ERF50-2486 [Foreman::WrappedException]: Import of facts failed for host 14059971302632
 | /usr/share/foreman/app/services/fact_importer.rb:48:in `import!'
...
 | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/ruby_thread_pool_executor.rb:304:in `blo
 | /opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `call'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_c
2017-07-20 21:05:56 64762ef4 [app] [I] Completed 500 Internal Server Error in 5045ms (Views: 0.6ms | ActiveRecord: 92.5ms)
2017-07-20 21:05:57 4c6a15a5 [app] [I] Completed 200 OK in 9308ms (Views: 2.9ms | ActiveRecord: 45.2ms)


Version-Release number of selected component (if applicable):
6.2.11-1

How reproducible:
sometimes

Steps to Reproduce:
hard to reproduce, however:
1. get 2 different hosts register to the same satellite at the same time, with the same org, env and activation key
2. making katello more busy might help, in our case, there were some POST requests to Environments and Content Views made just after the register requests.

Actual results:
duplicate key value violates unique constraint "index_fact_names_on_name_and_type"
Comment 7 Lukas Zapletal 2017-10-12 11:36:58 EDT
I think this might be related to: https://bugzilla.redhat.com/show_bug.cgi?id=1458857
Comment 11 Pavel Moravec 2017-10-13 06:48:22 EDT
Shimon, could you please review why the fact names were created (or updated?) as compose=false, while they have a "child" fact name? (see latest 3 (private) updates)
Comment 12 Pavel Moravec 2017-10-13 08:03:43 EDT
how to identify fact names that shall have composite=true but have it false:

su - postgres -c "psql foreman -c \"select * from fact_names;\"" > fact_names.txt

# generate list of fact names that should have compose=true
for i in $(awk '{ print $3 }' fact_names.txt | grep "::"); do echo ${i%::*}; done | sort -u > compose_fact_names.txt

# auxiliary file with fact name and "compose" flag only
cat fact_names.txt | awk '{ print $3" "$11 }' > fact_names_composeFlag.txt

# whatever is printed by this is bad (compose flag not set)
for i in $(cat compose_fact_names.txt); do if [ $(grep -c "^${i} t" fact_names_composeFlag.txt) -eq 0 ]; then echo $i; fi; done
Comment 13 Pavel Moravec 2017-10-13 08:11:34 EDT
I think this is another symptom of the same bug: importing facts from already registered host fails with:

2017-10-13 06:21:13 062eb762 [app] [W] Action failed
 | ActiveRecord::RecordInvalid: Validation failed: Description has already been taken, Title has already been taken
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/validations.rb:57:in `save!'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/attribute_methods/dirty.rb:29:in `save!'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:273:in `block in save!'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:329:in `block in with_transaction_returning_status'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `block in transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:209:in `within_new_transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:208:in `transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:326:in `with_transaction_returning_status'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:273:in `save!'
 | /usr/share/foreman/app/services/puppet_fact_parser.rb:47:in `operatingsystem'
 | /usr/share/foreman/app/models/host/base.rb:185:in `block in set_non_empty_values'
 | /usr/share/foreman/app/models/host/base.rb:184:in `each'
 | /usr/share/foreman/app/models/host/base.rb:184:in `set_non_empty_values'
 | /usr/share/foreman/app/models/host/base.rb:177:in `populate_fields_from_facts'
 | /usr/share/foreman/app/models/host/managed.rb:481:in `populate_fields_from_facts'
 | /usr/share/foreman/app/models/host/base.rb:152:in `import_facts'
 | /usr/share/foreman/app/models/host/managed.rb:464:in `import_host_and_facts'
 | /usr/share/foreman/app/controllers/api/v2/hosts_controller.rb:254:in `facts'
..

But fact_names are OK wrt compose property, esp. operatingsystem:

$ grep operatingsystem fact_names.txt 
     9 | operatingsystem                                      | 2016-11-04 14:03:19.872207 | 2016-11-04 14:03:19.872207 | f       | operatingsystem               | FactName              | 
    10 | operatingsystemmajrelease                            | 2016-11-04 14:03:19.879504 | 2016-11-04 14:03:19.879504 | f       | operatingsystemmajrelease     | FactName              | 
    11 | operatingsystemrelease                               | 2016-11-04 14:03:19.887704 | 2016-11-04 14:03:19.887704 | f       | operatingsystemrelease        | FactName              | 
$


(maybe the host was uploading operatingsystem::* fact?)
Comment 14 Lukas Zapletal 2017-10-13 08:46:08 EDT
First, I'd recommend to put 'veth*' to the ignored interfaces Global Setting so Satellite is not wasting fact names on these interfaces. Then delete them all.

Anyway, It looks like the customer has already some facts which are unstructured. This was created before structured facts were introduced for rhsm perhaps. I think the issue is that this unique index is on NAME and TYPE and COMPOSE flag or ANCESTRY is not taken into account.

In your case Foreman is trying to find "dmi::meta::cpu_socket_count" using composed search (via its parent) and it finds nothing. Therefore it tries to create new fact name "dmi::meta::cpu_socket_count" with parent set and that fails.

We had no migration in order to convert plain facts into structured fact, therefore this is the reason why it fails. I think we need to have a migration which will convert all plain names into structured ones.

I can't think of a workaround other than deleting all fact values and names and starting over.

TLV office shutdown ends today, so we can let Shim to take a look on Sunday.
Comment 15 Pavel Moravec 2017-10-13 08:48:18 EDT
(In reply to Pavel Moravec from comment #12)
> how to identify fact names that shall have composite=true but have it false:
> 
> su - postgres -c "psql foreman -c \"select * from fact_names;\"" >
> fact_names.txt
> 
> # generate list of fact names that should have compose=true
> for i in $(awk '{ print $3 }' fact_names.txt | grep "::"); do echo ${i%::*};
> done | sort -u > compose_fact_names.txt
> 
> # auxiliary file with fact name and "compose" flag only
> cat fact_names.txt | awk '{ print $3" "$11 }' > fact_names_composeFlag.txt
> 
> # whatever is printed by this is bad (compose flag not set)
> for i in $(cat compose_fact_names.txt); do if [ $(grep -c "^${i} t"
> fact_names_composeFlag.txt) -eq 0 ]; then echo $i; fi; done

I was wrong here. Per lzap, compose=true if and only if ancestry is not empty (then I see this flag as redundant). To identify that kind of mis-set compose flag:

cat fact_names.txt | awk '{ print $3" "$11" "$17 }' | grep " f [0-9]" | cut -d' ' -f1
Comment 24 Shimon Shtein 2017-11-08 05:33:50 EST
Created redmine issue http://projects.theforeman.org/issues/21599 from this bug
Comment 25 pm-sat@redhat.com 2017-11-09 12:16:18 EST
Upstream bug assigned to sshtein@redhat.com
Comment 26 pm-sat@redhat.com 2017-11-09 12:16:23 EST
Upstream bug assigned to sshtein@redhat.com
Comment 27 Lukas Zapletal 2017-11-13 10:27:37 EST
Roman,

I have one theory I would like to test. In the app/services/fact_importer.rb file in import! method there is an explicit transaction:

  def import!
    ActiveRecord::Base.transaction do
      ensure_fact_names
      delete_removed_facts
      update_facts
      add_new_facts
    end

    # and the rest of the method
  end

Could you try to comment the two lines out:

  def import!
    ###ActiveRecord::Base.transaction do
      ensure_fact_names
      delete_removed_facts
      update_facts
      add_new_facts
    ###end

    # and the rest of the method
  end

After httpd restart, try to run your test suite. Can you tell that you are running into the issue less likely? I think (https://github.com/theforeman/foreman/pull/4996#issuecomment-343953000) that the root cause of this are explicit transactions.
Comment 28 Roman Plevka 2017-11-14 07:49:27 EST
(In reply to Lukas Zapletal from comment #27)
> Roman,
> 
> I have one theory I would like to test. In the app/services/fact_importer.rb
> file in import! method there is an explicit transaction:
> 
>   def import!
>     ActiveRecord::Base.transaction do
>       ensure_fact_names
>       delete_removed_facts
>       update_facts
>       add_new_facts
>     end
> 
>     # and the rest of the method
>   end
> 
> Could you try to comment the two lines out:
> 
>   def import!
>     ###ActiveRecord::Base.transaction do
>       ensure_fact_names
>       delete_removed_facts
>       update_facts
>       add_new_facts
>     ###end
> 
>     # and the rest of the method
>   end
> 
> After httpd restart, try to run your test suite. Can you tell that you are
> running into the issue less likely? I think
> (https://github.com/theforeman/foreman/pull/4996#issuecomment-343953000)
> that the root cause of this are explicit transactions.

discussed this over irc - I was not even able to reproduce the issue without making the changes. I've mocked the /rhsm endpoint by sending hundreds of concurrent requests to it.
Comment 29 Lukas Zapletal 2017-11-14 08:41:45 EST
No problem, I've put together a test that confirms my idea - this can be solved simply by removing the transaction block - see above.

https://gist.github.com/lzap/c84107e245250bf287f09c4f87d8b67c

We will discuss this further with Shim and come to some conclusion and patch.
Comment 30 Roman Plevka 2017-11-14 10:03:12 EST
(In reply to Lukas Zapletal from comment #29)
> No problem, I've put together a test that confirms my idea - this can be
> solved simply by removing the transaction block - see above.
> 
> https://gist.github.com/lzap/c84107e245250bf287f09c4f87d8b67c
> 
> We will discuss this further with Shim and come to some conclusion and patch.

uhm, are you sure?
as i just gave it a go (thanks for the script btw) with the transaction block commented out as suggested above...yet i came across the RC:


2017-11-14 09:51:11 11cf9914 [app] [E] Fact random_fact_29303 could not be imported because of PG::Error: ERROR:  duplicate key value violates unique constraint "index_fact_names_on_name_and_type"
 | DETAIL:  Key (name, type)=(random_fact_29303, PuppetFactName) already exists.
 | : INSERT INTO "fact_names" ("type", "name", "short_name", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5) RETURNING "id"
2017-11-14 09:51:14 86e6798b [audit] [I] [fake-host-29294] added 600 (27693.8ms)

- i'll try to keep the satellite instance alive if you'd like to take a look.
Comment 31 Lukas Zapletal 2017-11-20 05:56:22 EST
Are you sure you restarted? Hmmm I was not running into these. Thanks.
Comment 32 Lukas Zapletal 2017-11-20 05:57:26 EST
Are you able to compare the rate of these errors before and after removal of the block? I'd expect them to appear much less often.
Comment 33 Roman Plevka 2017-11-23 11:30:33 EST
(In reply to Lukas Zapletal from comment #32)
> Are you able to compare the rate of these errors before and after removal of
> the block? I'd expect them to appear much less often.

forgot to tell you, that I actually added an "&" to the end of the curl command in your script, which actually makes the requests simultaneous. Oh, and i generated a new host id in every iteration, not just the facts part.


if i use your script and iterate only through the facts sending (for the same hosts) in parallel, i'm getting the following:

2017-11-23 11:23:15 e57b2070 [app] [E] Fact random_fact_22376 could not be imported because of PG::Error: ERROR:  duplicate key value violates unique constraint "index_fact_values_on_fact_name_id_and_host_id"
 | DETAIL:  Key (fact_name_id, host_id)=(21670, 2143) already exists.
 | : INSERT INTO "fact_values" ("value", "fact_name_id", "host_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5) RETURNING "id"

(http://pastebin.test.redhat.com/534863)
Comment 34 Lukas Zapletal 2017-11-24 03:45:49 EST
Ok thanks, sorry I should mention this - the script must be executed in parallel using some tool (like GNU parallel or screen). Nevermind we have a new patch incoming into the codebase now.
Comment 35 pm-sat@redhat.com 2017-11-27 04:18:28 EST
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/21599 has been resolved.
Comment 36 Lukas Zapletal 2017-11-27 08:35:01 EST
So we ended up in removing the transaction in 1.17 upstream version and improving the code for 1.18. If we choose to backport this into Satellite (and I vote to do so) we should remove transactions from 6.3 until we are sure it performs well.
Comment 37 Roman Plevka 2017-12-08 05:53:58 EST
flipping back to ASSIGNED, since nothing has changed in downstream
Comment 38 Lukas Zapletal 2017-12-08 06:09:27 EST
Roman, this BZ was cherry picked already. Please re-test:

    BZ #1474293 - duplicate key value violates unique constraint index_fact_names_on_name_and_type on simulatenous host registe

    Fixes #21599 - explicit transaction for import dropped

    (cherry picked from commit b5256c3a3cd23c50ef6d90e44ea0c45c2500cc00)
Comment 39 pm-sat@redhat.com 2017-12-08 06:16:45 EST
Upstream bug assigned to lzap@redhat.com
Comment 46 Roman Plevka 2018-02-06 11:17:09 EST
VERIFIED

on satellite-6.3.0-23.0.el7sat.noarch

the simultaneous run of script mentioned in comment #29
now seems to finished smoothly, with no race conditions

- tried on 100 simultaneous posts to /facts endpoint
Comment 47 pm-sat@redhat.com 2018-02-21 11:54:17 EST
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
> 
> For information on the advisory, and where to find the updated files, follow the link below.
> 
> If the solution does not work for you, open a new bug report.
> 
> https://access.redhat.com/errata/RHSA-2018:0336

Note You need to log in before you can comment on or make changes to this bug.