Bug 1474293 - duplicate key value violates unique constraint "index_fact_names_on_name_and_type" on simulatenous host register
Summary: duplicate key value violates unique constraint "index_fact_names_on_name_and_...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Registration
Version: 6.2.11
Hardware: Unspecified
OS: Unspecified
medium
medium vote
Target Milestone: Unspecified
Assignee: Lukas Zapletal
QA Contact: Roman Plevka
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-07-24 10:31 UTC by Roman Plevka
Modified: 2019-08-12 14:53 UTC (History)
18 users (show)

Fixed In Version: foreman-1.15.6.31-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-21 16:54:17 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Foreman Issue Tracker 21599 Normal Closed duplicate key value violates unique constraint "index_fact_names_on_name_and_type" on simulatenous host register 2020-06-10 09:32:01 UTC
Foreman Issue Tracker 21776 Normal Closed Fix fact names transaction handling 2020-06-10 09:32:00 UTC
Red Hat Bugzilla 1458857 None CLOSED Improved performance of Puppet and RHSM fact importers 2019-11-04 15:08:53 UTC
Red Hat Bugzilla 1480562 None None None 2019-11-04 15:08:53 UTC

Internal Links: 1458857 1480562

Description Roman Plevka 2017-07-24 10:31:15 UTC
Description of problem:
Our automation hit yet another race condition on registering the hosts simultaneously (2 hosts).
See attached foreman-debug:
production.log - see requests: 4c6a15a5 and 64762ef4 (which failed) (2017-07-20 21:05:54)


2017-07-20 21:05:51 64762ef4 [app] [I] Started POST "/rhsm/consumers?owner=Ag72NG&activation_keys=IsRwhiODa7" for 10.8.30.229 at 2017-07-20 21:05:51 -0400
2017-07-20 21:05:51 64762ef4 [app] [I] Processing by Katello::Api::Rhsm::CandlepinProxiesController#consumer_activate as JSON
...
...
2017-07-20 21:05:54  [audit] [I] [140318963445328.lab.eng.rdu2.redhat.com] added 153 (3963.0ms)
2017-07-20 21:05:54  [app] [I] Import facts for '140318963445328.lab.eng.rdu2.redhat.com' completed. Added: 153, Updated: 0, Deleted 0 facts
2017-07-20 21:05:54  [app] [E] Fact dmi::system::uuid could not be imported because of PG::Error: ERROR:  duplicate key value violates unique constraint "index_fact_names_on_name_and_type"
 | : INSERT INTO "fact_names" ("compose", "created_at", "name", "short_name", "type", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id"
2017-07-20 21:05:54  [app] [E] Fact distribution::name could not be imported because of PG::Error: ERROR:  current transaction is aborted, commands ignored until end of transaction block
 | : SELECT  "fact_names".* FROM "fact_names"  WHERE "fact_names"."type" IN ('Katello::RhsmFactName') AND "fact_names"."name" = 'distribution' AND "fact_names"."type" = 'Katello::RhsmFactName'  ORDER BY fact_names.name LIMIT 1
...
...
2017-07-20 21:05:55  [audit] [I] [140599713026320.lab.eng.rdu2.redhat.com] added 166 (1966.6ms)
2017-07-20 21:05:55  [app] [W] Error during fact import for 140599713026320.lab.eng.rdu2.redhat.com
 | ActiveRecord::StatementInvalid: PG::Error: ERROR:  current transaction is aborted, commands ignored until end of transaction block
 | : SELECT  "fact_names".* FROM "fact_names"  WHERE "fact_names"."type" IN ('Katello::RhsmFactName') AND "fact_names"."name" = 'memor
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/postgresql_adapter.rb:822:in `ex
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/postgresql_adapter.rb:822:in `bl
...
...
2017-07-20 21:05:55  [foreman-tasks/action] [E] ERF50-2486 [Foreman::WrappedException]: Import of facts failed for host 14059971302632
 | /usr/share/foreman/app/services/fact_importer.rb:48:in `import!'
...
 | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/ruby_thread_pool_executor.rb:304:in `blo
 | /opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `call'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_c
2017-07-20 21:05:56 64762ef4 [app] [I] Completed 500 Internal Server Error in 5045ms (Views: 0.6ms | ActiveRecord: 92.5ms)
2017-07-20 21:05:57 4c6a15a5 [app] [I] Completed 200 OK in 9308ms (Views: 2.9ms | ActiveRecord: 45.2ms)


Version-Release number of selected component (if applicable):
6.2.11-1

How reproducible:
sometimes

Steps to Reproduce:
hard to reproduce, however:
1. get 2 different hosts register to the same satellite at the same time, with the same org, env and activation key
2. making katello more busy might help, in our case, there were some POST requests to Environments and Content Views made just after the register requests.

Actual results:
duplicate key value violates unique constraint "index_fact_names_on_name_and_type"

Comment 7 Lukas Zapletal 2017-10-12 15:36:58 UTC
I think this might be related to: https://bugzilla.redhat.com/show_bug.cgi?id=1458857

Comment 11 Pavel Moravec 2017-10-13 10:48:22 UTC
Shimon, could you please review why the fact names were created (or updated?) as compose=false, while they have a "child" fact name? (see latest 3 (private) updates)

Comment 12 Pavel Moravec 2017-10-13 12:03:43 UTC
how to identify fact names that shall have composite=true but have it false:

su - postgres -c "psql foreman -c \"select * from fact_names;\"" > fact_names.txt

# generate list of fact names that should have compose=true
for i in $(awk '{ print $3 }' fact_names.txt | grep "::"); do echo ${i%::*}; done | sort -u > compose_fact_names.txt

# auxiliary file with fact name and "compose" flag only
cat fact_names.txt | awk '{ print $3" "$11 }' > fact_names_composeFlag.txt

# whatever is printed by this is bad (compose flag not set)
for i in $(cat compose_fact_names.txt); do if [ $(grep -c "^${i} t" fact_names_composeFlag.txt) -eq 0 ]; then echo $i; fi; done

Comment 13 Pavel Moravec 2017-10-13 12:11:34 UTC
I think this is another symptom of the same bug: importing facts from already registered host fails with:

2017-10-13 06:21:13 062eb762 [app] [W] Action failed
 | ActiveRecord::RecordInvalid: Validation failed: Description has already been taken, Title has already been taken
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/validations.rb:57:in `save!'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/attribute_methods/dirty.rb:29:in `save!'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:273:in `block in save!'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:329:in `block in with_transaction_returning_status'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `block in transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:209:in `within_new_transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:208:in `transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:326:in `with_transaction_returning_status'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:273:in `save!'
 | /usr/share/foreman/app/services/puppet_fact_parser.rb:47:in `operatingsystem'
 | /usr/share/foreman/app/models/host/base.rb:185:in `block in set_non_empty_values'
 | /usr/share/foreman/app/models/host/base.rb:184:in `each'
 | /usr/share/foreman/app/models/host/base.rb:184:in `set_non_empty_values'
 | /usr/share/foreman/app/models/host/base.rb:177:in `populate_fields_from_facts'
 | /usr/share/foreman/app/models/host/managed.rb:481:in `populate_fields_from_facts'
 | /usr/share/foreman/app/models/host/base.rb:152:in `import_facts'
 | /usr/share/foreman/app/models/host/managed.rb:464:in `import_host_and_facts'
 | /usr/share/foreman/app/controllers/api/v2/hosts_controller.rb:254:in `facts'
..

But fact_names are OK wrt compose property, esp. operatingsystem:

$ grep operatingsystem fact_names.txt 
     9 | operatingsystem                                      | 2016-11-04 14:03:19.872207 | 2016-11-04 14:03:19.872207 | f       | operatingsystem               | FactName              | 
    10 | operatingsystemmajrelease                            | 2016-11-04 14:03:19.879504 | 2016-11-04 14:03:19.879504 | f       | operatingsystemmajrelease     | FactName              | 
    11 | operatingsystemrelease                               | 2016-11-04 14:03:19.887704 | 2016-11-04 14:03:19.887704 | f       | operatingsystemrelease        | FactName              | 
$


(maybe the host was uploading operatingsystem::* fact?)

Comment 14 Lukas Zapletal 2017-10-13 12:46:08 UTC
First, I'd recommend to put 'veth*' to the ignored interfaces Global Setting so Satellite is not wasting fact names on these interfaces. Then delete them all.

Anyway, It looks like the customer has already some facts which are unstructured. This was created before structured facts were introduced for rhsm perhaps. I think the issue is that this unique index is on NAME and TYPE and COMPOSE flag or ANCESTRY is not taken into account.

In your case Foreman is trying to find "dmi::meta::cpu_socket_count" using composed search (via its parent) and it finds nothing. Therefore it tries to create new fact name "dmi::meta::cpu_socket_count" with parent set and that fails.

We had no migration in order to convert plain facts into structured fact, therefore this is the reason why it fails. I think we need to have a migration which will convert all plain names into structured ones.

I can't think of a workaround other than deleting all fact values and names and starting over.

TLV office shutdown ends today, so we can let Shim to take a look on Sunday.

Comment 15 Pavel Moravec 2017-10-13 12:48:18 UTC
(In reply to Pavel Moravec from comment #12)
> how to identify fact names that shall have composite=true but have it false:
> 
> su - postgres -c "psql foreman -c \"select * from fact_names;\"" >
> fact_names.txt
> 
> # generate list of fact names that should have compose=true
> for i in $(awk '{ print $3 }' fact_names.txt | grep "::"); do echo ${i%::*};
> done | sort -u > compose_fact_names.txt
> 
> # auxiliary file with fact name and "compose" flag only
> cat fact_names.txt | awk '{ print $3" "$11 }' > fact_names_composeFlag.txt
> 
> # whatever is printed by this is bad (compose flag not set)
> for i in $(cat compose_fact_names.txt); do if [ $(grep -c "^${i} t"
> fact_names_composeFlag.txt) -eq 0 ]; then echo $i; fi; done

I was wrong here. Per lzap, compose=true if and only if ancestry is not empty (then I see this flag as redundant). To identify that kind of mis-set compose flag:

cat fact_names.txt | awk '{ print $3" "$11" "$17 }' | grep " f [0-9]" | cut -d' ' -f1

Comment 24 Shimon Shtein 2017-11-08 10:33:50 UTC
Created redmine issue http://projects.theforeman.org/issues/21599 from this bug

Comment 25 pm-sat@redhat.com 2017-11-09 17:16:18 UTC
Upstream bug assigned to sshtein@redhat.com

Comment 26 pm-sat@redhat.com 2017-11-09 17:16:23 UTC
Upstream bug assigned to sshtein@redhat.com

Comment 27 Lukas Zapletal 2017-11-13 15:27:37 UTC
Roman,

I have one theory I would like to test. In the app/services/fact_importer.rb file in import! method there is an explicit transaction:

  def import!
    ActiveRecord::Base.transaction do
      ensure_fact_names
      delete_removed_facts
      update_facts
      add_new_facts
    end

    # and the rest of the method
  end

Could you try to comment the two lines out:

  def import!
    ###ActiveRecord::Base.transaction do
      ensure_fact_names
      delete_removed_facts
      update_facts
      add_new_facts
    ###end

    # and the rest of the method
  end

After httpd restart, try to run your test suite. Can you tell that you are running into the issue less likely? I think (https://github.com/theforeman/foreman/pull/4996#issuecomment-343953000) that the root cause of this are explicit transactions.

Comment 28 Roman Plevka 2017-11-14 12:49:27 UTC
(In reply to Lukas Zapletal from comment #27)
> Roman,
> 
> I have one theory I would like to test. In the app/services/fact_importer.rb
> file in import! method there is an explicit transaction:
> 
>   def import!
>     ActiveRecord::Base.transaction do
>       ensure_fact_names
>       delete_removed_facts
>       update_facts
>       add_new_facts
>     end
> 
>     # and the rest of the method
>   end
> 
> Could you try to comment the two lines out:
> 
>   def import!
>     ###ActiveRecord::Base.transaction do
>       ensure_fact_names
>       delete_removed_facts
>       update_facts
>       add_new_facts
>     ###end
> 
>     # and the rest of the method
>   end
> 
> After httpd restart, try to run your test suite. Can you tell that you are
> running into the issue less likely? I think
> (https://github.com/theforeman/foreman/pull/4996#issuecomment-343953000)
> that the root cause of this are explicit transactions.

discussed this over irc - I was not even able to reproduce the issue without making the changes. I've mocked the /rhsm endpoint by sending hundreds of concurrent requests to it.

Comment 29 Lukas Zapletal 2017-11-14 13:41:45 UTC
No problem, I've put together a test that confirms my idea - this can be solved simply by removing the transaction block - see above.

https://gist.github.com/lzap/c84107e245250bf287f09c4f87d8b67c

We will discuss this further with Shim and come to some conclusion and patch.

Comment 30 Roman Plevka 2017-11-14 15:03:12 UTC
(In reply to Lukas Zapletal from comment #29)
> No problem, I've put together a test that confirms my idea - this can be
> solved simply by removing the transaction block - see above.
> 
> https://gist.github.com/lzap/c84107e245250bf287f09c4f87d8b67c
> 
> We will discuss this further with Shim and come to some conclusion and patch.

uhm, are you sure?
as i just gave it a go (thanks for the script btw) with the transaction block commented out as suggested above...yet i came across the RC:


2017-11-14 09:51:11 11cf9914 [app] [E] Fact random_fact_29303 could not be imported because of PG::Error: ERROR:  duplicate key value violates unique constraint "index_fact_names_on_name_and_type"
 | DETAIL:  Key (name, type)=(random_fact_29303, PuppetFactName) already exists.
 | : INSERT INTO "fact_names" ("type", "name", "short_name", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5) RETURNING "id"
2017-11-14 09:51:14 86e6798b [audit] [I] [fake-host-29294] added 600 (27693.8ms)

- i'll try to keep the satellite instance alive if you'd like to take a look.

Comment 31 Lukas Zapletal 2017-11-20 10:56:22 UTC
Are you sure you restarted? Hmmm I was not running into these. Thanks.

Comment 32 Lukas Zapletal 2017-11-20 10:57:26 UTC
Are you able to compare the rate of these errors before and after removal of the block? I'd expect them to appear much less often.

Comment 33 Roman Plevka 2017-11-23 16:30:33 UTC
(In reply to Lukas Zapletal from comment #32)
> Are you able to compare the rate of these errors before and after removal of
> the block? I'd expect them to appear much less often.

forgot to tell you, that I actually added an "&" to the end of the curl command in your script, which actually makes the requests simultaneous. Oh, and i generated a new host id in every iteration, not just the facts part.


if i use your script and iterate only through the facts sending (for the same hosts) in parallel, i'm getting the following:

2017-11-23 11:23:15 e57b2070 [app] [E] Fact random_fact_22376 could not be imported because of PG::Error: ERROR:  duplicate key value violates unique constraint "index_fact_values_on_fact_name_id_and_host_id"
 | DETAIL:  Key (fact_name_id, host_id)=(21670, 2143) already exists.
 | : INSERT INTO "fact_values" ("value", "fact_name_id", "host_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5) RETURNING "id"

(http://pastebin.test.redhat.com/534863)

Comment 34 Lukas Zapletal 2017-11-24 08:45:49 UTC
Ok thanks, sorry I should mention this - the script must be executed in parallel using some tool (like GNU parallel or screen). Nevermind we have a new patch incoming into the codebase now.

Comment 35 pm-sat@redhat.com 2017-11-27 09:18:28 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/21599 has been resolved.

Comment 36 Lukas Zapletal 2017-11-27 13:35:01 UTC
So we ended up in removing the transaction in 1.17 upstream version and improving the code for 1.18. If we choose to backport this into Satellite (and I vote to do so) we should remove transactions from 6.3 until we are sure it performs well.

Comment 37 Roman Plevka 2017-12-08 10:53:58 UTC
flipping back to ASSIGNED, since nothing has changed in downstream

Comment 38 Lukas Zapletal 2017-12-08 11:09:27 UTC
Roman, this BZ was cherry picked already. Please re-test:

    BZ #1474293 - duplicate key value violates unique constraint index_fact_names_on_name_and_type on simulatenous host registe

    Fixes #21599 - explicit transaction for import dropped

    (cherry picked from commit b5256c3a3cd23c50ef6d90e44ea0c45c2500cc00)

Comment 39 pm-sat@redhat.com 2017-12-08 11:16:45 UTC
Upstream bug assigned to lzap@redhat.com

Comment 46 Roman Plevka 2018-02-06 16:17:09 UTC
VERIFIED

on satellite-6.3.0-23.0.el7sat.noarch

the simultaneous run of script mentioned in comment #29
now seems to finished smoothly, with no race conditions

- tried on 100 simultaneous posts to /facts endpoint

Comment 47 pm-sat@redhat.com 2018-02-21 16:54:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
> 
> For information on the advisory, and where to find the updated files, follow the link below.
> 
> If the solution does not work for you, open a new bug report.
> 
> https://access.redhat.com/errata/RHSA-2018:0336


Note You need to log in before you can comment on or make changes to this bug.