Bug 1474293
Summary: | duplicate key value violates unique constraint "index_fact_names_on_name_and_type" on simulatenous host register | ||
---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Roman Plevka <rplevka> |
Component: | Registration | Assignee: | Lukas Zapletal <lzap> |
Status: | CLOSED ERRATA | QA Contact: | Roman Plevka <rplevka> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 6.2.11 | CC: | akarimi, andrew.schofield, bbuckingham, bkearney, ehelms, lzap, mlele, mmello, molasaga, pcreech, pmoravec, rplevka, sreber, sshtein, wclark, wpinheir, xdmoon, zhunting |
Target Milestone: | Unspecified | Keywords: | Regression, Triaged |
Target Release: | Unused | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | foreman-1.15.6.31-1 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-02-21 16:54:17 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Roman Plevka
2017-07-24 10:31:15 UTC
I think this might be related to: https://bugzilla.redhat.com/show_bug.cgi?id=1458857 Shimon, could you please review why the fact names were created (or updated?) as compose=false, while they have a "child" fact name? (see latest 3 (private) updates) how to identify fact names that shall have composite=true but have it false: su - postgres -c "psql foreman -c \"select * from fact_names;\"" > fact_names.txt # generate list of fact names that should have compose=true for i in $(awk '{ print $3 }' fact_names.txt | grep "::"); do echo ${i%::*}; done | sort -u > compose_fact_names.txt # auxiliary file with fact name and "compose" flag only cat fact_names.txt | awk '{ print $3" "$11 }' > fact_names_composeFlag.txt # whatever is printed by this is bad (compose flag not set) for i in $(cat compose_fact_names.txt); do if [ $(grep -c "^${i} t" fact_names_composeFlag.txt) -eq 0 ]; then echo $i; fi; done I think this is another symptom of the same bug: importing facts from already registered host fails with: 2017-10-13 06:21:13 062eb762 [app] [W] Action failed | ActiveRecord::RecordInvalid: Validation failed: Description has already been taken, Title has already been taken | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/validations.rb:57:in `save!' | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/attribute_methods/dirty.rb:29:in `save!' | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:273:in `block in save!' | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:329:in `block in with_transaction_returning_status' | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `block in transaction' | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:209:in `within_new_transaction' | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `transaction' | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:208:in `transaction' | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:326:in `with_transaction_returning_status' | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:273:in `save!' | /usr/share/foreman/app/services/puppet_fact_parser.rb:47:in `operatingsystem' | /usr/share/foreman/app/models/host/base.rb:185:in `block in set_non_empty_values' | /usr/share/foreman/app/models/host/base.rb:184:in `each' | /usr/share/foreman/app/models/host/base.rb:184:in `set_non_empty_values' | /usr/share/foreman/app/models/host/base.rb:177:in `populate_fields_from_facts' | /usr/share/foreman/app/models/host/managed.rb:481:in `populate_fields_from_facts' | /usr/share/foreman/app/models/host/base.rb:152:in `import_facts' | /usr/share/foreman/app/models/host/managed.rb:464:in `import_host_and_facts' | /usr/share/foreman/app/controllers/api/v2/hosts_controller.rb:254:in `facts' .. But fact_names are OK wrt compose property, esp. operatingsystem: $ grep operatingsystem fact_names.txt 9 | operatingsystem | 2016-11-04 14:03:19.872207 | 2016-11-04 14:03:19.872207 | f | operatingsystem | FactName | 10 | operatingsystemmajrelease | 2016-11-04 14:03:19.879504 | 2016-11-04 14:03:19.879504 | f | operatingsystemmajrelease | FactName | 11 | operatingsystemrelease | 2016-11-04 14:03:19.887704 | 2016-11-04 14:03:19.887704 | f | operatingsystemrelease | FactName | $ (maybe the host was uploading operatingsystem::* fact?) First, I'd recommend to put 'veth*' to the ignored interfaces Global Setting so Satellite is not wasting fact names on these interfaces. Then delete them all. Anyway, It looks like the customer has already some facts which are unstructured. This was created before structured facts were introduced for rhsm perhaps. I think the issue is that this unique index is on NAME and TYPE and COMPOSE flag or ANCESTRY is not taken into account. In your case Foreman is trying to find "dmi::meta::cpu_socket_count" using composed search (via its parent) and it finds nothing. Therefore it tries to create new fact name "dmi::meta::cpu_socket_count" with parent set and that fails. We had no migration in order to convert plain facts into structured fact, therefore this is the reason why it fails. I think we need to have a migration which will convert all plain names into structured ones. I can't think of a workaround other than deleting all fact values and names and starting over. TLV office shutdown ends today, so we can let Shim to take a look on Sunday. (In reply to Pavel Moravec from comment #12) > how to identify fact names that shall have composite=true but have it false: > > su - postgres -c "psql foreman -c \"select * from fact_names;\"" > > fact_names.txt > > # generate list of fact names that should have compose=true > for i in $(awk '{ print $3 }' fact_names.txt | grep "::"); do echo ${i%::*}; > done | sort -u > compose_fact_names.txt > > # auxiliary file with fact name and "compose" flag only > cat fact_names.txt | awk '{ print $3" "$11 }' > fact_names_composeFlag.txt > > # whatever is printed by this is bad (compose flag not set) > for i in $(cat compose_fact_names.txt); do if [ $(grep -c "^${i} t" > fact_names_composeFlag.txt) -eq 0 ]; then echo $i; fi; done I was wrong here. Per lzap, compose=true if and only if ancestry is not empty (then I see this flag as redundant). To identify that kind of mis-set compose flag: cat fact_names.txt | awk '{ print $3" "$11" "$17 }' | grep " f [0-9]" | cut -d' ' -f1 Created redmine issue http://projects.theforeman.org/issues/21599 from this bug Upstream bug assigned to sshtein Upstream bug assigned to sshtein Roman, I have one theory I would like to test. In the app/services/fact_importer.rb file in import! method there is an explicit transaction: def import! ActiveRecord::Base.transaction do ensure_fact_names delete_removed_facts update_facts add_new_facts end # and the rest of the method end Could you try to comment the two lines out: def import! ###ActiveRecord::Base.transaction do ensure_fact_names delete_removed_facts update_facts add_new_facts ###end # and the rest of the method end After httpd restart, try to run your test suite. Can you tell that you are running into the issue less likely? I think (https://github.com/theforeman/foreman/pull/4996#issuecomment-343953000) that the root cause of this are explicit transactions. (In reply to Lukas Zapletal from comment #27) > Roman, > > I have one theory I would like to test. In the app/services/fact_importer.rb > file in import! method there is an explicit transaction: > > def import! > ActiveRecord::Base.transaction do > ensure_fact_names > delete_removed_facts > update_facts > add_new_facts > end > > # and the rest of the method > end > > Could you try to comment the two lines out: > > def import! > ###ActiveRecord::Base.transaction do > ensure_fact_names > delete_removed_facts > update_facts > add_new_facts > ###end > > # and the rest of the method > end > > After httpd restart, try to run your test suite. Can you tell that you are > running into the issue less likely? I think > (https://github.com/theforeman/foreman/pull/4996#issuecomment-343953000) > that the root cause of this are explicit transactions. discussed this over irc - I was not even able to reproduce the issue without making the changes. I've mocked the /rhsm endpoint by sending hundreds of concurrent requests to it. No problem, I've put together a test that confirms my idea - this can be solved simply by removing the transaction block - see above. https://gist.github.com/lzap/c84107e245250bf287f09c4f87d8b67c We will discuss this further with Shim and come to some conclusion and patch. (In reply to Lukas Zapletal from comment #29) > No problem, I've put together a test that confirms my idea - this can be > solved simply by removing the transaction block - see above. > > https://gist.github.com/lzap/c84107e245250bf287f09c4f87d8b67c > > We will discuss this further with Shim and come to some conclusion and patch. uhm, are you sure? as i just gave it a go (thanks for the script btw) with the transaction block commented out as suggested above...yet i came across the RC: 2017-11-14 09:51:11 11cf9914 [app] [E] Fact random_fact_29303 could not be imported because of PG::Error: ERROR: duplicate key value violates unique constraint "index_fact_names_on_name_and_type" | DETAIL: Key (name, type)=(random_fact_29303, PuppetFactName) already exists. | : INSERT INTO "fact_names" ("type", "name", "short_name", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5) RETURNING "id" 2017-11-14 09:51:14 86e6798b [audit] [I] [fake-host-29294] added 600 (27693.8ms) - i'll try to keep the satellite instance alive if you'd like to take a look. Are you sure you restarted? Hmmm I was not running into these. Thanks. Are you able to compare the rate of these errors before and after removal of the block? I'd expect them to appear much less often. (In reply to Lukas Zapletal from comment #32) > Are you able to compare the rate of these errors before and after removal of > the block? I'd expect them to appear much less often. forgot to tell you, that I actually added an "&" to the end of the curl command in your script, which actually makes the requests simultaneous. Oh, and i generated a new host id in every iteration, not just the facts part. if i use your script and iterate only through the facts sending (for the same hosts) in parallel, i'm getting the following: 2017-11-23 11:23:15 e57b2070 [app] [E] Fact random_fact_22376 could not be imported because of PG::Error: ERROR: duplicate key value violates unique constraint "index_fact_values_on_fact_name_id_and_host_id" | DETAIL: Key (fact_name_id, host_id)=(21670, 2143) already exists. | : INSERT INTO "fact_values" ("value", "fact_name_id", "host_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5) RETURNING "id" (http://pastebin.test.redhat.com/534863) Ok thanks, sorry I should mention this - the script must be executed in parallel using some tool (like GNU parallel or screen). Nevermind we have a new patch incoming into the codebase now. Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/21599 has been resolved. So we ended up in removing the transaction in 1.17 upstream version and improving the code for 1.18. If we choose to backport this into Satellite (and I vote to do so) we should remove transactions from 6.3 until we are sure it performs well. flipping back to ASSIGNED, since nothing has changed in downstream Roman, this BZ was cherry picked already. Please re-test: BZ #1474293 - duplicate key value violates unique constraint index_fact_names_on_name_and_type on simulatenous host registe Fixes #21599 - explicit transaction for import dropped (cherry picked from commit b5256c3a3cd23c50ef6d90e44ea0c45c2500cc00) Upstream bug assigned to lzap VERIFIED on satellite-6.3.0-23.0.el7sat.noarch the simultaneous run of script mentioned in comment #29 now seems to finished smoothly, with no race conditions - tried on 100 simultaneous posts to /facts endpoint Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
>
> For information on the advisory, and where to find the updated files, follow the link below.
>
> If the solution does not work for you, open a new bug report.
>
> https://access.redhat.com/errata/RHSA-2018:0336
|