Bug 2021255
Summary: Satellite schedules one recurring InventorySync::Async::InventoryScheduledSync per org but each task syncs all orgs, resulting in harmless but unnecessary tasks
Product: Red Hat Satellite
Component: RH Cloud - Inventory
Version: 6.9.6
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Pablo Hess <phess>
Assignee: Shimon Shtein <sshtein>
QA Contact: addubey
CC: achadha, ahumbe, aruzicka, bbuckingham, ben.argyle, dmule, jpathan, peter.vreman, pmoravec, sshtein, zhunting
Keywords: Triaged
Target Milestone: 6.11.0
Target Release: Unused
Fixed In Version: 5.0.29
Doc Type: If docs needed, set a value
Related: 2027786 (view as bug list)
Type: Bug
Last Closed: 2022-07-05 14:30:00 UTC
Description
Pablo Hess
2021-11-08 16:52:18 UTC
Adding an important observation / aspect from a reproducer: the extra tasks do fail with a PG violation error:

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_host_status_on_type_and_host_id"
DETAIL: Key (type, host_id)=(InventorySync::InventoryStatus, 1052) already exists.

I have two orgs (although one has no content hosts, as it's not in use yet) within my Satellite (6.9.6). It appears that both "Synchronize Automatically" for Insights AND "Automatic inventory upload" for Red Hat Inventory (both accessible from the Satellite GUI under "Configure") are _global_ settings, rather than per-organisation. That is, changing into an organisation and then setting the Insights "Synchronize Automatically" to "ON" or "OFF", and/or changing the slider on the Red Hat Inventory "Automatic inventory upload", and then changing to the other organisation and going to the same pages will show the same changes. Changing them back and going to the first organisation will show that they've changed back for the first organisation, too. Looking at my Recurring Logics seems to back this up, as I only have one logic for "Inventory scheduled sync".

In any event, it doesn't feel like this issue is what's causing the PG::UniqueViolation error. I've had more than one organisation for months, and had been at 6.9.5, and then 6.9.6 for (I think) a week or more, before this issue occurred. It seems more to have coincided with the accidental partial subscription/registration and then deletion of a few RHEL hosts (an Ansible mess-up). As such it feels more like a Postgres table/row mismatch or corruption.

This error only crops up when I register/subscribe a new host. After doing so, overnight I then get this PG error. If I then delete/unsubscribe that host and reregister it, the issue goes away and on the following overnights there are no further PG errors. The next time I register/subscribe a new host the PG error will appear overnight. If I register/subscribe two or more hosts, it's only the last one to be done that generates the error (found using "hammer host info --id 1052 | grep -i fqdn"). It feels like there's an off-by-one error somewhere in Postgres' data.

InventoryScheduledSync is indeed a singleton task that runs once a day. It initiates an InventoryFullSync task per organization (in parallel), which is responsible for writing the statuses into the database. The duplicate status can be created for one of two reasons: either you have two InventoryFullSync tasks for the same org running in parallel, or there are duplicate uuids in subscription manager. If the same task does not fail in _some_ cases, I would go with the former (too many parallel tasks).

I definitely only have one InventoryFullSync task. I would imagine I therefore have duplicate UUIDs. How do I find that out?

So I registered two hosts to Satellite on Friday evening, having set Satellite GUI "Configure" -> "Insights": Settings "Synchronize Automatically" to "OFF" on Friday 2021-11-12 before registering the RHEL VMs with Satellite. I did NOT disable Satellite GUI "Configure" -> "Inventory Upload": "Automatic inventory upload". InventorySync::Async::InventoryScheduledSync still ran, and produced a PG error. I unregistered and reregistered the host based on the ID from "hammer host info --id 1054 | grep -i fqdn", resumed the task, and as usual it completed. Next time we register a host I'll disable the latter and see if we get the PG error the next morning.
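One way to answer the "how do I find that out?" question above is from the same foreman-rake console. The sketch below is not taken from this bug report; it assumes the standard Foreman/Katello model names (ForemanTasks::Task, Katello::Host::SubscriptionFacet) and checks both hypotheses: overlapping InventoryFullSync runs and duplicate subscription-manager UUIDs.

~~~
# Run inside `foreman-rake console` on the Satellite server.

# Hypothesis 1: more than one InventorySync::Async::InventoryFullSync running at once.
# List recent runs with their start/end times and look for overlaps.
ForemanTasks::Task.where(label: 'InventorySync::Async::InventoryFullSync')
                  .order(started_at: :desc).limit(20)
                  .pluck(:id, :state, :result, :started_at, :ended_at)

# Hypothesis 2: duplicate subscription-manager UUIDs on the Satellite side.
# An empty hash means every katello_subscription_facets.uuid is unique.
Katello::Host::SubscriptionFacet.group(:uuid).having('COUNT(*) > 1').count
~~~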
So overnight I got a _new/different_ PG error for the host I dealt with yesterday (ID 1054, reregistered as ID 1056). This is the first time a reregistered host has caused a PG error:

2021-11-16T00:00:02 [I|bac|15e4e6f8] Task {label: InventorySync::Async::InventoryScheduledSync, id: bbcc92d2-5bca-44a9-8a5b-a1abbd1a99d6, execution_plan_id: 05108322-0f33-4f94-9e06-b549ec6f6ffa} state changed: planning
2021-11-16T00:00:02 [I|bac|15e4e6f8] Task {label: InventorySync::Async::InventoryScheduledSync, id: bbcc92d2-5bca-44a9-8a5b-a1abbd1a99d6, execution_plan_id: 05108322-0f33-4f94-9e06-b549ec6f6ffa} state changed: planned
2021-11-16T00:00:02 [I|bac|7fd8835a] Task {label: InventorySync::Async::InventoryScheduledSync, id: b31f40d5-8da5-40e1-8d0d-a9211e445a6d, execution_plan_id: 1b89502b-6f0c-4af9-be4a-49a9e4312ced} state changed: scheduled
2021-11-16T00:00:02 [I|bac|15e4e6f8] Task {label: InventorySync::Async::InventoryScheduledSync, id: bbcc92d2-5bca-44a9-8a5b-a1abbd1a99d6, execution_plan_id: 05108322-0f33-4f94-9e06-b549ec6f6ffa} state changed: running
2021-11-16T00:00:02 [I|bac|f8c3844b] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 6a88e920-55ad-477a-bd06-47afeea5b205, execution_plan_id: 45df12cc-4a2a-4333-8cba-8b926ff9cdc9} state changed: planning
2021-11-16T00:00:02 [I|bac|f8c3844b] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 6a88e920-55ad-477a-bd06-47afeea5b205, execution_plan_id: 45df12cc-4a2a-4333-8cba-8b926ff9cdc9} state changed: planned
2021-11-16T00:00:02 [I|bac|bf6374ee] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 16bc5185-8fff-4493-8ebf-6c38a5176ddb, execution_plan_id: d992acb9-65f3-437c-b212-36397c205a93} state changed: scheduled
2021-11-16T00:00:02 [I|bac|f8c3844b] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 6a88e920-55ad-477a-bd06-47afeea5b205, execution_plan_id: 45df12cc-4a2a-4333-8cba-8b926ff9cdc9} state changed: running
2021-11-16T00:00:02 [I|bac|f8c3844b] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 6a88e920-55ad-477a-bd06-47afeea5b205, execution_plan_id: 45df12cc-4a2a-4333-8cba-8b926ff9cdc9} state changed: stopped result: success
2021-11-16T00:00:02 [I|bac|f8c3844b] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 6a88e920-55ad-477a-bd06-47afeea5b205, execution_plan_id: 45df12cc-4a2a-4333-8cba-8b926ff9cdc9} state changed: stopped result: success
2021-11-16T00:00:06 [I|app|cbe86607] Started GET "/redhat_access/r/insights/v1/branch_info" for 131.111.150.66 at 2021-11-16 00:00:06 +0000
2021-11-16T00:00:06 [I|app|cbe86607] Processing by InsightsCloud::Api::MachineTelemetriesController#branch_info as JSON
2021-11-16T00:00:06 [I|app|cbe86607] Completed 200 OK in 42ms (Views: 0.1ms | ActiveRecord: 5.6ms | Allocations: 11278)
2021-11-16T00:00:06 [I|app|b2add2eb] Started POST "/redhat_access/r/insights/platform/module-update-router/v1/event" for 131.111.150.66 at 2021-11-16 00:00:06 +0000
2021-11-16T00:00:06 [I|app|b2add2eb] Processing by InsightsCloud::Api::MachineTelemetriesController#forward_request as */*
2021-11-16T00:00:06 [I|app|b2add2eb] Parameters: {"core_version"=>"3.0.250", "exception"=>nil, "machine_id"=>"7ff8a626-4623-4952-b18a-dacd438d256e", "exit"=>0, "phase"=>"pre_update", "ended_at"=>"2021-11-16T00:00:06+00:00", "started_at"=>"2021-11-16T00:00:06+00:00", "core_path"=>"/var/lib/insights/last_stable.egg", "path"=>"platform/module-update-router/v1/event", "machine_telemetry"=>{"core_version"=>"3.0.250", "exception"=>nil, "machine_id"=>"7ff8a626-4623-4952-b18a-dacd438d256e", "exit"=>0, "phase"=>"pre_update", "ended_at"=>"2021-11-16T00:00:06+00:00", "started_at"=>"2021-11-16T00:00:06+00:00", "core_path"=>"/var/lib/insights/last_stable.egg"}}
2021-11-16T00:00:08 [I|app|] Rails cache backend: File
2021-11-16T00:00:08 [I|app|b2add2eb] Completed 201 Created in 2238ms (Views: 0.1ms | ActiveRecord: 2.4ms | Allocations: 5747)
2021-11-16T00:00:09 [E|bac|15e4e6f8] PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_insights_facets_on_host_id"
15e4e6f8 | DETAIL: Key (host_id)=(1056) already exists.
15e4e6f8 | (ActiveRecord::RecordNotUnique)
[RUBY STACK TRACE HERE]
15e4e6f8 | [ sidekiq ]
15e4e6f8 | [ concurrent-ruby ]
2021-11-16T00:00:09 [I|bac|15e4e6f8] Task {label: InventorySync::Async::InventoryScheduledSync, id: bbcc92d2-5bca-44a9-8a5b-a1abbd1a99d6, execution_plan_id: 05108322-0f33-4f94-9e06-b549ec6f6ffa} state changed: paused result: error

I unregistered the host _again_ and resumed the task, which completed. I'll now reregister it again. This might have been down to the host not being deleted from the Insights Inventory before being resubscribed and "insights-client --register" being (re)run. Before reregistering it with Satellite (and running "insights-client --register" again) I also deleted its Insights Inventory entry. We'll see what happens tonight.

I suppose that is indeed the cause - rh_cloud matches hosts by subscription-manager UUIDs, so if there is more than one host with the same sub-man UUID on the cloud side, it would fail on the Sat side (I assume a sub-man UUID appears only once in the cloud response). I am still investigating this issue. @ben.argyle.ac.uk, can you please run the following commands for me? I want to understand the current state of the DB.

Start a console session on your Satellite machine by running 'foreman-rake console'. In the console please run the following command:

Host.where(id: [1054, 1056]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')

This should return the subscription uuids and cloud uuids for the problematic hosts, similar to this:

=> [[2, "my-cool-host", "subscription-uuid-a93b-ba57326a56a4", "cloud-uuid-1-a888-b400994f21c5"], [3, "my-second-host", "subscription-uuid-2-ba57326a56a4", "cloud-uuid-2-a888-b400994f21c5"]]

Additionally, could you please attach the results of running a curl command:

curl "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts/<cloud-uuid-1>,<cloud-uuid-2>"

Replace the <username> and <password> with your console.redhat.com credentials and <cloud-uuid-1>,<cloud-uuid-2> with uuids from the query in the console. Thanks!

Note that I had to unsubscribe/delete host 1057 due to PG errors and then resubscribe it, so it's now 1058.
See Red Hat Support case 03040890 for further details:

[root@satellite1 ~]# foreman-rake console
Loading production environment (Rails 6.0.3.4)
irb(main):001:0> Host.where(id: [1054, 1056]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> []
irb(main):002:0> Host.where(id: [1056, 1057]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> []
irb(main):003:0> Host.where(id: [1054, 1057]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> []
irb(main):004:0> Host.where(id: [1056, 1057]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> []
irb(main):005:0> Host.where(id: [1054, 1058]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1058, "migratetest-web1.<domain>", "1a56e482-ecdb-4cb6-917f-c9e27b8b6158", "8d8b4562-c34c-486e-a294-80f8304f6078"]]
irb(main):006:0> Host.where(id: [1056, 1058]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1058, "migratetest-web1.<domain>", "1a56e482-ecdb-4cb6-917f-c9e27b8b6158", "8d8b4562-c34c-486e-a294-80f8304f6078"]]
irb(main):007:0> Host.where(id: [1057, 1058]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1058, "migratetest-web1.<domain>", "1a56e482-ecdb-4cb6-917f-c9e27b8b6158", "8d8b4562-c34c-486e-a294-80f8304f6078"]]
irb(main):008:0>

So I don't think that's altogether useful. Do you want me to subscribe a new content host so I get a new PG duplicate issue?
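The empty arrays above are simply because hosts 1054, 1056 and 1057 no longer exist after the delete/re-register cycles. A small variation of the same query, looking hosts up by name instead of by numeric id, survives re-registration; this is a suggested sketch rather than something run in the bug, with placeholder hostnames:

~~~
# Same pluck as above, but keyed on hostnames (placeholders) so the lookup
# still works after a host has been deleted and re-registered under a new id.
Host.where(name: ['migratetest-web1.<domain>', 'csopselk2.<domain>'])
    .joins(:subscription_facet, :insights)
    .pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
~~~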
Below is the curl command:

[root@satellite1 ~]# curl "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts/8d8b4562-c34c-486e-a294-80f8304f6078"
{"total":1,"count":1,"page":1,"per_page":50,"results":[{"insights_id":"2f8f6c81-bc11-49a6-8238-0cc172a4391d","subscription_manager_id":"1a56e482-ecdb-4cb6-917f-c9e27b8b6158","satellite_id":"1a56e482-ecdb-4cb6-917f-c9e27b8b6158","bios_uuid":"4218ff00-358a-c599-05be-6ec2e53916ad","ip_addresses":["10.0.62.2"],"fqdn":"migratetest-web1.<domain>","mac_addresses":["00:50:56:98:55:a0"],"provider_id":null,"provider_type":null,"id":"8d8b4562-c34c-486e-a294-80f8304f6078","account":"<account number>","display_name":"migratetest-web1.<domain>","ansible_host":null,"facts":[{"namespace":"yupana","facts":{"source":"Satellite","account":"<account number>","yupana_host_id":"5e9027b5-5201-48ea-80a8-b69e59d8f5b3","report_slice_id":"cfa5a64a-1a18-41fb-b4c6-1ef743c183ab","report_platform_id":"58d47af6-8a06-4015-9844-2471f8a5692e"}},{"namespace":"satellite","facts":{"organization_id":1,"satellite_version":"6.9.6","system_purpose_sla":"Standard","system_purpose_role":"Red Hat Enterprise Linux Server","distribution_version":"7.9","satellite_instance_id":"4665532b-dc66-44cd-8e11-cf3bcb95cfcf","is_hostname_obfuscated":false,"is_simple_content_access":false}}],"reporter":"yupana","per_reporter_staleness":{"puptoo":{"last_check_in":"2021-11-18T02:34:38.141148+00:00","stale_timestamp":"2021-11-19T07:34:38.100136+00:00","check_in_succeeded":true},"yupana":{"last_check_in":"2021-11-18T10:32:37.751545+00:00","stale_timestamp":"2021-11-19T16:31:32.660000+00:00","check_in_succeeded":true}},"stale_timestamp":"2021-11-19T16:31:32.660000+00:00","stale_warning_timestamp":"2021-11-26T16:31:32.660000+00:00","culled_timestamp":"2021-12-03T16:31:32.660000+00:00","created":"2021-11-17T09:13:19.036866+00:00","updated":"2021-11-18T10:32:37.751764+00:00"}]}
[root@satellite1 ~]#

I'm guessing this isn't particularly useful, sorry. I won't be able to comment again before 2021-11-23 so if there's more you want I'll aim to respond then.

I subscribed a new host today. It'll be ID 1062 owing to two hosts being created whilst I was away (1059 and 1060) and 1060 needing to be unsubscribed and resubscribed as usual (thus becoming 1061). I'll have a/the PG issue tomorrow. I'll run the same commands tomorrow and see what I get.

As expected I have the PG issue again:

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_host_status_on_type_and_host_id"
DETAIL: Key (type, host_id)=(InventorySync::InventoryStatus, 1062) already exists.

I haven't deleted and resubscribed it yet (thus giving it ID 1063).
Running a few of those commands I get:

irb(main):001:0> Host.where(id: [1056, 1057]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> []
irb(main):002:0> Host.where(id: [1058, 1059]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1058, "migratetest-web1.<domain.here>", "1a56e482-ecdb-4cb6-917f-c9e27b8b6158", "8d8b4562-c34c-486e-a294-80f8304f6078"], [1059, "csopselk2.<domain.here>", "47ce3c5c-bf9d-4669-995a-90e65c5e1e52", "dbe31264-4b6d-4097-ae5c-bbba24dca5ee"]]
irb(main):003:0> Host.where(id: [1060, 1061]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1061, "csopselk1.<domain.here>", "c7b15bb0-4c6b-4839-91b5-c79a37f3521c", "5008c446-515d-4db4-822b-8dc7abc6fdf9"]]

[root@satellite1 ~]# curl "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts/8d8b4562-c34c-486e-a294-80f8304f6078,dbe31264-4b6d-4097-ae5c-bbba24dca5ee,5008c446-515d-4db4-822b-8dc7abc6fdf9"
{"total":3,"count":3,"page":1,"per_page":50,"results":[
{"insights_id":"2f8f6c81-bc11-49a6-8238-0cc172a4391d","subscription_manager_id":"1a56e482-ecdb-4cb6-917f-c9e27b8b6158","satellite_id":"1a56e482-ecdb-4cb6-917f-c9e27b8b6158","bios_uuid":"4218ff00-358a-c599-05be-6ec2e53916ad","ip_addresses":["10.0.62.2"],"fqdn":"migratetest-web1.<domain.here>","mac_addresses":["00:50:56:98:55:a0"],"provider_id":null,"provider_type":null,"id":"8d8b4562-c34c-486e-a294-80f8304f6078","account":"5688684","display_name":"migratetest-web1.<domain.here>","ansible_host":null,"facts":[{"namespace":"yupana","facts":{"source":"Satellite","account":"5688684","yupana_host_id":"e1d3c2f0-a43b-4e6a-a805-1b0a27c542a3","report_slice_id":"d959e10d-3bfa-42d3-a4a7-b92b7152bbe0","report_platform_id":"547dcc38-1e2e-41bb-a0e2-50271f45b39e"}},{"namespace":"satellite","facts":{"organization_id":1,"satellite_version":"6.9.6","system_purpose_sla":"Standard","system_purpose_role":"Red Hat Enterprise Linux Server","distribution_version":"7.9","satellite_instance_id":"4665532b-dc66-44cd-8e11-cf3bcb95cfcf","is_hostname_obfuscated":false,"is_simple_content_access":false}}],"reporter":"yupana","per_reporter_staleness":{"puptoo":{"last_check_in":"2021-11-24T00:35:46.693690+00:00","stale_timestamp":"2021-11-25T05:35:46.650875+00:00","check_in_succeeded":true},"yupana":{"last_check_in":"2021-11-24T10:34:27.179737+00:00","stale_timestamp":"2021-11-25T16:33:03.165000+00:00","check_in_succeeded":true}},"stale_timestamp":"2021-11-25T16:33:03.165000+00:00","stale_warning_timestamp":"2021-12-02T16:33:03.165000+00:00","culled_timestamp":"2021-12-09T16:33:03.165000+00:00","created":"2021-11-17T09:13:19.036866+00:00","updated":"2021-11-24T10:34:27.180045+00:00"},
{"insights_id":"90de277a-583d-417d-a094-2d90d5e8ee06","subscription_manager_id":"47ce3c5c-bf9d-4669-995a-90e65c5e1e52","satellite_id":"47ce3c5c-bf9d-4669-995a-90e65c5e1e52","bios_uuid":"da7b1842-06b8-e47e-86a6-fa79d186e8e4","ip_addresses":["10.0.65.61"],"fqdn":"csopselk2.<domain.here>","mac_addresses":["00:50:56:98:b6:0d"],"provider_id":null,"provider_type":null,"id":"dbe31264-4b6d-4097-ae5c-bbba24dca5ee","account":"5688684","display_name":"csopselk2.<domain.here>","ansible_host":null,"facts":[{"namespace":"yupana","facts":{"source":"Satellite","account":"5688684","yupana_host_id":"90fa5ffb-56eb-47b4-9830-aadb9b8abccc","report_slice_id":"d959e10d-3bfa-42d3-a4a7-b92b7152bbe0","report_platform_id":"547dcc38-1e2e-41bb-a0e2-50271f45b39e"}},{"namespace":"satellite","facts":{"organization_id":1,"satellite_version":"6.9.6","system_purpose_sla":"Standard","system_purpose_role":"Red Hat Enterprise Linux Server","distribution_version":"8.5","satellite_instance_id":"4665532b-dc66-44cd-8e11-cf3bcb95cfcf","is_hostname_obfuscated":false,"is_simple_content_access":false}}],"reporter":"yupana","per_reporter_staleness":{"puptoo":{"last_check_in":"2021-11-24T00:07:49.320493+00:00","stale_timestamp":"2021-11-25T05:07:49.274028+00:00","check_in_succeeded":true},"yupana":{"last_check_in":"2021-11-24T10:34:24.561689+00:00","stale_timestamp":"2021-11-25T16:33:03.165000+00:00","check_in_succeeded":true}},"stale_timestamp":"2021-11-25T16:33:03.165000+00:00","stale_warning_timestamp":"2021-12-02T16:33:03.165000+00:00","culled_timestamp":"2021-12-09T16:33:03.165000+00:00","created":"2021-11-22T13:13:13.102588+00:00","updated":"2021-11-24T10:34:24.562041+00:00"}, {"insights_id":"395cac43-2d1d-4c60-8f27-2b21d8a10c4a","subscription_manager_id":"c7b15bb0-4c6b-4839-91b5-c79a37f3521c","satellite_id":"c7b15bb0-4c6b-4839-91b5-c79a37f3521c","bios_uuid":"17a11842-096b-1eec-d805-2d54d82c0ccc","ip_addresses":["10.0.65.60"],"fqdn":"csopselk1.<domain.here>","mac_addresses":["00:50:56:98:38:f2"],"provider_id":null,"provider_type":null,"id":"5008c446-515d-4db4-822b-8dc7abc6fdf9","account":"5688684","display_name":"csopselk1.<domain.here>","ansible_host":null,"facts":[{"namespace":"yupana","facts":{"source":"Satellite","account":"5688684","yupana_host_id":"200d4940-a14a-4e4d-8059-66b8b384b657","report_slice_id":"d959e10d-3bfa-42d3-a4a7-b92b7152bbe0","report_platform_id":"547dcc38-1e2e-41bb-a0e2-50271f45b39e"}},{"namespace":"satellite","facts":{"organization_id":1,"satellite_version":"6.9.6","system_purpose_sla":"Standard","system_purpose_role":"Red Hat Enterprise Linux Server","distribution_version":"8.5","satellite_instance_id":"4665532b-dc66-44cd-8e11-cf3bcb95cfcf","is_hostname_obfuscated":false,"is_simple_content_access":false}}],"reporter":"yupana","per_reporter_staleness":{"puptoo":{"last_check_in":"2021-11-24T00:18:14.971689+00:00","stale_timestamp":"2021-11-25T05:18:14.907454+00:00","check_in_succeeded":true},"yupana":{"last_check_in":"2021-11-24T10:34:23.629374+00:00","stale_timestamp":"2021-11-25T16:33:03.165000+00:00","check_in_succeeded":true}},"stale_timestamp":"2021-11-25T16:33:03.165000+00:00","stale_warning_timestamp":"2021-12-02T16:33:03.165000+00:00","culled_timestamp":"2021-12-09T16:33:03.165000+00:00","created":"2021-11-23T10:22:39.541880+00:00","updated":"2021-11-24T10:34:23.629627+00:00"} ]} Doesn't seem like it helps much. Is there anything else I can provide for you? Can we get the same query for 1062 and 1063? 
Basically I am looking for Satellite hosts that might share some properties with other hosts, both on the cloud side and on the Satellite side. Ideally I would also like to see the whole dump of cloud hosts:

curl "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts/?per_page=10000 > /tmp/rh_inventory_dump.txt"

This will create the cloud dump and store it in the temp directory. Could you upload this file as an attachment to this case? In the meantime I am working on a workaround for your issue.

As I said, I haven't deleted and resubscribed ID 1062 yet, so there isn't an ID 1063:

irb(main):001:0> Host.where(id: [1062, 1063]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1062, "equipment-sharing-live1.<domain.here>", "68056acb-6957-4f2f-bd49-c9250af88e9c", "f36e138a-d796-449c-9780-254e714cedeb"]]

[root@satellite1 ~]# curl "https://<username:password>@console.redhat.com/api/inventory/v1/hosts/f36e138a-d796-449c-9780-254e714cedeb"
{"total":1,"count":1,"page":1,"per_page":50,"results":[{"insights_id":"9e760ef7-11e3-49d4-bda9-b7a02afeb897","subscription_manager_id":"68056acb-6957-4f2f-bd49-c9250af88e9c","satellite_id":"68056acb-6957-4f2f-bd49-c9250af88e9c","bios_uuid":"dc541842-2a3d-e919-23a2-90a9fd833b5f","ip_addresses":["10.0.65.62"],"fqdn":"equipment-sharing-live1.<domain.here>","mac_addresses":["00:50:56:98:f5:66"],"provider_id":null,"provider_type":null,"id":"f36e138a-d796-449c-9780-254e714cedeb","account":"5688684","display_name":"equipment-sharing-live1.<domain.here>","ansible_host":null,"facts":[{"namespace":"yupana","facts":{"source":"Satellite","account":"<account number>","yupana_host_id":"4f765a7c-27c7-481d-80b1-d757cb07a863","report_slice_id":"d959e10d-3bfa-42d3-a4a7-b92b7152bbe0","report_platform_id":"547dcc38-1e2e-41bb-a0e2-50271f45b39e"}},{"namespace":"satellite","facts":{"organization_id":1,"satellite_version":"6.9.6","system_purpose_sla":"Standard","system_purpose_role":"Red Hat Enterprise Linux Server","distribution_version":"8.5","satellite_instance_id":"4665532b-dc66-44cd-8e11-cf3bcb95cfcf","is_hostname_obfuscated":false,"is_simple_content_access":false}}],"reporter":"yupana","per_reporter_staleness":{"puptoo":{"last_check_in":"2021-11-24T01:17:55.569270+00:00","stale_timestamp":"2021-11-25T06:17:55.514466+00:00","check_in_succeeded":true},"yupana":{"last_check_in":"2021-11-24T10:34:27.198899+00:00","stale_timestamp":"2021-11-25T16:33:03.165000+00:00","check_in_succeeded":true}},"stale_timestamp":"2021-11-25T16:33:03.165000+00:00","stale_warning_timestamp":"2021-12-02T16:33:03.165000+00:00","culled_timestamp":"2021-12-09T16:33:03.165000+00:00","created":"2021-11-23T16:29:22.289747+00:00","updated":"2021-11-24T10:34:27.199177+00:00"}]}

I tried the curl, but aside from the syntax being wrong (it returned a 404 error due to the extra '/' char), when I did manage to get something that worked I got back "10000 is greater than the maximum of 100". However, I did see on another attempt {"total":652,"count":50,"page":1,"per_page":50 [...] so there are 652 records, even though I currently only have 501 content hosts registered with my Satellite. I assume I've deleted 151 hosts since I started using Insights? Please confirm this is the reason for the discrepancy.
What I have done is:

curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=1" > /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=2" >> /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=3" >> /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=4" >> /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=5" >> /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=6" >> /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=7" >> /tmp/rh_inventory_dump.txt

and then manually edited out the {"total":652,"count":100,"page":<page no>,"per_page":100,"results":[ stuff at the join points. And that's what's attached. I hope it proves useful.

Created attachment 1843432 [details]
Inventory dump, as requested in comment #15

Never mind. I see that some of the hosts listed are those that are outside of Satellite (we share our site subscription with another group who don't use Satellite). That would probably account for the number being higher than 501.

First, thank you for the dumps; it's really appreciated and helps us to dig into this issue! From looking at the dump I can see that you do indeed have 501 hosts that belong to your Satellite. On the other hand, I can't see any duplication on either the Satellite or the Cloud side. This leads to the only possible cause: more than one instance of the sync task running simultaneously. To confirm this, can you please go to Monitor -> Tasks, press the "export" button and upload the tasks list. I am looking for related tasks that have overlapping execution times.

I just hope I'm not causing duplication of effort given support case #03040890! How can this duplication of the sync task cause the PG error, which is then _cured_ by reregistering the last subscribed content host? I can't understand the logic. In any event, please find attached my task export. Note that I haven't deleted/unsubscribed ID 1062 yet, or resumed the paused "Inventory scheduled sync" blocked by it.

Created attachment 1843560 [details]
foreman tasks CSV export
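As a side note on the page-by-page dump above: the same result can be produced without hand-editing the JSON at the join points. The following Ruby sketch is not from the bug report; it assumes console.redhat.com basic-auth credentials in environment variables and the 100-per-page limit mentioned earlier, and writes a single combined results array.

~~~
#!/usr/bin/env ruby
# Walk the console.redhat.com inventory API page by page and merge the
# "results" arrays into one file (hypothetical helper, not part of Satellite).
require 'net/http'
require 'json'
require 'uri'

user = ENV.fetch('RHC_USER')       # console.redhat.com username
pass = ENV.fetch('RHC_PASSWORD')   # console.redhat.com password

hosts = []
page  = 1
loop do
  uri = URI("https://console.redhat.com/api/inventory/v1/hosts?per_page=100&page=#{page}")
  req = Net::HTTP::Get.new(uri)
  req.basic_auth(user, pass)
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
  body = JSON.parse(res.body)
  hosts.concat(body['results'])
  break if page * body['per_page'] >= body['total']
  page += 1
end

File.write('/tmp/rh_inventory_dump.json', JSON.pretty_generate(hosts))
puts "wrote #{hosts.size} hosts"
~~~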
I also see the issue once in a while. There is only a single Task created, but that task spawns concurrent InventorySync::Async::InventoryFullSync Dynflow steps, one for each organization. The output of this step for each of the (in my case two) organizations shows that it synced all hosts, even though in my case the hosts are only assigned to org-id 3; the step for org-id 1 (Default Organization) also processed all hosts. Below is the output of the two generated Dynflow steps, where you can see they run concurrently and each processes 77 hosts:

~~~
3: InventorySync::Async::InventoryFullSync (success) [ 2.83s / 2.83s ]
Queue: default
Started at: 2021-11-25 00:00:15 UTC
Ended at: 2021-11-25 00:00:18 UTC
Real time: 2.83s
Execution time (excluding suspended state): 2.83s
Input:
---
organization_id: 1
locale: en
current_request_id: fc78af89-cbb1-46ad-a032-9ca265c10142
current_timezone: UTC
current_user_id: 1
current_organization_id:
current_location_id:
Output:
---
host_statuses:
  sync: 77
  disconnect: 0

5: InventorySync::Async::InventoryFullSync (success) [ 3.47s / 3.47s ]
Queue: default
Started at: 2021-11-25 00:00:15 UTC
Ended at: 2021-11-25 00:00:19 UTC
Real time: 3.47s
Execution time (excluding suspended state): 3.47s
Input:
---
organization_id: 3
locale: en
current_request_id: fc78af89-cbb1-46ad-a032-9ca265c10142
current_timezone: UTC
current_user_id: 1
current_organization_id:
current_location_id:
Output:
---
host_statuses:
  sync: 77
  disconnect: 0
~~~

~~~
irb(main):017:0> Host.unscoped.where(organization: 1).size
=> 0
irb(main):018:0> Host.unscoped.where(organization: 3).size
=> 77
irb(main):019:0>
~~~

The expected behaviour is that for organization_id 1 the output would be 'sync: 0', matching the fact that there are no hosts assigned to org-id 1.

Thanks a lot for the help. I have identified the issue and am now working on it and on a way to get it delivered. The upstream solution is here: https://github.com/theforeman/foreman_rh_cloud/pull/668

Heroic work! Thank you. Is this something that will require an update of Satellite (to 6.10.z), or a script/similar I can run against my 6.9.6 install or the Postgres DB? Out of interest, what information led you to the fix, please?

It will be a code change, hence a package update will be required. Currently I can't tell in which Satellite versions this change will land and what upgrade process will be required. It was Peter's comment about different organizations that led me to look more into the way status records are generated.

Understood. So the upshot for me is that it's likely I'm going to have this duplicate key issue with every content host I subscribe (until I then unsub and resub it) until this change makes it through to GA and I upgrade to the version of Satellite containing it. Frustrating, if that's the case, but if it is it'd be useful to know, and then I can simply add that fact into my build process until then. The minor wrinkle is that I wasn't going to upgrade Satellite to 6.10 (including the mandatory Pulp 2 -> Pulp 3 PG content upgrade) until this PG issue was fixed... (-: Can you therefore also confirm that I _don't_ have any Postgres database errors, and this is a functional issue rather than a data issue? If that's the case I'll begin planning for the upgrade to 6.10(.2), and start my Pulp 2 -> Pulp 3 migration.
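Returning to the per-organization counts Peter ran above: a quick console loop (a sketch using standard Foreman models, not something posted in the bug) prints the host count for every organization at once. With Peter's data it prints 0 for org 1 and 77 for org 3, which is what each per-org InventoryFullSync step's 'sync:' figure should match once the fix is in place.

~~~
# foreman-rake console: host count per organization, ignoring the current
# taxonomy context (Host.unscoped, as in the irb session above).
Organization.order(:id).each do |org|
  puts "#{org.id}: #{org.name} -> #{Host.unscoped.where(organization_id: org.id).count} hosts"
end
~~~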
I applied both fixes, https://github.com/theforeman/foreman_rh_cloud/pull/668 and https://github.com/theforeman/foreman/pull/8953, and can confirm that on 6.9.7 it is now working as expected: the default org that has no hosts assigned is no longer uploading any hosts. Also, no DB unique key constraint issues seen so far.

I have already left comments in various other BZs asking for a fix in the 6.9.x series that does not force an upgrade to 6.10/Pulp 3; it is supported, and even recommended, to wait for 7.0 if planning a fresh installation, which is applicable to my setup. See the official documentation at https://access.redhat.com/documentation/en-us/red_hat_satellite/6.10/html-single/upgrading_and_updating_red_hat_satellite/index#upgrade_paths, which says that avoiding the Pulp 3 upgrade and going directly to RHEL 8 + Satellite 7 + Pulp 3 is supported:

~~~
For future upgrades following Satellite 7.0, you will be required to upgrade the operating system from RHEL 7 to RHEL 8 on your Satellite Servers and Capsules. You can upgrade the operating system in-place or through a cloning process. The latter includes migration of all data, configuration, and synced content.

NOTE
If you are planning to avoid the upgrade from Pulp 2 to Pulp 3 and deploy a new Satellite 6.10 infrastructure due to the Pulp 3 changes instead, you might want to wait for Satellite 7.0 to deploy with RHEL 8 directly.
~~~

@ben.argyle.ac.uk Yes, as far as I know it's a functional issue, and it will be fixed in 6.10.3 (https://bugzilla.redhat.com/show_bug.cgi?id=2027786) if you want to track the specific change. Thanks a lot for your patience!

I just want to make absolutely clear... I'll continue to get errors of the form

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_host_status_on_type_and_host_id"
DETAIL: Key (type, host_id)=(InventorySync::InventoryStatus, XXXX) already exists.

every time I add a new Content Host to Satellite until I upgrade to 6.10.3 (or 6.9.7?)?

Just to note that since I upgraded to 6.9.7 I don't appear to be seeing these PG errors any more the day after registering new content hosts. Thank you!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498