Bug 2021255
Summary: Satellite schedules one recurring InventorySync::Async::InventoryScheduledSync per org but each task syncs all orgs, resulting in harmless but unnecessary tasks
Product: Red Hat Satellite
Component: RH Cloud - Inventory
Version: 6.9.6
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Pablo Hess <phess>
Assignee: Shimon Shtein <sshtein>
QA Contact: addubey
CC: achadha, ahumbe, aruzicka, bbuckingham, ben.argyle, dmule, jpathan, peter.vreman, pmoravec, sshtein, zhunting
Keywords: Triaged
Target Milestone: 6.11.0
Target Release: Unused
Fixed In Version: 5.0.29
Doc Type: If docs needed, set a value
Related: 2027786 (view as bug list)
Type: Bug
Last Closed: 2022-07-05 14:30:00 UTC
Description
Pablo Hess
2021-11-08 16:52:18 UTC
Adding an important observation / aspect from a reproducer: the extra tasks do fail with a PG violation error:

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_host_status_on_type_and_host_id"
DETAIL: Key (type, host_id)=(InventorySync::InventoryStatus, 1052) already exists.

I have two orgs (although one has no content hosts, as it's not in use yet) within my Satellite (6.9.6). It appears that both "Synchronize Automatically" for Insights AND "Automatic inventory upload" for Red Hat Inventory (both accessible from the Satellite GUI under "Configure") are _global_ settings, rather than per-organisation. That is, changing into an organisation and then setting the Insights "Synchronize Automatically" to "ON" or "OFF", and/or changing the slider on the Red Hat Inventory "Automatic inventory upload", and then changing to the other organisation and going to the same pages will show the same changes. Changing them back and going to the first organisation will show that they've changed back for the first organisation, too. Looking at my Recurring Logics seems to back this up, as I only have one logic for "Inventory scheduled sync".

In any event, it doesn't feel like this issue is what's causing the PG::UniqueViolation error. I've had more than one organisation for months, and had been at 6.9.5, and then 6.9.6 for (I think) a week or more, before this issue occurred. It seems more to have coincided with the accidental partial subscription/registration and then deletion of a few RHEL hosts (an Ansible mess-up). As such it feels more like a Postgres table/row mismatch or corruption.

This error only crops up when I register/subscribe a new host. After doing so, overnight I then get this PG error. If I then delete/unsubscribe that host and reregister it, the issue goes away and on the following overnights there are no further PG errors. The next time I register/subscribe a new host the PG error will appear overnight. If I register/subscribe two or more hosts, it's only the last one to be done that generates the error (found using "hammer host info --id 1052 | grep -i fqdn"). It feels like there's an off-by-one error somewhere in Postgres' data.

InventoryScheduledSync is indeed a singleton task that runs once a day. It initiates an InventoryFullSync task per organization (in parallel), which is responsible for writing the statuses into the database. The duplicate status can be created for one of two reasons: either you have two InventoryFullSync tasks for the same org running in parallel, or there are duplicate uuids in subscription manager. If the same task does not fail in _some_ cases, I would go with the former (too many parallel tasks).

I definitely only have one InventoryFullSync task. I would imagine I therefore have duplicate UUIDs. How do I find that out?

So I registered two hosts to Satellite on Friday evening, having set Satellite GUI "Configure" -> "Insights": Settings "Synchronize Automatically" to "OFF" on Friday 2021-11-12 before registering the RHEL VMs with Satellite. I did NOT disable Satellite GUI "Configure" -> "Inventory Upload": "Automatic inventory upload". InventorySync::Async::InventoryScheduledSync still ran, and produced a PG error. I unregistered and reregistered the host based on the ID from "hammer host info --id 1054 | grep -i fqdn", resumed the task, and as usual it completed. Next time we register a host I'll disable the latter and see if we get the PG error the next morning.
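One way to answer the "how do I find that out?" question above is from the same foreman-rake console. The sketch below is not taken from this bug report; it assumes the standard Foreman/Katello model names (ForemanTasks::Task, Katello::Host::SubscriptionFacet) and checks both hypotheses: overlapping InventoryFullSync runs and duplicate subscription-manager UUIDs.

~~~
# Run inside `foreman-rake console` on the Satellite server.

# Hypothesis 1: more than one InventorySync::Async::InventoryFullSync running at once.
# List recent runs with their start/end times and look for overlaps.
ForemanTasks::Task.where(label: 'InventorySync::Async::InventoryFullSync')
                  .order(started_at: :desc).limit(20)
                  .pluck(:id, :state, :result, :started_at, :ended_at)

# Hypothesis 2: duplicate subscription-manager UUIDs on the Satellite side.
# An empty hash means every katello_subscription_facets.uuid is unique.
Katello::Host::SubscriptionFacet.group(:uuid).having('COUNT(*) > 1').count
~~~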
So overnight I got a _new/different_ PG error for the host I dealt with yesterday (ID 1054, reregistered as ID 1056). This is the first time a reregistered host has caused a PG error:

2021-11-16T00:00:02 [I|bac|15e4e6f8] Task {label: InventorySync::Async::InventoryScheduledSync, id: bbcc92d2-5bca-44a9-8a5b-a1abbd1a99d6, execution_plan_id: 05108322-0f33-4f94-9e06-b549ec6f6ffa} state changed: planning
2021-11-16T00:00:02 [I|bac|15e4e6f8] Task {label: InventorySync::Async::InventoryScheduledSync, id: bbcc92d2-5bca-44a9-8a5b-a1abbd1a99d6, execution_plan_id: 05108322-0f33-4f94-9e06-b549ec6f6ffa} state changed: planned
2021-11-16T00:00:02 [I|bac|7fd8835a] Task {label: InventorySync::Async::InventoryScheduledSync, id: b31f40d5-8da5-40e1-8d0d-a9211e445a6d, execution_plan_id: 1b89502b-6f0c-4af9-be4a-49a9e4312ced} state changed: scheduled
2021-11-16T00:00:02 [I|bac|15e4e6f8] Task {label: InventorySync::Async::InventoryScheduledSync, id: bbcc92d2-5bca-44a9-8a5b-a1abbd1a99d6, execution_plan_id: 05108322-0f33-4f94-9e06-b549ec6f6ffa} state changed: running
2021-11-16T00:00:02 [I|bac|f8c3844b] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 6a88e920-55ad-477a-bd06-47afeea5b205, execution_plan_id: 45df12cc-4a2a-4333-8cba-8b926ff9cdc9} state changed: planning
2021-11-16T00:00:02 [I|bac|f8c3844b] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 6a88e920-55ad-477a-bd06-47afeea5b205, execution_plan_id: 45df12cc-4a2a-4333-8cba-8b926ff9cdc9} state changed: planned
2021-11-16T00:00:02 [I|bac|bf6374ee] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 16bc5185-8fff-4493-8ebf-6c38a5176ddb, execution_plan_id: d992acb9-65f3-437c-b212-36397c205a93} state changed: scheduled
2021-11-16T00:00:02 [I|bac|f8c3844b] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 6a88e920-55ad-477a-bd06-47afeea5b205, execution_plan_id: 45df12cc-4a2a-4333-8cba-8b926ff9cdc9} state changed: running
2021-11-16T00:00:02 [I|bac|f8c3844b] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 6a88e920-55ad-477a-bd06-47afeea5b205, execution_plan_id: 45df12cc-4a2a-4333-8cba-8b926ff9cdc9} state changed: stopped result: success
2021-11-16T00:00:02 [I|bac|f8c3844b] Task {label: InsightsCloud::Async::InsightsClientStatusAging, id: 6a88e920-55ad-477a-bd06-47afeea5b205, execution_plan_id: 45df12cc-4a2a-4333-8cba-8b926ff9cdc9} state changed: stopped result: success
2021-11-16T00:00:06 [I|app|cbe86607] Started GET "/redhat_access/r/insights/v1/branch_info" for 131.111.150.66 at 2021-11-16 00:00:06 +0000
2021-11-16T00:00:06 [I|app|cbe86607] Processing by InsightsCloud::Api::MachineTelemetriesController#branch_info as JSON
2021-11-16T00:00:06 [I|app|cbe86607] Completed 200 OK in 42ms (Views: 0.1ms | ActiveRecord: 5.6ms | Allocations: 11278)
2021-11-16T00:00:06 [I|app|b2add2eb] Started POST "/redhat_access/r/insights/platform/module-update-router/v1/event" for 131.111.150.66 at 2021-11-16 00:00:06 +0000
2021-11-16T00:00:06 [I|app|b2add2eb] Processing by InsightsCloud::Api::MachineTelemetriesController#forward_request as */*
2021-11-16T00:00:06 [I|app|b2add2eb] Parameters: {"core_version"=>"3.0.250", "exception"=>nil, "machine_id"=>"7ff8a626-4623-4952-b18a-dacd438d256e", "exit"=>0, "phase"=>"pre_update", "ended_at"=>"2021-11-16T00:00:06+00:00", "started_at"=>"2021-11-16T00:00:06+00:00", "core_path"=>"/var/lib/insights/last_stable.egg", "path"=>"platform/module-update-router/v1/event", "machine_telemetry"=>{"core_version"=>"3.0.250", "exception"=>nil, "machine_id"=>"7ff8a626-4623-4952-b18a-dacd438d256e", "exit"=>0, "phase"=>"pre_update", "ended_at"=>"2021-11-16T00:00:06+00:00", "started_at"=>"2021-11-16T00:00:06+00:00", "core_path"=>"/var/lib/insights/last_stable.egg"}}
2021-11-16T00:00:08 [I|app|] Rails cache backend: File
2021-11-16T00:00:08 [I|app|b2add2eb] Completed 201 Created in 2238ms (Views: 0.1ms | ActiveRecord: 2.4ms | Allocations: 5747)
2021-11-16T00:00:09 [E|bac|15e4e6f8] PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_insights_facets_on_host_id"
15e4e6f8 | DETAIL: Key (host_id)=(1056) already exists.
15e4e6f8 | (ActiveRecord::RecordNotUnique)
[RUBY STACK TRACE HERE]
15e4e6f8 | [ sidekiq ]
15e4e6f8 | [ concurrent-ruby ]
2021-11-16T00:00:09 [I|bac|15e4e6f8] Task {label: InventorySync::Async::InventoryScheduledSync, id: bbcc92d2-5bca-44a9-8a5b-a1abbd1a99d6, execution_plan_id: 05108322-0f33-4f94-9e06-b549ec6f6ffa} state changed: paused result: error

I unregistered the host _again_ and resumed the task, which completed. I'll now reregister it again. This might have been down to the host not being deleted from the Insights Inventory before being resubscribed and "insights-client --register" being (re)run. Before reregistering it with Satellite (and running "insights-client --register" again) I also deleted its Insights Inventory entry. We'll see what happens tonight.

I suppose that is indeed the cause - rh_cloud matches hosts by subscription-manager UUIDs, so if there is more than one host with the same sub-man UUID on the cloud side, it would fail on the Sat side (I assume a sub-man UUID appears only once in the cloud response). I am still investigating this issue. @ben.argyle.ac.uk, can you please run the following commands for me? I want to understand the current state of the DB.

Start a console session on your Satellite machine by running 'foreman-rake console'. In the console please run the following command:

Host.where(id: [1054, 1056]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')

This should return the subscription uuids and cloud uuids for the problematic hosts, similar to this:

=> [[2, "my-cool-host", "subscription-uuid-a93b-ba57326a56a4", "cloud-uuid-1-a888-b400994f21c5"], [3, "my-second-host", "subscription-uuid-2-ba57326a56a4", "cloud-uuid-2-a888-b400994f21c5"]]

Additionally, could you please attach the results of running a curl command:

curl "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts/<cloud-uuid-1>,<cloud-uuid-2>"

Replace the <username> and <password> with your console.redhat.com credentials and <cloud-uuid-1>,<cloud-uuid-2> with uuids from the query in the console. Thanks!

Note that I had to unsubscribe/delete host 1057 due to PG errors and then resubscribe it, so it's now 1058.
See Red Hat Support case 03040890 for further details:

[root@satellite1 ~]# foreman-rake console
Loading production environment (Rails 6.0.3.4)
irb(main):001:0> Host.where(id: [1054, 1056]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> []
irb(main):002:0> Host.where(id: [1056, 1057]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> []
irb(main):003:0> Host.where(id: [1054, 1057]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> []
irb(main):004:0> Host.where(id: [1056, 1057]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> []
irb(main):005:0> Host.where(id: [1054, 1058]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1058, "migratetest-web1.<domain>", "1a56e482-ecdb-4cb6-917f-c9e27b8b6158", "8d8b4562-c34c-486e-a294-80f8304f6078"]]
irb(main):006:0> Host.where(id: [1056, 1058]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1058, "migratetest-web1.<domain>", "1a56e482-ecdb-4cb6-917f-c9e27b8b6158", "8d8b4562-c34c-486e-a294-80f8304f6078"]]
irb(main):007:0> Host.where(id: [1057, 1058]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1058, "migratetest-web1.<domain>", "1a56e482-ecdb-4cb6-917f-c9e27b8b6158", "8d8b4562-c34c-486e-a294-80f8304f6078"]]
irb(main):008:0>

So I don't think that's altogether useful. Do you want me to subscribe a new content host so I get a new PG duplicate issue?
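The empty arrays above are simply because hosts 1054, 1056 and 1057 no longer exist after the delete/re-register cycles. A small variation of the same query, looking hosts up by name instead of by numeric id, survives re-registration; this is a suggested sketch rather than something run in the bug, with placeholder hostnames:

~~~
# Same pluck as above, but keyed on hostnames (placeholders) so the lookup
# still works after a host has been deleted and re-registered under a new id.
Host.where(name: ['migratetest-web1.<domain>', 'csopselk2.<domain>'])
    .joins(:subscription_facet, :insights)
    .pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
~~~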
Below is the curl command:

[root@satellite1 ~]# curl "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts/8d8b4562-c34c-486e-a294-80f8304f6078"
{"total":1,"count":1,"page":1,"per_page":50,"results":[{"insights_id":"2f8f6c81-bc11-49a6-8238-0cc172a4391d","subscription_manager_id":"1a56e482-ecdb-4cb6-917f-c9e27b8b6158","satellite_id":"1a56e482-ecdb-4cb6-917f-c9e27b8b6158","bios_uuid":"4218ff00-358a-c599-05be-6ec2e53916ad","ip_addresses":["10.0.62.2"],"fqdn":"migratetest-web1.<domain>","mac_addresses":["00:50:56:98:55:a0"],"provider_id":null,"provider_type":null,"id":"8d8b4562-c34c-486e-a294-80f8304f6078","account":"<account number>","display_name":"migratetest-web1.<domain>","ansible_host":null,"facts":[{"namespace":"yupana","facts":{"source":"Satellite","account":"<account number>","yupana_host_id":"5e9027b5-5201-48ea-80a8-b69e59d8f5b3","report_slice_id":"cfa5a64a-1a18-41fb-b4c6-1ef743c183ab","report_platform_id":"58d47af6-8a06-4015-9844-2471f8a5692e"}},{"namespace":"satellite","facts":{"organization_id":1,"satellite_version":"6.9.6","system_purpose_sla":"Standard","system_purpose_role":"Red Hat Enterprise Linux Server","distribution_version":"7.9","satellite_instance_id":"4665532b-dc66-44cd-8e11-cf3bcb95cfcf","is_hostname_obfuscated":false,"is_simple_content_access":false}}],"reporter":"yupana","per_reporter_staleness":{"puptoo":{"last_check_in":"2021-11-18T02:34:38.141148+00:00","stale_timestamp":"2021-11-19T07:34:38.100136+00:00","check_in_succeeded":true},"yupana":{"last_check_in":"2021-11-18T10:32:37.751545+00:00","stale_timestamp":"2021-11-19T16:31:32.660000+00:00","check_in_succeeded":true}},"stale_timestamp":"2021-11-19T16:31:32.660000+00:00","stale_warning_timestamp":"2021-11-26T16:31:32.660000+00:00","culled_timestamp":"2021-12-03T16:31:32.660000+00:00","created":"2021-11-17T09:13:19.036866+00:00","updated":"2021-11-18T10:32:37.751764+00:00"}]}
[root@satellite1 ~]#

I'm guessing this isn't particularly useful, sorry. I won't be able to comment again before 2021-11-23 so if there's more you want I'll aim to respond then.

I subscribed a new host today. It'll be ID 1062 owing to two hosts being created whilst I was away (1059 and 1060) and 1060 needing to be unsubscribed and resubscribed as usual (thus becoming 1061). I'll have a/the PG issue tomorrow. I'll run the same commands tomorrow and see what I get.

As expected I have the PG issue again:

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_host_status_on_type_and_host_id"
DETAIL: Key (type, host_id)=(InventorySync::InventoryStatus, 1062) already exists.

I haven't deleted and resubscribed it yet (thus giving it ID 1063).
Running a few of those commands I get:

irb(main):001:0> Host.where(id: [1056, 1057]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> []
irb(main):002:0> Host.where(id: [1058, 1059]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1058, "migratetest-web1.<domain.here>", "1a56e482-ecdb-4cb6-917f-c9e27b8b6158", "8d8b4562-c34c-486e-a294-80f8304f6078"], [1059, "csopselk2.<domain.here>", "47ce3c5c-bf9d-4669-995a-90e65c5e1e52", "dbe31264-4b6d-4097-ae5c-bbba24dca5ee"]]
irb(main):003:0> Host.where(id: [1060, 1061]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1061, "csopselk1.<domain.here>", "c7b15bb0-4c6b-4839-91b5-c79a37f3521c", "5008c446-515d-4db4-822b-8dc7abc6fdf9"]]

[root@satellite1 ~]# curl "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts/8d8b4562-c34c-486e-a294-80f8304f6078,dbe31264-4b6d-4097-ae5c-bbba24dca5ee,5008c446-515d-4db4-822b-8dc7abc6fdf9"
{"total":3,"count":3,"page":1,"per_page":50,"results":[
{"insights_id":"2f8f6c81-bc11-49a6-8238-0cc172a4391d","subscription_manager_id":"1a56e482-ecdb-4cb6-917f-c9e27b8b6158","satellite_id":"1a56e482-ecdb-4cb6-917f-c9e27b8b6158","bios_uuid":"4218ff00-358a-c599-05be-6ec2e53916ad","ip_addresses":["10.0.62.2"],"fqdn":"migratetest-web1.<domain.here>","mac_addresses":["00:50:56:98:55:a0"],"provider_id":null,"provider_type":null,"id":"8d8b4562-c34c-486e-a294-80f8304f6078","account":"5688684","display_name":"migratetest-web1.<domain.here>","ansible_host":null,"facts":[{"namespace":"yupana","facts":{"source":"Satellite","account":"5688684","yupana_host_id":"e1d3c2f0-a43b-4e6a-a805-1b0a27c542a3","report_slice_id":"d959e10d-3bfa-42d3-a4a7-b92b7152bbe0","report_platform_id":"547dcc38-1e2e-41bb-a0e2-50271f45b39e"}},{"namespace":"satellite","facts":{"organization_id":1,"satellite_version":"6.9.6","system_purpose_sla":"Standard","system_purpose_role":"Red Hat Enterprise Linux Server","distribution_version":"7.9","satellite_instance_id":"4665532b-dc66-44cd-8e11-cf3bcb95cfcf","is_hostname_obfuscated":false,"is_simple_content_access":false}}],"reporter":"yupana","per_reporter_staleness":{"puptoo":{"last_check_in":"2021-11-24T00:35:46.693690+00:00","stale_timestamp":"2021-11-25T05:35:46.650875+00:00","check_in_succeeded":true},"yupana":{"last_check_in":"2021-11-24T10:34:27.179737+00:00","stale_timestamp":"2021-11-25T16:33:03.165000+00:00","check_in_succeeded":true}},"stale_timestamp":"2021-11-25T16:33:03.165000+00:00","stale_warning_timestamp":"2021-12-02T16:33:03.165000+00:00","culled_timestamp":"2021-12-09T16:33:03.165000+00:00","created":"2021-11-17T09:13:19.036866+00:00","updated":"2021-11-24T10:34:27.180045+00:00"},
{"insights_id":"90de277a-583d-417d-a094-2d90d5e8ee06","subscription_manager_id":"47ce3c5c-bf9d-4669-995a-90e65c5e1e52","satellite_id":"47ce3c5c-bf9d-4669-995a-90e65c5e1e52","bios_uuid":"da7b1842-06b8-e47e-86a6-fa79d186e8e4","ip_addresses":["10.0.65.61"],"fqdn":"csopselk2.<domain.here>","mac_addresses":["00:50:56:98:b6:0d"],"provider_id":null,"provider_type":null,"id":"dbe31264-4b6d-4097-ae5c-bbba24dca5ee","account":"5688684","display_name":"csopselk2.<domain.here>","ansible_host":null,"facts":[{"namespace":"yupana","facts":{"source":"Satellite","account":"5688684","yupana_host_id":"90fa5ffb-56eb-47b4-9830-aadb9b8abccc","report_slice_id":"d959e10d-3bfa-42d3-a4a7-b92b7152bbe0","report_platform_id":"547dcc38-1e2e-41bb-a0e2-50271f45b39e"}},{"namespace":"satellite","facts":{"organization_id":1,"satellite_version":"6.9.6","system_purpose_sla":"Standard","system_purpose_role":"Red Hat Enterprise Linux Server","distribution_version":"8.5","satellite_instance_id":"4665532b-dc66-44cd-8e11-cf3bcb95cfcf","is_hostname_obfuscated":false,"is_simple_content_access":false}}],"reporter":"yupana","per_reporter_staleness":{"puptoo":{"last_check_in":"2021-11-24T00:07:49.320493+00:00","stale_timestamp":"2021-11-25T05:07:49.274028+00:00","check_in_succeeded":true},"yupana":{"last_check_in":"2021-11-24T10:34:24.561689+00:00","stale_timestamp":"2021-11-25T16:33:03.165000+00:00","check_in_succeeded":true}},"stale_timestamp":"2021-11-25T16:33:03.165000+00:00","stale_warning_timestamp":"2021-12-02T16:33:03.165000+00:00","culled_timestamp":"2021-12-09T16:33:03.165000+00:00","created":"2021-11-22T13:13:13.102588+00:00","updated":"2021-11-24T10:34:24.562041+00:00"}, {"insights_id":"395cac43-2d1d-4c60-8f27-2b21d8a10c4a","subscription_manager_id":"c7b15bb0-4c6b-4839-91b5-c79a37f3521c","satellite_id":"c7b15bb0-4c6b-4839-91b5-c79a37f3521c","bios_uuid":"17a11842-096b-1eec-d805-2d54d82c0ccc","ip_addresses":["10.0.65.60"],"fqdn":"csopselk1.<domain.here>","mac_addresses":["00:50:56:98:38:f2"],"provider_id":null,"provider_type":null,"id":"5008c446-515d-4db4-822b-8dc7abc6fdf9","account":"5688684","display_name":"csopselk1.<domain.here>","ansible_host":null,"facts":[{"namespace":"yupana","facts":{"source":"Satellite","account":"5688684","yupana_host_id":"200d4940-a14a-4e4d-8059-66b8b384b657","report_slice_id":"d959e10d-3bfa-42d3-a4a7-b92b7152bbe0","report_platform_id":"547dcc38-1e2e-41bb-a0e2-50271f45b39e"}},{"namespace":"satellite","facts":{"organization_id":1,"satellite_version":"6.9.6","system_purpose_sla":"Standard","system_purpose_role":"Red Hat Enterprise Linux Server","distribution_version":"8.5","satellite_instance_id":"4665532b-dc66-44cd-8e11-cf3bcb95cfcf","is_hostname_obfuscated":false,"is_simple_content_access":false}}],"reporter":"yupana","per_reporter_staleness":{"puptoo":{"last_check_in":"2021-11-24T00:18:14.971689+00:00","stale_timestamp":"2021-11-25T05:18:14.907454+00:00","check_in_succeeded":true},"yupana":{"last_check_in":"2021-11-24T10:34:23.629374+00:00","stale_timestamp":"2021-11-25T16:33:03.165000+00:00","check_in_succeeded":true}},"stale_timestamp":"2021-11-25T16:33:03.165000+00:00","stale_warning_timestamp":"2021-12-02T16:33:03.165000+00:00","culled_timestamp":"2021-12-09T16:33:03.165000+00:00","created":"2021-11-23T10:22:39.541880+00:00","updated":"2021-11-24T10:34:23.629627+00:00"} ]} Doesn't seem like it helps much. Is there anything else I can provide for you? Can we get the same query for 1062 and 1063? 
Basically I am looking for Satellite hosts that might share some properties with other hosts, both on the cloud side and on the Satellite side. Ideally I would also like to see the whole dump of cloud hosts:

curl "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts/?per_page=10000 > /tmp/rh_inventory_dump.txt"

This will create the cloud dump and store it in the temp directory. Could you upload this file as an attachment to this case? In the meantime I am working on a workaround for your issue.

As I said, I haven't deleted and resubscribed ID 1062 yet, so there isn't an ID 1063:

irb(main):001:0> Host.where(id: [1062, 1063]).joins(:subscription_facet, :insights).pluck('hosts.id', 'hosts.name', 'katello_subscription_facets.uuid', 'insights_facets.uuid')
=> [[1062, "equipment-sharing-live1.<domain.here>", "68056acb-6957-4f2f-bd49-c9250af88e9c", "f36e138a-d796-449c-9780-254e714cedeb"]]

[root@satellite1 ~]# curl "https://<username:password>@console.redhat.com/api/inventory/v1/hosts/f36e138a-d796-449c-9780-254e714cedeb"
{"total":1,"count":1,"page":1,"per_page":50,"results":[{"insights_id":"9e760ef7-11e3-49d4-bda9-b7a02afeb897","subscription_manager_id":"68056acb-6957-4f2f-bd49-c9250af88e9c","satellite_id":"68056acb-6957-4f2f-bd49-c9250af88e9c","bios_uuid":"dc541842-2a3d-e919-23a2-90a9fd833b5f","ip_addresses":["10.0.65.62"],"fqdn":"equipment-sharing-live1.<domain.here>","mac_addresses":["00:50:56:98:f5:66"],"provider_id":null,"provider_type":null,"id":"f36e138a-d796-449c-9780-254e714cedeb","account":"5688684","display_name":"equipment-sharing-live1.<domain.here>","ansible_host":null,"facts":[{"namespace":"yupana","facts":{"source":"Satellite","account":"<account number>","yupana_host_id":"4f765a7c-27c7-481d-80b1-d757cb07a863","report_slice_id":"d959e10d-3bfa-42d3-a4a7-b92b7152bbe0","report_platform_id":"547dcc38-1e2e-41bb-a0e2-50271f45b39e"}},{"namespace":"satellite","facts":{"organization_id":1,"satellite_version":"6.9.6","system_purpose_sla":"Standard","system_purpose_role":"Red Hat Enterprise Linux Server","distribution_version":"8.5","satellite_instance_id":"4665532b-dc66-44cd-8e11-cf3bcb95cfcf","is_hostname_obfuscated":false,"is_simple_content_access":false}}],"reporter":"yupana","per_reporter_staleness":{"puptoo":{"last_check_in":"2021-11-24T01:17:55.569270+00:00","stale_timestamp":"2021-11-25T06:17:55.514466+00:00","check_in_succeeded":true},"yupana":{"last_check_in":"2021-11-24T10:34:27.198899+00:00","stale_timestamp":"2021-11-25T16:33:03.165000+00:00","check_in_succeeded":true}},"stale_timestamp":"2021-11-25T16:33:03.165000+00:00","stale_warning_timestamp":"2021-12-02T16:33:03.165000+00:00","culled_timestamp":"2021-12-09T16:33:03.165000+00:00","created":"2021-11-23T16:29:22.289747+00:00","updated":"2021-11-24T10:34:27.199177+00:00"}]}

I tried the curl, but aside from the syntax being wrong (it returned a 404 error due to the extra '/' char), when I did manage to get something that worked I got back "10000 is greater than the maximum of 100". However, I did see on another attempt {"total":652,"count":50,"page":1,"per_page":50 [...] so there are 652 records, even though I currently only have 501 content hosts registered with my Satellite. I assume I've deleted 151 hosts since I started using Insights? Please confirm this is the reason for the discrepancy.
What I have done is:

curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=1" > /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=2" >> /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=3" >> /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=4" >> /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=5" >> /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=6" >> /tmp/rh_inventory_dump.txt
curl -X GET "https://<username>:<password>@console.redhat.com/api/inventory/v1/hosts?per_page=100&page=7" >> /tmp/rh_inventory_dump.txt

and then manually edited out the {"total":652,"count":100,"page":<page no>,"per_page":100,"results":[ stuff at the join points. And that's what's attached. I hope it proves useful.

Created attachment 1843432 [details]
Inventory dump, as requested in comment #15

Never mind. I see that some of the hosts listed are those that are outside of Satellite (we share our site subscription with another group who don't use Satellite). That would probably account for the number being higher than 501.

First, thank you for the dumps; it's really appreciated and helps us to dig into this issue! From looking at the dump I can see that you do indeed have 501 hosts that belong to your Satellite. On the other hand, I can't see any duplication on either the Satellite or the Cloud side. This leads to the only possible cause: more than one instance of the sync task running simultaneously. To confirm this, can you please go to Monitor -> Tasks, press the "export" button and upload the tasks list. I am looking for related tasks that have overlapping execution times.

I just hope I'm not causing duplication of effort given support case #03040890! How can this duplication of the sync task cause the PG error, which is then _cured_ by reregistering the last subscribed content host? I can't understand the logic. In any event, please find attached my task export. Note that I haven't deleted/unsubscribed ID 1062 yet, or resumed the paused "Inventory scheduled sync" blocked by it.

Created attachment 1843560 [details]
foreman tasks CSV export
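As a side note on the page-by-page dump above: the same result can be produced without hand-editing the JSON at the join points. The following Ruby sketch is not from the bug report; it assumes console.redhat.com basic-auth credentials in environment variables and the 100-per-page limit mentioned earlier, and writes a single combined results array.

~~~
#!/usr/bin/env ruby
# Walk the console.redhat.com inventory API page by page and merge the
# "results" arrays into one file (hypothetical helper, not part of Satellite).
require 'net/http'
require 'json'
require 'uri'

user = ENV.fetch('RHC_USER')       # console.redhat.com username
pass = ENV.fetch('RHC_PASSWORD')   # console.redhat.com password

hosts = []
page  = 1
loop do
  uri = URI("https://console.redhat.com/api/inventory/v1/hosts?per_page=100&page=#{page}")
  req = Net::HTTP::Get.new(uri)
  req.basic_auth(user, pass)
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
  body = JSON.parse(res.body)
  hosts.concat(body['results'])
  break if page * body['per_page'] >= body['total']
  page += 1
end

File.write('/tmp/rh_inventory_dump.json', JSON.pretty_generate(hosts))
puts "wrote #{hosts.size} hosts"
~~~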
I also see the issue once in a while. There is only a single Task created, but that task spawns concurrent InventorySync::Async::InventoryFullSync Dynflow steps, one for each organization. The output of this step for each of the (in my case two) organizations shows that it synced all hosts, even though in my case the hosts are only assigned to org-id 3; the step for org-id 1 (Default Organization) also processed all hosts. Below is the output of the two generated Dynflow steps, where you can see they run concurrently and each processes 77 hosts:

~~~
3: InventorySync::Async::InventoryFullSync (success) [ 2.83s / 2.83s ]
Queue: default
Started at: 2021-11-25 00:00:15 UTC
Ended at: 2021-11-25 00:00:18 UTC
Real time: 2.83s
Execution time (excluding suspended state): 2.83s
Input:
---
organization_id: 1
locale: en
current_request_id: fc78af89-cbb1-46ad-a032-9ca265c10142
current_timezone: UTC
current_user_id: 1
current_organization_id:
current_location_id:
Output:
---
host_statuses:
  sync: 77
  disconnect: 0

5: InventorySync::Async::InventoryFullSync (success) [ 3.47s / 3.47s ]
Queue: default
Started at: 2021-11-25 00:00:15 UTC
Ended at: 2021-11-25 00:00:19 UTC
Real time: 3.47s
Execution time (excluding suspended state): 3.47s
Input:
---
organization_id: 3
locale: en
current_request_id: fc78af89-cbb1-46ad-a032-9ca265c10142
current_timezone: UTC
current_user_id: 1
current_organization_id:
current_location_id:
Output:
---
host_statuses:
  sync: 77
  disconnect: 0
~~~

~~~
irb(main):017:0> Host.unscoped.where(organization: 1).size
=> 0
irb(main):018:0> Host.unscoped.where(organization: 3).size
=> 77
irb(main):019:0>
~~~

The expected behaviour is that for organization_id 1 the output would be 'sync: 0', matching the fact that there are no hosts assigned to org-id 1.

Thanks a lot for the help. I have identified the issue and am now working on it and on a way to get it delivered. The upstream solution is here: https://github.com/theforeman/foreman_rh_cloud/pull/668

Heroic work! Thank you. Is this something that will require an update of Satellite (to 6.10.z), or a script/similar I can run against my 6.9.6 install or the Postgres DB? Out of interest, what information led you to the fix, please?

It will be a code change, hence a package update will be required. Currently I can't tell in which Satellite versions this change will land and what upgrade process will be required. It was Peter's comment about different organizations that led me to look more into the way status records are generated.

Understood. So the upshot for me is that it's likely I'm going to have this duplicate key issue with every content host I subscribe (until I then unsub and resub it) until this change makes it through to GA and I upgrade to the version of Satellite containing it. Frustrating, if that's the case, but if it is it'd be useful to know, and then I can simply add that fact into my build process until then. The minor wrinkle is that I wasn't going to upgrade Satellite to 6.10 (including the mandatory Pulp 2 -> Pulp 3 PG content upgrade) until this PG issue was fixed... (-: Can you therefore also confirm that I _don't_ have any Postgres database errors, and this is a functional issue rather than a data issue? If that's the case I'll begin planning for the upgrade to 6.10(.2), and start my Pulp 2 -> Pulp 3 migration.
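Returning to the per-organization counts Peter ran above: a quick console loop (a sketch using standard Foreman models, not something posted in the bug) prints the host count for every organization at once. With Peter's data it prints 0 for org 1 and 77 for org 3, which is what each per-org InventoryFullSync step's 'sync:' figure should match once the fix is in place.

~~~
# foreman-rake console: host count per organization, ignoring the current
# taxonomy context (Host.unscoped, as in the irb session above).
Organization.order(:id).each do |org|
  puts "#{org.id}: #{org.name} -> #{Host.unscoped.where(organization_id: org.id).count} hosts"
end
~~~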
I applied both fixes, https://github.com/theforeman/foreman_rh_cloud/pull/668 and https://github.com/theforeman/foreman/pull/8953, and can confirm that on 6.9.7 it is now working as expected: the default org that has no hosts assigned is no longer uploading any hosts. Also, no DB unique key constraint issues seen so far.

I have already left comments in various other BZs asking for a fix in the 6.9.x series that does not force an upgrade to 6.10/Pulp 3; it is supported, and even recommended, to wait for 7.0 if planning a fresh installation, which is applicable to my setup. See the official documentation at https://access.redhat.com/documentation/en-us/red_hat_satellite/6.10/html-single/upgrading_and_updating_red_hat_satellite/index#upgrade_paths, which says that avoiding the Pulp 3 upgrade and going directly to RHEL 8 + Satellite 7 + Pulp 3 is supported:

~~~
For future upgrades following Satellite 7.0, you will be required to upgrade the operating system from RHEL 7 to RHEL 8 on your Satellite Servers and Capsules. You can upgrade the operating system in-place or through a cloning process. The latter includes migration of all data, configuration, and synced content.

NOTE
If you are planning to avoid the upgrade from Pulp 2 to Pulp 3 and deploy a new Satellite 6.10 infrastructure due to the Pulp 3 changes instead, you might want to wait for Satellite 7.0 to deploy with RHEL 8 directly.
~~~

@ben.argyle.ac.uk Yes, as far as I know it's a functional issue, and it will be fixed in 6.10.3 (https://bugzilla.redhat.com/show_bug.cgi?id=2027786) if you want to track the specific change. Thanks a lot for your patience!

I just want to make absolutely clear... I'll continue to get errors of the form

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_host_status_on_type_and_host_id"
DETAIL: Key (type, host_id)=(InventorySync::InventoryStatus, XXXX) already exists.

every time I add a new Content Host to Satellite until I upgrade to 6.10.3 (or 6.9.7?)?

Just to note that since I upgraded to 6.9.7 I don't appear to be seeing these PG errors any more the day after registering new content hosts. Thank you!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498