Bug 1603166
| Summary: | Fail upgrading Satellite 6.3 to 6.4 beta | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Juan Manuel Parrilla Madrid <jparrill> |
| Component: | Tasks Plugin | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED WONTFIX | QA Contact: | Peter Ondrejka <pondrejk> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.4 | CC: | aruzicka, inecas, jgiordan, jparrill, pep |
| Target Milestone: | Unspecified | Keywords: | Triaged, Upgrades |
| Target Release: | Unused | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-09-03 18:58:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Juan Manuel Parrilla Madrid
2018-07-19 10:59:58 UTC
Created attachment 1459993 [details]
Error upgrading Satellite to 6.4
More details:
```
[root@sat ~]# sudo su - postgres -s /bin/bash -c "psql -d foreman -c '\d foreman_tasks_tasks'"
Table "public.foreman_tasks_tasks"
Column | Type | Modifiers
----------------+-----------------------------+-----------
id | character varying(255) |
type | character varying(255) | not null
label | character varying(255) |
started_at | timestamp without time zone |
ended_at | timestamp without time zone |
state | character varying(255) | not null
result | character varying(255) | not null
external_id | character varying(255) |
parent_task_id | character varying(255) |
start_at | timestamp without time zone |
start_before | timestamp without time zone |
action | character varying |
Indexes:
"index_foreman_tasks_id_state" btree (id, state)
"index_foreman_tasks_tasks_on_ended_at" btree (ended_at)
"index_foreman_tasks_tasks_on_external_id" btree (external_id)
"index_foreman_tasks_tasks_on_id" btree (id)
"index_foreman_tasks_tasks_on_label" btree (label)
"index_foreman_tasks_tasks_on_parent_task_id" btree (parent_task_id)
"index_foreman_tasks_tasks_on_result" btree (result)
"index_foreman_tasks_tasks_on_start_at" btree (start_at)
"index_foreman_tasks_tasks_on_start_before" btree (start_before)
"index_foreman_tasks_tasks_on_started_at" btree (started_at)
"index_foreman_tasks_tasks_on_state" btree (state)
"index_foreman_tasks_tasks_on_type" btree (type)
"index_foreman_tasks_tasks_on_type_and_label" btree (type, label)
[root@sat ~]# sudo su - postgres -s /bin/bash -c "psql -d foreman -c '\d foreman_tasks_locks'"
Table "public.foreman_tasks_locks"
Column | Type | Modifiers
---------------+------------------------+------------------------------------------------------------------
id | integer | not null default nextval('foreman_tasks_locks_id_seq'::regclass)
task_id | character varying(255) | not null
name | character varying(255) | not null
resource_type | character varying(255) |
resource_id | integer |
exclusive | boolean |
Indexes:
"foreman_tasks_locks_pkey" PRIMARY KEY, btree (id)
"index_foreman_tasks_locks_name_resource_type_resource_id" btree (name, resource_type, resource_id)
"index_foreman_tasks_locks_on_exclusive" btree (exclusive)
"index_foreman_tasks_locks_on_name" btree (name)
"index_foreman_tasks_locks_on_resource_type_and_resource_id" btree (resource_type, resource_id)
"index_foreman_tasks_locks_on_task_id" btree (task_id)
```
OK, it seems some tasks were stuck in the database and the locks for those tasks stayed behind. After verifying that no tasks were running and that the services were stopped, I ran these two shell commands:

```
sudo su - postgres -s /bin/bash -c "psql -d foreman -c 'delete from foreman_tasks_locks where id = 5592;'"
sudo su - postgres -s /bin/bash -c "psql -d foreman -c 'delete from foreman_tasks_locks where id = 5593;'"
```

This is my case; the lock IDs could vary.

Ivan, does foreman-maintain check for stuck tasks? In other words, should the documented upgrade workflow using foreman-maintain help ensure users do not encounter this behavior? If so, this may be 'notabug'.

I don't think the unique-index issue and running tasks are actually related: the lock objects stay in the database even after the task finishes. I suspect this is quite a rare case, but we should still make the migration more resilient to handle this corner case. Juan: could you share the content of the `foreman_tasks_locks` table from before the upgrade, so we can analyze what data caused the issue and how frequent it might be?

Created attachment 1460858 [details]
pg_dump of foreman_tasks_locks
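The "verify that no tasks are running" pre-check mentioned above can be sketched as follows. This is an illustrative stand-in only: it uses Python's stdlib sqlite3 module instead of the real foreman PostgreSQL database, and the assumption that any `foreman_tasks_tasks` row with a state other than 'stopped' is still in flight is mine, not stated in the report.

```python
import sqlite3

# In-memory stand-in for the foreman database, reduced to the two
# columns the pre-check needs (the real table is shown in the \d
# output earlier in this report).
conn = sqlite3.connect(":memory:")
conn.execute("create table foreman_tasks_tasks (id text, state text)")
conn.executemany(
    "insert into foreman_tasks_tasks values (?, ?)",
    [("t1", "stopped"), ("t2", "stopped"), ("t3", "running")],
)

# Assumption: a task not in the 'stopped' state may still hold a live
# lock, so deleting its foreman_tasks_locks rows would be unsafe.
in_flight = conn.execute(
    "select id, state from foreman_tasks_tasks where state <> 'stopped'"
).fetchall()
print(in_flight)  # [('t3', 'running')]

# Only proceed with the manual lock cleanup when this list is empty.
safe_to_clean = not in_flight
print(safe_to_clean)  # False
```

Against the real database the same query would be issued through `psql -d foreman`, as in the commands above.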
I hit this too.
I was performing the upgrade from foreman-maintain, which indeed offered to clean up tasks; I accepted, and it reported deleting 702 tasks, but apparently the foreman_tasks_locks table wasn't cleaned.
I'm also getting the same type of PG error:
```
[DEBUG 2018-07-19T11:15:58 main] PG::UniqueViolation: ERROR: could not create unique index "foreman_tasks_locks_pkey"
[DEBUG 2018-07-19T11:15:58 main] DETAIL: Key (id)=(5595) is duplicated.
```
Attaching a pg_dump of the foreman_tasks_locks table
I ran into this same issue with the same two locks, deleted them, and re-kicked off the install. My logs are here: http://people.redhat.com/jgiordan/files/sat_migrate_logs.tar

Looking at the database, it looks quite odd: the table seems to have duplicate id records even though it had the primary key defined on that column. There was a bug in PostgreSQL that could lead to this (https://www.postgresql.org/about/news/1506/), but it has been fixed for some time. The generic workaround is:

```
su - postgres -c 'psql -d foreman -c "delete from foreman_tasks_locks where id in (select id from foreman_tasks_locks group by id having count(id) > 1);"'
```

Given we've not seen this so far with other large customer databases, I would suggest leaving this BZ open for now. If it turns out that this is more probable than it currently looks, we could release a KCS article on it and implement a check in foreman-maintain to perform the cleanup before the upgrade.

The Satellite Team is attempting to provide an accurate backlog of Bugzilla requests which we feel will be resolved in the next few releases. We do not believe this Bugzilla will meet that criteria and plan to close it out in one month. This is not a reflection on the validity of the request, but a reflection of the many priorities for the product. If you have any concerns about this, feel free to contact Red Hat Technical Support or your account team. If we do not hear from you, we will close this bug out. Thank you.

Thank you for your interest in Satellite 6. We have evaluated this request, and while we recognize that it is a valid request, we do not expect it to be implemented in the product in the foreseeable future. This is due to other priorities for the product and not a reflection on the request itself. We are therefore closing this out as WONTFIX. If you have any concerns about this, please do not reopen. Instead, feel free to contact Red Hat Technical Support. Thank you.
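The generic duplicate-id workaround quoted above can be illustrated with a self-contained sketch. This is a stand-in only: it uses Python's stdlib sqlite3 module in place of the real foreman PostgreSQL database, and the table is deliberately created without a primary key so that duplicate ids can exist, mirroring the corrupted state that prevented the `foreman_tasks_locks_pkey` index from being rebuilt.

```python
import sqlite3

# In-memory stand-in; the GROUP BY / HAVING logic is the same as in
# the psql workaround. No PRIMARY KEY here, so duplicates are possible.
conn = sqlite3.connect(":memory:")
conn.execute("create table foreman_tasks_locks (id integer, task_id text)")
conn.executemany(
    "insert into foreman_tasks_locks values (?, ?)",
    [(5592, "a"), (5593, "b"), (5595, "c"), (5595, "c"), (5600, "d")],
)

# Find ids that appear more than once (the rows blocking the unique index).
dupes = conn.execute(
    "select id from foreman_tasks_locks group by id having count(id) > 1"
).fetchall()
print(dupes)  # [(5595,)]

# The workaround deletes every row whose id is duplicated (it does not
# keep one survivor), exactly as in the psql command above.
conn.execute(
    "delete from foreman_tasks_locks where id in "
    "(select id from foreman_tasks_locks group by id having count(id) > 1)"
)
remaining = [r[0] for r in conn.execute(
    "select id from foreman_tasks_locks order by id")]
print(remaining)  # [5592, 5593, 5600]
```

After this cleanup the unique index on `id` can be created, which is what the failing migration step was attempting.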