Bug 2033128

Summary: Satellite Pulp2 to Pulp3 Migration Failed for Pulp Task Groups
Product: Red Hat Satellite Reporter: myoder
Component: PulpAssignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED DUPLICATE QA Contact: Vladimír Sedmík <vsedmik>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.9.6CC: ahumbe, dalley, mjia
Target Milestone: UnspecifiedKeywords: Triaged, Upgrades
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-10-21 01:48:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description myoder 2021-12-16 01:33:52 UTC
Description of problem:

Pulp 2 to pulp 3 migration failed with the following error and stack trace in the dynflow console:

~~~
 Katello::Errors::Pulp3Error

5 subtask(s) failed for task group /pulp/api/v3/task-groups/68c6f9ab-a3ed-40fb-adb9-ced0e24ede1e/.

---
- "/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.40/app/lib/actions/pulp3/abstract_async_task.rb:102:in
  `block in check_for_errors'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.40/app/lib/actions/pulp3/abstract_async_task.rb:100:in
  `each'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.40/app/lib/actions/pulp3/abstract_async_task.rb:100:in
  `check_for_errors'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.40/app/lib/actions/pulp3/abstract_async_task.rb:133:in
  `poll_external_task'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/action/polling.rb:100:in
  `poll_external_task_with_rescue'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/action/polling.rb:22:in
  `run'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/action/cancellable.rb:14:in
  `run'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.40/app/lib/actions/pulp3/abstract_async_task.rb:10:in
  `run'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/action.rb:571:in
  `block (3 levels) in execute_run'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/middleware/stack.rb:27:in
  `pass'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/middleware.rb:19:in
  `pass'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/middleware.rb:32:in
  `run'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/middleware/stack.rb:23:in
  `call'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/middleware/stack.rb:27:in
  `pass'"
- "/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/middleware.rb:19:in
  `pass'"
~~~


From the /var/log/messages I see 5 errors with pulpcore for publishing repositories, which I believe are the subtasks the original error is complaining about.  I included some of the stack trace from the errors:

~~~
Dec 15 11:42:27 upunilxsat01 pulpcore-worker-1: pulp: pulp_rpm.app.tasks.publishing:INFO: Publishing: repository=rhel6_x86_64_jwebv3_view-JBoss_Web_Server_3_RHEL_6_Server_RPMs_x86_64_6Server, version=4
Dec 15 11:42:27 upunilxsat01 pulpcore-worker-1: pulp: rq.worker:ERROR: Traceback (most recent call last):
Dec 15 11:42:27 upunilxsat01 pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Dec 15 11:42:27 upunilxsat01 pulpcore-worker-1: rv = job.perform()
Dec 15 11:42:27 upunilxsat01 pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Dec 15 11:42:27 upunilxsat01 pulpcore-worker-1: self._result = self._execute()
Dec 15 11:42:27 upunilxsat01 pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
--
Dec 15 14:38:37 upunilxsat01 pulpcore-worker-1: pulp: pulp_rpm.app.tasks.publishing:INFO: Publishing: repository=rhel7_x86_64_jboss_view-JBoss_Web_Server_3_RHEL_7_Server_RPMs_x86_64_7Server, version=3
Dec 15 14:38:38 upunilxsat01 pulpcore-worker-1: pulp: rq.worker:ERROR: Traceback (most recent call last):
Dec 15 14:38:38 upunilxsat01 pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Dec 15 14:38:38 upunilxsat01 pulpcore-worker-1: rv = job.perform()
Dec 15 14:38:38 upunilxsat01 pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Dec 15 14:38:38 upunilxsat01 pulpcore-worker-1: self._result = self._execute()
Dec 15 14:38:38 upunilxsat01 pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
--
Dec 15 15:02:30 upunilxsat01 pulpcore-worker-4: pulp: pulp_rpm.app.tasks.publishing:INFO: Publishing: repository=rhel7_x86_64_opt_view-Red_Hat_Ansible_Engine_2_5_RPMs_for_Red_Hat_Enterprise_Linux_7_Server_x86_64, version=2
Dec 15 15:02:30 upunilxsat01 pulpcore-worker-4: pulp: rq.worker:ERROR: Traceback (most recent call last):
Dec 15 15:02:30 upunilxsat01 pulpcore-worker-4: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Dec 15 15:02:30 upunilxsat01 pulpcore-worker-4: rv = job.perform()
Dec 15 15:02:30 upunilxsat01 pulpcore-worker-4: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Dec 15 15:02:30 upunilxsat01 pulpcore-worker-4: self._result = self._execute()
Dec 15 15:02:30 upunilxsat01 pulpcore-worker-4: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
--
Dec 15 15:02:43 upunilxsat01 pulpcore-worker-4: pulp: pulp_rpm.app.tasks.publishing:INFO: Publishing: repository=rhel7_x86_64_opt_view-Red_Hat_Ansible_Engine_2_6_RPMs_for_Red_Hat_Enterprise_Linux_7_Server_x86_64, version=2
Dec 15 15:02:43 upunilxsat01 pulpcore-worker-4: pulp: rq.worker:ERROR: Traceback (most recent call last):
Dec 15 15:02:43 upunilxsat01 pulpcore-worker-4: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Dec 15 15:02:43 upunilxsat01 pulpcore-worker-4: rv = job.perform()
Dec 15 15:02:43 upunilxsat01 pulpcore-worker-4: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Dec 15 15:02:43 upunilxsat01 pulpcore-worker-4: self._result = self._execute()
Dec 15 15:02:43 upunilxsat01 pulpcore-worker-4: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
--
Dec 15 15:02:51 upunilxsat01 pulpcore-worker-4: pulp: pulp_rpm.app.tasks.publishing:INFO: Publishing: repository=rhel7_x86_64_opt_view-Red_Hat_Ansible_Engine_2_7_RPMs_for_Red_Hat_Enterprise_Linux_7_Server_x86_64, version=1
Dec 15 15:02:51 upunilxsat01 pulpcore-worker-4: pulp: rq.worker:ERROR: Traceback (most recent call last):
Dec 15 15:02:51 upunilxsat01 pulpcore-worker-4: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Dec 15 15:02:51 upunilxsat01 pulpcore-worker-4: rv = job.perform()
Dec 15 15:02:51 upunilxsat01 pulpcore-worker-4: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Dec 15 15:02:51 upunilxsat01 pulpcore-worker-4: self._result = self._execute()
Dec 15 15:02:51 upunilxsat01 pulpcore-worker-4: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
~~~



The "satellite-maintain content migration-stats" command said the migration was going to take 8 hours and 49 minutes:

   Estimated migration time based on yum  content: 8 hours, 49 minutes


But the migration had been running for about 22 hours before the failure.


From the dynflow console we see 110 Distribution Creations did not finish:

~~~
task_groups:
- pulp_href: "/pulp/api/v3/task-groups/68c6f9ab-a3ed-40fb-adb9-ced0e24ede1e/"
  description: Migration Sub-tasks
  all_tasks_dispatched: true
  waiting: 0
  skipped: 0
  running: 0
  completed: 450
  canceled: 0
  failed: 5
  group_progress_reports:
  - message: Repo version creation
    code: create.repo_version
    total: 3166
    done: 3166
  - message: Distribution creation
    code: create.distribution
    total: 3840
    done: 3730
~~~

From what I can tell, the progress reports section of the dynflow console were all ok except for the rpm content, where 1 did not complete:

~~~
  - message: Migrating rpm content to Pulp 3
    code: migrating.rpm.content
    state: completed
    total: 312465
    done: 312464
~~~


I did not find anything in the foreman/production.log, except for the error for failing task group (the 5 errors from the /var/log/messages were not appearing in the production.log):

~~~
2021-12-15T17:15:31 [E|bac|] 5 subtask(s) failed for task group /pulp/api/v3/task-groups/68c6f9ab-a3ed-40fb-adb9-ced0e24ede1e/. (Katello::Errors::Pulp3Error)
 | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.40/app/lib/actions/pulp3/abstract_async_task.rb:102:in `block in check_for_errors'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.40/app/lib/actions/pulp3/abstract_async_task.rb:100:in `each'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.40/app/lib/actions/pulp3/abstract_async_task.rb:100:in `check_for_errors'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.40/app/lib/actions/pulp3/abstract_async_task.rb:133:in `poll_external_task'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/action/polling.rb:100:in `poll_external_task_with_rescue'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/action/polling.rb:22:in `run'
~~~

Version-Release number of selected component (if applicable):
Red Hat Satellite 6.9.6

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 myoder 2021-12-16 01:35:25 UTC
This is the output from the Satellite console when running the pulp2 to pulp3 migration:


Checking for valid Katello configuraton.
Starting task.
2021-12-15 10:51:49 -0600: Repo version creation 752/3166
2021-12-15 11:44:55 -0600: Repo version creation 1136/3166
2021-12-15 13:32:26 -0600: Distribution creation 2486/3840
2021-12-15 14:30:33 -0600: Repo version creation 2351/3166
2021-12-15 14:44:25 -0600: Repo version creation 2442/3166
2021-12-15 16:34:26 -0600: Distribution creation 3443/3840
2021-12-15 16:36:57 -0600: Repo version creation 2962/3166
2021-12-15 17:13:10 -0600: Distribution creation 3728/3840
2021-12-15 17:15:30 -0600: Distribution creation 3728/3840Migration failed, You will want to investigate: https://upunilxsat01.unix.rgbk.com/foreman_tasks/tasks/2384f51f-00d4-4ac6-b430-69fcd723302d
rake aborted!
ForemanTasks::TaskError: Task 2384f51f-00d4-4ac6-b430-69fcd723302d: Katello::Errors::Pulp3Error: 5 subtask(s) failed for task group /pulp/api/v3/task-groups/68c6f9ab-a3ed-40fb-adb9-ced0e24ede1e/.
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.40/lib/katello/tasks/pulp3_migration.rake:42:in `block (2 levels) in <top (required)>'
/opt/rh/rh-ruby25/root/usr/share/gems/gems/rake-12.3.3/exe/rake:27:in `<top (required)>'
Tasks: TOP => katello:pulp3_migration
(See full trace by running task with --trace)
                                                                      [FAIL]
Failed executing foreman-rake katello:pulp3_migration, exit status 1
--------------------------------------------------------------------------------
Scenario [Prepare content for Pulp 3] failed.

The following steps ended up in failing state:

  [content-prepare]

Resolve the failed steps and rerun
the command. In case the failures are false positives,
use --whitelist="content-prepare"

Comment 2 Daniel Alley 2022-08-29 22:58:28 UTC
The actual issue is the repo publish failure, nothing to do with task groups.

As-is however, I'm not sure this is actionable as there are no attachments and the interesting parts of the logs are cut off.  Will leave open for now, though.

Comment 9 Daniel Alley 2022-10-21 01:48:30 UTC

*** This bug has been marked as a duplicate of bug 2027127 ***