Bug 1279502 - Pulp tasks randomly stuck at waiting or running
Summary: Pulp tasks randomly stuck at waiting or running
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.1.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Tazim Kolhar
URL:
Whiteboard:
Depends On: 1281546 1281551
Blocks: 1122832
 
Reported: 2015-11-09 15:17 UTC by Brian Bouterse
Modified: 2022-07-09 07:29 UTC
CC List: 28 users

Fixed In Version: python-qpid-0.30-7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-15 09:20:29 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to cause qpid.messaging issue to reproduce (210 bytes, patch)
2015-12-02 17:03 UTC, Brian Bouterse
pending state (90.98 KB, image/png)
2015-12-03 10:01 UTC, Tazim Kolhar
selector.patch (1.11 KB, patch)
2015-12-04 14:08 UTC, Brian Bouterse
new selector patch against 0.30-7 (1.49 KB, patch)
2015-12-07 17:17 UTC, Ken Giusti


Links
Pulp Redmine 1363: priority High, status CLOSED - NOTABUG, "tasks randomly stuck at waiting or running", last updated 2016-04-19 12:04:06 UTC
Red Hat Knowledge Base (Solution) 2044393
Red Hat Product Errata RHSA-2015:2622: priority normal, status SHIPPED_LIVE, "Moderate: Satellite 6.1.5 bug fix update", last updated 2015-12-15 14:17:45 UTC

Internal Links: 1264330

Description Brian Bouterse 2015-11-09 15:17:34 UTC
A Sat 6 system will be working normally, processing many tasks for a long period of time. Then, at random, a single Pulp process (a pulp worker, the resource manager, etc.) will halt and appear to deadlock. A task that was in the running state stays at running; one that was in the waiting state stays at waiting.

This is very likely to be due to fork safety issues between celery's forking behavior and qpid.messaging's threading and file descriptor usage.

Workaround: restarting the stuck process marks the task as cancelled if it is already running, and work resumes normally until the problem recurs at random in the future.
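
For illustration, here is a minimal, self-contained sketch (Python 3; not the actual celery or qpid.messaging code) of the fork-safety hazard: a helper thread started in the parent does not exist in the forked child, so anything in the child that waits on that thread blocks forever.

import os
import queue
import threading

requests = queue.Queue()
responses = queue.Queue()

def io_thread():
    # Stands in for qpid.messaging's selector/IO thread, created in the parent.
    while True:
        msg = requests.get()
        responses.put(msg.upper())

threading.Thread(target=io_thread, daemon=True).start()

pid = os.fork()
if pid == 0:
    # Child process (stands in for a forked celery worker). Threads do not
    # survive fork(), so nothing services this request in the child.
    requests.put("sync repo")
    try:
        responses.get(timeout=5)   # in the real bug this waits forever
    except queue.Empty:
        print("child: IO thread gone after fork, request never answered")
    os._exit(0)
else:
    os.waitpid(pid, 0)
    print("parent: unaffected, its IO thread is still running")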

Comment 1 pulp-infra@redhat.com 2015-11-09 15:30:25 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 2 pulp-infra@redhat.com 2015-11-09 15:30:28 UTC
The Pulp upstream bug priority is at High. Updating the external tracker on this bug.

Comment 6 Brian Bouterse 2015-11-12 17:02:16 UTC
Regarding the safety of cancellation as a workaround:

For pulp's purposes cancellation is always safe. A katello developer says that if a Pulp task is cancelled, Katello will realize it was cancelled and handle it as if the task failed. Depending on the task type Katello will complete the dynflow task and indicate part of it failed, or Katello will halt the dynflow task waiting for the user to take an action. In all cases, the user can resume the Katello dynflow operation, or start a new one and things will be OK.

Also, regarding explicit cancellation versus worker restart: restarting the pulp services (pulp_*) cancels any tasks that Pulp would not otherwise pick up again after the restart. Generally this is easier than cancelling tasks one by one.

Comment 11 Tazim Kolhar 2015-11-30 15:28:12 UTC
Hi,

please provide verification steps

thanks and regards,
Tazim

Comment 13 Pavel Moravec 2015-12-02 13:38:24 UTC
(In reply to Tazim Kolhar from comment #11)
> Hi,
> 
> please provide verification steps
> 
> thanks and regards,
> Tazim

Hi Ken,
do you know of a reproducer for the python-qpid bug? Either Satellite's reproducer (I doubt you know it :) ), a python script, or a description in words of the python-qpid usage (such that I can come up with a reproducer script tomorrow)?

Comment 14 Ken Giusti 2015-12-02 14:51:37 UTC
Hi, 

Unfortunately, I didn't find a simple reproducer for this. The condition fixed in python-qpid-0.30-7 is hard to trigger. I've tried sending various signals to the selector process, but these all get routed to the main thread instead, which I think is Python's expected behavior.

I ended up modifying the code to insert a raise statement that would artificially trigger the bug and prove the fix.

Frantisek originally uncovered this bug a while ago (we hadn't backported the fix to qpid 0.30; this bug addresses that).

The original bug:  

https://bugzilla.redhat.com/show_bug.cgi?id=1249608

Sadly the original bug didn't have a reproducer associated with it

:(

Sorry!

Comment 15 Brian Bouterse 2015-12-02 17:03:57 UTC
Created attachment 1101566 [details]
patch to cause qpid.messaging issue to reproduce

patch onto /usr/lib/python2.7/site-packages/qpid/selector.py

Comment 16 Brian Bouterse 2015-12-02 17:07:59 UTC
Recalling from an e-mail discussion: the QE plan for this bug is to manually patch the qpid/selector.py file so that the issue is produced on startup, because the issue otherwise occurs so infrequently.

I saved a diff from when I made this modification, and uploaded it as Comment 15. Here are some steps to reproduce:

To reproduce the original problem:
1. Install the non-hotfixed version of python-qpid
2. Apply the patch from comment 15
3. katello restart all services
4. have katello issue a sync
5. observe the katello sync never completes

To verify the fix:
1. Install the hotfixed version of python-qpid
2. Apply the patch from comment 15 (causes the issue to show itself)
3. katello restart all services
4. have katello issue a sync
5. observe the katello sync never completes
6. observe the additional log statements that the hotfixed version emits, indicating the root cause of the stall

This fix adds logging and error handling; that is the purpose of the hotfix.
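
For context, a rough sketch of the kind of error handling and logging the hotfix adds around the selector thread (the method name wrap_run and the "I/O Thread fatal error" message match the traceback shown later in comment 36; the rest is a simplification, not the actual python-qpid-0.30-7 code): if the selector's run loop dies, the exception is logged instead of the thread exiting silently.

import logging
import traceback

log = logging.getLogger("qpid.messaging")

class Selector(object):
    def run(self):
        # Stands in for the real selector loop (poll descriptors, dispatch I/O).
        raise Exception("This exception should be logged")

    def wrap_run(self):
        try:
            self.run()
        except Exception:
            # The real hotfix also records the failure so that callers waiting
            # on the selector can error out rather than block forever; only the
            # logging side is shown here.
            log.error("qpid.messaging I/O Thread fatal error:\n%s",
                      traceback.format_exc())

if __name__ == "__main__":
    logging.basicConfig(level=logging.ERROR)
    Selector().wrap_run()   # logs the fatal error instead of dying silently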

Comment 17 Tazim Kolhar 2015-12-03 09:59:37 UTC
FAILEDQA:
rq.redhat.com
[root@ibm-hs22-02 log]# rpm -qa | grep foreman
puppet-foreman_scap_client-0.3.3-9.el7sat.noarch
foreman-compute-1.7.2.47-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.4-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
foreman-gce-1.7.2.47-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.10-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.24-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
ibm-hs22-02.rhts.eng.brq.redhat.com-foreman-client-1.0-1.noarch
ibm-hs22-02.rhts.eng.brq.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-discovery-image-3.0.5-1.el7sat.noarch
foreman-ovirt-1.7.2.47-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.22-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.5-1.el7sat.noarch
foreman-debug-1.7.2.47-1.el7sat.noarch
ruby193-rubygem-foreman_openscap-0.3.2.10-1.el7sat.noarch
foreman-selinux-1.7.2.17-1.el7sat.noarch
foreman-postgresql-1.7.2.47-1.el7sat.noarch
foreman-vmware-1.7.2.47-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
foreman-proxy-1.7.2.6-1.el7sat.noarch
ibm-hs22-02.rhts.eng.brq.redhat.com-foreman-proxy-1.0-2.noarch
foreman-1.7.2.47-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.15.7-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-libvirt-1.7.2.47-1.el7sat.noarch

steps:
1. Install the hotfixed version of python-qpid
2. Apply the patch from comment 15 (causes the issue to show itself)
# diff selector.py edited_selector.py 
165a166,172
> 
> foo = None
> KEN =	0	
> 	if Selector.foo is not None and KEN > 20:
> 		# if KEN > 10:
> 		raise Exception("I hope this hangs")
> 		KEN += 1

# diff selector.py edited_selector.py > selector.patch

# patch selector.py < selector.patch
patching file selector.py

# diff selector.py edited_selector.py 

3. katello restart all services
# katello-service

4. have katello issue a sync
5. observe the katello sync never completes
6. observe additional log statements that the hotfix version logs with indicating the root cause for the stall
I have checked 2 log files:
production.log
2015-12-03 10:50:34 [I] Completed 200 OK in 122ms (Views: 105.4ms | ActiveRecord: 5.7ms)
2015-12-03 10:55:03 [I] Processing by ForemanTasks::TasksController#show as */*
2015-12-03 10:55:03 [I]   Parameters: {"id"=>"b9a107e1-3210-4a8b-a0d8-f8c2f460d020"}
2015-12-03 10:55:03 [I]   Rendered /opt/rh/ruby193/root/usr/share/gems/gems/foreman-tasks-0.6.15.7/app/views/foreman_tasks/tasks/_details.html.erb (253.0ms)
2015-12-03 10:55:03 [I]   Rendered /opt/rh/ruby193/root/usr/share/gems/gems/foreman-tasks-0.6.15.7/app/views/foreman_tasks/tasks/_running_steps.html.erb (0.2ms)
2015-12-03 10:55:03 [I]   Rendered /opt/rh/ruby193/root/usr/share/gems/gems/foreman-tasks-0.6.15.7/app/views/foreman_tasks/tasks/_errors.html.erb (4.2ms)
2015-12-03 10:55:03 [I]   Rendered /opt/rh/ruby193/root/usr/share/gems/gems/foreman-tasks-0.6.15.7/app/views/foreman_tasks/tasks/_raw.html.erb (4.1ms)
2015-12-03 10:55:03 [I]   Rendered /opt/rh/ruby193/root/usr/share/gems/gems/foreman-tasks-0.6.15.7/app/views/foreman_tasks/tasks/_locks.html.erb (1.5ms)
2015-12-03 10:55:03 [I]   Rendered /opt/rh/ruby193/root/usr/share/gems/gems/foreman-tasks-0.6.15.7/app/views/foreman_tasks/tasks/show.html.erb (266.5ms)
2015-12-03 10:55:03 [I] Completed 200 OK in 285ms (Views: 264.9ms | ActiveRecord: 6.2ms)

So there is nothing related to pulp.

I also tried to check /var/log/pulp, but this directory is empty.

As stated, this fix adds logging and error handling; that is the purpose of the hotfix.

However, in the UI the task shows as pending.
I have attached a screenshot.

I will move this to VERIFIED if the log files track the error message; as per my understanding, that is expected.

Comment 18 Tazim Kolhar 2015-12-03 10:01:05 UTC
Created attachment 1101732 [details]
pending state

Comment 19 Tazim Kolhar 2015-12-03 10:24:24 UTC
Just a correction: step 3 here was
3. katello restart all services
# katello-service restart

Comment 22 Tazim Kolhar 2015-12-03 17:49:42 UTC
Hi, 

Quick update
I just applied the patch again so that there are no errors
# rpm -qa | grep python-qpid
python-qpid-qmf-0.30-5.el7.x86_64
python-qpid-0.30-7.el7.noarch


# diff selector.py edited_selector.py 
50a51,52
>   foo = None
>   KEN =	0	
165a168,172
> 
> if Selector.foo is not None and KEN > 20:
>    # if KEN > 10:
>    raise Exception("I hope this hangs") 
>    KEN += 1

# diff selector.py edited_selector.py > selector.patch


# patch selector.py < selector.patch
patching file selector.py

# katello-service restart
Redirecting to /bin/systemctl restart  mongod.service

Redirecting to /bin/systemctl restart  qpidd.service

Redirecting to /bin/systemctl restart  qdrouterd.service

Redirecting to /bin/systemctl restart  tomcat.service

Restarting elasticsearch (via systemctl):  [  OK  ]
Redirecting to /bin/systemctl restart  foreman-proxy.service

Redirecting to /bin/systemctl restart  pulp_celerybeat.service

Redirecting to /bin/systemctl restart  pulp_resource_manager.service

Redirecting to /bin/systemctl restart  pulp_workers.service

Redirecting to /bin/systemctl restart  httpd.service

Redirecting to /bin/systemctl restart  foreman-tasks.service

Success!

Sync a Repo
Repo gets synced

# journalctl -f -l
-- Logs begin at Wed 2015-12-02 19:54:38 CET. --
Dec 03 18:43:08 ibm-hs22-02.rhts.eng.brq.redhat.com pulp[30730]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[1ec32a21-5886-4de5-95aa-922295b96d2c] succeeded in 0.0360025199916s: None
Dec 03 18:43:08 ibm-hs22-02.rhts.eng.brq.redhat.com pulp[30780]: celery.worker.job:INFO: Task pulp.server.managers.consumer.applicability.regenerate_applicability_for_repos[acfab534-2ec0-4225-9871-55bae0a2dcb4] succeeded in 0.0420732710045s: None
Dec 03 18:43:08 ibm-hs22-02.rhts.eng.brq.redhat.com pulp[30780]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[ed981d5b-dc36-4781-a7c7-8a94176d74ab] succeeded in 0.00836860100389s: None
Dec 03 18:43:08 ibm-hs22-02.rhts.eng.brq.redhat.com pulp[31066]: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
Dec 03 18:43:09 ibm-hs22-02.rhts.eng.brq.redhat.com pulp[30730]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[6755bde9-e63e-4820-b83a-844c6a0fff60]
Dec 03 18:43:09 ibm-hs22-02.rhts.eng.brq.redhat.com pulp[30780]: celery.worker.strategy:INFO: Received task: pulp.server.managers.repo.publish.publish[a5057baa-abcf-4f97-8802-7f41bca61220]
Dec 03 18:43:09 ibm-hs22-02.rhts.eng.brq.redhat.com pulp[30780]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[3ff7eb7b-9ab6-4768-98a4-78815235c8c2]
Dec 03 18:43:09 ibm-hs22-02.rhts.eng.brq.redhat.com pulp[30730]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[6755bde9-e63e-4820-b83a-844c6a0fff60] succeeded in 0.0363339879987s: None
Dec 03 18:43:09 ibm-hs22-02.rhts.eng.brq.redhat.com pulp[30780]: celery.worker.job:INFO: Task pulp.server.managers.repo.publish.publish[a5057baa-abcf-4f97-8802-7f41bca61220] succeeded in 0.147294651004s: {'exception': None, 'repo_id': u'Default_Organization-sat_prod2-sat_tools_repo2', 'traceback': None, 'started':...
Dec 03 18:43:09 ibm-hs22-02.rhts.eng.brq.redhat.com pulp[30780]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[3ff7eb7b-9ab6-4768-98a4-78815235c8c2] succeeded in 0.00637840099807s: None

So this time it is definitely failing even after correctly
applying the patch 


Thanks and Regards,
Tazim

Comment 23 Brian Bouterse 2015-12-03 21:02:50 UTC
This looks well organized, and the logs being quoted contain the pulp logs so that is the place to look.

I believe the patch is not causing the erroneous case to occur. I know this because when you Sync a repo, the repo gets synced. If the patch were doing what we expect the repo would not sync.

I'm going to make a new patch based on the hotfix release to ensure the patch does what we need it to and applies cleanly.

Comment 24 isnuryusuf-gls 2015-12-04 01:11:07 UTC
This does not work for me.

My selector.py only has 153 lines.

# on Sat and Caps
rpm -qa | grep python-qpid
python-qpid-qmf-0.30-5.el7.x86_64
python-qpid-proton-0.9-4.el7.x86_64
python-qpid-0.30-7.el7.noarch


# diff selector.py edited_selector.py 
50a51,52
> 	foo = None
> 	KEN = 0
153a156,160
> 
>  if Selector.foo is not None and KEN > 20:
>    # if KEN > 10:
>     raise Exception("I hope this hangs") 
>     KEN += 1


# diff selector.py edited_selector.py > selector.patch
# cat selector.patch 
50a51,52
> 	foo = None
> 	KEN = 0
153a156,160
> 
>  if Selector.foo is not None and KEN > 20:
>    # if KEN > 10:
>     raise Exception("I hope this hangs") 
>     KEN += 1


# patch selector.py < selector.patch
patching file selector.py


Restarting katello services on both the capsule and the satellite
# katello-service restart


Try sync again
# hammer capsule content synchronize --id 2
[Foreman] Username: admin
[Foreman] Password for admin: 
Could not synchronize capsule content:
  There was an issue with the backend service pulp: 500 Internal Server Error


on production.log
2015-12-04 08:07:45 [I] Processing by Katello::Api::V2::CapsuleContentController#sync as application/json;version=2
2015-12-04 08:07:45 [I]   Parameters: {"api_version"=>"v2", "id"=>"2", "capsule_content"=>{}}
2015-12-04 08:07:45 [I] Authorized user admin(Admin User)
2015-12-04 08:07:46 [W] 500 Internal Server Error

Comment 26 Brian Bouterse 2015-12-04 14:08:55 UTC
Created attachment 1102282 [details]
selector.patch

Comment 27 Brian Bouterse 2015-12-04 14:12:33 UTC
The attached selector.patch was made against python-qpid-0.32-12. The idea is that it will cause an Exception to be raised in a forked child process (i.e. a pulp worker) after the selector thread loops 20 times. The count of 20 is important because if the selector thread exits too early, the foreground thread handles it correctly.

In the affected versions (<python-qpid-0.30-7) I expect pulp to hang when it syncs, with no logging. In the fixed version (>=python-qpid-0.30-7) I expect pulp to still hang, with lower probability, and when it does, to log that the selector thread encountered an exception.

What we can't know is if the case that this library handles better was the case that caused the thread to exit in the customer environment. This verification is really about verifying that the logging occurs when the selector thread exits. This patch forces that logging to occur.

If this patch can't reproduce the logging then we need to ask the Qpid team for input. Generally, we want to force the selector thread to exit after 20 tries in a forked child process, without raising any exception in the parent process. This patch attempts to do that.
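
As a rough sketch (not the attached selector.patch itself), the test patch amounts to something like the following inside qpid's Selector run loop; the "Test patch: loop count=..." and "RAISING EXCEPTION NOW!" messages mirror the log output in comment 36, while the pid comparison is only an assumption about how the patch limits the failure to a forked child.

import logging
import os

log = logging.getLogger("qpid.messaging")

class Selector(object):
    def __init__(self):
        self.owner_pid = os.getpid()   # process that created the selector
        self.loop_count = 0

    def _test_patch_hook(self):
        # Imagine this running once per pass of the real Selector.run() loop,
        # after the normal poll/dispatch work.
        self.loop_count += 1
        log.error("Test patch: loop count=%d pid=%d",
                  self.loop_count, os.getpid())
        if os.getpid() != self.owner_pid and self.loop_count > 20:
            log.error("Test patch: RAISING EXCEPTION NOW!")
            raise Exception("This exception should be logged")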

Comment 28 Brian Bouterse 2015-12-04 14:14:43 UTC
Oh, one other important thing. When you apply the patch, you also need to delete the compiled selector.pyc and selector.pyo files in site-packages/qpid/ (selector.py[oc]). This ensures that Python reads your patched code from selector.py at runtime.

Comment 29 Mike McCune 2015-12-04 14:26:05 UTC
going to move this ON_QA with the updated patch + instructions.

Comment 30 isnuryusuf-gls 2015-12-04 15:21:52 UTC
I got an error using the latest patch from the attachment:

# patch selector.py < selector.patch
patching file selector.py
Hunk #1 succeeded at 44 (offset -3 lines).
Hunk #2 succeeded at 56 (offset -3 lines).
Hunk #3 FAILED at 111.
Hunk #4 FAILED at 131.
2 out of 4 hunks FAILED -- saving rejects to file selector.py.rej

Comment 31 Brian Bouterse 2015-12-04 17:49:15 UTC
See comment 27. The patch may not apply cleanly to MRG, which is what Satellite runs. The patch provided is against the Fedora-based version of python-qpid. I recommend the Qpid team give some input on porting it to MRG for verification inside Satellite 6.

Comment 32 Tazim Kolhar 2015-12-07 12:54:36 UTC
FAILEDQA:

# rpm -qa | grep foreman
foreman-compute-1.7.2.48-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.4-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.24-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.10-1.el7sat.noarch
foreman-debug-1.7.2.48-1.el7sat.noarch
foreman-postgresql-1.7.2.48-1.el7sat.noarch
ruby193-rubygem-foreman_openscap-0.3.2.10-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
foreman-gce-1.7.2.48-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.22-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.5-1.el7sat.noarch
puppet-foreman_scap_client-0.3.3-9.el7sat.noarch
foreman-selinux-1.7.2.17-1.el7sat.noarch
foreman-vmware-1.7.2.48-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
foreman-proxy-1.7.2.6-1.el7sat.noarch
ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-discovery-image-3.0.5-3.el7sat.noarch
foreman-1.7.2.48-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.15.7-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-libvirt-1.7.2.48-1.el7sat.noarch
ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-1.0-1.noarch
foreman-ovirt-1.7.2.48-1.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch

steps:
# rpm -qa | grep python-qpid
python-qpid-qmf-0.30-5.el7.x86_64
python-qpid-0.30-7.el7.noarch

delete the selector.pyc and selector.pyo files in site-packages/qpid/selector.py[o,c]. 

Based on this attachment:  https://bugzilla.redhat.com/attachment.cgi?id=1101566


# diff selector.py edited_selector.py
45a46
> foo = None
50a52
>   KEN = 0
165a168,172
> 
> if Selector.foo is not None and KEN > 20:
>   # if KEN > 10:
>   raise Exception("I hope this hangs")
> KEN += 1

# diff selector.py edited_selector.py > selector.patch
#  patch selector.py < selector.patch
patching file selector.py

# katello-service restart
Redirecting to /bin/systemctl restart  mongod.service

Redirecting to /bin/systemctl restart  qpidd.service

Redirecting to /bin/systemctl restart  qdrouterd.service

Redirecting to /bin/systemctl restart  tomcat.service

Restarting elasticsearch (via systemctl):  [  OK  ]
Redirecting to /bin/systemctl restart  foreman-proxy.service

Redirecting to /bin/systemctl restart  pulp_celerybeat.service

Redirecting to /bin/systemctl restart  pulp_resource_manager.service

Redirecting to /bin/systemctl restart  pulp_workers.service

Redirecting to /bin/systemctl restart  httpd.service

Redirecting to /bin/systemctl restart  foreman-tasks.service

Success!

Repo Sync
Repo Sync successful

# journalctl -f -l
-- Logs begin at Sun 2015-12-06 23:15:51 EST. --
Dec 07 07:46:06 ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com pulp[3958]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[4dba3afa-d791-4d34-b154-0e93e36eb3ca] succeeded in 0.0344908410007s: None
Dec 07 07:46:06 ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com pulp[4026]: celery.worker.job:INFO: Task pulp.server.managers.consumer.applicability.regenerate_applicability_for_repos[8676d86f-e6eb-4656-a24b-cd2c6fa32086] succeeded in 0.0441919360019s: None
Dec 07 07:46:06 ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com pulp[4026]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[1f62080f-c066-45ab-92e9-73793e044ce3] succeeded in 0.0102365489984s: None
Dec 07 07:46:06 ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com pulp[4296]: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
Dec 07 07:46:08 ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com pulp[3958]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[dcbc0deb-68ba-4026-9688-2d2afebd18bf]
Dec 07 07:46:08 ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com pulp[4026]: celery.worker.strategy:INFO: Received task: pulp.server.managers.repo.publish.publish[54797729-3baf-4b08-b693-a9c3ac7ce027]
Dec 07 07:46:08 ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com pulp[4026]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[d886ab99-5846-4bc1-829d-2d2e1d5ea270]
Dec 07 07:46:08 ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com pulp[3958]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[dcbc0deb-68ba-4026-9688-2d2afebd18bf] succeeded in 0.0342465310023s: None
Dec 07 07:46:08 ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com pulp[4026]: celery.worker.job:INFO: Task pulp.server.managers.repo.publish.publish[54797729-3baf-4b08-b693-a9c3ac7ce027] succeeded in 0.191622496997s: {'exception': None, 'repo_id': u'Default_Organization-sat_prod-sat_tools_repo', 'traceback': None, 'started':...
Dec 07 07:46:08 ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com pulp[4026]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[d886ab99-5846-4bc1-829d-2d2e1d5ea270] succeeded in 0.0106485670003s: None

Comment 33 Tazim Kolhar 2015-12-07 14:09:18 UTC
The pulp version used for testing:
# rpm -qa | grep pulp
rubygem-smart_proxy_pulp-1.0.1.2-1.el7sat.noarch
python-pulp-rpm-common-2.6.0.17-1.el7sat.noarch
pulp-rpm-plugins-2.6.0.17-1.el7sat.noarch
python-pulp-docker-common-0.2.5-1.el7sat.noarch
python-pulp-common-2.6.0.17-1.el7sat.noarch
pulp-server-2.6.0.17-1.el7sat.noarch
pulp-puppet-plugins-2.6.0.17-1.el7sat.noarch
pulp-puppet-tools-2.6.0.17-1.el7sat.noarch
pulp-nodes-parent-2.6.0.17-1.el7sat.noarch
pulp-katello-0.5-1.el7sat.noarch
python-isodate-0.5.0-4.pulp.el7sat.noarch
python-pulp-bindings-2.6.0.17-1.el7sat.noarch
pulp-nodes-common-2.6.0.17-1.el7sat.noarch
python-kombu-3.0.24-10.pulp.el7sat.noarch
pulp-docker-plugins-0.2.5-1.el7sat.noarch
python-pulp-puppet-common-2.6.0.17-1.el7sat.noarch
pulp-selinux-2.6.0.17-1.el7sat.noarch

Comment 34 Ken Giusti 2015-12-07 17:17:07 UTC
Created attachment 1103324 [details]
new selector patch against 0.30-7

Comment 36 Eric Helms 2015-12-08 17:05:50 UTC
With the latest selector patch, I have been able to confirm this bug on the latest build, even with something as simple as creating a repository. The output I saw when running a repository create with hammer:

[root@katello-sat qpid]# hammer --username admin --password C3hEWEqSsPs9BFN2 repository create --name "Katello" --product-id 1 --content-type yum --url https://fedorapeople.org/groups/katello/releases/yum/nightly/katello/RHEL/7Server/x86_64/
Could not create the repository:
  Error: Request Timeout 


My /var/log/messages output:

Dec  8 11:44:21 localhost pulp: gofer.messaging.adapter.qpid.connection:INFO: connecting: URL: ssl://katello-sat.example.com|SSL: ca: /etc/pki/katello/certs/katello-default-ca.crt|key: None|certificate: /etc/pki/katello/qpid_client_striped.crt|host-validation: False
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=1 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=2 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=3 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=4 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=5 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=6 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=7 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=8 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=9 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=10 pid=12082
Dec  8 11:44:21 localhost pulp: gofer.messaging.adapter.qpid.connection:INFO: connected: ssl://katello-sat.example.com
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=11 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=12 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=13 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=14 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=15 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=16 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=17 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=18 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=19 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=20 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: loop count=21 pid=12082
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: Test patch: RAISING EXCEPTION NOW!
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: (12082-91552) qpid.messaging I/O Thread fatal error: This exception should be logged
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: (12082-91552) Traceback (most recent call last):
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: (12082-91552)   File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 116, in wrap_run
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: (12082-91552)     self.run()
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: (12082-91552)   File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 148, in run
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: (12082-91552)     raise Exception("This exception should be logged")
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: (12082-91552) Exception: This exception should be logged
Dec  8 11:44:21 localhost pulp: qpid.messaging:ERROR: (12082-91552)
Dec  8 11:44:26 localhost systemd: Stopping Foreman jobs daemon...
Dec  8 11:44:31 localhost foreman-tasks: Dynflow Executor: stop in progress
Dec  8 11:44:31 localhost foreman-tasks: dynflow_executor: trying to stop process with pid 22634...
Dec  8 11:44:32 localhost foreman-tasks: dynflow_executor: process with pid 22634 successfully stopped.
Dec  8 11:44:32 localhost systemd: Starting Foreman jobs daemon...
Dec  8 11:44:37 localhost foreman-tasks: Dynflow Executor: start in progress
Dec  8 11:44:38 localhost foreman-tasks: dynflow_executor: process with pid 12200 started.
Dec  8 11:44:38 localhost foreman-tasks: Waiting for the executor to be ready...
Dec  8 11:44:41 localhost qpidd[11516]: 2015-12-08 11:44:41 [Protocol] error Connection qpid.[fe80::5054:ff:fe16:3459%eth0]:5671-[fe80::5054:ff:fe16:3459%eth0]:44671 timed out: closing
Dec  8 11:44:41 localhost qpidd: 2015-12-08 11:44:41 [Protocol] error Connection qpid.[fe80::5054:ff:fe16:3459%eth0]:5671-[fe80::5054:ff:fe16:3459%eth0]:44671 timed out: closing

Comment 41 sthirugn@redhat.com 2015-12-11 22:18:54 UTC
Verified in Sat 6.1.5 Compose 4.

Comment 43 errata-xmlrpc 2015-12-15 09:20:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2622

Comment 44 pulp-infra@redhat.com 2016-01-29 22:00:25 UTC
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

Comment 45 pulp-infra@redhat.com 2016-04-19 12:04:07 UTC
The Pulp upstream bug status is at CLOSED - NOTABUG. Updating the external tracker on this bug.

