807332 – provider sync of two repos in diff. products fails with huge dump


Summary: provider sync of two repos in diff. products fails with huge dump

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	API
Sub Component:
Version:	6.0.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	Unspecified
Assignee:	Bryan Kearney
QA Contact:	Corey Welton
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-03-27 14:32 UTC by Garik Khachikyan
Modified:	2019-09-25 20:44 UTC (History)
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-08-22 18:31:55 UTC
Target Upstream Version:
Embargoed:

Attachments
patch against repo_sync.py in pulp-1.0.0-8 moving callback to separate thread (916 bytes, patch) 2012-03-30 09:40 UTC, Ivan Necas

Description Garik Khachikyan 2012-03-27 14:32:22 UTC

Description of problem:
Recent Katello changes cause the sync to fail for a provider that has two different products, each with one repo defined.

Version-Release number of selected component (if applicable):
mod_wsgi-3.3-3.pulp.el6.x86_64
katello-common-0.2.19-1.git.0.d1a03df.el6.noarch
katello-selinux-0.2.4-1.git.0.b03a73e.el6.noarch
katello-repos-testing-0.2.1-1.el6.noarch
qpid-cpp-server-ssl-0.12-6.el6.x86_64
candlepin-0.5.26-1.el6.noarch
pulp-common-1.0.0-8.el6.noarch
qpid-cpp-server-0.12-6.el6.x86_64
katello-glue-pulp-0.2.19-1.git.0.d1a03df.el6.noarch
pulp-1.0.0-8.el6.noarch
qpid-cpp-client-ssl-0.12-6.el6.x86_64
katello-qpid-broker-key-pair-1.0-1.noarch
katello-cli-common-0.2.17-1.git.0.c83d28b.el6.noarch
katello-cli-0.2.17-1.git.0.c83d28b.el6.noarch
m2crypto-0.21.1.pulp-7.el6.x86_64
python-oauth2-1.5.170-2.pulp.el6.noarch
qpid-cpp-client-0.12-6.el6.x86_64
katello-glue-candlepin-0.2.19-1.git.0.d1a03df.el6.noarch
pulp-selinux-server-1.0.0-8.el6.noarch
katello-qpid-client-key-pair-1.0-1.noarch
katello-agent-1.0.3-1.git.0.cccd0b4.el6.noarch
python-qpid-0.12-1.el6.noarch
katello-certs-tools-1.1.5-1.git.0.f153109.el6.noarch
candlepin-tomcat6-0.5.26-1.el6.noarch
katello-glue-foreman-0.2.19-1.git.0.d1a03df.el6.noarch
katello-0.2.19-1.git.0.d1a03df.el6.noarch
katello-repos-0.2.1-1.el6.noarch
katello-configure-0.2.16-1.git.0.9e604ab.el6.noarch
katello-candlepin-cert-key-pair-1.0-1.noarch

How reproducible:
always

Steps to Reproduce:
client remember --option org --value ACME_Corporation
provider create --name Provider1
product create --name zoo --provider Provider1
repo create --name zoo --product zoo --url http://inecas.fedorapeople.org/fakerepos/zoo3/
product create --name pulp --provider Provider1
repo create --name pulp --product pulp --url http://repos.fedorapeople.org/repos/pulp/pulp/v1/stable/6Server/x86_64/
provider synchronize --name Provider1

  
Actual results:
After waiting for a while at 0% progress, a really huge error log is dumped to the console.

Expected results:
no errors, no issues on syncing

Additional info:
Individual repo sync however works!

Comment 1 Ivan Necas 2012-03-27 14:48:06 UTC

Could you provide the stack trace you're seeing? I could not reproduce the problem using these steps.

Comment 2 Ivan Necas 2012-03-28 10:38:23 UTC

I was able to reproduce it under these conditions:

1. setting post_sync_url in /etc/pulp/pulp.conf
2. running katello in devel mode (or running it on a slower machine)

This didn't occur on my non-virtualized laptop. So it seems very likely to be a deadlock that ends in a request timeout when requesting the sync status.

Comment 3 Ivan Necas 2012-03-28 13:33:43 UTC

Better error handling committed in 26ee22ccaf2da68b2039d6d9015d542b1dcb7e37. The timeout problem is still there.

Comment 4 Ivan Necas 2012-03-28 14:50:36 UTC

The root cause of the problem is described in Pulp bug I've filed [1]

[1] https://bugzilla.redhat.com/show_bug.cgi?id=807720

Comment 5 Justin Sherrill 2012-03-28 21:05:22 UTC

Spoke with jconner and he tried a modification to pulp: 

http://git.fedorahosted.org/git/?p=pulp.git;a=commitdiff;h=68389d55e97053049dfd9f5ab20301e07823c6fe

and basically now pulp will not wait for a response before closing the connection.  

I tested with Ivan's simple server script and now the pulp-admin command returns as expected.  However, when doing this with a katello instance, apache does not seem to forward the request through to the thin processes (it simply returns a 403).  

So we either need to find another solution, find a way to force apache to forward the request, or have pulp go directly to a thin process (port 5000 for example) instead of apache.

Thoughts?

Comment 6 Ivan Necas 2012-03-29 14:34:06 UTC

I'm not a big fan of pointing it directly to the thin process. I don't have much experience with setting up apache this way. What if Pulp could postpone the callback to a point where the lock that prevents the status calls from being served is no longer held? It's strange that other calls, like `repo list`, work OK even while waiting for the callback.
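
A toy model of the circular wait discussed in this thread may help. It is only a sketch under assumptions: `task_lock`, `status`, `callback_handler`, `sync_blocking` and `sync_deferred` are hypothetical names, a single in-process lock stands in for whatever Pulp actually holds during a sync, and the real root cause is the one described in Pulp bug 807720, not this code.

```python
import threading

# Hypothetical stand-in for the resource held while a sync task runs.
task_lock = threading.Lock()


def status(timeout=0.2):
    """Status call: needs the lock and gives up after `timeout` seconds."""
    if not task_lock.acquire(timeout=timeout):
        return "timeout"
    try:
        return "ok"
    finally:
        task_lock.release()


def callback_handler():
    """Stand-in for the callback receiver, which asks for the sync status."""
    return status()


def sync_blocking():
    """The callback is made while the lock is held, so status cannot be served."""
    with task_lock:
        return callback_handler()


def sync_deferred():
    """The callback is made only after the lock has been released."""
    with task_lock:
        pass  # the sync work itself would happen here
    return callback_handler()
```

In this model `sync_blocking()` ends in a timeout while `sync_deferred()` gets its status, which is the difference postponing the callback is meant to achieve.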

Comment 7 Justin Sherrill 2012-03-29 16:36:50 UTC

Ivan, I'm not a huge fan of that either, but this is a temporary solution (pulp is adding an item to their backlog to come up with a more permanent solution). Speaking with jconner, it seems that it would take a good deal more work to move something like this out of the main task queue. So while I don't like it either, I'm ok with pointing to port 5000 for v1.

We would also need to change the authorization logic, as there would be no HTTP_X_FORWARDED_FOR header.

Comment 8 Ivan Necas 2012-03-30 09:40:33 UTC

Created attachment 573926 [details]
patch against repo_sync.py in pulp-1.0.0-8 moving callback to separate thread

What if we moved the callback request into a separate thread and waited for the response from there, so that it wouldn't block the original process? I've tried a patch against repo_sync.py in pulp-1.0.0-8 (see attachment) and it seemed to work. @jason - any thoughts on that?

Comment 9 Jason Connor 2012-03-30 19:37:30 UTC

Ok, I've pushed a modified version of the supplied patch that fires the actual post request off in a separate thread.
The fix is in two commits:
2c76534652cd59082f9cc1589e11abdef3f4a6a2
1180e17fa46b96018ec4e58af3972c5e597a14df
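
For readers without access to the attachment or the commits, the general idea of firing the post request off in a separate thread could look roughly like the sketch below. It is not the attached patch or the committed Pulp code: the function names `post_callback` and `fire_callback_async`, the `poster` parameter and the 30 second timeout are assumptions made for illustration, and it uses the Python 3 standard library rather than the Python version pulp-1.0.0-8 ran on.

```python
import threading
import urllib.request


def post_callback(url, data, timeout=30):
    """Blocking POST to the callback URL; returns the HTTP status code."""
    request = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def fire_callback_async(url, data, poster=post_callback):
    """Run the callback POST in a daemon thread so the caller does not wait for it."""
    def worker():
        try:
            poster(url, data)
        except Exception as exc:
            # A failed callback is only reported; it must not break the sync task.
            print("post-sync callback failed: %s" % exc)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The caller gets the thread object back immediately, so the sync task can finish and release its resources while the receiver is still processing the callback.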

Comment 12 Jeff Ortel 2012-04-03 18:49:39 UTC

build: 1.0.3

Comment 13 Corey Welton 2012-04-10 20:13:25 UTC

[root@se-blade ~]# katello --username admin --password admin provider synchronize --name Provider1
Provider [ Provider1 ] synchronized                                   

QA Verified, following steps above and subsequently confirming in UI.

Tested on brew build of 0.1.309-1.el6 which has package pulp-1.0.4-1.el6.noarch

Comment 16 Mike McCune 2013-08-16 18:20:15 UTC

getting rid of 6.0.0 version since that doesn't exist
