Bug 886909 - Uploading same manifest to same org: 500 error, 'Unable to create base dir for sync: /var/cache/candlepin/sync'
Summary: Uploading same manifest to same org: 500 error, 'Unable to create base dir f...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Subscription Asset Manager
Classification: Retired
Component: katello
Version: 1.3
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: rc
: 1.3
Assignee: Katello Bug Bin
QA Contact: SAM QE List
URL:
Whiteboard:
Depends On:
Blocks: 886931 sam20-tracker 996556
TreeView+ depends on / blocked
 
Reported: 2012-12-13 14:48 UTC by Jeff Weiss
Modified: 2014-11-09 22:52 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 886931 996556 (view as bug list)
Environment:
Last Closed: 2014-06-27 20:56:20 UTC
Embargoed:
jweiss: automate_bug+


Attachments (Terms of Use)
zipped logs (598.87 KB, application/x-gzip)
2012-12-13 14:48 UTC, Jeff Weiss
no flags Details

Description Jeff Weiss 2012-12-13 14:48:42 UTC
Created attachment 663008 [details]
zipped logs

Description of problem:


Version-Release number of selected component (if applicable):
Katello Version: 1.3.0-1.git.52.4f131bd.el6_3

How reproducible:


Steps to Reproduce:
1. upload a manifest
2. upload the same manifest again to the same org
3. 
  
Actual results:
error notif, Unable to create base dir for sync: /var/cache/candlepin/sync

Expected results:
an error notif saying the manifest is the same as the one that's already there.

Additional info:
see attached logs, delayed_job.log

2012-12-12T17:35:45-0500: [Worker(delayed_job.0 host:10-16-120-228.dhcp.rhq.lab.eng.bos.redhat.com pid:13790)] PERMANENTLY removing Provider#queue_import_manifest because of 1 consecutive failures.
Resources::Candlepin::Owner: 500 Internal Server Error {"displayMessage":"Unable to create base dir for sync: /var/cache/candlepin/sync"} (POST /candlepin/owners/dup-manifest-1212-173534-903/imports)
/usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/abstract_response.rb:48:in `return!'
/usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:220:in `process_result'
/usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:169:in `transmit'
/usr/lib/ruby/1.8/net/http.rb:543:in `start'
/usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:166:in `transmit'
/usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:60:in `execute'
/usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:31:in `execute'
/usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/resource.rb:63:in `post'
/usr/share/katello/lib/http_resource.rb:109:in `post'
/usr/share/katello/lib/resources/candlepin.rb:268:in `import'
/usr/share/katello/app/models/glue/provider.rb:184:in `owner_import'
/usr/share/katello/app/models/glue.rb:153:in `send'
/usr/share/katello/app/models/glue.rb:153:in `execute'
/usr/share/katello/app/models/glue.rb:104:in `process'
/usr/share/katello/app/models/glue.rb:100:in `each'
/usr/share/katello/app/models/glue.rb:100:in `process'
/usr/share/katello/app/models/glue.rb:32:in `on_save'
/usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:220:in `_conditional_callback_around_2818'
/usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:414:in `_run_save_callbacks'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/callbacks.rb:273:in `create_or_update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/persistence.rb:60:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/validations.rb:49:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/attribute_methods/dirty.rb:30:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:245:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:292:in `with_transaction_returning_status'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract/database_statements.rb:139:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:207:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:290:in `with_transaction_returning_status'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:245:in `save!'
/usr/share/katello/app/models/glue/provider.rb:213:in `queue_import_manifest'
/usr/share/katello/app/models/async_operation.rb:49:in `send'
/usr/share/katello/app/models/async_operation.rb:49:in `perform'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/backend/base.rb:87:in `invoke_job'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/worker.rb:120:in `run'
/usr/lib/ruby/1.8/timeout.rb:67:in `timeout'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/worker.rb:120:in `run'
/usr/lib/ruby/1.8/benchmark.rb:308:in `realtime'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/worker.rb:119:in `run'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/worker.rb:177:in `reserve_and_run_one_job'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/worker.rb:104:in `work_off'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/worker.rb:103:in `times'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/worker.rb:103:in `work_off'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/worker.rb:78:in `start'
/usr/lib/ruby/1.8/benchmark.rb:308:in `realtime'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/worker.rb:77:in `start'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/worker.rb:74:in `loop'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/worker.rb:74:in `start'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/command.rb:104:in `run'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/command.rb:83:in `run_process'
/usr/lib/ruby/gems/1.8/gems/daemons-1.1.4/lib/daemons/application.rb:249:in `call'
/usr/lib/ruby/gems/1.8/gems/daemons-1.1.4/lib/daemons/application.rb:249:in `start_proc'
/usr/lib/ruby/gems/1.8/gems/daemons-1.1.4/lib/daemons/daemonize.rb:197:in `call'
/usr/lib/ruby/gems/1.8/gems/daemons-1.1.4/lib/daemons/daemonize.rb:197:in `call_as_daemon'
/usr/lib/ruby/gems/1.8/gems/daemons-1.1.4/lib/daemons/application.rb:253:in `start_proc'
/usr/lib/ruby/gems/1.8/gems/daemons-1.1.4/lib/daemons/application.rb:293:in `start'
/usr/lib/ruby/gems/1.8/gems/daemons-1.1.4/lib/daemons/controller.rb:70:in `run'
/usr/lib/ruby/gems/1.8/gems/daemons-1.1.4/lib/daemons.rb:195:in `run_proc'
/usr/lib/ruby/gems/1.8/gems/daemons-1.1.4/lib/daemons/cmdline.rb:109:in `call'
/usr/lib/ruby/gems/1.8/gems/daemons-1.1.4/lib/daemons/cmdline.rb:109:in `catch_exceptions'
/usr/lib/ruby/gems/1.8/gems/daemons-1.1.4/lib/daemons.rb:194:in `run_proc'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/command.rb:81:in `run_process'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/command.rb:75:in `daemonize'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/command.rb:73:in `times'
/usr/lib/ruby/gems/1.8/gems/delayed_job-2.1.4/lib/delayed/command.rb:73:in `daemonize'
script/delayed_job:5

Comment 3 Tom McKay 2012-12-13 15:23:03 UTC
Need to get katello-debug output with these sort of BZs and versions of products (candlepin, pulp, etc.)

Comment 4 Jeff Weiss 2012-12-13 15:35:53 UTC
katello-debug output is already attached.

the versions of cp pulp etc are whatever is required by the version of katello, but here they are:

/scpc:root.120.93: #$ rpm -qa | grep candlepin*
candlepin-selinux-0.7.22-1.el6.noarch
candlepin-0.7.22-1.el6.noarch
candlepin-tomcat6-0.7.22-1.el6.noarch
katello-candlepin-cert-key-pair-1.0-1.noarch
katello-glue-candlepin-1.3.0-1.git.56.71ad281.el6_3.noarch
/scpc:root.120.93: #$ rpm -qa | grep pulp
python-oauth2-1.5.170-2.pulp.el6.noarch
pulp-1.1.12-1.el6.noarch
m2crypto-0.21.1.pulp-7.el6.x86_64
katello-glue-pulp-1.3.0-1.git.56.71ad281.el6_3.noarch
mod_wsgi-3.3-3.pulp.el6.x86_64
pulp-common-1.1.12-1.el6.noarch
pulp-selinux-server-1.1.12-1.el6.noarch
/scpc:root.120.93: #$ rpm -qa | grep foreman
katello-glue-foreman-1.3.0-1.git.56.71ad281.el6_3.noarch
rubygem-foreman_api-0.0.8-1.el6.noarch
foreman-postgresql-1.0.1-40.fc9ba74.el6.noarch
foreman-1.0.1-40.fc9ba74.el6.noarch
/scpc:root.120.93: #$

Comment 5 Devan Goodwin 2012-12-13 15:56:41 UTC
I can't seem to find any such error in the catalina.out in the attached zip, are we sure this is from the right server?

Could you check and report on the perms of everything leading to /var/cache/candlepin/sync (assuming they do exist).

Was tomcat recently upgraded on this system?

Comment 6 Devan Goodwin 2012-12-13 16:00:21 UTC
Logged in myself to check, perms are are looking fine on 10.16.120.93 at least, still can't find evidence of that exception on this system though.

The original report shows Dec 12, all the logs on the system look to start on Dec 13th.

Need some info on this one, something's not adding up.

Thanks!

Comment 7 Devan Goodwin 2012-12-13 17:36:34 UTC
This was a continuous integration system, next run the error did not surface. CI system is apparently doing some things with how the rpms are installed  and then upgraded before katello-configure is run, rpm upgrades could be involved here but I'm far from certain.

Problem does not surface for developers, and as far as we know nobody's seen this in a regular installation either.

The exception is thrown if /var/cache/candlepin/sync does not exist and we cannot create it (which would be attempted as the tomcat user candlepin is running as).

/var/cache/candlepin is owned by the candlepin-tomcat6 package. 


%defattr(644,tomcat,tomcat,775)
%{_localstatedir}/cache/%{name}

Nothing has changed around this in a very long time. (7-8 months)

Holding until we have a reproducer or environment where it's happening.

Comment 8 Devan Goodwin 2012-12-13 17:42:39 UTC
Can replicate on a dev server if I move /var/cache/candlepin and try to import.

The fact that nothing is getting logged for candlepin is looking like a bug in our code, if you don't have debug on you won't have any indication of the error in our logs, only the displayMessage in response sent to caller.

This isn't the cause just an explanation of why we're getting nothing in catalina.out.

As for why the CI system suddenly had no /var/cache/candlepin, still unsure, and holding until we see it again.

Comment 9 Devan Goodwin 2012-12-13 19:45:59 UTC
Bug surfaced again, in both set of logs I now notice something like this:

Dec 12 17:35:45 [http-8443-4] INFO  org.candlepin.resource.OwnerResource - Importing archive /var/cache/tomcat6/temp/pfx1606609630550696439sfx for owner man-reuse-1212-173535-220
Dec 12 17:35:45 [http-8443-3] INFO  org.candlepin.resource.OwnerResource - Importing archive /var/cache/tomcat6/temp/pfx4444889783006217487sfx for owner dup-manifest-1212-173534-903
Dec 12 17:35:45 [http-8443-4] INFO  org.candlepin.sync.Importer - Extracting archive to: /var/cache/candlepin/sync/import1738555957853794164969369865921
Dec 12 17:35:45 [http-8443-4] INFO  org.candlepin.sync.Importer - Extracting archive to: /var/cache/candlepin/sync/import1738555957853794164969369865921
Dec 12 17:35:45 [http-8443-3] INFO  org.candlepin.resource.OwnerResource - Import attempt completed for owner dup-manifest-1212-173534-903
Dec 12 17:35:45 [http-8443-4] WARN  org.candlepin.sync.Importer - Incompatible rules: import version 0.5.11-1 older than our version 0.7.22.
Dec 12 17:35:45 [http-8443-4] WARN  org.candlepin.sync.Importer - Manifest data will be imported without rules import.
Dec 12 17:35:45 [http-8443-3] INFO  org.candlepin.resource.OwnerResource - Deleting all subscriptions from manifests for owner: dup-manifest-1212-173534-903
Dec 12 17:35:46 [http-8443-4] INFO  org.candlepin.resource.OwnerResource - Import attempt completed for owner man-reuse-1212-173535-220

The issue appears to not be related to importing twice into same org but rather two imports coming in at exactly the same second and trying to extract into the same directory.

Working on test reproducer and fix now.

Severity of the bug in the real world is probably quite low, simultaneous imports would be quite hard to trigger, and a repeat attempt would likely work fine. (no serious consequences)

Comment 10 Jeff Weiss 2012-12-13 20:23:39 UTC
I am not sure how likely a user is to automate importing of manifests.  But if they did, I would guess they would likely just script it with bash and katello cli.

However if they wanted to be clever and make it go faster, they would fork each import command, and then they *would* be likely to hit this problem.

Comment 11 Devan Goodwin 2012-12-14 14:14:44 UTC
From email discussion with Jesus:

 I noticed that we're using System.nanoTime() to create unique filenames. This javadoc looks suspicious. I wonder if we've hit a case where it actually came back with the same number?

http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/System.html#nanoTime()


Also the two log statements stating "Importing archive" are correctly handled by 2 threads (-4 and -3).

Dec 12 17:35:45 [http-8443-4] INFO  org.candlepin.resource.OwnerResource - Importing archive /var/cache/tomcat6/temp/pfx1606609630550696439sfx for owner man-reuse-1212-173535-220
Dec 12 17:35:45 [http-8443-3] INFO  org.candlepin.resource.OwnerResource - Importing archive /var/cache/tomcat6/temp/pfx4444889783006217487sfx for owner dup-manifest-1212-173534-903

Now notice the lines that indicate "Extracting archive" are showing the SAME thread (-4).

Dec 12 17:35:45 [http-8443-4] INFO  org.candlepin.sync.Importer - Extracting archive to: /var/cache/candlepin/sync/import1738555957853794164969369865921
Dec 12 17:35:45 [http-8443-4] INFO  org.candlepin.sync.Importer - Extracting archive to: /var/cache/candlepin/sync/import1738555957853794164969369865921

Unfortunately, I had four threads going making temp files and each doing 100,000 couldn't get a collision. That's not to say that over time we wouldn't get one, but it's very odd.

One other idea is verify that the Importer code doesn't have any local variables that it is maintaining in this area (nothing popped out at me) and double check Guice to make sure we're not starting it in a dumb manner.

Comment 12 Devan Goodwin 2012-12-14 14:25:35 UTC
I think we need to postpone to focus on some other more pressing work, but definitely something here we need to investigate particularly if there's a threading issue.

Comment 14 Bryan Kearney 2013-06-10 14:41:09 UTC
There is now a friendly error message which is shown. Moving this to ON_QA.

Comment 17 sthirugn@redhat.com 2013-08-13 17:17:27 UTC
Now I see an user message in UI: "Subscription manifest import for provider 'Red Hat' skipped
Reason: Manifest subscriptions unchanged from previous"

But /var/log/katello/delayed_jobs.log still shows the error as mentioned in this BZ.  I think we should not log the exception in delayed_jobs.log if we are catching it and throwing a friendly user message in UI.

Package tested:
* candlepin-0.8.19-1.el6sam.noarch
* candlepin-scl-1-5.el6_4.noarch
* candlepin-scl-quartz-2.1.5-5.el6_4.noarch
* candlepin-scl-rhino-1.7R3-1.el6_4.noarch
* candlepin-scl-runtime-1-5.el6_4.noarch
* candlepin-selinux-0.8.19-1.el6sam.noarch
* candlepin-tomcat6-0.8.19-1.el6sam.noarch
* elasticsearch-0.19.9-8.el6sat.noarch
* katello-candlepin-cert-key-pair-1.0-1.noarch
* katello-certs-tools-1.4.2-2.el6sat.noarch
* katello-cli-1.4.3-5.el6sat.noarch
* katello-cli-common-1.4.3-5.el6sat.noarch
* katello-common-1.4.3-6.el6sam_splice.noarch
* katello-configure-1.4.4-2.el6sat.noarch
* katello-glue-candlepin-1.4.3-6.el6sam_splice.noarch
* katello-glue-elasticsearch-1.4.3-6.el6sam_splice.noarch
* katello-headpin-1.4.3-6.el6sam_splice.noarch
* katello-headpin-all-1.4.3-6.el6sam_splice.noarch
* katello-selinux-1.4.4-2.el6sat.noarch
* thumbslug-0.0.32-1.el6sam.noarch
* thumbslug-selinux-0.0.32-1.el6sam.noarch

Comment 18 Bryan Kearney 2013-08-20 18:18:37 UTC
Moving to NEW to get it looked at.

Comment 20 Bryan Kearney 2013-10-30 17:14:09 UTC
Per coment 17, the fix should be to not log the 409 to stdout, since that is a valid use case:

2013-10-30T12:27:21-0400: [Worker(delayed_job host:sam13 pid:1635)] Provider#queue_import_manifest failed with RestClient::Conflict: Resources::Candlepin::Owner: 409 Conflict {"displayMessage":"Import is the same as existing data","requestUuid":"17f3fb96-525f-4c3c-8d43-721578bcd05e","conflicts":["MANIFEST_SAME"]} (POST /candlepin/owners/ACME_Corporation/imports) - 0 failed attempts
2013-10-30T12:27:21-0400: [Worker(delayed_job host:sam13 pid:1635)] PERMANENTLY removing Provider#queue_import_manifest because of 1 consecutive failures.
Resources::Candlepin::Owner: 409 Conflict {"displayMessage":"Import is the same as existing data","requestUuid":"17f3fb96-525f-4c3c-8d43-721578bcd05e","conflicts":["MANIFEST_SAME"]} (POST /candlepin/owners/ACME_Corporation/imports)

Comment 22 sthirugn@redhat.com 2014-06-27 20:56:20 UTC
Please ignore my comment in Comment 17 requesting to not log the exception.

In SAM 1.4 importing the same manifest again provides the following UI error which is good enough:
Subscription manifest import for provider 'Red Hat' skipped
Reason: Manifest subscriptions unchanged from previous


Note You need to log in before you can comment on or make changes to this bug.