Bug 812499

Summary: Unable to remove "Red Hat Content Provider" if something goes wrong
Product: Red Hat Satellite
Reporter: Justin Clift <jclift>
Component: WebUI
Assignee: Katello Bug Bin <katello-bugs>
Status: CLOSED NOTABUG
QA Contact: Katello QA List <katello-qa-list>
Severity: high
Docs Contact:
Priority: low
Version: 6.0.0
CC: bkearney, jturner, kwade, mmccune, tcarlin
Target Milestone: Unspecified
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-03-12 22:56:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
  Description: Screenshot showing the error message and missing view contents.
  Flags: none

Description Justin Clift 2012-04-14 03:12:29 UTC
Created attachment 577437
Screenshot showing the error message and missing view contents.

Description of problem:

  A CloudForms SE server suffered a power outage whilst it was adding a RH Content Provider manifest.

  After the box restarted (filesystem recovery was ok), whenever the user goes to the "Red Hat Content Provider" page, the normal contents are missing.  Instead, the page shows only a "500 Internal Server Error".

  This seems to leave no way, through the web UI, to remove a (likely) broken RH Content Provider definition.

  Screenshot attached.

Version-Release number of selected component (if applicable):

  It's a recent puddle build:

  $ rpm -qa | grep -i katello
  katello-selinux-0.1.10-1.el6.noarch
  katello-0.1.309-1.el6.noarch
  katello-cli-common-0.1.107-1.el6.noarch
  katello-candlepin-cert-key-pair-1.0-1.noarch
  katello-qpid-broker-key-pair-1.0-1.noarch
  katello-cli-0.1.107-1.el6.noarch
  katello-common-0.1.309-1.el6.noarch
  katello-glue-candlepin-0.1.309-1.el6.noarch
  katello-glue-foreman-0.1.309-1.el6.noarch
  katello-certs-tools-1.0.4-1.el6.noarch
  katello-configure-0.1.107-1.el6.noarch
  katello-glue-pulp-0.1.309-1.el6.noarch
  katello-all-0.1.309-1.el6.noarch
  katello-qpid-client-key-pair-1.0-1.noarch


How reproducible:

  Unknown.

Steps to Reproduce:
1. Begin adding a RH Content Provider manifest.  While waiting for Katello to finish processing the manifest, suffer a power outage.
2. Restart the box.
3. Go to the RH Content Providers tab.  It shows "500 Internal Server Error" instead of the normal contents.


Expected results:

  Normal RH Content Provider tab contents should be there.

Additional info:

  I took a snapshot of the disk for this BZ, so can extract logs or whatever if needed.

Comment 1 James Laska 2012-04-18 15:11:28 UTC
Nice bug!  The concerning thing with this bug is there is no workaround.

Escalating as a blocker for visibility.  If we can identify a workaround, I support resolving this in a future release, and adding a 1.0 release note.

Comment 2 Mike McCune 2012-04-18 17:45:52 UTC
FYI, you cannot delete the Red Hat Provider.  It is baked into every org in CFSE and is created at org creation time.

You will have to reset your database using:

/usr/share/katello/script/katello-reset-dbs

or restore from backup.

There is no way to remove the Red Hat Content Provider, even during normal operations.  Since it is hard to predict the state of your database after an outage like the one you experienced, it is hard to know exactly what it would take to correct the situation your DB is in.

In 1.1 we could look into how to handle this type of situation better, but there isn't a whole lot we can do for 1.0.*

Comment 3 Mike McCune 2012-05-09 15:36:37 UTC
For 1.1 we should look into better transaction management for long-running jobs, so that they can roll back and recover from situations like this, where there is a power outage or some massive breakage during execution.

Let's investigate how to clean up and recover broken data.
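
As a rough illustration of the idea only (this is not Katello code, and every name in it is hypothetical), a long-running job could write each step to an on-disk journal before running it. After a crash, a recovery pass then knows which steps may have been applied and can undo them in reverse order. The sketch assumes every undo action is safe to run even if its step never took effect.

```python
import json
import os
import tempfile


class JobJournal:
    """Hypothetical on-disk record of the steps a job has started."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as fh:
            return json.load(fh)

    def record(self, step_name):
        steps = self.load() + [step_name]
        # Write to a temp file, then rename, so a crash cannot leave
        # a half-written journal behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(steps, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self.path)

    def clear(self):
        if os.path.exists(self.path):
            os.remove(self.path)


def run_job(journal, steps):
    """Run (name, do, undo) steps, journaling each one before it starts."""
    for name, do, _undo in steps:
        journal.record(name)
        do()
    journal.clear()  # reached only if every step succeeded


def recover(journal, steps):
    """Undo every journaled step in reverse order; return the undone names."""
    undo_by_name = {name: undo for name, _do, undo in steps}
    started = journal.load()
    for name in reversed(started):
        undo_by_name[name]()
    journal.clear()
    return started


def simulated_outage():
    raise RuntimeError("simulated power outage")


# Simulated run: the second step dies part-way through.
state = set()
steps = [
    ("create_provider", lambda: state.add("provider"), lambda: state.discard("provider")),
    ("import_manifest", simulated_outage, lambda: state.discard("manifest")),
]
journal = JobJournal(os.path.join(tempfile.mkdtemp(), "job.json"))

try:
    run_job(journal, steps)
except RuntimeError:
    pass  # stands in for the box going down

undone = recover(journal, steps)  # what a startup check might do
```

In a real deployment the undo actions would have to reach into Candlepin and Pulp as well, which is where most of the difficulty lies; the sketch only shows the bookkeeping.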

Comment 4 Justin Clift 2012-05-09 16:49:17 UTC
If there's a way to reset (just) the RH Provider (without everything else), that might make for a decent workaround.  The admin would just need to do (in this case) the manifest import again.

Though, it kind of sounds like this would be future work, with Mike's mentioned "more transactional" approach being a more complete (likely better) goal. ;)

Comment 6 Lukas Zapletal 2013-02-05 09:48:47 UTC
There is a simple script/tool which compares the repositories that are in Katello with those in Pulp and prints what needs to be deleted to put Katello and Pulp back in sync. You can run it like this:

RAILS_ENV=production /usr/share/katello/script/katello-check

We can extend this script if the output is not helpful so GSS can take actions when this happens. Please note this tool is not documented and it is not intended to be used by users.

Ping me if it does not work or makes no sense to you, and I can investigate the box directly and extend this script to cover this special case.
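
For readers unfamiliar with this kind of consistency check, the comparison described above boils down to a set difference between the two systems' repository lists. The sketch below is illustrative only: it is not the katello-check implementation, and the function name and repository IDs are made up.

```python
def diff_repos(katello_repo_ids, pulp_repo_ids):
    """Report repository IDs that exist on only one side."""
    katello = set(katello_repo_ids)
    pulp = set(pulp_repo_ids)
    return {
        "only_in_katello": sorted(katello - pulp),
        "only_in_pulp": sorted(pulp - katello),
    }


# Hypothetical IDs: an interrupted import left one orphan on each side.
report = diff_repos(
    ["rhel-6-server", "rhel-6-optional", "half-imported"],
    ["rhel-6-server", "rhel-6-optional", "stale-pulp-repo"],
)
for side, repo_ids in report.items():
    print(side, repo_ids)
```

Entries that appear on only one side are the candidates for cleanup; deciding which side is authoritative still needs a human looking at the box.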

Comment 7 Lukas Zapletal 2013-02-11 15:29:55 UTC
I can't reproduce this. There is no general advice on how to recover: a manifest import can be interrupted in many different states, depending on exactly when the power failure hits. The result can be data inconsistency in Candlepin, in Pulp, or in both.

I really cannot investigate all the possibilities. We need to improve our orchestration code and totally change our approach to orchestration. If you encounter any data inconsistency, we need access to the box to investigate the particular case.

So the general advice in this case is: back up and recover.

Comment 9 Lukas Zapletal 2014-03-12 10:26:58 UTC
Together with org deletion, this is still relevant, but we are changing our orchestration layer and this should be re-evaluated after the migration is done.

Comment 10 Bryan Kearney 2014-03-12 22:56:37 UTC
Providers have been hidden. This is no longer relevant.