Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1244513

Summary: rpm rebuild --what to do to recover from database crash
Product: [Retired] Fedora Documentation
Reporter: Leslie Satenstein <lsatenstein>
Component: system-administrator's-guide
Assignee: Stephen Wadeley <swadeley>
Status: CLOSED UPSTREAM
QA Contact: Fedora Docs QA <docs-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: devel
CC: jsilhan, lsatenstein, me, mluscon, packaging-team-maint, pnemade, rholy, swadeley, tim.lauridsen, vmukhame
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-09-28 13:40:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Leslie Satenstein 2015-07-19 14:05:59 UTC
Description of problem:

Partway through a dnf update, the application emitted an error message along with instructions to perform a dnf rebuild.

dnf rebuild is not in the man pages or in the Fedora documentation that I looked at.



Version-Release number of selected component (if applicable):


How reproducible:

Random, or when dnf encounters a database inconsistency

Steps to Reproduce:
1.
2.
3.

Actual results:

No information within the Fedora documentation. Found information on the web.

Expected results:

dnf rebuild should be part of the internal dnf options and should also be included in the man/info pages. 

Additional info:

Found this information on the web...

http://www.cyberciti.biz/tips/rebuilding-corrupted-rpm-database.html


rm -f /var/lib/rpm/__db*

# cd /var/lib/rpm
# rm __db*

Rebuild RPM database:
# rpm --rebuilddb
# rpmdb_verify Packages
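
The recovery steps quoted above can be sketched as two small shell functions that back up the database directory before anything is deleted. This is only a sketch under stated assumptions: the function names and the backup location are hypothetical, and the `__db*` files exist only where the RPM database uses the Berkeley DB backend under /var/lib/rpm.

```shell
#!/bin/sh
# Sketch only: function names and the backup directory are hypothetical.

# Copy the whole database directory into a timestamped tarball and
# print the path of the archive that was written.
backup_rpmdb() {
    db_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    archive="$backup_dir/rpmdb-$(date +%Y%m%d%H%M%S).tar.gz"
    tar -czf "$archive" -C "$(dirname "$db_dir")" "$(basename "$db_dir")" || return 1
    printf '%s\n' "$archive"
}

# Remove only the __db* environment files; Packages and the other
# database files in the directory are left untouched.
clear_stale_db_files() {
    db_dir=$1
    rm -f "$db_dir"/__db*
}

# Possible use as root (left commented out so sourcing this file is harmless;
# /root/rpmdb-backup is an assumed location):
#   backup_rpmdb /var/lib/rpm /root/rpmdb-backup \
#       && clear_stale_db_files /var/lib/rpm \
#       && rpm --rebuilddb
```

Keeping the backup step first means the original database can be restored if the rebuild makes things worse.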

Comment 1 Stephen Wadeley 2015-07-20 06:56:34 UTC
Hello Leslie

Thank you for raising this bug.

Comment 2 Stephen Wadeley 2015-07-20 07:08:47 UTC
Hello Jan Silhan and colleagues


Can you please give me some advice on how to deal with the error mentioned in comment 0?

Thank you

Comment 3 Pete Travis 2015-08-28 23:04:40 UTC
I think this is referring to the message "Delta RPM rebuild failed" - a condition which dnf already handles, and there is already an open bug for better behavior in that case.  The actual option `dnf rebuild` does not exist and there is nothing in the upstream dnf code suggesting it.  

That said, there may be a situation where dnf can detect a corrupt rpmdb and offer to rebuild it.  I'm reassigning this as an RFE to dnf.  If Jan bites, we'll update the documentation accordingly.

Comment 4 Honza Silhan 2015-09-07 17:23:23 UTC
We actually print this log message:
"...To diagnose the problem, try running: 'rpm -Va --nofiles --nodigest'.
You probably have corrupted RPMDB, running 'rpm --rebuilddb' might fix the issue."

Are we on the same page, talking about the transaction check error after transaction confirmation and not "Delta RPM rebuild failed"?

Your suggestions to improve this message are welcome.

It's not a good idea for DNF itself to rebuild the rpmdb automatically. It could cause more harm than good. The problems are usually more complicated and need to be resolved by a human. I'd prefer to document how to deal with a corrupted database in examples. Once we identify the most common database issues, maybe a community plugin will be made to detect and fix some of the cases, but it's really problematic.

Stephen, try to execute the commands above.
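
The diagnostic step from the log message can be wrapped in a small check that runs before anyone decides on a rebuild, in line with leaving that decision to a human. This is a minimal sketch: the function name and the `RPM` override variable are hypothetical (the override only exists so the check can be exercised without a real database), and it assumes a nonzero exit status from the verify run signals a problem worth inspecting.

```shell
#!/bin/sh
# Sketch only: check_rpmdb and the RPM override are hypothetical names.

# Run the verification suggested by the log message and report the outcome.
# Nothing is rebuilt here; the function only tells the operator what it saw.
check_rpmdb() {
    if "${RPM:-rpm}" -Va --nofiles --nodigest; then
        echo "rpmdb check passed"
    else
        echo "rpmdb check reported problems; review the output before running 'rpm --rebuilddb'"
        return 1
    fi
}
```

Calling `check_rpmdb` as root prints one summary line after whatever `rpm` itself reports.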

Comment 5 Leslie Satenstein 2015-09-08 01:30:03 UTC
Hi All,

In all documentation that I have ever written or used, there was always an appendix with the error numbers and the affiliated error message.

I think that it would be great to have dnf messages identified as follows:

dnf0000I   Informational message 0000I blah blah blah..
dnf0000E   Error message 0000E this is the error and this is what
           corrective action you must take.
dnf0000W   A warning message that the developer should review. The application
           may have taken some recovery action on its own. The xxx could
           refer to a different application, for example one where a space
           consumption threshold has been reached.
log0000W   log capacity at 85% used, 15% remaining. Take corrective action.

The I's indicate no user action. The W's are warnings to take a pending action, and the E's refer to errors requiring immediate action. Return codes could be categorized as 0 for no errors (or I-type messages), 1 to 10 for warnings, and 100 plus for errors.
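
The return-code ranges proposed here could be checked by a caller roughly as follows. This is only an illustration of the suggestion; dnf does not use these ranges, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: classify_status is a hypothetical helper for the proposed ranges.

# Map an exit status to the proposed category letter:
# 0 -> I, 1..10 -> W, 100 and above -> E.
classify_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "I"
    elif [ "$status" -ge 1 ] && [ "$status" -le 10 ]; then
        echo "W"
    elif [ "$status" -ge 100 ]; then
        echo "E"
    else
        # 11..99 is not assigned by the proposal.
        echo "unassigned"
    fi
}
```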

Think about implementing this type of documentation improvement for Fedora 26 or 27.

Comment 6 Stephen Wadeley 2015-09-14 08:22:03 UTC
Hello Leslie

Re. comment 4

Please try those commands.

Re: comment 5

I think it is better if programs print out error messages rather than give you a code you have to look up.

A table of codes with corresponding error messages would be classified as reference material. Those with more experience than I say we should avoid reference material where possible in the guides and leave that to man pages. Guides should concentrate on tasks. There are exceptions, for example tables showing old commands versus new commands, or information you need in planning a task. That sort of reference material can add value.

Comment 7 Leslie Satenstein 2015-09-28 13:40:24 UTC
I wish to close this bug.

The interesting thing about winning software and technical systems is that good systems always have a « code » followed by an accompanying brief error message. For example:

dnfnnnnA  run dnf rebuild rpm --rebuilddb followed by rpmdb_verify Packages

If the support person needed more information, they could then look up dnfnnnnA in the man pages or another guide. If the end user spoke Romanian or another language for which there is no translated version, the code would help support staff understand the why and what to do. This code idea helps technical support, as the user could quote the « code ».


So, as a suggestion, use the letters I, A, W, E:
dnfnnnnI is followed by an informative message, with retcode 0.
dnfnnnnA is followed by the requirements for user action, with retcode 0 if the action is taken; otherwise exit with retcode 4.

dnfnnnnW is a warning message, which may be ignored or reviewed for further action. Retcode 4.

dnfnnnnE The system experienced an error with "blah blah blah". Corrective action required. Retcode 16.
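
The letter-to-retcode mapping suggested above could be expressed as a lookup like the one below. It is a sketch of the proposal only: no such message IDs exist in dnf, the function name is hypothetical, and the ID pattern is deliberately loose (prefix `dnf`, a trailing digit, then one of I, A, W, E).

```shell
#!/bin/sh
# Sketch only: retcode_for_message is a hypothetical helper for the proposal.

# Print the proposed return code for a message ID.
# The optional second argument says whether the requested user action
# was taken ("yes"); it only matters for A-type messages.
retcode_for_message() {
    id=$1
    action_taken=${2:-no}
    case $id in
        dnf*[0-9]I) echo 0 ;;
        dnf*[0-9]A)
            if [ "$action_taken" = yes ]; then echo 0; else echo 4; fi ;;
        dnf*[0-9]W) echo 4 ;;
        dnf*[0-9]E) echo 16 ;;
        *)
            echo "unknown message ID: $id" >&2
            return 2 ;;
    esac
}
```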

As Linux grows in the number of platforms and systems (Docker, etc.), it needs some structure for error messages.