Bug 2132011

Summary: [RFE] improving rpm database collection and yum/rpm resiliency with sos
Product: Red Hat Enterprise Linux 8
Component: sos
Version: 8.6
Status: NEW
Severity: high
Priority: high
Hardware: All
OS: Unspecified
Reporter: jcastran
Assignee: Jan Jansky <jjansky>
QA Contact: Supportability QE <supportability-qe>
CC: agk, bmr, kwalker, plambri, pmoravec, prjagtap, pusharma, rdulhani, sbradley, theute
Target Milestone: rc
Keywords: FutureFeature
Type: Bug

Description jcastran 2022-10-04 13:12:09 UTC
Requesting some updates with sos.

Currently we have a plugin (rpm.rpmdb) which is off by default and only copies all of /var/lib/rpm into the sosreport.

My suggestions for improvement are:

1. When the rpm or yum plugin fails or times out, automatically grab the rpm database
1b. Decrease the timeout after which the rpm/yum plugins are considered failed

2. The rpmdb plugin should first attempt to tar the entire database, so that engineers receive it in a more usable format

    RHEL 7 and lower
    # tar cjf rpm_db-$(hostname).tar.bz2 /var/lib/{rpm,yum}

    RHEL 8 and higher
    # tar cJhf rpm_db-$(hostname).tar.xz /var/lib/{rpm,dnf} /etc/{dnf*,os-release}
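
The version split above could be expressed as a small helper that builds the tar argument list. This is only a sketch, not sos code: the function name `rpmdb_tar_command` and its parameters are hypothetical, and the shell brace expansion is spelled out by hand because no shell is involved.

```python
import glob
import socket


def rpmdb_tar_command(rhel_major, hostname=None):
    """Build the tar argument list for the given RHEL major version.

    Hypothetical helper; it mirrors the two commands quoted in this report.
    """
    host = hostname or socket.gethostname()
    if rhel_major <= 7:
        # RHEL 7 and lower: bzip2 archive of the rpm and yum state
        return ["tar", "cjf", f"rpm_db-{host}.tar.bz2",
                "/var/lib/rpm", "/var/lib/yum"]
    # RHEL 8 and higher: xz archive, following symlinks (h), plus dnf
    # configuration. /etc/dnf* is expanded here; unlike the shell, an
    # unmatched pattern is simply dropped.
    return (["tar", "cJhf", f"rpm_db-{host}.tar.xz",
             "/var/lib/rpm", "/var/lib/dnf"]
            + sorted(glob.glob("/etc/dnf*"))
            + ["/etc/os-release"])
```

The returned list can be passed to `subprocess.run` as is. How the major version is detected is left out on purpose.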


- - - - - - - - - 
If there is any concern over size, I would secondarily request that we generate the tar outside of the sosreport
Example:
 sosreport > detects rpm issues > generates /var/tmp/rpmdb.tar.xz

Or just have it inside the rpm database.
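
One way to reconcile the two placements is to decide by size. The sketch below assumes a hypothetical size limit and hypothetical function names; only the `/var/tmp` location and the `rpmdb.tar.xz` file name come from the example above.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def choose_archive_location(db_dir, report_dir, size_limit_bytes,
                            external_dir="/var/tmp"):
    """Keep the archive inside the report if the database is small enough,
    otherwise place it next to the report in external_dir.

    The limit is an assumption; this report does not propose a value.
    """
    if dir_size_bytes(db_dir) <= size_limit_bytes:
        return os.path.join(report_dir, "rpmdb.tar.xz")
    return os.path.join(external_dir, "rpmdb.tar.xz")
```

The check uses the uncompressed directory size, so it overestimates the final archive size.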

Comment 9 Pavel Moravec 2022-10-19 18:09:43 UTC
Indeed, the sizes of the /var/lib/rpm directory are quite big, "du -hs /var/lib/rpm" from various Fedoras or RHELs:
216M	/var/lib/rpm
182M	/var/lib/rpm
273M	/var/lib/rpm
277M	/var/lib/rpm
223M	/var/lib/rpm
41M	/var/lib/rpm

So I would also be against collecting that by default (I think this is the consensus here) and mildly against collecting it after a plugin timeout.

Comment 13 Pavel Moravec 2022-12-29 15:40:45 UTC
Also, given the big size of /var/lib/rpm, I tend to agree with Jake and not collect the directory content after a plugin timeout. So we can currently offer:

- decreasing the rpm plugin timeout to e.g. 1 minute. The commands are usually very quick and can exceed this timeout only in case of a lock, which prevents command execution within the default 5m plugin timeout as well, so the decrease can speed things up in case of locking problems
- adding a plugin option, disabled by default, to grab the whole directory content without a size limit
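
The two items above can be sketched together as a short timeout around the query command plus an opt-in archive step. This is a standalone illustration using only the Python standard library, not the sos plugin API; `collect_with_fallback` and its parameters are hypothetical names.

```python
import subprocess
import tarfile


def collect_with_fallback(cmd, timeout_s, db_dir, archive_path,
                          fallback_enabled=False):
    """Run cmd with a short timeout; optionally archive db_dir on failure.

    Returns ("output", stdout), ("archive", archive_path) or ("failed", None).
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s, check=True)
        return ("output", result.stdout)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        # A lock on the database typically shows up as a timeout here.
        if not fallback_enabled:
            return ("failed", None)
    # Opt-in fallback: copy the whole directory, with no size limit.
    with tarfile.open(archive_path, "w:xz", dereference=True) as tar:
        tar.add(db_dir, arcname="rpm")
    return ("archive", archive_path)
```

For instance, `collect_with_fallback(["rpm", "-qa"], 60, "/var/lib/rpm", "/var/tmp/rpmdb.tar.xz")` would give up after one minute and collect nothing extra unless `fallback_enabled=True` is passed.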

Or am I missing some idea or option?

Comment 15 Pavel Moravec 2023-07-26 14:36:22 UTC
(In reply to Pavel Moravec from comment #13)
Hello,
please let us know your preferences (or suggest another idea).

Comment 16 Kyle Walker 2023-08-02 16:09:01 UTC
The reality today is that if a sosreport doesn't have an operable RPM database, the Support Delivery teams ask for the /var/lib/rpm directory.

The raised concern around the size of the directory is valid, and indicates to me that we should not pursue just a blanket gathering of the contents. That being said, the request here is to get the teams the data they need on the initial capture when that invalid state is encountered. I don't see how having a timeout-bound fallback behaviour of gathering the /var/lib/rpm directory is anything less than perfect for this purpose.