Bug 2132011 - [RFE] improving rpm database collection and yum/rpm resiliency with sos
Summary: [RFE] improving rpm database collection and yum/rpm resiliency with sos
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sos
Version: 8.6
Hardware: All
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Assignee: Jan Jansky
QA Contact: Supportability QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-10-04 13:12 UTC by jcastran
Modified: 2023-08-02 16:09 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:




Links
System                  ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   RHELPLAN-136281  0        None      None    None     2022-10-12 20:43:24 UTC

Description jcastran 2022-10-04 13:12:09 UTC
Requesting some improvements to sos.

Currently the rpm plugin has an option (rpm.rpmdb), off by default, that only copies all of /var/lib/rpm into the sosreport.

My suggestions for improvement are:

1. When the rpm/yum plugin fails or times out, automatically grab the RPM database
1b. Decrease the timeout after which the rpm/yum plugins are treated as failed

2. The rpmdb option should first attempt to tar the entire database, which gives engineers a more usable format:

    RHEL 7 and lower
    # tar cjf rpm_db-$(hostname).tar.bz2 /var/lib/{rpm,yum}

    RHEL 8 and higher
    # tar cJhf rpm_db-$(hostname).tar.xz /var/lib/{rpm,dnf} /etc/{dnf*,os-release}
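The two recipes above could be wrapped in one helper that picks the archive by RHEL major version. This is a minimal sketch, not part of sos: the function names `major_of`, `current_major` and `rpmdb_archive`, the `DRY_RUN` switch, and reading `VERSION_ID` from /etc/os-release are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of sos) wrapping the two tar recipes above.

# Keep only the major version from a VERSION_ID such as "8.6" or "7".
major_of() {
    echo "${1%%.*}"
}

# Assumption: the running release is taken from VERSION_ID in /etc/os-release.
current_major() {
    major_of "$(. /etc/os-release && echo "$VERSION_ID")"
}

# Build the rpm database archive for the given major version.
# Optional second argument overrides the hostname used in the file name.
# Set DRY_RUN=1 to print the tar command instead of running it.
rpmdb_archive() {
    local major=$1
    local host=${2:-$(hostname)}
    if [ "$major" -le 7 ]; then
        # RHEL 7 and lower: bzip2, rpm and yum state
        ${DRY_RUN:+echo} tar cjf "rpm_db-${host}.tar.bz2" /var/lib/{rpm,yum}
    else
        # RHEL 8 and higher: xz, follow symlinks (-h), rpm/dnf state plus dnf config
        ${DRY_RUN:+echo} tar cJhf "rpm_db-${host}.tar.xz" /var/lib/{rpm,dnf} /etc/{dnf*,os-release}
    fi
}
```

Usage: `rpmdb_archive "$(current_major)"` writes the archive into the current directory; `DRY_RUN=1 rpmdb_archive 7` only shows which command would run.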


- - - - - - - - - 
If there is any concern over size, my secondary request is to generate the tar outside of the sosreport.
Example:
 sosreport -> detects rpm issues -> generates /var/tmp/rpmdb.tar.xz

Or just have it inside the rpm database.
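The "detect rpm issues, then archive outside the report" flow could look roughly like the following. It is a hypothetical sketch, not sos code: it assumes RHEL 8 or later (xz compression, dnf paths), assumes that a bounded `rpm -qa` is an adequate health probe, and uses an illustrative 60-second limit; the names `probe_label`, `collect_rpmdb_if_broken`, `OUT` and `PROBE_TIMEOUT` are invented here.

```shell
#!/bin/bash
# Hypothetical fallback (not part of sos): probe the RPM database with a bounded
# query and archive it outside the sos tarball only when the probe fails.
# Assumptions: RHEL 8 or later (xz, dnf paths) and a 60-second probe limit.

OUT=${OUT:-/var/tmp/rpmdb.tar.xz}      # path suggested in the description
PROBE_TIMEOUT=${PROBE_TIMEOUT:-60}    # seconds; illustrative value

# Map an exit status to a label; timeout(1) exits with 124 when the limit is hit.
probe_label() {
    case "$1" in
        0)   echo ok ;;
        124) echo timeout ;;
        *)   echo error ;;
    esac
}

collect_rpmdb_if_broken() {
    local rc label
    timeout "$PROBE_TIMEOUT" rpm -qa > /dev/null 2>&1
    rc=$?                              # capture before any other command runs
    label=$(probe_label "$rc")
    if [ "$label" = ok ]; then
        return 0                       # healthy database: nothing to collect
    fi
    echo "rpm -qa ended with status $rc ($label); writing $OUT" >&2
    tar cJhf "$OUT" /var/lib/{rpm,dnf} /etc/{dnf*,os-release}
}
```

Source the file and call `collect_rpmdb_if_broken`. A responsive database returns immediately, so the size cost is only paid when the query hangs on a lock or exits with an error.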

Comment 9 Pavel Moravec 2022-10-19 18:09:43 UTC
Indeed, the /var/lib/rpm directory is quite big; output of "du -hs /var/lib/rpm" from various Fedora and RHEL systems:
216M	/var/lib/rpm
182M	/var/lib/rpm
273M	/var/lib/rpm
277M	/var/lib/rpm
223M	/var/lib/rpm
41M	/var/lib/rpm

So I would also be against collecting it by default (I think this is the consensus here), and mildly against collecting it after a plugin timeout.

Comment 13 Pavel Moravec 2022-12-29 15:40:45 UTC
Also, given the large size of /var/lib/rpm, I tend to agree with Jake that we should not collect the directory content after a plugin timeout. So we can currently offer:

- decrease the rpm plugin timeout to e.g. 1 minute: the commands are usually very quick and can exceed this timeout only when the database is locked, and a lock keeps them from completing within the default 5-minute plugin timeout as well, so the shorter timeout speeds things up in case of locking problems
- add a plugin option, disabled by default, to grab the whole directory content without a size limit

Or am I missing some idea or option?
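Both options can be approximated from the sos report command line today, which may help when trying the proposal before changing defaults. A sketch, assuming sos 4.x, where every plugin accepts a `timeout` option (`-k <plugin>.timeout=<seconds>`) that overrides the default 5-minute plugin timeout, and where `rpm.rpmdb` is the existing option from the description. The wrapper name `sos_rpm_only` and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: run only the rpm plugin with a shorter plugin timeout,
# optionally enabling the existing rpm.rpmdb option (off by default).
# Assumes sos 4.x, where every plugin accepts -k <plugin>.timeout=<seconds>.
# Set DRY_RUN=1 to print the command instead of running it.
sos_rpm_only() {
    local timeout=${1:-60}     # seconds; 60 is the value proposed above
    local want_rpmdb=${2:-no}  # "yes" also copies /var/lib/rpm
    local args=(report --batch -o rpm -k "rpm.timeout=${timeout}")
    if [ "$want_rpmdb" = yes ]; then
        args+=(-k rpm.rpmdb=on)
    fi
    ${DRY_RUN:+echo} sos "${args[@]}"
}
```

For example, `sos_rpm_only 60 yes` runs the rpm plugin alone with a 1-minute limit and the full directory copy enabled; building the arguments in an array keeps each `-k` value intact as one word.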

Comment 15 Pavel Moravec 2023-07-26 14:36:22 UTC
(In reply to Pavel Moravec from comment #13)

Hello,
please let us know your preferences (or suggest another idea).

Comment 16 Kyle Walker 2023-08-02 16:09:01 UTC
The reality today is that if a sosreport doesn't have an operable RPM database, the Support Delivery teams ask for the /var/lib/rpm directory.

The concern raised about the size of the directory is valid, and indicates to me that we should not pursue a blanket gathering of its contents. That being said, the request here is to get the teams the data they need in the initial capture when that invalid state is encountered. I don't see how a timeout-bound fallback that gathers the /var/lib/rpm directory is anything less than perfect for this purpose.

