Bug 1683877 - sosreport timed out with collecting /var/lib/neutron with more than 800000 files in openstack_neutron plugin
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sos
Version: 7.6
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Pavel Moravec
QA Contact: Miroslav Hradílek
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-28 02:47 UTC by Masahiro Matsuya
Modified: 2023-09-07 19:46 UTC (History)

Fixed In Version: sos-3.7-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-06 13:15:47 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | sosreport sos pull 1598 | 0 | 'None' | closed | [openstack_neutron] skip collecting /var/lib/neutron/lock | 2020-11-27 09:05:30 UTC
Red Hat Product Errata | RHEA-2019:2295 | 0 | None | None | None | 2019-08-06 13:16:10 UTC

Comment 2 Pavel Moravec 2019-03-01 11:46:20 UTC
sosreport should be more robust, e.g. by having a (configurable) limit on how many files a single self.add_copy_spec call or a single plugin may collect, to prevent collecting files that just take time and space but that nobody will review (until some Insights rule requires all of them?). But finding a proper value for such a limit could be tricky for many reasons.

Also, increasing the timeout in general might not be a good solution: if the plugin gets stuck somewhere, it would take much longer to detect it.

Out of the 800k files, are all of them required to be collected? Can't we e.g. skip the *sock files altogether (and do they form the majority of the files there)? If so, a simple self.add_forbidden_path("/var/lib/neutron/*sock") would be a solution.

Or is it sufficient to collect just some file patterns from the dir? Could we enumerate them and collect those instead of the whole /var/lib/neutron?

Or a more general question: which of these 800k files are worth collecting: all of them? Just some file patterns? Or everything except some file patterns? (So that in the end we collect a reasonably small but sufficient number of files.)
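
To illustrate the "skip some file patterns" option from comment 2, here is a minimal standalone sketch of glob-based exclusion. It does not use sos and is not how sos matches forbidden paths internally; the helper name drop_forbidden and the sample paths in the usage note are hypothetical.

```python
import fnmatch


def drop_forbidden(paths, forbidden_patterns):
    """Return the paths that match none of the forbidden glob patterns.

    Note: in fnmatch, "*" also matches "/" characters, so a pattern like
    "/var/lib/neutron/*sock" excludes matching names at any depth.
    """
    return [
        path for path in paths
        if not any(fnmatch.fnmatchcase(path, pattern)
                   for pattern in forbidden_patterns)
    ]
```

For example, drop_forbidden(["/var/lib/neutron/foo.sock", "/var/lib/neutron/dhcp/net1/leases"], ["/var/lib/neutron/*sock"]) keeps only the leases path.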

Comment 4 Alex Stupnikov 2019-03-16 15:32:19 UTC
Hello Pavel.

The /var/lib/neutron folder contains important information about the SDN infrastructure: for example, dnsmasq/keepalived configuration files and temporary data are stored there. We need this information in sosreports to troubleshoot DHCP assignments and HA issues, so we can't exclude this directory from sosreports. But it is also true that we don't need its full contents.

I would like to propose a simple logic: create a single file that lists all empty files in the /var/lib/neutron dir (lock, sock, etc.), with an appropriate comment at its head that clarifies its contents.

BR, Alex.

Comment 5 Pavel Moravec 2019-03-18 07:34:38 UTC
(In reply to Alex Stupnikov from comment #4)
> Hello Pavel.
> 
> /var/lib/neutron folder contains important information about SDN
> infrastructure: for example, dnsmasq/keepalived configuration files and
> temporary data are stored there. We need such information in sosreports to
> troubleshoot DHCP assignments and HA issues and can't exclude this directory
> from sosreports. But it is also true that we don't need its full contents.

Could you identify what content is needed, then? I.e. which file patterns cover all the needed data, or which file patterns we should blacklist from collection because they are not required and can cause the number of files in the dir to balloon?

> 
> I would like to propose simple logic: create a single file that will list
> all empty files in /var/lib/neutron dir (lock, sock, etc) with appropriate
> comment at its head that will clarify its contents.

You mean sosreport should:
- first collect the output of "ls -lRa /var/lib/neutron" (or "find /var/lib/neutron", or some other cmd that also determines the file type?) - or collect *exactly* the list of empty files? (What would be the purpose, then? sos will time out, and you will still miss even the filenames of interesting files in the dir.)
- and then still copy the (whole?) content of the directory, so that if this times out due to an excessive number of files, we/you will have at least a listing of them

I see this as a sort of workaround:
- you will still have to wait for the plugin timeout to happen (a needlessly slow sosreport run)
- sosreport might skip collecting some important files from that dir
- optionally, sosreport might skip collecting some other important files that are requested elsewhere in the plugin


So as a better fix, I would suggest identifying what file patterns must be collected or what file patterns to exclude from collection.

Comment 6 Alex Stupnikov 2019-03-18 12:55:12 UTC
Hello Pavel.

First, I would like to give my opinion about the contents of the /var/lib/neutron folder, and I will set NEEDINFO flags for my network-related peers in other locations.

- /var/lib/neutron/lock folder. We don't need these files; a simple list will be enough.
- /var/lib/neutron/dhcp folder. We need the full contents to troubleshoot DHCP agents.
- /var/lib/neutron/external/pids and /var/lib/neutron/ns-metadata-proxy/ folders. It is better to have these folders to check ns-metadata-proxy settings.
- /var/lib/neutron/external/ha_confs folder. We need all subfolders and pid files from this folder to troubleshoot keepalived.

It looks like we need most of the files anyway and can't get significant benefits from dropping some of them.

I like your workaround: most of our customers don't have a huge number of neutron entities and will not be affected by this bug anyway, so we have to give special treatment to those who do. It could even make sense to check the number of files in the /var/lib/neutron folder and exclude it from collection if that number is too high.

BR, Alex S.
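
The file-count check suggested at the end of comment 6 could be sketched roughly as below. This is only an illustration in plain Python, not an existing sos feature; the function name exceeds_file_limit and any concrete limit value are assumptions.

```python
import os


def exceeds_file_limit(root, limit):
    """Return True as soon as more than `limit` files are seen under root.

    The walk stops early once the limit is passed, so the remaining
    directories are not visited. A single directory is still listed in
    full before its files are counted.
    """
    seen = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        seen += len(filenames)
        if seen > limit:
            return True
    return False
```

A plugin-like caller could then skip copying the directory (and collect only a listing) when exceeds_file_limit("/var/lib/neutron", some_limit) is true, where some_limit would have to be chosen per deployment.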

Comment 7 Chris Fields 2019-03-18 16:36:15 UTC
In comment #3 we said that "/var/lib/neutron/*sock" does not account for the majority of the 800K files. But we did not say what does, as far as I can tell. Masahiro, do you have the answer to that question: where are the majority of these files coming from? A potential solution could be to avoid the problem directory or file pattern in all sosreports, OR to avoid it only when the file count is too high, OR maybe under some other condition.

Comment 8 Masahiro Matsuya 2019-03-19 07:11:33 UTC
Hello,

The following are the counts of all files under each major directory in /var/lib/neutron.

876324 /var/lib/neutron
--------------------------------
   978 /var/lib/neutron/dhcp
   164 /var/lib/neutron/external
   874 /var/lib/neutron/ha_confs
    69 /var/lib/neutron/lbaas
874229 /var/lib/neutron/lock
     8 /var/lib/neutron/ns-metadata-proxy
---------------------------------

/var/lib/neutron itself has two socket files.
 srwxr-xr-x.      1 neutron neutron        0 Feb 12 19:18 keepalived-state-change
 srw-r--r--.      1 neutron neutron        0 Feb 12 19:17 metadata_proxy


  978 + 164 + 874 + 69 + 874229 + 8 + 2 = 876324

The majority of the files are in /var/lib/neutron/lock.
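
How the counts above were produced is not stated in this bug. As one possible way to get a comparable per-directory breakdown, here is a small standalone sketch; the function name count_files_by_subdir is hypothetical, and it counts non-directory entries only, which may differ slightly from other counting methods.

```python
import os


def count_files_by_subdir(root):
    """Map each immediate subdirectory of root to its recursive file count.

    Non-directory entries located directly under root (e.g. socket files)
    are counted under the key `root` itself.
    """
    counts = {root: 0}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                counts[entry.path] = sum(
                    len(filenames)
                    for _dirpath, _dirnames, filenames in os.walk(entry.path)
                )
            else:
                counts[root] += 1
    return counts
```

Summing the values of the returned dict gives the total for the whole tree, analogous to the 876324 figure above.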

Comment 9 Chris Fields 2019-03-19 16:24:11 UTC
Thanks Masahiro.  It looks to me like the simplest solution, based on Alex's suggestion in comment #6, is to give a list of files in /var/lib/neutron/lock folder without collecting them.  Can we do that?  

thanks

Chris Fields
GSS

Comment 10 Pavel Moravec 2019-03-19 17:30:46 UTC
(In reply to Chris Fields from comment #9)
> Thanks Masahiro.  It looks to me like the simplest solution, based on Alex's
> suggestion in comment #6, is to give a list of files in
> /var/lib/neutron/lock folder without collecting them.  Can we do that?  
> 
> thanks
> 
> Chris Fields
> GSS

It is possible e.g. via patch like:

diff --git a/sos/plugins/openstack_neutron.py b/sos/plugins/openstack_neutron.py
index 9ae741f3..d9f444c5 100644
--- a/sos/plugins/openstack_neutron.py
+++ b/sos/plugins/openstack_neutron.py
@@ -41,7 +41,12 @@ class OpenStackNeutron(Plugin):
             self.var_puppet_gen + "/etc/default/neutron-server",
             self.var_puppet_gen + "/etc/my.cnf.d/tripleo.cnf"
         ])
+        # copy whole /var/lib/neutron except for potentially huge lock subdir;
+        # rather take list of files in the dir
         self.add_copy_spec("/var/lib/neutron/")
+        self.add_forbidden_path("/var/lib/neutron/lock")
+        self.add_cmd_output("ls -laZR /var/lib/neutron/lock")
+
         if self.get_option("verify"):
             self.add_cmd_output("rpm -V %s" % ' '.join(self.packages))
 

(let me know if a different listing than "ls -laZR /var/lib/neutron/lock" is required; this one may be too verbose).

Note that, as a rule of thumb, that command run on a directory with 800k files will take approx. 25 seconds, if that is acceptable for you.

If there are no objections here for some time, I will propose the patch upstream after the 3.7 madness (in a week or two).
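
The intent of the patch in comment 10 (copy the tree, but only list the lock subdirectory) can be mimicked outside sos with the sketch below. It is a standalone approximation and not the sos implementation of add_copy_spec or add_forbidden_path; the function name plan_collection is hypothetical.

```python
import os


def plan_collection(root, forbidden):
    """Split the files under root into (to_copy, listed_only).

    Files inside the `forbidden` subtree are only listed by name;
    everything else is marked for copying.
    """
    forbidden = os.path.normpath(forbidden)
    to_copy, listed_only = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        here = os.path.normpath(dirpath)
        inside = here == forbidden or here.startswith(forbidden + os.sep)
        target = listed_only if inside else to_copy
        target.extend(os.path.join(dirpath, name) for name in filenames)
    return sorted(to_copy), sorted(listed_only)
```

Calling plan_collection("/var/lib/neutron/", "/var/lib/neutron/lock") would return the files to copy and, separately, the names under lock that would only appear in a listing.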

Comment 11 Chris Fields 2019-03-19 21:18:45 UTC
Thanks Pavel, sounds good to me.

Comment 12 Pavel Moravec 2019-03-21 13:08:48 UTC
POSTed upstream, just in time to still make it into 7.7 :)

Comment 16 errata-xmlrpc 2019-08-06 13:15:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2295

