Bug 1683877
Summary: | sosreport timed out with collecting /var/lib/neutron with more than 800000 files in openstack_neutron plugin | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Masahiro Matsuya <mmatsuya> |
Component: | sos | Assignee: | Pavel Moravec <pmoravec> |
Status: | CLOSED ERRATA | QA Contact: | Miroslav Hradílek <mhradile> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 7.6 | CC: | agk, akaris, astupnik, bmr, cfields, dhill, gavin, jraju, mmatsuya, plambri, pmoravec, sbradley |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | sos-3.7-1.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-08-06 13:15:47 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Comment 2
Pavel Moravec
2019-03-01 11:46:20 UTC
Hello Pavel.

/var/lib/neutron folder contains important information about SDN infrastructure: for example, dnsmasq/keepalived configuration files and temporary data are stored there. We need such information in sosreports to troubleshoot DHCP assignments and HA issues and can't exclude this directory from sosreports. But it is also true that we don't need its full contents.

I would like to propose simple logic: create a single file that will list all empty files in /var/lib/neutron dir (lock, sock, etc) with an appropriate comment at its head that will clarify its contents.

BR, Alex.

(In reply to Alex Stupnikov from comment #4)
> Hello Pavel.
>
> /var/lib/neutron folder contains important information about SDN
> infrastructure: for example, dnsmasq/keepalived configuration files and
> temporary data are stored there. We need such information in sosreports to
> troubleshoot DHCP assignments and HA issues and can't exclude this directory
> from sosreports. But it is also true that we don't need its full contents.

Could you identify what content is needed, then? I.e. what file patterns cover all needed data, or what file patterns should we blacklist in file collection because they are not required and they cause the huge number of files in the dir?

>
> I would like to propose simple logic: create a single file that will list
> all empty files in /var/lib/neutron dir (lock, sock, etc) with an appropriate
> comment at its head that will clarify its contents.

You mean sosreport should:
- first collect output of "ls -lRa /var/lib/neutron" (or "find /var/lib/neutron" or some other cmd that also determines the file type?)
- or *exactly* collect the list of empty files? (what would be the purpose, then? sos will time out, and you will still miss even the filenames of interesting files in the dir)
- even then copy (still the whole?) content of the directory - such that if this times out due to an excessive number of files, we/you will have at least a listing of them

I see this as a sort of workaround:
- you will still have to wait for the plugin timeout to happen (redundantly slow sosreport run)
- sosreport might skip collecting some important files from that dir
- optionally, sosreport might skip collecting some other important files, requested elsewhere in the plugin

So as a better fix, I would suggest identifying what file patterns must be collected or what file patterns to exclude from collection.

Hello Pavel.

First, I would like to provide my opinion about the contents of /var/lib/neutron folder, and I will set NEEDINFO flags for my network-related peers in other locations.

- /var/lib/neutron/lock folder. We don't need these files, a simple list will be enough.
- /var/lib/neutron/dhcp folder. We need full contents to troubleshoot DHCP agents.
- /var/lib/neutron/external/pids and /var/lib/neutron/ns-metadata-proxy/ folders. It is better to have these folders to check ns-metadata-proxy settings.
- /var/lib/neutron/external/ha_confs folder. We need all subfolders and pid files from this folder to troubleshoot keepalived.

It looks like we need most of the files anyway and can't get significant benefits from dropping some of them. I like your workaround: most of our customers don't have a huge number of neutron entities and will not be affected by this bug anyway, so we have to give special treatment to those who do. It could even make sense to check the number of files in /var/lib/neutron folder and exclude it from collection if that number is too high.

BR, Alex S.

In comment #3 we said that "/var/lib/neutron/*sock" does not have the majority of the 800K files. But we did not say what does, as far as I can tell. Masahiro, do you have the answer to that question - where are the majority of these files coming from?
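A minimal Python sketch for locating where the files accumulate: it counts the entries that `os.walk` reports as files, grouped by the first path component under the root, and counts files sitting directly in the root (such as sockets) under the root itself. The helper name `count_files_by_subdir` is hypothetical and not part of sos, and this is only an illustration, not how the counts in this bug were actually produced.

```python
import os
from collections import Counter


def count_files_by_subdir(root):
    """Count file entries under `root`, grouped by top-level subdirectory.

    Hypothetical helper (not part of sos). Files directly in `root`,
    e.g. sockets, are counted under `root` itself.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel == os.curdir:
            key = root
        else:
            # First path component below `root`, e.g. "lock" for lock/a/b
            key = os.path.join(root, rel.split(os.sep)[0])
        counts[key] += len(filenames)
    return counts


if __name__ == "__main__":
    # Assumed target path; prints the total, then the largest buckets first.
    tree = "/var/lib/neutron"
    counts = count_files_by_subdir(tree)
    print(sum(counts.values()), tree)
    for path, n in counts.most_common():
        print(n, path)
```

Sorting with `most_common()` puts the offending subdirectory at the top, which is usually all that is needed to decide what to exclude.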
A potential solution could be to avoid the problem directory or file pattern in all sosreports, OR avoid it only when the file count is too high, OR maybe some other condition.

Hello,

The following are the counts of all files under each major directory in /var/lib/neutron.

876324 /var/lib/neutron
--------------------------------
978 /var/lib/neutron/dhcp
164 /var/lib/neutron/external
874 /var/lib/neutron/ha_confs
69 /var/lib/neutron/lbaas
874229 /var/lib/neutron/lock
8 /var/lib/neutron/ns-metadata-proxy
--------------------------------

/var/lib/neutron itself has two socket files.

srwxr-xr-x. 1 neutron neutron 0 Feb 12 19:18 keepalived-state-change
srw-r--r--. 1 neutron neutron 0 Feb 12 19:17 metadata_proxy

978 + 164 + 874 + 69 + 874229 + 8 + 2 = 876324

The majority of the files are in /var/lib/neutron/lock.

Thanks Masahiro. It looks to me like the simplest solution, based on Alex's suggestion in comment #6, is to give a list of files in /var/lib/neutron/lock folder without collecting them. Can we do that?

thanks

Chris Fields
GSS

(In reply to Chris Fields from comment #9)
> Thanks Masahiro. It looks to me like the simplest solution, based on Alex's
> suggestion in comment #6, is to give a list of files in
> /var/lib/neutron/lock folder without collecting them. Can we do that?
>
> thanks
>
> Chris Fields
> GSS

It is possible, e.g. via a patch like:

diff --git a/sos/plugins/openstack_neutron.py b/sos/plugins/openstack_neutron.py
index 9ae741f3..d9f444c5 100644
--- a/sos/plugins/openstack_neutron.py
+++ b/sos/plugins/openstack_neutron.py
@@ -41,7 +41,12 @@ class OpenStackNeutron(Plugin):
             self.var_puppet_gen + "/etc/default/neutron-server",
             self.var_puppet_gen + "/etc/my.cnf.d/tripleo.cnf"
         ])
+        # copy whole /var/lib/neutron except for potentially huge lock subdir;
+        # rather take list of files in the dir
         self.add_copy_spec("/var/lib/neutron/")
+        self.add_forbidden_path("/var/lib/neutron/lock")
+        self.add_cmd_output("ls -laZR /var/lib/neutron/lock")
+
         if self.get_option("verify"):
             self.add_cmd_output("rpm -V %s" % ' '.join(self.packages))

(let me know if a different listing than "ls -laZR /var/lib/neutron/lock" is required; this one may be too verbose).

Note that, as a rule of thumb, running that command on a directory with 800k files will take approx. 25 seconds, if that is acceptable for you.

If there is no response here for some time, I will propose the patch upstream after the 3.7 release madness (in a week or two).

Thanks Pavel, sounds good to me.

POSTed upstream, just in time to still make it into 7.7 :)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2295
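The effect the patch aims for (copy /var/lib/neutron, skip the lock subtree, and keep only the names of the files in it) can be illustrated in plain Python. This is a sketch under stated assumptions: `split_copy_and_list` is a hypothetical helper written for this illustration, and it does not reproduce how sos implements `add_copy_spec` or `add_forbidden_path` internally.

```python
import os


def split_copy_and_list(root, forbidden):
    """Split files under `root` into (to_copy, to_list) sorted path lists.

    Hypothetical helper: files under any `forbidden` subtree are only
    listed by name; everything else would be copied. This is NOT how
    sos implements add_forbidden_path internally.
    """
    prefixes = [os.path.normpath(p) for p in forbidden]
    to_copy, to_list = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            norm = os.path.normpath(path)
            # Compare whole path components so "lockfoo" does not match "lock"
            if any(norm == p or norm.startswith(p + os.sep) for p in prefixes):
                to_list.append(path)
            else:
                to_copy.append(path)
    return sorted(to_copy), sorted(to_list)


if __name__ == "__main__":
    copy, listed = split_copy_and_list("/var/lib/neutron",
                                       ["/var/lib/neutron/lock"])
    print(len(copy), "files to copy;", len(listed), "files listed only")
```

Matching on the prefix plus `os.sep` keeps a sibling such as /var/lib/neutron/lockfoo from being treated as forbidden. Like the `ls -laZR` listing in the patch, this still walks the lock subtree, so producing the listing itself scales with the number of files there.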