Bug 1743425
| Summary: | Ceph logs not captured in an OpenStack deployment | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Goutham Pacha Ravi <gouthamr> |
| Component: | sos | Assignee: | Pavel Moravec <pmoravec> |
| Status: | CLOSED ERRATA | QA Contact: | Miroslav Hradílek <mhradile> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.7 | CC: | agk, bmr, gfidente, jjansky, johfulto, lkuchlan, mhradile, plambri, pmoravec, sbradley, tbarron, tshefi, vimartin |
| Target Milestone: | rc | Keywords: | OtherQA |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | sos-3.9-2.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-09-29 20:55:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Goutham Pacha Ravi
2019-08-19 23:19:48 UTC
Here's a related bug on the OpenStack side to persist logs onto the host ceph/controller nodes: https://bugs.launchpad.net/tripleo/+bug/1721841

By default, system journal logs are not persistent, which makes things harder if the logs have been purged between the incident and the sos report collection.

Pavel Moravec (comment #3):

As I don't speak the Ceph/OpenStack/.. language, could you please clarify:

- What logs are missing from the collection? Cf. "add_copy_spec" in https://github.com/sosreport/sos/blob/master/sos/plugins/ceph.py .
- What should trigger collection of those logs (but does not do so now)? Is there some package behind "Some ceph processes (MDS, nfs-ganesha) are deployed on OpenStack controller nodes." whose presence can be used as the trigger?

(sosreport consists of plugins like ceph.

Goutham Pacha Ravi:

Hi Pavel,

(In reply to Pavel Moravec from comment #3)
> As I don't speak the Ceph/OpenStack/.. language, could you please clarify:
>
> - What logs are missing from the collection? Cf. "add_copy_spec" in
>   https://github.com/sosreport/sos/blob/master/sos/plugins/ceph.py .

Ceph processes deployed with ceph-ansible do not persist log files in local storage yet (see https://bugs.launchpad.net/tripleo/+bug/1721841). I think we'll need to "add_journal" and grab the logs for now, since they're being written to the host's journal:

    $ sudo journalctl CONTAINER_NAME=ceph-mds-$HOSTNAME
    $ sudo journalctl CONTAINER_NAME=ceph-mon-$HOSTNAME
    $ sudo journalctl CONTAINER_NAME=ceph-mgr-$HOSTNAME
    $ sudo journalctl CONTAINER_NAME=ceph-nfs-pacemaker

> - What should trigger collection of those logs (but does not do so now)? Is
>   there some package behind "Some ceph processes (MDS, nfs-ganesha) are
>   deployed on OpenStack controller nodes." whose presence can be used as the
>   trigger?
> (sosreport consists of plugins like ceph.
Each plugin collects its data independently of the other plugins, and a plugin is automatically triggered by 1) the presence of a file, 2) the presence of a package, or 3) a kernel module being loaded.)

Yes, I think the trigger would be testing the presence of the systemd units:

    $ sudo systemctl status ceph-nfs@pacemaker
    $ sudo systemctl status ceph-mds@$HOSTNAME
    $ sudo systemctl status ceph-mon@$HOSTNAME
    $ sudo systemctl status ceph-mgr@$HOSTNAME

Pavel Moravec (comment #5):

Thanks for the prompt feedback. So to confirm the change, the ceph plugin will newly:

- be enabled (i.e. automatically run) _also_ by the presence of _any_ of the services ceph-nfs@pacemaker, ceph-mds@$HOSTNAME, ceph-mon@$HOSTNAME or ceph-mgr@$HOSTNAME
- collect the output of those four journalctl commands

Also, is the command "journalctl CONTAINER_NAME=ceph-nfs-pacemaker" equivalent to "journalctl --unit CONTAINER_NAME=ceph-nfs-pacemaker" or similar? Can the CONTAINER_NAME=.. filter be rephrased using one of the options --unit / --boot / --since / --until / --lines / --output / --identifier (these are the arguments the add_journal method supports)?

Goutham Pacha Ravi:

(In reply to Pavel Moravec from comment #5)
> Thanks for the prompt feedback. So to confirm the change, the ceph plugin
> will newly:
>
> - be enabled (i.e. automatically run) _also_ by the presence of _any_ of the
>   services ceph-nfs@pacemaker, ceph-mds@$HOSTNAME, ceph-mon@$HOSTNAME or
>   ceph-mgr@$HOSTNAME
> - collect the output of those four journalctl commands
>
> Also, is the command "journalctl CONTAINER_NAME=ceph-nfs-pacemaker"
> equivalent to "journalctl --unit CONTAINER_NAME=ceph-nfs-pacemaker" or
> similar? Can the CONTAINER_NAME=.. filter be rephrased using one of the
> options --unit / --boot / --since / --until / --lines / --output /
> --identifier (these are the arguments the add_journal method supports)?
The systemd units are, however, named *slightly* differently; for --unit, we'll need to use:

    $ sudo journalctl --unit ceph-nfs@pacemaker
    $ sudo journalctl --unit ceph-mds@$HOSTNAME
    $ sudo journalctl --unit ceph-mon@$HOSTNAME
    $ sudo journalctl --unit ceph-mgr@$HOSTNAME

You might want to look at bug 1710548.

Upstream PR raised; tentatively scheduled for 7.9.

Hello, thanks for the availability to test the fix. Please use the repository / package below.

A yum repository for the build of sos-3.9-2.el7 (task 28860092) is available at:

    http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.9/2.el7/

You can install the rpms locally by putting this .repo file in your /etc/yum.repos.d/ directory:

    http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.9/2.el7/sos-3.9-2.el7.repo

RPMs and build logs can be found in the following location:

    http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.9/2.el7/noarch/

The full list of available rpms is:

    http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.9/2.el7/noarch/sos-3.9-2.el7.src.rpm
    http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.9/2.el7/noarch/sos-3.9-2.el7.noarch.rpm

The repository will be available for the next 60 days. Scratch build output will be deleted earlier, based on the Brew scratch build retention policy.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (sos bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4034
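The enablement logic agreed in the thread — trigger the ceph plugin when any of the containerized Ceph systemd units is present on the node — can be sketched as follows. This is only an illustration of the decision, not the actual sos plugin code; the function names and the list layout are hypothetical, while the unit names themselves come from the comments above (ceph-nfs runs as the fixed instance "pacemaker", the others are templated on the hostname).

```python
import socket

def ceph_journal_units(hostname=None):
    """Systemd unit names for containerized Ceph daemons on one node,
    following the naming discussed in this bug."""
    host = hostname or socket.gethostname()
    return [
        "ceph-nfs@pacemaker",
        "ceph-mds@%s" % host,
        "ceph-mon@%s" % host,
        "ceph-mgr@%s" % host,
    ]

def should_enable_ceph_plugin(present_units, hostname=None):
    """Trigger check: enable the plugin if any expected unit is present."""
    expected = set(ceph_journal_units(hostname))
    return bool(expected & set(present_units))

# Example: a controller running only the mon and mgr containers
units_on_host = ["ceph-mon@ctrl-0", "ceph-mgr@ctrl-0", "sshd"]
print(should_enable_ceph_plugin(units_on_host, hostname="ctrl-0"))  # → True
```

In the real plugin the list of present units would come from systemd (e.g. the equivalent of `systemctl status <unit>`), not from a Python list.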
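The add_journal question in the thread turns on the difference between a bare journal field match (a `FIELD=VALUE` argument such as `CONTAINER_NAME=ceph-nfs-pacemaker`) and the `--unit` option, which filters on the systemd unit name; the two are not interchangeable, which is why the unit-based commands use different names. As a rough sketch (the helper below is hypothetical, not part of sos), the two filter styles map to journalctl command lines like this:

```python
def journalctl_cmd(unit=None, field_match=None, lines=None):
    """Build a journalctl command line as a list of arguments.

    A bare FIELD=VALUE argument matches on a structured journal field
    (e.g. CONTAINER_NAME), while --unit matches on the systemd unit name.
    """
    cmd = ["journalctl"]
    if field_match:
        cmd.append(field_match)      # journal field match, e.g. CONTAINER_NAME=...
    if unit:
        cmd += ["--unit", unit]      # systemd unit filter, e.g. ceph-mon@host1
    if lines is not None:
        cmd += ["--lines", str(lines)]
    return cmd

# Field-match form, as run by hand in the comments above:
print(journalctl_cmd(field_match="CONTAINER_NAME=ceph-nfs-pacemaker"))
# Unit-based form, matching what add_journal's unit argument would produce:
print(journalctl_cmd(unit="ceph-nfs@pacemaker"))
```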