Description of problem (please be as detailed as possible and provide log snippets):

I'm looking at an sosreport for a customer case, 03260374, from BZ#2107110. The log is completely mangled:

/cases/03260374/0150-sosreport-pqcn01s3346-03260374-2022-07-18-phtvxej.tar.xz/sosreport-pqcn01s3346-03260374-2022-07-18-phtvxej/sos_strings/container_log/host.var.log.containers.rook-ceph-mon-f-7f5657845f-52tzc_openshift-storage_mon-6a521205633787df3fcf6195cca02eea43857c23be0ab079f03c1842910b80ac.log.tailed

> 2022-07-18T14:29:22.526642827+00:00 stderr P cluster 2022-07-
> 2022-07-18T14:29:22.526646823+00:00 stderr P 18T14:29:22.439682
> 2022-07-18T14:29:22.526650738+00:00 stderr P +0000 mon.b (mon.0) 249139
> 2022-07-18T14:29:22.526654507+00:00 stderr F : cluster [DBG] fsmap ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby
> 2022-07-18T14:29:22.526658291+00:00 stderr P cluster
> 2022-07-18T14:29:22.526662146+00:00 stderr P 2022-07-18T14
> 2022-07-18T14:29:22.526665917+00:00 stderr P :29:22.454390
> 2022-07-18T14:29:22.526669648+00:00 stderr P +0000 mon.b (mon.0
> 2022-07-18T14:29:22.526673500+00:00 stderr P ) 249140 : cluster [DBG]
> 2022-07-18T14:29:22.526677415+00:00 stderr F fsmap ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
> 2022-07-18T14:29:23.537571122+00:00 stderr P debug
> 2022-07-18T14:29:23.537611424+00:00 stderr P 2022-07-18T14:29:23.536+0000 7f8a188b2700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
> 2022-07-18T14:29:23.537622756+00:00 stderr F
> 2022-07-18T14:29:23.537675913+00:00 stderr P debug
> 2022-07-18T14:29:23.537689376+00:00 stderr P 2022-07-18T14:29:23.536+0000 7f8a188b2700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished
> 2022-07-18T14:29:23.537699437+00:00 stderr F
> 2022-07-18T14:29:23.542529664+00:00 stderr P cluster
> 2022-07-18T14:29:23.542575753+00:00 stderr P
> 2022-07-18T14:29:23.542593114+00:00 stderr P 2022
> 2022-07-18T14:29:23.542601668+00:00 stderr P -
> 2022-07-18T14:29:23.542610583+00:00 stderr P 07
> 2022-07-18T14:29:23.542618917+00:00 stderr P -
> 2022-07-18T14:29:23.542626823+00:00 stderr P 18
> 2022-07-18T14:29:23.542634676+00:00 stderr P T
> 2022-07-18T14:29:23.542642573+00:00 stderr P 14
> 2022-07-18T14:29:23.542650372+00:00 stderr P :

It's virtually unusable at this point.

Version of all relevant components (if applicable):
4.9

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
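For anyone else reading this log: in the CRI/conmon logging format each record is "<timestamp> <stream> <P|F> <content>", where P marks a partial write and F marks the fragment that completes a logical line, so a single Ceph message can be spread across many records (down to single characters near the end of the snippet above). Below is a minimal sketch of a script that joins the P fragments back into whole lines. It assumes the standard CRI format and a locally extracted copy of the .log.tailed file; the script name and invocation are only illustrative, not something shipped with sos or ODF.

#!/usr/bin/env python3
"""Reassemble partial (P) CRI log records into complete lines.

Minimal sketch, assuming the standard CRI logging format
("<timestamp> <stream> <P|F> <content>") and a locally extracted
copy of the tailed container log.
"""
import sys


def reassemble(path):
    buffers = {}  # per-stream accumulation of partial (P) content
    with open(path, "r", errors="replace") as f:
        for raw in f:
            parts = raw.rstrip("\n").split(" ", 3)
            if len(parts) < 3 or parts[2] not in ("P", "F"):
                continue  # not a CRI record; skip it
            _, stream, tag = parts[:3]
            content = parts[3] if len(parts) == 4 else ""
            buffers[stream] = buffers.get(stream, "") + content
            if tag == "F":  # F terminates the logical line
                yield stream, buffers.pop(stream)


if __name__ == "__main__":
    for stream, line in reassemble(sys.argv[1]):
        print(f"{stream}: {line}")

Run it as e.g. "python3 reassemble_cri_log.py host.var.log.containers.rook-ceph-mon-f-...log.tailed". For the snippet quoted above, joining the fragments per stream yields readable mon cluster/audit lines, so the extreme fragmentation of the writes (rather than lost data) appears to be what makes the raw file unusable.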
Rook configures Ceph so that its logs are captured in the pod logs, but beyond that it's all up to standard K8s and the container runtime to manage the actual logs. I can only suspect an issue in the container runtime that would affect the logs like this, but there's not much we can do at the ODF/Rook layer.

Patrick, if we can get consistent repro steps it would help narrow this down, but was this likely just seen in the one sos report?
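For a bit of context on the first sentence: Ceph has options that control whether its daemon and cluster logs go to files or to stderr, and whatever goes to stderr is what the runtime records as the pod log (the "stderr" column in the snippet above). The fragment below is only an illustration of the kind of settings involved; the option names come from the upstream Ceph documentation, and exactly which values Rook applies in ODF 4.9 is an assumption on my part, not taken from this report.

# Illustration only: upstream Ceph options that steer logging to stderr
# (and hence into the pod/container log) instead of files.
[global]
log to file = false
log to stderr = true
err to stderr = true
mon cluster log to stderr = true

Once the messages are on stderr, how they get split into the P/F records shown above is entirely in the hands of the container runtime, which matches the comment above.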
https://github.com/ceph/ceph/pull/48623
The PR is merged upstream. Proposing this for backport to 6.1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3623
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days