RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 2043233 - SoS cleaner does not clean all sensitive things
Summary: SoS cleaner does not clean all sensitive things
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sos
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Pavel Moravec
QA Contact: Daniel Záležák
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-01-20 20:43 UTC by toasty
Modified: 2023-09-18 04:30 UTC
CC List: 12 users

Fixed In Version: sos-4.4-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-16 21:32:51 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | sosreport sos pull 2857 | 0 | None | open | [cleaner] Use compiled regex lists for parsers by default | 2022-02-28 07:10:23 UTC
Red Hat Issue Tracker | RHELPLAN-109029 | 0 | None | None | None | 2022-01-20 20:55:18 UTC

Description toasty 2022-01-20 20:43:06 UTC
Description of problem:

After installing sos version 4.0-12.el8.4 on an IdM server, I run "sos report --clean" or "sos --clean" and get a cleaned sos report in a tar file. When I untar the cleaned report, I find sensitive information left in the file content, directory names, and file names.



Version-Release number of selected component (if applicable):


    4.0-12.el8.4


How reproducible:


    Reproducible every time for an IdM server on RHEL 8.4


Steps to Reproduce (example.com is not the real domain, since the real one is sensitive):


    1. yum -y install sos

    2. sos clean --domains example.com,ca.example.com,nj.example.com --keywords example.com,ca.example.com,nj.example.com sosreport.tar

    3. Step 2 generates a cleaned sosreport.tar.xz and a sosreport_private_map

    4. Untarred the sosreport.tar.xz

    5. cd into the sosreport directory

    6. grep -ir example.com . # EXAMPLE-COM was not obfuscated

    7. grep -ir 'dc=example,dc=com' . # dc=example,dc=com was not obfuscated

    8. grep -Er '192.168.' . # 192.168.x.in-addr.arpa: >8 records returned, where x is a number after '192.168.'

    9. find . -type d > directories.txt

    10. grep -ir example.com directories.txt # EXAMPLE-COM was not obfuscated in the directories

    11. grep -ir 15.3.168.192 directories.txt # Returns 15.3.168.192.in-addr.arpa directories

    12. find . -type f > filenames.txt

    13. grep -ir example.com filenames.txt # EXAMPLE-COM was not obfuscated in the filenames
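The manual grep/find checks above can be bundled into a single pass over the extracted archive. The following is a minimal Python sketch, not part of sos; the function name find_leftovers is hypothetical. One behavioral difference is worth noting: in "grep -ir example.com" the unescaped "." matches any character, which is why that grep also reports EXAMPLE-COM. The sketch matches the given strings literally and case-insensitively, so variants such as EXAMPLE-COM have to be listed explicitly.

```python
import os
import re
import sys


def find_leftovers(root, needles):
    """Walk an extracted sos archive and report case-insensitive, literal
    occurrences of any needle in directory names, file names, and file contents."""
    pattern = re.compile("|".join(re.escape(n) for n in needles), re.IGNORECASE)
    hits = {"dirs": [], "files": [], "content": []}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            if pattern.search(name):
                hits["dirs"].append(os.path.join(dirpath, name))
        for name in filenames:
            path = os.path.join(dirpath, name)
            if pattern.search(name):
                hits["files"].append(path)
            if os.path.islink(path):
                continue  # do not follow symlinks outside the archive
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    if any(pattern.search(line) for line in fh):
                        hits["content"].append(path)
            except OSError:
                pass  # unreadable or special files are skipped
    return hits


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python3 find_leftovers.py <extracted-sosreport-dir> example.com EXAMPLE-COM ...
    for kind, paths in find_leftovers(sys.argv[1], sys.argv[2:]).items():
        for path in paths:
            print(f"{kind}: {path}")
```

Listing the directory and file names separately from the content hits mirrors steps 9 to 13, which treat names and content as distinct places where data can leak.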


Actual results:


    Domains and IP addresses are not getting obfuscated.


Expected results:


    All sensitive data should be obfuscated.


Additional info:

None

Comment 2 Pavel Moravec 2022-01-20 21:04:00 UTC
There have been numerous improvements to the cleaner since sos 4.0. Could you please try the current 8.6 candidate build sos-4.2-11 (available on brew or at https://people.redhat.com/pmoravec/sos-4.2-11/)?

Let us know if that build does not obfuscate something.

Comment 3 Bret 2022-01-21 13:13:04 UTC
Do you have this version available for RHEL 8.4?  Our client has all their machines on 8.4.

Comment 4 Pavel Moravec 2022-01-21 13:28:39 UTC
(In reply to Bret from comment #3)
> Do you have this version available for RHEL 8.4?  Our client has all their
> machines on 8.4.

Not exactly. Any sos-4.* can be used on any RHEL8, though officially we ship:
- sos-4.0-* on RHEL8.4
- sos-4.1-* on 8.5
- sos-4.2-* will be on 8.6

Do I understand the comments correctly that you are requesting a fix in 8.4.z? Could you please elaborate why a fix in 8.6 (or newer) does not suffice, i.e. provide a business justification for the z-stream (EUS even?) request?

Commenting on individual points:

>     grep -ir example.com . # EXAMPLE-COM was not obfuscated
Neither example-com nor EXAMPLE-COM was provided among the keywords (I'm not sure how case-sensitive we are). While I understand the Kerberos naming convention / equivalence, why should sos clean assume that any string.with.dots keyword must be a domain, let alone a krb realm?
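For context on why EXAMPLE-COM shows up at all: on an IdM server installed with default naming, the domain name typically reappears in several other spellings, namely the upper-case Kerberos realm, the 389 Directory Server instance name (the realm with dots replaced by dashes, as in /etc/dirsrv/slapd-EXAMPLE-COM), and the LDAP base DN. The sketch below derives those variants so they can be listed explicitly as keywords. The function name idm_name_variants is hypothetical, and it assumes the realm was left at its default (the upper-cased domain), which is not guaranteed. How a value containing commas, such as the base DN, interacts with the comma-separated --keywords option is not checked here.

```python
def idm_name_variants(domain):
    """Return spellings of `domain` that commonly appear on an IdM server,
    assuming the Kerberos realm is the default upper-cased domain."""
    domain = domain.lower()
    realm = domain.upper()  # assumed default realm, e.g. EXAMPLE.COM
    return {
        "domain": domain,                              # example.com
        "realm": realm,                                # EXAMPLE.COM
        "dirsrv_instance": realm.replace(".", "-"),    # EXAMPLE-COM
        "base_dn": ",".join("dc=" + label for label in domain.split(".")),  # dc=example,dc=com
    }


print(idm_name_variants("example.com"))
```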

> 
>     grep -ir 'dc=example,dc=com' . # dc=example,dc=com was not obfuscated
> 
>     grep -Er '192.168.' # 192.168.x.in-addr.arpa >8. records returned where
> x is a number after '192.168.'
Those two should be obfuscated (and imho 4.2 does obfuscate them properly).

> 
>     find . -type d > directories.txt
> 
>     grep -ir example.com directories.txt #EXAMPLE-COM was not obfuscated in
> the directories
Again, example.com != EXAMPLE-COM.

> 
>     grep -ir 15.3.168.192 directories.txt #Returns 15.3.168.192.in-addr.arpa
> directories
File/directory-names should be obfuscated as well, again imho sos-4.2 does so.
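The 15.3.168.192.in-addr.arpa names are reverse-DNS (PTR) names, in which the octets of an IPv4 address are written in reverse order. Python's standard ipaddress module can generate them, which makes the relationship easy to check. This sketch (helper name reverse_dns_names is hypothetical) also builds the /24 reverse zone name: its three reversed octets do not form a four-octet address, so a pattern that looks for complete IPv4 addresses would not match it, while the reversed four octets "15.3.168.192" look like a different, valid IPv4 address. That would be consistent with the reporter later passing reversed prefixes such as 50.168.192 as keywords, though how the sos IP parser actually treats these names is not verified here.

```python
import ipaddress


def reverse_dns_names(ip):
    """Return the PTR name of an IPv4 address and the name of its /24 reverse zone."""
    addr = ipaddress.IPv4Address(ip)
    ptr_name = addr.reverse_pointer  # octets reversed + ".in-addr.arpa"
    first_three = str(addr).split(".")[:3]
    zone_name = ".".join(reversed(first_three)) + ".in-addr.arpa"
    return ptr_name, zone_name


print(reverse_dns_names("192.168.3.15"))
```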

> 
>     find . type f > filesnames.txt
> 
>     grep -ir example.com filenames.txt # EXAMPLE-COM was not obfuscated in
> the filenames
Again, example.com != EXAMPLE-COM.


Let us know if you really need an 8.4 z-stream request (which we don't have the capacity to deliver in the next 3 months, though), and how sos-4.2-11 behaves in those use cases (that will simplify identifying which fix needs to be backported to 4.0 if we deliver a z-stream; plus it will also verify whether 4.2 behaves properly or not).

Comment 5 Bret 2022-01-21 15:14:16 UTC
We do need a z-stream request for RHEL 8.4 and RHEL 8.5. We have several production servers on RHEL 8.4 running Red Hat products such as IdM, Ansible Tower, and Satellite. Before an OS upgrade, at minimum the following steps must happen:

1. Successful testing of the products on the upgraded OS
2. Approval from the client to perform the OS upgrade

This takes a great deal of time. We are currently slated to upgrade IdM to 8.5 in the near future, but not all servers will be upgraded due to the amount of testing needed.

Thanks for all your time

Comment 6 Bret 2022-01-27 13:59:52 UTC
Hi, I did the following on an IdM RHEL 8.4 box:

1. Downloaded the latest code from the "master/main" branch from https://github.com/sosreport/sos
2. Created a virtual environment
3. sourced the virtual environment
4. Ran 'pip install -r requirements.txt'
5. Ran 'python3 setup.py install'
6. Ran sos report with cleaning, using the command below ('example.com' and the IP addresses have been obfuscated)

sos report --clean --domains location1.example.com,location3.example.com,location2.example.com,example.com --keywords location1.example.com,location3.example.com,location2.example.com,example.com,EXAMPLE-COM,50.168.192,13.168.192,13.16.172,42.168.192,13.17.172,180.168.192,10.168.192,10.16.172,10.17.172,"dc=example,dc=com"

7. Step 6 ran for a full day before I killed it.


Maybe I'm doing something wrong, but I was trying to get the latest sos cleaning running on RHEL 8.4 until a z-stream request is created for RHEL 8.4 and RHEL 8.5.

Thanks for your time

Comment 7 Pavel Moravec 2022-02-01 07:58:40 UTC
(In reply to Bret from comment #6)
> Hi, I did the following on an IdM RHEL 8.4 box:
> 
> 1. Downloaded the latest code from the "master/main" branch from
> https://github.com/sosreport/sos
> 2. Created a virtual environment
> 3. sourced the virtual environment
> 4. Ran 'pip install -r requirements.txt'
> 5. python3 setup.py install
> 6. Ran the sos report cleaning using the example below ('example.com' and
> the ip addresses have been obfuscated)
> 
> sos report --clean --domains
> location1.example.com,location3.example.com,location2.example.com,example.
> com --keywords
> location1.example.com,location3.example.com,location2.example.com,example.
> com,EXAMPLE-COM,50.168.192,13.168.192,13.16.172,42.168.192,13.17.172,180.168.
> 192,10.168.192,10.16.172,10.17.172,"dc=example,dc=com"
> 
> 7. #6 ran for a full day before I killed it.
> 
> 
> Maybe I'm doing something wrong but was trying to get the latest sos
> cleaning on RHEL 8.4 until a z-stream request is created for RHEL 8.4 and
> RHEL 8.5.
> 
> Thanks for your time

Did the upstream sos already have these PRs merged?

https://github.com/sosreport/sos/pull/2823
https://github.com/sosreport/sos/pull/2826

They improve cleaner performance significantly; both were merged 2 weeks ago.

Alternatively, can you give me access to such a reproducer machine and let me debug / gcore the running sosreport a few times?

Comment 8 Bret 2022-02-01 12:06:54 UTC
Both were merged before I pulled the latest from github.com/sosreport/sos.  I can't give you access to the RHEL 8.4 FIPS DISA STIG machine because it is in a sensitive environment.

Comment 9 Pavel Moravec 2022-02-01 13:30:36 UTC
Could you please:
- run the sos report --clean with an extra argument "-vvv"
- wait a few hours
- stop it
- a /var/tmp/sos.* directory with work-in-progress content will remain on the disk; please provide the tmp* files from it, especially the key one starting with:

2022-02-01 14:04:12,223 DEBUG: set sysroot to 'None' (default)

Let's see which files take so much time to obfuscate.

(Well, there *are* areas for improvement; e.g. I easily spotted https://github.com/sosreport/sos/issues/2839 just while writing this comment.)

Comment 10 Bret 2022-02-01 17:58:24 UTC
I'm running it on one of the RHEL 8.4 FIPS and DISA STIG machines. However, if those folders contain sensitive information, I won't be able to upload them. The best I could do is get on a call where you ask me questions regarding the output.

Comment 11 Pavel Moravec 2022-02-02 09:35:22 UTC
(In reply to Bret from comment #10)
> I'm running it on one of the RHEL 8.4 FIPS and DISA STIG machines.  However,
> if those folders contain sensitive information, I won't be able to upload
> them.  The best I could do is get on a call where you asked me questions
> regarding the output.

There is a chicken-and-egg problem: in general, I don't know what to fix if I lack any kind of reproducer or data to see the problem on.

I realized it should be sufficient to provide me, at least for a start, a snippet of *one* such tmp* file, which will contain logs like:

..
2022-02-01 14:06:38,237 DEBUG: [cleaner:sosreport-pmoravec-rhel8-2022-02-01-sjjppwd] Obfuscating sos_commands/boot/ls_-lanR_.boot
2022-02-01 14:06:38,343 DEBUG: [cleaner:sosreport-pmoravec-rhel8-2022-02-01-sjjppwd] Obfuscating sos_commands/boot/lsinitrd
2022-02-01 14:06:39,054 DEBUG: [cleaner:sosreport-pmoravec-rhel8-2022-02-01-sjjppwd] Obfuscating sos_commands/boot/ls_-lanR_.sys.firmware
..

Only the log entries matching:

grep "\[cleaner" tmp*

are required to understand why the cleaner took so much time overall. Either too many files were being obfuscated, some files took too long to scrub (like the /etc/hostid I spotted), or a combination of both.

My goal is to identify the cause and have some example input demonstrating the same slowness that I can play with.

Such a log snippet can contain potentially sensitive information only in the form of filenames or executed commands. If I can get that log snippet, great. If not, please analyze it yourself and let me know which files took too long to clean.

Such analysis is very iterative over the log snippets; I don't see how to do it over a call without access to the log itself.

(well, I can prepare a script that will identify that, though it would consume some extra time..)
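A script along those lines could estimate per-file cleaning time directly from the "[cleaner:...] Obfuscating <file>" debug entries. The following is a minimal sketch, assuming the log format shown above and assuming files are processed one at a time, so that the gap between two consecutive "Obfuscating" entries approximates how long the first of them took. The last file has no following entry and is therefore not reported. The function name slowest_files is hypothetical.

```python
import re
from datetime import datetime

# Matches entries such as:
# 2022-02-01 14:06:38,237 DEBUG: [cleaner:sosreport-...] Obfuscating sos_commands/boot/lsinitrd
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) DEBUG: "
    r"\[cleaner:[^\]]*\] Obfuscating (.+)$"
)


def slowest_files(lines, top=10):
    """Return up to `top` (seconds, path) pairs, slowest first."""
    events = []
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((stamp, match.group(2)))
    # Duration of a file = start of the next entry minus start of this entry.
    durations = [
        ((nxt - cur).total_seconds(), path)
        for (cur, path), (nxt, _) in zip(events, events[1:])
    ]
    return sorted(durations, reverse=True)[:top]
```

Given an open tmp* log file handle fh, iterating over slowest_files(fh) yields the worst offenders first; only file paths and timings are printed, not file contents.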

Comment 12 Bret 2022-02-03 12:00:42 UTC
Waiting on client to approve the logs being sent outside their organization.

Comment 14 Bret 2022-02-03 12:34:21 UTC
As a reminder, the server is a RHEL 8.4 FIPS and DISA STIG box.

Comment 15 Pavel Moravec 2022-02-06 14:46:19 UTC
Cleaning journalctl_--no-pager_--catalog_--boot took 2 hours and journalctl_--no-pager took 1h45m. Unless sosreport was called/configured with a huge log size or all-logs, these files are at most 100MB, so cleaning them is terribly slow here.

(What were the "effective options now" in the same logfile, please? Especially log-size or all-logs.)

systemctl_show_service_--all took 1h14m, while its content should be a few MBs at most; this sounds even stranger than the above.

sos_reports/sos.json: I think there is big room for improvement in sos here. We should replace e.g. hostnames or IP addresses in command parameters or filenames being collected. All such replacements are already kept in the generated parsers' maps, which we should "just" attempt to apply; we don't need to attempt to identify *new* strings to obfuscate, which costs some time.

var/log/krb5kdc.log took 1h15m to clean. Assuming the collected file is at most 25MB (unless the log size was changed), this is too much. Maybe too many domain name strings are being identified and replaced..?

var/log/ipaupgrade.log and var/log/ipareplica-install.log: same arguments as for krb5kdc.log.

var/log/secure*: maybe too many IP addresses cause many minutes of cleaning..?

Would it be possible to share *any* *obfuscated* file from the above? Or at least the "grep obfuscated <file>" output? (I understand it might be denied, but any more specific reproducer input / training data is valuable.)

Comment 16 Pavel Moravec 2022-02-06 14:49:15 UTC
Jake,

1) Is my idea that "sos.json (and similar) does not need to detect *new* strings to obfuscate" sound, and can it improve cleaner time?

2) (very side question) Any idea why the two IP addresses failed with "does not appear to be an IPv4 or IPv6 interface"?

Comment 17 Jake Hunsaker 2022-02-06 18:16:02 UTC
(In reply to Pavel Moravec from comment #16)
> Jake,
> 
> 1) is my idea of "sos.json (and similar) does not need to detect *new*
> stings to obfuscate" sound and can it improve cleaner time?
> 

`sos.json` is treated the same as any other file. It gets opened and read line by line, with each line being fed to `parse_line()` for each parser. `parse_line()` does the regex matching for anything that *looks* like a hostname/IP addresses/etc. In the case of hostnames, the determination goes something like this:

raw match -> does it look like it's in a domain we're cleaning? -> is it in a subdomain -> is it a new subdomain? -> is it a new host?

Then we make a secondary pass for all known subdomains and hostnames in the string as a failsafe.

We could potentially add some logic to skip the first check and go straight to the secondary check, but honestly I'm not sure how much that would buy us.
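The decision chain described above can be modeled roughly as follows. This is a simplified, hypothetical illustration written for this discussion, not the code of the sos hostname parser: the function name register_hostname is made up, it only decides which names would be newly registered for one raw regex match, and it omits the secondary pass and the actual obfuscation.

```python
def register_hostname(match, domains, known):
    """Return names newly added to `known` for one raw regex match.

    `domains` are the domains being cleaned; `known` is the set of
    subdomains and hosts seen so far (updated in place)."""
    name = match.lower()
    # Does it look like it's in a domain we're cleaning? Prefer the most specific one.
    candidates = [d for d in domains if name == d or name.endswith("." + d)]
    if not candidates:
        return []
    domain = max(candidates, key=len)
    added = []
    # Is it in a subdomain, and is that subdomain new?
    _, _, parent = name.partition(".")
    if parent != domain and parent.endswith("." + domain) and parent not in known:
        known.add(parent)
        added.append(parent)
    # Is it a new host?
    if name != domain and name not in known:
        known.add(name)
        added.append(name)
    return added
```

Each step is a cheap string check, which fits the point above that skipping the first check would not buy much.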

> 2) (very side-question) any idea why the two IP addresses were failed as
> "does not appear to be an IPv4 or IPv6 interface" ?

We use the ipaddress library for this determination - specifically that error is raised from the `ipaddress.ip_interface()` call. The original line that caused the error would be helpful, but I'm not sure off the top of my head as I'm not super familiar with the inner workings of that library.
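For reference, ipaddress.ip_interface() first tries to parse its argument as an IPv4 interface, then as an IPv6 interface, and raises ValueError with the "does not appear to be an IPv4 or IPv6 interface" message only when both attempts fail. So the message means the matched string was not a valid address (with an optional prefix length) in either family. The two offending addresses are not shown in this bug, so the inputs below are hypothetical examples of strings a loose regex could capture; the helper name split_ip_candidates is also hypothetical.

```python
import ipaddress


def split_ip_candidates(candidates):
    """Separate regex matches that parse as IP interfaces from those that don't."""
    valid, rejected = [], []
    for text in candidates:
        try:
            valid.append(ipaddress.ip_interface(text))
        except ValueError as exc:  # raised when neither IPv4 nor IPv6 parsing succeeds
            rejected.append((text, str(exc)))
    return valid, rejected


valid, rejected = split_ip_candidates(["192.168.3.15/24", "256.1.1.1", "10.0.0.1/33"])
for text, reason in rejected:
    print(reason)
```

An octet above 255 or an IPv4 prefix length above 32 is enough to trigger the error.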

Comment 18 Pavel Moravec 2022-02-06 21:57:00 UTC
Hmm, right, that optimisation won't gain much.

Especially since my experiments with a very limited reproducer found a few inefficiencies in one place.

Cleaning a normal sos.json took me 15 seconds. I identified that 10 seconds of that are caused by SoSHostnameParser, and 9 seconds of those by the execution of https://github.com/sosreport/sos/blob/main/sos/cleaner/parsers/hostname_parser.py#L103-L113

(When I simply commented out these lines, cleaning that *one* file got 9 s faster.)

That code is *very* inefficient: for each line of each file, so really MANY times, it recalculates some *static* data that would be sufficient to generate *once* (*) per whole cleanup:

1)
hosts = [h for h in self.mapping.dataset.keys() if '.' in h]   # we can generate the list just once (plus whenever mapping is changed)

2)
sorted(hosts, reverse=True, key=lambda x: len(x)):    # we shall store hosts already sorted

3)
fqdn = host
for c in '.-':
    fqdn = fqdn.replace(c, '_')     # we should have a pregenerated list of *pairs* (host, fqdn), reverse-sorted by len of host, and skip regenerating it

4)
sorted(self.short_names, reverse=True):    # again, something to pre-generate once


(*) not "generated once" but "once, plus whenever self.mapping is updated"; that happens very few times per whole cleaner execution


I will try to come up with an improvement in the next few days (I hope).
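The proposal above, computing the sorted host list, the (host, fqdn) pairs, and the sorted short names once and rebuilding them only when the mapping changes, could look roughly like the sketch below. It is hypothetical and not necessarily how the eventual fix was implemented: the class name PrecomputedHostNames is made up, and using the sizes of the mapping and of the short-name set as a change signal assumes entries are only ever added, never removed or replaced.

```python
class PrecomputedHostNames:
    """Cache the per-line loop inputs of the hostname secondary pass."""

    def __init__(self):
        self._signature = None
        self.host_pairs = []   # [(host, host with '.' and '-' as '_')], longest host first
        self.short_names = []  # reverse-sorted, as in the original loop

    def refresh(self, dataset, short_names):
        """Rebuild the cached lists only if the mapping grew; return True if rebuilt."""
        # Assumption: entries are only added, so a size change means a content change.
        signature = (len(dataset), len(short_names))
        if signature == self._signature:
            return False
        hosts = [h for h in dataset if "." in h]
        self.host_pairs = [
            (h, h.replace(".", "_").replace("-", "_"))
            for h in sorted(hosts, key=len, reverse=True)
        ]
        self.short_names = sorted(short_names, reverse=True)
        self._signature = signature
        return True
```

parse_line() would then call refresh(), which is a cheap size comparison in the common case, and iterate over host_pairs and short_names directly instead of rebuilding and re-sorting them for every line.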

Comment 19 Jake Hunsaker 2022-02-07 03:45:47 UTC
Yeah, this should be rewritten in the same vein as we did for #2823. In fact, we can probably "merge" the domain and short name checks into the same loop.

Comment 20 Bret 2022-02-07 15:35:21 UTC
We recently upgraded the IdM cluster to RHEL 8.5, with sos release 4.1-9.el8_5. The software seems to clean files within about 2-3 hours, and obfuscation looks pretty good overall. However, this ticket or another one should still explore why the latest upstream release of sos took so long and had to be killed.

Thank you so much for all your time and support.

Comment 23 Bret 2022-06-14 12:31:39 UTC
Hi Pavel,

I won't be able to reproduce the same environment.  We were operating on RHEL 8.4 and I compiled the upstream release of sos at the time.  We have now moved to RHEL 8.6.  I can still take a look at providing the obfuscated "systemctl output" but would need the exact command you want output from.

v/r

Bret

Comment 24 Pavel Moravec 2022-08-03 20:23:14 UTC
Hello,
For the sake of RHEL 8.8, we will deliver a cleaner performance improvement fix (for the latest known, well-described open issue).

To help sos QE resources, would you be able to verify whether a candidate package fixes the bug properly?

I expect a candidate build to be ready in several weeks and there will be no rush to execute the verification.

Thanks in advance for potential cooperation.

Comment 26 Bret 2022-09-16 14:05:42 UTC
Hi Pavel,

The customer is on RHEL 8.6 with IdM.  I'm not sure when the customer will move on to RHEL 8.8.  However, when that time comes, I will try to run an sos report and clean the IdM server to see if performance is good.

Sorry for the delay but I have little control over when they migrate to 8.8.


v/r

Bret Mullinix

Comment 27 Pavel Moravec 2022-09-18 20:19:26 UTC
(In reply to Bret from comment #26)
> Hi Pavel,
> 
> The customer is on RHEL 8.6 with IdM.  I'm not sure when the customer will
> move on to RHEL 8.8.  However, when that time comes, I will try to run an
> sos report and clean the IdM server to see if performance is good.
> 
> Sorry for the delay but I have little control over when they migrate to 8.8.
> 
> 
> v/r
> 
> Bret Mullinix

The sos package is quite independent of the RHEL minor version, so the 8.8 candidate build can safely be run on 8.6. We would welcome any such *check*, to verify that nothing is left over in the bugfix.

If the point is "does Red Hat support it?" then the official statement is "not now, but if needed, it should not be a problem to make it supportable".

Comment 28 Bret 2022-09-19 12:39:03 UTC
Hi Pavel,

When the release 8.8 is available, can you give me instructions on installation (I would prefer to do this in a virtual environment so as not to affect the client's server) and the location of the install?  

Thanks for your time.


v/r

Bret

Comment 29 Pavel Moravec 2022-09-19 17:54:14 UTC
Sure, I will notify you once 8.8 is available.

Comment 32 Pavel Moravec 2022-11-30 18:34:39 UTC
Hello,
could I kindly ask for verification on some real system? We can only (time-consumingly) test it on a mocked system, which is not the preferred way.

Or does #c28 imply you are waiting for RHEL 8.8 GA to test it? (Then, at least for internal purposes, one can use nightly builds of RHEL 8.8 and test against them, if the sos cleaner scenario can be reproduced there the right way.)

Comment 33 Bret 2022-11-30 20:51:08 UTC
Hi Pavel,

We are working on upgrading IdM to 8.9 this week.  However, we are getting some issues upgrading.   After the upgrade, we can try to test it.

Thanks for your time

v/r

Bret

Comment 34 Bret 2022-12-01 14:52:50 UTC
Hi Pavel,

We ran this on an IdM server for RHEL 6 trying to troubleshoot the upgrade to 8.9.  The sos report cleaning started around 2:00 PM EST and ran through the night.  This morning we came back and it was stuck.  The version of sos is 4.2-22.el8_6.

Thanks for your time

v/r

Bret

Comment 35 Bret 2022-12-01 14:53:47 UTC
Correction:

RHEL 8.6 not RHEL 6

Comment 36 Bret 2022-12-01 14:57:07 UTC
Can you add O'Neill Joseph (ojoseph) to the bug as the primary POC?  He is handling IdM upgrades.

Comment 38 Bret 2022-12-08 13:03:38 UTC
We still have not updated past RHEL 8.6 for our IdM nodes.  Waiting on fix.  O'Neill Joseph is now working on IdM and the sos reports.  Please add him as primary POC for this bug.

Comment 40 Bret 2022-12-08 13:19:04 UTC
Hi Pavel, 

We don't have RHEL 8.4 on IdM anymore. We are in the middle of an upgrade to IdM. Can you please ping O'Neill Joseph for any information you need? I have moved on to other applications.

thanks for your time

v/r

Bret

Comment 42 Bret 2022-12-12 12:14:30 UTC
Hi Pavel,

Asked O'Neill Joseph to register with the system and provide me his username.

v/r

Bret

Comment 43 Daniel Záležák 2022-12-20 11:49:42 UTC
Switched to SanityOnly.

Still requires OtherQA confirmation.

Comment 47 Pavel Moravec 2023-03-16 21:32:51 UTC
Closing the bugzilla as the fix has been delivered in sos-4.5.0-1.el8 released via https://access.redhat.com/errata/RHBA-2023:1300 errata.

Comment 49 Red Hat Bugzilla 2023-09-18 04:30:38 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

