Bug 1963917

Summary: [RFE] Data redaction for branch_info file
Product: Red Hat Enterprise Linux 8
Reporter: Olimp Bockowski <obockows>
Component: insights-client
Assignee: Christian Marineau <cmarinea>
Status: NEW ---
QA Contact: Red Hat subscription-manager QE Team <rhsm-qe>
Severity: medium
Docs Contact:
Priority: medium
Version: 8.4
CC: bfahr, cmarinea, gchamoul, gmccullo, kdixon, link, pakotvan
Target Milestone: beta
Keywords: FutureFeature, Triaged
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Olimp Bockowski 2021-05-24 12:05:51 UTC
Description of problem:
One can obfuscate the hostname and correctly configure file-content-redaction.yaml to filter out a domain, but the branch_info file still includes the Satellite's hostname/FQDN/domain part.

Version-Release number of selected component (if applicable):
<= insights-client-3.1.3-2

How reproducible:
always

Steps to Reproduce:
1. edit /etc/insights-client/file-content-redaction.yaml
2. add, for example:

patterns:
  regex:
    - "obockows"


3. run: insights-client --no-upload
4. unpack soscleaner*
5. check branch_info file:

[root@rhel8 soscleaner-2354462507999389]# cat ./branch_info  | grep -o -P 'obockows.*?' 
obockows
obockows
obockows
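
The manual grep in step 5 can be generalized to the whole unpacked archive. The following is a minimal sketch, not part of insights-client: the function name `find_unredacted` is hypothetical, and it assumes the archive has already been unpacked to a local directory and that the patterns are the same regexes listed in file-content-redaction.yaml.

```python
import re
from pathlib import Path


def find_unredacted(archive_root, patterns):
    """Return {relative_path: match_count} for files that still match a pattern.

    Hypothetical helper: walks every file under archive_root, including
    root-level files such as branch_info, and counts regex matches.
    """
    regexes = [re.compile(p) for p in patterns]
    hits = {}
    for path in sorted(Path(archive_root).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on non-UTF-8 content
        text = path.read_text(errors="replace")
        count = sum(1 for rx in regexes for _ in rx.finditer(text))
        if count:
            hits[path.relative_to(archive_root).as_posix()] = count
    return hits
```

Calling `find_unredacted("soscleaner-2354462507999389", ["obockows"])` from the directory that contains the unpacked archive would list every file where the string survives, with branch_info expected among them given the output above.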


Actual results:
branch_info still contains the exact string that is supposed to be filtered out.

Expected results:
The configured redaction should apply to branch_info as well.

Additional info:

Comment 3 Link Dupont 2022-03-15 16:41:39 UTC
Will this cause any problems with host identification if the hostname is removed or obfuscated from within the branch_info file?

Comment 4 Link Dupont 2022-03-28 15:12:28 UTC
Olimp, I looked into this more, and our current obfuscation behavior only obfuscates files under the `./data` directory of an archive. The branch_info file exists at the root of the archive, so it is not included when obfuscating file content. Before we proceed with changing the obfuscation behavior to include branch_info, we'll need to make sure that the data in that file isn't relied upon by some service or services to identify a host.
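
To illustrate the scoping described in comment 4, here is a rough sketch of a redaction pass that only touches files under `./data`. It is not the insights-client code: `redact_archive` and its `include_root_files` flag are hypothetical, and dropping whole matching lines is an assumption about the redaction semantics made only to keep the example short.

```python
import re
from pathlib import Path


def redact_archive(archive_root, patterns, include_root_files=False):
    """Drop lines matching any pattern; return relative paths that were rewritten.

    Assumption: by default only files under <archive_root>/data are processed,
    mirroring the behavior described in comment 4.
    """
    root = Path(archive_root)
    regexes = [re.compile(p) for p in patterns]
    targets = [p for p in (root / "data").rglob("*") if p.is_file()]
    if include_root_files:
        # Hypothetical opt-in: also process files at the archive root,
        # which is where branch_info lives.
        targets += [p for p in root.iterdir() if p.is_file()]
    changed = []
    for path in sorted(targets):
        lines = path.read_text(errors="replace").splitlines(keepends=True)
        kept = [ln for ln in lines if not any(rx.search(ln) for rx in regexes)]
        if kept != lines:
            path.write_text("".join(kept))
            changed.append(path.relative_to(root).as_posix())
    return changed
```

With the default arguments, branch_info is never visited, which matches the reported result. Passing `include_root_files=True` would cover it, but if branch_info is structured data (for example JSON), removing lines could leave it unparseable, and the host-identification question raised above would still need answering first.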