Bug 1982970

Summary: Fact updates causing unnecessary compliance recalculation in Candlepin
Product: Red Hat Satellite Reporter: Hao Chang Yu <hyu>
Component: Candlepin Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA QA Contact: Lai <ltran>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.9.0 CC: bbuckingham, csnyder, ktordeur, mjia, nmoumoul, pdwyer, pmoravec, redakkan
Target Milestone: 6.11.0 Keywords: FutureFeature, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: candlepin-4.0.17-1, candlepin-4.1.12-1, candlepin-4.2.1-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1991960 2044821 2060927 Environment:
Last Closed: 2022-07-05 14:29:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1991960, 2044821, 2060927    
Bug Blocks:    

Description Hao Chang Yu 2021-07-16 07:04:07 UTC
Description of problem:
When a consumer updates one or more facts, Candlepin recalculates its compliance. This is very expensive, especially when many consumers (20k+, for example) are registered to the Satellite and each of them updates some facts very frequently. It may also create many "compliance.created" events, which can put a lot of pressure on the messaging broker and eventually cause paging.

https://github.com/candlepin/candlepin/blob/master/src/main/java/org/candlepin/policy/js/compliance/hash/HashableStringGenerators.java#L216

Steps to Reproduce:
1. On the client, ensure the facts are updated

subscription-manager facts --update

2. Set any custom fact in /etc/rhsm/facts/
3. Stop the rhsmcertd service. We will trigger it manually later.

systemctl stop rhsmcertd

4. On Satellite, tail the candlepin audit log

tail -f /var/log/candlepin/audit.log

5. On the client, run rhsmcertd immediately.

rhsmcertd -n

6. Wait for 1 minute and then kill the rhsmcertd process

Actual results:
### Recalculated twice here ###
2021-07-16 16:42:12,531 principalType=trusteduser principal=foreman_admin target=COMPLIANCE entityId=f8e8f51a-c20e-4d42-aa6d-0f5d621b530d type=CREATED owner=8ac705086e2c97c4016e2c9863b60001 eventData={"reasons":[],"status":"valid"}
2021-07-16 16:42:13,970 principalType=trusteduser principal=foreman_admin target=COMPLIANCE entityId=f8e8f51a-c20e-4d42-aa6d-0f5d621b530d type=CREATED owner=8ac705086e2c97c4016e2c9863b60001 eventData={"reasons":[],"status":"valid"}

2021-07-16 16:42:13,977 principalType=trusteduser principal=foreman_admin target=SYSTEM_PURPOSE_COMPLIANCE entityId=f8e8f51a-c20e-4d42-aa6d-0f5d621b530d type=CREATED owner=8ac705086e2c97c4016e2c9863b60001 eventData={"nonCompliantUsage":null,"compliantAddOns":{},"nonCompliantRole":null,"reasons":[],"compliantSLA":{},"nonCompliantAddOns":[],"compliantRole":{},"nonCompliantSLA":null,"compliantUsage":{},"status":"not specified"}

2021-07-16 16:42:13,983 principalType=trusteduser principal=foreman_admin target=CONSUMER entityId=8ac705087a3e1670017a3e1796520001 type=MODIFIED owner=8ac705086e2c97c4016e2c9863b60001 eventData=null

### Recalculate compliance again!! This is also a bug. It seems that "syspurpose compliance" and the "subscription compliance" are sharing the same "compliancestatushash" column in the cp_consumer table so after calculating the syspurpose compliance, Candlepin will replace the column with its digest. ###
2021-07-16 16:42:19,315 principalType=trusteduser principal=foreman_admin target=COMPLIANCE entityId=f8e8f51a-c20e-4d42-aa6d-0f5d621b530d type=CREATED owner=8ac705086e2c97c4016e2c9863b60001 eventData={"reasons":[],"status":"valid"} 
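
The repeated recalculations in the log above can also be spotted mechanically. The following is a minimal sketch, not part of Candlepin or Satellite: it assumes the audit.log line layout shown in this report, and the function names and the 10-second window are illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp and the key=value fields seen in the audit.log excerpts.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"target=(?P<target>\S+) entityId=(?P<entity>\S+) type=(?P<type>\S+)"
)


def parse_line(line):
    """Return (timestamp, target, entity_id, event_type), or None if no match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("target"), m.group("entity"), m.group("type")


def repeated_compliance_events(lines, window=timedelta(seconds=10)):
    """List COMPLIANCE CREATED events that follow the previous one for the
    same entity within `window` (the window size is an arbitrary choice)."""
    last_seen = {}
    repeats = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, target, entity, event_type = parsed
        if target != "COMPLIANCE" or event_type != "CREATED":
            continue
        prev = last_seen.get(entity)
        if prev is not None and ts - prev <= window:
            repeats.append((entity, prev, ts))
        last_seen[entity] = ts
    return repeats


# Lines copied from the "Actual results" excerpt in this report.
SAMPLE = [
    '2021-07-16 16:42:12,531 principalType=trusteduser principal=foreman_admin target=COMPLIANCE entityId=f8e8f51a-c20e-4d42-aa6d-0f5d621b530d type=CREATED owner=8ac705086e2c97c4016e2c9863b60001 eventData={"reasons":[],"status":"valid"}',
    '2021-07-16 16:42:13,970 principalType=trusteduser principal=foreman_admin target=COMPLIANCE entityId=f8e8f51a-c20e-4d42-aa6d-0f5d621b530d type=CREATED owner=8ac705086e2c97c4016e2c9863b60001 eventData={"reasons":[],"status":"valid"}',
    '2021-07-16 16:42:13,983 principalType=trusteduser principal=foreman_admin target=CONSUMER entityId=8ac705087a3e1670017a3e1796520001 type=MODIFIED owner=8ac705086e2c97c4016e2c9863b60001 eventData=null',
    '2021-07-16 16:42:19,315 principalType=trusteduser principal=foreman_admin target=COMPLIANCE entityId=f8e8f51a-c20e-4d42-aa6d-0f5d621b530d type=CREATED owner=8ac705086e2c97c4016e2c9863b60001 eventData={"reasons":[],"status":"valid"}',
]

repeats = repeated_compliance_events(SAMPLE)
for entity, first, second in repeats:
    print(entity, first.time(), "->", second.time())
```

On the sample lines, the second and third COMPLIANCE events are both flagged because each follows the previous one for the same entity within the window. To check a real log, pass an open file handle for /var/log/candlepin/audit.log instead of SAMPLE.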


Expected results:
In my opinion, compliance shouldn't be recalculated on every fact update. Or maybe it should only be recalculated when one of the facts that Candlepin cares about has changed.
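
The idea of recalculating only when a relevant fact changes can be sketched as follows. This is an illustration of the proposal, not Candlepin's implementation (Candlepin is written in Java). The set of relevant facts is taken from the fact names listed in the retest steps in comment 10 and is not an authoritative list; the function names and the "custom.note" fact are hypothetical.

```python
# Illustrative only: fact names taken from the retest steps in comment 10.
# This is NOT Candlepin's authoritative list of compliance-relevant facts.
COMPLIANCE_FACTS = frozenset({
    "cpu.core(s)_per_socket",
    "memory.memtotal",
    "uname.machine",
    "band.storage.usage",
    "cpu.cpu_socket(s)",
    "virt.is_guest",
})


def changed_facts(old, new):
    """Return the names of facts that were added, removed, or changed.

    A fact that is absent is treated the same as a fact set to None.
    """
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


def needs_compliance_recalculation(old, new, relevant=COMPLIANCE_FACTS):
    """True only if at least one changed fact is in the relevant set."""
    return bool(changed_facts(old, new) & relevant)


# "custom.note" is a hypothetical custom fact used for the example.
before = {"uname.machine": "x86_64", "custom.note": "a"}
unrelated_update = {"uname.machine": "x86_64", "custom.note": "b"}
relevant_update = {"uname.machine": "bobby", "custom.note": "a"}

print(needs_compliance_recalculation(before, unrelated_update))
print(needs_compliance_recalculation(before, relevant_update))
```

Under these assumptions, the frequent updates to unrelated facts described above would skip the expensive recalculation, while a change to a fact such as uname.machine would still trigger it.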


Additional info:
Let me know if you need a separate bugzilla for the syspurpose compliance hash issue above.

Comment 2 Nikos Moumoulidis 2021-12-16 14:09:43 UTC
While this is something we are investigating and planning to fix, it looks to be more of a performance enhancement than a bug,
so I would not consider backporting to Satellite 6.9; fixing in 6.10+ seems more appropriate.

Comment 4 Nikos Moumoulidis 2021-12-21 15:52:13 UTC
(In reply to Hao Chang Yu from comment #0)
> ### Recalculate compliance again!! This is also a bug. It seems that
> "syspurpose compliance" and the "subscription compliance" are sharing the
> same "compliancestatushash" column in the cp_consumer table so after
> calculating the syspurpose compliance, Candlepin will replace the column
> with its digest. ###

Hi Hao,

You were right about this. There are actually 2 different columns for these, but one of them (compliancestatushash) is currently shared,
while the other is ignored. Can you please file a separate bug for that? That bug alone should be easy to fix, while the general
reduction of unnecessary compliance calculations, which needs a bit of a redesign effort, will be tracked in this bug.

Thanks,
Nikos
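
The effect of sharing one hash column between the two status types can be shown with a toy model. This is a sketch only: the class, attribute, and function names are hypothetical, and SHA-256 over JSON is an assumed stand-in for whatever digest Candlepin actually computes.

```python
import hashlib
import json


def digest(status):
    """Stable digest of a status dictionary (assumed scheme, for illustration)."""
    payload = json.dumps(status, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Consumer:
    """Toy stand-in for a consumer row; attribute names are hypothetical."""

    def __init__(self):
        self.compliance_hash = None
        self.syspurpose_hash = None


def record(consumer, status, column):
    """Store the digest in `column` and report whether it differed from
    the previously stored value (that is, whether an event would fire)."""
    new_hash = digest(status)
    changed = new_hash != getattr(consumer, column)
    setattr(consumer, column, new_hash)
    return changed


compliance = {"status": "valid", "reasons": []}
syspurpose = {"status": "not specified", "reasons": []}

# Shared column: each status type overwrites the other's digest,
# so every calculation looks like a change.
shared = Consumer()
shared_results = [
    record(shared, compliance, "compliance_hash"),
    record(shared, syspurpose, "compliance_hash"),
    record(shared, compliance, "compliance_hash"),
    record(shared, syspurpose, "compliance_hash"),
]

# Separate columns: unchanged statuses are recognised as unchanged.
separate = Consumer()
separate_results = [
    record(separate, compliance, "compliance_hash"),
    record(separate, syspurpose, "syspurpose_hash"),
    record(separate, compliance, "compliance_hash"),
    record(separate, syspurpose, "syspurpose_hash"),
]

print(shared_results)
print(separate_results)
```

In the shared case every call reports a change even though neither status moved, which matches the symptom quoted above. With one column per status type, only the first calculation of each type reports a change.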

Comment 5 Nikos Moumoulidis 2022-01-25 11:27:11 UTC
(In reply to Hao Chang Yu from comment #0)
> ### Recalculate compliance again!! This is also a bug. It seems that
> "syspurpose compliance" and the "subscription compliance" are sharing the
> same "compliancestatushash" column in the cp_consumer table so after
> calculating the syspurpose compliance, Candlepin will replace the column
> with its digest. ###

FYI I have created https://bugzilla.redhat.com/show_bug.cgi?id=2044944 and https://bugzilla.redhat.com/show_bug.cgi?id=2044946 for fixing the hash column overwrite,
and we will use this Satellite bug for a longer term effort of reducing the compliance recalculations.

Comment 6 Nikos Moumoulidis 2022-02-28 09:59:20 UTC
For fixing the sub-issue of unnecessary compliance.created event generation (but not the compliance recalculation itself), I have created the following:
https://bugzilla.redhat.com/show_bug.cgi?id=2059131
https://bugzilla.redhat.com/show_bug.cgi?id=2059135
https://bugzilla.redhat.com/show_bug.cgi?id=2059137

Comment 10 Lai 2022-06-07 08:45:41 UTC
Steps to Retest:
1. Get a client machine up and running and register to satellite (I used a capsule)
2. Enable the rhsmcertd service: systemctl enable rhsmcertd
3. Set a custom fact in /etc/rhsm/facts/capsule.fact using one of the following fact names, so that a "target=COMPLIANCE" event can be triggered (I used uname.machine: echo '{"uname.machine": "bobby"}'):

cpu.core(s)_per_socket
memory.memtotal
uname.machine
band.storage.usage
cpu.cpu_socket(s)
virt.is_guest

4. On the client, ensure the facts are updated

subscription-manager facts --update

5. Stop the rhsmcertd service. We will trigger it manually later.

systemctl stop rhsmcertd

6. On Satellite, tail the candlepin audit log

tail -f /var/log/candlepin/audit.log

7. On the client, run rhsmcertd immediately.

rhsmcertd -n

8. Wait for 1 minute and then kill the rhsmcertd process

Expected result:
There shouldn't be one compliance recalculation right after another in the same timeframe.

Actual result:
There isn't one compliance recalculation right after another in the same timeframe.

# tail -f /var/log/candlepin/audit.log
2022-06-07 04:27:20,431 principalType=trusteduser principal=foreman_admin target=SYSTEM_PURPOSE_COMPLIANCE entityId=ce3b20d2-81dc-4b85-a35e-769879a3f1b6 type=CREATED owner=8a818230812a075601812a0eee270001 eventData={"nonCompliantUsage":null,"compliantAddOns":{},"nonCompliantRole":null,"reasons":[],"nonCompliantServiceType":null,"compliantSLA":{},"nonCompliantAddOns":[],"compliantRole":{},"nonCompliantSLA":null,"compliantUsage":{},"status":"not specified","compliantServiceType":{}}
2022-06-07 04:27:20,439 principalType=trusteduser principal=foreman_admin target=COMPLIANCE entityId=ce3b20d2-81dc-4b85-a35e-769879a3f1b6 type=CREATED owner=8a818230812a075601812a0eee270001 eventData={"reasons":[],"status":"valid"}
2022-06-07 04:27:20,444 principalType=trusteduser principal=foreman_admin target=SYSTEM_PURPOSE_COMPLIANCE entityId=ce3b20d2-81dc-4b85-a35e-769879a3f1b6 type=CREATED owner=8a818230812a075601812a0eee270001 eventData={"nonCompliantUsage":null,"compliantAddOns":{},"nonCompliantRole":null,"reasons":[],"nonCompliantServiceType":null,"compliantSLA":{},"nonCompliantAddOns":[],"compliantRole":{},"nonCompliantSLA":null,"compliantUsage":{},"status":"not specified","compliantServiceType":{}}
2022-06-07 04:27:20,450 principalType=trusteduser principal=foreman_admin target=ENTITLEMENT entityId=037b97a6123d4ec3aa3ff047eb9dc7da type=CREATED owner=8a818230812a075601812a0eee270001 eventData=null
2022-06-07 04:29:41,419 principalType=trusteduser principal=foreman_admin target=CONSUMER entityId=8a81822d813d175401813d467b8c0e94 type=MODIFIED owner=8a818230812a075601812a0eee270001 eventData=null
2022-06-07 04:29:41,426 principalType=trusteduser principal=foreman_admin target=COMPLIANCE entityId=ce3b20d2-81dc-4b85-a35e-769879a3f1b6 type=CREATED owner=8a818230812a075601812a0eee270001 eventData={"reasons":[{"productName":"Red Hat Satellite Infrastructure Subscription","message":"Supports architecture aarch64,ia64,ppc,ppc64,ppc64le,s390,s390x,x86,x86_64 but the system is bobby machine."}],"status":"partial"}
2022-06-07 04:32:27,018 principalType=trusteduser principal=foreman_admin target=CONSUMER entityId=8a81822d813d175401813d467b8c0e94 type=MODIFIED owner=8a818230812a075601812a0eee270001 eventData=null
2022-06-07 04:32:27,037 principalType=trusteduser principal=foreman_admin target=COMPLIANCE entityId=ce3b20d2-81dc-4b85-a35e-769879a3f1b6 type=CREATED owner=8a818230812a075601812a0eee270001 eventData={"reasons":[{"productName":"Red Hat Satellite Infrastructure Subscription","message":"Supports architecture aarch64,ia64,ppc,ppc64,ppc64le,s390,s390x,x86,x86_64 but the system is some kind of name."}],"status":"partial"}
2022-06-07 04:40:13,635 principalType=trusteduser principal=foreman_admin target=CONSUMER entityId=8a81822d813d175401813d467b8c0e94 type=MODIFIED owner=8a818230812a075601812a0eee270001 eventData=null
2022-06-07 04:40:13,642 principalType=trusteduser principal=foreman_admin target=COMPLIANCE entityId=ce3b20d2-81dc-4b85-a35e-769879a3f1b6 type=CREATED owner=8a818230812a075601812a0eee270001 eventData={"reasons":[{"productName":"Red Hat Satellite Infrastructure Subscription","message":"Supports architecture aarch64,ia64,ppc,ppc64,ppc64le,s390,s390x,x86,x86_64 but the system is a name worth naming."}],"status":"partial"}

There are a couple of `target=COMPLIANCE` entries, but notice that at 04:40:13 there is only one, and it reflects the most recent change. The other compliance entries are from earlier setup.

Verified on 6.11 snap 23 with candlepin-4.1.13-1.el8sat.noarch on rhel7 and rhel8

Comment 13 errata-xmlrpc 2022-07-05 14:29:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498