Bug 1598001
| Summary: | Failed to expire reports when the reports table grows too large | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Hao Chang Yu <hyu> |
| Component: | Reporting | Assignee: | Lukas Zapletal <lzap> |
| Status: | CLOSED ERRATA | QA Contact: | Jan Hutař <jhutar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.3.2 | CC: | inecas, jhutar, lzap, mhulan, oprazak, pcreech, spetrosi |
| Target Milestone: | 6.4.0 | Keywords: | Triaged |
| Target Release: | Unused | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Release Note | |
| Doc Text: |
Starting from Satellite 6.4, the cron job to delete old reports is reconfigured to delete reports in batches of 1000 records with a fractional delay between tasks. This reduces the likelihood of updating workers becoming blocked.
After the upgrade, monitor the number of reports in the database and the output of the report expiration tasks. If concurrency problems occur, increase the check-in interval for both the Puppet client, which is 30 minutes by default, and RHSM, which is four hours by default. This decreases the load on Satellite Server.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-10-16 18:55:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Comment 2
Satellite Program
2018-07-04 08:29:38 UTC
Upstream bug assigned to lzap

Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/23623 has been resolved.

For googlers, this bug reports database transaction deadlocks. On heavily loaded Satellite 6 servers, incoming reports are being saved into the database while the rake task is attempting to acquire exclusive locks on three tables to delete data. One process or the other (the Satellite 6 request or the rake task) is usually kicked out.

We changed the rake task to delete data in smaller batches (configurable, by default 1k reports) and put a sleep (0.2 second) in between batches so the SQL server can process incoming requests. This should lower the number of deadlocks hitting Satellite 6 requests. It also effectively makes the expiration task SLOWER, and the task can still error out with a deadlock. This BZ does not aim to fix that completely, as it is technically not possible.

In my opinion, this kind of data (a high volume of non-critical data, that is, reports) does not belong in an SQL database. The real solution would be to store the reports outside of the relational database, or at least in a different form (normal form is suboptimal).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2927 |
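
The batching approach described in the comment above (delete a limited number of rows, commit, sleep briefly, repeat) can be illustrated with a minimal sketch. This is not the Foreman rake task itself, which is written in Ruby and touches three tables. It is a simplified Python example against a single SQLite table, and the function name `expire_reports`, the table name `reports`, and the column `reported_at` are all hypothetical. Only the defaults of 1000 rows per batch and a 0.2 second pause come from the bug report.

```python
import sqlite3
import time


def expire_reports(conn, cutoff, batch_size=1000, pause=0.2):
    """Delete rows older than `cutoff` in small batches.

    Hypothetical schema: reports(id INTEGER PRIMARY KEY, reported_at INTEGER).
    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        # Each batch runs in its own short transaction, so locks are
        # released before the next batch starts.
        with conn:
            cur = conn.execute(
                "DELETE FROM reports WHERE id IN ("
                "  SELECT id FROM reports"
                "  WHERE reported_at < ? ORDER BY id LIMIT ?)",
                (cutoff, batch_size),
            )
        deleted = cur.rowcount
        total += deleted
        if deleted < batch_size:
            # A short batch means no expired rows remain.
            return total
        # Give concurrent writers a window to proceed between batches.
        time.sleep(pause)
```

Calling `expire_reports(conn, cutoff)` on an open `sqlite3` connection removes every row whose `reported_at` is below `cutoff`. As the comment notes, the trade-off is a slower overall run in exchange for shorter lock hold times, and batching reduces but does not eliminate the chance of a deadlock on a busy server.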