Bug 226897
Summary: | Errata Processing Slowness (?) causing number of problems | ||
---|---|---|---|
Product: | Red Hat Satellite 5 | Reporter: | Máirín Duffy <duffy> |
Component: | Provisioning | Assignee: | Kevin A. Smith <ksmith> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Beth Nackashi <bnackash> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | unspecified | CC: | akrherz, averma, edsall, inode0, pyaduvan, rhn-bugs, schlegel, vgaikwad |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | rhn500h | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-03-13 13:50:15 UTC | Type: | --- |
Bug Blocks: | 166615 |
Description
Máirín Duffy
2007-02-01 21:42:09 UTC
From the specific examples above it seems that this bug does not affect whether the errata are actually scheduled / processed by auto-errata update. Rather, it seems to create an incorrect display of information in the web UI for systems that do not use auto-errata update.

One more comment! May be related to bug 201349.

More reports and details on this bug are available in the thread: https://www.redhat.com/archives/nahant-list/2007-February/msg00058.html

As mentioned in the mailing list thread, running up2date -p on the system manually seems to correct the problem in the web UI... not sure how long it is effective for, though.

I am researching this problem now and have a couple of theories:
1) Taskomatic errata processing is heavily serialized - We process errata one at a time, and one org at a time per erratum. This is obviously far from optimal and not at all how this should work. I think this is a historical vestige of the Perl version of Taskomatic and also reflects a desire not to "melt" the database when RHN had a much smaller DB server.
2) We had several problems with errata processing during the 410 release. The result was that several errata did not get autoupdate scheduling run for them. AFAIK nothing was ever done to correct this, meaning that these errata were never rescheduled. I need to research this to verify.

I am currently modifying the ErrataQueue, ErrataCache, and ErrataMail tasks to be more parallel and hopefully substantially increase throughput. After that, I will be spelunking through the data to determine if anything needs to be done.

Machines which have auto errata updates set are getting these updates. So it is not a matter of the update not being there. It is just a matter of whether the GUI is showing them as being there.

This checkin attempts to address the problems reported. Specifically, it:
* Introduces a generic threaded queue model which can be used by many Taskomatic tasks when parallelizing work processing is considered to be beneficial.
* Ports the ErrataCache and ErrataQueue tasks to use the new threaded framework. This should fix many of the problems reported, since errata processing was largely serial: the UI displayed out-of-date data until the task had processed all its work items. That delay could be sizable given the large queues which can develop during large errata releases. Each of these tasks defaults to 2 worker threads and can be configured to use more if needed.
* Ports the RepollEntitlement task to use the new threaded queues as well. I have done significant testing on webdev to ensure that repoll continues to work as expected. Future QA pushes prior to RHN 500 GA should smoke out any issues, as we have at least one more bulk repoll to do.

Taskomatic was missing updates for some systems which have auto-updates enabled. Checked in a fix for this. Should be part of the next QA push. The testing used for comments #18 and #19 should be used again.

onqa

onqa for reals

My auto-update systems are not scheduling errata. I registered a system (rlx-2-16.rhndev.redhat.com) to the iowastate account, set it to auto-update, pushed an erratum, and noticed that the erratum applies to that system. However, the update is not getting auto-scheduled.

Yeah, that's not totally unexpected, unfortunately. I just found a bug in Taskomatic a few minutes ago which could cause this task to fail before the updates have been scheduled. *sigh* Next QA push should have a good fix in it.

As I mentioned before, what is the status concerning the errata even appearing in the Web GUI? I can't schedule an update if it isn't showing up.

Hi Dave, we are currently working on the issue. The workaround is to run rhn_check manually on the affected systems - you might want to try running rhn_check in a cron script on the systems in the meantime while we work on this. Hope this helps.

OK. Fix has just been committed which should handle the DB errors.
Also, it appears that rebuilding the errata cache might fix the UI display problems as well. Running down why that would help now.

onqa

I have retested this to the best of my knowledge. Please re-open if the issue resurfaces after rhn500h is released.

Reopening the bug due to another use case which I just found. There are two basic ways a given server's errata cache can be recalculated:
1) User logs into RHN - This only works for orgs with few servers (I think the limit is fewer than 30).
2) Server's setup is changed - This generally happens when a server registers, errata are pushed to a channel to which the server is subscribed, or a server's base channel is changed.

The fix verified only addressed the 2nd use case, not the first. I have modified the fix to handle both use cases.

Suggested Test Plans:
There need to be two test plans:
1) The same testing used to verify the bug previously.
2) Simulate use case #1:
a) Find a user for an org with fewer than 30 servers.
b) Register another server.
c) Wait 15-20 mins. This gives Taskomatic time to run and do its thing.
d) Open a terminal connected to rhnjava.back-webqa.redhat.com. Watch Tomcat's log with this command: tail -f /var/log/tomcat5/catalina.out
e) Log in as the user selected in step a.
f) Verify that no errors were logged by Tomcat during the login. Inspect the server registered in step b and verify that relevant errata are applied and scheduled.

onqa

Scenario smoked out an issue with the ErrataQueue task which would cause it to not process errata records. :( Code checked in this evening addresses that issue so that _should_ fix scenario one. One thing to note is that ErrataQueue processing is not fast. It might take up to 30 mins for it to process a single erratum. You can monitor the process, though, by looking at the rhnErrataQueue table like so:
select count(*) from rhnErrataQueue;
or
select * from rhnErrataQueue;

Also the fix for ErrataQueue has increased the memory requirements for Taskomatic.
I was seeing regular OOMs with the current min/max heap sizes. I've bumped up the memory to 192 MB and hope that will be enough.

Pushed this fix to webqa.

Imported: RHSA-2007:0009
Will check on status in the morning.

I re-ran my original tests from yesterday:
Scenario 1 works.
Scenario 2 works.

Is it my imagination or is this running *a lot* faster than before?

Thanks for all your work on these issues. With any luck our problem resolution may improve things for others too. We really appreciate your efforts on this.

Closed in rhn500h Release.
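
For anyone reading this later and wondering what the "generic threaded queue model" from the checkin comment amounts to, here is a minimal sketch of the general idea: a shared work queue drained by a small pool of worker threads instead of one serial loop. This is not the Taskomatic code. It is written in Python purely for illustration, and process_in_parallel, the handler argument, and the results/errors lists are hypothetical names. The only detail taken from the comments above is the default of 2 worker threads.

```python
import queue
import threading

# Sentinel that tells a worker thread to exit (hypothetical name).
_STOP = object()


def process_in_parallel(items, handler, workers=2):
    """Run handler(item) for every item using a pool of worker threads.

    Illustrative sketch only, not Taskomatic's implementation.
    The default of 2 workers mirrors the default mentioned in this bug.
    Returns (results, errors); results are in completion order.
    """
    work = queue.Queue()
    results = []
    errors = []
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is _STOP:
                return
            try:
                outcome = handler(item)
                with lock:
                    results.append(outcome)
            except Exception as exc:
                # One failed item must not stop the rest of the queue.
                with lock:
                    errors.append((item, exc))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for item in items:
        work.put(item)
    # One stop marker per worker, queued after all the real work.
    for _ in threads:
        work.put(_STOP)
    for thread in threads:
        thread.join()
    return results, errors


if __name__ == "__main__":
    # Hypothetical handler standing in for "process one queued work item".
    def handle(work_item):
        return work_item * 2

    done, failed = process_in_parallel(range(5), handle)
    print(sorted(done), failed)
```

Catching the exception per item is a deliberate choice in this sketch: a failure on one work item is recorded and the remaining items are still processed, rather than the whole run stopping early.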