Bug 1215671 - Satellite 5.7: not all erratas scheduled for systems with Auto Errata Update enabled when cloning these erratas via Errata -> Clone Errata
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Server
Version: 570
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Stephen Herr
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On:
Blocks: sat570-triage 1305157
Reported: 2015-04-27 12:44 UTC by Jan Hutař
Modified: 2019-09-12 08:25 UTC (History)

Fixed In Version: spacewalk-java-2.3.8-104 satellite-schema-5.7.0.15 spacewalk-schema-2.3.2-17
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1305157
Environment:
Last Closed: 2015-06-01 13:05:55 UTC
Target Upstream Version:
Embargoed:


Attachments
Helper script I'm using to clone channel or do some cleanup (4.88 KB, text/plain)
2015-04-27 12:49 UTC, Jan Hutař


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1305157 | 1 | None | None | None | 2021-01-20 06:05:38 UTC
Red Hat Product Errata | RHBA-2015:1040 | 0 | normal | SHIPPED_LIVE | Red Hat Satellite bug fix update | 2015-06-01 17:05:06 UTC

Internal Links: 1305157

Description Jan Hutař 2015-04-27 12:44:07 UTC
Description of problem:
Not all erratas get scheduled for systems with Auto Errata Update enabled when cloning these erratas via Errata -> Clone Errata


Version-Release number of selected component (if applicable):
satellite-schema-5.7.0.14-1.el6sat.noarch
spacewalk-java-2.3.8-102.el6sat.noarch
spacewalk-backend-2.3.3-25.el6sat.noarch
spacewalk-pxt-2.3.2-32.el6sat.noarch
(This is not a regression; I have seen it with packages from before errata 2015:0887.)


How reproducible:
often


Steps to Reproduce:
1. Create custom channel with 10 custom erratas (1 package per errata)
2. Register a system to it and install 10 OLD packages, so that the errata
   become applicable
3. Register that system 100 times
4. Clone the custom channel in original state (i.e. no erratas)
5. Subscribe all these 100 systems into new cloned channel
6. Make sure there are no applicable updates for any of the systems
   and that no system has any errata update action/event scheduled
   and pending
7. Clone the 10 errata from the original channel to the cloned one using
   Errata -> Clone Errata
   This involves a lot of clicking; when I did it through the API, I could
   not reproduce the problem:
   7.1. In Errata -> Manage Errata select 10 erratas and click [Clone]
   7.2. Select target channel (cloned one you have created in step "4.")
   7.3. Open each errata and click [Publish Errata], then select package
        to clone and confirm
8. Wait for both "Errata Cache:" and "Errata Notification Queue:" on
   Admin -> Task Engine Status to be "FINISHED" (should take less than ~2 minutes;
   I would consider 5 minutes a FAIL)
9. Check that all the systems have:
   On main "Details" page:
     System Status: Software Updates Available Non-Critical: 10 Packages: 10
   On Events -> Pending:
     There should be 10 pending events
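
The first half of step 9 could also be checked through the XML-RPC API instead of opening 100 system pages. The following is only a minimal sketch: it assumes `client` is an `xmlrpc.client.ServerProxy` pointed at the Satellite's `/rpc/api` endpoint and `key` is a session key from `auth.login`. The function name and the `expected` parameter are made up for illustration. It counts applicable errata via `system.getRelevantErrata` only; it does not inspect pending events.

```python
# Hypothetical helper, not part of the attached script.
# Setup, assuming the usual Satellite XML-RPC endpoint:
#   import xmlrpc.client
#   client = xmlrpc.client.ServerProxy("https://<fqdn>/rpc/api")
#   key = client.auth.login("<user>", "<pass>")

def systems_with_unexpected_errata_count(client, key, system_ids, expected=10):
    """Return {system_id: actual_count} for systems whose number of
    applicable errata differs from `expected`."""
    unexpected = {}
    for sid in system_ids:
        # system.getRelevantErrata returns one entry per applicable erratum
        count = len(client.system.getRelevantErrata(key, sid))
        if count != expected:
            unexpected[sid] = count
    return unexpected
```

With the system IDs from the reproducer this might be called as `systems_with_unexpected_errata_count(client, key, range(1000010000, 1000010100))`; an empty result means every system reports 10 applicable errata.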


Actual results:
NOTE: These are default Taskomatic settings:

  errata-cache-default 	0 * * * * ?
  errata-queue-default 	0 * * * * ?

In 2 of 5 attempts I got an incomplete set of errata update actions scheduled (once 9, once 8 of 10). I was not able to find a smaller reproducer to see whether publishing errata or pushing errata packages across a minute boundary is the culprit.

When I set:

  errata-cache-default 	0/60 * * * * ?
  errata-queue-default 	3/60 * * * * ?

2 of 2 attempts were OK.

With:

  errata-cache-default 	0/60 * * * * ?
  errata-queue-default 	57/60 * * * * ?

1 of 1 attempts failed (0 events out of 10 scheduled). I also got a traceback, but that might be my own doing, as I delete the cloned channel and cloned errata before each retest.
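
To make the three schedule variants above easier to compare, here is a small sketch that computes how many seconds after the Errata Cache run the Errata Queue run fires within a minute. It is not Quartz's own parser: it assumes the usual start/step reading of the seconds field (so `3/60` means "second 3 only"), and the function names are made up for illustration.

```python
def seconds_field_fires(field):
    """Seconds (0-59) at which a Quartz-style seconds field fires.
    Handles only '*', a plain number, and 'start/step'."""
    if field == "*":
        return list(range(60))
    if "/" in field:
        start, step = field.split("/")
        first = 0 if start == "*" else int(start)
        return list(range(first, 60, int(step)))
    return [int(field)]


def queue_delay_after_cache(cache_expr, queue_expr):
    """Seconds between the first Errata Cache firing in a minute and the
    next Errata Queue firing, wrapping around the minute boundary."""
    cache_sec = seconds_field_fires(cache_expr.split()[0])[0]
    queue_sec = seconds_field_fires(queue_expr.split()[0])[0]
    return (queue_sec - cache_sec) % 60


for cache, queue in [
    ("0 * * * * ?", "0 * * * * ?"),        # defaults
    ("0/60 * * * * ?", "3/60 * * * * ?"),
    ("0/60 * * * * ?", "57/60 * * * * ?"),
]:
    print(cache, "|", queue, "->", queue_delay_after_cache(cache, queue))
```

Under that reading the three setups put the Errata Queue run 0, 3 and 57 seconds after the Errata Cache run. The sketch says nothing about how long either task takes to finish.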


Expected results:
I should get 10 errata update actions every time


Additional info:
Issue discovered while working on bug 1205306 (which might be relevant)

Comment 2 Jan Hutař 2015-04-27 12:49:58 UTC
Created attachment 1019321 [details]
Helper script I'm using to clone channel or do some cleanup

Setup:

$ ./bz1205306.py <user> <pass> <fqdn> parent-twxqq $( seq 1000010000 1000010099 | tr "\n" ',' | sed 's/,$//' )

Cleanup:

$ ./bz1205306.py <user> <pass> <fqdn> parent-twxqq $( seq 1000010000 1000010099 | tr "\n" ',' | sed 's/,$//' ) --cleanup-only

Comment 4 Stephen Herr 2015-04-27 18:45:53 UTC
I think that there are two root causes for this bad behavior.

One is that there are two ways of doing asynchronous errata cache updates: through taskomatic (the normal way), and the way the "Publish Erratum" process happens to use, which is to schedule a Java MessageQueue event inside tomcat to be picked up later. The Errata Queue taskomatic task, which is what schedules auto-errata-updates, can only see things that are actually written to the database (like the queued errata-cache taskomatic tasks) and cannot see things that exist only in tomcat's MessageQueue. So when Errata Queue goes to schedule the auto-errata-update actions, it doesn't realize that there are errata cache changes pending and doesn't know it has to wait for them.

Problem 2 is that the Errata Queue task currently waits only for "update errata cache for entire channel" type Errata Cache actions. There are two other possible types of Errata Cache update: "update errata cache for this server" and "update errata cache for this erratum". Publishing an erratum into a channel triggers the third type. So even if we fixed problem #1, so that the pending task information is actually written to the database and visible across processes, Errata Queue would still need to be updated to pay attention to all types of Errata Cache updates.
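
The two problems can be illustrated with a toy model. This is not Spacewalk code: the class, constants and function names below are all hypothetical, and the "complete" check is only a statement of what would need to be visible, not something a separate process could actually evaluate against tomcat's memory.

```python
from dataclasses import dataclass, field

# The three kinds of Errata Cache update described above (names are made up).
CHANNEL, SERVER, ERRATUM = "channel", "server", "erratum"


@dataclass
class PendingWork:
    # Queued taskomatic tasks: written to the database, visible to Errata Queue.
    db_tasks: list = field(default_factory=list)
    # Events in tomcat's in-memory MessageQueue: invisible to taskomatic.
    tomcat_queue: list = field(default_factory=list)


def old_check_sees_pending(work):
    """Models the behavior described above: only channel-wide updates
    stored in the database make Errata Queue wait."""
    return any(kind == CHANNEL for kind in work.db_tasks)


def complete_check_sees_pending(work):
    """What a complete check would have to cover: every update type,
    wherever it is queued."""
    return bool(work.db_tasks) or bool(work.tomcat_queue)


# "Publish Erratum" queues a per-erratum update inside tomcat only.
after_publish = PendingWork(tomcat_queue=[ERRATUM])
print(old_check_sees_pending(after_publish))       # False: problem 1
print(complete_check_sees_pending(after_publish))  # True

# Even if that update were written to the database, the type is ignored.
persisted = PendingWork(db_tasks=[ERRATUM])
print(old_check_sees_pending(persisted))           # False: problem 2
```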

I will attempt to make this process work properly in every case. Not scheduling auto-errata updates is a serious problem.

Comment 5 Stephen Herr 2015-04-29 19:57:34 UTC
I have decided to fix this problem by moving the auto-errata update scheduling out into its own taskomatic task that runs by default once every 10 minutes. This way we are no longer hindered by the run-once nature of Errata Queue that requires all the proper ducks to be lined up for the auto-errata scheduling to work correctly.

As its own Auto Errata task, it will just schedule auto errata updates as they become available. So whenever the Errata Cache taskomatic task or the java MessageQueue errata cache updates get around to finishing, the next time the Auto Errata taskomatic task runs it will schedule the auto errata update actions.

The logic that drives the new Auto Errata task is to schedule update actions for:
 * servers that have auto errata updates turned on
 * servers that have errata updates that apply to them (as specified by the errata cache, the thing that drives "X errata available" type information in the web ui)
with two exceptions:
 * don't schedule yet if the channel's yum metadata is still regenerating (as yum would not be able to find the rpms yet if it tried to update)
 * don't schedule if we have ever had another errata update action for this errata / server combination

Note the last bullet, which is equivalent to current behavior and necessary so that we don't get into schedule-update, fail-for-some-reason, schedule-update-again loops. If an update fails, the admin will need to fix whatever was wrong and then manually schedule the update again; it will only ever be scheduled automatically once.
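
The selection rules listed above can be sketched as a filter. This is an illustration in Python, not the committed Java/SQL: the `Server` class, the function name and the data shapes are assumptions, and it simplifies by giving each server a single channel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    id: int
    auto_errata_update: bool
    channel: str  # simplification: one channel per server


def auto_errata_candidates(servers, errata_cache, regenerating_channels,
                           action_history):
    """Return (server_id, erratum_id) pairs to schedule.

    errata_cache:          dict of server_id -> set of applicable erratum ids
    regenerating_channels: set of channels whose yum metadata is rebuilding
    action_history:        set of (server_id, erratum_id) pairs that have
                           ever had an errata update action, whatever its result
    """
    to_schedule = []
    for server in servers:
        if not server.auto_errata_update:
            continue  # auto errata updates are turned off
        if server.channel in regenerating_channels:
            continue  # not yet: yum could not find the rpms
        for erratum in sorted(errata_cache.get(server.id, ())):
            if (server.id, erratum) in action_history:
                continue  # scheduled once already, never retried automatically
            to_schedule.append((server.id, erratum))
    return to_schedule


servers = [Server(1, True, "clone"), Server(2, False, "clone"),
           Server(3, True, "busy")]
cache = {1: {"E-1", "E-2"}, 2: {"E-1"}, 3: {"E-1"}}
print(auto_errata_candidates(servers, cache, {"busy"}, {(1, "E-1")}))
```

In this made-up data only server 1 with erratum `E-2` qualifies: server 2 has auto updates off, server 3's channel is still regenerating, and `E-1` was already scheduled once for server 1.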

Comment 7 Stephen Herr 2015-04-29 20:16:24 UTC
Committing to Spacewalk master:
d4c2840997c8fb3f8fdc670a2f36c53bf7eec9ae
8fa6aa316ec91877394cc3e0278536a238b24751

Spacewalk builds:
spacewalk-java-2.4.9-1
spacewalk-schema-2.4.5-1

Comment 16 errata-xmlrpc 2015-06-01 13:05:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1040.html

