Bug 1274282 - taskomatic cobbler-sync-bunch fails twice a day, bunch run takes about 6 minutes for 70 profiles in cobbler
taskomatic cobbler-sync-bunch fails twice a day, bunch run takes about 6 minu...
Status: CLOSED ERRATA
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Other (Show other bugs)
570
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: Grant Gainey
Red Hat Satellite QA List
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2015-10-22 08:17 EDT by Jan Hutař
Modified: 2016-01-11 10:19 EST (History)
6 users (show)

See Also:
Fixed In Version: spacewalk-java-2.3.8-123-sat
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-11 10:19:43 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Jan Hutař 2015-10-22 08:17:01 EDT
Description of problem:
Taskomatic cobbler-sync-bunch fails about twice a day. IMO real issue here is that bunch run takes about 6 minutes with 70 profiles in cobbler. 


Version-Release number of selected component (if applicable):
spacewalk-taskomatic-2.3.8-117.el6sat.noarch


How reproducible:
always


Steps to Reproduce:
1. Create profile
2. Wait for cobbler-sync-bunch to finish
3. Delete the profile
4. Watch Taskomatic and Cobbler log
   # tail -f /var/log/rhn/rhn_taskomatic_daemon.log /var/log/cobbler/cobbler.log


Actual results:
It takes about 6 minutes to run the bunch although only one profile was deleted


Expected results:
Should be quick. It seems suspicious to me, that lots of profiles gets updated


Additional info:
Discussed this with tlestach
Comment 5 Pavel Studeník 2015-10-22 09:00:03 EDT
I suppose that the bug 1267546 is same problem.
Comment 6 Jan Dobes 2015-11-12 12:26:00 EST
I noticed following:

spaceschema=# select count(*) from rhnKSData;
 count 
-------
    97
(1 row)

spaceschema=# select count(*) from rhnKSData where created != modified;
 count 
-------
     0
(1 row)
Comment 7 Jan Dobes 2015-11-12 13:10:41 EST
It's reproducible on Spacewalk nightly too. I see at least two problems there:

1. modified column in rhnKSData is never updated. When cobbler status changes since last sync, we start to sync profiles and because the timestamp of profile from cobbler will be always newer than value we store in table, we sync always all profiles we have.

2. The task will load local profiles a profile labels from cobbler. When we delete some profile, task will crash later when it tries to get this task from cobbler. And I think problem described in bug 1267546 is similar, we are trying to manipulate with loaded cobbler object which was deleted from database. I need to investigate more this part.
Comment 8 Jan Dobes 2015-12-04 10:27:06 EST
about the modified column:

In bunch java code we explicitly call setModified(new Date()) method, however in Hibernate mapping file is updating of this column disabled. And because no other change will happen on this table, neither trigger updating modified column will run.

1. When I enable updating of modified column in Hibernate mapping, all rows in rhnKSData have updated modified column by trigger but the value is same for all rows (time when bunch started) => not valid.

2. When I do 1. and then remove trigger, it finally works correctly but then no updates or inserts into rhnKSData table will update modified date.
Comment 9 Grant Gainey 2015-12-08 16:27:39 EST
The workflow for the CobblerSYncTask is essentially the following:

- wake up
- ask cobbler the last time *anything* changed - that's the last_modified_time() API call
- sync distributions if we need to
- update profiles' kickstart trees if we need to
- see if last_modified_time is more recent than the last time we executed
- if so - let the CobblerProfileSyncCommand figure out what actually changed, and sync whatever it was
- update write down the last-time-we-executed, and exit

Inside the CobblerProfileSyncCommand, we ask cobbler for all the profiles (get_profiles()), and compare each one's 'mtime' with the getModified() of its matching Hibernated DB entity. This is what gets broken by setModified() not ever being allowed to hit the DB, and it's why, if *anything* changes, we end up syncing *everything* - it's because everything's getModified() is always equal to its created(), instead of the last time we sync'd it.

This has been happening forever. The problem is, setModified() calls don't do what the author(s) of CobblerProfileSyncCommand thought they did, and never have. Generally, in Spacewalk, the MODIFIED column in a table exists to track when *any other columns in that row of the table* have changed. If we want a "LAST_SYNCD" column, we need to add and use it - rather than trying to rely on a subtle change to the semantics of MODIFIED. Otherwise, as noted by jdobes in #c8, we end up breaking the semantics of the db-trigger.

Investigation continues on the proper way to address this behavior.

The final issue (or the first, actually) is the actual failure reported. There is certainly a timing-window here - if we sync a lot of profiles, and one of them gets removed from cobbler while we're in the middle of the loop in CobblerProfileSyncCommand, it causes problems. I'm proposing that we deal with that by teaching the code to know about the possibility, catch and log the exception, and continue, rather than the current panic-abort behavior.
Comment 11 Grant Gainey 2015-12-09 14:36:37 EST
The "can't find this profile" is caught and reported from inside CobblerCommand.invokedXMLRPC(), where it logs a stacktrace and converts the XmlRpcFault into a RuntimeException

Rather than potentially destabilize every caller of cobbler, the fix will be to look for the RuntimeException, and IFF its getCause() is an XmlRpcFault, whose message includes 'unknown profile name', simply log that it happened and continue (rather than the current, abort-the-current-sync-run behavior)

This results in an rhn_taskomatic_daemon.log entry that looks something like so:

===
INFO   | jvm 1    | 2015/12/09 14:25:18 | 2015-12-09 14:25:18,446 [DefaultQuartzScheduler_Worker-1] ERROR com.redhat.rhn.manager.kickstart.cobbler.CobblerCommand - Error calling cobbler.
INFO   | jvm 1    | 2015/12/09 14:25:18 | redstone.xmlrpc.XmlRpcFault: <class 'cobbler.cexceptions.CX'>:'internal error, unknown profile name f21-cloud-x86_64-profile-delete-again_15:1:SpacewalkDefaultOrganization'
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at redstone.xmlrpc.XmlRpcClient.handleResponse(XmlRpcClient.java:443)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at redstone.xmlrpc.XmlRpcClient.endCall(XmlRpcClient.java:376)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at redstone.xmlrpc.XmlRpcClient.invoke(XmlRpcClient.java:165)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at com.redhat.rhn.manager.kickstart.cobbler.CobblerXMLRPCHelper.invokeMethod(CobblerXMLRPCHelper.java:69)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at com.redhat.rhn.manager.kickstart.cobbler.CobblerCommand.invokeXMLRPC(CobblerCommand.java:100)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at com.redhat.rhn.manager.kickstart.cobbler.CobblerCommand.invokeXMLRPC(CobblerCommand.java:115)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at com.redhat.rhn.manager.kickstart.cobbler.CobblerProfileSyncCommand.syncProfileToSpacewalk(CobblerProfileSyncCommand.java:154)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at com.redhat.rhn.manager.kickstart.cobbler.CobblerProfileSyncCommand.store(CobblerProfileSyncCommand.java:109)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at com.redhat.rhn.taskomatic.task.CobblerSyncTask.execute(CobblerSyncTask.java:123)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at com.redhat.rhn.taskomatic.task.RhnJavaJob.execute(RhnJavaJob.java:89)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at com.redhat.rhn.taskomatic.TaskoJob.execute(TaskoJob.java:185)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
INFO   | jvm 1    | 2015/12/09 14:25:18 | 2015-12-09 14:25:18,448 [DefaultQuartzScheduler_Worker-1] ERROR com.redhat.rhn.manager.kickstart.cobbler.CobblerProfileSyncCommand - Cobbler doesn't know about this profile any more!
===

and the task continues.

spacewalk.github: 0460c4de81026cbd4dfeaf922bb52b1469825eb8
Comment 16 errata-xmlrpc 2016-01-11 10:19:43 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0023.html

Note You need to log in before you can comment on or make changes to this bug.