Red Hat Bugzilla – Bug 1274282
taskomatic cobbler-sync-bunch fails twice a day, bunch run takes about 6 minutes for 70 profiles in cobbler
Last modified: 2016-01-11 10:19:43 EST
Description of problem:
Taskomatic cobbler-sync-bunch fails about twice a day. IMO real issue here is that bunch run takes about 6 minutes with 70 profiles in cobbler.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create profile
2. Wait for cobbler-sync-bunch to finish
3. Delete the profile
4. Watch Taskomatic and Cobbler log
# tail -f /var/log/rhn/rhn_taskomatic_daemon.log /var/log/cobbler/cobbler.log
It takes about 6 minutes to run the bunch although only one profile was deleted
Should be quick. It seems suspicious to me, that lots of profiles gets updated
Discussed this with tlestach
I suppose that the bug 1267546 is same problem.
I noticed following:
spaceschema=# select count(*) from rhnKSData;
spaceschema=# select count(*) from rhnKSData where created != modified;
It's reproducible on Spacewalk nightly too. I see at least two problems there:
1. modified column in rhnKSData is never updated. When cobbler status changes since last sync, we start to sync profiles and because the timestamp of profile from cobbler will be always newer than value we store in table, we sync always all profiles we have.
2. The task will load local profiles a profile labels from cobbler. When we delete some profile, task will crash later when it tries to get this task from cobbler. And I think problem described in bug 1267546 is similar, we are trying to manipulate with loaded cobbler object which was deleted from database. I need to investigate more this part.
about the modified column:
In bunch java code we explicitly call setModified(new Date()) method, however in Hibernate mapping file is updating of this column disabled. And because no other change will happen on this table, neither trigger updating modified column will run.
1. When I enable updating of modified column in Hibernate mapping, all rows in rhnKSData have updated modified column by trigger but the value is same for all rows (time when bunch started) => not valid.
2. When I do 1. and then remove trigger, it finally works correctly but then no updates or inserts into rhnKSData table will update modified date.
The workflow for the CobblerSYncTask is essentially the following:
- wake up
- ask cobbler the last time *anything* changed - that's the last_modified_time() API call
- sync distributions if we need to
- update profiles' kickstart trees if we need to
- see if last_modified_time is more recent than the last time we executed
- if so - let the CobblerProfileSyncCommand figure out what actually changed, and sync whatever it was
- update write down the last-time-we-executed, and exit
Inside the CobblerProfileSyncCommand, we ask cobbler for all the profiles (get_profiles()), and compare each one's 'mtime' with the getModified() of its matching Hibernated DB entity. This is what gets broken by setModified() not ever being allowed to hit the DB, and it's why, if *anything* changes, we end up syncing *everything* - it's because everything's getModified() is always equal to its created(), instead of the last time we sync'd it.
This has been happening forever. The problem is, setModified() calls don't do what the author(s) of CobblerProfileSyncCommand thought they did, and never have. Generally, in Spacewalk, the MODIFIED column in a table exists to track when *any other columns in that row of the table* have changed. If we want a "LAST_SYNCD" column, we need to add and use it - rather than trying to rely on a subtle change to the semantics of MODIFIED. Otherwise, as noted by jdobes in #c8, we end up breaking the semantics of the db-trigger.
Investigation continues on the proper way to address this behavior.
The final issue (or the first, actually) is the actual failure reported. There is certainly a timing-window here - if we sync a lot of profiles, and one of them gets removed from cobbler while we're in the middle of the loop in CobblerProfileSyncCommand, it causes problems. I'm proposing that we deal with that by teaching the code to know about the possibility, catch and log the exception, and continue, rather than the current panic-abort behavior.
The "can't find this profile" is caught and reported from inside CobblerCommand.invokedXMLRPC(), where it logs a stacktrace and converts the XmlRpcFault into a RuntimeException
Rather than potentially destabilize every caller of cobbler, the fix will be to look for the RuntimeException, and IFF its getCause() is an XmlRpcFault, whose message includes 'unknown profile name', simply log that it happened and continue (rather than the current, abort-the-current-sync-run behavior)
This results in an rhn_taskomatic_daemon.log entry that looks something like so:
INFO | jvm 1 | 2015/12/09 14:25:18 | 2015-12-09 14:25:18,446 [DefaultQuartzScheduler_Worker-1] ERROR com.redhat.rhn.manager.kickstart.cobbler.CobblerCommand - Error calling cobbler.
INFO | jvm 1 | 2015/12/09 14:25:18 | redstone.xmlrpc.XmlRpcFault: <class 'cobbler.cexceptions.CX'>:'internal error, unknown profile name f21-cloud-x86_64-profile-delete-again_15:1:SpacewalkDefaultOrganization'
INFO | jvm 1 | 2015/12/09 14:25:18 | at redstone.xmlrpc.XmlRpcClient.handleResponse(XmlRpcClient.java:443)
INFO | jvm 1 | 2015/12/09 14:25:18 | at redstone.xmlrpc.XmlRpcClient.endCall(XmlRpcClient.java:376)
INFO | jvm 1 | 2015/12/09 14:25:18 | at redstone.xmlrpc.XmlRpcClient.invoke(XmlRpcClient.java:165)
INFO | jvm 1 | 2015/12/09 14:25:18 | at com.redhat.rhn.manager.kickstart.cobbler.CobblerXMLRPCHelper.invokeMethod(CobblerXMLRPCHelper.java:69)
INFO | jvm 1 | 2015/12/09 14:25:18 | at com.redhat.rhn.manager.kickstart.cobbler.CobblerCommand.invokeXMLRPC(CobblerCommand.java:100)
INFO | jvm 1 | 2015/12/09 14:25:18 | at com.redhat.rhn.manager.kickstart.cobbler.CobblerCommand.invokeXMLRPC(CobblerCommand.java:115)
INFO | jvm 1 | 2015/12/09 14:25:18 | at com.redhat.rhn.manager.kickstart.cobbler.CobblerProfileSyncCommand.syncProfileToSpacewalk(CobblerProfileSyncCommand.java:154)
INFO | jvm 1 | 2015/12/09 14:25:18 | at com.redhat.rhn.manager.kickstart.cobbler.CobblerProfileSyncCommand.store(CobblerProfileSyncCommand.java:109)
INFO | jvm 1 | 2015/12/09 14:25:18 | at com.redhat.rhn.taskomatic.task.CobblerSyncTask.execute(CobblerSyncTask.java:123)
INFO | jvm 1 | 2015/12/09 14:25:18 | at com.redhat.rhn.taskomatic.task.RhnJavaJob.execute(RhnJavaJob.java:89)
INFO | jvm 1 | 2015/12/09 14:25:18 | at com.redhat.rhn.taskomatic.TaskoJob.execute(TaskoJob.java:185)
INFO | jvm 1 | 2015/12/09 14:25:18 | at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
INFO | jvm 1 | 2015/12/09 14:25:18 | at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
INFO | jvm 1 | 2015/12/09 14:25:18 | 2015-12-09 14:25:18,448 [DefaultQuartzScheduler_Worker-1] ERROR com.redhat.rhn.manager.kickstart.cobbler.CobblerProfileSyncCommand - Cobbler doesn't know about this profile any more!
and the task continues.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.