Description of problem:

At one of our GEOs we have a lab controller that seems to be consuming swap space more aggressively than other sites. Amount of swap consumed at different sites by the beaker-transfer process:

Site 1: 134 MB
Site 2 (our largest site): 2.5 MB
Site 3: 83 MB
Site 4: 10 MB
Site 5 (where it is more aggressively taking swap): 200+ MB

Version-Release number of selected component (if applicable):

# rpm -qa | egrep beaker
beaker-lab-controller-23.1-1.el6eng.noarch
beaker-client-23.1-1.el6eng.noarch
beaker-common-23.1-1.el6eng.noarch
beaker-lab-controller-addDistro-23.1-1.el6eng.noarch
beaker-redhat-0.1.61-1.el6eng.noarch

How reproducible:

Seems to happen over a week's worth of time, according to the following:

Date         Amount of swap consumed
August 5th   200+ MB
August 12th  System was rebooted for routine maintenance
August 17th  295 MB -- Pid: 1867 (beaker-transfer) SUM: 302192 kB (295.109 MB)
August 19th  87 MB (did not need a restart, but restarted anyway because of the weekend) -- Pid: 20127 (beaker-transfer) SUM: 89160 kB (87.0703 MB)

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
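For reference, the per-process "SUM: ... kB" figures above look like a sum of the Swap: fields in /proc/<pid>/smaps. This is only an assumption about how they were gathered; a minimal sketch of that calculation (Linux only, needs permission to read the target process's smaps):

import sys

def swap_kb(pid):
    total = 0
    with open('/proc/%s/smaps' % pid) as smaps:
        for line in smaps:
            if line.startswith('Swap:'):
                total += int(line.split()[1])   # each Swap: value is in kB
    return total

if __name__ == '__main__':
    kb = swap_kb(sys.argv[1])
    print('Pid: %s SUM: %d kB (%.3f MB)' % (sys.argv[1], kb, kb / 1024.0))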
(In reply to matt jia from comment #2)
> I can confirm that the repeating errors in the transfer.log is a beaker bug.
> The recipe.files() returns duplicate logs which is unexpected.

I have filed a separate bug for this: https://bugzilla.redhat.com/show_bug.cgi?id=1368868
So we have a few different things going on here...

As Matt noted, the "OSError: [Errno 17] File exists" errors are due to duplicate identically named files in a recipe, which we have seen a few times before although it is supposed to be impossible: bug 963492. The problematic recipe ids are 483436 (from 2012!) and 839972 (from 2013).

The side effect of that bug is that beaker-transfer goes into a busy loop, continually trying to move the files for those recipes and failing -- however it will also move the files for all other recipes, so it does make progress in spite of the error. We only have transfer logs going back a few days, and the "File exists" error has been repeating in a busy loop for those two recipes as far back as the logs go. It is entirely possible that the error has been repeating non-stop for the last 4 years(!) and nobody noticed -- or, more likely, some other bug was previously causing those old recipes to be ignored by beaker-transfer, and it only recently started picking them up and has now hit the problem.

However, the fact that we are also seeing the memory issues on a different LC (which isn't hitting the "File exists" error) indicates that the busy loop is probably not the cause. So we can continue investigation and mitigation for that on bug 963492.

If we are seeing gradually increasing memory consumption from the beaker-transfer process, it indicates we have some (unrelated) small memory leak, probably introduced in Beaker 23.0...
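To illustrate the busy-loop-with-progress behaviour described above, here is a simplified sketch only (not the real beaker-transfer code): a polling pass that moves each recipe's log files. If a recipe lists the same file name twice, the second os.link() raises OSError [Errno 17] File exists, so that recipe fails on every pass -- but the loop still moves every other recipe's files.

import os

def transfer_pass(recipes, dest_root):
    # recipes maps recipe id -> list of local log file paths (hypothetical shape)
    for recipe_id, log_paths in recipes.items():
        try:
            dest_dir = os.path.join(dest_root, str(recipe_id))
            if not os.path.isdir(dest_dir):
                os.makedirs(dest_dir)
            for path in log_paths:
                dest = os.path.join(dest_dir, os.path.basename(path))
                os.link(path, dest)   # raises EEXIST for a duplicate file name
                os.unlink(path)
        except OSError as e:
            # The failed recipe is simply retried on the next polling pass,
            # so a permanently broken recipe means a permanent busy loop.
            print('failed to transfer recipe %s: %s' % (recipe_id, e))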
If it is indeed a regression in beaker-transfer 22.3->23.0 there are only a few possible patches which could be the culprit:

https://git.beaker-project.org/cgit/beaker/commit/?id=429ca1d9effa7e000de2745846d1f6f8dcd5518e
https://git.beaker-project.org/cgit/beaker/commit/?id=bf385dc54a8c12b03b82f4e056e058af47468faf
https://git.beaker-project.org/cgit/beaker/commit/?id=beb159d257da6e06326f2b16048feb0dc9fb8bc2
The other slightly suspicious thing is that in 23.0, every iteration says:

Aug 23 14:27:06 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO Creating new session...
Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO New session created.
Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.labcontroller.proxy DEBUG Polling for recipes to be transferred

which indicates that it's reauthenticating every 20 seconds -- but it shouldn't need to do that, the session is kept alive by each request unless something goes wrong.
(In reply to Dan Callaghan from comment #8)
> The other slightly suspicious thing is that in 23.0, every iteration says:
>
> Aug 23 14:27:06 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO
> Creating new session...
> Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO New
> session created.
> Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.labcontroller.proxy DEBUG
> Polling for recipes to be transferred
>
> which indicates that it's reauthenticating every 20 seconds -- but it
> shouldn't need to do that, the session is kept alive by each request unless
> something goes wrong.

Yes, that does not sound right to me. Previously we only renewed the session if the user was not logged in. I think this is a regression in https://git.beaker-project.org/cgit/beaker/commit/?id=bf385dc54a8c12b03b82f4e056e058af47468faf
I wonder if the memory leak could be due to python-krbV bug 1312061 (AuthContext leaked due to a missing decref), combined with the fact that we are now re-authenticating every 20 seconds because of the unintentional change introduced in 23.0. It means we will be constructing many more AuthContext instances than before. I'm still not sure that's enough to cause seriously excessive memory consumption, though -- it's only one leaked AuthContext instance (surely no more than a few KB?) every 20 seconds, which works out to 4320 leaked instances per day...
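For a rough sense of scale -- the bytes-per-leak figure below is a guess, not a measurement:

# Back-of-the-envelope only: how much could one leaked AuthContext per login add up to?
SECONDS_PER_DAY = 24 * 60 * 60
POLL_INTERVAL = 20                                  # seconds between re-authentications
leaks_per_day = SECONDS_PER_DAY / POLL_INTERVAL     # = 4320

assumed_kb_per_leak = 4                             # pure guess at per-login overhead
print('%d leaks/day ~= %.1f MB/day' % (leaks_per_day, leaks_per_day * assumed_kb_per_leak / 1024.0))

# For comparison, the figures in the bug description (Aug 12 reboot -> Aug 17,
# ~295 MB of swap) work out to roughly 59 MB/day, which would need ~14 kB per
# iteration if this leak were the only contributor.
print('observed: %.1f MB/day' % (295.0 / 5))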
(In reply to Dan Callaghan from comment #8)
> The other slightly suspicious thing is that in 23.0, every iteration says:
>
> Aug 23 14:27:06 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO
> Creating new session...
> Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO New
> session created.
> Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.labcontroller.proxy DEBUG
> Polling for recipes to be transferred
>
> which indicates that it's reauthenticating every 20 seconds -- but it
> shouldn't need to do that, the session is kept alive by each request unless
> something goes wrong.

The weird thing is this does not happen in beaker-provision.
beaker-transfer calls _login on every polling iteration; the other two don't. Another reason why this mess needs cleaning up.
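A simplified sketch of the difference (not the actual daemon code; hub, poll_transfers and poll_commands are stand-ins for the real HubProxy and polling methods):

import time

POLL_INTERVAL = 20  # seconds

def poll_transfers(hub): pass   # placeholder for the real polling work
def poll_commands(hub): pass    # placeholder for the real polling work

def transfer_style_loop(hub):
    while True:
        hub._login()            # a fresh Kerberos login every iteration
        poll_transfers(hub)
        time.sleep(POLL_INTERVAL)

def provision_style_loop(hub):
    hub._login()                # authenticate once up front
    while True:
        poll_commands(hub)      # each request keeps the session alive
        time.sleep(POLL_INTERVAL)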
I am going to repurpose this bug to cover the excessive Kerberos logins every 20 seconds. We still have not proven that they are actually causing the memory growth we are seeing -- but regardless it's something we should fix, and it seems a plausible cause. If I can get a core dump I can probably confirm it either way. If the memory leak turns out to be caused by something else, we can open a new bug for it.
This is a regression in 23.0.
(In reply to Dan Callaghan from comment #12)
> beaker-transfer calls _login on every polling iteration, the other two
> don't. Another reason why this mess needs cleaning up.

Actually beaker-watchdog does too; beaker-provision is the odd one out. I think this matches what we are seeing (both beaker-transfer and beaker-watchdog show gradually increasing memory usage).
http://gerrit.beaker-project.org/5167 beaker-watchdog: re-authenticate if session has expired
http://gerrit.beaker-project.org/5168 beaker-transfer: re-authenticate if session has expired
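A minimal sketch of the pattern the two patches above describe (not the literal patch contents): authenticate once, and only log in again when a poll fails with an authentication error, instead of logging in before every poll. poll_transfers and is_auth_fault are hypothetical stand-ins.

import time
import xmlrpclib  # Python 2, as on the RHEL 6 lab controllers

POLL_INTERVAL = 20  # seconds

def poll_transfers(hub): pass   # placeholder for the real polling work

def is_auth_fault(fault):
    # Stand-in check; the real code inspects the fault raised when the
    # server-side session has expired.
    return 'Authentication' in fault.faultString or 'expired' in fault.faultString

def main_loop(hub):
    hub._login()
    while True:
        try:
            poll_transfers(hub)
        except xmlrpclib.Fault as fault:
            if is_auth_fault(fault):
                hub._login()    # re-authenticate only when the session expired
            else:
                raise
        time.sleep(POLL_INTERVAL)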
Beaker 23.2 has been released.