Bug 1368509 - beaker-transfer re-authenticates every 20 seconds, causing memory leak
Summary: beaker-transfer re-authenticates every 20 seconds, causing memory leak
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: lab controller
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: 23.2
Assignee: Dan Callaghan
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-19 15:32 UTC by drohwer
Modified: 2016-09-02 07:01 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-09-02 07:01:33 UTC
Embargoed:



Description drohwer 2016-08-19 15:32:14 UTC
Description of problem:

At one of our GEOs we have a lab controller whose beaker-transfer process seems to be consuming swap more aggressively than at our other sites.

Amount of swap consumed at different sites by the beaker-transfer process:

Site 1: 134 MB 
Site 2 (our largest site): 2.5 MB 
Site 3: 83 MB
Site 4: 10 MB
Site 5 (where it is more aggressively taking swap): 200+ MB

Version-Release number of selected component (if applicable):

# rpm -qa |egrep beaker
beaker-lab-controller-23.1-1.el6eng.noarch
beaker-client-23.1-1.el6eng.noarch
beaker-common-23.1-1.el6eng.noarch
beaker-lab-controller-addDistro-23.1-1.el6eng.noarch
beaker-redhat-0.1.61-1.el6eng.noarch

How reproducible:

The swap usage seems to build up over about a week, according to the following:

Date           Amount of swap consumed
August 5th     200+ MB
August 12th    System was rebooted for routine maintenance
August 17th    295 MB -- Pid: 1867  (beaker-transfer) SUM: 302192 kB (295.109 MB)
August 19th    87 MB (did not need a restart, but restarted anyway because of the weekend) -- Pid: 20127  (beaker-transfer) SUM: 89160 kB (87.0703 MB)



Comment 3 matt jia 2016-08-22 01:41:29 UTC
(In reply to matt jia from comment #2)
> I can confirm that the repeating errors in the transfer.log is a beaker bug.
> The recipe.files() returns duplicate logs which is unexpected. See 

I have filed a bug for this.

   https://bugzilla.redhat.com/show_bug.cgi?id=1368868

Comment 6 Dan Callaghan 2016-08-23 03:31:18 UTC
So we have a few different things going on here...

As Matt noted, the "OSError: [Errno 17] File exists" errors are due to duplicate, identically named files in a recipe, which we have seen a few times before although it is supposed to be impossible: bug 963492.

The problematic recipe ids are 483436 (from 2012!) and 839972 (from 2013). The side effect of that bug is that beaker-transfer goes into a busy loop continually trying to move the files for those recipes and failing -- however it will also move the files for all other recipes, so it does make progress in spite of the error.
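
For what it's worth, the failure mode is easy to reproduce in isolation -- a minimal sketch, assuming the transfer code hard-links each entry from recipe.files() into a destination tree (the paths and the use of os.link here are my assumption, not an excerpt from beaker-transfer):

    import os

    dest = '/tmp/recipe-483436'        # hypothetical destination directory
    os.makedirs(dest)
    open('console.log', 'w').close()   # stand-in for a recipe log file

    os.link('console.log', os.path.join(dest, 'console.log'))  # first entry: fine
    os.link('console.log', os.path.join(dest, 'console.log'))  # duplicate entry:
    # OSError: [Errno 17] File exists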

We only have transfer logs going back a few days, and the "File exists" error has been happening for those two recipes in a busy loop as far back as the logs go. It is entirely possible that the error has been repeating non-stop for the last 4 years(!) and nobody noticed -- or, more likely, some other bug was previously causing those old recipes to be ignored by beaker-transfer, and it only recently started picking them up and hit the problem.

However, the fact that we are also seeing the memory issues on a different LC (which isn't hitting the "File exists" error) indicates that the busy loop is probably not the cause. So we can continue investigation and mitigation for that on bug 963492.

If we are seeing gradually increasing memory consumption from the beaker-transfer process it indicates we have some (unrelated) small memory leak, probably introduced in Beaker 23.0...

Comment 8 Dan Callaghan 2016-08-23 04:29:36 UTC
The other slightly suspicious thing is that in 23.0, every iteration says:

Aug 23 14:27:06 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO Creating new session...
Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO New session created.
Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.labcontroller.proxy DEBUG Polling for recipes to be transferred

which indicates that it's re-authenticating every 20 seconds -- but it shouldn't need to do that; the session is kept alive by each request unless something goes wrong.

Comment 9 matt jia 2016-08-23 04:56:54 UTC
(In reply to Dan Callaghan from comment #8)
> The other slightly suspicious thing is that in 23.0, every iteration says:
> 
> Aug 23 14:27:06 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO
> Creating new session...
> Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO New
> session created.
> Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.labcontroller.proxy DEBUG
> Polling for recipes to be transferred
> 
> which indicates that it's re-authenticating every 20 seconds -- but it
> shouldn't need to do that; the session is kept alive by each request unless
> something goes wrong.

Yes, that does not sound right to me. Previously we only renewed the session if the user was not logged in. I think this is a regression in

https://git.beaker-project.org/cgit/beaker/commit/?id=bf385dc54a8c12b03b82f4e056e058af47468faf
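
If I'm reading it right, the guard was effectively dropped -- paraphrasing the shape of the change, not the actual diff (the attribute names here are guesses):

    def ensure_session_before(hub):
        # old behaviour: authenticate only when there is no live session
        if not hub._logged_in:
            hub._login()

    def ensure_session_after(hub):
        # new behaviour: the guard is gone, so every caller triggers a full
        # Kerberos login, i.e. once per 20-second polling iteration
        hub._login()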

Comment 10 Dan Callaghan 2016-08-23 06:19:05 UTC
I wonder if the memory leak could be due to python-krbV bug 1312061 (AuthContext leaked due to missing decref), combined with the fact that we are now re-authenticating every 20 seconds due to the unintentional changes introduced in 23.0. It means we will be constructing many more AuthContext instances than before.

Still not sure if that's enough to cause seriously excessive memory consumption; it's only one leaked AuthContext instance (surely no more than a few KB?) every 20 seconds, which works out to 4320 leaked per day...
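
Back-of-envelope, assuming (pure guess) about 4 KB per leaked AuthContext:

    >>> logins_per_day = 24 * 60 * 60 / 20.0
    >>> logins_per_day
    4320.0
    >>> logins_per_day * 4 / 1024.0    # MB leaked per day at 4 KB each
    16.875

About 17 MB/day would come to roughly 85 MB over the five days between the August 12 reboot and the August 17 reading, against the ~295 MB observed -- so either each leaked context pins a lot more than a few KB, or something else is leaking as well.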

Comment 11 matt jia 2016-08-23 22:42:05 UTC
(In reply to Dan Callaghan from comment #8)
> The other slightly suspicious thing is that in 23.0, every iteration says:
> 
> Aug 23 14:27:06 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO
> Creating new session...
> Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.common.hub.HubProxy INFO New
> session created.
> Aug 23 14:27:07 lab beaker-transfer[28149]: bkr.labcontroller.proxy DEBUG
> Polling for recipes to be transferred
> 
> which indicates that it's re-authenticating every 20 seconds -- but it
> shouldn't need to do that; the session is kept alive by each request unless
> something goes wrong.

The weird thing is that this does not happen in beaker-provision.

Comment 12 Dan Callaghan 2016-08-23 22:53:38 UTC
beaker-transfer calls _login on every polling iteration; the other two daemons don't. Another reason why this mess needs cleaning up.
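
In rough outline (paraphrased from memory, not a verbatim excerpt; the names and the interval constant are approximations):

    import time

    def poll_forever(proxy, interval=20):
        # shape of the beaker-transfer main loop as it stands in 23.x
        while True:
            proxy.hub._login()       # unconditional re-auth on every iteration
            proxy.transfer_logs()    # poll for finished recipes, move their logs
            time.sleep(interval)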

Comment 13 Dan Callaghan 2016-08-29 00:24:04 UTC
I am going to repurpose this bug to cover the excessive Kerberos logins every 20 seconds. We still have not proven that that is actually causing the memory leakage we are seeing -- but regardless it's something we should fix, and it seems a plausible cause. If I can get a core dump I can probably confirm either way.

If the memory leak turns out to be caused by something else we can open a new bug for it.

Comment 14 Dan Callaghan 2016-08-29 00:25:57 UTC
This is a regression in 23.0.

Comment 15 Dan Callaghan 2016-08-29 01:00:10 UTC
(In reply to Dan Callaghan from comment #12)
> beaker-transfer calls _login on every polling iteration, the other two
> don't. Another reason why this mess needs cleaning up.

Actually beaker-watchdog does too; beaker-provision is the odd one out. I think this matches what we are seeing (both beaker-transfer and beaker-watchdog show gradually increasing memory usage).

Comment 16 Dan Callaghan 2016-08-29 01:26:34 UTC
http://gerrit.beaker-project.org/5167 beaker-watchdog: re-authenticate if session has expired
http://gerrit.beaker-project.org/5168 beaker-transfer: re-authenticate if session has expired
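
The shape of the fix, for the record -- a sketch only, with the fault-detection details assumed rather than taken from the actual patches:

    import time
    import xmlrpclib

    def poll_forever(proxy, interval=20):
        while True:
            try:
                proxy.transfer_logs()
            except xmlrpclib.Fault as fault:
                if 'Invalid session' in fault.faultString:   # assumed marker string
                    # session expired server-side: re-authenticate once,
                    # then let the next iteration retry the transfer
                    proxy.hub._login()
                else:
                    raise
            time.sleep(interval)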

Comment 19 Dan Callaghan 2016-09-02 07:01:33 UTC
Beaker 23.2 has been released.

