Bug 1505207 - job page auto-fetch keeps sending requests even when they are taking a long time, DOS'ing Beaker
Summary: job page auto-fetch keeps sending requests even when they are taking a long t...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: general
Version: 24
Hardware: Unspecified
OS: Unspecified
urgent
unspecified
Target Milestone: 25.5
Assignee: Jacob McKenzie
QA Contact: Roman Joost
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-10-23 00:44 UTC by Dan Callaghan
Modified: 2018-07-23 06:40 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-07-23 06:40:51 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Beaker Project Gerrit 6165 0 None None None 2018-06-22 03:58:21 UTC

Description Dan Callaghan 2017-10-23 00:44:41 UTC
Version-Release number of selected component (if applicable):
24.4

How reproducible:
not very

Steps to Reproduce:
1. Open the job page for a running job containing a lot of recipes (for example, a Beaker self-test job)
Might only be applicable on beaker-devel, which is very memory constrained and often performs poorly.
2. Watch the Firefox network requests tool

Actual results:
The page sends an AJAX request to GET /jobs/<id> every 20 seconds.
The requests take longer than 20 seconds to come back
(in today's case they were taking 300 seconds because beaker-devel was struggling).
But the page keeps sending a new one every 20 seconds, so you end up with 20 in-flight requests all waiting.
Effectively this DOS'es Beaker.

Expected results:
Should not send any more requests while the first one is still waiting to come back.
In fact the 20 seconds between requests should really be 20 seconds *after* the request completes, not every 20 seconds, to be kind to the server.

Additional info:
Maybe need to check if we have a timeout on the requests. At some point we would want the request to fail (so we can try a new one) rather than the page get stuck waiting forever for the AJAX request which isn't going to come back.

Comment 2 Roman Joost 2018-07-04 00:51:51 UTC
I've tested this on beaker-devel by firing off a new self test: https://beaker-devel.app.eng.bos.redhat.com/jobs/14659. The old access logs were firing requests more rapidly:

next request          last request         delta (seconds)
2018-06-20 06:47:50 - 2018-06-20 06:47:32 = 18
2018-06-20 06:48:34 - 2018-06-20 06:47:50 = 44
2018-06-20 06:49:05 - 2018-06-20 06:48:34 = 31
2018-06-20 06:49:16 - 2018-06-20 06:49:05 = 11
2018-06-20 06:50:07 - 2018-06-20 06:49:16 = 51
2018-06-20 06:50:38 - 2018-06-20 06:50:07 = 31
2018-06-20 06:50:52 - 2018-06-20 06:50:38 = 14
2018-06-20 06:51:40 - 2018-06-20 06:50:52 = 48
2018-06-20 06:52:11 - 2018-06-20 06:51:40 = 31
2018-06-20 06:52:22 - 2018-06-20 06:52:11 = 11
2018-06-20 06:53:13 - 2018-06-20 06:52:22 = 51
2018-06-20 06:53:44 - 2018-06-20 06:53:13 = 31
2018-06-20 06:54:05 - 2018-06-20 06:53:44 = 21
2018-06-20 06:54:46 - 2018-06-20 06:54:05 = 41
2018-06-20 06:55:17 - 2018-06-20 06:54:46 = 31
2018-06-20 06:55:30 - 2018-06-20 06:55:17 = 13
2018-06-20 06:56:19 - 2018-06-20 06:55:30 = 49
2018-06-20 06:57:01 - 2018-06-20 06:56:19 = 42

The new implementation has a more even distribution:

2018-07-03 06:23:32 - 2018-07-03 06:22:24 = 68
2018-07-03 06:24:32 - 2018-07-03 06:23:32 = 60
2018-07-03 06:25:57 - 2018-07-03 06:24:32 = 85
2018-07-03 06:27:04 - 2018-07-03 06:25:57 = 67
2018-07-03 06:28:08 - 2018-07-03 06:27:04 = 64
2018-07-03 06:29:16 - 2018-07-03 06:28:08 = 68
2018-07-03 06:30:31 - 2018-07-03 06:29:16 = 75
2018-07-03 06:31:45 - 2018-07-03 06:30:31 = 74
2018-07-03 06:32:54 - 2018-07-03 06:31:45 = 69
2018-07-03 06:34:19 - 2018-07-03 06:32:54 = 85
2018-07-03 06:35:52 - 2018-07-03 06:34:19 = 93
2018-07-03 06:37:04 - 2018-07-03 06:35:52 = 72
2018-07-03 06:38:17 - 2018-07-03 06:37:04 = 73
2018-07-03 06:39:40 - 2018-07-03 06:38:17 = 83
2018-07-03 06:41:12 - 2018-07-03 06:39:40 = 92
2018-07-03 06:42:45 - 2018-07-03 06:41:12 = 93
2018-07-03 06:44:16 - 2018-07-03 06:42:45 = 91
2018-07-03 06:45:33 - 2018-07-03 06:44:16 = 77
2018-07-03 06:47:20 - 2018-07-03 06:45:33 = 107

That should reduce the load. Unfortunately, we're still pulling 1M with each request on JSON data, which doesn't solve the issue that there is still a potential risk of putting too much load on the server, but at least one contributor is fixed.

Comment 6 Dan Callaghan 2018-07-06 03:34:35 UTC
Just noticed the recipe page has exactly the same problem. Should we file a separate bug for that?

The impact of the recipe page auto-fetching should be much lower than the job page, since an individual recipe is usually pretty small, unless it has an unusually large number of tasks or results.

Comment 7 Roman Joost 2018-07-23 06:40:51 UTC
Released with Beaker 25.5: https://beaker-project.org/docs/whats-new/release-25.html#beaker-25-5


Note You need to log in before you can comment on or make changes to this bug.