Bug 964955 - Bugzilla application is unresponsive if read-write master is offline
Status: CLOSED WORKSFORME
Product: Bugzilla
Classification: Community
Component: Database
4.4
x86_64 Linux
Priority: high
Severity: high
Assigned To: PnT DevOps Devs
tools-bugs
Depends On:
Blocks:
Reported: 2013-05-20 04:05 EDT by Mark Keir
Modified: 2013-11-28 23:12 EST
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-28 23:12:25 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Mark Keir 2013-05-20 04:05:27 EDT
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Set the "shutdownhtml" parameter to a message indicating application maintenance
2. Stop the MySQL service on the read-write master DB
3. Watch the webserver's process load climb above 100

Actual results:

The application server becomes unresponsive due to excessive load.

Expected results:

With a shutdown message set, no database calls should be made.
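The expected behaviour amounts to checking the shutdown message before any database connection is attempted. A minimal Python sketch of that ordering follows; Bugzilla itself is written in Perl, and `handle_request`, `connect_db` and `render` are hypothetical names used only for illustration, not Bugzilla's API.

```python
from typing import Callable


def handle_request(shutdownhtml: str,
                   connect_db: Callable[[], object],
                   render: Callable[[object], str]) -> str:
    """Return a page, consulting the database only when not shut down.

    Hypothetical sketch: `shutdownhtml` mirrors the Bugzilla parameter of
    the same name; `connect_db` and `render` are placeholder callables.
    """
    if shutdownhtml:
        # Short-circuit: serve the maintenance message without any DB call.
        return shutdownhtml
    db = connect_db()  # only reached in normal operation
    return render(db)
```

Because the check comes first in this sketch, an offline master cannot stall a request while the shutdown message is set.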

Additional info:
Comment 1 Shirley Zhou 2013-05-21 07:06:07 EDT
Hi, Mark

Could you please help verify this bug when its status becomes ON_QA? QE does not have permission to execute these steps.

Thanks,
Shirley
Comment 2 Mark Keir 2013-05-22 23:42:27 EDT
(In reply to Shirley Zhou from comment #1)
> Hi, Mark
> 
> Can you please help to verify this bug when its status become ON_QA? As QE
> has no permission to execute these steps.
> 
> Thanks,
> Shirley

OK. You could reproduce this yourself, though.

This behaviour can be observed on the webserver whenever the database server cannot be contacted.

Using "top" on the webserver, watch the process list while connections are being made to Bugzilla under normal circumstances. You should see that processes are generally short-lived and that only a few CGI processes appear in the process list.

[Optional] Set shutdownhtml='Maintenance' in /var/www/html/bugzilla/data/params.

Stop the database server configured in /var/www/html/bugzilla/localconfig.

Again, watch 'top' while connections to Bugzilla are made via web sessions and XMLRPC calls. You will see that the processes linger for a long time, and that the number of CGI processes grows, as does the load on the system.
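The observation above can also be scripted rather than read off `top`. The sketch below uses hypothetical helper names and assumes a procps `ps` that supports the `etimes` (elapsed seconds) field, and that Bugzilla runs as plain CGI so its processes are named `*.cgi` (under mod_perl the scripts run inside `httpd` children and would not show up this way).

```python
import subprocess
from typing import List, Tuple


def parse_ps(output: str) -> List[Tuple[int, str]]:
    """Parse lines of `ps -eo etimes=,comm=` into (elapsed_seconds, name)."""
    procs = []
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        etimes, name = parts
        procs.append((int(etimes), name.strip()))
    return procs


def cgi_summary(procs: List[Tuple[int, str]]) -> Tuple[int, int]:
    """Return (count, oldest age in seconds) for processes named *.cgi."""
    ages = [age for age, name in procs if name.endswith(".cgi")]
    return len(ages), max(ages, default=0)


def cgi_snapshot() -> Tuple[int, int]:
    """Run ps once and summarise the CGI processes currently alive."""
    # Assumes a ps that understands the 'etimes' output field.
    out = subprocess.run(["ps", "-eo", "etimes=,comm="],
                         capture_output=True, text=True, check=True).stdout
    return cgi_summary(parse_ps(out))
```

Calling `cgi_snapshot()` every few seconds, first with the database up and then with it stopped, should show whether the CGI count and the oldest process age keep growing.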
Comment 5 Simon Green 2013-11-28 01:58:52 EST
I have been unable to reproduce this. With MySQL shut down, a page loads in under 1 second regardless of whether the user has a cookie. I even tried pointing my DB config at a host in the US (my VPS), thinking localhost might be too close, and got the same result.
Comment 6 Jason McDonald 2013-11-28 23:12:25 EST
I've spent a couple of hours trying to reproduce this issue on a VM, without success.

I tested by running 50 parallel instances of a buglist.cgi query for all open bugs in the Bugzilla product and 50 parallel instances of a show_bug.cgi query.

I ran these queries three times: once with the system running normally, once with Bugzilla in shutdown mode, and once with Bugzilla in shutdown mode and the mysqld service stopped.
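A load test along these lines can be approximated with the Python sketch below. How the original test was driven is not stated, so the use of threads and `urllib.request` here is an assumption, the function names are hypothetical, and the URL in the usage line is a placeholder.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fetch(url: str, timeout: float = 30.0) -> int:
    """Fetch a URL and return the number of body bytes read."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def timed(fn: Callable[[str], int], url: str) -> float:
    """Return wall-clock seconds taken by one call to fn(url)."""
    start = time.monotonic()
    try:
        fn(url)
    except OSError:
        pass  # HTTP errors and timeouts still count as completed requests
    return time.monotonic() - start


def run_parallel(url: str, n: int = 50,
                 fn: Callable[[str], int] = fetch) -> List[float]:
    """Issue n concurrent requests to url and return their durations."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda _: timed(fn, url), range(n)))
```

For example, `max(run_parallel("https://bugzilla.example.com/show_bug.cgi?id=1"))` could be compared across the three runs; a slowest request that grows sharply once mysqld is stopped would point towards the hang originally reported.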

On my low-powered VM, these queries caused a significant spike in system load in the first case and a smaller but still significant spike in the second and third cases (as Apache still has to spawn processes to handle the requests when Bugzilla is in shutdown mode).

There was little difference in system load between the second and third cases, and the CGI jobs appeared to be relatively short-lived, as expected, and certainly shorter-lived than in the first case.

Note that my VM does not employ separate RW master and RO slave databases, though Simon believes that shouldn't be a factor in this bug.

It's also perhaps worth noting that the original report coincides with the Bugzilla 4.2 -> 4.4 upgrade, during which the RW master database server failed.

As neither Simon nor I can reproduce this bug, I'm inclined to close the report for now. If the problem is observed again, it would be useful to capture the relevant section of the Apache access logs. That would allow analysis of the particular requests that were occurring at the time (in case only certain requests misbehave during shutdown), and of the possibility that some downstream system reacted badly to the shutdown by retrying requests at an unreasonable frequency.
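Once such logs are captured, a first pass at spotting clients that retry too often could look like the sketch below. It assumes the Apache Common or Combined Log Format; the function names and the threshold are hypothetical choices for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches the start of a Common/Combined Log Format line.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"[A-Z]+ (?P<path>\S+)[^"]*" \d{3}'
)


def requests_per_minute(lines: Iterable[str]) -> Counter:
    """Count requests keyed by (client, script path, minute)."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines not in the expected format
        minute = m.group("time")[:17]  # e.g. '28/Nov/2013:23:12'
        script = m.group("path").split("?", 1)[0]  # drop the query string
        counts[(m.group("host"), script, minute)] += 1
    return counts


def top_talkers(counts: Counter,
                threshold: int) -> List[Tuple[Tuple[str, str, str], int]]:
    """Return (key, count) pairs at or above threshold, busiest first."""
    return [(k, c) for k, c in counts.most_common() if c >= threshold]
```

Grouping by script path rather than full URL keeps repeated XMLRPC calls from one client together, which is the retry pattern the comment suggests looking for.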
