1147054 – graceful restarts with vhost front end cause corrupt configurations under load

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Online
Classification:	Red Hat
Component:	Containers
Sub Component:
Version:	2.x
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	2.x
Assignee:	Rajat Chopra
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1145982 1148418
Depends On:	1146194
Blocks:	1148192 1155794

Reported:	2014-09-26 17:12 UTC by Adam Miller
Modified:	2015-05-14 23:37 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-02-18 16:51:25 UTC
Target Upstream Version:
Embargoed:

Description Adam Miller 2014-09-26 17:12:38 UTC

When the environment is under high load and the openshift-origin-frontend-apache-vhost plugin is in use, there is a race condition: an httpd graceful restart can be called while the vhost configs are still being modified, which causes the httpd.worker thread to end up in a bad state.

We followed up with the httpd team and they traced it down to effectively this:


[Information provided by the httpd dev team]
1. OpenShift edits the httpd configuration and calls "httpd.worker -k graceful" to gracefully restart httpd and load the new configuration. But for some reason, the configuration supplied to httpd by OpenShift is broken at the time "httpd -k graceful" is executed. You can see this in the httpd error_log:

> [Tue Sep 23 23:48:02 2014] [notice] SIGUSR1 received.  Doing graceful restart
> httpd.worker: Syntax error on line 222 of /etc/httpd/conf/httpd.conf: Syntax
> error on line 75 of /etc/httpd/conf.d/000000_default.conf: Could not open
> configuration file /etc/httpd/conf.d/openshift/54223d5203ef640de1000981_nagiosmonitor_0_chkexsrv1.conf: No such file or directory

Note that this is also the first date/time at which the following message appears in /var/log/messages:

> Sep 23 23:48:04 ex-std-node3 root: httpd -k graceful already running, perhaps force restart httpd

2. After the syntax error during the graceful restart, httpd stops itself, so no "httpd.worker" process exists on the system. This is expected behaviour when a syntax error is caused by invalid configuration files.

3. The next execution of "httpd.worker -k graceful" finds that no httpd process exists and starts a new one. This is expected behaviour. This httpd process handles requests normally, and you can see it in ps output as "httpd.worker -k graceful".
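
The sequence above suggests one possible mitigation: check the configuration before asking httpd to reload it. The following is a minimal Python sketch, not the code used in origin-server. It relies on the standard httpd "-t" (syntax check) and "-k graceful" options; the function name graceful_if_valid and the injectable run parameter are hypothetical and exist only to make the sketch testable. Note that a check-then-restart sequence only narrows the race window; a config file can still change between the two calls.

```python
import subprocess


def graceful_if_valid(httpd="httpd.worker", run=subprocess.run):
    """Run a syntax check first; only request a graceful restart if it passes.

    `run` is injectable (hypothetical parameter) so the logic can be
    exercised without a real httpd binary.
    Returns (restarted, error_text).
    """
    # "-t" asks httpd to parse the configuration and exit without serving.
    check = run([httpd, "-t"], capture_output=True, text=True)
    if check.returncode != 0:
        # Leave the running server alone; report what the parser said.
        return False, check.stderr
    # Configuration parsed cleanly at this instant; request the reload.
    run([httpd, "-k", "graceful"], check=True)
    return True, ""
```

With this shape, a missing include file like the one in the error_log above would be reported by the check, and the running httpd would keep serving with its old configuration instead of exiting.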

Comment 1 Rajat Chopra 2014-09-26 22:31:09 UTC

Potential fix: https://github.com/openshift/origin-server/pull/5842
A fork AMI for the above is being built at: https://ci.dev.openshift.redhat.com/jenkins/job/fork_ami/1256/

For QE, what to test: The application creation rate should not drop drastically because of the use of a common lock file.
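
To illustrate the "common lock file" idea mentioned above, here is a rough Python sketch; it is not the code from the pull request. It assumes that both the config writer and the restart path take the same exclusive flock before doing their work, so a graceful restart can never observe a half-updated set of vhost files. The lock path, the function names, and the injectable restart callable are all hypothetical.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager

LOCK_PATH = "/var/lock/frontend-vhost.lock"  # hypothetical path


@contextmanager
def frontend_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no one else holds it
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the flock


def write_vhost(conf_path, body, lock_path=LOCK_PATH):
    """Write a vhost config while holding the common lock."""
    with frontend_lock(lock_path):
        tmp = conf_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(body)
        os.rename(tmp, conf_path)  # atomic replace on the same filesystem


def graceful_restart(lock_path=LOCK_PATH, restart=None):
    """Request a graceful restart only while no config edit is in progress."""
    if restart is None:
        restart = lambda: subprocess.check_call(["httpd.worker", "-k", "graceful"])
    with frontend_lock(lock_path):
        restart()
```

The trade-off is the one called out for QE: every app creation and deletion now serializes on a single lock, so throughput under parallel load is the thing to watch.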

Comment 2 Meng Bo 2014-09-28 06:51:28 UTC

I tested on the fork_ami with 5 jobs creating and deleting apps in parallel on an m3.medium instance. All creations and deletions succeeded, and all the apps can be accessed.

I also ran acceptance testing on the fork_ami; no regression issues were found.

Moving the bug to verified.

Comment 4 Rajat Chopra 2014-09-29 20:04:10 UTC

The pull request has been updated:
https://github.com/openshift/origin-server/pull/5842
Both versions of the files have been updated now. Please re-test performance and regression.

Comment 5 Meng Bo 2014-09-30 10:13:27 UTC

Tested on devenv_5200: no regression issues found, and app creation completes without many failures.

Verifying the bug.

Comment 6 Jhon Honce 2014-10-01 18:17:44 UTC

*** Bug 1148418 has been marked as a duplicate of this bug. ***

Comment 7 Jhon Honce 2014-10-01 18:20:44 UTC

*** Bug 1145982 has been marked as a duplicate of this bug. ***
