Bug 973717 - Increase v1 cart model timeout from 120s to 3600s
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.2.0
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: chris alfonso
QA Contact: libra bugs
Depends On:
Blocks:
Reported: 2013-06-12 10:40 EDT by Andy Goldstein
Modified: 2017-03-08 12 EST

See Also:
Fixed In Version: rubygem-openshift-origin-node-1.9.14-1.2
Doc Type: Bug Fix
Last Closed: 2013-08-05 13:16:24 EDT
Type: Bug


Attachments
patch to increase timeout in v1 cart model (872 bytes, patch)
2013-06-12 10:56 EDT, chris alfonso


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2013:1138
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Enterprise 1.2.1 bug fix and enhancement update
Last Updated: 2013-08-05 17:14:54 EDT

Description Andy Goldstein 2013-06-12 10:40:38 EDT
Some node operations may take longer than 120s, which is the hardcoded timeout at https://github.com/openshift/origin-server/blob/master/node/lib/openshift-origin-node/model/v1_cart_model.rb#L280. The v2 timeout appears to be 3600s (https://github.com/openshift/origin-server/blob/master/node/lib/openshift-origin-node/utils/shell_exec.rb#L83), assuming I found the equivalent operation.

I'd recommend increasing the v1 cart model timeout to 3600s, as there are still some customers using v1 for now.
Comment 2 chris alfonso 2013-06-12 10:56:12 EDT
Created attachment 760202
patch to increase timeout in v1 cart model
Comment 3 Rob Millner 2013-06-12 12:52:32 EDT
There are three relevant timeouts in the system:
1. Mcollective terminates the thread managing the operation: 400 seconds.

2. Broker considers an operation to have failed: 300 seconds (240 now?).

3. The oo_spawn/shellExec timeout.


The first two timeouts cause spawned processes (ex: cartridge hooks) to just continue, forgotten about.

A common failure is for the configure hook to take too long, for broker to start running destroy in response to the timeout, and for both configure and destroy to continue running at the same time, causing half-removed gears to linger.

We have observed that git can deadlock and stay running, causing processes to accumulate on long running systems.

Only the third timeout terminates spawned processes (ex: hooks).  Whatever the timeout is for script execution, if it does not fire ahead of the broker or mcollective timeouts there is a risk of indeterminate results.

Also, it has been observed that the oo_spawn timeout does not always fire on time.
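
The ordering concern above can be checked with simple arithmetic. The following is a minimal sketch, not code from origin-server: the TIMEOUTS hash and the outer_timeouts_firing_first helper are hypothetical names, and the values are the ones quoted in this bug (400s for mcollective, 300s for broker, 3600s for the proposed script timeout).

```ruby
# Hypothetical helper: none of these names come from origin-server.
# Values (in seconds) are the ones quoted in this bug report.
TIMEOUTS = {
  mcollective: 400, # thread managing the operation is terminated
  broker: 300,      # broker considers the operation to have failed
  script: 3600      # proposed timeout for script execution in the v1 cart model
}.freeze

# Returns the names of the outer timeouts that fire before (or at the same
# time as) the script timeout. When this list is non-empty, the spawned
# process can outlive the operation that started it.
def outer_timeouts_firing_first(timeouts)
  script = timeouts.fetch(:script)
  timeouts.reject { |name, _| name == :script }
          .select { |_, seconds| seconds <= script }
          .keys
end

risky = outer_timeouts_firing_first(TIMEOUTS)
puts "fire before the script timeout: #{risky.inspect}"
```

With the numbers from this bug, both outer timeouts fire before a 3600s script timeout, which is the situation described above as risking indeterminate results.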
Comment 6 Gaoyun Pei 2013-07-16 02:07:55 EDT
Checked the related code set in v1_cart_model.rb on puddle 2013-07-12 :
    
    begin
        Timeout::timeout(3600) do
          while (line = stdout.gets)
            output << line
          end
        end

The timeout has been changed to 3600s, so I am marking this bug as verified.
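
For readers unfamiliar with the pattern in the excerpt, here is a self-contained sketch of reading a child's stdout under a timeout. It is an illustration under assumptions, not the origin-server implementation: run_with_timeout is a hypothetical helper, and the explicit kill is added here because Ruby's Timeout only interrupts the Ruby block and does not stop the child process by itself.

```ruby
require 'open3'
require 'timeout'

# Hypothetical helper, not the origin-server implementation.
# Runs +command+ (an array of program and arguments), collects its stdout,
# and kills the child if reading takes longer than +seconds+.
# Returns [output, status] where status is :ok, :failed, or :timed_out.
def run_with_timeout(command, seconds)
  output = +''
  Open3.popen2(*command) do |stdin, stdout, wait_thr|
    stdin.close
    begin
      Timeout.timeout(seconds) do
        while (line = stdout.gets)
          output << line
        end
      end
    rescue Timeout::Error
      # The timeout only unwinds the Ruby block; the child keeps running
      # unless it is terminated explicitly.
      begin
        Process.kill('KILL', wait_thr.pid)
      rescue Errno::ESRCH
        # The child had already exited.
      end
      wait_thr.value # reap the child so it does not linger as a zombie
      return [output, :timed_out]
    end
    [output, wait_thr.value.success? ? :ok : :failed]
  end
end
```

A call such as run_with_timeout(['sleep', '5'], 0.5) should return with status :timed_out and leave no sleep process behind, assuming a POSIX sleep is on the PATH.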
Comment 9 errata-xmlrpc 2013-08-05 13:16:24 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1138.html
