Bug 1243611

Summary: Ceph osd build commands time out
Product: Red Hat OpenStack
Reporter: Graeme Gillies <ggillies>
Component: openstack-puppet-modules
Assignee: Gilles Dubreuil <gdubreui>
Status: CLOSED ERRATA
QA Contact: Yogev Rabl <yrabl>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.0 (Kilo)
CC: dnavale, emacchi, gdubreui, ggillies, jschluet, yeylon
Target Milestone: ga
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-puppet-modules-7.0.3-1.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, no timeout was set for some Ceph cluster set-up operations, so the Puppet default of 5 minutes (300 seconds) applied, and stages that took longer than that failed. With this update, a timeout parameter is added for the relevant operations, with a default value of 600 seconds that you can modify if necessary. As a result, the installation is more resilient, especially when some Ceph setup operations take longer than average.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-07 21:02:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Graeme Gillies 2015-07-15 23:19:24 UTC
Sometimes, when doing multinode deployments on slow hosts or on many hosts at once, you see errors similar to the following:

http://fpaste.org/244826/14370016/

Basically, one of the exec stanzas in the Ceph Puppet module times out after 5 minutes, which causes the deployment to fail.

Raising this value (to 600 seconds) or disabling the timeout (setting it to 0) lets the deployment succeed.
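A minimal Python sketch of the behaviour described above, assuming the exec follows Puppet's documented `timeout` semantics (a positive value caps the run time in seconds, 0 disables the limit, and the default is 300 seconds). The helper `run_exec` and the sleeping stand-in command are hypothetical and are not part of puppet-ceph; they only show why a slow step fails under a short limit and succeeds once the limit is raised or disabled.

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical helper, NOT part of puppet-ceph. It mimics the documented
# semantics of Puppet's exec `timeout` attribute: a positive value caps the
# run time in seconds, and 0 disables the limit entirely.
PUPPET_EXEC_DEFAULT_TIMEOUT = 300  # Puppet's exec default, in seconds


def run_exec(command: Sequence[str], timeout: float = PUPPET_EXEC_DEFAULT_TIMEOUT) -> bool:
    """Run `command`; return True on success, False on failure or time-out."""
    limit = None if timeout == 0 else timeout  # 0 means "wait forever"
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # once `limit` seconds have elapsed.
        result = subprocess.run(command, timeout=limit)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in for a slow OSD preparation step: sleeps for half a second.
    slow_step = [sys.executable, "-c", "import time; time.sleep(0.5)"]
    print(run_exec(slow_step, timeout=0.1))  # prints False: the step is killed
    print(run_exec(slow_step, timeout=0))    # prints True: no limit applied
```

Scaled up, the same pattern applies to the bug: a step that needs more than 300 seconds fails under the default and passes at 600 seconds or with the limit disabled.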

Can we please review the timeout setting on this and all other exec stanzas in the Ceph module, to ensure they are long enough for slower environments?

Regards,

Graeme

Comment 3 Gilles Dubreuil 2015-12-04 04:37:21 UTC
For the record, the provided link no longer exists; please attach the error output or use a long-lived paste.

That said, the error and the fix are straightforward.

Added a timeout value of 600 to all relevant execs in the puppet-ceph module (see external trackers).
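A hedged sketch of the shape of the fix, written in Python rather than Puppet: a module-wide default of 600 seconds that a deployer can override. The names `DEFAULT_CEPH_EXEC_TIMEOUT` and `resolve_timeout` are hypothetical and do not reflect puppet-ceph's actual parameter names; keeping 0 as "no limit" is an assumption carried over from Puppet's exec convention.

```python
from typing import Optional

# Hypothetical names; puppet-ceph's real parameter names may differ.
# Models the fix: relevant execs get a 600-second default that a deployer
# can override, with 0 (assumed, per Puppet's exec convention) meaning
# "no limit".
DEFAULT_CEPH_EXEC_TIMEOUT = 600  # seconds


def resolve_timeout(override: Optional[float] = None) -> Optional[float]:
    """Return the time-out in seconds, or None when the limit is disabled."""
    timeout = DEFAULT_CEPH_EXEC_TIMEOUT if override is None else override
    if timeout < 0:
        raise ValueError("timeout must be >= 0 seconds")
    return None if timeout == 0 else timeout


print(resolve_timeout())      # prints 600: the new default
print(resolve_timeout(1800))  # prints 1800: raised for a very slow environment
print(resolve_timeout(0))     # prints None: time-out disabled
```

Returning None for a disabled limit keeps the result directly usable as the `timeout` argument of Python's `subprocess.run`, which treats None as "wait indefinitely".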

Comment 5 Graeme Gillies 2016-01-03 23:20:23 UTC
Hi,

Apologies for not using a longer-lived pastebin. I have been trying to reproduce the problem so I can give you useful output, but so far I have been unable to reproduce it at all.

You mention that you have modified all relevant execs in Puppet; do you still need the output from a run where the problem occurs? If so, I'll keep trying to get hold of it.

Regards,

Graeme

Comment 7 Yogev Rabl 2016-02-03 13:13:32 UTC
Verified installation of Ceph OSDs on multiple nodes with no timeouts.

openstack-puppet-modules-7.0.3-1.el7ost.noarch

Comment 9 errata-xmlrpc 2016-04-07 21:02:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html