Sometimes when doing multinode deployments on slow hosts, or on a lot of hosts at once, you see errors similar to the following: http://fpaste.org/244826/14370016/ Basically, one of the exec stanzas in the ceph puppet module times out after 5 minutes (300 seconds), which causes the deployment to fail. Changing this value to something higher (600 seconds) or disabling the timeout (setting it to 0) allows the deployment to succeed. Can we please review the timeout setting on this and all other exec stanzas in the ceph module, to ensure they are long enough for slower environments? Regards, Graeme
For the record, the provided link doesn't exist; please attach the error output or use a long-lived paste. That said, the error and the fix are straightforward. I added a timeout value of 600 seconds to all relevant exec resources in the puppet-ceph module (see external trackers).
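For anyone hitting this before picking up the fix, here is a minimal sketch of what raising the timeout on an exec resource looks like. The resource title and command below are hypothetical placeholders, not copied from the puppet-ceph module; only the `timeout` attribute is the point.

```puppet
# Hypothetical exec resource: the title and command are placeholders
# for illustration and do not come from the puppet-ceph module.
exec { 'example-slow-provisioning-step':
  command   => '/usr/local/bin/slow-provisioning-step',
  path      => ['/usr/local/bin', '/usr/bin', '/bin'],
  # Puppet's default exec timeout is 300 seconds. Raise it to 600
  # for slower hosts; a value of 0 disables the timeout entirely.
  timeout   => 600,
  logoutput => true,
}
```

Setting `timeout => 0` removes the limit, which gets past the failure but also means a genuinely hung command will never be interrupted, so a finite higher value is probably the safer default.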
Hi, Apologies for not using a longer-lived pastebin. Unfortunately, I've been trying to reproduce the problem to give you some useful output, but at this stage I have been unable to reproduce it at all. You mention that you have modified all relevant execs in the puppet module; do you still need the output from when the problem occurs? If so, I'll keep trying to get hold of it. Regards, Graeme
Verified: installation of Ceph OSDs on multiple nodes completed with no timeouts. openstack-puppet-modules-7.0.3-1.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-0603.html