Bug 1413745

Summary: Read timeout at beginning of CFME deployment
Product: Red Hat Quickstart Cloud Installer
Component: Installation - CloudForms
Version: 1.1
Target Release: 1.1
Status: CLOSED WORKSFORME
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Keywords: Triaged
Reporter: James Olin Oden <joden>
Assignee: John Matthews <jmatthew>
QA Contact: Sudhir Mallamprabhakara <smallamp>
Docs Contact: Dan Macpherson <dmacpher>
CC: bthurber, jmontleo, joden, qci-bugzillas
Target Milestone: ---
Last Closed: 2017-01-17 20:23:02 UTC
Type: Bug

Description James Olin Oden 2017-01-16 20:59:35 UTC
Description of problem:
I was running an OSP (Ceph) + CFME deployment. It got past OSP fine, but when it reached the CFME task it hung at 0% for about 30 minutes, then jumped to 31.8% and errored out. The task error was:

   read timeout reached

In the production.log the following error can be found:

   2017-01-16 15:34:22 [foreman-tasks/action] [E] read timeout reached (Excon::Errors::Timeout)

The deployment log had no errors but had this to say about the CFME deployment:

   I, [2017-01-16T15:07:58.408467 #19518]  INFO -- : Planning CFME Deployment 
   D, [2017-01-16T15:07:59.149979 #19518] DEBUG -- : ====== CFME Upload run method ======
   D, [2017-01-16T15:33:13.412775 #19518] DEBUG -- : ====== Leaving CFME Upload run method ======
   D, [2017-01-16T15:33:13.464958 #19518] DEBUG -- : ====== CFME Security Group run method ======
   D, [2017-01-16T15:33:19.212941 #19518] DEBUG -- : ====== Leaving CFME Security Group run method ======
   D, [2017-01-16T15:33:19.254404 #19518] DEBUG -- : ====== CFME Flavor run method ======
   D, [2017-01-16T15:33:21.507910 #19518] DEBUG -- : ====== Leaving CFME Flavor run method ======
   D, [2017-01-16T15:33:21.550262 #19518] DEBUG -- : ====== CFME Launch run method ======

Version-Release number of selected component (if applicable):
QCI-1.1-RHEL-7-20170112.t.0

How reproducible:
First occurrence.

Steps to Reproduce:
1. Do an OSP (Ceph) + CFME deployment

Actual results:
On this occasion the deployment failed with the read timeout error above.

Expected results:
No errors

Additional info:

Here is the full stack trace from the production.log:

    2017-01-16 15:34:22 [foreman-tasks/action] [E] read timeout reached (Excon::Errors::Timeout)
     | /opt/theforeman/tfm/root/usr/share/gems/gems/excon-0.45.3/lib/excon/socket.rb:299:in `raise_timeout_error'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/excon-0.45.3/lib/excon/socket.rb:49:in `rescue in readline'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/excon-0.45.3/lib/excon/socket.rb:44:in `readline'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/excon-0.45.3/lib/excon/response.rb:53:in `parse'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/excon-0.45.3/lib/excon/middlewares/response_parser.rb:6:in `response_call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/docker-api-1.28.0/lib/excon/middlewares/hijack.rb:45:in `response_call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/excon-0.45.3/lib/excon/connection.rb:372:in `response'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/excon-0.45.3/lib/excon/connection.rb:236:in `request'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/fog-core-1.36.0/lib/fog/core/connection.rb:81:in `request'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/fog-openstack-0.1.3.5/lib/fog/openstack/compute.rb:359:in `request'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/fog-openstack-0.1.3.5/lib/fog/openstack/requests/compute/list_images_detail.rb:6:in `list_images_detail'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/fog-openstack-0.1.3.5/lib/fog/openstack/models/compute/images.rb:21:in `all'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/fog-core-1.36.0/lib/fog/core/collection.rb:113:in `lazy_load'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/fog-core-1.36.0/lib/fog/core/collection.rb:28:in `select'
     | /usr/share/foreman/app/models/compute_resources/foreman/model/openstack.rb:55:in `available_images'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/fusor_server-1.1.28/app/lib/actions/fusor/deployment/open_stack/cfme_launch.rb:58:in `create_image'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/fusor_server-1.1.28/app/lib/actions/fusor/deployment/open_stack/cfme_launch.rb:33:in `run'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:506:in `block (3 levels) in execute_run'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware/stack.rb:26:in `call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware/stack.rb:26:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware.rb:17:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action/progress.rb:30:in `with_progress_calculation'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action/progress.rb:16:in `run'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware/stack.rb:22:in `call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware/stack.rb:26:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware.rb:17:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.90/app/lib/actions/middleware/keep_locale.rb:11:in `block in run'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.90/app/lib/actions/middleware/keep_locale.rb:22:in `with_locale'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.90/app/lib/actions/middleware/keep_locale.rb:11:in `run'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware/stack.rb:22:in `call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware/stack.rb:26:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware.rb:17:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware.rb:30:in `run'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware/stack.rb:22:in `call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware/world.rb:30:in `execute'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:505:in `block (2 levels) in execute_run'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:504:in `catch'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:504:in `block in execute_run'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:419:in `call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:419:in `block in with_error_handling'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:419:in `catch'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:419:in `with_error_handling'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:499:in `execute_run'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:260:in `execute'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:9:in `block (2 levels) in execute'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/execution_plan/steps/abstract.rb:155:in `call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/execution_plan/steps/abstract.rb:155:in `with_meta_calculation'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:8:in `block in execute'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:22:in `open_action'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:7:in `execute'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/executors/parallel/worker.rb:15:in `block in on_message'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/algebrick-0.7.3/lib/algebrick/matchers/abstract.rb:74:in `block in assigns'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/algebrick-0.7.3/lib/algebrick/matchers/abstract.rb:73:in `tap'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/algebrick-0.7.3/lib/algebrick/matchers/abstract.rb:73:in `assigns'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/algebrick-0.7.3/lib/algebrick/matching.rb:56:in `match_value'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/algebrick-0.7.3/lib/algebrick/matching.rb:36:in `block in match?'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/algebrick-0.7.3/lib/algebrick/matching.rb:35:in `each'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/algebrick-0.7.3/lib/algebrick/matching.rb:35:in `match?'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/algebrick-0.7.3/lib/algebrick/matching.rb:23:in `match'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/executors/parallel/worker.rb:12:in `on_message'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/context.rb:46:in `on_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/executes_context.rb:7:in `on_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/actor.rb:26:in `on_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/awaits.rb:15:in `on_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/sets_results.rb:14:in `on_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/buffer.rb:38:in `process_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/buffer.rb:31:in `process_envelopes?'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/buffer.rb:20:in `on_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/termination.rb:55:in `on_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/removes_child.rb:10:in `on_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/behaviour/sets_results.rb:14:in `on_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/core.rb:161:in `process_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/core.rb:95:in `block in on_envelope'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/core.rb:118:in `block (2 levels) in schedule_execution'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `block in synchronize'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `synchronize'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `synchronize'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.0/lib/concurrent/actor/core.rb:115:in `block in schedule_execution'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/serialized_execution.rb:18:in `call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/serialized_execution.rb:18:in `call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/serialized_execution.rb:96:in `work'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/serialized_execution.rb:77:in `block in call_job'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/ruby_thread_pool_executor.rb:333:in `call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/ruby_thread_pool_executor.rb:333:in `run_task'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/ruby_thread_pool_executor.rb:322:in `block (3 levels) in create_worker'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/ruby_thread_pool_executor.rb:305:in `loop'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/ruby_thread_pool_executor.rb:305:in `block (2 levels) in create_worker'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/ruby_thread_pool_executor.rb:304:in `catch'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/ruby_thread_pool_executor.rb:304:in `block in create_worker'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `call'
     | /opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'

Comment 3 Jason Montleon 2017-01-17 13:45:25 UTC
If the installation hangs, the overcloud deployment can't communicate with the Ceph storage cluster for some reason. I agree it's not ideal, but a failure to connect doesn't cause the deployment to fail or the image upload to error out outright; it just blocks until the storage becomes available again.
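The "read timeout reached" error itself is raised by Excon, the HTTP client underneath fog, while waiting on Nova's list_images_detail response; Excon's read timeout defaults to 60 seconds. As a hedged sketch of the general client-side knob, here is the same idea with Ruby's stdlib Net::HTTP, since the exact option Foreman threads through to Excon isn't shown in this bug (the endpoint address is a placeholder):

```ruby
require 'net/http'

# Placeholder compute-API endpoint; this only illustrates the client-side
# timeout knobs. Excon exposes analogous :connect_timeout / :read_timeout
# options (its read_timeout defaults to 60 seconds).
http = Net::HTTP.new('192.0.2.10', 8774)
http.open_timeout = 60    # fail fast when the endpoint is unreachable
http.read_timeout = 1800  # tolerate a long-blocking response
```

Raising the read timeout only papers over a blocked backend, of course; the real fix is restoring connectivity to the storage cluster.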

Make sure you're using Ceph Storage 2.0, not Ceph Storage 1.3 as we were using with OSP 8; they are not compatible. If I recall correctly, Ceph Storage 1.3 packages will have a version of roughly 0.97 and Ceph Storage 2.0 packages will have a version of roughly 10.x.

Also, if you're using a virt environment to test, make sure no firewall rules on your host would prevent the overcloud hosts from communicating with the Ceph host, and try again.
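A quick way to verify the overcloud hosts can actually reach the Ceph monitor (which listens on TCP 6789 by default) is a plain TCP connect check. A minimal sketch using only the Ruby standard library; the address in the usage comment is a placeholder on this bug's provisioning network:

```ruby
require 'socket'

# Returns true if a TCP connection to host:port succeeds within `timeout`
# seconds; refusals, unreachable hosts, and timeouts all return false.
def reachable?(host, port, timeout = 5)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end

# Example (placeholder Ceph monitor address):
# reachable?('192.168.175.10', 6789)
```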

Comment 6 James Olin Oden 2017-01-17 14:26:49 UTC
Concerning the version of ceph, here is what I have:

[vagrant@ceph ~]$ rpm -qa | grep ceph
ceph-selinux-10.2.3-17.el7cp.x86_64
ceph-deploy-1.5.36-20.el7cp.noarch
python-cephfs-10.2.3-17.el7cp.x86_64
ceph-base-10.2.3-17.el7cp.x86_64
ceph-mon-10.2.3-17.el7cp.x86_64
ceph-mds-10.2.3-17.el7cp.x86_64
libcephfs1-10.2.3-17.el7cp.x86_64
ceph-common-10.2.3-17.el7cp.x86_64
ceph-osd-10.2.3-17.el7cp.x86_64
ceph-radosgw-10.2.3-17.el7cp.x86_64

Comment 7 James Olin Oden 2017-01-17 14:30:29 UTC
And here are the firewall rules in question:

1340K 4168M ACCEPT     all  --  *      virbr9  0.0.0.0/0            192.168.175.0/24     ctstate RELATED,ESTABLISHED
3542K  227M ACCEPT     all  --  virbr9 *       192.168.175.0/24     0.0.0.0/0           

These rules allow connections on the 192.168.175.0/24 network, which is the provisioning network for the Fusor host and where the Ceph server lives (its address is within the provisioning range handed out via DHCP).

Comment 8 James Olin Oden 2017-01-17 14:35:39 UTC
My second run of the same deployment failed the same way, so this appears to be reproducible.

Comment 9 Jason Montleon 2017-01-17 16:08:12 UTC
Your ceph status reports:
    cluster bc5e37bc-3cf8-4823-a7c1-5109ddeefe63
     health HEALTH_ERR
     ...
            no osds

I'd guess this is why it is failing.

Mine reads:
[root@ceph20 ~]# ceph status
    cluster 42e66d84-53f0-4283-9d63-7077e1b566f4
     health HEALTH_OK
     ...
     osdmap e18: 3 osds: 3 up, 3 in


Please double-check your changes to hostnames, etc., and review the output for errors when Vagrant runs the setup.sh script.

Comment 10 Jason Montleon 2017-01-17 20:21:06 UTC
I set up a Ceph host. Before the deployment, its storage mounts started with:
/dev/vdb1                                                      1.0T  5.1G 1019G   1% /storage1
/dev/vdc1                                                      1.0T  5.1G 1019G   1% /storage2
/dev/vdd1                                                      1.0T  5.1G 1019G   1% /storage3

After the entire deployment was complete:
/dev/vdb1                                                      1.0T  9.1G 1015G   1% /storage1
/dev/vdc1                                                      1.0T  9.1G 1015G   1% /storage2
/dev/vdd1                                                      1.0T  9.1G 1015G   1% /storage3

[root@ceph20 ~]# ceph status
    cluster 42e66d84-53f0-4283-9d63-7077e1b566f4
     health HEALTH_OK
     monmap e1: 1 mons at {ceph20=192.168.240.254:6789/0}
            election epoch 3, quorum 0 ceph20
     osdmap e23: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v530: 124 pgs, 4 pools, 4083 MB data, 916 objects
            27756 MB used, 3043 GB / 3070 GB avail
                 124 active+clean

I will look at adding a validation so we can catch inaccessible and unhealthy clusters in 1.2.
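Such a validation could be as simple as checking the `ceph status` output for HEALTH_OK and at least one OSD that is up. A hypothetical sketch (the function name and how the output is obtained are assumptions, not actual QCI code):

```ruby
# Returns true only if a `ceph status` output reports HEALTH_OK and at
# least one OSD up, e.g. an osdmap line like "osdmap e23: 3 osds: 3 up, 3 in".
# A cluster with "no osds" (as in comment 9) fails both checks.
def ceph_cluster_usable?(status_output)
  return false unless status_output.include?('HEALTH_OK')
  up = status_output[/\d+\s+osds:\s+(\d+)\s+up/, 1]
  !up.nil? && up.to_i > 0
end
```

Run against the two outputs quoted in this bug, the reporter's HEALTH_ERR / "no osds" cluster would be rejected before the upload starts, while the healthy 3-OSD cluster would pass.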