Bug 1388650

Summary: rhel-osp-director: 10 minor update fails: Error: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-injectkey-client.admin]: Could not evaluate: Cannot allocate memory - fork(2)
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: rhosp-director
Assignee: Sébastien Han <shan>
Status: CLOSED WORKSFORME
QA Contact: Yogev Rabl <yrabl>
Severity: unspecified
Docs Contact: Derek <dcadzow>
Priority: unspecified
Version: 10.0 (Newton)
CC: dbecker, jomurphy, mburns, morazi, rhel-osp-director-maint, sasha, seb
Target Milestone: ---
Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2017-06-28 15:02:10 UTC
Type: Bug

Description Alexander Chuzhoy 2016-10-25 20:06:56 UTC
rhel-osp-director:  10 minor update fails: Error: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-injectkey-client.admin]: Could not evaluate: Cannot allocate memory - fork(2) 


Environment:
openstack-puppet-modules-9.3.0-0.20161003154825.8c758d6.el7ost.noarch
instack-undercloud-5.0.0-0.20161007201832.f044a47.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-0.20161008015357.0d3e3e3.1.el7ost.noarch

Steps to reproduce:
1. Deploy overcloud with:
openstack overcloud deploy --debug --templates --libvirt-type kvm --ntp-server clock.redhat.com --neutron-network-type vxlan --neutron-tunnel-types vxlan --control-scale 3 --control-flavor controller-d75f3dec-c770-5f88-9d4c-3fea1bf9c484 --compute-scale 1 --compute-flavor compute-b634c10a-570f-59ba-bdbf-0c313d745a10 --ceph-storage-scale 2 --ceph-storage-flavor ceph-cf1f074b-dadb-5eb8-9eb0-55828273fab7 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e virt/ceph.yaml -e virt/hostnames.yml -e virt/network/network-environment.yaml --log-file overcloud_deployment_48.log

2. Try to minor update the overcloud.

Result:
19:42:51 WAITING
19:42:51 completed: [u'ceph-1', u'ceph-0', u'controller-1', u'compute-0', u'controller-0']
19:42:51 on_breakpoint: [u'controller-2']
19:42:51 Breakpoint reached, continue? Regexp or Enter=proceed (will clear d10a116e-4a19-4ca1-81db-a4c3c6dbd2ce), no=cancel update, C-c=quit interactive mode: IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 IN_PROGRESS
19:42:51 FAILED
19:42:51 update finished with status FAILED




Debugging with heat:


[stack@undercloud-0 ~]$ heat resource-list -n5 overcloud|grep -v COMPLE
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+-------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+-----------------+----------------------+----------------------------------------------------------------------------------------------------------------------+
| resource_name                             | physical_resource_id                                                            | resource_type                                                                                                       | resource_status | updated_time         | stack_name                                                                                                           |
+-------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+-----------------+----------------------+----------------------------------------------------------------------------------------------------------------------+
| AllNodesDeploySteps                       | ce7c19e7-a673-412b-aea8-4a0990fb4c45                                            | OS::TripleO::PostDeploySteps                                                                                        | UPDATE_FAILED   | 2016-10-25T23:32:46Z | overcloud                                                                                                            |
| ControllerDeployment_Step3                | 04e9abef-6b55-4b4f-b30b-ee2e101d6276                                            | OS::Heat::StructuredDeploymentGroup                                                                                 | UPDATE_FAILED   | 2016-10-25T23:39:02Z | overcloud-AllNodesDeploySteps-yp45mpb47zpv                                                                           |
| 0                                         | 882a504c-ca28-48ac-8d9b-ba098c1cb930                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2016-10-25T23:39:06Z | overcloud-AllNodesDeploySteps-yp45mpb47zpv-ControllerDeployment_Step3-2kgfzuq5uchv                                   |
+-------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+-----------------+----------------------+----------------------------------------------------------------------------------------------------------------------+





[stack@undercloud-0 ~]$ echo -e `heat deployment-show 882a504c-ca28-48ac-8d9b-ba098c1cb930`|grep -i error
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
Error: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-injectkey-client.admin]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-injectkey-client.openstack]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Ceph::Keys/Ceph::Key[client.bootstrap-osd]/Exec[ceph-injectkey-client.bootstrap-osd]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Pacemaker::Service/Service[pcsd]: Could not evaluate: Cannot allocate memory - fork(2)
Error: Could not prefetch mysql_database provider 'mysql': Cannot allocate memory - fork(2)
Error: Could not prefetch mysql_grant provider 'mysql': Cannot allocate memory - fork(2)
Error: /Stage[main]/Glance::Db::Sync/Exec[glance-manage db_sync]: Failed to call refresh: Cannot allocate memory - fork(2)
Error: /Stage[main]/Glance::Db::Sync/Exec[glance-manage db_sync]: Cannot allocate memory - fork(2)
Error: /Stage[main]/Glance::Registry/Service[glance-registry]: Failed to call refresh: Cannot allocate memory - fork(2)
Error: /Stage[main]/Glance::Registry/Service[glance-registry]: Cannot allocate memory - fork(2)
Error: /Stage[main]/Apache::Service/Service[httpd]: Failed to call refresh: Cannot allocate memory - fork(2)
Error: /Stage[main]/Apache::Service/Service[httpd]: Cannot allocate memory - fork(2)
Error: Could not prefetch keystone_tenant provider 'openstack': Cannot allocate memory - fork(2)
Error: Could not prefetch keystone_role provider 'openstack': Cannot allocate memory - fork(2)
Error: Could not prefetch keystone_service provider 'openstack': Cannot allocate memory - fork(2)
Error: Could not prefetch keystone_endpoint provider 'openstack': Cannot allocate memory - fork(2)
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Failed to call refresh: Cannot allocate memory - fork(2)
Error: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]: Cannot allocate memory - fork(2)
Error: /Stage[main]/Heat::Engine/Service[heat-engine]: Failed to call refresh: Cannot allocate memory - fork(2)
Error: /Stage[main]/Heat::Engine/Service[heat-engine]: Cannot allocate memory - fork(2)
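Every error line above ends with the same cause, which suggests a single underlying condition (Puppet unable to fork on the node) rather than twenty separate failures. A minimal sketch of how such output could be grouped by trailing cause follows; the function name `root_causes` and the rule "cause = text after the last ': '" are assumptions made for illustration, not part of any TripleO or Puppet tooling.

```python
from collections import Counter

# A few lines copied verbatim from the deployment output above.
SAMPLE = """\
Error: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-injectkey-client.admin]: Could not evaluate: Cannot allocate memory - fork(2)
Error: Could not prefetch mysql_database provider 'mysql': Cannot allocate memory - fork(2)
Error: /Stage[main]/Apache::Service/Service[httpd]: Failed to call refresh: Cannot allocate memory - fork(2)
Error: /Stage[main]/Apache::Service/Service[httpd]: Cannot allocate memory - fork(2)
"""


def root_causes(text):
    """Count the trailing cause of each 'Error:' line.

    Assumption: the cause is whatever follows the last ': ' on the line.
    Puppet resource paths use '::' without a space, so they are not split.
    """
    causes = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("Error:"):
            continue
        causes[line.rsplit(": ", 1)[-1]] += 1
    return causes


if __name__ == "__main__":
    for cause, count in root_causes(SAMPLE).most_common():
        print(f"{count:3d}  {cause}")
```

Piping the output of the deployment-show command into a script like this would show at a glance whether a failed step has one cause or several.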



[stack@undercloud-0 ~]$ free
              total        used        free      shared  buff/cache   available
Mem:       16267968     7818648      249568         560     8199752     8037412
Swap:             0           0    


compute-0.localdomain
              total        used        free      shared  buff/cache   available
Mem:        5946052      949076     3680696        1088     1316280     4642720
Swap:             0           0           0
ceph-1.localdomain
              total        used        free      shared  buff/cache   available
Mem:        3881936      330964     2854728         924      696244     3259596
Swap:             0           0           0

ceph-0.localdomain
              total        used        free      shared  buff/cache   available
Mem:        3881936      329280     2836900         964      715756     3260804
Swap:             0           0           0

controller-2.localdomain
              total        used        free      shared  buff/cache   available
Mem:        8010436     6868148      291464       40768      850824      718796
Swap:             0           0           0

controller-1.localdomain
              total        used        free      shared  buff/cache   available
Mem:        8010436     6876672      280536       40764      853228      709284
Swap:             0           0           0

controller-0.localdomain
              total        used        free      shared  buff/cache   available
Mem:        8010436     7102484      349084       56300      558868      510512
Swap:             0           0           0
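The numbers above show the three controllers with roughly 0.5 to 0.7 GB available out of about 8 GB and no swap, which is consistent with the fork(2) failures, although the bug was never conclusively diagnosed. The following is a rough sketch of how per-host `free` output in this layout could be screened for that condition. The helper names `parse_free` and `low_memory` and the 10% threshold are assumptions chosen for illustration only.

```python
# Two host blocks copied from the output above (values are in kB).
SAMPLE = """\
compute-0.localdomain
              total        used        free      shared  buff/cache   available
Mem:        5946052      949076     3680696        1088     1316280     4642720
Swap:             0           0           0

controller-0.localdomain
              total        used        free      shared  buff/cache   available
Mem:        8010436     7102484      349084       56300      558868      510512
Swap:             0           0           0
"""


def parse_free(text):
    """Return {host: {'total', 'available', 'swap'}} from host-labelled `free` output."""
    hosts, host = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "total":
            continue  # skip blank lines and the column header
        if fields[0] == "Mem:":
            entry = hosts.setdefault(host, {})
            entry["total"] = int(fields[1])
            entry["available"] = int(fields[6])  # 'available' is the 7th field
        elif fields[0] == "Swap:":
            hosts.setdefault(host, {})["swap"] = int(fields[1])
        else:
            host = fields[0]  # any other line is treated as a host name
    return hosts


def low_memory(hosts, threshold=0.10):
    """List hosts with no swap and less than `threshold` of RAM available.

    The 10% default is an arbitrary illustrative cut-off, not a documented limit.
    """
    return sorted(
        name
        for name, m in hosts.items()
        if m.get("swap", 0) == 0 and m["available"] / m["total"] < threshold
    )


if __name__ == "__main__":
    print(low_memory(parse_free(SAMPLE)))
```

With the full output from this report, all three controllers would fall under the 10% cut-off while the compute and Ceph nodes would not.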

Comment 3 seb 2017-02-21 19:41:31 UTC
@Alex, are your keys correctly configured?

Comment 4 Alexander Chuzhoy 2017-03-10 14:21:12 UTC
@Seb,
I don't have access to that very setup anymore. This was an automated deployment job and I saw it succeeding too, so I'm guessing the deployment config was right.

Comment 5 Alexander Chuzhoy 2017-06-28 15:02:10 UTC
Didn't reproduce for some time - could be environmental. Will re-open if the issue reproduces.