Bug 1304878 - rhel-osp-director: Update overcloud 7.0->7.3 times out on controller UpdateDeployment
rhel-osp-director: Update overcloud 7.0->7.3 times out on controller UpdateD...
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates (Show other bugs)
7.0 (Kilo)
Unspecified Unspecified
high Severity unspecified
: y3
: 7.0 (Kilo)
Assigned To: Jiri Stransky
Alexander Chuzhoy
:
: 1304463 (view as bug list)
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2016-02-04 16:31 EST by Alexander Chuzhoy
Modified: 2016-12-29 11:59 EST (History)
13 users (show)

See Also:
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-115.el7ost
Doc Type: Bug Fix
Doc Text:
Heat used a 1MB payload size for returned output. YUM's output exceeded this limit if updating a high number of packages on the Overcloud, which caused 'openstack overcloud update' to fail. This fix adds the '-q' option to YUM during an Overcloud update. This option sets the output to quiet mode, which reduced the output. YUM's output no longer exceeds Heat's limit and 'openstack overcloud update' succeeds.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-18 11:52:35 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
os-collect-config journal log from Controller-1 (221.24 KB, application/x-gzip)
2016-02-04 16:49 EST, Zane Bitter
no flags Details
heat-engine.log (2.06 MB, application/x-gzip)
2016-02-04 17:42 EST, Zane Bitter
no flags Details
stdout from the UpdateDeployment (314.01 KB, text/plain)
2016-02-05 20:18 EST, Zane Bitter
no flags Details
stderr from the UpdateDeployment (698.44 KB, text/plain)
2016-02-05 20:19 EST, Zane Bitter
no flags Details

  None (edit)
Description Alexander Chuzhoy 2016-02-04 16:31:58 EST
rhel-osp-director: Update overcloud 7.0->7.3  times out on controller UpdateDeployment

Environment:
instack-undercloud-2.1.2-37.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-112.el7ost.noarch

Steps to reproduce:
1. Deploy overcloud 7.0 with:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 1   --neutron-network-type vxlan --neutron-tunnel-types vxlan  --ntp-server x.x.x.x --timeout 90 -e network-environment.yaml
2. After populating the overcloud with some objects, update it with:
openstack overcloud update stack overcloud -i --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e /usr/share/openstack-tripleo-heat-templates/en
vironments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/updates/update-from-vip.yaml  -e network-environment.yaml


Result:
ERROR: openstack ERROR: Authentication failed. Please try again with option --include-password or export HEAT_INCLUDE_PASSWORD=1
Authentication required

 heat stack-show overcloud|grep reason
| stack_status_reason   | Timed out   




[stack@instack ~]$ heat resource-list -n5 overcloud|grep -v COMPLE
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+
| resource_name                               | physical_resource_id                          | resource_type                                     | resource_status | updated_time         | parent_resource                             |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+
| Controller                                  | 773eac1c-84ec-4c89-8034-09d3d1ac5114          | OS::Heat::ResourceGroup                           | UPDATE_FAILED   | 2016-02-03T20:31:15Z |                                             |
| 1                                           | 902aa56c-7f82-4d76-9764-e8733e9476bf          | OS::TripleO::Controller                           | UPDATE_FAILED   | 2016-02-03T20:31:23Z | Controller                                  |
| 0                                           | ef5268eb-47c0-42c0-84e3-4e07fdc17c57          | OS::TripleO::Controller                           | UPDATE_FAILED   | 2016-02-03T20:31:43Z | Controller                                  |
| 2                                           | f4334bce-e0a6-4e14-b952-4399080d5f10          | OS::TripleO::Controller                           | UPDATE_FAILED   | 2016-02-03T20:32:32Z | Controller                                  |
| UpdateDeployment                            | b1358eea-5cec-47a7-bc3a-26f33dc6d307          | OS::Heat::SoftwareDeployment                      | UPDATE_FAILED   | 2016-02-03T20:34:32Z | 1                                           |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+




[stack@instack ~]$ heat resource-list 902aa56c-7f82-4d76-9764-e8733e9476bf
+--------------------------+--------------------------------------+-------------------------------------------------+-----------------+----------------------+
| resource_name            | physical_resource_id                 | resource_type                                   | resource_status | updated_time         |
+--------------------------+--------------------------------------+-------------------------------------------------+-----------------+----------------------+
| Controller               | 8d1fcdf9-b197-4d8c-8dd5-e97ae1fb4999 | OS::Nova::Server                                | CREATE_COMPLETE | 2016-02-03T17:14:50Z |
| ControllerDeployment     | 7b79aabf-96da-4436-b25a-5a246abbe196 | OS::TripleO::SoftwareDeployment                 | CREATE_COMPLETE | 2016-02-03T17:14:50Z |
| ControllerExtraConfigPre | e72c2ae8-65a3-4bc5-8385-e4bf1cf61750 | OS::TripleO::ControllerExtraConfigPre           | CREATE_COMPLETE | 2016-02-03T17:14:50Z |
| UpdateConfig             | 39fdf1e5-7cb3-4906-938a-85054bd64a97 | OS::TripleO::Tasks::PackageUpdate               | UPDATE_COMPLETE | 2016-02-03T20:31:50Z |
| NodeUserData             | 1c247ff1-a8f9-4c12-84e6-1c247efde2c8 | OS::TripleO::NodeUserData                       | UPDATE_COMPLETE | 2016-02-03T20:31:53Z |
| InternalApiPort          | c16bef3e-ec34-4094-8395-15251fa656f1 | OS::TripleO::Controller::Ports::InternalApiPort | UPDATE_COMPLETE | 2016-02-03T20:32:01Z |
| ExternalPort             | d81499b1-2585-4560-ae74-f73f99c9e9f4 | OS::TripleO::Controller::Ports::ExternalPort    | UPDATE_COMPLETE | 2016-02-03T20:32:14Z |
| StoragePort              | 9a0fe13d-0d9d-4887-b3a1-80e4d75fea85 | OS::TripleO::Controller::Ports::StoragePort     | UPDATE_COMPLETE | 2016-02-03T20:32:22Z |
| TenantPort               | 4c536c54-6301-4e26-8c46-8f637fb9e762 | OS::TripleO::Controller::Ports::TenantPort      | UPDATE_COMPLETE | 2016-02-03T20:32:30Z |
| StorageMgmtPort          | 305db570-ac9d-4165-b412-4a0418ca1c18 | OS::TripleO::Controller::Ports::StorageMgmtPort | UPDATE_COMPLETE | 2016-02-03T20:32:39Z |
| NetIpMap                 | 49e537c1-7fcc-4e6c-bb61-30bd879c9750 | OS::TripleO::Network::Ports::NetIpMap           | UPDATE_COMPLETE | 2016-02-03T20:32:52Z |
| NetIpSubnetMap           | 904a5d59-eb61-4b16-929d-7bfbb29f2fea | OS::TripleO::Network::Ports::NetIpMap           | UPDATE_COMPLETE | 2016-02-03T20:33:02Z |
| NetworkConfig            | 140c4610-5632-4a83-a563-fd0e8947cc01 | OS::TripleO::Controller::Net::SoftwareConfig    | UPDATE_COMPLETE | 2016-02-03T20:33:10Z |
| NetworkDeployment        | 4670f82e-d74f-4e5d-9d43-5007257a97f8 | OS::TripleO::SoftwareDeployment                 | UPDATE_COMPLETE | 2016-02-03T20:33:33Z |
| NodeTLSCAData            | 8057d408-76fd-476c-b54a-44f00348da31 | OS::TripleO::NodeTLSCAData                      | CREATE_COMPLETE | 2016-02-03T20:33:37Z |
| NodeTLSData              | 60f97e90-411e-4a79-9427-1c698af971e6 | OS::TripleO::NodeTLSData                        | CREATE_COMPLETE | 2016-02-03T20:33:44Z |
| ControllerConfig         | 813d1ac3-3a5c-45fe-9e8d-5407f25f33d2 | OS::Heat::StructuredConfig                      | CREATE_COMPLETE | 2016-02-03T20:34:00Z |
| UpdateDeployment         | b1358eea-5cec-47a7-bc3a-26f33dc6d307 | OS::Heat::SoftwareDeployment                    | UPDATE_FAILED   | 2016-02-03T20:34:32Z |
+--------------------------+--------------------------------------+-------------------------------------------------+-----------------+----------------------+
Comment 2 Zane Bitter 2016-02-04 16:49 EST
Created attachment 1121206 [details]
os-collect-config journal log from Controller-1

The only resources in the failed state are the Controller group, its constituent nested stacks, and the UpdateDeployment resource on Controller-1.

# heat stack-list -n | grep -v COMPLETE
+--------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------+----------------------+--------------------------------------+
| id                                   | stack_name                                                                                                    | stack_status    | creation_time        | parent                               |
+--------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------+----------------------+--------------------------------------+
| 17d43d0f-b2fb-4134-ab1f-ac8609d1307c | overcloud                                                                                                     | UPDATE_FAILED   | 2016-02-03T17:14:27Z | None                                 |
| 773eac1c-84ec-4c89-8034-09d3d1ac5114 | overcloud-Controller-keqvaphuzsnz                                                                             | UPDATE_FAILED   | 2016-02-03T17:14:46Z | 17d43d0f-b2fb-4134-ab1f-ac8609d1307c |
| 902aa56c-7f82-4d76-9764-e8733e9476bf | overcloud-Controller-keqvaphuzsnz-1-3336hvdojdsw                                                              | UPDATE_FAILED   | 2016-02-03T17:14:50Z | 773eac1c-84ec-4c89-8034-09d3d1ac5114 |
| ef5268eb-47c0-42c0-84e3-4e07fdc17c57 | overcloud-Controller-keqvaphuzsnz-0-wlu5pzdbkmfl                                                              | UPDATE_FAILED   | 2016-02-03T17:14:54Z | 773eac1c-84ec-4c89-8034-09d3d1ac5114 |
| f4334bce-e0a6-4e14-b952-4399080d5f10 | overcloud-Controller-keqvaphuzsnz-2-wu3su4rcs4uy                                                              | UPDATE_FAILED   | 2016-02-03T17:15:00Z | 773eac1c-84ec-4c89-8034-09d3d1ac5114 |
+--------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------+----------------------+--------------------------------------+

The nested stack resources all show "Timed Out" as their status_reason. The UpdateDeployment on Controller-1 shows "UPDATE aborted", and all of its attributes are null.

[root@instack ~]# heat resource-list 902aa56c-7f82-4d76-9764-e8733e9476bf
+--------------------------+--------------------------------------+-------------------------------------------------+-----------------+----------------------+
| resource_name            | physical_resource_id                 | resource_type                                   | resource_status | updated_time         |
+--------------------------+--------------------------------------+-------------------------------------------------+-----------------+----------------------+
| Controller               | 8d1fcdf9-b197-4d8c-8dd5-e97ae1fb4999 | OS::Nova::Server                                | CREATE_COMPLETE | 2016-02-03T17:14:50Z |
| ControllerDeployment     | 7b79aabf-96da-4436-b25a-5a246abbe196 | OS::TripleO::SoftwareDeployment                 | CREATE_COMPLETE | 2016-02-03T17:14:50Z |
| ControllerExtraConfigPre | e72c2ae8-65a3-4bc5-8385-e4bf1cf61750 | OS::TripleO::ControllerExtraConfigPre           | CREATE_COMPLETE | 2016-02-03T17:14:50Z |
| UpdateConfig             | 39fdf1e5-7cb3-4906-938a-85054bd64a97 | OS::TripleO::Tasks::PackageUpdate               | UPDATE_COMPLETE | 2016-02-03T20:31:50Z |
| NodeUserData             | 1c247ff1-a8f9-4c12-84e6-1c247efde2c8 | OS::TripleO::NodeUserData                       | UPDATE_COMPLETE | 2016-02-03T20:31:53Z |
| InternalApiPort          | c16bef3e-ec34-4094-8395-15251fa656f1 | OS::TripleO::Controller::Ports::InternalApiPort | UPDATE_COMPLETE | 2016-02-03T20:32:01Z |
| ExternalPort             | d81499b1-2585-4560-ae74-f73f99c9e9f4 | OS::TripleO::Controller::Ports::ExternalPort    | UPDATE_COMPLETE | 2016-02-03T20:32:14Z |
| StoragePort              | 9a0fe13d-0d9d-4887-b3a1-80e4d75fea85 | OS::TripleO::Controller::Ports::StoragePort     | UPDATE_COMPLETE | 2016-02-03T20:32:22Z |
| TenantPort               | 4c536c54-6301-4e26-8c46-8f637fb9e762 | OS::TripleO::Controller::Ports::TenantPort      | UPDATE_COMPLETE | 2016-02-03T20:32:30Z |
| StorageMgmtPort          | 305db570-ac9d-4165-b412-4a0418ca1c18 | OS::TripleO::Controller::Ports::StorageMgmtPort | UPDATE_COMPLETE | 2016-02-03T20:32:39Z |
| NetIpMap                 | 49e537c1-7fcc-4e6c-bb61-30bd879c9750 | OS::TripleO::Network::Ports::NetIpMap           | UPDATE_COMPLETE | 2016-02-03T20:32:52Z |
| NetIpSubnetMap           | 904a5d59-eb61-4b16-929d-7bfbb29f2fea | OS::TripleO::Network::Ports::NetIpMap           | UPDATE_COMPLETE | 2016-02-03T20:33:02Z |
| NetworkConfig            | 140c4610-5632-4a83-a563-fd0e8947cc01 | OS::TripleO::Controller::Net::SoftwareConfig    | UPDATE_COMPLETE | 2016-02-03T20:33:10Z |
| NetworkDeployment        | 4670f82e-d74f-4e5d-9d43-5007257a97f8 | OS::TripleO::SoftwareDeployment                 | UPDATE_COMPLETE | 2016-02-03T20:33:33Z |
| NodeTLSCAData            | 8057d408-76fd-476c-b54a-44f00348da31 | OS::TripleO::NodeTLSCAData                      | CREATE_COMPLETE | 2016-02-03T20:33:37Z |
| NodeTLSData              | 60f97e90-411e-4a79-9427-1c698af971e6 | OS::TripleO::NodeTLSData                        | CREATE_COMPLETE | 2016-02-03T20:33:44Z |
| ControllerConfig         | 813d1ac3-3a5c-45fe-9e8d-5407f25f33d2 | OS::Heat::StructuredConfig                      | CREATE_COMPLETE | 2016-02-03T20:34:00Z |
| UpdateDeployment         | b1358eea-5cec-47a7-bc3a-26f33dc6d307 | OS::Heat::SoftwareDeployment                    | UPDATE_FAILED   | 2016-02-03T20:34:32Z |
+--------------------------+--------------------------------------+-------------------------------------------------+-----------------+----------------------+

So it looks like the UpdateDeployment was just hanging, until the stack timed out.


The os-collect-config journal on Controller-1 shows something happening 34s after UpdateDeployment starts running (for convenience, I've output the journal timestamps in UTC):

Feb 03 20:35:06 overcloud-controller-1.localdomain os-collect-config[4478]: [2016-02-03 15:35:06,231] (os-refresh-config) [INFO] Starting phase pre-configure

and it completes successfully. However the UpdateDeployment is *not* one of those run:

Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: dib-run-parts Wed Feb 3 15:48:02 EST 2016 Running /usr/libexec/os-refresh-config/configure.d/55-heat-config
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,397] (heat-config) [WARNING] Skipping group os-apply-config with no hook script /var/lib/heat-config/hooks/os-apply-config
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,397] (heat-config) [WARNING] Skipping group os-apply-config with no hook script /var/lib/heat-config/hooks/os-apply-config
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,398] (heat-config) [WARNING] Skipping group os-apply-config with no hook script /var/lib/heat-config/hooks/os-apply-config
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,398] (heat-config) [WARNING] Skipping config 7e3d1f2e-e397-46e9-aa98-edb88d4b48ca, already deployed
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,398] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/7e3d1f2e-e397-46e9-aa98-edb88d4b48ca.json
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,398] (heat-config) [WARNING] Skipping group os-apply-config with no hook script /var/lib/heat-config/hooks/os-apply-config
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,398] (heat-config) [WARNING] Skipping group Heat::Ungrouped with no hook script /var/lib/heat-config/hooks/HeatUngrouped
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,398] (heat-config) [WARNING] Skipping config 02e762c0-1232-420d-a99c-668b1c4e7dfa, already deployed
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,399] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/02e762c0-1232-420d-a99c-668b1c4e7dfa.json
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,399] (heat-config) [WARNING] Skipping config 00fcec8b-bc50-41d6-b91d-f9aabf4a89ef, already deployed
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,399] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/00fcec8b-bc50-41d6-b91d-f9aabf4a89ef.json
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,399] (heat-config) [WARNING] Skipping config 6c7d0e51-43e3-4881-95da-a0fa87da0491, already deployed
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,399] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/6c7d0e51-43e3-4881-95da-a0fa87da0491.json
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,399] (heat-config) [WARNING] Skipping config 715037a4-d334-4762-947d-7b76bd2764db, already deployed
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,399] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/715037a4-d334-4762-947d-7b76bd2764db.json
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,400] (heat-config) [WARNING] Skipping config 038b27a0-0190-452a-a37d-d6cf35bdfdf6, already deployed
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,400] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/038b27a0-0190-452a-a37d-d6cf35bdfdf6.json
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,400] (heat-config) [WARNING] Skipping config 88d98fc1-3dd4-4078-9e96-e84ce081136c, already deployed
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,400] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/88d98fc1-3dd4-4078-9e96-e84ce081136c.json
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,400] (heat-config) [WARNING] Skipping config 48398659-a33a-4700-b2b2-3abf5d4aa231, already deployed
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,400] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/48398659-a33a-4700-b2b2-3abf5d4aa231.json
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,400] (heat-config) [WARNING] Skipping group os-apply-config with no hook script /var/lib/heat-config/hooks/os-apply-config
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,400] (heat-config) [WARNING] Skipping group os-apply-config with no hook script /var/lib/heat-config/hooks/os-apply-config
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: [2016-02-03 15:48:02,401] (heat-config) [WARNING] Skipping group Heat::Ungrouped with no hook script /var/lib/heat-config/hooks/HeatUngrouped
Feb 03 20:48:02 overcloud-controller-1.localdomain os-collect-config[26579]: dib-run-parts Wed Feb 3 15:48:02 EST 2016 55-heat-config completed

before and after this burst of activity (heavily edited here), it just logs stuff of the form:


Feb 03 20:35:06 overcloud-controller-1.localdomain os-collect-config[4478]: 2016-02-03 15:35:06.168 4478 WARNING os_collect_config.local [-] /var/lib/os-collect-config/local-data not found. Skipping
Feb 03 20:35:06 overcloud-controller-1.localdomain os-collect-config[4478]: 2016-02-03 15:35:06.169 4478 WARNING os_collect_config.local [-] No local metadata found (['/var/lib/os-collect-config/local-data'])

The full os-collect-config log is attached.
Comment 3 Zane Bitter 2016-02-04 17:42 EST
Created attachment 1121228 [details]
heat-engine.log

Nothing of interest in the heat-engine log.
Comment 4 Zane Bitter 2016-02-05 11:29:35 EST
It appears that os-collect-config is behaving normally. It sees a metadata update at the appropriate time, but presumably the metadata it retrieves does not contain the data for the UpdateDeployment deployment that it is supposed to run.

Previous bugs of this type have been caused by race conditions in various parts of the code:

https://bugs.launchpad.net/heat/+bug/1488366
https://bugs.launchpad.net/heat/+bug/1477329

Both of those are now fixed, but it's possible there is another race somewhere.

(Thanks to Thomas Hervé for this analysis.)
Comment 5 Zane Bitter 2016-02-05 16:35:08 EST
I raised an upstream bug to note the fact that there definitely *are* races in metadata writes even though we don't know yet how (or, indeed, if) they are being triggered in this case:

https://bugs.launchpad.net/heat/+bug/1542480

I also raised a bug (and submitted a patch) to stop us from ever recalculating metadata as a result of clearing a user hook, although that should not be the cause of this issue (since in this case the hook was set on a SoftwareDeployment resource, and signals to SoftwareDeployment resources were already exempt per bug 1255759):

https://bugs.launchpad.net/heat/+bug/1542455
Comment 6 Zane Bitter 2016-02-05 20:17:46 EST
OK, it turns out all of the speculation above was a red herring. os-collect-config was running fine in the background but not logging (as documented in bug 1272254) because it itself had been updated. (When the new version started up, it didn't feel the need to re-run the deployment because it had already been run, just not logged to the journal.)

The real problem is this, from the heat-api-cfn log on the undercloud (note: this was on a different setup, so the times don't match the ones above):

2016-02-05 13:08:25.079 25154 INFO eventlet.wsgi.server [req-3992ee77-eac5-4162-9538-68beb8c0bf38 b2144b17f14644ccbbd3da4c4f97e352 f56a0c07e08a4166b876bbb8076e3672] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 454, in handle_one_response
    result = self.application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/heat/common/wsgi.py", line 410, in __call__
    response = req.get_response(self.application)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1296, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1260, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/osprofiler/web.py", line 99, in __call__
    return request.get_response(self.application)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1296, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1260, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 634, in __call__
    return self._call_app(env, start_response)
  File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 554, in _call_app
    return self._app(env, _fake_start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/heat/common/wsgi.py", line 410, in __call__
    response = req.get_response(self.application)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1296, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1260, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/site-packages/routes/middleware.py", line 131, in __call__
    response = self.app(environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/heat/common/wsgi.py", line 699, in __call__
    raise translate_exception(err, request.best_match_language())
RequestLimitExceeded: Request limit exceeded: JSON body size (1050539 bytes) exceeds maximum allowed size (1048576 bytes).
2016-02-05 13:08:25.079 25154 INFO eventlet.wsgi.server [req-3992ee77-eac5-4162-9538-68beb8c0bf38 b2144b17f14644ccbbd3da4c4f97e352 f56a0c07e08a4166b876bbb8076e3672] 192.0.2.8 - - [05/Feb/2016 13:08:25] "POST /v1/signal/arn%3Aopenstack%3Aheat%3A%3Af75923fe9f1d46008171167c4887ff5f%3Astacks%2Fovercloud-Controller-7dgfo27yepjy-1-zjuybvpl5jha%2F956959d9-e24b-4543-b0c3-e77a5b3133bf%2Fresources%2FUpdateDeployment?Timestamp=2016-02-04T20%3A25%3A30Z&SignatureMethod=HmacSHA256&AWSAccessKeyId=9a1a426c23d944df9fbe2b6aa78a32c5&SignatureVersion=2&Signature=GUOR5yyHPvTm8q1HWGap1rWNAcKg1lSBtjVFtpk%2By5s%3D HTTP/1.1" 500 3618 0.297531

The response from os-collect-config back to Heat is HUUUUUUGE - slightly over 1MB, which turns out to be the limit.

The surfeit of data is mostly the progress output of yum, so it depends entirely on how many packages there are to update (older installations are likely to have more trouble). That's kind of unfortunate because that data is nice to have for debugging when things go wrong - it would be a shame to have to turn it off by running "yum -q". A lot of the data appears to be duplicated between stdout and stderr for reasons that I have not yet dug into, but there is an opportunity there to save probably around a quarter to a third - which would buy us some temporary breathing room. We may or may not also be able to tweak the debug level to find a happy medium.
Comment 7 Zane Bitter 2016-02-05 20:18 EST
Created attachment 1121542 [details]
stdout from the UpdateDeployment
Comment 8 Zane Bitter 2016-02-05 20:19 EST
Created attachment 1121543 [details]
stderr from the UpdateDeployment
Comment 9 Steven Hardy 2016-02-06 03:33:37 EST
Thanks for the update Zane.  Since we're in the process of switching to Swift for metadata polling (ref https://review.openstack.org/#/c/257657/) I guess we can look at moving to swift for the signal transport too, which would probably address the payload limit issue.
Comment 11 Zane Bitter 2016-02-10 15:48:35 EST
I just noticed that https://review.openstack.org/#/c/188737/ was committed upstream as the fix for bug 1243884 (originally part of bug 1242396, before it was split into multiple bugs for different components), but never actually pulled in to the downstream openstack-heat-templates package. This would have prevented the issue.
Comment 13 Zane Bitter 2016-02-12 11:00:08 EST
*** Bug 1304463 has been marked as a duplicate of this bug. ***
Comment 18 Alexander Chuzhoy 2016-02-16 18:09:06 EST
Verified:
Environment: openstack-tripleo-heat-templates-0.8.6-120.el7ost.noarch
Successfully updated from 7.0 -> to 7.3.
Comment 20 errata-xmlrpc 2016-02-18 11:52:35 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0264.html
Comment 21 Zane Bitter 2016-06-16 07:12:45 EDT
*** Bug 1304463 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.