Bug 1396862 - rhel-osp-director: OSP10 minor update fails: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-check-fsid-mismatch-/dev/vdb]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements
Summary: rhel-osp-director: OSP10 minor update fails: Error: /Stage[main]/Ceph::Osds...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Giulio Fidente
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-21 00:27 UTC by Alexander Chuzhoy
Modified: 2016-12-29 16:55 UTC (History)
CC List: 14 users

Fixed In Version: openstack-tripleo-common-5.4.0-3.el7ost python-tripleoclient-5.4.0-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-14 16:34:09 UTC


Attachments
list_nodes_status output is clean (2.10 KB, text/plain)
2016-11-21 08:00 UTC, Marios Andreou


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:2948 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 enhancement update 2016-12-14 19:55:27 UTC
OpenStack gerrit 394195 None None None 2016-11-23 12:20:25 UTC
OpenStack gerrit 400676 None None None 2016-11-22 10:53:12 UTC
OpenStack gerrit 400677 None None None 2016-11-22 10:53:31 UTC
OpenStack gerrit 401267 None None None 2016-11-23 14:08:53 UTC
OpenStack gerrit 401999 None None None 2016-11-24 12:05:57 UTC
Launchpad 1643701 None None None 2016-11-21 21:51:21 UTC

Description Alexander Chuzhoy 2016-11-21 00:27:34 UTC
rhel-osp-director:   OSP10 minor update fails: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-check-fsid-mismatch-/dev/vdb]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements

Environment:
openstack-puppet-modules-9.3.0-1.el7ost.noarch
openstack-tripleo-heat-templates-5.1.0-3.el7ost.noarch
instack-undercloud-5.1.0-2.el7ost.noarch

Steps to reproduce:

1. Deploy with 
openstack overcloud deploy --templates --libvirt-type kvm --ntp-server clock.redhat.com --neutron-network-type vxlan --neutron-tunnel-types vxlan --control-scale 3 --control-flavor controller-d75f3dec-c770-5f88-9d4c-3fea1bf9c484 --compute-scale 2 --compute-flavor compute-b634c10a-570f-59ba-bdbf-0c313d745a10 --ceph-storage-scale 2 --ceph-storage-flavor ceph-cf1f074b-dadb-5eb8-9eb0-55828273fab7 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /home/stack/virt/ceph.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/virt/network/network-environment.yaml -e /home/stack/virt/enable-tls.yaml -e /home/stack/virt/inject-trust-anchor.yaml -e /home/stack/virt/public_vip.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml -e /home/stack/virt/hostnames.yml -e /home/stack/virt/debug.yaml --log-file overcloud_deployment_92.log


2. Try to minor update.
Result:
Minor update fails:
Error: /bin/true # comment to satisfy puppet syntax requirements
set -ex
test 7999531e-af4d-11e6-919f-525400c4d45c = $(ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}')
 returned 1 instead of one of [0]




[stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+---------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks            |
+--------------------------------------+--------------+--------+------------+-------------+---------------------+
| 40a91cd8-f5e7-4049-becb-471d1d980806 | ceph-0       | ACTIVE | -          | Running     | ctlplane=192.0.2.10 |
| 86f79df0-bb0c-452b-bf22-3687f6d067a7 | ceph-1       | ACTIVE | -          | Running     | ctlplane=192.0.2.8  |
| 69c1d897-e779-4705-914c-ee8f70915f67 | compute-0    | ACTIVE | -          | Running     | ctlplane=192.0.2.15 |
| c0931d88-c6bb-47bf-baa1-267deafdc837 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.0.2.18 |
| 90299c54-8b98-439c-883f-baedc147fcdc | controller-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.14 |
| 9abdd570-83cd-410a-831a-e617a91b1e7e | controller-1 | ACTIVE | -          | Running     | ctlplane=192.0.2.11 |
| 3c23b605-2029-49ad-b7c3-476b349e3591 | controller-2 | ACTIVE | -          | Running     | ctlplane=192.0.2.13 |
+--------------------------------------+--------------+--------+------------+-------------+---------------------+



Output from heat resource-list -n5 overcloud:
+----------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------------+----------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| resource_name                                | physical_resource_id                                                            | resource_type                                                       | resource_status | updated_time         | stack_name                                                                                                                            |
+----------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------------+----------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| AllNodesDeploySteps                          | 4c5bafb6-16f1-4a43-9451-4a995041d130                                            | OS::TripleO::PostDeploySteps                                                                                        | UPDATE_FAILED   | 2016-11-20T19:59:14Z | overcloud                                                                                                                             |
| 0                                            | 00d5a359-c30a-451a-b8ed-2e6837208502                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2016-11-20T20:03:06Z | overcloud-AllNodesDeploySteps-kbbb2iafq4g2-CephStorageDeployment_Step3-5vutgylgszxn                                                   |
| 1                                            | 609d2144-4904-45c2-a7e8-418f3687973e                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2016-11-20T20:03:06Z | overcloud-AllNodesDeploySteps-kbbb2iafq4g2-CephStorageDeployment_Step3-5vutgylgszxn                                                   |
| CephStorageDeployment_Step3                  | 692dafda-49b4-47a5-a33a-c81f2fc56975                                            | OS::Heat::StructuredDeploymentGroup                                                                                 | UPDATE_FAILED   | 2016-11-20T20:03:06Z | overcloud-AllNodesDeploySteps-kbbb2iafq4g2                                                                                            |
| ComputeDeployment_Step3                      | f9042a88-1760-4d5c-9dc0-4b5301fcfffd                                            | OS::Heat::StructuredDeploymentGroup                                                                                 | UPDATE_FAILED   | 2016-11-20T20:03:06Z | overcloud-AllNodesDeploySteps-kbbb2iafq4g2                                                                                            |
| ControllerDeployment_Step3                   | d06f758e-257f-475a-bac5-ad798785376d                                            | OS::Heat::StructuredDeploymentGroup                                                                                 | UPDATE_FAILED   | 2016-11-20T20:03:06Z | overcloud-AllNodesDeploySteps-kbbb2iafq4g2                                                                                            |
| 0                                            | 5ad91b8b-dd4b-4f8a-b150-ebc93c37e034                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2016-11-20T20:03:07Z | overcloud-AllNodesDeploySteps-kbbb2iafq4g2-ControllerDeployment_Step3-2g3kxsqmjqdx                                                    |
| 1                                            | 0ee92f64-ba07-4dbe-982a-bab5a7e57649                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2016-11-20T20:03:07Z | overcloud-AllNodesDeploySteps-kbbb2iafq4g2-ControllerDeployment_Step3-2g3kxsqmjqdx                                                    |
| 2                                            | cad1e3a0-43e8-41db-83be-e6f609dc9a22                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2016-11-20T20:03:07Z | overcloud-AllNodesDeploySteps-kbbb2iafq4g2-ControllerDeployment_Step3-2g3kxsqmjqdx                                                    |
+----------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+-----------------+----------------------+---------------------------------------------------------------------------------------------------------------------------------------+





[stack@undercloud-0 ~]$ echo -e `heat deployment-show 00d5a359-c30a-451a-b8ed-2e6837208502`
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{ "status": "FAILED", "server_id": "40a91cd8-f5e7-4049-becb-471d1d980806", "config_id": "89786819-3ca0-4188-b8dc-49404d2fc4cc", "output_values": { "deploy_stdout": "Matching apachectl 'Server version: Apache/2.4.6 (Red Hat Enterprise Linux)
Server built: Aug 3 2016 08:33:27'
Notice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.
Notice: Compiled catalog for ceph-0.localdomain in environment production in 2.33 seconds
Notice: /Stage[setup]/Tripleo::Packages::Upgrades/Exec[package-upgrade]/returns: executed successfully
Notice: /Stage[main]/Ceph/Ceph_config[global/fsid]/value: value changed 'b06a22f0-af4e-11e6-8e5b-525400c4d45c' to '7999531e-af4d-11e6-919f-525400c4d45c'
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-check-fsid-mismatch-/dev/vdb]/returns: ++ ceph-disk list /dev/vdb
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-check-fsid-mismatch-/dev/vdb]/returns: ++ egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}'
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-check-fsid-mismatch-/dev/vdb]/returns: + test 7999531e-af4d-11e6-919f-525400c4d45c = b06a22f0-af4e-11e6-8e5b-525400c4d45c
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-prepare-/dev/vdb]: Dependency Exec[ceph-osd-check-fsid-mismatch-/dev/vdb] has failures: true
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[fcontext_/dev/vdb]: Dependency Exec[ceph-osd-check-fsid-mismatch-/dev/vdb] has failures: true
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-activate-/dev/vdb]: Dependency Exec[ceph-osd-check-fsid-mismatch-/dev/vdb] has failures: true
Notice: /Firewall[998 log all]: Dependency Exec[ceph-osd-check-fsid-mismatch-/dev/vdb] has failures: true
Notice: /Firewall[999 drop all]: Dependency Exec[ceph-osd-check-fsid-mismatch-/dev/vdb] has failures: true
Notice: Finished catalog run in 2.98 seconds
", "deploy_stderr": "exception: connect failed
Error: /bin/true # comment to satisfy puppet syntax requirements
set -ex
test 7999531e-af4d-11e6-919f-525400c4d45c = $(ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}')
 returned 1 instead of one of [0]
Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-check-fsid-mismatch-/dev/vdb]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements
set -ex
test 7999531e-af4d-11e6-919f-525400c4d45c = $(ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}')
 returned 1 instead of one of [0]
Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-prepare-/dev/vdb]: Skipping because of failed dependencies
Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[fcontext_/dev/vdb]: Skipping because of failed dependencies
Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-activate-/dev/vdb]: Skipping because of failed dependencies
Warning: /Firewall[998 log all]: Skipping because of failed dependencies
Warning: /Firewall[999 drop all]: Skipping because of failed dependencies
", "deploy_status_code": 6 }, "creation_time": "2016-11-20T18:39:51Z", "updated_time": "2016-11-20T20:03:54Z", "input_values": { "step": 3, "update_identifier": "1479669528" }, "action": "UPDATE", "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 6", "id": "00d5a359-c30a-451a-b8ed-2e6837208502" }

Comment 1 Alexander Chuzhoy 2016-11-21 00:43:56 UTC
The issue reproduces.

Comment 3 Alexander Chuzhoy 2016-11-21 00:54:24 UTC
Note:

Error: /bin/true # comment to satisfy puppet syntax requirements
set -ex
test d55f136a-af51-11e6-a025-52540037ab2f = $(ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}')
 returned 1 instead of one of [0]
Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-check-fsid-mismatch-/dev/vdb]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements
set -ex
test d55f136a-af51-11e6-a025-52540037ab2f = $(ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}')
 returned 1 instead of one of [0]
Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-prepare-/dev/vdb]: Skipping because of failed dependencies
Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[fcontext_/dev/vdb]: Skipping because of failed dependencies
Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/vdb]/Exec[ceph-osd-activate-/dev/vdb]: Skipping because of failed dependencies
Warning: /Firewall[998 log all]: Skipping because of failed dependencies
Warning: /Firewall[999 drop all]: Skipping because of failed dependencies
", "deploy_status_code": 6 }, "creation_time": "2016-11-20T19:10:10Z", "updated_time": "2016-11-20T20:35:54Z", "input_values": { "step": 3, "update_identifier": "1479671336" }, "action": "UPDATE", "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 6", "id": "f1259522-9f1d-4877-9282-293dc34082f0" }
[stack@undercloud-0 ~]$ . stackrc 
[stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+---------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks            |
+--------------------------------------+--------------+--------+------------+-------------+---------------------+
| 55bb8a26-c630-4a19-9df5-f879e88b1bfa | ceph-0       | ACTIVE | -          | Running     | ctlplane=192.0.2.19 |
| 4eb094a0-f1af-4015-a03d-3b5ba8ad964e | ceph-1       | ACTIVE | -          | Running     | ctlplane=192.0.2.17 |
| 18d7535b-657b-4a83-b61b-2f5a0cdb8a1a | compute-0    | ACTIVE | -          | Running     | ctlplane=192.0.2.16 |
| 60a02b5d-1f42-417d-85a1-412680e82b12 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
| 405555af-2a9c-41d5-a409-416b323d43e4 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.6  |
| 4a0054b3-fb80-4d55-97b4-7ee9de7be7a3 | controller-1 | ACTIVE | -          | Running     | ctlplane=192.0.2.11 |
| 28b2e757-aa15-4354-858b-d38867d1093c | controller-2 | ACTIVE | -          | Running     | ctlplane=192.0.2.18 |
+--------------------------------------+--------------+--------+------------+-------------+---------------------+
[stack@undercloud-0 ~]$ ssh heat-admin@192.0.2.19
[heat-admin@ceph-0 ~]$ sudo -i
[root@ceph-0 ~]# ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}')
-bash: syntax error near unexpected token `)'
[root@ceph-0 ~]# ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}'
fbf4df4a-af52-11e6-a92e-52540037ab2f
[root@ceph-0 ~]# ceph-disk list /dev/vdb
/dev/vdb :
 /dev/vdb2 ceph journal, for /dev/vdb1
 /dev/vdb1 ceph data, active, unknown cluster fbf4df4a-af52-11e6-a92e-52540037ab2f, osd.1, journal /dev/vdb2
[root@ceph-0 ~]# ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}'
fbf4df4a-af52-11e6-a92e-52540037ab2f
[root@ceph-0 ~]# logout
[heat-admin@ceph-0 ~]$ logout
Connection to 192.0.2.19 closed.
[stack@undercloud-0 ~]$ ssh heat-admin@192.0.2.17
[heat-admin@ceph-1 ~]$ sudo -i
[root@ceph-1 ~]# 
[root@ceph-1 ~]# ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}'
fbf4df4a-af52-11e6-a92e-52540037ab2f

Comment 4 Marios Andreou 2016-11-21 08:00:45 UTC
Created attachment 1222316 [details]
list_nodes_status output is clean

Comment 5 Marios Andreou 2016-11-21 08:01:52 UTC
Comment on attachment 1222316 [details]
list_nodes_status output is clean

sorry this was meant for a different BZ

Comment 6 Marios Andreou 2016-11-21 08:32:40 UTC
Sorry about the attachment noise here (it was intended for another BZ). Sasha said this env was unfortunately wiped, so he will be recreating it when he comes in today. Setting needinfo on Sasha: please confirm with an env so we can hand over to the storage team.

Going to assign to @gfidente for now (Giulio, grateful for any thoughts on the description in comment #0; can you please have a look if you get a chance).

Making this DFG:DF-Lifecycle for now, but we can hand over to Ceph later after initial triage.

Comment 7 Giulio Fidente 2016-11-21 13:19:58 UTC
Looks like there is a possibility that the FSID provided by the client/user changes on update; this can potentially break the entire Ceph cluster.

Can we verify that the following holds at the end of a *clean* deployment, on any given cephstorage node:

  the uuid given by:

    sudo hiera ceph::profile::params::fsid

  is the same that you see in the ceph.conf file:

    grep fsid /etc/ceph/ceph.conf

  and also the same you see applied on the disks:

    ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}'


Assuming the above passes (as expected), we need to repeat those same commands on a node where the update failed, to see if/what has changed.
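
For convenience, a minimal sketch that runs all three checks side by side (assuming /dev/vdb is the OSD device, as in this environment; adjust as needed):

    # Run on a cephstorage node.
    HIERA_FSID=$(sudo hiera ceph::profile::params::fsid)
    CONF_FSID=$(sudo grep fsid /etc/ceph/ceph.conf | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}')
    DISK_FSID=$(sudo ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}' | head -1)
    echo "hiera=$HIERA_FSID conf=$CONF_FSID disk=$DISK_FSID"
    if [ "$HIERA_FSID" = "$CONF_FSID" ] && [ "$CONF_FSID" = "$DISK_FSID" ]; then
        echo "FSIDs match"
    else
        echo "FSID mismatch"
    fi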

Comment 14 Marios Andreou 2016-11-22 10:53:12 UTC
Just cherry-picked to stable/newton since master landed; updating the trackers for https://review.openstack.org/400676 and https://review.openstack.org/400677.

Comment 17 Julie Pichon 2016-11-23 11:34:25 UTC
Would it be possible to confirm which puddle / tripleo-common version this problem was seen on? There are suggestions upstream that the initial problem should have been resolved together with the fix for bug 1388930 / https://github.com/openstack/tripleo-common/commit/ebe270 .

Comment 18 Alexander Chuzhoy 2016-11-23 19:14:22 UTC
The issue still reproduces for me:

The deployed and updated versions of python-tripleoclient and tripleo-common rpms are below:

[stack@undercloud-0 ~]$ sudo grep python-tripleoclient /var/log/yum.log
Nov 23 11:32:10 Installed: python-tripleoclient-5.3.0-7.el7ost.noarch
Nov 23 12:32:58 Updated: python-tripleoclient-5.4.0-1.el7ost.noarch

[stack@undercloud-0 ~]$ sudo grep tripleo-common /var/log/yum.log
Nov 23 11:32:09 Installed: openstack-tripleo-common-5.3.0-6.el7ost.noarch
Nov 23 12:34:14 Updated: openstack-tripleo-common-5.4.0-2.el7ost.noarch


Grepping the log on ceph for changed fsid:

Notice: /Stage[setup]/Tripleo::Packages::Upgrades/Exec[package-upgrade]/returns: executed successfully
Notice: /Stage[main]/Ceph/Ceph_config[global/fsid]/value: value changed 'bdd8803a-b19c-11e6-8392-525400842cd7' to 'a8e1eac8-b19b-11e6-adcf-525400842cd7'

Comment 19 Alexander Chuzhoy 2016-11-23 21:42:19 UTC
The issue still reproduces even when I initially deploy with:
openstack-tripleo-common-5.4.0-2.el7ost.noarch
python-tripleoclient-5.4.0-1.el7ost.noarch

Comment 21 Giulio Fidente 2016-11-24 00:31:38 UTC
Moving back to ON_DEV, looks like we're not finished with this bug yet.

At the end of the initial deployment, by downloading the Mistral environment I find two different values for CephClusterFSID.

Before launching the overcloud update, the docs suggest repeating the deploy command with the --update-plan-only argument, in order to refresh the plan in Swift from the new templates, and only after that launching the update command.

The --update-plan-only argument is meant to refresh the plan in Swift, but it also updates the environment in Mistral; in doing so it deletes one of the two occurrences of CephClusterFSID (the one actually in use), leaving the Mistral environment with a single occurrence that does not match the FSID in use on the cluster.

The overcloud update command will then hit the bug.
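
For reference, the sequence that triggers it is roughly the documented minor-update flow (a sketch; the deploy arguments are abbreviated here and must match the original deployment):

# 1. Refresh the plan in Swift/Mistral without re-deploying; this is where CephClusterFSID gets regenerated:
openstack overcloud deploy --templates <same -e options as the original deploy> --update-plan-only
# 2. Launch the minor update, which applies the refreshed plan (and the new FSID) to the nodes:
openstack overcloud update stack -i overcloud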

Comment 22 John Fulton 2016-11-24 04:24:22 UTC
I also reproduced this with the same versions [1]. Even adding one node to an existing cluster can result in every ceph.conf in the cluster getting a new fsid, and `ceph-disk list` will then report that every OSD belongs to an unknown cluster [2], though if the OSDs were already active, the cluster will continue to run without problems.

Workaround: set the FSID in your ceph environment file and re-run the update. 

parameter_defaults:
  ExtraConfig:
    ceph::profile::params::fsid: eb2bb192-b1c9-11e6-9205-525400330666

If you generate the value for the above with `uuidgen` and put it in your environment file when you do the initial deploy, then as long as you keep that parameter you won't hit this. Note that you need to pass the above as ExtraConfig, not just CephStorageExtraConfig, because the parameter is needed by both the OSDs and the Mons.

One nice thing about this workaround is that even if you hit this with an existing cluster, your OSDs will still be active and working (e.g. my Nova instances backed by Ceph are still running and no data is lost). You can then just grab the old FSID by running `ceph-disk list` on an OSD and re-run the upgrade with that fsid hard-coded as above (a sketch of the flow follows), and your cluster will be fine [3]. Naturally the bug needs to be fixed, but I wanted to point out the workaround for anyone who hits it in the meantime.
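
A sketch of that recovery flow, assuming /dev/vdb holds an OSD and that a new environment file ~/fsid.yaml is added to the deploy command (both names are illustrative):

# On any OSD node: recover the FSID the disks were actually prepared with.
sudo ceph-disk list /dev/vdb | egrep -o '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}' | head -1
# On the undercloud: pin that FSID in an environment file so the plan can no longer regenerate it.
cat > ~/fsid.yaml <<EOF
parameter_defaults:
  ExtraConfig:
    ceph::profile::params::fsid: <the FSID recovered above>
EOF
# Then re-run the deploy/update with this file appended to the -e list.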

[1]
python-tripleoclient-5.4.0-1.el7ost.noarch
puppet-tripleo-5.4.0-2.el7ost.noarch

[2]
[stack@hci-director ~]$ ansible osds  -b -m shell -a "ceph-disk list | grep unknown" 
192.168.1.22 | SUCCESS | rc=0 >>
 /dev/sda1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.28, journal /dev/sdm4
 /dev/sdb1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.22, journal /dev/sdm3
 /dev/sdc1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.4, journal /dev/sdm1
 /dev/sdd1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.13, journal /dev/sdm2
 /dev/sde1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.7, journal /dev/sdn1
 /dev/sdf1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.16, journal /dev/sdn2
 /dev/sdg1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.25, journal /dev/sdn3
 /dev/sdh1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.34, journal /dev/sdn4
 /dev/sdi1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.1, journal /dev/sdo1
 /dev/sdj1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.19, journal /dev/sdo3
 /dev/sdk1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.10, journal /dev/sdo2
 /dev/sdl1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.31, journal /dev/sdo4

192.168.1.35 | SUCCESS | rc=0 >>
 /dev/sda1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.30, journal /dev/sdm4
 /dev/sdb1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.24, journal /dev/sdm3
 /dev/sdc1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.6, journal /dev/sdm1
 /dev/sdd1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.15, journal /dev/sdm2
 /dev/sde1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.9, journal /dev/sdn1
 /dev/sdf1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.18, journal /dev/sdn2
 /dev/sdg1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.27, journal /dev/sdn3
 /dev/sdh1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.35, journal /dev/sdn4
 /dev/sdi1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.3, journal /dev/sdo1
 /dev/sdj1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.21, journal /dev/sdo3
 /dev/sdk1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.12, journal /dev/sdo2
 /dev/sdl1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.32, journal /dev/sdo4

192.168.1.26 | SUCCESS | rc=0 >>
 /dev/sda1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.26, journal /dev/sdm4
 /dev/sdb1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.20, journal /dev/sdm3
 /dev/sdc1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.2, journal /dev/sdm1
 /dev/sdd1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.11, journal /dev/sdm2
 /dev/sde1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.5, journal /dev/sdn1
 /dev/sdf1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.14, journal /dev/sdn2
 /dev/sdg1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.23, journal /dev/sdn3
 /dev/sdh1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.33, journal /dev/sdn4
 /dev/sdi1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.0, journal /dev/sdo1
 /dev/sdj1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.17, journal /dev/sdo3
 /dev/sdk1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.8, journal /dev/sdo2
 /dev/sdl1 ceph data, active, unknown cluster eb2bb192-b1c9-11e6-9205-525400330666, osd.29, journal /dev/sdo4
/dev/sdq other, unknown
/dev/sr0 other, unknown

[stack@hci-director ~]$

[3] 
[stack@hci-director ~]$ grep fsid ~/custom-templates/ceph.yaml 
    ceph::profile::params::fsid: eb2bb192-b1c9-11e6-9205-525400330666
[stack@hci-director ~]$ ansible osds -b -m shell -a "hiera ceph::profile::params::fsid"
192.168.1.26 | SUCCESS | rc=0 >>
eb2bb192-b1c9-11e6-9205-525400330666

192.168.1.21 | SUCCESS | rc=0 >>
eb2bb192-b1c9-11e6-9205-525400330666

192.168.1.22 | SUCCESS | rc=0 >>
eb2bb192-b1c9-11e6-9205-525400330666

192.168.1.35 | SUCCESS | rc=0 >>
eb2bb192-b1c9-11e6-9205-525400330666

[stack@hci-director ~]$ 
[stack@hci-director ~]$ ansible osds  -b -m shell -a "ceph-disk list | grep osd"
192.168.1.21 | SUCCESS | rc=0 >>
 /dev/sda1 ceph data, active, cluster ceph, osd.45, journal /dev/sdm4
 /dev/sdb1 ceph data, active, cluster ceph, osd.43, journal /dev/sdm3
 /dev/sdc1 ceph data, active, cluster ceph, osd.37, journal /dev/sdm1
 /dev/sdd1 ceph data, active, cluster ceph, osd.40, journal /dev/sdm2
 /dev/sde1 ceph data, active, cluster ceph, osd.38, journal /dev/sdn1
 /dev/sdf1 ceph data, active, cluster ceph, osd.41, journal /dev/sdn2
 /dev/sdg1 ceph data, active, cluster ceph, osd.44, journal /dev/sdn3
 /dev/sdh1 ceph data, active, cluster ceph, osd.47, journal /dev/sdn4
 /dev/sdi1 ceph data, active, cluster ceph, osd.36, journal /dev/sdo1
 /dev/sdj1 ceph data, active, cluster ceph, osd.42, journal /dev/sdo3
 /dev/sdk1 ceph data, active, cluster ceph, osd.39, journal /dev/sdo2
 /dev/sdl1 ceph data, active, cluster ceph, osd.46, journal /dev/sdo4

192.168.1.35 | SUCCESS | rc=0 >>
 /dev/sda1 ceph data, active, cluster ceph, osd.30, journal /dev/sdm4
 /dev/sdb1 ceph data, active, cluster ceph, osd.24, journal /dev/sdm3
 /dev/sdc1 ceph data, active, cluster ceph, osd.6, journal /dev/sdm1
 /dev/sdd1 ceph data, active, cluster ceph, osd.15, journal /dev/sdm2
 /dev/sde1 ceph data, active, cluster ceph, osd.9, journal /dev/sdn1
 /dev/sdf1 ceph data, active, cluster ceph, osd.18, journal /dev/sdn2
 /dev/sdg1 ceph data, active, cluster ceph, osd.27, journal /dev/sdn3
 /dev/sdh1 ceph data, active, cluster ceph, osd.35, journal /dev/sdn4
 /dev/sdi1 ceph data, active, cluster ceph, osd.3, journal /dev/sdo1
 /dev/sdj1 ceph data, active, cluster ceph, osd.21, journal /dev/sdo3
 /dev/sdk1 ceph data, active, cluster ceph, osd.12, journal /dev/sdo2
 /dev/sdl1 ceph data, active, cluster ceph, osd.32, journal /dev/sdo4

192.168.1.26 | SUCCESS | rc=0 >>
 /dev/sda1 ceph data, active, cluster ceph, osd.26, journal /dev/sdm4
 /dev/sdb1 ceph data, active, cluster ceph, osd.20, journal /dev/sdm3
 /dev/sdc1 ceph data, active, cluster ceph, osd.2, journal /dev/sdm1
 /dev/sdd1 ceph data, active, cluster ceph, osd.11, journal /dev/sdm2
 /dev/sde1 ceph data, active, cluster ceph, osd.5, journal /dev/sdn1
 /dev/sdf1 ceph data, active, cluster ceph, osd.14, journal /dev/sdn2
 /dev/sdg1 ceph data, active, cluster ceph, osd.23, journal /dev/sdn3
 /dev/sdh1 ceph data, active, cluster ceph, osd.33, journal /dev/sdn4
 /dev/sdi1 ceph data, active, cluster ceph, osd.0, journal /dev/sdo1
 /dev/sdj1 ceph data, active, cluster ceph, osd.17, journal /dev/sdo3
 /dev/sdk1 ceph data, active, cluster ceph, osd.8, journal /dev/sdo2
 /dev/sdl1 ceph data, active, cluster ceph, osd.29, journal /dev/sdo4

192.168.1.22 | SUCCESS | rc=0 >>
 /dev/sda1 ceph data, active, cluster ceph, osd.28, journal /dev/sdm4
 /dev/sdb1 ceph data, active, cluster ceph, osd.22, journal /dev/sdm3
 /dev/sdc1 ceph data, active, cluster ceph, osd.4, journal /dev/sdm1
 /dev/sdd1 ceph data, active, cluster ceph, osd.13, journal /dev/sdm2
 /dev/sde1 ceph data, active, cluster ceph, osd.7, journal /dev/sdn1
 /dev/sdf1 ceph data, active, cluster ceph, osd.16, journal /dev/sdn2
 /dev/sdg1 ceph data, active, cluster ceph, osd.25, journal /dev/sdn3
 /dev/sdh1 ceph data, active, cluster ceph, osd.34, journal /dev/sdn4
 /dev/sdi1 ceph data, active, cluster ceph, osd.1, journal /dev/sdo1
 /dev/sdj1 ceph data, active, cluster ceph, osd.19, journal /dev/sdo3
 /dev/sdk1 ceph data, active, cluster ceph, osd.10, journal /dev/sdo2
 /dev/sdl1 ceph data, active, cluster ceph, osd.31, journal /dev/sdo4

[stack@hci-director ~]$

Comment 23 Giulio Fidente 2016-11-24 07:15:58 UTC
Update succeeded with the tentative patch at https://review.openstack.org/#/c/401812/

We probably can't merge that as-is though, as it might break deployments of OSPd9 overclouds from an OSPd10 undercloud. Still in progress.

Comment 24 Julie Pichon 2016-11-24 08:34:29 UTC
I think OSP9 overclouds deployed from an OSP10 undercloud should still be fine, as they'll still use tripleo-common to generate/manage the passwords.

Comment 25 Marios Andreou 2016-11-24 09:04:13 UTC
So @gfidente/@jpichon - do we need something further here? The reviews on the external tracker are all merged to stable/newton, so for OSP 10 this BZ can move to POST, unless we need something more as per comment #23.

Comment 26 Giulio Fidente 2016-11-24 09:08:18 UTC
(In reply to Julie Pichon from comment #24)
> I think OSP9 overclouds deployed from a OSP10 undercloud should still be
> fine, as they'll still use triple-common to generate/manage the passwords.

Thanks Julie. I've tested https://review.openstack.org/#/c/401812/ manually as well and it seems to work fine for an OSPd9 deployment too.

Comment 27 Dougal Matthews 2016-11-24 10:49:13 UTC
> At the end of the initial deployment, by downloading the Mistral environment I find two different values for CephClusterFSID.

That is expected. All automatically generated passwords are stored separately so that they can't easily be deleted by mistake; that separate copy is only a backup.

The other one is provided by tripleoclient and will be used.
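
To see both occurrences, the plan environment can be inspected from the undercloud, for example (a sketch; the plan name "overcloud" and the mistral CLI output format are assumptions):

source ~/stackrc
mistral environment-get overcloud | grep -o 'CephClusterFSID[^,}]*'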

Comment 28 Marios Andreou 2016-11-24 14:42:31 UTC
We should only move to POST once stable/newton lands.

Comment 30 Alexander Chuzhoy 2016-11-29 02:22:52 UTC
Verified:
Environment:
 openstack-tripleo-common-5.4.0-3.el7ost.noarch.rpm  
 python-tripleoclient-5.4.0-2.el7ost.noarch.rpm 
The reported issue doesn't reproduce.

Comment 32 errata-xmlrpc 2016-12-14 16:34:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

