This sounds like a known issue with the puppet Apache module: because its purge_configs flag is set to true, it removes the mellon config files belonging to the mod_auth_mellon RPM.
The documented procedure includes modifying the purge_configs value.
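For reference, the relevant knob is on the puppetlabs-apache class; a minimal sketch of disabling the purge (the parameter is real, the bare class declaration here is only illustrative):

```puppet
# Tell the puppetlabs-apache class not to purge config files it does not manage
class { 'apache':
  purge_configs => false,
}
```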
I believe there was a bug open for this but I can no longer find it. It's something Rodrigo worked on for a bit, and something I thought Ozz was going to address in some manner.
But for now let's confirm this is the same issue. Please ssh into the controller nodes and run:
rpm -qV mod_auth_mellon
If it's missing any files, in particular these files:
then it's the same issue.
Looks like it's the same issue. Here is the output:
[heat-admin@controller-0 ~]$ rpm -qV mod_auth_mellon
missing c /etc/httpd/conf.d/auth_mellon.conf
missing c /etc/httpd/conf.modules.d/10-auth_mellon.conf
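A quick way to script this check on each controller (paths taken from the rpm -qV output above):

```shell
# Check for the config files owned by mod_auth_mellon that puppet may have purged
for f in /etc/httpd/conf.d/auth_mellon.conf \
         /etc/httpd/conf.modules.d/10-auth_mellon.conf; do
  if [ -e "$f" ]; then
    echo "present: $f"
  else
    echo "missing: $f"
  fi
done
```

If either file reports missing, reinstalling the mod_auth_mellon RPM restores them.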
Was the setup documented in section 4.14 (which disables the purge_configs option) performed?
Yes, they were performed. The files from steps 1 and 2 are below.
[stack@undercloud-0 ~]$ cat fed_deployment/puppet_override_apache.yaml
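(The file contents are omitted above; the override presumably follows the usual TripleO ExtraConfig hieradata pattern, roughly like this sketch. The exact parameter scoping is an assumption, not a copy of the actual file:)

```yaml
# Hypothetical sketch of puppet_override_apache.yaml: pass hieradata to the
# controllers so puppet does not purge unmanaged Apache config files
parameter_defaults:
  ControllerExtraConfig:
    apache::purge_configs: false
```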
[stack@undercloud-0 ~]$ cat overcloud_deploy.sh
timeout 100m openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/extra_templates.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /home/stack/virt/docker-images-ceph.yaml \
-e /home/stack/fed_deployment/puppet_override_apache.yaml \
-e /home/stack/fed_deployment/puppet_override_keystone.yaml \
This is consistent with what Rodrigo was seeing in his testing: we would set the purge_configs flag to false, yet the files would still get purged. The solution was to go back into the controller nodes and reinstall mod_auth_mellon.
We have 3 possible solutions moving forward.
1) Someone with puppet expertise in this environment debugs why the purge_configs flag is not being respected. I think that would be Ozz, which is why I reassigned the bug to him.
2) We make mellon a full-fledged member of the installed Apache modules puppet knows about, so puppet stops treating mellon as a foreign invading pathogen. Once again I think Ozz is the best person to know how to do that. I'm willing to do the work if Ozz wants to point me in the right direction, in which case he can reassign the bug back to me.
3) We document the need to reinstall mellon until such time as we can implement #2.
I believe Rodrigo had done some initial work investigating this just before he left. I don't think he ever found the root cause, but perhaps we should reach out to him and see what he knows.
Ozz and I did a debug session together this morning.
We reviewed the logs for any anomaly related to the purging of the config files; we did not find anything.
We verified the expected behavior of the heat templates and the puppet operations.
We verified the apache:purge_configs flag was set to false on the controller node.
We verified the mod_auth_mellon RPM files were missing so we reinstalled the RPM.
We reran overcloud_deploy and the RPM config files persisted. This is consistent with my recollection from when we first saw this issue when developing the procedure.
It appears the problem only occurs on the first run of overcloud_deploy. However, what we don't know is whether the files were deleted *prior* to running overcloud_deploy; there are a number of additional steps between installing the RPM on the overcloud nodes and finally running overcloud_deploy. Both Ozz and I believe none of the intermediate steps should have caused puppet to run and delete the files before we had a chance to set the purge_configs flag to false.
At this point we're perplexed. We believe our best option going forward is to start with a fresh deployment and methodically check the state of the overcloud nodes after each step completes to see where in the process the config files are being deleted. The goal is to identify the operation which triggered the file removal.
I've got some further information based on trying to recreate the problem.
Apparently the mod_auth_mellon files have been deleted *before* we even begin any of the steps for configuring federation. This occurs because the base RHEL image we start with has mod_auth_mellon installed. I'm not sure who is causing this to be installed in the base image; I assume we have a list of packages somewhere. But this is mostly irrelevant since I presume a customer could use their own base image. I assume that when tripleo runs we have little control over enforcing the base image or what is installed on it; Ozz might be able to correct me on that.
Since mod_auth_mellon is not one of the Apache modules under control of the puppet apache class, puppet will remove any config files not belonging to the list of Apache modules known to the puppet apache class. This is very clearly documented in:
It appears our attempt to circumvent the removal of the mellon config files by adding apache::purge_configs = false in the federation steps *before* running overcloud_deploy.sh again is pointless because puppet has already been run once when creating the overcloud controllers. Since the image contained these files the act of creating the overcloud controllers causes the mellon files to be removed. Our attempts at remediation are too late. To preserve the mellon files the puppet apache class *must* know about mod_auth_mellon the very *first* time it runs.
How do we add mod_auth_mellon to the puppet apache list of modules?
We have to add this statement:
But where do we add this?
This is where my understanding of the TripleO architecture is weak, but a bit of searching suggests that puppet-tripleo/manifests/profile/base/keystone.pp is the right place. Is this correct, Ozz?
It appears the right syntax in this file would be:
However, this probably needs to be protected by a conditional indicating whether federation is being set up and, if so, what kind of federation. I don't know if these conditionals exist or how those configuration conditionals work at the moment.
John, where we would add this option depends on the architecture we want to get. If we want to be able to deploy mod_auth_mellon on a different host than keystone, then we need to create a profile for it. If it will always be on the same node as keystone, then your proposal is correct: we should add it to puppet-tripleo/manifests/profile/base/keystone.pp or trigger it somehow from there.
Thank you, Ozz. mod_auth_mellon only needs to be installed on nodes running keystone, because it is what supports keystone's SAML federation feature.
I presume we would need to add a configuration option in puppet-tripleo/manifests/profile/base/keystone.pp near the top of this class:
class tripleo::profile::base::keystone (
something along the lines of
$enable_saml_mellon = false
and then wrap the mellon statements inside that conditional.
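Putting the pieces together, a sketch of what this could look like in keystone.pp (the apache::mod define is the generic mechanism puppetlabs-apache provides for registering a module; the parameter name and placement are my proposal, not existing code):

```puppet
class tripleo::profile::base::keystone (
  # ... existing parameters ...
  $enable_saml_mellon = false,
) {
  # ... existing resources ...

  # Register mellon with the puppet apache class so its config files
  # survive purge_configs; only when SAML federation via mellon is enabled
  if $enable_saml_mellon {
    apache::mod { 'auth_mellon': }
  }
}
```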
Where would the enable_saml_mellon be set to true for our deployments?
Note: the OpenStack docs discuss other SAML providers such as Shibboleth, so I explicitly chose to append 'mellon' to the end of the config name to allow for the possibility of supporting $enable_saml_shibboleth in the future.
*** This bug has been marked as a duplicate of bug 1434875 ***
enable_saml_mellon would need to be set to true somewhere in tripleo-heat-templates. Now we need to figure out whether we would make that part of another composable service, or add it to the keystone service template. But yeah, it would be in tripleo-heat-templates.
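Until that decision is made, one hieradata-style way to flip the proposed flag per deployment would be an ExtraConfig override (a sketch only; whether this ultimately lands in a composable service or the keystone service template is exactly the open question above):

```yaml
# Hypothetical: enable the proposed flag via controller hieradata
parameter_defaults:
  ControllerExtraConfig:
    tripleo::profile::base::keystone::enable_saml_mellon: true
```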