Section Number and Name:
All sections that reference a "playbook" (e.g. prerequisites.yml, deploy_cluster.yml, etc.)
Describe the issue:
The default Ansible-based installer is configured to use fact caching with a TTL of 10 minutes. During that window an admin may correct an installation problem by fixing a mistake somewhere. If they then re-run any playbook within the cache TTL, the _old_ (broken or incompatible) values are used instead of the new, correct ones.
See the description of https://github.com/openshift/openshift-ansible/pull/8322 for an example. This behavior would astonish and confuse any reasonable person. However, as advised in the upstream PR linked above, this default (presumably) benefits the common case of large-cluster installations.
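To make the 10-minute window concrete: file-based cache plugins such as jsonfile and yaml treat a host's cached facts as valid while the cache file's modification time is within the timeout, and a timeout of 0 means entries never expire. The sketch below mirrors that rule to list which hosts would still be served cached facts on a re-run. It is a hypothetical helper, assuming one cache file per host directly under the cache directory and no `fact_caching_prefix`; the 600-second default stands in for the 10-minute TTL described above.

```python
import os
import time
from typing import List, Optional


def live_cache_entries(cache_dir: str, timeout: int = 600,
                       now: Optional[float] = None) -> List[str]:
    """Return the hosts whose cached facts would still be reused.

    Assumed rule (file-based cache plugins): an entry is live while
    now - mtime <= timeout; a timeout of 0 means entries never expire.
    """
    now = time.time() if now is None else now
    live = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-file entries
        age = now - os.stat(path).st_mtime
        if timeout == 0 or age <= timeout:
            live.append(name)
    return live
```

Any host this returns keeps its old facts on the next playbook run unless its cache file is deleted first.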
Suggestions for improvement:
Add a warning recommending removal of invalid cache contents whenever any change is made to system, network, or inventory configuration. The location of the cache is set by the `fact_caching_connection` value in `ansible.cfg` or by the `$ANSIBLE_CACHE_PLUGIN_CONNECTION` environment variable; Ansible Tower keeps its own cache.
Default OpenShift Ansible Configuration: https://github.com/openshift/openshift-ansible/blob/master/ansible.cfg
Cache configuration docs: http://docs.ansible.com/ansible/latest/plugins/cache.html
Cache options ref.: http://docs.ansible.com/ansible/latest/reference_appendices/config.html#cache-plugin
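A minimal sketch of the suggested cleanup, assuming a file-based cache (jsonfile or yaml). It resolves the cache location by letting the `ANSIBLE_CACHE_PLUGIN_CONNECTION` environment variable take precedence over `fact_caching_connection` in the `[defaults]` section of `ansible.cfg`, then deletes everything inside that directory. The function names are hypothetical, the caller must pass the `ansible.cfg` path actually in use (Ansible's config-file search order is not modeled), and the Ansible Tower cache is not covered.

```python
import configparser
import os
import shutil
from typing import Mapping, Optional


def resolve_cache_dir(cfg_path: str,
                      env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured fact-cache location, or None if unset.

    Assumed precedence: ANSIBLE_CACHE_PLUGIN_CONNECTION (environment) wins
    over fact_caching_connection in the [defaults] section of cfg_path.
    """
    raw = env.get("ANSIBLE_CACHE_PLUGIN_CONNECTION")
    if not raw:
        # interpolation=None so unrelated '%' values in ansible.cfg are harmless
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(cfg_path)  # a missing file is silently skipped
        raw = parser.get("defaults", "fact_caching_connection", fallback=None)
    if not raw:
        return None
    # Expand ~ and $VARS (against the real process environment), e.g. $HOME/...
    return os.path.expandvars(os.path.expanduser(raw))


def clear_fact_cache(cache_dir: str) -> int:
    """Delete every entry inside cache_dir, keeping the directory itself.

    Returns the number of entries removed (0 if the directory is absent).
    """
    if not os.path.isdir(cache_dir):
        return 0
    removed = 0
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
        removed += 1
    return removed
```

For example, `clear_fact_cache(resolve_cache_dir("/etc/ansible/ansible.cfg"))` would empty the cache before re-running prerequisites.yml, provided the returned location is not None.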
Fact caching works as designed. Perhaps you're hitting some bug in openshift-ansible.
What value are you having trouble with precisely?
(In reply to Michael Gugino from comment #1)
> What value are you having trouble with precisely?
The system's hostname. I corrected it between playbook runs and, gee-whiz, it kept using the wrong value :S
Fact caching isn't only for variables, it caches _everything_. If I can hit this and get confused, you-betcha an ansible-noob is going to pull their hair out. Personally, I'd prefer the cache be disabled by default, but a documentation "fix" was called for, so this is that.
At least add a warning: kill or remove the cache between playbook runs, after making any changes (to system, inventory, network, etc.).
(In reply to Chris Evich from comment #2)
> (In reply to Michael Gugino from comment #1)
> > What value are you having trouble with precisely?
> The system's hostname. I corrected it between playbook runs and, gee-whiz,
> it kept using the wrong value :S
> Fact caching isn't only for variables, it caches _everything_. If I can hit
> this and get confused, you-betcha an ansible-noob is going to pull their
> hair out. Personally, I'd prefer the cache be disabled by default, but a
> documentation "fix" was called for, so this is that.
> At least add a warning: kill or remove the cache between playbook runs,
> after making any changes (to system, inventory, network, etc.).
So, this was not related to a variable in particular?
Hostname and other facts (not variables) are cached by the fact cache.
I think we should consider disabling fact-caching by default; I don't believe a typical deployment has much use for it, though it might save a couple of minutes in really large environments when going from prerequisites.yml to deploy_cluster.yml.
> So, this was not related to a variable in particular?
I think so, though it was several test installations ago, so I may be misremembering. I believe what happened is that I had 'openshift_hostname' set but the actual system hostname was incorrect. I removed that setting and corrected the actual hostname. However, upon re-running prerequisites.yml, it failed because both(?) were still cached.
I too think disabling the cache by default, or providing an alternate configuration, is a better solution. I tried that, and they sent me here for a docs fix :S I guess the thing to do is prove whether both system facts and variables (set_fact) are cached (I believe they are)...
...okay, so after setting ``fact_caching = yaml`` and watching the changes to the cache file:
* The cache is not checked or used if ``gather_facts`` is set to False, even if values were previously cached.
* The ``set_fact`` task will use the cache if it finds a value already set and its current ``cacheable`` attribute is true. It doesn't check how a value got into the cache or whether it's valid.
* The contents of ``ansible_env`` are cached, but appear to always be refreshed. I would guess ``ansible_date_time`` behaves similarly.
* Variables set in static inventory do not appear to be cached. I did not test dynamic inventory, but I'd guess it works the same (it has its own caching API).
* Local facts (/etc/ansible/facts.d) are cached, including invalid values (they are not refreshed when the contents are corrected).
The last point could be especially problematic if the playbook sets a local fact ``foo`` based on a ``when: ansible_local.foo is undefined`` condition.
Anyway, for this bug (after testing), my guess is that a cached ``ansible_hostname`` or local fact caused it, not my ``openshift_hostname`` setting change (in inventory). I can attempt to re-create this invalid-hostname situation if required.
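To confirm whether a stale hostname or local fact is being served, one could read the cached entry directly. This hypothetical helper assumes the jsonfile cache plugin (reading the yaml plugin's files would need PyYAML instead), one JSON file per host named after its inventory hostname with no `fact_caching_prefix`, and facts stored under their `ansible_`-prefixed names such as ``ansible_hostname`` and ``ansible_local``.

```python
import json
import os
from typing import Any, Dict


def cached_facts(cache_dir: str, host: str, *names: str) -> Dict[str, Any]:
    """Return selected facts from one host's jsonfile cache entry.

    Facts absent from the entry map to None; a missing cache file raises
    FileNotFoundError (meaning nothing is cached for that host).
    """
    path = os.path.join(cache_dir, host)
    with open(path) as f:
        facts = json.load(f)
    return {name: facts.get(name) for name in names}
```

Comparing the returned ``ansible_hostname`` with the output of `hostname -s` on the node, or ``ansible_local`` with the current files in /etc/ansible/facts.d, would show whether the cache is out of date.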
Correction: ``ansible_env`` contents _are_ cached. My ``fact_caching_timeout`` was too low previously (30 seconds).
We have a note in Known Issues:
"On failure of the Ansible installer, you must start from a clean operating system installation. If you are using virtual machines, start from a fresh image. If you are using bare metal machines, see Uninstalling OpenShift Container Platform for instructions."
We link to this note from the Advanced Installer topic:
If for any reason the installation fails, before re-running the installer, see Known Issues to check for any specific instructions or workarounds.
Does this address your concern, or do you think something less "dramatic" would be better?
How does this look:
The installer caches playbook configuration values for 10 minutes by default. If for some reason you change any system, network,
or inventory configuration and then re-run the installer within that 10-minute period, the new values are not used and the
previous values are used instead. You can delete the contents of the cache, whose location is defined
by the `fact_caching_connection` value in the *_/etc/ansible/ansible.cfg_* file as
shown in xref:../../scaling_performance/install_practices.adoc#scaling-performance-install-optimization[Recommended Installation Practices].
It's a nasty time-suck problem when you hit it, where a visible "caution" would be much appreciated. OTOH, probably 99.999% of the time it's unnecessary reading. I'm comfortable leaving the degree of underlining/callout to your judgement, as my opinion on the matter is heavily biased.
Changes are live: