While trying to use 'oc cluster up --metrics|--logging', the deployment fails with:

TASK [Gather Cluster facts] ***************************************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/init/cluster_facts.yml:9
Friday 02 February 2018  21:02:07 +0000 (0:00:00.076)       0:00:05.943 *******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'ansible_default_ipv4'
fatal: [127.0.0.1]: FAILED! => {"changed": false, "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_jrgkpu/ansible_module_openshift_facts.py\", line 1687, in <module>\n    main()\n  File \"/tmp/ansible_jrgkpu/ansible_module_openshift_facts.py\", line 1674, in main\n    additive_facts_to_overwrite)\n  File \"/tmp/ansible_jrgkpu/ansible_module_openshift_facts.py\", line 1339, in __init__\n    additive_facts_to_overwrite)\n  File \"/tmp/ansible_jrgkpu/ansible_module_openshift_facts.py\", line 1368, in generate_facts\n    defaults = self.get_defaults(roles, deployment_type, deployment_subtype)\n  File \"/tmp/ansible_jrgkpu/ansible_module_openshift_facts.py\", line 1400, in get_defaults\n    ip_addr = self.system_facts['ansible_default_ipv4']['address']\nKeyError: 'ansible_default_ipv4'\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 1}

Additional details in https://github.com/openshift/openshift-ansible/issues/7006
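The KeyError indicates that Ansible's setup module never produced the ansible_default_ipv4 fact in the environment where the playbook runs, so the lookup in openshift_facts fails. A quick way to confirm whether fact gathering itself is the problem (a minimal check, assuming Ansible is available where you run it):

$ ansible all -i localhost, -m setup -c local | grep -A 6 '"ansible_default_ipv4"'

If that prints nothing, the fact is missing and the get_defaults() lookup shown in the traceback will raise exactly this KeyError.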
Proposed: https://github.com/openshift/openshift-ansible/pull/7760
release-3.9: https://github.com/openshift/openshift-ansible/pull/7805
I'm facing a similar issue when running 'oc cluster up --logging=true --metrics=true --version=v3.9 --loglevel=2':

$ oc get pods --all-namespaces
NAMESPACE               NAME                                  READY     STATUS      RESTARTS   AGE
default                 docker-registry-1-ghbt9               1/1       Running     0          7m
default                 persistent-volume-setup-wplxp         0/1       Completed   0          7m
default                 router-1-4j4tx                        1/1       Running     0          7m
logging                 openshift-ansible-logging-job-264wh   0/1       Error       0          4m
logging                 openshift-ansible-logging-job-c8g8c   0/1       Error       0          7m
logging                 openshift-ansible-logging-job-g7g8v   0/1       Error       0          5m
logging                 openshift-ansible-logging-job-lnfk9   0/1       Error       0          2m
logging                 openshift-ansible-logging-job-nv5mv   0/1       Error       0          5m
logging                 openshift-ansible-logging-job-w8pts   0/1       Error       0          4m
logging                 openshift-ansible-logging-job-whckj   0/1       Error       0          5m
openshift-infra         openshift-ansible-metrics-job-2p4kf   0/1       Error       0          3m
openshift-infra         openshift-ansible-metrics-job-77t4b   0/1       Error       0          4m
openshift-infra         openshift-ansible-metrics-job-gw7sv   0/1       Error       0          5m
openshift-infra         openshift-ansible-metrics-job-h8xqm   0/1       Error       0          5m
openshift-infra         openshift-ansible-metrics-job-md5p9   0/1       Error       0          7m
openshift-infra         openshift-ansible-metrics-job-v5q4n   0/1       Error       0          5m
openshift-web-console   webconsole-548fd9b7c4-kzmh6           1/1       Running     0          7m

Both the Metrics and Logging playbook jobs fail due to the same problem:

TASK [Gather Cluster facts] ****************************************************
Sunday 01 April 2018  13:56:00 +0000 (0:00:00.074)       0:00:07.064 **********
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'ansible_default_ipv4'
fatal: [127.0.0.1]: FAILED! => {"changed": false, "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_tQr_lb/ansible_module_openshift_facts.py\", line 1688, in <module>\n    main()\n  File \"/tmp/ansible_tQr_lb/ansible_module_openshift_facts.py\", line 1675, in main\n    additive_facts_to_overwrite)\n  File \"/tmp/ansible_tQr_lb/ansible_module_openshift_facts.py\", line 1340, in __init__\n    additive_facts_to_overwrite)\n  File \"/tmp/ansible_tQr_lb/ansible_module_openshift_facts.py\", line 1369, in generate_facts\n    defaults = self.get_defaults(roles, deployment_type, deployment_subtype)\n  File \"/tmp/ansible_tQr_lb/ansible_module_openshift_facts.py\", line 1401, in get_defaults\n    ip_addr = self.system_facts['ansible_default_ipv4']['address']\nKeyError: 'ansible_default_ipv4'\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 1}

My environment:

$ oc version
oc v3.9.0+191fece
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.9.0+191fece
kubernetes v1.9.1+a0ce1bc657

$ docker version
Client:
 Version:       18.03.0-ce
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    0520e24
 Built:         Wed Mar 21 23:11:15 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.03.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:        Wed Mar 21 23:15:01 2018
  OS/Arch:      linux/amd64
  Experimental: false

$ cat /etc/fedora-release
Fedora release 27 (Twenty Seven)
Please pull the latest image for the v3.9 tag and try again.
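If the image comes from the default Red Hat registry, re-pulling should look roughly like this (a sketch; the image name matches the one shown in the next comment):

$ docker pull registry.access.redhat.com/openshift3/ose-ansible:v3.9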
I'm still facing the same issue when running 'oc cluster up --metrics=true --version=v3.9 --loglevel=2' after cleaning all images from my Docker local storage. It looks like the image tagged with v3.9 (or latest) at registry.access.redhat.com is still from 2 weeks ago:

$ docker images
REPOSITORY                                                 TAG      IMAGE ID       CREATED       SIZE
registry.access.redhat.com/openshift3/ose-ansible          latest   19f345ac236b   2 weeks ago   846MB
registry.access.redhat.com/openshift3/ose-ansible          v3.9     19f345ac236b   2 weeks ago   846MB
registry.access.redhat.com/openshift3/ose-haproxy-router   latest   4eb76bae54ef   2 weeks ago   1.28GB
registry.access.redhat.com/openshift3/ose-haproxy-router   v3.9     4eb76bae54ef   2 weeks ago   1.28GB
registry.access.redhat.com/openshift3/ose-deployer         latest   ba9779c50c5b   2 weeks ago   1.26GB
registry.access.redhat.com/openshift3/ose-deployer         v3.9     ba9779c50c5b   2 weeks ago   1.26GB
registry.access.redhat.com/openshift3/ose                  latest   078f595369ae   2 weeks ago   1.26GB
registry.access.redhat.com/openshift3/ose                  v3.9     078f595369ae   2 weeks ago   1.26GB
registry.access.redhat.com/openshift3/ose-docker-registry  latest   11923de49247   2 weeks ago   459MB
registry.access.redhat.com/openshift3/ose-docker-registry  v3.9     11923de49247   2 weeks ago   459MB
registry.access.redhat.com/openshift3/ose-web-console      latest   a0f5a2e23591   2 weeks ago   489MB
registry.access.redhat.com/openshift3/ose-web-console      v3.9     a0f5a2e23591   2 weeks ago   489MB
registry.access.redhat.com/openshift3/ose-pod              latest   e598d93f5abe   2 weeks ago   209MB
registry.access.redhat.com/openshift3/ose-pod              v3.9     e598d93f5abe   2 weeks ago   209MB

Some 'oc' outputs:

$ oc get pods --all-namespaces
NAMESPACE               NAME                                  READY     STATUS      RESTARTS   AGE
default                 docker-registry-1-q4sfp               1/1       Running     0          24m
default                 persistent-volume-setup-x74b7         0/1       Completed   0          24m
default                 router-1-59rw2                        1/1       Running     0          24m
openshift-infra         openshift-ansible-metrics-job-4dx6q   0/1       Error       0          23m
openshift-infra         openshift-ansible-metrics-job-5h76z   0/1       Error       0          20m
openshift-infra         openshift-ansible-metrics-job-6cwgz   0/1       Error       0          23m
openshift-infra         openshift-ansible-metrics-job-6qlg6   0/1       Error       0          22m
openshift-infra         openshift-ansible-metrics-job-fggrm   0/1       Error       0          21m
openshift-infra         openshift-ansible-metrics-job-fxgd6   0/1       Error       0          22m
openshift-infra         openshift-ansible-metrics-job-vdmfn   0/1       Error       0          24m
openshift-web-console   webconsole-744d5fcf55-ck6vh           1/1       Running     0          24m

And:

$ oc logs openshift-ansible-metrics-job-5h76z -n openshift-infra
(...)
TASK [Gather Cluster facts] ****************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'ansible_default_ipv4'
fatal: [127.0.0.1]: FAILED! => {"changed": false, "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_S2aJ4E/ansible_module_openshift_facts.py\", line 1704, in <module>\n    main()\n  File \"/tmp/ansible_S2aJ4E/ansible_module_openshift_facts.py\", line 1691, in main\n    additive_facts_to_overwrite)\n  File \"/tmp/ansible_S2aJ4E/ansible_module_openshift_facts.py\", line 1355, in __init__\n    additive_facts_to_overwrite)\n  File \"/tmp/ansible_S2aJ4E/ansible_module_openshift_facts.py\", line 1384, in generate_facts\n    defaults = self.get_defaults(roles, deployment_type, deployment_subtype)\n  File \"/tmp/ansible_S2aJ4E/ansible_module_openshift_facts.py\", line 1416, in get_defaults\n    ip_addr = self.system_facts['ansible_default_ipv4']['address']\nKeyError: 'ansible_default_ipv4'\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 0}

PLAY RECAP *********************************************************************
127.0.0.1                  : ok=30   changed=0    unreachable=0    failed=1

INSTALLER STATUS ***************************************************************
Initialization             : In Progress (0:00:12)
Commit is in build openshift-ansible-3.9.20-1.git.0.f99fb43.el7
(In reply to Russell Teague from comment #6)
> Commit is in build openshift-ansible-3.9.20-1.git.0.f99fb43.el7

I couldn't find any container image tagged with v3.9.20 in the registries, and it looks like the 'v3.9' and 'latest' tags are still pointing to the affected version, maybe because the new build has not been pushed to the public registries yet. I'll wait and try again in a few days.
Davi,

The v3.9.20 tag is specific to OCP images. However, the public CI infrastructure pushes docker.io/openshift/origin-ansible:v3.9 periodically, and it looks like that image has the change if you'd like to test with it. For supported OCP installs, however, you'll need to wait until this bug is attached to an errata and ships.
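For reference, a minimal way to test with that CI image (a sketch, assuming Docker as the local runtime) is to pull it and confirm the local copy is recent:

$ docker pull docker.io/openshift/origin-ansible:v3.9
$ docker images openshift/origin-ansible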
(In reply to Scott Dodson from comment #8)
> Davi,
>
> The v3.9.20 tag is specific to OCP images. However, the public CI
> infrastructure will push docker.io/openshift/origin-ansible:v3.9
> periodically and it looks like that has the change if you'd like to test
> with it. For supported OCP installs however you'll need to wait until this
> bug is attached to an errata and ships.

Thanks for your feedback! I'm testing with 'oc cluster up --metrics=true --version=v3.9 --loglevel=2' (origin's oc version) and I'm not sure how I should make it pull the artefacts from a different registry/repository to get the fix. Any advice?
I used the latest v3.9 brew image, which points to version v3.9.27, and hit the same error as in comment 5.

# skopeo inspect docker://brew-***/openshift3/ose-ansible:v3.9 --tls-verify=false | grep version
    "version": "v3.9.27"

# oc cluster up --image='brew-****/openshift3/ose' --logging=true --metrics=true --version=v3.9 --loglevel=8

# oc get pods --all-namespaces
NAMESPACE               NAME                                  READY     STATUS      RESTARTS   AGE
default                 docker-registry-1-vzr9z               1/1       Running     0          6m
default                 persistent-volume-setup-m5qzq         0/1       Evicted     0          8m
default                 persistent-volume-setup-rdw49         0/1       Completed   0          6m
default                 router-1-9f6c2                        1/1       Running     0          6m
logging                 openshift-ansible-logging-job-sfw4l   0/1       Error       0          8m
openshift-infra         openshift-ansible-metrics-job-64mwd   0/1       Error       0          8m
openshift-web-console   webconsole-5849496d9d-fcsb7           1/1       Running     0          8m

TASK [Gather Cluster facts] ****************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'ansible_default_ipv4'
fatal: [127.0.0.1]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_gs2xvV/ansible_module_openshift_facts.py\", line 1688, in <module>\n    main()\n  File \"/tmp/ansible_gs2xvV/ansible_module_openshift_facts.py\", line 1675, in main\n    additive_facts_to_overwrite)\n  File \"/tmp/ansible_gs2xvV/ansible_module_openshift_facts.py\", line 1340, in __init__\n    additive_facts_to_overwrite)\n  File \"/tmp/ansible_gs2xvV/ansible_module_openshift_facts.py\", line 1369, in generate_facts\n    defaults = self.get_defaults(roles, deployment_type, deployment_subtype)\n  File \"/tmp/ansible_gs2xvV/ansible_module_openshift_facts.py\", line 1401, in get_defaults\n    ip_addr = self.system_facts['ansible_default_ipv4']['address']\nKeyError: 'ansible_default_ipv4'\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 0}

Another issue: there are no --logging|--metrics options in 'oc cluster up' v3.10, and 'oc cluster add logging' can't be used to add this component. So how can logging and metrics be added with v3.10?
After updating the image, it looks like the playbook now fails at a different point:

$ oc logs -f openshift-ansible-metrics-job-gl8vz -n openshift-infra
(...)
RUNNING HANDLER [openshift_metrics : restart master api] ***********************
Saturday 28 April 2018  13:40:37 +0000 (0:00:00.179)       0:02:32.201 ********
fatal: [127.0.0.1]: FAILED! => {"changed": false, "cmd": "/usr/bin/systemctl", "msg": "Failed to get D-Bus connection: Operation not permitted", "rc": 1, "stderr": "Failed to get D-Bus connection: Operation not permitted\n", "stderr_lines": ["Failed to get D-Bus connection: Operation not permitted"], "stdout": "", "stdout_lines": []}

The command used is still the same:

$ oc cluster up --metrics=true --version=v3.9 --loglevel=2

The images I have locally are:

$ docker images
REPOSITORY                                   TAG    IMAGE ID       CREATED        SIZE
openshift/origin-ansible                     v3.9   d2967b4f1b8a   16 hours ago   1.22GB
openshift/origin-web-console                 v3.9   2bff42918944   9 days ago     489MB
openshift/origin-docker-registry             v3.9   c13803d064c8   9 days ago     458MB
openshift/origin-haproxy-router              v3.9   0a48e702efe7   9 days ago     1.28GB
openshift/origin-deployer                    v3.9   78a4725976a1   9 days ago     1.25GB
openshift/origin                             v3.9   94e55dec4dc3   9 days ago     1.25GB
openshift/origin-pod                         v3.9   f62f913f6617   9 days ago     220MB
openshift/origin-metrics-cassandra           v3.9   d77a710bd9f0   5 months ago   780MB
openshift/origin-metrics-hawkular-metrics    v3.9   67c1503b2ae2   5 months ago   914MB
openshift/origin-metrics-heapster            v3.9   93f72c7c2f46   5 months ago   820MB
I've hit the same (originally reported) error running the oc v3.9.30 client on a Fedora 28 machine. In the openshift-infra project you get a number of failed attempts to run the metrics ansible installation, all reporting "KeyError: 'ansible_default_ipv4'".

It appears the ansible playbooks reference the ansible_default_ipv4 fact and derive an IPv4 address from it. At the command line, I am able to check the facts for my Fedora 28 machine using the following:

ansible all -i localhost, -m setup -c local

If I grep this output for ansible_default_ipv4, the fact is present and correct.

Using a failed metrics install pod, I can run this up with the debug option:

oc debug pod/<pod name>

Running the above ansible command in the pod, I can see that the fact is not available. What I've also determined is that /sbin/ip is not present in the container; if ansible uses it to determine this fact, that could be why the fact is not available.
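Putting those steps together, the in-container check looks roughly like this (a sketch; the pod name is a placeholder for one of the failed metrics job pods):

$ oc debug pod/<pod name> -n openshift-infra
sh-4.2# ansible all -i localhost, -m setup -c local | grep ansible_default_ipv4
sh-4.2# ls -l /sbin/ip

In the affected image the grep returns nothing and /sbin/ip does not exist, which matches the KeyError in the playbook output.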
It looks like iproute was added upstream to fix this: https://github.com/openshift/openshift-ansible/pull/7760, but the fix has not made it into the registry.access.redhat.com/openshift3/ose-ansible:v3.9.30 image (last updated 11 days ago).
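A quick way to check whether a given ose-ansible image build already contains the fix is to look for the ip binary inside it (a sketch, assuming Docker is available locally):

$ docker run --rm --entrypoint /bin/sh registry.access.redhat.com/openshift3/ose-ansible:v3.9 -c 'command -v ip || echo ip is missing'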
Update for the Dockerfile.rhel7 used to build ose-ansible: https://github.com/openshift/openshift-ansible/pull/8870
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/ee09d0942722b242a2397d4b8c3fe9862f3c575b
Bug 1558689 - Add iproute to Dockerfile.rhel7

iproute is required by Ansible to gather some networking facts.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1558689

https://github.com/openshift/openshift-ansible/commit/c3195db0363833c8a273b1a9bc91c4e6021477ee
Merge pull request #8870 from mtnbikenc/fix-1558689

Bug 1558689 - Add iproute to Dockerfile.rhel7
release-3.9: https://github.com/openshift/openshift-ansible/pull/8874
openshift-ansible-3.9.33-1
Logging and metrics pods still fail to run with the 3.9.33-1 client and 3.9.33-1 images:

# oc cluster up --image='brew-***/openshift3/ose' --logging=true --metrics=true --version=v3.9.33-1 --loglevel=8 --public-hostname=10.8.241.46

# oc get pods --all-namespaces
NAMESPACE               NAME                                  READY     STATUS      RESTARTS   AGE
default                 docker-registry-1-hbx5q               1/1       Running     0          1m
default                 persistent-volume-setup-b95qk         0/1       Completed   0          1m
default                 router-1-q79f2                        1/1       Running     0          1m
logging                 openshift-ansible-logging-job-jt58s   0/1       Error       0          20s
logging                 openshift-ansible-logging-job-q592c   0/1       Error       0          1m
openshift-infra         openshift-ansible-metrics-job-fr74d   0/1       Error       0          1m
openshift-infra         openshift-ansible-metrics-job-v9nnl   0/1       Error       0          16s
openshift-web-console   webconsole-746648b7d4-fn8x8           1/1       Running     0          1m

# oc logs -f openshift-ansible-logging-job-q592c -n logging

TASK [Ensure various deps for running system containers are installed] *********
skipping: [10.8.241.46] => (item=atomic) => {"changed": false, "item": "atomic", "skip_reason": "Conditional result was False", "skipped": true}
skipping: [10.8.241.46] => (item=ostree) => {"changed": false, "item": "ostree", "skip_reason": "Conditional result was False", "skipped": true}
skipping: [10.8.241.46] => (item=runc) => {"changed": false, "item": "runc", "skip_reason": "Conditional result was False", "skipped": true}

PLAY [Initialize cluster facts] ************************************************

TASK [Gathering Facts] *********************************************************
ok: [10.8.241.46]

TASK [Gather Cluster facts] ****************************************************
fatal: [10.8.241.46]: FAILED! => {"changed": false, "cmd": "hostname -f", "failed": true, "msg": "[Errno 2] No such file or directory", "rc": 2}

PLAY RECAP *********************************************************************
10.8.241.46                : ok=19   changed=0    unreachable=0    failed=1
localhost                  : ok=11   changed=0    unreachable=0    failed=0
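This looks like the same class of problem as the missing iproute package: the 'Gather Cluster facts' task shells out to 'hostname -f', and the [Errno 2] error suggests the hostname binary is not present in this image either. The same kind of in-image check applies (a sketch; '<registry>' is a placeholder for whichever registry the image was pulled from):

$ docker run --rm --entrypoint /bin/sh <registry>/openshift3/ose-ansible:v3.9.33-1 -c 'command -v hostname || echo hostname is missing'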
https://github.com/openshift/openshift-ansible/pull/9164
# oc cluster up --image='brew-pulp-***:8888/openshift3/ose' --logging=true --metrics=true --version=v3.9.40

The error in comment #18 is fixed, but a new error comes up since there is no usable yum config file:

TASK [openshift_version : fail] ************************************************
fatal: [127.0.0.1]: FAILED! => {"changed": false, "failed": true, "msg": "Package 'atomic-openshift' not found"}

# oc debug openshift-ansible-logging-job-wx2h2 -n logging
Defaulting container name to openshift-ansible-logging-job.
Use 'oc describe pod/openshift-ansible-logging-job-wx2h2-debug -n logging' to see all of the containers in this pod.
Debugging with pod/openshift-ansible-logging-job-wx2h2-debug, original command: <image entrypoint>
Waiting for pod to start ...
Pod IP: 172.16.120.90
If you don't see a command prompt, try pressing enter.

sh-4.2# yum search atomic-openshift
Loaded plugins: ovl, product-id, search-disabled-repos, subscription-manager
This system is not receiving updates. You can use subscription-manager on the host to register and assign subscriptions.
=============================================================================================== N/S matched: atomic-openshift ===============================================================================================
atomic-openshift-clients.x86_64 : Origin Client binaries for Linux
atomic-openshift-utils.noarch : Atomic OpenShift Utilities

  Name and summary matches only, use "search all" for everything.
Support for installing metrics and logging via `oc cluster up` has always been experimental and is removed in 3.10 and newer. We won't be able to fix this one.