Bug 1415057 - [IntService_public_324]jks-cert-gen pod failed by FileNotFoundException for /etc/origin/logging/system.admin.jks
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.5.z
Assignee: Jeff Cantrill
QA Contact: Xia Zhao
URL:
Whiteboard:
Duplicates: 1415056
Depends On:
Blocks:
 
Reported: 2017-01-20 06:27 UTC by Xia Zhao
Modified: 2017-10-25 13:00 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-25 13:00:48 UTC
Target Upstream Version:
Embargoed:


Attachments
ansible full logs (232.43 KB, text/plain), 2017-01-20 06:27 UTC, Xia Zhao
ansible log -20170124 (207.65 KB, text/plain), 2017-01-24 07:04 UTC, Junqi Zhao
system.admin.jks is under /etc/origin/logging (1.49 MB, text/plain), 2017-01-25 06:22 UTC, Junqi Zhao


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2017:3049 | 0 | normal | SHIPPED_LIVE | OpenShift Container Platform 3.6, 3.5, and 3.4 bug fix and enhancement update | 2017-10-25 15:57:15 UTC

Description Xia Zhao 2017-01-20 06:27:35 UTC
Created attachment 1242658 [details]
ansible full logs

Description of problem:
Deploying logging with Ansible failed at TASK [openshift_logging : create JKS generation pod]:

Pod scheduling was enabled on the OCP master to work around bug #1415056:
# oc get node
NAME                            STATUS                     AGE
$master                         Ready                      3h
$node                           Ready,SchedulingDisabled   3h

Confirmed that the jks-cert-gen pod was scheduled on the master node:

# oc get po -o wide -n xiazhao
NAME                 READY     STATUS    RESTARTS   AGE       IP           NODE
jks-cert-gen-b1c8g   0/1       Error     0          17m       10.128.0.2   $master


# oc logs -f jks-cert-gen-b1c8g
+ dir=/etc/origin/logging
+ SCRATCH_DIR=/etc/origin/logging
+ [[ ! -f /etc/origin/logging/system.admin.jks ]]
+ generate_JKS_client_cert system.admin
+ NODE_NAME=system.admin
+ ks_pass=kspass
+ ts_pass=tspass
+ dir=/etc/origin/logging
+ echo Generating keystore and certificate for node system.admin
+ keytool -genkey -alias system.admin -keystore /etc/origin/logging/system.admin.jks -keyalg RSA -keysize 2048 -validity 712 -keypass kspass -storepass kspass -dname 'CN=system.admin, OU=OpenShift, O=Logging'
Generating keystore and certificate for node system.admin
keytool error: java.io.FileNotFoundException: /etc/origin/logging/system.admin.jks (Permission denied)


Checked on the master that the file system.admin.jks did not exist in the directory /etc/origin/logging/:

# ls -al /etc/origin/logging/
total 132
drwxr-xr-x. 2 root root 4096 Jan 19 22:27 .
drwx------. 7 root root 4096 Jan 19 22:23 ..
-rw-r--r--. 1 root root 1196 Jan 19 22:25 02.pem
-rw-r--r--. 1 root root 1196 Jan 19 22:26 03.pem
-rw-r--r--. 1 root root 1196 Jan 19 22:26 04.pem
-rw-r--r--. 1 root root 1184 Jan 19 22:26 05.pem
-rw-r--r--. 1 root root 1050 Jan 19 22:24 ca.crt
-rw-r--r--. 1 root root    0 Jan 19 22:25 ca.crt.srl
-rw-r--r--. 1 root root  301 Jan 19 22:26 ca.db
-rw-r--r--. 1 root root   20 Jan 19 22:26 ca.db.attr
-rw-r--r--. 1 root root   20 Jan 19 22:26 ca.db.attr.old
-rw-r--r--. 1 root root  233 Jan 19 22:26 ca.db.old
-rw-------. 1 root root 1675 Jan 19 22:24 ca.key
-rw-r--r--. 1 root root    3 Jan 19 22:26 ca.serial.txt
-rw-r--r--. 1 root root    3 Jan 19 22:26 ca.serial.txt.old
-rw-r--r--. 1 root root 4679 Jan 19 22:27 generate-jks.sh
-rw-r--r--. 1 root root 2242 Jan 19 22:24 kibana-internal.crt
-rw-------. 1 root root 1679 Jan 19 22:24 kibana-internal.key
-rw-r--r--. 1 root root  321 Jan 19 22:24 server-tls.json
-rw-r--r--. 1 root root 4263 Jan 19 22:24 signing.conf
-rw-r--r--. 1 root root 1184 Jan 19 22:26 system.admin.crt
-rw-r--r--. 1 root root  948 Jan 19 22:26 system.admin.csr
-rw-r--r--. 1 root root 1708 Jan 19 22:26 system.admin.key
-rw-r--r--. 1 root root 1196 Jan 19 22:26 system.logging.curator.crt
-rw-r--r--. 1 root root  960 Jan 19 22:26 system.logging.curator.csr
-rw-r--r--. 1 root root 1704 Jan 19 22:26 system.logging.curator.key
-rw-r--r--. 1 root root 1196 Jan 19 22:25 system.logging.fluentd.crt
-rw-r--r--. 1 root root  960 Jan 19 22:25 system.logging.fluentd.csr
-rw-r--r--. 1 root root 1704 Jan 19 22:25 system.logging.fluentd.key
-rw-r--r--. 1 root root 1196 Jan 19 22:26 system.logging.kibana.crt
-rw-r--r--. 1 root root  960 Jan 19 22:26 system.logging.kibana.csr
-rw-r--r--. 1 root root 1704 Jan 19 22:26 system.logging.kibana.key
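
Note on the listing above: /etc/origin/logging is owned by root:root with mode drwxr-xr-x, so going by mode bits alone only root can create new files there. This report does not say which user the jks-cert-gen container ran as, and SELinux labels or ACLs may also be involved, so the following is only a rough sketch of the classic mode-bit check. The function name and the uid 1000 are hypothetical, chosen for illustration.

```python
import stat

def can_create_file(dir_mode, dir_uid, dir_gid, uid, gids=()):
    """Return True if mode bits alone let `uid` create a file in the directory.

    Deliberately ignores SELinux, ACLs and capabilities other than uid 0.
    """
    if uid == 0:
        return True  # root bypasses the mode-bit check
    if uid == dir_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif dir_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    # Creating an entry needs both write and search permission on the directory.
    return dir_mode & needed == needed

# drwxr-xr-x root root, as in the listing above
print(can_create_file(0o755, 0, 0, uid=0))     # True
print(can_create_file(0o755, 0, 0, uid=1000))  # False (hypothetical non-root uid)
```

If the container did run as a non-root user, this would be consistent with the "Permission denied" from keytool, but that is an assumption and not something the log confirms.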


Part of the ansible execution log (full log in attachment):
TASK [openshift_logging : create JKS generation pod] ***************************
task path: /home/xiazhao/openshift-ansible/roles/openshift_logging/tasks/generate_certs.yaml:149
Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/commands/command.py
...
<$master-public-dns> ESTABLISH SSH CONNECTION FOR USER: root
<$master-public-dns> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o 'IdentityFile="/home/xiazhao/cfile/libra.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/home/xiazhao/.ansible/cp/ansible-ssh-%h-%p-%r -tt $master-public-dns '/bin/sh -c '"'"'/usr/bin/python /root/.ansible/tmp/ansible-tmp-1484890889.35-91809174259402/command.py; rm -rf "/root/.ansible/tmp/ansible-tmp-1484890889.35-91809174259402/" > /dev/null 2>&1 && sleep 0'"'"''
fatal: [$master-public-dns]: FAILED! => {
    "attempts": 5, 
    "changed": false, 
    "cmd": [
        "oc", 
        "--config=/tmp/openshift-logging-ansible-dbkNx8/admin.kubeconfig", 
        "get", 
        "pod/jks-cert-gen-b1c8g", 
        "-o", 
        "jsonpath={.status.phase}", 
        "-n", 
        "xiazhao"
    ], 
    "delta": "0:00:01.228427", 
    "end": "2017-01-20 00:41:34.030142", 
    "failed": true, 
    "invocation": {
        "module_args": {
            "_raw_params": "oc --config=/tmp/openshift-logging-ansible-dbkNx8/admin.kubeconfig get pod/jks-cert-gen-b1c8g -o jsonpath='{.status.phase}' -n xiazhao", 
            "_uses_shell": false, 
            "chdir": null, 
            "creates": null, 
            "executable": null, 
            "removes": null, 
            "warn": true
        }, 
        "module_name": "command"
    }, 
    "rc": 0, 
    "start": "2017-01-20 00:41:32.801715", 
    "stderr": "", 
    "stdout": "Pending", 
    "stdout_lines": [
        "Pending"
    ], 
    "warnings": []
}
	to retry, use: --limit @/home/xiazhao/openshift-ansible/playbooks/common/openshift-cluster/openshift_logging.retry

PLAY RECAP *********************************************************************
$master-public-dns : ok=59   changed=1    unreachable=0    failed=1   
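
For readers unfamiliar with what the failing task does: the log shows it running `oc get pod/... -o jsonpath={.status.phase}` and giving up after 5 attempts with the phase still "Pending". A minimal sketch of that kind of polling loop follows. The function names, the delay, and the choice of "Succeeded" as the awaited phase are assumptions for illustration; the actual condition lives in generate_certs.yaml and is not quoted in this report.

```python
import subprocess
import time

def get_pod_phase(pod, namespace, kubeconfig):
    """Run the same `oc get` command seen in the log and return the phase string."""
    cmd = ["oc", "--config=" + kubeconfig, "get", "pod/" + pod,
           "-o", "jsonpath={.status.phase}", "-n", namespace]
    return subprocess.check_output(cmd).decode().strip()

def wait_for_phase(get_phase, wanted="Succeeded", attempts=5, delay=1.0):
    """Poll `get_phase` up to `attempts` times; return (ok, last_phase).

    `wanted` is an assumption; adjust it to whatever the task really checks.
    """
    phase = None
    for _ in range(attempts):
        phase = get_phase()
        if phase == wanted:
            return True, phase
        if phase == "Failed":
            return False, phase  # no point in retrying a failed pod
        time.sleep(delay)
    return False, phase
```

Usage would look something like `wait_for_phase(lambda: get_pod_phase("jks-cert-gen-b1c8g", "xiazhao", "/path/to/admin.kubeconfig"))`, where the kubeconfig path is a placeholder.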


Version-Release number of selected component (if applicable):
# openshift version
openshift v3.5.0.6+87f6173
kubernetes v1.5.2+43a9be4
etcd 3.1.0-rc.0

How reproducible:
Always

Steps to Reproduce:
1. Prepare the inventory file:

[oo_first_master]
$master-public-dns ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="~/cfile/libra.pem" openshift_public_hostname=$master-public-dns

[oo_first_master:vars]
deployment_type=openshift-enterprise
openshift_release=v3.5.0
openshift_logging_install_logging=true

openshift_logging_kibana_hostname=kibana.$sub-domain
public_master_url=https://$master-public-dns:8443

openshift_logging_image_prefix=registry.ops.openshift.com/openshift3/
openshift_logging_image_version=3.5.0

openshift_logging_namespace=xiazhao

2. Run the playbook from a control machine (my laptop) that is not the oo_master host:
git clone https://github.com/openshift/openshift-ansible
ansible-playbook -vvv -i ~/inventory   playbooks/common/openshift-cluster/openshift_logging.yml

Actual results:
The playbook failed at TASK [openshift_logging : create JKS generation pod].

Expected results:
The playbook should complete successfully.

Additional info:
Full ansible log attached

Comment 1 Jeff Cantrill 2017-01-20 13:24:20 UTC
WIP PR to fix: https://github.com/openshift/openshift-ansible/pull/3135

Comment 2 Jeff Cantrill 2017-01-20 13:25:38 UTC
I believe this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1415056; it has the same root cause.

Comment 3 Jeff Cantrill 2017-01-20 21:31:03 UTC
*** Bug 1415056 has been marked as a duplicate of this bug. ***

Comment 4 ewolinet 2017-01-23 22:30:27 UTC
The above PR has been merged.

Comment 5 Junqi Zhao 2017-01-24 07:02:12 UTC
Tested following xiazhao's steps from my local desktop. The error "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!" is thrown, and the jks-cert-gen pod cannot be created.

The libselinux-python package is already installed on both the desktop and the master.

<localhost> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1485237195.37-77603840609488/ > /dev/null 2>&1 && sleep 0'
fatal: [ec2-52-204-85-177.compute-1.amazonaws.com -> localhost]: FAILED! => {
    "changed": true, 
    "failed": true, 
    "invocation": {
        "module_args": {
            "backup": false, 
            "content": null, 
            "delimiter": null, 
            "dest": "/tmp/openshift-logging-ansible-QTRjW5/signing.conf", 
            "directory_mode": null, 
            "follow": true, 
            "force": true, 
            "group": null, 
            "mode": null, 
            "original_basename": "signing.conf.j2", 
            "owner": null, 
            "regexp": null, 
            "remote_src": null, 
            "selevel": null, 
            "serole": null, 
            "setype": null, 
            "seuser": null, 
            "src": "/root/.ansible/tmp/ansible-tmp-1485237195.37-77603840609488/source", 
            "unsafe_writes": null, 
            "validate": null
        }
    }, 
    "msg": "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!"
}
	to retry, use: --limit @/home/fedora/openshift-ansible/playbooks/common/openshift-cluster/openshift_logging.retry


Ansible log attached.

Comment 6 Junqi Zhao 2017-01-24 07:04:05 UTC
Created attachment 1243830 [details]
ansible log -20170124

Comment 7 Jeff Cantrill 2017-01-24 13:19:45 UTC
Isn't the proper solution to install libselinux-python per the instructions for using ansible? http://docs.ansible.com/ansible/intro_installation.html#managed-node-requirements

Comment 8 Junqi Zhao 2017-01-25 06:11:22 UTC
Checked libselinux-python on the master:
$ rpm -qa | grep libselinux-python
libselinux-python-2.5-6.el7.x86_64

Checked libselinux-python on my desktop:
$ rpm -qa | grep libselinux-python
libselinux-python3-2.5-3.fc24.x86_64



libselinux-python and libselinux-python3 are different packages, so I installed libselinux-python on my desktop and ran the Ansible playbook again. The error "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!" no longer appears.
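
The failing task in Comment 5 was delegated to localhost ("-> localhost" in the fatal line), which is why the packages on the desktop matter. A quick way to see whether a given interpreter has the SELinux bindings is to try importing the `selinux` module with that same interpreter. This is a sketch, assuming the interpreter you run it with is the one Ansible uses for the delegated task; the function name is hypothetical.

```python
import sys

def selinux_bindings_available():
    """Return True if this interpreter can import the `selinux` module."""
    try:
        import selinux  # noqa: F401  (shipped by libselinux-python / libselinux-python3)
    except ImportError:
        return False
    return True

if __name__ == "__main__":
    # Print which interpreter was checked, since Python 2 and Python 3
    # bindings come from different packages.
    print("interpreter: %s" % sys.executable)
    print("python version: %s" % sys.version.split()[0])
    print("selinux bindings importable: %s" % selinux_bindings_available())
```

Running this under both `python` and `python3` on the desktop should show which of the two interpreters has bindings installed.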

Comment 9 Junqi Zhao 2017-01-25 06:18:51 UTC
Ran the Ansible playbook again after installing libselinux-python on my desktop; system.admin.jks is now present.

This bug is fixed, although other bugs still need to be filed.
Setting it to VERIFIED and closing it.

# ls -al /etc/origin/logging/
total 140
drwxr-xr-x. 2 root root 4096 Jan 25 00:12 .
drwx------. 7 root root 4096 Jan 24 21:57 ..
-rw-r--r--. 1 root root 1196 Jan 24 21:59 02.pem
-rw-r--r--. 1 root root 1196 Jan 24 21:59 03.pem
-rw-r--r--. 1 root root 1196 Jan 24 22:00 04.pem
-rw-r--r--. 1 root root 1184 Jan 24 22:00 05.pem
-rw-r--r--. 1 root root 1050 Jan 24 21:57 ca.crt
-rw-r--r--. 1 root root    0 Jan 24 21:59 ca.crt.srl
-rw-r--r--. 1 root root  301 Jan 24 22:00 ca.db
-rw-r--r--. 1 root root   20 Jan 24 22:00 ca.db.attr
-rw-r--r--. 1 root root   20 Jan 24 22:00 ca.db.attr.old
-rw-r--r--. 1 root root  233 Jan 24 22:00 ca.db.old
-rw-------. 1 root root 1675 Jan 24 21:57 ca.key
-rw-r--r--. 1 root root    3 Jan 24 22:00 ca.serial.txt
-rw-r--r--. 1 root root    3 Jan 24 22:00 ca.serial.txt.old
-rw-r--r--. 1 root root 3768 Jan 25 00:12 elasticsearch.jks
-rw-r--r--. 1 root root 2242 Jan 24 21:58 kibana-internal.crt
-rw-------. 1 root root 1679 Jan 24 21:58 kibana-internal.key
-rw-r--r--. 1 root root 3979 Jan 25 00:12 logging-es.jks
-rw-r--r--. 1 root root  321 Jan 24 21:58 server-tls.json
-rw-r--r--. 1 root root 4263 Jan 24 21:57 signing.conf
-rw-r--r--. 1 root root 1184 Jan 24 22:00 system.admin.crt
-rw-r--r--. 1 root root  948 Jan 24 22:00 system.admin.csr
-rw-r--r--. 1 root root 3701 Jan 25 00:12 system.admin.jks
-rw-r--r--. 1 root root 1704 Jan 24 22:00 system.admin.key
-rw-r--r--. 1 root root 1196 Jan 24 22:00 system.logging.curator.crt
-rw-r--r--. 1 root root  960 Jan 24 22:00 system.logging.curator.csr
-rw-r--r--. 1 root root 1704 Jan 24 22:00 system.logging.curator.key
-rw-r--r--. 1 root root 1196 Jan 24 21:59 system.logging.fluentd.crt
-rw-r--r--. 1 root root  960 Jan 24 21:59 system.logging.fluentd.csr
-rw-r--r--. 1 root root 1704 Jan 24 21:59 system.logging.fluentd.key
-rw-r--r--. 1 root root 1196 Jan 24 21:59 system.logging.kibana.crt
-rw-r--r--. 1 root root  960 Jan 24 21:59 system.logging.kibana.csr
-rw-r--r--. 1 root root 1708 Jan 24 21:59 system.logging.kibana.key
-rw-r--r--. 1 root root  797 Jan 25 00:12 truststore.jks

Comment 10 Junqi Zhao 2017-01-25 06:22:28 UTC
Created attachment 1244163 [details]
system.admin.jks is under /etc/origin/logging

Comment 12 errata-xmlrpc 2017-10-25 13:00:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049

