Bug 1419244

Summary: elasticsearch pod can't start up successfully
Product: OpenShift Container Platform Reporter: Xia Zhao <xiazhao>
Component: Logging Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: high Docs Contact:
Priority: high    
Version: 3.5.0 CC: aos-bugs, juzhao, tdawson
Target Milestone: --- Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-12 19:11:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Xia Zhao 2017-02-04 08:03:02 UTC
Description of problem:
After deploying the logging 3.5.0 stack with the deployer, the ES pod goes into Error status because Elasticsearch fails to load its logging configuration: the directory /usr/share/java/elasticsearch/config does not exist.

$ oc get po
NAME                          READY     STATUS      RESTARTS   AGE
logging-curator-1-zlk9n       1/1       Running     0          2m
logging-deployer-bn4x8        0/1       Completed   0          3m
logging-es-qom358j5-1-g8ffj   0/1       Error       4          2m
logging-fluentd-x5ng3         1/1       Running     0          2m
logging-kibana-1-p8vbp        2/2       Running     0          2m
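
For repeated test runs, the pod listing above can be checked mechanically. The following is a minimal sketch, assuming the column layout shown in this report (NAME, READY, STATUS, RESTARTS, AGE); the function name not_ready_pods is hypothetical and is not part of any OpenShift tooling.

```python
from typing import List, Tuple


def not_ready_pods(table: str) -> List[Tuple[str, str]]:
    """Return (name, status) for pods that are neither Completed nor fully Running."""
    flagged = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore lines that do not look like a pod row
        name, ready, status = fields[0], fields[1], fields[2]
        have, want = (int(n) for n in ready.split("/"))
        if status == "Completed":
            continue  # one-shot pods such as the deployer are expected to finish
        if status != "Running" or have < want:
            flagged.append((name, status))
    return flagged


# Listing copied from this report (assumed to be the output of `oc get po`).
SAMPLE = """\
NAME                          READY     STATUS      RESTARTS   AGE
logging-curator-1-zlk9n       1/1       Running     0          2m
logging-deployer-bn4x8        0/1       Completed   0          3m
logging-es-qom358j5-1-g8ffj   0/1       Error       4          2m
logging-fluentd-x5ng3         1/1       Running     0          2m
logging-kibana-1-p8vbp        2/2       Running     0          2m
"""

if __name__ == "__main__":
    for name, status in not_ready_pods(SAMPLE):
        print(name, status)
```

On the listing above, this flags only the logging-es pod.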

$ oc logs -f logging-es-qom358j5-1-g8ffj
Comparing the specificed RAM to the maximum recommended for ElasticSearch...
Inspecting the maximum RAM available...
ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms128M -Xmx512m'
Exception in thread "main" ElasticsearchException[Failed to load logging configuration]; nested: NoSuchFileException[/usr/share/java/elasticsearch/config];
Likely root cause: java.nio.file.NoSuchFileException: /usr/share/java/elasticsearch/config
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
    at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
    at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
    at java.nio.file.Files.readAttributes(Files.java:1737)
    at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:225)
    at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
    at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:322)
    at java.nio.file.Files.walkFileTree(Files.java:2662)
    at org.elasticsearch.common.logging.log4j.LogConfigurator.resolveConfig(LogConfigurator.java:142)
    at org.elasticsearch.common.logging.log4j.LogConfigurator.configure(LogConfigurator.java:103)
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:259)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
Refer to the log for complete error details.
Checking if Elasticsearch is ready on https://localhost:9200 .
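
The path that Elasticsearch could not find is named on the "Likely root cause" line of the log above. As a rough sketch, it could be pulled out of a saved log like this; the helper name missing_path is hypothetical, and the pattern only assumes the message format shown in this report.

```python
import re
from typing import Optional

# Matches the JVM message format seen in the log above.
NO_SUCH_FILE = re.compile(r"java\.nio\.file\.NoSuchFileException: (\S+)")


def missing_path(log_text: str) -> Optional[str]:
    """Return the first path reported by NoSuchFileException, or None if absent."""
    match = NO_SUCH_FILE.search(log_text)
    return match.group(1) if match else None


# Line copied from the pod log in this report.
LOG_LINE = (
    "Likely root cause: java.nio.file.NoSuchFileException: "
    "/usr/share/java/elasticsearch/config"
)

if __name__ == "__main__":
    print(missing_path(LOG_LINE))
```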

Version-Release number of selected component (if applicable):
ops registry:
openshift3/logging-deployer    2162a2197767
openshift3/logging-kibana    e0ab09c2cbeb
openshift3/logging-fluentd    47057624ecab
openshift3/logging-auth-proxy    139f7943475e
openshift3/logging-elasticsearch    7015704dc0f8
openshift3/logging-curator    7f034fdf7702

# openshift version
openshift v3.5.0.16+a26133a
kubernetes v1.5.2+43a9be4
etcd 3.1.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy the logging 3.5.0 stack.

Actual results:
The ES pod is in Error status.

Expected results:
All EFK pods should be in Running status.

Additional info:

Comment 2 Xia Zhao 2017-02-07 03:09:57 UTC
Tested with the latest ES image from the ops registry; the ES pod still fails to start:

$ oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-3hcf0       1/1       Running            0          3m
logging-deployer-r4wlg        0/1       Completed          0          4m
logging-es-5ue3hokj-1-x468d   0/1       CrashLoopBackOff   4          3m
logging-fluentd-qcv5b         1/1       Running            0          3m
logging-kibana-1-80tt8        2/2       Running            0          3m

$ oc logs -f logging-es-5ue3hokj-1-x468d
Comparing the specificed RAM to the maximum recommended for ElasticSearch...
Inspecting the maximum RAM available...
ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms128M -Xmx512m'
/opt/app-root/src/run.sh: line 141: /usr/share/elasticsearch/bin/elasticsearch: No such file or directory
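
This failure has a different signature from the one in the description: the shell reports that the Elasticsearch launcher itself is missing, and the path is under /usr/share/elasticsearch rather than /usr/share/java/elasticsearch. A minimal sketch for recognizing this shell error in a saved log, assuming the bash message format shown above (the function name parse_shell_enoent is hypothetical):

```python
import re
from typing import Dict, Optional

# bash reports a missing command as "<script>: line <n>: <target>: No such file or directory".
SHELL_ENOENT = re.compile(
    r"^(?P<script>\S+): line (?P<line>\d+): (?P<target>\S+): No such file or directory$",
    re.MULTILINE,
)


def parse_shell_enoent(log_text: str) -> Optional[Dict[str, str]]:
    """Return script, line number, and missing target from the first matching line."""
    match = SHELL_ENOENT.search(log_text)
    return match.groupdict() if match else None


# Line copied from the pod log in comment 2.
LOG_LINE = (
    "/opt/app-root/src/run.sh: line 141: "
    "/usr/share/elasticsearch/bin/elasticsearch: No such file or directory"
)

if __name__ == "__main__":
    print(parse_shell_enoent(LOG_LINE))
```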

Images tested with:
openshift3/logging-elasticsearch   3.5.0               eed2ca51f2ba        5 hours ago         399.2 MB

# openshift version
openshift v3.5.0.17+c55cf2b
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Comment 3 Xia Zhao 2017-02-07 05:18:06 UTC
The error message in comment #2 also reproduces when logging is deployed with the Ansible scripts.

Comment 5 Xia Zhao 2017-02-08 03:36:49 UTC
Fixed with the latest image. Tested with:

openshift3/logging-elasticsearch   3.5.0               7605f043d232        12 hours ago        399.2 MB

The ES pod runs after logging deployment. Setting to VERIFIED.

Comment 6 Junqi Zhao 2017-02-15 03:51:52 UTC
When the deployer pod is used to deploy the logging stack, the Elasticsearch pods again fail to start. The regression has recurred, so I am reopening this defect.

# oc get po
NAME                              READY     STATUS             RESTARTS   AGE
logging-curator-1-bxfrr           1/1       Running            1          6m
logging-curator-ops-1-jqn9j       1/1       Running            1          6m
logging-deployer-z54br            0/1       Completed          0          6m
logging-es-l52au538-1-z8sxw       0/1       CrashLoopBackOff   5          6m
logging-es-ops-4td0ohfw-1-85r2q   0/1       CrashLoopBackOff   5          4m
logging-fluentd-c2n0f             1/1       Running            0          6m
logging-kibana-1-57ghr            2/2       Running            0          5m
logging-kibana-ops-1-1d2q3        2/2       Running            0          5m

# oc logs logging-es-l52au538-1-z8sxw
Comparing the specificed RAM to the maximum recommended for ElasticSearch...
Inspecting the maximum RAM available...
ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms128M -Xmx512m'
Exception in thread "main" ElasticsearchException[Failed to load logging configuration]; nested: NoSuchFileException[/usr/share/java/elasticsearch/config];
Likely root cause: java.nio.file.NoSuchFileException: /usr/share/java/elasticsearch/config
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
	at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
	at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
	at java.nio.file.Files.readAttributes(Files.java:1737)
	at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:225)
	at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
	at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:322)
	at java.nio.file.Files.walkFileTree(Files.java:2662)
	at org.elasticsearch.common.logging.log4j.LogConfigurator.resolveConfig(LogConfigurator.java:142)
	at org.elasticsearch.common.logging.log4j.LogConfigurator.configure(LogConfigurator.java:103)
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:259)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:45)
Refer to the log for complete error details.
Checking if Elasticsearch is ready on https://localhost:9200 .

[root@ip-172-18-6-101 ~]# oc logs logging-es-ops-4td0ohfw-1-85r2q
Comparing the specificed RAM to the maximum recommended for ElasticSearch...
Inspecting the maximum RAM available...
ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms128M -Xmx4096m'
Exception in thread "main" ElasticsearchException[Failed to load logging configuration]; nested: NoSuchFileException[/usr/share/java/elasticsearch/config];
Likely root cause: java.nio.file.NoSuchFileException: /usr/share/java/elasticsearch/config
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
	at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
	at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
	at java.nio.file.Files.readAttributes(Files.java:1737)
	at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:225)
	at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
	at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:322)
	at java.nio.file.Files.walkFileTree(Files.java:2662)
	at org.elasticsearch.common.logging.log4j.LogConfigurator.resolveConfig(LogConfigurator.java:142)
	at org.elasticsearch.common.logging.log4j.LogConfigurator.configure(LogConfigurator.java:103)
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:259)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:45)
Refer to the log for complete error details.

Image ID:
openshift3/logging-deployer    11d8fc04974e
openshift3/logging-elasticsearch    d715f4d34ad4
openshift3/logging-kibana    e0ab09c2cbeb
openshift3/logging-fluentd    47057624ecab
openshift3/logging-auth-proxy    139f7943475e
openshift3/logging-curator    7f034fdf7702

Comment 8 Junqi Zhao 2017-02-21 03:52:03 UTC
Although java.net.ConnectException still appears in the ES and ES-OPS pod logs (see https://bugzilla.redhat.com/show_bug.cgi?id=1420217#c13), the ES and ES-OPS pods now start up successfully.

# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-m299t           1/1       Running     0          1h
logging-curator-ops-1-f9s0z       1/1       Running     0          1h
logging-deployer-k85f4            0/1       Completed   0          1h
logging-es-hh83q08r-1-j12wz       1/1       Running     0          1h
logging-es-ops-owlp8vht-1-kh4f9   1/1       Running     0          1h
logging-fluentd-rhl6s             1/1       Running     0          1h
logging-kibana-1-znn8r            2/2       Running     0          1h
logging-kibana-ops-1-54ck2        2/2       Running     0          1h

Image ID:
openshift3/logging-deployer    db6383c1a6d6
openshift3/logging-elasticsearch    d715f4d34ad4
openshift3/logging-kibana    e0ab09c2cbeb
openshift3/logging-fluentd    47057624ecab
openshift3/logging-auth-proxy    139f7943475e
openshift3/logging-curator    7f034fdf7702
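
To see what changed between the failing run in comment 6 and this passing run, the two image lists can be compared. The sketch below uses only the image IDs quoted in this report; the helper names parse_images and changed_images are hypothetical. It shows which IDs differ and makes no claim about the root cause.

```python
from typing import Dict, Tuple


def parse_images(listing: str) -> Dict[str, str]:
    """Map image name to image ID from 'name  id' lines."""
    images = {}
    for line in listing.strip().splitlines():
        name, image_id = line.split()
        images[name] = image_id
    return images


def changed_images(
    before: Dict[str, str], after: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return images present in both listings whose IDs differ."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# Image lists copied from comment 6 (failing) and comment 8 (passing).
COMMENT_6 = """\
openshift3/logging-deployer    11d8fc04974e
openshift3/logging-elasticsearch    d715f4d34ad4
openshift3/logging-kibana    e0ab09c2cbeb
openshift3/logging-fluentd    47057624ecab
openshift3/logging-auth-proxy    139f7943475e
openshift3/logging-curator    7f034fdf7702
"""

COMMENT_8 = """\
openshift3/logging-deployer    db6383c1a6d6
openshift3/logging-elasticsearch    d715f4d34ad4
openshift3/logging-kibana    e0ab09c2cbeb
openshift3/logging-fluentd    47057624ecab
openshift3/logging-auth-proxy    139f7943475e
openshift3/logging-curator    7f034fdf7702
"""

if __name__ == "__main__":
    print(changed_images(parse_images(COMMENT_6), parse_images(COMMENT_8)))
```

With these two lists, only the logging-deployer image ID differs; the logging-elasticsearch image ID is the same in both comments.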

Comment 10 errata-xmlrpc 2017-04-12 19:11:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884