Bug 1380663

Summary: How can we use Ceph as persistent storage for Elasticsearch?
Product: OpenShift Container Platform Reporter: Miheer Salunke <misalunk>
Component: Storage    Assignee: hchen
Status: CLOSED WORKSFORME QA Contact: Jianwei Hou <jhou>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.2.1    CC: aos-bugs, ewolinet, jcantril, juzhao, lmeyer, pweil, rmeggins
Target Milestone: ---    Keywords: UpcomingRelease
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:    Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-15 16:30:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
- log for elasticsearch dc, pod, error info and deploy logging stack steps (flags: none)
- log for elasticsearch pod info after attaching rbd server, and debug pod info (flags: none)

Description Miheer Salunke 2016-09-30 09:42:06 UTC
Description of problem:

We want to use Ceph as persistent storage for Elasticsearch in OpenShift aggregated logging. Is it supported? How can we do this? Is it a recommended option?

I think it should be achievable via the following links:


https://docs.openshift.com/enterprise/3.2/install_config/aggregate_logging.html#aggregated-elasticsearch

https://docs.openshift.com/enterprise/3.2/install_config/persistent_storage/persistent_storage_ceph_rbd.html

But I still need to confirm this with engineering.

It seems Elasticsearch recommends using local disk storage.

Mail thread- 

http://post-office.corp.redhat.com/archives/openshift-sme/2016-September/msg00286.html

However, if the customer really wants to use Ceph, we will have to figure out a way to support it.

In this case the customer is interested in using Ceph as storage for Elasticsearch.

Version-Release number of selected component (if applicable):
Openshift Enterprise 3.1

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Luke Meyer 2016-10-12 13:55:53 UTC
https://docs.openshift.com/enterprise/3.2/install_config/persistent_storage/persistent_storage_ceph_rbd.html indicates how to set up a PV; then from logging you need to set a volume as indicated in the aggregated logging docs, for example:

    oc set volume dc/logging-es-<unique> \
          --add --overwrite --name=elasticsearch-storage \
          --type=persistentVolumeClaim --claim-name=logging-es-1

Ceph RBD is block storage as I understand it so it should be supported.
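
For reference, the PV and matching claim for the `oc set volume` command above might look like the following sketch. The monitor address, pool, image, secret name, and sizes are illustrative placeholders, not values taken from this bug:

    # Hypothetical example; monitor IP, pool, image, and secret name are placeholders.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: ceph-pv
    spec:
      capacity:
        storage: 2Gi
      accessModes:
        - ReadWriteOnce
      rbd:
        monitors:
          - 192.168.122.133:6789
        pool: rbd
        image: ceph-image
        user: admin
        secretRef:
          name: ceph-secret
        fsType: ext4
        readOnly: false
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: logging-es-1
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 2Gi

Once the claim binds, the `oc set volume` command above swaps the ES storage volume over to it.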

Comment 4 Junqi Zhao 2016-10-31 03:07:38 UTC
Verified with the OSE 3.2 latest puddle. Ceph RBD can be used as persistent storage for Elasticsearch.

Verify Steps:

1. Deploy the logging EFK stack.
2. Deploy Ceph RBD; you can refer to https://github.com/openshift-qe/docker-rbd/
or https://docs.openshift.com/enterprise/3.2/install_config/persistent_storage/persistent_storage_ceph_rbd.html
3. Use Ceph RBD as persistent storage for Elasticsearch.

Comment 5 Xia Zhao 2016-10-31 09:33:18 UTC
@juzhao

Please retest with the real check points:

After attaching Ceph RBD as persistent storage for Elasticsearch, make sure the ES pod is running fine by doing 'oc get po' instead of 'oc get dc', and that its volumes include:

  volumes:
  - name: elasticsearch
    secret:
      secretName: logging-elasticsearch
  - name: elasticsearch-storage
    persistentVolumeClaim:
      claimName: logging-es-1

Also make sure logging data exists on the pod storage.

It would also be good to visit the Kibana UI and make sure ES can be successfully connected to in this state.
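
These checks might be run roughly as follows against a live cluster (the namespace, label selector, pod name, and data path are illustrative assumptions):

    # Confirm the ES pod itself (not just the dc) is running
    oc get po -n logging

    # Confirm logging data lands on the mounted storage
    # (substitute the real ES pod name)
    oc exec logging-es-<unique>-1-abcde -- ls /elasticsearch/persistent/logging-es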

Comment 6 Junqi Zhao 2016-11-03 10:31:29 UTC
Created attachment 1216911 [details]
log for elasticsearch dc, pod, error info and deploy logging stack steps

Tested on the AWS OSE 3.2 latest puddle with the 3.2.1 image: set up the RBD server, then deployed the logging stack. After attaching RBD storage as a PersistentVolumeClaim, the Elasticsearch pod failed to start up with an AccessDeniedException. Even after running "oadm policy add-scc-to-user privileged system:serviceaccount:logging:aggregated-logging-elasticsearch", the error still happened. See the error trace below; the attached logging.tar contains the Elasticsearch dc/pod output, error info, and the steps used to deploy the logging stack:
+ mkdir -p /elasticsearch/logging-es
+ ln -s /etc/elasticsearch/keys/searchguard.key /elasticsearch/logging-es/searchguard_node_key.key
+ regex='^([[:digit:]]+)([GgMm])$'
+ [[ 1G =~ ^([[:digit:]]+)([GgMm])$ ]]
+ num=1
+ unit=G
+ [[ G =~ [Gg] ]]
+ (( num = num * 1024 ))
+ [[ 1024 -lt 512 ]]
+ ES_JAVA_OPTS='-Des.path.home=/usr/share/elasticsearch -Des.config=/etc/elasticsearch/elasticsearch.yml -Xms256M -Xmx512m'
+ /usr/share/elasticsearch/bin/elasticsearch
[2016-11-03 05:49:10,165][INFO ][node                     ] [Alchemy] version[1.5.2], pid[8], build[d761af4/2015-07-24T22:21:43Z]
[2016-11-03 05:49:10,166][INFO ][node                     ] [Alchemy] initializing ...
[2016-11-03 05:49:11,619][INFO ][plugins                  ] [Alchemy] loaded [searchguard, openshift-elasticsearch-plugin, cloud-kubernetes], sites []
{1.5.2}: Initialization Failed ...
- ElasticsearchIllegalStateException[Failed to created node environment]
AccessDeniedException[/elasticsearch/persistent/logging-es]

Comment 7 Junqi Zhao 2016-11-03 10:33:38 UTC
Assigning back; the error happened when using Ceph as persistent storage for Elasticsearch. For more info please see Comment 6.

Comment 8 ewolinet 2016-11-03 13:40:31 UTC
What are the permissions for the CEPH storage that you mounted for ES? Can you confirm that ES would be able to create files and write on the mounted volume?

Comment 9 Junqi Zhao 2016-11-04 01:32:01 UTC
(In reply to ewolinet from comment #8)
> What are the permissions for the CEPH storage that you mounted for ES? Can
> you confirm that ES would be able to create files and write on the mounted
> volume?

I set privileged permissions for the Ceph storage. Given the AccessDeniedException error, ES cannot create files or write on the mounted volume.

Which aspect do you think I should check? Then I can test again based on your suggestions.

Regards

Comment 10 Luke Meyer 2016-11-04 15:22:28 UTC
@juzhao I think comment #8 was asking about file system permissions for the storage that gets mounted with the Ceph method. Once the storage is mounted and writeable by the pod user, ES shouldn't have problems getting started. Can you try something like the following:

1. scale down the ES DC
   $ oc scale --replicas=0 dc/logging-es-<unique>
2. create a debug pod
   $ oc debug dc/logging-es-<unique>
3. Determine that the volume is mounted and its file perms
   sh-4.2$ df -h /elasticsearch/persistent/
   sh-4.2$ ls -ldZ /elasticsearch/persistent/
   sh-4.2$ touch /elasticsearch/persistent/foo

What we're trying to do here is figure out if it is writeable by the pod user. If it's not writeable, that's a storage configuration problem. If it is, then we need to understand what's special about ES's access that prevents it from accessing this storage.

Comment 11 Junqi Zhao 2016-11-08 07:36:42 UTC
@ewolinet
The ES pod has write permissions now and can use Ceph as persistent storage for Elasticsearch, but I want to write down how I debugged this issue and ask one question that puzzled me.

I tried the suggestions in your comment. AccessDeniedException still happened; the ES pod didn't have write permissions.

After consulting the storage team, they suggested using "securityContext:
privileged: true" under the ES dc's containers node. I checked the value; its default is true. I debugged the dc according to your comments, and AccessDeniedException still happened. Then I changed the value to false; although AccessDeniedException happened again when debugging per comment 10, the ES pod had write permission, started up correctly, and wrote logs.

So I am puzzled why I must change the privileged value to false before the ES pod can have write permission.

Comment 12 ewolinet 2016-11-08 17:02:10 UTC
Junqi,

I looked at the logs that you posted above and I saw that the security context in elasticsearch_pods was "securityContext: {}". So it seems the oc patch you used didn't apply the expected change. I actually see "securityContext" in two different places in your elasticsearch_pods file, and the same for the elasticsearch_dc.txt file.

Do you have the output for https://bugzilla.redhat.com/show_bug.cgi?id=1380663#c10 ?

I'm a little surprised that we would need to be running as a privileged pod to use CEPH, we currently only suggest that the pod would run privileged to have a host mount.
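
If the intent was to toggle privileged on the ES container, one way to do it is a strategic merge patch, which matches containers by name. This is a sketch; the container name "elasticsearch" is an assumption, so verify it first with `oc get dc/logging-es-<unique> -o yaml`:

    # Assumes the ES container is named "elasticsearch"
    oc patch dc/logging-es-<unique> -p \
      '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged":false}}]}}}}'

Patching at the container level (rather than the pod-level securityContext) is what would explain the empty "securityContext: {}" seen at the pod level.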

Comment 13 Junqi Zhao 2016-11-09 11:17:31 UTC
Created attachment 1218905 [details]
log for elasticsearch pod info after attaching rbd server, and debug pod info

Comment 14 Junqi Zhao 2016-11-09 11:25:10 UTC
@ewolinet
See my attached file, timestamp 2016-11-09 06:17 EST.
es_pod_after_attaching_rbd_securitycontext_true.txt is the ES pod info after attaching the RBD PVC with the privileged value set to true; es_pod_after_attaching_rbd_securitycontext_false.txt is the same with privileged set to false. We can see the securityContext part differs between these two files.

es_debug_pod.txt is the output for https://bugzilla.redhat.com/show_bug.cgi?id=1380663#c10.

Comment 16 hchen 2016-11-14 16:18:28 UTC
@Junqi: have you tried using just the SELinux label in securityContext, with privileged=false?

Comment 17 Junqi Zhao 2016-11-15 06:36:41 UTC
@Huamin,

I didn't try just the SELinux label in securityContext with privileged=false,
since we didn't have a seLinux label in the dc.

Comment 18 hchen 2016-11-15 16:30:20 UTC
We have been testing Ceph RBD volumes without having to be privileged, but SELinux is required in this case. You can use the seLinux label Level: "s0:c0,c1" and skip privileged: true.
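
In dc terms, that suggestion would look something like the following under the pod template spec (a sketch, not taken verbatim from this bug):

    # Pod-level securityContext in the ES dc; no privileged: true needed.
    securityContext:
      seLinuxOptions:
        level: "s0:c0,c1"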

I am closing this BZ.

Comment 19 Luke Meyer 2016-11-15 19:29:35 UTC
@hchen is that documented somewhere? I would not expect to have to fiddle with SELinux labeling just to use some storage, so we need to make sure this is called out somewhere.