Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1540147

Summary: Elasticsearch pod did not come up within 10 minutes
Product: OpenShift Container Platform
Reporter: Pavol Brilla <pbrilla>
Component: Logging
Assignee: Rich Megginson <rmeggins>
Status: CLOSED NOTABUG
QA Contact: Anping Li <anli>
Severity: high
Priority: unspecified
Version: 3.6.0
CC: aos-bugs, lsvaty, nhosoi, nkinder, pbrilla, rmeggins, sradco
Target Milestone: ---
Target Release: 3.6.z
Hardware: All
OS: Linux
Last Closed: 2018-02-14 12:22:40 UTC
Type: Bug
Bug Blocks: 1510988
Attachments:
- inventory
- vars.yaml
- Dump with 770 on directory

Description Pavol Brilla 2018-01-30 11:16:07 UTC
Description of problem:
While following the documentation for ovirt-metrics, the Elasticsearch pod failed to become ready.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Follow the steps in the documentation and, at section 1.12, try to restart the elasticsearch pod
2. See error: replication controller "logging-es-data-master-sowm2972-2" has failed progressing



Actual results:
# oc logs  $( oc get -n logging dc -l component=es -o name )
--> Scaling logging-es-data-master-sowm2972-3 to 1
--> Waiting up to 10m0s for pods in rc logging-es-data-master-sowm2972-3 to become ready
error: update acceptor rejected logging-es-data-master-sowm2972-3: pods for rc "logging-es-data-master-sowm2972-3" took longer than 600 seconds to become ready
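
The 600-second limit in this message is the deployment config's rollout timeout. A minimal sketch of raising it, assuming the ES deployment config uses a Recreate strategy (the strategy type is an assumption, not confirmed in this report):

```yaml
# Sketch: raise the rollout timeout on the logging ES deployment config.
# Assumes a Recreate strategy; the equivalent field for a Rolling strategy
# lives under rollingParams instead.
spec:
  strategy:
    type: Recreate
    recreateParams:
      timeoutSeconds: 1200   # default is 600; doubles the readiness wait
```

This only buys the pod more time; it does not address why readiness is slow in the first place.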

Expected results:
The pod should start and become ready.

Additional info:
The document will be provided in a private comment, as it is not yet publicly published.

Comment 2 Pavol Brilla 2018-01-30 11:20:19 UTC
Machine stats: 16 GB RAM, 12 cores
Disk size: 150 GB

Comment 3 Pavol Brilla 2018-01-30 11:21:44 UTC
20 GB defined for the machine, 16 GB guaranteed by the engine

Comment 4 Shirly Radco 2018-01-30 13:51:13 UTC
Please test on a clean machine. I believe this might be specific to the environment and not really 100% reproducible.

It still needs to be resolved, but I want to make sure it's not a blocker for the release.

Comment 5 Pavol Brilla 2018-01-30 14:01:08 UTC
The machine for this was cleanly installed this morning, solely to test the docs.

Clean RHEL 7.4 from PXE.

Comment 6 Shirly Radco 2018-01-30 14:06:46 UTC
What is the resource consumption of the machine (CPU, memory)?

Comment 9 Lukas Svaty 2018-01-31 12:45:39 UTC
I was able to reproduce this in automation. If you would like to see the steps used, take a look here:
https://github.com/StLuke/ovirt-metrics-store/blob/master/playbooks/viaq-store.yml

Comment 10 Rich Megginson 2018-01-31 14:50:08 UTC
not sure what's going on - logging-dump produced no es log info - es describe output isn't very useful

Nathan/Noriko - can one of you try to reproduce?

Comment 11 Noriko Hosoi 2018-01-31 16:49:30 UTC
(In reply to Rich Megginson from comment #10)
> not sure what's going on - logging-dump produced no es log info - es
> describe output isn't very useful
> 
> Nathan/Noriko - can one of you try to reproduce?

I'm having difficulty setting up the 3.6 environment. :(

In the meantime, would it be possible to let us access one of the failed systems?

I'm interested in the ansible log from the previous section, 1.11. Running Ansible (/tmp/ansible.log), and in the pods' status and events:
  oc get pods
  oc get events

Comment 12 Noriko Hosoi 2018-02-01 00:18:17 UTC
(In reply to Lukas Svaty from comment #9)
> I was able to reproduce this in automation, If you would like to take a look
> at steps used take a look here:
> https://github.com/StLuke/ovirt-metrics-store/blob/master/playbooks/viaq-
> store.yml

The steps look good to me.  Can we also see the inventory file and vars.yml file?
https://github.com/StLuke/ovirt-metrics-store/blob/master/playbooks/viaq-store.yml#L125

Comment 13 Lukas Svaty 2018-02-01 07:03:19 UTC
Created attachment 1389363 [details]
inventory

Comment 14 Lukas Svaty 2018-02-01 07:03:47 UTC
Created attachment 1389364 [details]
vars.yaml

Comment 15 Lukas Svaty 2018-02-01 07:05:49 UTC
Also, my setup was done on a VM with lower specs (an 8 GB machine), so this might have affected the deployment of the es pod. I am able to successfully run the playbooks now on a bigger machine (on CentOS with Origin, however).

Comment 16 Pavol Brilla 2018-02-01 08:57:10 UTC
The original machine was discarded; lsvaty tested on CentOS.

Comment 17 Rich Megginson 2018-02-01 15:50:28 UTC
(In reply to Lukas Svaty from comment #15)
> Also, my setup was done on VM with lower specs (8GB machine) so this might
> have affected the deployment of es pod. I am able to successfully run the
> playbooks now, on a bigger machine (however on CentOS vs origin).

So is this still a bug?  CLOSED NOTABUG?

Comment 18 Lukas Svaty 2018-02-02 08:04:54 UTC
I'll try to reproduce with my playbooks; if not, pbrilla can reproduce manually as a last resort. If we can't reproduce it, we'll close the bug as INSUFFICIENT_DATA.

Comment 19 Lukas Svaty 2018-02-05 15:38:12 UTC
I was able to reproduce this with the playbook mentioned and the vars taken from comment #1; still relevant.

Comment 20 Noriko Hosoi 2018-02-05 18:08:02 UTC
(In reply to Lukas Svaty from comment #19)
> was able to reproduce this with the playbook mentioned and vars taken from
> comment#1 still relevant

Thanks for retrying the test, Lukas.

Is the test env the same as in #c2 and #c3?
> Machine stats:  16G RAM, 12 cores, 
> Disk size: 150G

> 20G defined for machine, 16G guaranteed by engine

And is there again no log from the es pod if you run logging-dump.sh as suggested in #c7? How about "oc get pods" and "oc get events"? Thanks.

Comment 22 Pavol Brilla 2018-02-07 12:44:10 UTC
Created attachment 1392653 [details]
Dump with 770 on directory

We changed, relative to the document:
chmod 0770 /var/lib/elasticsearch

Still the same result; attaching the dump.

Comment 23 Pavol Brilla 2018-02-08 12:25:19 UTC
OK, host reprovisioned; the ES pod is up.

Changes made against the documentation:

Directory for persistent storage:
chmod -R g+wx /var/lib/elasticsearch

The service account received one extra SCC:
oadm policy add-scc-to-user hostaccess system:serviceaccount:logging:aggregated-logging-elasticsearch

After those two changes, the pod started flawlessly.
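
The effect of the `chmod -R g+wx` change can be sketched on a throwaway directory (local demo only; the real target in this report is `/var/lib/elasticsearch` on the host):

```shell
# Demonstrate what "chmod -R g+wx" adds, using a scratch directory
# instead of the real /var/lib/elasticsearch path.
dir=$(mktemp -d)
mkdir -p "$dir/elasticsearch"
chmod 0750 "$dir/elasticsearch"      # group can read and enter, but not write
chmod -R g+wx "$dir/elasticsearch"   # the fix: group gains write (x was already set)
perms=$(stat -c '%a' "$dir/elasticsearch")
echo "$perms"                        # octal mode after the change
rm -rf "$dir"
```

Starting from 0750, adding group write and execute yields 0770, which matches the `chmod 0770` tried in comment 22 for the top-level directory; the `-R` variant applies it to the whole tree.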

Comment 24 Pavol Brilla 2018-02-14 12:22:40 UTC
OK, closing this bug after discussion.

The SCC is not needed,

and the directory privileges are correct for persistent storage.