Bug 1540147
| Summary: | Elastic search pod did not get up in 10 minutes |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Logging |
| Status: | CLOSED NOTABUG |
| Severity: | high |
| Priority: | unspecified |
| Version: | 3.6.0 |
| Target Release: | 3.6.z |
| Hardware: | All |
| OS: | Linux |
| Reporter: | Pavol Brilla <pbrilla> |
| Assignee: | Rich Megginson <rmeggins> |
| QA Contact: | Anping Li <anli> |
| CC: | aos-bugs, lsvaty, nhosoi, nkinder, pbrilla, rmeggins, sradco |
| Type: | Bug |
| Last Closed: | 2018-02-14 12:22:40 UTC |
| Bug Blocks: | 1510988 |
Description
Pavol Brilla
2018-01-30 11:16:07 UTC
Machine stats: 16G RAM, 12 cores, disk size 150G. 20G defined for the machine, 16G guaranteed by engine.

Please test on a clean machine. I believe this might be specific to the environment and not really 100% reproducible. It still needs to be resolved, but I want to make sure it's not a blocker for the release.

The machine for this was cleanly installed this morning, solely for the purpose of testing the docs. Clean RHEL 7.4 from PXE.

What is the resource consumption of the machine? CPU, memory.

Please run logging-dump.sh:
https://github.com/openshift/origin-aggregated-logging/blob/master/hack/README-dump.md
https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh
and attach the output to this bz.

I was able to reproduce this in automation. If you would like to see the steps used, take a look here:
https://github.com/StLuke/ovirt-metrics-store/blob/master/playbooks/viaq-store.yml

Not sure what's going on - logging-dump produced no ES log info, and the es describe output isn't very useful.

Nathan/Noriko - can one of you try to reproduce?

(In reply to Rich Megginson from comment #10)
> not sure what's going on - logging-dump produced no es log info - es
> describe output isn't very useful
>
> Nathan/Noriko - can one of you try to reproduce?

I'm having difficulty setting up the 3.6 environment. :( In the meantime, would it be possible for us to access one of the failed systems? I'm interested in the ansible log from the previous section 1.11, Running Ansible (/tmp/ansible.log), and the pods' status and events:
oc get pods
oc get events

(In reply to Lukas Svaty from comment #9)
> I was able to reproduce this in automation, If you would like to take a look
> at steps used take a look here:
> https://github.com/StLuke/ovirt-metrics-store/blob/master/playbooks/viaq-
> store.yml

The steps look good to me. Can we also see the inventory file and vars.yml file?
https://github.com/StLuke/ovirt-metrics-store/blob/master/playbooks/viaq-store.yml#L125

Created attachment 1389363 [details]
inventory

Created attachment 1389364 [details]
vars.yaml
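The diagnostics requested above (`oc get pods`, `oc get events`, a logging-dump) can be scripted around a readiness wait matching the 10-minute window in the bug title. This is only a sketch: the `logging` project name and the `component=es` pod label are assumptions based on how origin-aggregated-logging typically labels its pods, so adjust them for your deployment.

```shell
# Sketch of a readiness wait for the Elasticsearch pod, assuming the
# "logging" project and the "component=es" label (both assumptions).
wait_for_es_pod() {
  timeout=${1:-600}   # seconds; the bug title refers to a 10-minute window
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    ready=$(oc get pods -n logging -l component=es \
      -o 'jsonpath={.items[0].status.containerStatuses[0].ready}' 2>/dev/null)
    [ "$ready" = "true" ] && return 0
    sleep 5
    elapsed=$((elapsed + 5))
  done
  return 1
}

# If the wait fails, collect the artifacts requested in this bug:
#   oc get pods -n logging
#   oc get events -n logging
#   ./logging-dump.sh > logging-dump.out 2>&1
```

Running the collection only on failure keeps the attachment small while still capturing the pod status and events asked for in the comments.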
Also, my setup was done on a VM with lower specs (an 8GB machine), so this might have affected the deployment of the ES pod. I am able to successfully run the playbooks now, on a bigger machine (however on CentOS vs. origin).

The original machine was discarded by lsvaty; the test was on CentOS.

(In reply to Lukas Svaty from comment #15)
> Also, my setup was done on VM with lower specs (8GB machine) so this might
> have affected the deployment of es pod. I am able to successfully run the
> playbooks now, on a bigger machine (however on CentOS vs origin).

So is this still a bug? CLOSED NOTABUG?

I'll try to reproduce with my playbooks; if not, pbrilla can reproduce manually as a last resort. If we can't reproduce it, we'll close the bug as INSUFFICIENT_DATA.

Was able to reproduce this with the playbook mentioned and vars taken from comment#1 - still relevant.

(In reply to Lukas Svaty from comment #19)
> was able to reproduce this with the playbook mentioned and vars taken from
> comment#1 still relevant

Thanks for retrying the test, Lukas. Is the test env the same as in #c2 and #c3?
> Machine stats: 16G RAM, 12 cores,
> Disk size: 150G
> 20G defined for machine, 16G guaranteed by engine

And there is no log from the es pod again if you run logging-dump.sh as suggested in #c7? How about "oc get pods", "oc get events"? Thanks.

Created attachment 1392653 [details]
Dump with 770 on directory

We changed, against the documentation:
chmod 0770 /var/lib/elasticsearch
Still the same result; attaching the dump.
OK, host reprovisioned, the ES pod is up.

Changes against the documentation:

Directory for persistent storage:
chmod -R g+wx /var/lib/elasticsearch

User received one extra SCC:
oadm policy add-scc-to-user hostaccess system:serviceaccount:logging:aggregated-logging-elasticsearch

After those 2 changes the pod started flawlessly.

OK, closing this bug: after discussion, the SCC is not needed and the directory privileges are correct for persistent storage.
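Since the closing comment hinges on the persistent-storage directory permissions being correct, a small helper to report the directory's mode makes the comparison against the docs easy. The default path matches the one discussed in this bug; the expected mode depends on your docs version, so treat this as a sketch rather than the documented procedure.

```shell
# Sketch: print the octal mode of the Elasticsearch persistent-storage
# directory so it can be compared against what the documentation prescribes.
# The default path follows the one discussed in this bug.
check_es_dir() {
  dir=${1:-/var/lib/elasticsearch}
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  # GNU stat format; on BSD/macOS use `stat -f '%Lp'` instead
  stat -c '%a' "$dir"
}
```

For example, `check_es_dir` after the `chmod 0770` change above would print `770`, making it obvious whether the change actually took effect on the host.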