Description of problem:
Using documentation for ovirt-metrics.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Follow the steps in the documentation and at 1.12 try to restart the elasticsearch pod
2. See error: replication controller "logging-es-data-master-sowm2972-2" has failed progressing

Actual results:
# oc logs $( oc get -n logging dc -l component=es -o name )
--> Scaling logging-es-data-master-sowm2972-3 to 1
--> Waiting up to 10m0s for pods in rc logging-es-data-master-sowm2972-3 to become ready
error: update acceptor rejected logging-es-data-master-sowm2972-3: pods for rc "logging-es-data-master-sowm2972-3" took longer than 600 seconds to become ready

Expected results:
The pod should start.

Additional info:
The document will be provided in a private comment as it is not yet publicly published.
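For context, the error above is the deployer's readiness wait timing out: it polls the rc's pods until they report Ready or 600 seconds elapse. A minimal sketch of that wait loop, with a flag file standing in for the pod's Ready condition and a short timeout so it can run anywhere (the timeout value and the "pod" check are stand-ins, not the deployer's actual code):

```shell
# Simulated readiness wait: the real deployer polls the rc's pods for up
# to 600s; here a flag file stands in for the pod reporting Ready.
READY_FLAG=$(mktemp -u)           # absent file = pod not yet ready
TIMEOUT=5                         # stand-in for the deployer's 600s
(sleep 1; touch "$READY_FLAG") &  # simulate the pod becoming ready

elapsed=0
until [ -e "$READY_FLAG" ]; do
  if [ "$elapsed" -ge "$TIMEOUT" ]; then
    # This branch corresponds to the "took longer than 600 seconds" error.
    echo "error: pods took longer than ${TIMEOUT} seconds to become ready"
    exit 1
  fi
  sleep 1
  elapsed=$((elapsed + 1))
done
echo "pods became ready after ${elapsed}s"
rm -f "$READY_FLAG"
```

In the failing deployment the Ready condition never arrives within the window, so the rc is rejected and marked as failed progressing.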
Machine stats: 16G RAM, 12 cores, Disk size: 150G
20G of memory defined for the machine, 16G guaranteed by the engine
Please test on a clean machine. I believe this might be specific to the environment and not really 100% reproducible. It still needs to be resolved, but I want to make sure it's not a blocker to the release.
The machine for this was cleanly installed this morning solely for the purpose of testing the docs: a clean RHEL 7.4 from PXE.
What is the resource consumption of the machine (CPU, memory)?
Please run logging-dump.sh:
https://github.com/openshift/origin-aggregated-logging/blob/master/hack/README-dump.md
https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh
and attach the output to this bz.
I was able to reproduce this in automation. If you would like to take a look at the steps used, see here:
https://github.com/StLuke/ovirt-metrics-store/blob/master/playbooks/viaq-store.yml
Not sure what's going on:
- logging-dump produced no es log info
- es describe output isn't very useful

Nathan/Noriko - can one of you try to reproduce?
(In reply to Rich Megginson from comment #10)
> not sure what's going on - logging-dump produced no es log info - es
> describe output isn't very useful
>
> Nathan/Noriko - can one of you try to reproduce?

I'm having difficulty setting up the 3.6 environment. :(

In the meantime, would it be possible to give us access to one of the failed systems? I'm interested in the ansible log from the previous section 1.11, "Running Ansible":
/tmp/ansible.log
and in the pods' status and events:
oc get pods
oc get events
(In reply to Lukas Svaty from comment #9)
> I was able to reproduce this in automation, If you would like to take a look
> at steps used take a look here:
> https://github.com/StLuke/ovirt-metrics-store/blob/master/playbooks/viaq-
> store.yml

The steps look good to me. Can we also see the inventory file and the vars.yml file?
https://github.com/StLuke/ovirt-metrics-store/blob/master/playbooks/viaq-store.yml#L125
Created attachment 1389363 [details] inventory
Created attachment 1389364 [details] vars.yaml
Also, my setup was done on a VM with lower specs (an 8GB machine), so this might have affected the deployment of the es pod. I am now able to run the playbooks successfully on a bigger machine (however on CentOS with origin).
The original machine was discarded; lsvaty's test was on CentOS.
(In reply to Lukas Svaty from comment #15)
> Also, my setup was done on VM with lower specs (8GB machine) so this might
> have affected the deployment of es pod. I am able to successfully run the
> playbooks now, on a bigger machine (however on CentOS vs origin).

So is this still a bug? CLOSED NOTABUG?
I'll try to reproduce with my playbooks; if that fails, pbrilla can try to reproduce manually as a last resort. If we can't reproduce it, we'll close the bug as INSUFFICIENT_DATA.
I was able to reproduce this with the playbook mentioned and the vars taken from comment #1. Still relevant.
(In reply to Lukas Svaty from comment #19)
> was able to reproduce this with the playbook mentioned and vars taken from
> comment#1 still relevant

Thanks for retrying the test, Lukas. Is the test env the same as in #c2 and #c3?

> Machine stats: 16G RAM, 12 cores,
> Disk size: 150G
> 20G defined for machine, 16G guaranteed by engine

And is there again no log from the es pod if you run logging-dump.sh as suggested in #c7? How about "oc get pods" and "oc get events"?

Thanks.
Created attachment 1392653 [details]
Dump with 770 on directory

We changed this against the document:
chmod 0770 /var/lib/elasticsearch
Still the same result; attaching the dump.
OK, host reprovisioned, ES pod is up.

Changes against the documentation:

Directory for persistent storage:
chmod -R g+wx /var/lib/elasticsearch

User received one extra scc:
oadm policy add-scc-to-user hostaccess system:serviceaccount:logging:aggregated-logging-elasticsearch

After those 2 changes the pod started flawlessly.
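For reference, the effect of the recursive group-permission change can be checked on a scratch directory; the real target is /var/lib/elasticsearch, and the temp path below is only a stand-in so the snippet can run anywhere:

```shell
# Demonstrate the workaround's permission change on a scratch directory
# instead of the real /var/lib/elasticsearch.
ES_DIR=$(mktemp -d)       # mktemp creates this 0700 (no group access)
mkdir "$ES_DIR/indices"   # a nested dir, to show -R applies recursively

chmod -R g+wx "$ES_DIR"   # the workaround: add group write+execute

# Group bits now include w and x on both levels:
perms_top=$(stat -c '%A' "$ES_DIR")
perms_sub=$(stat -c '%A' "$ES_DIR/indices")
echo "$perms_top $perms_sub"
rm -rf "$ES_DIR"
```

Group write+execute on the data directory is what lets the containerized elasticsearch process (running with a supplemental group on the host path) create and traverse index directories.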
OK, closing this bug. After discussion: the extra SCC is not needed, and the directory privileges are correct for persistent storage.