Bug 1690995 - Metrics store Elasticsearch pod has status Error
Summary: Metrics store Elasticsearch pod has status Error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine-metrics
Classification: oVirt
Component: Generic
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.3.3
Target Release: ---
Assignee: Shirly Radco
QA Contact: Ivana Saranova
URL:
Whiteboard:
Depends On:
Blocks: 1631193
Reported: 2019-03-20 15:52 UTC by Ivana Saranova
Modified: 2019-04-16 13:58 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-04-16 13:58:32 UTC
oVirt Team: Metrics
Embargoed:
sradco: ovirt-4.3?
lleistne: testing_ack+



Description Ivana Saranova 2019-03-20 15:52:03 UTC
Description of problem:
When installing metrics according to the README in the role, the Elasticsearch deploy pod on the created metrics-store machine ends with Error status.

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.2-0.1.el7.noarch
ovirt-engine-metrics-1.2.2-0.0.master.20190319135632.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. On the Manager machine, copy /etc/ovirt-engine-metrics/metrics-store-config.yml.example to /etc/ovirt-engine-metrics/config.yml.d/metrics-store-config.yml:
```
# cp /etc/ovirt-engine-metrics/metrics-store-config.yml.example /etc/ovirt-engine-metrics/config.yml.d/metrics-store-config.yml
```
2. Update the values in /etc/ovirt-engine-metrics/config.yml.d/metrics-store-config.yml to match the details of your specific environment:
```
# vi /etc/ovirt-engine-metrics/config.yml.d/metrics-store-config.yml
```

3. Change to the ovirt-engine-metrics directory:
```
# cd /usr/share/ovirt-engine-metrics
```

4. Run the metrics store installation playbook, which creates the metrics store installer virtual machine:
```
# ANSIBLE_JINJA2_EXTENSIONS="jinja2.ext.do" ./configure_ovirt_machines_for_metrics.sh --playbook=ovirt-metrics-store-installation.yml
```

**Note:** This playbook may end with the failure described in https://github.com/oVirt/ovirt-ansible-vm-infra/issues/65 even though all VMs are created successfully. A fix for that failure still needs to be worked out.
In that case, you can continue with deploying OpenShift from the metrics store installer virtual machine.

5. Log into the admin portal and review the metrics store installer virtual machine creation.

6. Log into the metrics store installer virtual machine
```
# ssh root@<metrics-store-installer ip or fqdn>
```
**Note:** If you are not using DNS, make sure to add the new OpenShift virtual machines to /etc/hosts on the engine and installer machines.

7. Run the Ansible playbook that deploys OpenShift on the created VMs:

```
# ANSIBLE_CONFIG="/usr/share/ansible/openshift-ansible/ansible.cfg" \
  ANSIBLE_ROLES_PATH="/usr/share/ansible/roles/:/usr/share/ansible/openshift-ansible/roles" \
  ansible-playbook -i integ.ini install_okd.yaml -e @vars.yaml
```
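
For environments without DNS, the /etc/hosts note in step 6 could be handled with a small helper like the one below. This is only a sketch: the function name `add_host_entry`, the `HOSTS_FILE` override, and the example address and host name are assumptions for illustration and do not come from the README.

```shell
# Append "IP hostname" to a hosts file unless the hostname is already listed.
# HOSTS_FILE is a hypothetical override (defaults to /etc/hosts) so the
# function can be tried against a scratch file first.
add_host_entry() {
    ip="$1"
    name="$2"
    hosts_file="${HOSTS_FILE:-/etc/hosts}"
    # Look for the hostname as a whole field on any non-comment line.
    if awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
        echo "$name already present in $hosts_file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
        echo "added $name to $hosts_file"
    fi
}

# Example with placeholder values; replace them with your VM addresses:
# add_host_entry 192.0.2.10 master0.example.com
```

Calling the function twice with the same host name leaves a single entry, so it should be safe to rerun on both the engine and the installer machine.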


Actual results:

```
[root@master0 ~]# oc get pods
NAME                                       READY     STATUS    RESTARTS   AGE
logging-es-data-master-dv5h9nw6-1-deploy   0/1       Error     0          1h
logging-fluentd-j9kp5                      1/1       Running   0          1h
logging-kibana-1-tzl8v                     2/2       Running   0          2h
logging-mux-1-rpjwn                        1/1       Running   0          1h
```

```
[root@master0 ~]# oc get svc
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP    PORT(S)     AGE
logging-es              ClusterIP   172.30.7.36      master0        9200/TCP    2h
logging-es-cluster      ClusterIP   None             <none>         9300/TCP    2h
logging-es-prometheus   ClusterIP   172.30.164.164   <none>         443/TCP     2h
logging-kibana          ClusterIP   172.30.150.193   <none>         443/TCP     2h
logging-mux             ClusterIP   172.30.85.157    master0        24284/TCP   2h
```

```
[root@master0 ~]# oc logs logging-es-data-master-dv5h9nw6-1-deploy
--> Scaling logging-es-data-master-dv5h9nw6-1 to 1
Warning: acceptAvailablePods encountered %T, retryingwatch closed before Until timeout--> Error listing events for replication controller logging-es-data-master-dv5h9nw6-1: Get https://172.30.0.1:443/api/v1/namespaces/openshift-logging/events?fieldSelector=involvedObject.kind%3DReplicationController%2CinvolvedObject.uid%3D5a8f3648-4b16-11e9-955b-001a4aa33015%2CinvolvedObject.name%3Dlogging-es-data-master-dv5h9nw6-1%2CinvolvedObject.namespace%3Dopenshift-logging: dial tcp 172.30.0.1:443: connect: connection refused
error: update acceptor rejected logging-es-data-master-dv5h9nw6-1: acceptAvailablePods failed to watch ReplicationController openshift-logging/logging-es-data-master-dv5h9nw6-1: Get https://172.30.0.1:443/api/v1/namespaces/openshift-logging/replicationcontrollers?fieldSelector=metadata.name%3Dlogging-es-data-master-dv5h9nw6-1&resourceVersion=4091&watch=true: dial tcp 172.30.0.1:443: connect: connection refused
```
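
Both errors in the deploy log above end in a refused TCP connection to the same API address. A filter along the following lines could pull the refused endpoints out of a longer log. It is a rough sketch that assumes the `dial tcp HOST:PORT: connect: connection refused` wording seen here and a `grep` that supports `-o`; the function name `refused_endpoints` is made up for illustration.

```shell
# Read a log on stdin and print each distinct HOST:PORT that refused a connection.
refused_endpoints() {
    grep -o 'dial tcp [^ ]*: connect: connection refused' |
        # Field 3 is "HOST:PORT:"; strip the trailing colon.
        awk '{ sub(/:$/, "", $3); print $3 }' |
        sort -u
}

# Possible usage on the master, with the pod name from this report:
# oc logs logging-es-data-master-dv5h9nw6-1-deploy | refused_endpoints
```

If the only endpoint listed is the API address, that would suggest the deployer pod failed because the API server was unreachable at that moment rather than because of Elasticsearch itself.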

Expected results:
Elasticsearch pod is running and working properly.

Additional info:

Comment 1 Ivana Saranova 2019-04-04 09:32:13 UTC
Steps:
1) Install metrics-store according to the documentation and README
2) Check if Elasticsearch pod is deployed and running: oc get pods
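
The check in step 2 can also be scripted. The sketch below assumes the default `oc get pods` column layout shown in the description (NAME, READY, STATUS, ...) and treats any status other than Running or Completed as a failure; `failed_pods` is a hypothetical helper name.

```shell
# Read `oc get pods` output on stdin, print the names of pods that are
# neither Running nor Completed, and return non-zero if any were found.
failed_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1; bad = 1 }
         END { exit bad }'
}

# Possible usage:
# oc get pods | failed_pods || echo "some pods are not healthy"
```

Run against the output from the original description, this would print logging-es-data-master-dv5h9nw6-1-deploy, the pod that ended in Error.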

Results:
Elasticsearch pod is ready, running and successfully deployed.


Verified in: 
ovirt-engine-4.2.8.5-0.1.el7ev.noarch
ovirt-engine-metrics-1.2.1.3-1.el7ev.noarch

Also verified in:
ovirt-engine-4.3.3.1-0.1.el7.noarch
ovirt-engine-metrics-1.2.1.3-1.el7ev.noarch

Comment 2 Sandro Bonazzola 2019-04-16 13:58:32 UTC
This bugzilla is included in oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

