Bug 1632869 - Add support for only installing and running the schema installer job
Summary: Add support for only installing and running the schema installer job
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Ruben Vargas Palma
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1632870
Reported: 2018-09-25 17:52 UTC by John Sanda
Modified: 2019-07-09 08:00 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1632870
Environment:
Last Closed: 2019-01-10 09:04:01 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:0024  0        None      None    None     2019-01-10 09:04:07 UTC

Description John Sanda 2018-09-25 17:52:33 UTC
Description of problem:
There is no easy way to rerun a Kubernetes Job. As discussed in bug 1632852, the schema installer job sometimes terminates with a failure and is not run again. In that case the Job has to be deleted and recreated before it will run again. This should be automated through the installer.
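Rerunning a Job by hand therefore means deleting it and creating it again. The Python sketch below shows that sequence as `oc` invocations. It assumes `oc` is on the PATH and that the Job definition is available in a local manifest file. The `rerun_job_commands` and `rerun_job` helpers and the manifest filename are hypothetical and are not part of openshift-ansible.

```python
import shlex
import subprocess
from typing import List


def rerun_job_commands(namespace: str, job_name: str, manifest: str) -> List[List[str]]:
    """Return the oc invocations that rerun a Job by deleting and recreating it.

    A Job that has completed or exhausted its retries is not started again on
    its own, and its pod template is immutable, so the object is removed and
    created again instead of being updated in place.
    """
    return [
        # --ignore-not-found turns the delete into a no-op if the Job is already gone.
        ["oc", "-n", namespace, "delete", "job", job_name, "--ignore-not-found"],
        ["oc", "-n", namespace, "create", "-f", manifest],
    ]


def rerun_job(namespace: str, job_name: str, manifest: str, dry_run: bool = True) -> None:
    """Print (dry run) or execute the delete/create sequence."""
    for argv in rerun_job_commands(namespace, job_name, manifest):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical manifest path; dry run only prints the commands.
    rerun_job("openshift-infra", "hawkular-metrics-schema", "hawkular_metrics_schema_job.yaml")
```

By default the sketch only prints the commands. Pass `dry_run=False` to execute them against a cluster.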


Comment 1 Vadim Rutkovsky 2018-10-12 15:56:25 UTC
PR https://github.com/openshift/openshift-ansible/pull/10340

Comment 2 Vadim Rutkovsky 2018-10-15 09:09:52 UTC
Fix is available in openshift-ansible-3.11.23-1

Comment 3 Junqi Zhao 2018-10-17 08:53:40 UTC
Tested with openshift-ansible-3.11.23-1
scenario 1:
1. Deploy metrics 3.11 and make sure all pods are in Running status.
2. Delete the hawkular-metrics-schema job and run playbooks/openshift-metrics/schema.yml. The hawkular-metrics-schema job is recreated, and a hawkular-metrics-schema pod is created.


scenario 2:
1. Do not deploy metrics 3.11; run playbooks/openshift-metrics/schema.yml directly. The hawkular-metrics-schema pod stays in ContainerCreating status because the secrets "hawkular-metrics-account" and "hawkular-metrics-certs" have not been created (they are used for the hawkular-metrics pod). However, the goal of only installing and running the schema installer job is achieved.

# oc -n openshift-infra get job
NAME                      DESIRED   SUCCESSFUL   AGE
hawkular-metrics-schema   1         0            26m

# oc -n openshift-infra get pod
NAME                            READY     STATUS              RESTARTS   AGE
hawkular-metrics-schema-jtms6   0/1       ContainerCreating   0          26m

# oc -n openshift-infra describe pod hawkular-metrics-schema-jtms6
Events:
  Type     Reason       Age                 From                                   Message
  ----     ------       ----                ----                                   -------
  Normal   Scheduled    27m                 default-scheduler                      Successfully assigned openshift-infra/hawkular-metrics-schema-jtms6 to ip-172-18-10-45.ec2.internal
  Warning  FailedMount  17m (x13 over 27m)  kubelet, ip-172-18-10-45.ec2.internal  MountVolume.SetUp failed for volume "hawkular-metrics-certs" : secrets "hawkular-metrics-certs" not found
  Warning  FailedMount  7m (x9 over 25m)    kubelet, ip-172-18-10-45.ec2.internal  Unable to mount volumes for pod "hawkular-metrics-schema-jtms6_openshift-infra(3cff463f-d1e5-11e8-97db-0e7631322b02)": timeout expired waiting for volumes to attach or mount for pod "openshift-infra"/"hawkular-metrics-schema-jtms6". list of unmounted volumes=[hawkular-metrics-certs hawkular-metrics-account]. list of unattached volumes=[hawkular-metrics-certs hawkular-metrics-account default-token-fz7sx]
  Warning  FailedMount  1m (x21 over 27m)   kubelet, ip-172-18-10-45.ec2.internal  MountVolume.SetUp failed for volume "hawkular-metrics-account" : secrets "hawkular-metrics-account" not found
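The FailedMount events name the secrets the schema pod is waiting for. The sketch below pulls those names out of `oc describe pod` output. It assumes the event text keeps the `secrets "<name>" not found` wording seen here. The `missing_secrets` helper is hypothetical.

```python
import re
from typing import Set

# Matches the kubelet wording in the FailedMount events, e.g.
#   secrets "hawkular-metrics-certs" not found
_MISSING_SECRET = re.compile(r'secrets "([^"]+)" not found')


def missing_secrets(describe_output: str) -> Set[str]:
    """Return the names of secrets that FailedMount events report as missing."""
    return set(_MISSING_SECRET.findall(describe_output))


if __name__ == "__main__":
    events = (
        '  Warning  FailedMount  17m  kubelet  MountVolume.SetUp failed for volume '
        '"hawkular-metrics-certs" : secrets "hawkular-metrics-certs" not found\n'
        '  Warning  FailedMount  1m   kubelet  MountVolume.SetUp failed for volume '
        '"hawkular-metrics-account" : secrets "hawkular-metrics-account" not found\n'
    )
    print(sorted(missing_secrets(events)))  # ['hawkular-metrics-account', 'hawkular-metrics-certs']
```

A non-empty result means the pod cannot start until the full metrics install creates those secrets.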

@Vadim
Do you think the scenarios are enough to close this defect?

Comment 4 Vadim Rutkovsky 2018-10-17 09:07:21 UTC
(In reply to Junqi Zhao from comment #3)
> Tested with openshift-ansible-3.11.23-1
> scenario 1:
> 1. Deploy metrics 3.11 and make sure all the pods could be in running status
> 2. Delete hawkular-metrics-schema job, and run
> playbooks/openshift-metrics/schema.yml, hawkular-metrics-schema job could be
> created, and hawkular-metrics-schema pod could be created.

This looks correct. 

I initially saw this with Origin, where the image for the schema job was missing and this playbook was needed to restore the Metrics installation.

@Ruben, any ideas how to corrupt the schema so that the job would fix it and QE could verify schema is restored?

> 
> scenario 2:
> 1. Don't deploy metrics 3.11, run playbooks/openshift-metrics/schema.yml
> directly. Although hawkular-metrics-schema pod is in ContainerCreating
> status due to secrets "hawkular-metrics-account" and
> "hawkular-metrics-certs" are not created(they are used for hawkular-metrics
> pod), purpose to only installing and running the schema installer job is
> achieved.

I don't think it's valid. The schema job playbook should not be used if metrics are not deployed.

Comment 5 Junqi Zhao 2018-10-17 11:19:00 UTC
(In reply to Vadim Rutkovsky from comment #4)
> @Ruben, any ideas how to corrupt the schema so that the job would fix it and
> QE could verify schema is restored?

Whether or not the schema is corrupted, the playbook first deletes the existing hawkular-metrics-schema job and then creates a new one:

roles/openshift_metrics/tasks/run_schema_job.yaml
- include_tasks: install_hawkular_schema_job.yaml

roles/openshift_metrics/tasks/install_hawkular_schema_job.yaml
---
- name: list installed jobs
  command: >
    {{ openshift_client_binary }} -n {{ openshift_metrics_project }} --config={{ mktemp.stdout }}/admin.kubeconfig
    get jobs
  register: jobs

# We cannot use oc apply here because the Job template has immutable fields
# on which oc apply will fail.
- name: remove hawkular-metrics-schema job
  command: >
    {{ openshift_client_binary }} -n {{ openshift_metrics_project }} --config={{ mktemp.stdout }}/admin.kubeconfig
    delete job hawkular-metrics-schema
  register: delete_schema_job
  when: "'hawkular-metrics-schema' in jobs.stdout"

- name: generate hawkular-metrics schema job
  template:
    src: hawkular_metrics_schema_job.j2
    dest: "{{ mktemp.stdout }}/templates/hawkular_metrics_schema_job.yaml"
  changed_when: false
*****************************************************************

So there is no need to test with a corrupted hawkular-metrics-schema job.
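The list/delete/generate flow in install_hawkular_schema_job.yaml can be mirrored in a few lines of Python. This is a sketch, not the playbook's code. The `job_names` and `plan_schema_job` helpers are hypothetical. The final "create" step is taken from the description of the flow, because the quoted task excerpt ends at the template task. The sketch matches the NAME column of `oc get jobs` exactly, whereas the Ansible `when` condition tests for a substring anywhere in the command output.

```python
from typing import List

SCHEMA_JOB = "hawkular-metrics-schema"


def job_names(get_jobs_output: str) -> List[str]:
    """Extract the NAME column from `oc get jobs` tabular output."""
    lines = [line for line in get_jobs_output.splitlines() if line.strip()]
    # Skip the header row ("NAME  DESIRED  SUCCESSFUL  AGE").
    return [line.split()[0] for line in lines[1:]]


def plan_schema_job(get_jobs_output: str) -> List[str]:
    """Return the steps the playbook performs, in order.

    The existing Job is removed first because its template has immutable
    fields, so updating it in place with `oc apply` would fail.
    """
    steps = []
    if SCHEMA_JOB in job_names(get_jobs_output):
        steps.append("delete job " + SCHEMA_JOB)
    steps.append("generate template hawkular_metrics_schema_job.yaml")
    steps.append("create job " + SCHEMA_JOB)
    return steps
```

With exact matching, a job with a hypothetical name such as `hawkular-metrics-schema-old` would not trigger the delete, while the substring test in the task would.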

Comment 8 Junqi Zhao 2018-10-18 05:44:05 UTC
Tested with openshift-ansible-3.11.23-1; the issue is fixed.

Steps:
1. Scale down the cassandra and hawkular-metrics RCs, wait a while, then scale them back up.
2. The hawkular-metrics pod reports the error "The schema version check failed".
3. Run the playbooks/openshift-metrics/schema.yml playbook. After it finishes, all pods are running normally.

# oc -n openshift-infra get pod
NAME                            READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-x97wh      1/1       Running     0          12m
hawkular-metrics-kwjxz          1/1       Running     1          12m
hawkular-metrics-schema-vf8gj   0/1       Completed   0          5m
heapster-htc2p                  1/1       Running     0          3h
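The verification in this comment comes down to reading the pod listing: the schema pod should be Completed and the other pods Running. The sketch below automates that check. It assumes `oc get pod` keeps the column layout shown here. The `pod_statuses` and `metrics_healthy` helpers are hypothetical. Treating any pod whose name starts with `hawkular-metrics-schema-` as the job's pod is an assumption based on the pod names in this bug. Only the STATUS column is checked, not READY.

```python
from typing import Dict


def pod_statuses(get_pod_output: str) -> Dict[str, str]:
    """Map pod name -> STATUS column from `oc get pod` tabular output."""
    rows = [line.split() for line in get_pod_output.splitlines() if line.strip()]
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    status_col = header.index("STATUS")
    return {row[0]: row[status_col] for row in body}


def metrics_healthy(get_pod_output: str) -> bool:
    """True if schema job pods are Completed and every other pod is Running."""
    statuses = pod_statuses(get_pod_output)
    if not statuses:
        return False
    for name, status in statuses.items():
        # Assumed naming: the Job's pod is "hawkular-metrics-schema-<suffix>".
        expected = "Completed" if name.startswith("hawkular-metrics-schema-") else "Running"
        if status != expected:
            return False
    return True
```

Applied to the listing in comment 3, scenario 2, the check fails because the schema pod is still in ContainerCreating.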

Comment 10 errata-xmlrpc 2019-01-10 09:04:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0024

