Bug 1336568 - Satellite 6 Content Hosts Disappear due to Elasticsearch
Summary: Satellite 6 Content Hosts Disappear due to Elasticsearch
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Hosts - Content
Version: 6.1.8
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: jcallaha
URL:
Whiteboard:
Depends On:
Blocks: 1317008
 
Reported: 2016-05-16 22:11 UTC by Wilson Harris
Modified: 2020-08-13 08:28 UTC (History)
CC List: 32 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-29 21:16:26 UTC
Target Upstream Version:


Attachments
reindex.rake replacement (6.16 KB, text/plain)
2016-05-19 15:55 UTC, Justin Sherrill
reindex_object.rake (1.74 KB, text/plain)
2016-07-07 19:29 UTC, Justin Sherrill


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2328141 0 None None None 2016-05-18 09:49:48 UTC
Red Hat Product Errata RHBA-2017:1668 0 normal SHIPPED_LIVE Satellite 6.1.12 Async Errata 2017-06-30 01:15:59 UTC

Description Wilson Harris 2016-05-16 22:11:07 UTC
Description of problem:

Satellite 6 Content Hosts Disappear due to Elasticsearch

The content hosts page is blank.

Re-indexing the Satellite does not resolve the issue; the web UI is not repopulated.



Comment 1 Benjamin Chardi 2016-05-17 15:06:28 UTC
Hi there,

We are facing the same problem at our customer, Mercadona.
We have Satellite 6.1.7, and after running "foreman-rake katello:reindex" several times because of other bugs, the content host list in the web UI appears empty.

We need a solution ASAP; this bug is putting the project at risk.

Comment 2 Pavel Moravec 2016-05-17 17:03:09 UTC
A simple check for whether the empty content hosts list is caused by Elasticsearch:

curl -X GET 'http://localhost:9200/katello_katello::system/katello%2Fsystem/_search?pretty' -d '{"query":{"match_all":{}},"sort":[{"name_sort":"asc"}]}'

Take any UUID from the output and substitute it into:

curl -X GET 'http://localhost:9200/katello_katello::system/katello%2Fsystem/_search?pretty' -d '{"query":{"match_all":{}},"sort":[{"name_sort":"asc"}],"filter":{"and":[{"terms":{"uuid":["0ecddc45-efa4-4963-b8e9-843828b61f95"]}}]}}'

If this curl returns:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

then the Elasticsearch index katello_katello::system (the one required for the content host list) is broken.
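
The check above can be scripted roughly as follows. This is only a sketch: hits_total and index_state are hypothetical helper names, and the parsing assumes the ?pretty layout shown above (one key per line), so it will not work on compact single-line JSON.

```shell
#!/bin/sh
# Sketch only: read a pretty-printed _search response on stdin and
# print the value of hits.total (the first "total" after "hits" : {).
hits_total() {
  awk '
    /"hits"[ \t]*:[ \t]*[{]/ { in_hits = 1; next }
    in_hits && /"total"/ {
      gsub(/[^0-9]/, "")   # keep only the digits on this line
      print
      exit
    }
  '
}

# Hypothetical helper: classify the response of the uuid-filtered query.
index_state() {
  total=$(hits_total)
  case "$total" in
    '') echo "unknown: could not parse hits.total" ;;
    0)  echo "broken: uuid filter matched 0 documents" ;;
    *)  echo "ok: uuid filter matched $total documents" ;;
  esac
}
```

To use it, pipe the second curl from above (with a UUID taken from the first query) into index_state. A "broken" result corresponds to the empty hits list shown in this comment.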


Possible workaround that sometimes works:
1) revert the change from https://access.redhat.com/solutions/2076563 (this KCS had often been applied before this problem was reported)
2)
service elasticsearch stop
rm -rf /var/lib/elasticsearch/*
katello-service restart
foreman-rake katello:reindex

(if the reindex doesn't help, try it once more)
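
For anyone who wants to review step 2 before running it, here is a minimal sketch that wraps the same four commands in a dry-run mode. The DRY_RUN variable and the run / reset_and_reindex helper names are assumptions for illustration, not part of Satellite. Note that the rm step wipes all Elasticsearch data, which the reindex is then expected to rebuild.

```shell
#!/bin/sh
# Sketch of workaround step 2. DRY_RUN defaults to 1, so the commands
# are only printed; set DRY_RUN=0 and run as root to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_and_reindex() {
  run service elasticsearch stop || return 1
  # Destructive: removes every Elasticsearch index on this host.
  # The glob is expanded by the inner shell only when really executed.
  run sh -c 'rm -rf /var/lib/elasticsearch/*' || return 1
  run katello-service restart || return 1
  run foreman-rake katello:reindex
}
```

Calling reset_and_reindex with the default settings prints the four commands prefixed with "+". Once the output looks right, the same call with DRY_RUN=0 performs them in order and stops at the first failure.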

Worth providing if that doesn't help:
- DB backups
- a tcpdump taken while the reindex is running:
tcpdump -i any -s 0 -w es.$(date "+%s").cap port 9200 or port 9300

Comment 5 Justin Sherrill 2016-05-18 13:28:04 UTC
If you see this again, please grab the following output BEFORE trying to fix the issue:

curl -X GET 'http://localhost:9200/katello_katello::system/katello%2Fsystem/_mapping?pretty'

Comment 6 Justin Sherrill 2016-05-18 13:31:21 UTC
Also ask the user whether they ran katello:reindex recently.

Comment 12 Justin Sherrill 2016-05-19 18:08:03 UTC
Freddy, if you look at the contents of that script, it should explain it.

I provided that script more for people that are already familiar with it (this is an updated copy).

Comment 14 jnikolak 2016-05-20 00:07:32 UTC
We ran this on the customer's box, but no new content hosts showed up.

The reindex just finished and still no content hosts.

Re-indexing Katello::Distribution
Re-indexing Katello::PackageGroup
Re-indexing Katello::Erratum

real    222m28.304s
user    36m53.866s
sys     0m47.355s
+ date
Thu May 19 16:01:05 CDT 2016

Comment 15 Justin Sherrill 2016-05-20 14:49:39 UTC
Jon,

Would you be able to run through these steps and gather the output of all commands:


https://gist.github.com/jlsherrill/cceafcf643d489b1a10c2ed260b8126f

Comment 17 jnikolak 2016-05-20 23:30:41 UTC
You have to have the memory to be able to handle the increase.

On the box where this was applied, the server had 32 GB of memory and was hardly using any of it.
I guess it's a case of different options for different servers.

Comment 18 jnikolak 2016-05-21 07:25:19 UTC
In one more environment, I ran the same reindex on the system.
1st time, with default memory in elasticsearch and tomcat, it failed.
2nd time, with 1 GB of memory in elasticsearch and tomcat, it got further.
3rd time, with memory set to min 1 GB and max 4 GB in both elasticsearch and tomcat, it finished all the way.

This machine only had 8 GB.


My shards are set to:
--> index.number_of_shards: 3

I may change that and see whether the reindex gets faster or slower, to try to reproduce the slow-performance issue.

Comment 20 jnikolak 2016-05-22 07:24:25 UTC
I was running more tests.

I tried adjusting shards, replicas, bootstrap.mlockall, tomcat memory and elasticsearch memory.

Each setting made no difference to performance of the reindex.
I got variable results, probably just based on the output of the request at the time.

I put everything back to default.
The only difference I found was:

In elasticsearch.yml:
* if replicas is set greater than 1, no content hosts are shown; you have to reindex to get everything back.

* if you have 0 shards, there will be no content hosts.
This might be a good technique to clear the content hosts.

* if you set more memory in /etc/sysconfig/elasticsearch than the machine can handle, it will error out.

* the more replicas you have, the longer it takes.

* having 10, 100, or 500 shards made no difference to reindex or site performance.

* having more memory in tomcat or elasticsearch (/etc/sysconfig) won't give much improvement.


#####################
Conclusion: leaving everything at the defaults is fine.
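
To see which of the settings discussed above are explicitly set on a given box, something like the following sketch may help. show_es_settings is a hypothetical helper name and the default config path is an assumption; pass another file as the first argument if yours differs. Commented-out lines are ignored, so no output means the defaults are in effect.

```shell
#!/bin/sh
# Sketch: print the shard, replica and mlockall settings that are
# explicitly set (not commented out) in an Elasticsearch config file.
# The default path is an assumption; override it with $1.
show_es_settings() {
  conf="${1:-/etc/elasticsearch/elasticsearch.yml}"
  grep -E '^[[:space:]]*(index\.number_of_shards|index\.number_of_replicas|bootstrap\.mlockall)[[:space:]]*:' "$conf"
}
```

grep exits non-zero when nothing matches, so the function's exit status also shows whether any of these settings has been overridden.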

Comment 25 hprakash 2016-06-03 07:10:37 UTC
Another customer (case# 01644740) faced the same issue. After deleting the Elasticsearch index and re-creating it only for "system", the customer was able to load content hosts properly.

Steps followed by the customer:

1- Removed the index from Elasticsearch:
curl -X DELETE 'http://localhost:9200/katello_katello::system/'

2- From the foreman console, ran the command below to regenerate the index:
 Katello::System.create_elasticsearch_index

Thanks
Himanshu Prakash

Comment 31 Justin Sherrill 2016-07-07 19:27:48 UTC
The currently attached reindex_object.task does not properly handle hypervisors; uploading a new one.

Comment 32 Justin Sherrill 2016-07-07 19:29:03 UTC
Created attachment 1177435
reindex_object.rake

Comment 33 hprakash 2016-08-09 05:08:50 UTC
(In reply to Justin Sherrill from comment #32)
> Created attachment 1177435
> reindex_object.rake

@Justin, would reindex_object.rake resolve the issue in Sat 6.1?
I believe this issue does not exist in Sat 6.2, as Elasticsearch has been removed from it.

Comment 34 Justin Sherrill 2016-08-09 12:30:30 UTC
Yes, it should, and yes, Elasticsearch is gone in 6.2.

Comment 36 Bryan Kearney 2016-09-12 15:28:56 UTC
Adding needinfo on Justin.

Comment 39 Bryan Kearney 2016-12-07 20:17:49 UTC
I am closing this bug out. The root cause was an interaction with Elasticsearch. Elasticsearch has been removed from Satellite 6.2, and this removal will not be backported. Customers should use the resolution above until they are able to upgrade to Satellite 6.2 or later.

Comment 41 Justin Sherrill 2017-03-01 15:18:24 UTC
We will ship this updated rake task as part of a future 6.1.z release.

Comment 51 jcallaha 2017-06-22 19:46:02 UTC
Failed in Satellite 6.1.12 Snap 2.

Running the reindex, in its current state, removes all hosts from the index. The UI continues "loading" forever, but a 500 ISE is seen in the web console. After further testing, we determined that the Hypervisor reindex was causing the issue.

Comment 53 jcallaha 2017-06-28 15:34:17 UTC
Verified in Satellite 6.1.12 Snap 3, based on no-break criteria.

The reindex now completes successfully and no longer removes host entries from Elasticsearch. The hypervisor portion no longer runs, but those systems are accounted for by the system portion.

Additionally, I added a sleep to the reindex task per comment #50, allowing me to register new hosts during the process. At no point did the content host page fail to load. After this completion, all previous and new content hosts were shown.

[root@cloud-qe-19 yum.repos.d]# foreman-rake katello:reindex
Re-indexing Katello::ActivationKey
Re-indexing Katello::ContentView
Re-indexing Katello::Repository
Re-indexing Katello::ContentViewFilter
Re-indexing Katello::Product
Re-indexing Katello::ContentViewErratumFilterRule
Re-indexing Katello::ContentViewHistory
Re-indexing Katello::Provider
Re-indexing Katello::ContentViewPackageFilterRule
Re-indexing Katello::ContentViewPackageGroupFilterRule
Re-indexing Katello::TaskStatus
Re-indexing Katello::ContentViewPuppetEnvironment
Re-indexing Katello::ContentViewPuppetModule
Re-indexing Katello::Distributor
Re-indexing Katello::HostCollection
Re-indexing Katello::System
Re-indexing Katello::Job
Re-indexing Katello::Notice
Re-indexing Katello::Package
Re-indexing Katello::PuppetModule
Re-indexing Katello::Distribution
Re-indexing Katello::PackageGroup
Re-indexing Katello::Erratum

Comment 55 errata-xmlrpc 2017-06-29 21:16:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1668

