Description of problem:
Satellite 6 content hosts disappear due to Elasticsearch. The Content Hosts page is blank. Re-indexing the Satellite does not repopulate the web UI.
Hi there,

We are facing the same problem at our customer, Mercadona. We have Satellite 6.1.7, and after running "foreman-rake katello:reindex" a few times because of other bugs, the content host list in the web UI appears empty. We need a solution ASAP; this bug is putting the project at risk ...
A simple check of whether the empty content hosts list is caused by elasticsearch:

  curl -X GET 'http://localhost:9200/katello_katello::system/katello%2Fsystem/_search?pretty' -d '{"query":{"match_all":{}},"sort":[{"name_sort":"asc"}]}'

Take some UUID from the output and substitute it in:

  curl -X GET 'http://localhost:9200/katello_katello::system/katello%2Fsystem/_search?pretty' -d '{"query":{"match_all":{}},"sort":[{"name_sort":"asc"}],"filter":{"and":[{"terms":{"uuid":["0ecddc45-efa4-4963-b8e9-843828b61f95"]}}]}}'

If this curl returns:

  {
    "took" : 1,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "failed" : 0
    },
    "hits" : {
      "total" : 0,
      "max_score" : null,
      "hits" : [ ]
    }
  }

then the elasticsearch index katello_katello::system (the one required for the content host list) is broken.

Possible workaround that sometimes works:

1) Revert the change from https://access.redhat.com/solutions/2076563 (this KCS had often been applied before this problem was reported).

2) Run:

  service elasticsearch stop
  rm -rf /var/lib/elasticsearch/*
  katello-service restart
  foreman-rake katello:reindex

(If the reindex does not help, try it once again.)

Worth providing if the workaround does not help:
- DB backups
- tcpdump taken while the reindex is running:

  tcpdump -i any -s 0 port 9200 or port 9300 -w es.$(date "+%s").cap
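The check above can be scripted. The following is a minimal sketch, not something shipped with Satellite or Katello: the helper names `uuid_filter_query` and `hits_total` are hypothetical, and the parsing assumes the response shape shown in this comment (a top-level "hits" object whose first key is "total").

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Satellite or Katello.

ES_INDEX_URL='http://localhost:9200/katello_katello::system/katello%2Fsystem'

# Build the filtered query body for one content host UUID.
uuid_filter_query() {
    printf '{"query":{"match_all":{}},"sort":[{"name_sort":"asc"}],"filter":{"and":[{"terms":{"uuid":["%s"]}}]}}' "$1"
}

# Read an Elasticsearch _search response on stdin and print hits.total.
# Assumes "total" is the first key inside the top-level "hits" object.
hits_total() {
    tr -d '\n' | sed -n 's/.*"hits"[[:space:]]*:[[:space:]]*{[[:space:]]*"total"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage (needs a running Elasticsearch; use a UUID taken from the unfiltered query):
#   curl -s -X GET "$ES_INDEX_URL/_search?pretty" \
#        -d "$(uuid_filter_query 0ecddc45-efa4-4963-b8e9-843828b61f95)" | hits_total
```

A result of 0 for a UUID that the unfiltered query did return corresponds to the broken-index case described above.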
If you see this again, please grab the following output BEFORE trying to fix the issue:

  curl -X GET 'http://localhost:9200/katello_katello::system/katello%2Fsystem/_mapping?pretty'

Also ask the user whether they ran katello:reindex recently.
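One way to make sure the mapping is captured before any fix is attempted is to write it to a timestamped file, in the same style as the tcpdump file name used earlier in this thread. This is only a sketch: `save_mapping` and the `es-mapping.<epoch>.json` file name are hypothetical choices.

```shell
#!/bin/sh
# Sketch only: save_mapping and the output file name are hypothetical.

# Save the current mapping of the katello_katello::system index into a
# timestamped file under the given directory (default: current directory)
# and print the path of the file that was written.
save_mapping() (
    out_dir=${1:-.}
    out_file="$out_dir/es-mapping.$(date "+%s").json"
    curl -s -X GET 'http://localhost:9200/katello_katello::system/katello%2Fsystem/_mapping?pretty' > "$out_file" || exit 1
    printf '%s\n' "$out_file"
)

# Usage (needs a running Elasticsearch):
#   save_mapping /tmp
```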
Freddy, if you look at the contents of that script, it should explain it. I provided that script more for people who are already familiar with it (this is an updated copy).
We ran this on the customer's box, but no new content hosts showed up. The reindex just finished and there are still no content hosts.

  Re-indexing Katello::Distribution
  Re-indexing Katello::PackageGroup
  Re-indexing Katello::Erratum

  real 222m28.304s
  user 36m53.866s
  sys 0m47.355s
  + date
  Thu May 19 16:01:05 CDT 2016
Jon, would you be able to run through these steps and gather the output of all commands? https://gist.github.com/jlsherrill/cceafcf643d489b1a10c2ed260b8126f
You have to have enough memory to handle the increase. On the box where this was applied, the server had 32GB of memory and was hardly using any of it. I guess it's a case of different options for different servers.
In one more environment, I ran the same reindex on the system.
1st time, with default memory in elasticsearch and tomcat, it failed.
2nd time, with memory 1gig in elasticsearch and tomcat, it got further.
3rd time, with memory min 1gig and 4gig in both elasticsearch and tomcat, it finished all the way. This machine only had 8gig.

My shards are set to:
--> index.number_of_shards: 3

I may change that and see whether the speed increases or decreases, to reproduce the issue (slow performance).
I ran more tests. I tried adjusting shards, replicas, bootstrap.mlockall, tomcat memory and elasticsearch memory. No setting made a difference to the performance of the reindex. I got variable results, probably just based on the output of the request at the time. I put everything back to default.

The only differences I found were:

In elasticsearch.yml
* If you have replicas greater than 1, no content hosts are shown. You have to reindex to get everything back.
* If you have 0 shards, there will be no content hosts. This might be a good technique to clear the content_hosts.
* If the memory set in /etc/sysconfig/elasticsearch exceeds what the machine can handle, it will error out.
* The more replicas you have, the longer it takes.
* Having 10, 100 or 500 shards made no difference to reindex or site performance.
* Having more memory in tomcat or elastic (/etc/sysconfig) won't give much improvement.

#####################
Conclusion: leaving everything at the defaults is fine.
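The two findings about replicas and shards can be turned into a quick sanity check of the configuration file. This is a rough sketch only: `es_setting` and `check_es_settings` are hypothetical helpers, `index.number_of_replicas` is assumed to be the key that was changed for "replicas", and the parser only understands flat "key: value" lines rather than full YAML.

```shell
#!/bin/sh
# Sketch only: es_setting and check_es_settings are hypothetical helpers.

# Print the value of a flat "key: value" setting from the given file.
# Commented-out lines are ignored; prints nothing if the key is not set.
es_setting() (
    key=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots for the regex
    sed -n "s/^[[:space:]]*${key}[[:space:]]*:[[:space:]]*\([^[:space:]#]*\).*/\1/p" "$2" | tail -n 1
)

# Warn about the two settings reported above to leave the content hosts
# page empty. Exit status is 0 if neither is found, 1 otherwise.
check_es_settings() (
    file=$1
    status=0
    replicas=$(es_setting index.number_of_replicas "$file")
    shards=$(es_setting index.number_of_shards "$file")
    if [ -n "$replicas" ] && [ "$replicas" -gt 1 ]; then
        echo "WARNING: index.number_of_replicas is $replicas (greater than 1)"
        status=1
    fi
    if [ -n "$shards" ] && [ "$shards" -eq 0 ]; then
        echo "WARNING: index.number_of_shards is 0"
        status=1
    fi
    exit "$status"
)

# Usage (the path is an assumption about where the file lives):
#   check_es_settings /etc/elasticsearch/elasticsearch.yml
```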
Another customer (case# 01644740) faced the same issue. After deleting the elasticsearch index and re-creating it only for "system", the customer was able to load content hosts properly.

Steps followed by the customer:

1- Removed the index from elasticsearch:

  curl -X DELETE 'http://localhost:9200/katello_katello::system/'

2- From the foreman console, ran the command below to re-generate the index:

  Katello::System.create_elasticsearch_index

Thanks
Himanshu Prakash
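Step 1 can be wrapped so that a failed delete is noticed before the index is re-created. This is a sketch under an assumption that does not come from this report: that a successful delete response contains "acknowledged":true. The helper name `delete_system_index` is hypothetical.

```shell
#!/bin/sh
# Sketch only: delete_system_index is a hypothetical helper.

# Delete the katello_katello::system index and report whether Elasticsearch
# acknowledged the request.
# Assumption: a successful response contains "acknowledged":true.
delete_system_index() (
    resp=$(curl -s -X DELETE 'http://localhost:9200/katello_katello::system/') || exit 1
    case "$resp" in
        *'"acknowledged":true'*)
            echo "index deleted"
            ;;
        *)
            echo "unexpected response: $resp" >&2
            exit 1
            ;;
    esac
)

# Usage (needs a running Elasticsearch):
#   delete_system_index
```

If the delete succeeds, step 2 is still run from the foreman console as described above.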
The currently attached reindex_object.task does not properly handle hypervisors; uploading a new one.
Created attachment 1177435 [details] reindex_object.rake
(In reply to Justin Sherrill from comment #32)
> Created attachment 1177435 [details]
> reindex_object.rake

@Justin, would reindex_object.rake resolve the issue in Sat 6.1? I believe this issue would not exist in Sat 6.2, as elasticsearch is removed in it.
Yes, it should, and yes, elasticsearch is gone in 6.2.
Adding needinfo on Justin.
I am closing this bug out. The root cause was an interaction with elasticsearch. Elasticsearch has been removed from Satellite 6.2, and this removal will not be backported. Customers should use the resolution above until they are able to upgrade to Satellite 6.2 or later.
We will ship this updated rake task as part of a future 6.1.z release
Failed in Satellite 6.1.12 Snap 2. Running the reindex, in its current state, removes all hosts from the index. The UI continues "loading" forever, but a 500 ISE is seen in the web console. After further testing, we determined that the Hypervisor reindex was causing the issue.
Verified in Satellite 6.1.12 Snap 3, based on no-break criteria. The reindex now completes successfully and no longer removes host entries from elasticsearch. The hypervisor portion no longer runs, but those systems are accounted for by the system portion.

Additionally, I added a sleep to the reindex task per comment #50, allowing me to register new hosts during the process. At no point did the content host page fail to load. After completion, all previous and new content hosts were shown.

[root@cloud-qe-19 yum.repos.d]# foreman-rake katello:reindex
Re-indexing Katello::ActivationKey
Re-indexing Katello::ContentView
Re-indexing Katello::Repository
Re-indexing Katello::ContentViewFilter
Re-indexing Katello::Product
Re-indexing Katello::ContentViewErratumFilterRule
Re-indexing Katello::ContentViewHistory
Re-indexing Katello::Provider
Re-indexing Katello::ContentViewPackageFilterRule
Re-indexing Katello::ContentViewPackageGroupFilterRule
Re-indexing Katello::TaskStatus
Re-indexing Katello::ContentViewPuppetEnvironment
Re-indexing Katello::ContentViewPuppetModule
Re-indexing Katello::Distributor
Re-indexing Katello::HostCollection
Re-indexing Katello::System
Re-indexing Katello::Job
Re-indexing Katello::Notice
Re-indexing Katello::Package
Re-indexing Katello::PuppetModule
Re-indexing Katello::Distribution
Re-indexing Katello::PackageGroup
Re-indexing Katello::Erratum
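When reviewing a saved copy of this output, it can help to confirm mechanically that the system portion ran. The following is a small sketch that relies only on the "Re-indexing <Model>" lines shown above; `reindexed_models` and `system_was_reindexed` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical.

# Read saved `foreman-rake katello:reindex` output on stdin and print the
# model names that were re-indexed, one per line.
reindexed_models() {
    grep -o 'Re-indexing [A-Za-z:]*' | sed 's/^Re-indexing //'
}

# Succeed only if Katello::System appears among the re-indexed models.
system_was_reindexed() {
    reindexed_models | grep -qx 'Katello::System'
}

# Usage:
#   foreman-rake katello:reindex | tee reindex.log
#   system_was_reindexed < reindex.log && echo "system portion ran"
```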
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1668