The MySQL slow query log is full of entries where the complete list of minhash XOR values is being queried and returned. The query comes from the recalculateMinHashes() method in RESTv1.java:
final List<MinHashXOR> minHashXORs = entityManager.createQuery(MinHashXOR.SELECT_ALL_QUERY).getResultList();
Since this information is very unlikely to change, we should consider caching it between calls.
The same caching should be applied to the Topic XML factory, which produces the same slow query.
Fixed in 1.5-SNAPSHOT build 201404111431.
The MinHashXOR select-all query is now cached on first use in the CacheEntityLoader class. Every place that fetches the full list of MinHashXORs now goes through CacheEntityLoader, and the cached values are invalidated whenever the MinHashXORs are recalculated. I've also optimized the minhash calculation to save processing time: logic that was being repeated for every MinHashXOR value, but only needs to run once, now runs once.
Note: This version is currently live on the test/development server.
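For anyone reading this later, the cache-on-first-use and invalidate-on-recalculate behaviour described above could look roughly like the sketch below. This is not the actual CacheEntityLoader code: the class CachedListLoader, its getAll() and invalidate() methods, and the Supplier standing in for the SELECT_ALL_QUERY call are all hypothetical names used for illustration, and the sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for the caching behaviour described in this comment;
// the real CacheEntityLoader and its method names are not shown in this report.
class CachedListLoader<T> {
    private final Supplier<List<T>> loader; // would wrap the select-all query
    private volatile List<T> cached;        // null means "not loaded yet" or "invalidated"

    CachedListLoader(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    // Returns the cached list, running the loader only on first use
    // or on the first call after invalidate().
    List<T> getAll() {
        List<T> result = cached;
        if (result == null) {
            synchronized (this) {
                result = cached;
                if (result == null) {
                    // Copy and wrap so callers cannot modify the shared list.
                    result = Collections.unmodifiableList(new ArrayList<>(loader.get()));
                    cached = result;
                }
            }
        }
        return result;
    }

    // Call after the values are recalculated so the next getAll() reloads them.
    void invalidate() {
        synchronized (this) {
            cached = null;
        }
    }
}

class CachedListLoaderDemo {
    public static void main(String[] args) {
        AtomicInteger queries = new AtomicInteger();
        CachedListLoader<Integer> xors = new CachedListLoader<>(() -> {
            queries.incrementAndGet();      // counts simulated database hits
            return Arrays.asList(1, 2, 3);
        });

        xors.getAll();
        xors.getAll();
        System.out.println(queries.get()); // prints 1: second call was served from cache

        xors.invalidate();
        xors.getAll();
        System.out.println(queries.get()); // prints 2: reloaded after invalidation
    }
}
```

The point of the design is that the expensive query runs once per invalidation rather than once per request, which is what should remove the repeated entries from the slow query log.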
I changed the HashMap in CacheEntityLoader to a ConcurrentHashMap. From what I can tell, HashMap is not thread-safe, so concurrent unsynchronized reads and writes can lead to unpredictable results.
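As a rough illustration of why the switch helps, here is a minimal sketch of a cache backed by a ConcurrentHashMap. The class ConcurrentCacheSketch, its methods, and the "minHashXORs" key are made up for this example and are not the real CacheEntityLoader internals. It also assumes Java 8 or later for computeIfAbsent; on an older JDK, putIfAbsent would be the nearest equivalent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache used only to illustrate the HashMap -> ConcurrentHashMap change.
class ConcurrentCacheSketch {
    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();

    // ConcurrentHashMap.computeIfAbsent is atomic, so the loader is applied
    // at most once per key even when many threads ask at the same time.
    Object get(String key, Function<String, Object> loader) {
        return cache.computeIfAbsent(key, loader);
    }

    // Drops the entry so the next get() reloads it.
    void invalidate(String key) {
        cache.remove(key);
    }

    // Starts the given number of threads, all requesting the same key,
    // and returns how many times the loader actually ran.
    static int countLoadsWithThreads(int threads) {
        ConcurrentCacheSketch cache = new ConcurrentCacheSketch();
        AtomicInteger loads = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> cache.get("minHashXORs", k -> {
                loads.incrementAndGet();
                return "loaded";
            }));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println(countLoadsWithThreads(8)); // prints 1
    }
}
```

With a plain HashMap the same access pattern has no such guarantee: unsynchronized concurrent writes can corrupt the map's internal state or lose entries.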