Description of problem: To improve cumin-data scalability, it is necessary to run multiple cumin-data instances, with each instance bound to a subset of the available QMF classes. The existing feature that allows binding at the package level is not fine-grained enough to achieve the partitioning needed for scale.
Fixed in revision 4661.

Detailed comments and examples have been added to the standard cumin.conf file. They explain the configuration mechanism for multiple cumin-data instances and describe alternate configurations. The default configuration starts multiple cumin-data instances, with the QMF classes partitioned in the way that is most advantageous for scale.

A [master] section has been added to cumin.conf and is read by /usr/bin/cumin. This is now the normal place to set values for the "datas" and "webs" arguments to /usr/bin/cumin. When cumin is run as a service, options set in /etc/sysconfig/cumin override the values in cumin.conf. Options should generally not be set in /etc/sysconfig/cumin, except by test scripts that need to switch between different configurations quickly.

The 'packages' config parameter is no longer valid (nor is the recent 'classes' parameter added during development). Instead, two new parameters have been created:

  include-classes (default: all QMF classes in all QMF packages)
  exclude-classes (default: empty list)

Each takes a list of package:class values, where 'class' may be '*'. For example:

  include-classes: com.redhat.grid:Slot, com.redhat.sesame:*
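To make the package:class semantics concrete, here is a minimal Python sketch of how an instance might decide whether it is bound to a given QMF class. It is not cumin's actual code: the function names are hypothetical, and it assumes exact package-name matching, '*' as a class wildcard, and that exclude-classes takes precedence over include-classes. The example shows two hypothetical instances that split the work so that every class is handled by exactly one of them.

```python
# Hypothetical sketch of include-classes / exclude-classes matching.
# Function names are illustrative only; this is not cumin's implementation.
# Assumption: exclude-classes wins when a class matches both lists.


def parse_class_list(value):
    """Parse 'pkg:Class, pkg:*' into a list of (package, class) tuples."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty items, e.g. a trailing comma
        package, sep, cls = item.partition(":")
        if not sep or not package or not cls:
            raise ValueError("expected package:class, got %r" % item)
        entries.append((package, cls))
    return entries


def entry_matches(entry, package, cls):
    """True if a (package, class) entry covers package:cls; '*' matches any class."""
    entry_package, entry_cls = entry
    return entry_package == package and entry_cls in ("*", cls)


def is_bound(package, cls, include=None, exclude=()):
    """Return True if an instance with these settings processes package:cls.

    include=None models the documented default (all classes in all packages);
    exclude defaults to an empty list, as documented.
    """
    included = include is None or any(entry_matches(e, package, cls) for e in include)
    excluded = any(entry_matches(e, package, cls) for e in exclude)
    return included and not excluded


if __name__ == "__main__":
    slots = parse_class_list("com.redhat.grid:Slot")
    # Hypothetical split: instance A handles only Slot, instance B everything else.
    samples = [
        ("com.redhat.grid", "Slot"),
        ("com.redhat.grid", "Scheduler"),
        ("com.redhat.sesame", "Sysimage"),
    ]
    for package, cls in samples:
        a = is_bound(package, cls, include=slots)
        b = is_bound(package, cls, exclude=slots)
        print("%s:%s -> A=%s B=%s" % (package, cls, a, b))
```

Pairing an include list on one instance with the same list as an exclude list on another is one simple way to guarantee full, non-overlapping coverage; the actual default partitioning shipped in cumin.conf may differ.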
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:

Cause
To maintain performance as scale increases, cumin needs to distribute data processing across multiple instances of cumin-data. Responsibility for data processing needs to be partitioned at the level of QMF classes.

Consequence
Without this enhancement, users may notice decreases in cumin performance as scale increases.

Change
The standard configuration installed with cumin runs multiple cumin-data instances, with the workload partitioned for best performance at scale.

Result
These changes allow cumin to better maintain performance as scale increases.
Verified in cumin-0.1.4746-1.el5 and cumin-0.1.4746-1.el6.noarch
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html