Follow https://issues.jboss.org/browse/MODCLUSTER-372 for details.
Michal Babacek <mbabacek> made a comment on jira MODCLUSTER-372 I can see I forgot to add {{ServerLimit 40}}, but that would only allow MaxClients to be higher than 1920, and it shouldn't have any bearing on the case of mod_cluster and registered contexts.
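For reference, in the worker MPM MaxClients is capped at ServerLimit × ThreadsPerChild, which is why the directive matters here. A minimal fragment illustrating the cap being discussed (the ThreadsPerChild value of 120 is an assumption chosen to match the 1920 figure; only {{ServerLimit 40}} comes from the comment):

```apache
<IfModule worker.c>
    # The worker MPM caps MaxClients at ServerLimit * ThreadsPerChild.
    # With the default ServerLimit of 16 and ThreadsPerChild 120 (assumed),
    # MaxClients is capped at 16 * 120 = 1920; raising ServerLimit to 40
    # lifts the cap to 40 * 120 = 4800.
    ServerLimit      40
    ThreadsPerChild  120
    MaxClients       4800
</IfModule>
```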
Jean-Frederic Clere <jfclere> made a comment on jira MODCLUSTER-372 Looking in mod_proxy_cluster.c find_node_context_host() I have spotted something very bad:
{code}
/* keep only the contexts corresponding to our balancer */
if (balancer != NULL) {
    nodeinfo_t *node;
    if (node_storage->read_node(context->node, &node) != APR_SUCCESS)
        continue;
{code}
We are reading the shared memory for each context, which is _very_ bad.
Michal Babacek <mbabacek> made a comment on jira MODCLUSTER-372 I've identified these bottlenecks. The three most expensive functions and their callers, in an order that holds regardless of whether the event metric is Instruction fetch, Data read, or Data write:
# Function {{ap_slotmem_mem}} from mod_slotmem/sharedmem_util.c, called by:
* {{get_context}} from mod_manager/context.c
* {{get_node}} from mod_manager/node.c
* {{loc_read_node}} from mod_manager/mod_manager.c
* {{find_node_context_host}} from mod_proxy_cluster/mod_proxy_cluster.c
* {{loc_read_context}} from mod_manager/context.c
* {{read_context_table}} from mod_proxy_cluster/mod_proxy_cluster.c
* {{manager_info}} from mod_manager/mod_manager.c
# Function {{ap_slotmem_do}} from mod_slotmem/sharedmem_util.c, called mostly by:
* httpd_request and httpd_core functions from the httpd sources.
# Function {{find_node_context_host}} from mod_proxy_cluster/mod_proxy_cluster.c, called by:
* the httpd sources, mostly on request_process and request_connection.
So, IMHO, {{find_node_context_host}} is indeed the trouble :) Attaching the profiling valgrind logs [^callgrind.zip], created as:
{noformat}
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes --compress-strings=no --compress-pos=no --collect-systime=yes ./httpd -f /tmp/hudson/httpd/conf/httpd.conf -E /tmp/hudson/httpd/logs/httpd.log
{noformat}
with this debug setting:
{code}
<IfModule worker.c>
    ThreadLimit 50
    StartServers 1
    ServerLimit 1
    MinSpareThreads 50
    MaxSpareThreads 50
    MaxClients 50
    ThreadsPerChild 50
    MaxRequestsPerChild 0
</IfModule>
{code}
and 4 worker nodes, 65 contexts each, and several dozen concurrent client sessions.
Michal Babacek <mbabacek> made a comment on jira MODCLUSTER-372 Jean-Frederic is experimenting with reading from local memory instead of shared memory; the preliminary results of my unified tests look very promising:
||test||balancer CPU usage peak||
|1.2.x branch, 4 workers, 61 contexts each|70%|
|1.2.x branch, 4 workers, 1 context each|17%|
|MODCLUSTER-372 branch, 4 workers, 61 contexts each|28%|
|MODCLUSTER-372 branch, 4 workers, 1 context each|17%|
Jean-Frederic Clere <jfclere> updated the status of jira MODCLUSTER-372 to Resolved
Michal Babacek <mbabacek> updated the status of jira MODCLUSTER-372 to Reopened
Michal Babacek <mbabacek> updated the status of jira MODCLUSTER-372 to Resolved
Fix ported to the 1.2.x branch [1]; it's ready for EAP6. Follow the linked Jira for more details. It will be verified as soon as the productized bits are ready. [1] https://github.com/modcluster/mod_cluster/pull/64
Failed. Please note that DR2 contains the old mod_cluster 1.2.6. A component update is needed. I'm setting DR3 as the new milestone.
Fail. mod_cluster was not upgraded. According to [BZ 1050223], it was supposed to be version 1.2.8, but it is still the old 1.2.6.
Assigning to myself for verification...
Verified :-) Note: Don't expect this fix in EAP 6.3.0 Beta, because native libraries are out of the Beta release scope.
Changed bug type from Known Issue to Bug Fix as part of bz 1097719, taking into account the comment above.