Bug 1224378
Summary: | ccs_config_validate chews on long lines | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Radek Steiger <rsteiger> |
Component: | cluster | Assignee: | Christine Caulfield <ccaulfie> |
Status: | CLOSED WONTFIX | QA Contact: | cluster-qe <cluster-qe> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.7 | CC: | ccaulfie, cluster-maint, jpokorny, rpeterso, teigland |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-08-17 12:47:33 UTC | Type: | Bug |
Description
Radek Steiger
2015-05-22 17:53:20 UTC
Emerson fencing agent with validation times depending on # of attributes
------------------------------------------------------------------------

[root@virt-100 cluster]# time ccs_config_validate
Configuration validates

real    153m24.830s
user    152m17.036s
sys     0m2.553s

[root@virt-100 cluster]# cat cluster.conf | tr '\t' ' '
<?xml version="1.0"?>
<cluster config_version="40" name="virt-100">
 <clusternodes>
  <clusternode name="virt-100" nodeid="1">
   <fence>
    <method name="Method">
     <device name="Emerson" port="22"/>
    </method>
   </fence>
  </clusternode>
 </clusternodes>
 <fencedevices>
  <fencedevice agent="fence_emerson" community="public" ipaddr="emerson.example.com" login="emerson" login_timeout="10" name="Emerson" passwd="fittipaldi" passwd_script="/usr/local/bin/myscript" power_timeout="30" power_wait="10" retry_on="5" shell_timeout="30" snmp_auth_prot="SHA" snmp_priv_passwd="fittipaldi" snmp_priv_passwd_script="/usr/local/bin/myscript" snmp_priv_prot="AES" snmp_sec_level="noAuthNoPriv" snmp_version="2c" udpport="161"/>
 </fencedevices>
</cluster>

***********

[root@virt-100 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    3m5.494s
user    3m5.057s
sys     0m0.054s

<fencedevice agent="fence_emerson" community="public" ipaddr="emerson.example.com" login="emerson" login_timeout="10" name="Emerson" passwd="fittipaldi" power_timeout="30" power_wait="10" retry_on="5" shell_timeout="30" snmp_auth_prot="SHA" snmp_priv_passwd="fittipaldi" snmp_priv_prot="AES" snmp_sec_level="noAuthNoPriv" snmp_version="2c" udpport="161"/>

***********

[root@virt-100 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    0m35.295s
user    0m35.174s
sys     0m0.036s

<fencedevice agent="fence_emerson" community="public" ipaddr="emerson.example.com" login="emerson" login_timeout="10" name="Emerson" passwd="fittipaldi" power_timeout="30" power_wait="10" retry_on="5" shell_timeout="30" snmp_auth_prot="SHA" snmp_priv_passwd="fittipaldi" snmp_priv_prot="AES" snmp_version="2c" udpport="161"/>

***********

[root@virt-100 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    0m7.432s
user    0m7.363s
sys     0m0.032s

<fencedevice agent="fence_emerson" community="public" ipaddr="emerson.example.com" login="emerson" login_timeout="10" name="Emerson" passwd="fittipaldi" power_timeout="30" power_wait="10" retry_on="5" shell_timeout="30" snmp_auth_prot="SHA" snmp_priv_passwd="fittipaldi" snmp_version="2c" udpport="161"/>

***********

[root@virt-100 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    0m2.054s
user    0m2.003s
sys     0m0.020s

<fencedevice agent="fence_emerson" community="public" ipaddr="emerson.example.com" login="emerson" login_timeout="10" name="Emerson" passwd="fittipaldi" power_timeout="30" power_wait="10" retry_on="5" shell_timeout="30" snmp_auth_prot="SHA" snmp_version="2c" udpport="161"/>

***********

[root@virt-100 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    0m0.676s
user    0m0.625s
sys     0m0.023s

<fencedevice agent="fence_emerson" community="public" ipaddr="emerson.example.com" login="emerson" login_timeout="10" name="Emerson" passwd="fittipaldi" power_timeout="30" power_wait="10" retry_on="5" shell_timeout="30" snmp_version="2c" udpport="161"/>

***********

[root@virt-100 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    0m0.269s
user    0m0.219s
sys     0m0.022s

<fencedevice agent="fence_emerson" community="public" ipaddr="emerson.example.com" login="emerson" login_timeout="10" name="Emerson" passwd="fittipaldi" power_timeout="30" power_wait="10" retry_on="5" shell_timeout="30" udpport="161"/>


SAP resource agent with validation times depending on # of attributes
---------------------------------------------------------------------

[root@virt-100 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    10m1.649s
user    9m59.879s
sys     0m0.156s

[root@virt-100 tmp]# cat /tmp/cluster.conf | tr '\t' ' '
<?xml version="1.0"?>
<cluster config_version="54" name="virt-100">
 <clusternodes>
  <clusternode name="virt-100" nodeid="1">
   <fence>
    <method name="Method"/>
   </fence>
  </clusternode>
 </clusternodes>
 <rm>
  <service autostart="0" name="sg1" recovery="relocate">
   <SAPDatabase AUTOMATIC_RECOVER="1" DBJ2EE_ONLY="1" DBTYPE="ORA" DB_JARS="java" DIR_BOOTSTRAP="/opt/j2ee" DIR_EXECUTABLE="/sap" DIR_SECSTORE="/opt/sec" JAVA_HOME="/opt/java" NETSERVICENAME="oracle" POST_START_USEREXIT="/opt/bin/script" POST_STOP_USEREXIT="/opt/bin/script" PRE_START_USEREXIT="/opt/bin/script" PRE_STOP_USEREXIT="/opt/bin/script" SID="sap" STRICT_MONITORING="1" __enforce_timeouts="1" __failure_expire_time="30" __independent_subtree="2" __max_failures="5"/>
  </service>
 </rm>
</cluster>

*************

[root@virt-100 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    0m19.818s
user    0m19.596s
sys     0m0.056s

<SAPDatabase AUTOMATIC_RECOVER="1" DBJ2EE_ONLY="1" DBTYPE="ORA" DB_JARS="java" DIR_BOOTSTRAP="/opt/j2ee" DIR_EXECUTABLE="/sap" DIR_SECSTORE="/opt/sec" JAVA_HOME="/opt/java" NETSERVICENAME="oracle" POST_START_USEREXIT="/opt/bin/script" POST_STOP_USEREXIT="/opt/bin/script" PRE_START_USEREXIT="/opt/bin/script" PRE_STOP_USEREXIT="/opt/bin/script" SID="sap" STRICT_MONITORING="1" __enforce_timeouts="1" __independent_subtree="2"/>

*************

[root@virt-100 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    0m4.489s
user    0m4.418s
sys     0m0.023s

<SAPDatabase AUTOMATIC_RECOVER="1" DBJ2EE_ONLY="1" DBTYPE="ORA" DB_JARS="java" DIR_BOOTSTRAP="/opt/j2ee" DIR_EXECUTABLE="/sap" DIR_SECSTORE="/opt/sec" JAVA_HOME="/opt/java" NETSERVICENAME="oracle" POST_START_USEREXIT="/opt/bin/script" POST_STOP_USEREXIT="/opt/bin/script" PRE_START_USEREXIT="/opt/bin/script" PRE_STOP_USEREXIT="/opt/bin/script" SID="sap" STRICT_MONITORING="1" __independent_subtree="2"/>

*************

[root@virt-100 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    0m1.202s
user    0m1.136s
sys     0m0.018s

<SAPDatabase AUTOMATIC_RECOVER="1" DBJ2EE_ONLY="1" DBTYPE="ORA" DB_JARS="java" DIR_BOOTSTRAP="/opt/j2ee" DIR_EXECUTABLE="/sap" DIR_SECSTORE="/opt/sec" JAVA_HOME="/opt/java" NETSERVICENAME="oracle" POST_START_USEREXIT="/opt/bin/script" POST_STOP_USEREXIT="/opt/bin/script" PRE_START_USEREXIT="/opt/bin/script" PRE_STOP_USEREXIT="/opt/bin/script" SID="sap" STRICT_MONITORING="1"/>

*************

[root@virt-100 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    0m0.113s
user    0m0.065s
sys     0m0.023s

<SAPDatabase AUTOMATIC_RECOVER="1" DBJ2EE_ONLY="1" DBTYPE="ORA" DB_JARS="java" DIR_BOOTSTRAP="/opt/j2ee" DIR_EXECUTABLE="/sap" DIR_SECSTORE="/opt/sec" JAVA_HOME="/opt/java" NETSERVICENAME="oracle" SID="sap" STRICT_MONITORING="1"/>

This is really surprising at first sight. Less so once the messy schema definition is taken into account (the same parameters for fence devices being repeated many times over within a single bucket of alternative parameters is the most obvious example).

By any chance, could you modify that script to invoke xmllint with the --stream option added, to see if it helps in any way, please?

No real improvement with --stream:

Without --stream:
[root@host-091 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    0m29.010s
user    0m28.797s
sys     0m0.073s

With --stream:
[root@host-091 tmp]# time ccs_config_validate -f /tmp/cluster.conf
Configuration validates

real    0m28.765s
user    0m28.582s
sys     0m0.071s

Hmm, the jing tool that I introduced in RHEL 7 as an optional component (a side effect of the goal of having the trang tool available in buildroots, [bug 908010]) doesn't suffer from this inability to scale reasonably:

$ time jing cluster-6.7.rng cluster-big.conf
>
> real 0m0.427s
> user 0m0.635s
> sys 0m0.036s

Note that cluster-big.conf corresponds directly to the example from [comment 1] that took xmllint/libxml2 over 2.5 hours (i.e., the speedup here is roughly 2e4: about 153 minutes against 0.43 seconds!).
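To repeat the xmllint versus jing comparison on another machine, something like the following Python sketch could be used. It assumes both tools are installed and on the PATH; `--noout` and `--relaxng` are regular xmllint options and `jing SCHEMA DOCUMENT` is the usual jing invocation, but the helper names (`validator_commands`, `time_command`, `compare`) are made up for this illustration and say nothing about how ccs_config_validate calls xmllint internally.

```python
import shutil
import subprocess
import time


def validator_commands(schema, document):
    """Return the command line for each validator, keyed by tool name.

    Assumption: both tools are given the same Relax NG schema and document.
    """
    return {
        "xmllint": ["xmllint", "--noout", "--relaxng", schema, document],
        "jing": ["jing", schema, document],
    }


def time_command(argv):
    """Run argv once and return (exit status, wall-clock seconds)."""
    start = time.perf_counter()
    status = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode
    return status, time.perf_counter() - start


def compare(schema, document):
    """Time every validator that is installed; skip the ones that are not."""
    results = {}
    for tool, argv in validator_commands(schema, document).items():
        if shutil.which(argv[0]) is None:
            continue  # tool not available on this host
        results[tool] = time_command(argv)
    return results
```

Calling `compare("cluster-6.7.rng", "cluster-big.conf")` with the file names from the comment above returns a dict mapping each available tool to its exit status and elapsed time. Be prepared for the xmllint run to take hours on a configuration like the 19-attribute one.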
Per the profiling numbers (callgrind), there seems to be an issue with the state space growing by a factor of 4 per parameter added. Hence the difference in complexity between 2 and 19 parameters (the latter as in the initial example in [comment 1]) is expected to be 4^(19-2) = 17179869184 (a rough approximation of how many times longer it takes to validate a document with a single FA defining 19 parameters than one defining 2).

The exponential complexity O(4^N) of this sub-validation seems to be a blocker for using agents that support more than a certain number of parameters, and hence for general use.

As a workaround, one can put a "CONFIG_VALIDATION=NONE" line in /etc/sysconfig/cman.

* * *

fencedevice with 13 parameters (incl. name)
   790,073,031 < xmlRelaxNGValidateDefinition'2 (54118x)
   596,695,744 < xmlRelaxNGValidateState'2 (82372x)
 1,386,009,941 * xmlRelaxNGAddStates
       741,345 > xmlRelaxNGFreeValidState (25468x)

fencedevice with 14 parameters (incl. name)
 3,015,031,787 < xmlRelaxNGValidateDefinition'2 (95058x)
 2,263,635,221 < xmlRelaxNGValidateState'2 (143812x)
 5,277,365,570 * xmlRelaxNGAddStates
     1,276,591 > xmlRelaxNGFreeValidState

fencedevice with 15 parameters (incl. name)
11,815,907,357 < xmlRelaxNGValidateDefinition'2 (176958x)
 8,849,057,938 < xmlRelaxNGValidateState'2 (266692x)
20,662,572,459 * xmlRelaxNGAddStates
     2,345,819 > xmlRelaxNGFreeValidState (80764x)

Closing this as it's too big a change to make in RHEL-6 now. There is a workaround to speed up startup as shown in the previous comment.
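The factor-of-4 estimate can be checked against the callgrind excerpts above with a few lines of Python. This is only a sanity check of the arithmetic: the costs are copied from the xmlRelaxNGAddStates rows, and the names `ADD_STATES_COST`, `growth_factors` and `extrapolated_slowdown` are made up for this illustration.

```python
# Cost attributed to xmlRelaxNGAddStates in the callgrind excerpts,
# keyed by the number of fencedevice parameters (incl. name).
ADD_STATES_COST = {
    13: 1_386_009_941,
    14: 5_277_365_570,
    15: 20_662_572_459,
}


def growth_factors(costs):
    """Return the cost ratio between consecutive parameter counts."""
    counts = sorted(costs)
    return {n: costs[n] / costs[n - 1] for n in counts[1:] if n - 1 in costs}


def extrapolated_slowdown(base, small, large):
    """Slowdown predicted by O(base^N) when going from small to large parameters."""
    return base ** (large - small)


for n, factor in growth_factors(ADD_STATES_COST).items():
    print(f"{n - 1} -> {n} parameters: x{factor:.2f}")

# Extrapolation used in the comment: 2 versus 19 parameters at a factor of 4.
print(extrapolated_slowdown(4, 2, 19))
```

Both measured steps come out a little under 4, which is consistent with the O(4^N) approximation, and the final line reproduces the 17179869184 figure.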