On 12/02/2013 08:34 AM, Ted Ross wrote:
> Adding Alan Conway to the thread...
>
> On 11/29/2013 06:33 AM, Fabio M. Di Nitto wrote:
>> On 11/29/2013 12:07 PM, Christine Caulfield wrote:
>>> On 28/11/13 07:41, Fabio M. Di Nitto wrote:
>>>> Hi all,
>>>>
>>>> as many of you know I am working on clustering openstack and yada yada...
>>>>
>>>> I have noticed a strange cpu usage when running qpidd (mrg2.3 channel)
>>>> on top of corosync.
>>>>
>>>> no services are currently using qpidd, qpidd is supposed to be idle,
>>>> even so i can see corosync using over 30% cpu for no reason.
>>>>
>>>> This happens only with qpidd running in cluster mode.
>>>>
>>>> it's very easy to reproduce:
>>>>
>>>> === QPIDD cluster ===
>>>>
>>>> yum install pacemaker $usual_cluster_suspects qpid-tools
>>>> qpid-cpp-server-cluster
>>>>
>>>> cat > /etc/qpidd.conf << EOF
>>>> cluster-name="qpid-cluster"
>>>> #cluster-cman=yes
>>>> port=5672
>>>> max-connections=500
>>>> worker-threads=17
>>>> connection-backlog=10
>>>> auth=no
>>>> realm=QPID
>>>> EOF
>>>>
>>>> chkconfig pacemaker on
>>>> pcs cluster setup --name rhos4-qpidd rhos4-node3 rhos4-node4
>>>>
>>>> cluster.conf: add uidgid <uidgid uid="qpidd" gid="qpidd" />
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1025054
>>>>
>>>> pcs cluster start
>>>> sleep 30
>>>>
>>>> pcs stonith create virt-fencing fence_xvm op monitor interval=60s
>>>> pcs resource create qpidd lsb:qpidd --clone
>>>> qpid-cluster <- to verify qpid is in cluster mode.
>>>>
>>>> ====
>>>>
>>>> run top and see the results.
>>>>
>>>> I am on some semi-beefy VMs: 2 CPUs + 4GB of RAM each, 2-node cluster as
>>>> you can see.
>>>>
>>>> I have a gut feeling that the problem is somewhere in the qpidd-cluster
>>>> cpg integration layer (poll vs select/dispatch?)
>>>>
>>>> Jan/Chrissie FYI:
>>>>
>>>> [mrg-2.3]
>>>> name=Red Hat MRG 2.3
>>>> baseurl=http://download.devel.redhat.com/released/RHEL-6-MRG/2.3/$basearch/MRG/
>>>> enabled=1
>>>> gpgcheck=0
>>>>
>>>> I didn't file a bug yet because I don't have enough details and it could
>>>> easily be just a problem in my setup.
>>>>
>>>
>>> I tried this and it looks to me like qpidd is continually sending via
>>> cpg_mcast_joined() - I have no idea why that should be, but it sounds
>>> wrong to me.
>>
>> Chrissie,
>>
>> thanks for looking into it.
>>
>> Ted, any idea if it's been fixed already upstream? Or is it expected
>> behavior? Bugzilla? This is worth fixing somehow.
>>
>> Fabio
>

qpidd does send a regular "timer" event to the cluster so that all members can agree on when to execute tasks that are set on a timer. It should be possible to see what is happening by running:

qpidd --log-enable=trace+:cluster

If it is the timer events that are using so much CPU, we can probably optimize them. They aren't really needed when the cluster is idle, so perhaps they can be sent more intelligently, only when there is something set on a timer. I would need to dig into the code to see how hard/easy that would be. File a BZ and assign it to me, I will look into it.

Thanks, Alan.
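The "send only when something is set on a timer" idea above can be sketched in a few lines. This is purely illustrative Python, not Qpid's actual code; the class and method names are invented, and `broadcast` stands in for the real `cpg_mcast_joined()` call:

```python
# Hypothetical sketch of the optimization described above: broadcast the
# cluster timer tick only while at least one task is actually waiting on
# the timer, instead of unconditionally on every interval.

class ClusterTimer:
    def __init__(self, broadcast):
        self._broadcast = broadcast  # stands in for cpg_mcast_joined()
        self._pending = []           # tasks scheduled on the timer

    def schedule(self, task):
        self._pending.append(task)

    def on_interval(self, now):
        """Called once per clock interval by the event loop."""
        if not self._pending:
            return False             # idle cluster: send no CPG traffic
        self._broadcast(("tick", now))
        return True

sent = []
timer = ClusterTimer(broadcast=sent.append)

assert timer.on_interval(0.00) is False  # no tasks -> nothing multicast
timer.schedule("expire-ttl-messages")
assert timer.on_interval(0.01) is True   # tick broadcast to all members
assert sent == [("tick", 0.01)]
```

An idle cluster would then generate no tick traffic at all, which is the behavior Alan suggests investigating.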
The cause of the problem is qpidd updating the "cluster clock" via CPG every 10ms. The cluster clock is used to synchronize expiry of messages with a Time To Live (TTL) setting. The default interval of 10ms is probably unreasonably low. The consequence of a longer interval is that expiry of TTL messages may be delayed by up to that interval, which is probably not a serious problem.

Unless openstack has a requirement for sub-second accuracy for TTL expiry, we can resolve this by setting a longer interval in qpidd.conf, e.g.:

cluster-clock-interval=1000

This will reduce the traffic to one message per second, which is probably acceptable. If not, put this Bugzilla back to ASSIGNED and I will look further.
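The arithmetic behind "one message per second" is simple enough to make explicit. A tiny sketch (plain Python, nothing Qpid-specific; the function name is made up for illustration):

```python
# One CPG multicast is sent per cluster-clock-interval, so the clock
# message rate is simply the reciprocal of the interval (in ms).
def clock_messages_per_second(interval_ms):
    return 1000 / interval_ms

assert clock_messages_per_second(10) == 100   # 10ms default: 100 ticks/sec
assert clock_messages_per_second(100) == 10   # 100ms: 10 ticks/sec
assert clock_messages_per_second(1000) == 1   # suggested setting: 1 tick/sec
```

The 10ms default means 100 multicasts per second on an otherwise idle cluster, which lines up with the CPU usage observed above.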
Testing with cluster-clock-interval=100 reduces CPU usage enough to be almost unnoticeable (down to 6~10% CPU vs 30-35%). On real bare metal with lots of horsepower it's barely noticeable: at 10, ~4% CPU; at 100, 1~2% CPU; at 1000 I can't detect the process moving at all ;) I'll check with the RHOS team what expectations they have for this functionality and let you know.
FYI: TTL is an optional feature of Qpid; it is not used in all applications. It allows you to specify that a message will be dropped if it is not consumed within a specified expiry time. Typically it is used for messages that become irrelevant if not consumed in time, and/or to avoid build-up of non-critical messages when consumers are slow to drain a queue. TTL is specified in milliseconds, but in practice it is often set to multiple seconds - long relative to the clock intervals being discussed here.

At each clock "tick" the cluster expires all messages that are due to expire up to the present time. A longer interval, e.g. 1 sec, means that a message may be delivered up to 1 sec after it should have expired. However, there is no long-term build-up of late messages: with a clock interval of 1 sec, at most 1 second's worth of "late-expired" messages are available for delivery at any time.

> openstack is basically using qpidd as an RPC implementation (give or
> take). Classic case: a client sends a message for servers to consume.
> Servers take the message out of the queue and act on it.. (news at 11 ;)).

I believe openstack uses TTL in the RPC interface to drop request messages if they are not processed inside a time limit [see topic_send() in https://github.com/openstack/oslo-incubator/blob/master/openstack/common/rpc/impl_qpid.py]. I think that is a use case that can tolerate some lateness.

> Would a change of that setting affect performance? Throughput? Will all
> servers still receive all the messages?

It would only affect the system when there is a backlog and requests are timing out. Assuming the interval is 1 sec, it could result in requests being processed up to 1 sec after they should have timed out, so it can result in more messages being delivered to the server under load. However, since they are using TTL, the system must already be coded to handle uncertainty about delivery of these messages in a situation of overload.
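The "at most one interval of lateness, no long-term build-up" property can be demonstrated with a small simulation. This is a hypothetical model, not Qpid's implementation; the class and its methods are invented for illustration:

```python
import heapq

# Illustrative model of tick-based TTL expiry: messages are only expired
# at discrete clock "ticks", so a message can outlive its TTL by at most
# one tick interval.

class TickExpiryQueue:
    """Expire messages only when the cluster clock ticks."""

    def __init__(self):
        self._heap = []  # (expiry_time_ms, message), soonest expiry first

    def publish(self, message, now_ms, ttl_ms):
        heapq.heappush(self._heap, (now_ms + ttl_ms, message))

    def tick(self, clock_ms):
        """Drop every message whose expiry is at or before the tick time."""
        expired = []
        while self._heap and self._heap[0][0] <= clock_ms:
            expired.append(heapq.heappop(self._heap)[1])
        return expired

q = TickExpiryQueue()
q.publish("rpc-request", now_ms=0, ttl_ms=250)  # should expire at t=250ms

# With a 1000ms tick interval the first tick after expiry is at t=1000ms,
# so the message lingers 750ms past its nominal TTL. The worst case is
# bounded by one interval; nothing accumulates beyond that.
assert q.tick(0) == []
assert q.tick(1000) == ["rpc-request"]
```

With typical multi-second TTLs, one extra second of lateness is small relative to the TTL itself, which is the argument being made above.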
So adding a little extra uncertainty is probably not a problem, provided it is not drastically out of line with the TTL values they are using. I don't know what TTL values those are, so I am not sure what is reasonable here.
Testing with 100 seems to work fine for the current workload; I don't expect any issues for now. The config change has been documented in the RHOS+RHEL-HA etherpad and will propagate in due course. It might be a good idea, though, to have a KB article on this topic for future generations :)
We will need a small RHOS+MRG task force to debug a series of issues I am seeing. It's not trivial to reproduce some of them, and some appear to be related to cluster timing.

https://bugzilla.redhat.com/show_bug.cgi?id=1036523
https://bugzilla.redhat.com/show_bug.cgi?id=1036518
Needinfo for Alan.

(In reply to Fabio Massimo Di Nitto from comment #5)
> We will need a small RHOS+MRG task force to debug a series of issues I am
> seeing.
>
> It's not trivial to reproduce some of them and some appears to be related to
> cluster timing.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1036523
> https://bugzilla.redhat.com/show_bug.cgi?id=1036518
Closing since this has been addressed via configuration. Please raise new bugzillas for any new problems.
clearing needinfo flag