Bug 545793

Summary: ipvsadm connection counters reporting zero
Product: Red Hat Enterprise Linux 5
Reporter: Robin Bowes <robin.bowes>
Component: kernel
Assignee: Jiri Pirko <jpirko>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: low
Version: 5.4
CC: cluster-maint, jolsa, jpirko, nhorman, rkhan, tgraf, uwe.knop
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-01-10 12:59:21 UTC

Description Robin Bowes 2009-12-09 11:43:38 UTC
Description of problem:

The connections-per-second counters seem to reset to 0, i.e. the CPS column produced by ipvsadm --list --rate.

Version-Release number of selected component (if applicable):

ipvsadm-1.24-10

How reproducible:

I have an LVS set up using 8 real servers. If I use ipvsadm --list --rate to see the current statistics and put some traffic through the system, the CPS reported by ipvsadm --list --rate initially increases as would be expected, but then returns to 0.

Steps to Reproduce:

1. Set up an LVS load-balancing service. Mine has 8 real servers on port 80.
2. On the LVS director, I run the following to monitor traffic:

watch -n1 ipvsadm --list --rate

This produces a table (updated every second) like this:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port                 CPS    InPPS   OutPPS    InBPS   OutBPS
  -> RemoteAddress:Port
TCP  192.168.55.60:http                  0        0        0       19        0
  -> b011.private.b.example.com          0        0        0        7        0
  -> b013.private.b.example.com          0        0        0        0        0
  -> b012.private.b.example.com          0        0        0        0        0
  -> b010.private.b.example.com          0        0        0        2        0
  -> b008.private.b.example.com          0        0        0        9        0
  -> b009.private.b.example.com          0        0        0        0        0
  -> b007.private.b.example.com          0        0        0        0        0
  -> b006.private.b.example.com          0        0        0        0        0

3. From another host, use ab to put traffic through the LB:

ab -n 1000000 -c 10  "http://192.168.55.60/counter.php?sc_project=5336943&security=65b9268f"

4. Monitor the IPVS stats.
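
Step 4 can be partly automated. The following is a minimal Python sketch, not part of the original report, that parses the text printed by ipvsadm --list --rate and flags rows that report CPS 0 while packets are still arriving. It assumes the column layout shown in step 2 and plain integer values; the names RateRow, parse_rate_table and stuck_rows are made up for illustration. A flagged row is only a hint, since long-lived connections can also carry packets without new connections being opened.

```python
from typing import List, NamedTuple


class RateRow(NamedTuple):
    kind: str      # "service" for a virtual service line, "real" for a "->" line
    name: str      # address:port (or host name) as printed by ipvsadm
    cps: int
    in_pps: int
    out_pps: int
    in_bps: int
    out_bps: int


def parse_rate_table(text: str) -> List[RateRow]:
    """Parse the table printed by `ipvsadm --list --rate` (assumed layout)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 7 fields: a marker, an address and five numbers.
        # Header lines fail this check and are skipped.
        if len(fields) != 7 or not all(f.isdigit() for f in fields[2:]):
            continue
        kind = "real" if fields[0] == "->" else "service"
        rows.append(RateRow(kind, fields[1], *map(int, fields[2:])))
    return rows


def stuck_rows(rows: List[RateRow]) -> List[RateRow]:
    """Rows that see incoming packets (InPPS > 0) but report CPS == 0."""
    return [r for r in rows if r.cps == 0 and r.in_pps > 0]


# Two rows copied from the table in step 2.
SAMPLE = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port                 CPS    InPPS   OutPPS    InBPS   OutBPS
  -> RemoteAddress:Port
TCP  192.168.55.60:http                  0        0        0       19        0
  -> b011.private.b.example.com          0        0        0        7        0
"""

if __name__ == "__main__":
    for row in parse_rate_table(SAMPLE):
        print(row)
```

In practice the text would come from running ipvsadm once per second (for example through subprocess) and logging the time at which stuck_rows first returns a non-empty list.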

Actual results:

Initially, all reported values climb normally. However, CPS reaches a value under 300 (I've seen 279) and then begins to fall again, even though the other values continue to increase or remain steady. CPS eventually hits 0 and stays there for a short period of time (I'm not sure how long).

If I stop the traffic, allow all the counters to fall to 0, and then restart the traffic, the CPS counter doesn't begin to climb again, i.e. it stays at 0.

If I stop the traffic and wait a short while (5 mins) then restart the traffic, the CPS counter begins working again, but hits the same limit and falls to 0.

Expected results:

I would expect the CPS figure to be reported correctly.

Additional info:

Using CentOS 5.4, kernel 2.6.18-164.6.1.el5, ipvsadm 1.24-10

I initially saw this issue when viewing statistics over SNMP with the net-snmp-lvs-module, but the problem seems to be somewhere in (lib-)ipvsadm, as the ipvsadm tool shows the same problem.

Comment 1 Jan Friesse 2010-05-20 12:00:06 UTC
(In reply to comment #0)
I tried to reproduce the problem on RHEL 5.5, without success.

Can you please tell me:
- Which scheduler are you using?
- Which type of LVS are you using (NAT/DR/Tunnel)?
- Are you able to reproduce the issue on RHEL 5.5 (CentOS 5.5)?

If you are able to reproduce the issue, can you please take a look at /proc/net/ip_vs_stats and see what it contains? If the Conns/s column shows 0 there, it is a kernel issue and we can safely reassign the bug to kernel.

Honza

Comment 2 Robin Bowes 2010-05-20 12:30:21 UTC
I've since upgraded to 5.5 and see the same issue.

Here's some output with this ab command running:

ab -n 1000000 -c 10 "http://xxx.xxx.xxx.xx/counter.php?sc_project=5336943&security=65b9268f"

# cat /proc/net/ip_vs_stats
   Total Incoming Outgoing         Incoming         Outgoing
   Conns  Packets  Packets            Bytes            Bytes
    7175   2930AB        0          D09459F                0

 Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s
       0     5758        0           1BA2BE                0

# ipvsadm --list --rate
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port                 CPS    InPPS   OutPPS    InBPS   OutBPS
  -> RemoteAddress:Port
TCP  xxx.xxx.xxx.xx:http                 0    22390        0  1813591        0
  -> a009:http                           0     1318        0   106728        0
  -> a017:http                           0     1317        0   106653        0
  -> a011:http                           0     1317        0   106695        0
  -> a040:http                           0     1317        0   106689        0
  -> a038:http                           0     1317        0   106705        0
  -> a042:http                           0     1317        0   106713        0
  -> a036:http                           0     1318        0   106736        0
  -> a031:http                           0     1318        0   106734        0
  -> a029:http                           0     1317        0   106702        0
  -> a027:http                           0     1317        0   106651        0
  -> a023:http                           0     1318        0   106727        0
  -> a025:http                           0     1316        0   106598        0
  -> a021:http                           0     1316        0   106621        0
  -> a015:http                           0     1318        0   106719        0
  -> a019:http                           0     1317        0   106644        0
  -> a013:http                           0     1316        0   106615        0
  -> a007:http                           0     1317        0   106661        0
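
Note that /proc/net/ip_vs_stats prints its counters in hexadecimal, whereas ipvsadm prints decimal, so the two outputs above agree: 5758 hex is 22360 packets/s, in line with the 22390 InPPS that ipvsadm reports a moment later, and both show 0 for connections/s. A minimal Python sketch of the conversion follows; the function and key names are made up for illustration, and the parser assumes the five-column layout shown above.

```python
TOTAL_KEYS = ("conns", "in_pkts", "out_pkts", "in_bytes", "out_bytes")
RATE_KEYS = ("cps", "in_pps", "out_pps", "in_bps", "out_bps")


def parse_ip_vs_stats(text):
    """Return (totals, rates) as dicts of decimal integers.

    Assumes the layout shown in this report: one row of five hexadecimal
    totals followed by one row of five hexadecimal per-second rates.
    """
    numeric_rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        try:
            numeric_rows.append([int(f, 16) for f in fields])
        except ValueError:
            continue  # a header line such as "Conns/s Pkts/s ..."
    if len(numeric_rows) != 2:
        raise ValueError("expected one totals row and one rates row")
    totals = dict(zip(TOTAL_KEYS, numeric_rows[0]))
    rates = dict(zip(RATE_KEYS, numeric_rows[1]))
    return totals, rates


# Output copied from the affected machine in this comment.
SAMPLE = """\
   Total Incoming Outgoing         Incoming         Outgoing
   Conns  Packets  Packets            Bytes            Bytes
    7175   2930AB        0          D09459F                0

 Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s
       0     5758        0           1BA2BE                0
"""

if __name__ == "__main__":
    totals, rates = parse_ip_vs_stats(SAMPLE)
    print(totals)
    print(rates)
```

On a live director the text would be read from /proc/net/ip_vs_stats instead of the SAMPLE string.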

Strangely, I have another server with the same config that appears to be working OK and handling up to 10k connections/sec.

Output from that machine, with live traffic, is as follows:

# cat /proc/net/ip_vs_stats
   Total Incoming Outgoing         Incoming         Outgoing
   Conns  Packets  Packets            Bytes            Bytes
 FF0DFDA 5811F880        0       515ABA2BFB                0

 Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s
    1BD8     988D        0           8BB3A5                0


# ipvsadm --list --rate
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port                 CPS    InPPS   OutPPS    InBPS   OutBPS
  -> RemoteAddress:Port
TCP  xx.xxx.xx.xx:http                7230    39499        0  9291717        0
  -> b036:http                         365     2012        0   473402        0
  -> b040:http                         420     2200        0   516896        0
  -> b042:http                         315     1713        0   398149        0
  -> b038:http                         406     2307        0   540885        0
  -> b031:http                         513     2898        0   690224        0
  -> b029:http                         512     2639        0   608223        0
  -> b027:http                         240     1373        0   326439        0
  -> b013:http                         510     2805        0   671695        0
  -> b022:http                         505     2786        0   661684        0
  -> b025:http                         513     2702        0   628412        0
  -> b011:http                         479     2602        0   619499        0
  -> b019:http                         244     1265        0   292559        0
  -> b021:http                         460     2581        0   605980        0
  -> b016:http                         243     1248        0   289122        0
  -> b015:http                         342     1944        0   458657        0
  -> b009:http                         339     1782        0   422163        0
  -> b007:http                         345     1924        0   448836        0
  -> b005:http                         480     2716        0   638891        0


I don't think I've changed the default scheduling - how do I check? I'm using deadline for /dev/sda IO scheduling, but I'm guessing that's not what you mean.

I'm using DR.

R.

Comment 3 Jan Friesse 2010-05-20 13:01:50 UTC
(In reply to comment #2)
Hi,
I was talking about the ipvs scheduler (rr - Round Robin, wrr - Weighted Round Robin, lc - Least-Connection, ...).
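
One way to see which scheduler a virtual service uses is to read the service table in /proc/net/ip_vs, where each TCP or UDP service line carries the scheduler name after the address. The Python sketch below assumes that layout (hexadecimal IPv4 address and port, then the scheduler name) and handles only IPv4 TCP/UDP services. The sample text is a made-up illustration, not output from the reporter's machines, and the rr scheduler and real server address in it are hypothetical.

```python
import ipaddress


def parse_services(text):
    """Return (protocol, ip, port, scheduler) tuples from /proc/net/ip_vs text.

    Assumed line format for a service: "TCP  C0A8373C:0050 rr".
    Real-server lines start with "->" and are ignored.
    """
    services = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] not in ("TCP", "UDP"):
            continue
        try:
            addr_hex, port_hex = fields[1].split(":")
            ip = str(ipaddress.IPv4Address(int(addr_hex, 16)))
            port = int(port_hex, 16)
        except ValueError:
            continue  # not an IPv4 hex address:port pair
        services.append((fields[0], ip, port, fields[2]))
    return services


# Hypothetical sample: the scheduler and real server are invented.
SAMPLE = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C0A8373C:0050 rr
  -> C0A83701:0050      Route   1      0          0
"""

if __name__ == "__main__":
    for proto, ip, port, sched in parse_services(SAMPLE):
        print(proto, ip, port, sched)
```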

But from what you sent, it looks like it really is a kernel issue (because of the output of cat /proc/net/ip_vs_stats), so ipvsadm behaves correctly.

Take a look at the differences between the machines, especially in:
- iptables configuration
- kernels (especially architecture)

Did you see this issue in previous versions (CentOS 5.1, ...)?

Comment 4 Jan Friesse 2010-07-08 09:33:28 UTC
Changing component to kernel, because the bug seems to be a kernel problem. If not, please feel free to change the component back.

Comment 5 Neil Horman 2010-11-10 18:30:57 UTC
Triage assignment. If you feel this bug doesn't belong to you, or that it cannot be handled in a timely fashion, please contact me for re-assignment.

Comment 6 Jiri Pirko 2011-01-10 08:25:56 UTC
Robin, do you also see the issue when using an upstream kernel?

Comment 7 Robin Bowes 2011-01-10 09:33:09 UTC
Jiri,

I'm currently using kernel-2.6.18-194.26.1.el5 and I too am unable to reproduce the problem in the same way.

R.

Comment 8 Jiri Pirko 2011-01-10 12:59:21 UTC
(In reply to comment #7)
> Jiri,
> 
> I'm currently using kernel-2.6.18-194.26.1.el5 and I too am unable to reproduce
> the problem in the same way.
> 
> R.

Okay, closing this as CURRENTRELEASE then. Feel free to reopen in case this happens again.