404831 – "nfsstat -s" shows negative value

Bug 404831 - "nfsstat -s" shows negative value

Summary: "nfsstat -s" shows negative value

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	nfs-utils
Sub Component:
Version:	4.6
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Jeff Layton
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	440057 447530
Blocks:	391511 456483
TreeView+	depends on / blocked

Reported:	2007-11-29 18:51 UTC by Fabio Olive Leite
Modified:	2018-10-20 00:23 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	The 'nfsstat' command now displays correct statistics. In previous versions, performing more than 2^31 RPC calls could cause the 'nfsstat' command to incorrectly display the number of calls as "negative". This was because 'nfsstat' printed statistics from /proc/net/rpc/* files as signed integers; with this version of nfs-utils, 'nfsstat' now reads and prints these statistics as unsigned integers.
Clone Of:
Environment:
Last Closed:	2009-05-18 19:02:26 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
Patch to parse and print values correctly. (660 bytes, patch) 2007-11-29 18:51 UTC, Fabio Olive Leite	no flags	Details \| Diff
patch -- parse and print values correctly (1.89 KB, patch) 2008-04-16 16:05 UTC, Jeff Layton	no flags	Details \| Diff
kernel patch for verification (8.30 KB, patch) 2008-04-16 16:07 UTC, Jeff Layton	no flags	Details \| Diff
patch -- parse and print values correctly (2.26 KB, patch) 2008-04-16 16:45 UTC, Jeff Layton	no flags	Details \| Diff
Show Obsolete (2) View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:0955	0	normal	SHIPPED_LIVE	Moderate: nfs-utils security and bug fix update	2009-05-18 13:24:03 UTC

Description Fabio Olive Leite 2007-11-29 18:51:27 UTC

Description of problem:
"nfsstat -s" shows negative value.

How reproducible:
When nfsd has taken enough calls to overflow a signed int.

Steps to Reproduce:
Run nfsd service for a long time with many clients doing operations, and 
run "nfsstat -s".

# nfsstat -s
   2007年 10月 24日 水曜日 05:29:02 JST
   Server rpc stats:
   calls      badcalls   badauth    badclnt    xdrcall
>> -2147459460   0          0          0          0         
   Server nfs v3:
   null       getattr    setattr    lookup     access     readlink   
   3       0% 961806770 44% 34884113  1% 356848932 16% 311303203 14% 0  0% 
   read       write      create     mkdir      symlink    mknod      
   58992809  2% 63067804  2% 69266297  3% 12816   0% 0       0% 0       0% 
   remove     rmdir      rename     link       readdir    readdirplus
   103983458  4% 2808    0% 5945055  0% 36564688  1% 34420909  1% 73349994  3% 
   fsstat     fsinfo     pathconf   commit     
   31820   0% 177     0% 0       0% 36532676  1% 

Actual results:
"nfsstat -s" shows negative value.

Expected results:
"nfsstat -s" shows correct value.

Additional info:
/proc/net/rpc/nfsd formats its value as an unsigned int, but nfsstat parses 
this unsigned int with atoi(). It should use atoll().

Comment 1 Fabio Olive Leite 2007-11-29 18:51:28 UTC

Created attachment 273161 [details]
Patch to parse and print values correctly.

Comment 3 RHEL Program Management 2007-11-29 18:54:35 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Jeff Layton 2008-01-04 15:18:43 UTC

Looks simple enough...

Comment 7 Jeff Layton 2008-04-01 14:46:35 UTC

This patch is not upstream and needs to go there first. I'm going to reset this
for 4.8 and we can consider it there. We'll also need to clone this for 5.2.

Comment 8 Jeff Layton 2008-04-01 14:59:22 UTC

...err make that 5.3...

Comment 9 Jeff Layton 2008-04-01 15:15:50 UTC

As far as testing this...

The easiest thing might be to custom roll a kernel that initializes the /proc
fields to values that would cause this. I'll plan to do that to verify the fix
and will post the kernel patch here so that QA can use it too...

Comment 10 Jeff Layton 2008-04-01 18:50:31 UTC

Setting to NEEDINFO for now. Since customer rolled the patch, I'll give them the
option of pushing it upstream. It looks sane AFAICT...

If they'd rather I push it upstream, please let me know the name of the person
to whom I should attribute it.

Comment 13 Jeff Layton 2008-04-16 16:05:02 UTC

Created attachment 302629 [details]
patch -- parse and print values correctly

The other patch missed the heavily-used print_callstats() function. This
corrects that and also changes the atoll() call to be a strtoul() (since the
kernel should be printing unsigned numbers for these anyway).

Tested with a hacked up kernel that initialized many of these counters to
2^31+1.

Comment 14 Jeff Layton 2008-04-16 16:07:24 UTC

Created attachment 302630 [details]
kernel patch for verification

This patch initializes most of the counters that nfsstat reads to 2^31+1. I
missed a few, which are still initialized to 0, but this turns out to be enough
to convince me that the nfsstat patch works as expected when counters overflow.

Comment 15 Jeff Layton 2008-04-16 16:45:14 UTC

Created attachment 302638 [details]
patch -- parse and print values correctly

Respun patch. David Richter pointed out that the sum variable in
parse_pretty_statfile() should also be an unsigned int.

Comment 17 Jeff Layton 2008-06-30 14:07:32 UTC

Patch is now in upstream nfs-utils...

Comment 20 RHEL Program Management 2008-09-18 19:15:45 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 21 Jeff Layton 2008-10-08 11:57:23 UTC

Committed in nfs-utils-1.0.6-88.EL4

Comment 31 Don Domingo 2009-04-29 02:06:28 UTC

Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The 'nfsstat' command now displays correct statistics. In previous versions, performing more than 2^31 RPC calls could cause the 'nfsstat' command to incorrectly display the number of calls as "negative". This was because 'nfsstat' printed statistics from /proc/net/rpc/* files as signed integers; with this version of nfs-utils, 'nfsstat' now reads and prints these statistics as unsigned integers.

Comment 32 errata-xmlrpc 2009-05-18 19:02:26 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0955.html

Comment 34 IBM Bug Proxy 2011-03-28 16:32:08 UTC

------- Comment From pradeep.com 2010-09-22 13:42 EDT-------
(In reply to comment #5)
> Hello,
> Please confirm the comment or package this is requested for. Please confirm the
> URL to the patch. Please confirm if the patch has been tested by IBM.

Here is the relevant portion of the diff for only this bug (diff is between
dapl-2.0.25 which is the current version in RHEL5.5 and dapl-2.0.30 which is
currently the latest dapl version)

elm3b198:/home/pradeep # diff -Nup dapl-2.0.25/dapl/openib_cma/device.c
dapl-2.0.30/dapl/openib_cma/device.c
--- db2/RHEL5.5/dapl-2.0.25/dapl/openib_cma/device.c    2009-09-28
12:28:31.000000000 -0500
+++ dapl-2.0.30/dapl/openib_cma/device.c        2010-05-19 17:48:47.000000000
-0500

@@ -474,12 +401,6 @@ DAT_RETURN dapls_ib_close_hca(IN DAPL_HC
dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " close_hca: %p->%p\n",
hca_ptr, hca_ptr->ib_hca_handle);

-       if (hca_ptr->ib_hca_handle != IB_INVALID_HANDLE) {
-               if (rdma_destroy_id(hca_ptr->ib_trans.cm_id))
-                       return (dapl_convert_errno(errno, "ib_close_device"));
-               hca_ptr->ib_hca_handle = IB_INVALID_HANDLE;
-       }
-
dapl_os_lock(&g_hca_lock);
if (g_ib_thread_state != IB_THREAD_RUN) {
dapl_os_unlock(&g_hca_lock);
@@ -508,6 +429,23 @@ DAT_RETURN dapls_ib_close_hca(IN DAPL_HC
dapl_os_sleep_usec(1000);
}
bail:
+
+       if (hca_ptr->ib_trans.ib_cq)
+               ibv_destroy_comp_channel(hca_ptr->ib_trans.ib_cq);
+
+       if (hca_ptr->ib_trans.ib_cq_empty) {
+               struct ibv_comp_channel *channel;
+               channel = hca_ptr->ib_trans.ib_cq_empty->channel;
+               ibv_destroy_cq(hca_ptr->ib_trans.ib_cq_empty);
+               ibv_destroy_comp_channel(channel);
+       }
+
+       if (hca_ptr->ib_hca_handle != IB_INVALID_HANDLE) {
+               if (rdma_destroy_id(hca_ptr->ib_trans.cm_id))
+                       return (dapl_convert_errno(errno, "ib_close_device"));
+               hca_ptr->ib_hca_handle = IB_INVALID_HANDLE;
+       }
+
return (DAT_SUCCESS);
}

This patch has been tested within IBM and yes this is requested for RHEL5.5z-stream DAT-2.0 i.e. the dapl-2.0 package.

Note You need to log in before you can comment on or make changes to this bug.