Description of problem:

A customer of ours has a lot of GFS file systems mounted. This causes
/proc/cluster/services to exceed a page in size, which caused bug #175033
to appear. The initial solution in #175033 was to issue a read and, if the
buffer size was exceeded, retry reading the whole entry. Unfortunately,
this does not work, because /proc entries cannot be read in chunks larger
than one page.

While investigating #175033 further, I found a general problem with the
read handling of /proc/cluster/services. When you issue multiple read()
calls against /proc/cluster/services, you get different results depending
on the service group configuration and the read size. In short, if you are
forced to issue multiple reads, some of the service lines may be missing
from the output. I do not think that reading in page-size chunks
guarantees that this loss of output cannot occur.

Examples, on my 2-node cluster:

[root@blue ~]# ./reader 4 /proc/cluster/services print
Service Name GID LID State Code
DLM Lock Space: "clvmd" 2 3 run - [2 1]
DLM Lock Space: "_mnt_gfs" 5 6 run - [2 1]
User: "usrm::manager" 9 4 run - [2 1]
total = 308

--- versus ---

[root@blue ~]# ./reader 128 /proc/cluster/services print
Service Name GID LID State Code
Fence Domain: "default" 1 2 run - [2 1]
DLM Lock Space: "Magma" 10 5 run - [2 1]
DLM Lock Space: "_mnt_gfs" 5 6 run - [2 1]
User: "usrm::manager" 9 4 run - [2 1]
total = 386

--- versus ---

[root@blue ~]# ./reader 4096 /proc/cluster/services print
Service Name GID LID State Code
Fence Domain: "default" 1 2 run - [2 1]
DLM Lock Space: "clvmd" 2 3 run - [2 1]
DLM Lock Space: "Magma" 10 5 run - [2 1]
DLM Lock Space: "_mnt_gfs" 5 6 run - [2 1]
GFS Mount Group: "_mnt_gfs" 6 7 run - [2 1]
User: "usrm::manager" 9 4 run - [2 1]
total = 542

Version-Release number of selected component (if applicable):
1.0.2

How reproducible:
100%

Steps to Reproduce:
1. gcc -o reader reader.c
2. ./reader 4 /proc/cluster/services print
3. ./reader 16 /proc/cluster/services print
4. ./reader 128 /proc/cluster/services print
5. ./reader 4096 /proc/cluster/services print

Actual results:
Some service group lines are missing from the output.

Expected results:
All service group lines, irrespective of the size of the read calls
issued.

Additional info:
The header is always displayed, even with a read size of 1. The reader
program works fine with other large /proc entries, like /proc/kallsyms.
There does not seem to be a correlation with which types of service
entries go missing, but rerunning with the same read size always yields
the same results.

[root@blue ~]# ./reader 1 /proc/kallsyms
total = 736632
[root@blue ~]# ./reader 4 /proc/kallsyms
total = 736632
[root@blue ~]# ./reader 4096 /proc/kallsyms
total = 736632
Created attachment 122078 [details] Source code to little reader program.
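For readers without access to the attachment, the core of such a reader is roughly the following sketch. This is a hypothetical reconstruction, not the attached reader.c: the function name, arguments, and error handling are assumptions; only the behaviour (read() calls of exactly the requested chunk size, summing the bytes returned) matches the transcripts above.

```c
/* Hypothetical reconstruction of the attached reader utility (the real
 * source is in attachment 122078; every name here is an assumption).
 * Reads `path` with read() calls of exactly `chunk` bytes and returns
 * the byte total, optionally echoing the data to stdout. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

size_t read_in_chunks(const char *path, size_t chunk, int do_print)
{
    char *buf = malloc(chunk);
    size_t total = 0;
    ssize_t n;
    int fd;

    if (!buf)
        return 0;
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return 0;
    }
    /* A correct /proc handler must yield the same byte stream no
     * matter how small the per-call chunk is; the bug is that
     * /proc/cluster/services does not. */
    while ((n = read(fd, buf, chunk)) > 0) {
        if (do_print)
            fwrite(buf, 1, (size_t)n, stdout);
        total += (size_t)n;
    }
    close(fd);
    free(buf);
    return total;
}
```

On a well-behaved file, the returned total is identical for every chunk size, as the /proc/kallsyms runs above show.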
The pointer was not being initialised in sm_seq_start when reading was
resumed in the middle of the file. This checkin fixes it on -rSTABLE:

Checking in sm_misc.c;
/cvs/cluster/cluster/cman-kernel/src/sm_misc.c,v  <--  sm_misc.c
new revision: 1.2.2.1.6.3; previous revision: 1.2.2.1.6.2
done
And on -rRHEL4:

Checking in sm_misc.c;
/cvs/cluster/cluster/cman-kernel/src/sm_misc.c,v  <--  sm_misc.c
new revision: 1.2.2.4; previous revision: 1.2.2.3
done
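To see why an uninitialised start pointer drops lines, recall the seq_file contract: the kernel calls the ->start() callback at the beginning of every read() to (re)position the iterator at *pos, not just on the first read. The userspace sketch below models that contract; it is an illustration only, assuming a flat array of service strings, and is not the actual sm_misc.c code. The fix amounts to deriving the iterator position from *pos on every ->start() call, so a read that resumes mid-file continues from the right entry instead of a stale pointer.

```c
/* Userspace model of the seq_file resume path (all names assumed).
 * The kernel re-invokes ->start(pos) each time a read() resumes, so
 * ->start() must reposition from *pos every time -- the sm_seq_start
 * bug was failing to do this when *pos != 0. */
#include <stdio.h>
#include <string.h>

static const char *services[] = {
    "Fence Domain: \"default\"",
    "DLM Lock Space: \"clvmd\"",
    "DLM Lock Space: \"Magma\"",
    "User: \"usrm::manager\"",
};
enum { NSERVICES = sizeof(services) / sizeof(services[0]) };

/* Correct ->start(): always derive the current element from *pos. */
static const char *seq_start(size_t *pos)
{
    return *pos < NSERVICES ? services[*pos] : NULL;
}

static const char *seq_next(size_t *pos)
{
    ++*pos;
    return seq_start(pos);
}

/* Simulate one read() call: fill `out` with at most `chunk` bytes of
 * newline-terminated entries, returning the byte count (0 at EOF).
 * When the buffer fills, the next call resumes from *pos. */
size_t chunked_read(char *out, size_t chunk, size_t *pos)
{
    size_t used = 0;
    const char *item = seq_start(pos);

    while (item) {
        size_t len = strlen(item) + 1;   /* entry plus '\n' */
        if (used + len > chunk)
            break;                       /* buffer full; resume next read */
        memcpy(out + used, item, len - 1);
        out[used + len - 1] = '\n';
        used += len;
        item = seq_next(pos);
    }
    return used;
}
```

Because seq_start() recomputes its position from *pos, reading in small chunks and reading in one large chunk produce the same byte stream; with the buggy pattern (initialising the pointer only when *pos == 0), every resumed read would start from a stale position and entries would vanish, exactly as in the transcripts above.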
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0559.html