Bug 175372

Summary: Reading in different-sized chunks from /proc/cluster/services gives different results
Product: [Retired] Red Hat Cluster Suite
Reporter: Lon Hohberger <lhh>
Component: cman
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cluster-maint, teigland
Hardware: All
OS: Linux
Fixed In Version: RHBA-2006-0559
Doc Type: Bug Fix
Last Closed: 2006-08-10 21:32:13 UTC
Bug Blocks: 175033
Attachments: Source code to little reader program (attachment 122078, comment 1).

Description Lon Hohberger 2005-12-09 16:27:16 UTC
Description of problem:

A customer of ours has a lot of GFS file systems mounted.  This causes
/proc/cluster/services to exceed a page in size, which triggered #175033.

The initial solution in #175033 was to issue a read and, if the buffer size
was exceeded, retry reading the whole entry.  Unfortunately, this does not
work, because /proc entries cannot be read in chunks larger than one page.

In investigating #175033 further, I found a general problem with
/proc/cluster/services read handling.  Multiple read() calls to
/proc/cluster/services return different results depending on the service group
configuration and the read size.  Basically, if you are forced to issue
multiple reads, some of the service lines may be missing from the output.  I do
not think that reading in page-sized chunks will guarantee that this loss of
output cannot occur.

Examples, on my 2-node cluster:

[root@blue ~]# ./reader 4 /proc/cluster/services print
Service          Name                              GID LID State     Code
DLM Lock Space:  "clvmd"                             2   3 run       -
[2 1]

DLM Lock Space:  "_mnt_gfs"                          5   6 run       -
[2 1]

User:            "usrm::manager"                     9   4 run       -
[2 1]

total = 308

--- versus ---

[root@blue ~]# ./reader 128 /proc/cluster/services print
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 1]

DLM Lock Space:  "Magma"                            10   5 run       -
[2 1]

DLM Lock Space:  "_mnt_gfs"                          5   6 run       -
[2 1]

User:            "usrm::manager"                     9   4 run       -
[2 1]

total = 386

--- versus ---

[root@blue ~]# ./reader 4096 /proc/cluster/services print
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[2 1]

DLM Lock Space:  "Magma"                            10   5 run       -
[2 1]

DLM Lock Space:  "_mnt_gfs"                          5   6 run       -
[2 1]

GFS Mount Group: "_mnt_gfs"                          6   7 run       -
[2 1]

User:            "usrm::manager"                     9   4 run       -
[2 1]

total = 542 

Version-Release number of selected component (if applicable): 1.0.2

How reproducible: 100%
Steps to Reproduce:
1. gcc -o reader reader.c 
2. ./reader 4 /proc/cluster/services print
3. ./reader 16 /proc/cluster/services print
4. ./reader 128 /proc/cluster/services print
5. ./reader 4096 /proc/cluster/services print
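For reference, the core of the reader program presumably looks like the sketch below: read the given file in fixed-size chunks, optionally echo each chunk, and report the byte total. The function name `read_in_chunks` is illustrative only; the actual source is attachment 122078.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read `path` in `chunk`-byte reads; echo the data when `print` is
 * non-zero.  Returns the total number of bytes read, or -1 on error.
 * A smaller total for a smaller chunk size is the bug being reported. */
static long read_in_chunks(const char *path, size_t chunk, int print)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char *buf = malloc(chunk);
    if (!buf) {
        fclose(f);
        return -1;
    }

    long total = 0;
    size_t n;
    while ((n = fread(buf, 1, chunk, f)) > 0) {
        if (print)
            fwrite(buf, 1, n, stdout);
        total += (long)n;
    }

    free(buf);
    fclose(f);
    return total;
}
```

On a regular file (or a well-behaved /proc entry such as /proc/kallsyms) the total is independent of the chunk size; on /proc/cluster/services it is not.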
  
Actual results: Some service group lines missing from output.

Expected results: All service group lines, irrespective of the size of the
read call issued.

Additional info:

The header is always displayed, even with a read size of 1.  The reader program
works fine with other large /proc entries, such as /proc/kallsyms.  There is no
obvious pattern to which types of service entries go missing, but rerunning
with the same read size always yields the same results.

[root@blue ~]# ./reader 1 /proc/kallsyms 
total = 736632
[root@blue ~]# ./reader 4 /proc/kallsyms 
total = 736632
[root@blue ~]# ./reader 4096 /proc/kallsyms 
total = 736632

Comment 1 Lon Hohberger 2005-12-09 16:27:17 UTC
Created attachment 122078 [details]
Source code to little reader program.

Comment 2 Christine Caulfield 2005-12-14 09:27:20 UTC
The pointer was not being initialised in sm_seq_start when reading was resumed
in the middle of the file. 
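The contract being violated can be illustrated in userspace: each time a read resumes, the seq_file start() callback is handed the saved position and must re-derive its iterator from that position, rather than reusing a pointer cached from the previous read.  A minimal sketch of the corrected pattern follows; the names are illustrative and this is not the actual sm_misc.c code.

```c
#include <stddef.h>

/* A stand-in for one line of /proc/cluster/services output. */
struct service {
    const char *name;
    struct service *next;
};

/* Correct "start": walk the list from the head to position `pos` on
 * every call.  The bug was skipping this re-initialisation when a read
 * resumed mid-file, leaving a stale pointer and dropping lines. */
static struct service *sm_seq_start_fixed(struct service *head, size_t pos)
{
    struct service *s = head;
    while (s && pos--)
        s = s->next;
    return s;   /* NULL once past the end: iteration stops */
}
```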

This checkin fixes it on -rSTABLE

Checking in sm_misc.c;
/cvs/cluster/cluster/cman-kernel/src/sm_misc.c,v  <--  sm_misc.c
new revision: 1.2.2.1.6.3; previous revision: 1.2.2.1.6.2
done

Comment 3 Christine Caulfield 2005-12-20 12:00:53 UTC
And on -rRHEL4

Checking in sm_misc.c;
/cvs/cluster/cluster/cman-kernel/src/sm_misc.c,v  <--  sm_misc.c
new revision: 1.2.2.4; previous revision: 1.2.2.3
done

Comment 6 Red Hat Bugzilla 2006-08-10 21:32:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0559.html