Bug 175372

Summary: Reading in different-sized chunks from /proc/cluster/services gives different results
Product: [Retired] Red Hat Cluster Suite
Reporter: Lon Hohberger <lhh>
Component: cman
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cluster-maint, teigland
Hardware: All
OS: Linux
Fixed In Version: RHBA-2006-0559
Doc Type: Bug Fix
Last Closed: 2006-08-10 21:32:13 UTC
Bug Blocks: 175033
Attachments: Source code to little reader program (attachment 122078, comment 1).

Description Lon Hohberger 2005-12-09 16:27:16 UTC
Description of problem:

A customer of ours has a lot of GFS file systems mounted.  This causes
/proc/cluster/services to exceed a page in size, which triggered #175033.

The initial solution in #175033 was to issue a read and, if the buffer size
was exceeded, retry reading the whole entry.  Unfortunately, this does not
work, because /proc entries cannot be read in chunks larger than one page.

In investigating #175033 further, I found a general problem with
/proc/cluster/services read handling.  Multiple read() calls to
/proc/cluster/services return different results depending on the service group
configuration and the read size.  Basically, if you are forced to issue
multiple reads, some of the service lines may be missing from the output.  I do
not think that reading in page-sized chunks will guarantee that this loss of
output cannot occur.

Examples, on my 2-node cluster:

[root@blue ~]# ./reader 4 /proc/cluster/services print
Service          Name                              GID LID State     Code
DLM Lock Space:  "clvmd"                             2   3 run       -
[2 1]

DLM Lock Space:  "_mnt_gfs"                          5   6 run       -
[2 1]

User:            "usrm::manager"                     9   4 run       -
[2 1]

total = 308

--- versus ---

[root@blue ~]# ./reader 128 /proc/cluster/services print
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 1]

DLM Lock Space:  "Magma"                            10   5 run       -
[2 1]

DLM Lock Space:  "_mnt_gfs"                          5   6 run       -
[2 1]

User:            "usrm::manager"                     9   4 run       -
[2 1]

total = 386

--- versus ---

[root@blue ~]# ./reader 4096 /proc/cluster/services print
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[2 1]

DLM Lock Space:  "Magma"                            10   5 run       -
[2 1]

DLM Lock Space:  "_mnt_gfs"                          5   6 run       -
[2 1]

GFS Mount Group: "_mnt_gfs"                          6   7 run       -
[2 1]

User:            "usrm::manager"                     9   4 run       -
[2 1]

total = 542 

Version-Release number of selected component (if applicable): 1.0.2

How reproducible: 100%
Steps to Reproduce:
1. gcc -o reader reader.c 
2. ./reader 4 /proc/cluster/services print
3. ./reader 16 /proc/cluster/services print
4. ./reader 128 /proc/cluster/services print
5. ./reader 4096 /proc/cluster/services print
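For reference, the core of the reader program presumably looks like the sketch below: read the given file in fixed-size chunks, optionally echo each chunk, and report the byte total. The function name `read_in_chunks` is illustrative only; the actual source is attachment 122078.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read `path` in `chunk`-byte reads; echo the data when `print` is
 * non-zero.  Returns the total number of bytes read, or -1 on error.
 * A smaller total for a smaller chunk size is the bug being reported. */
static long read_in_chunks(const char *path, size_t chunk, int print)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char *buf = malloc(chunk);
    if (!buf) {
        fclose(f);
        return -1;
    }

    long total = 0;
    size_t n;
    while ((n = fread(buf, 1, chunk, f)) > 0) {
        if (print)
            fwrite(buf, 1, n, stdout);
        total += (long)n;
    }

    free(buf);
    fclose(f);
    return total;
}
```

On a regular file (or a well-behaved /proc entry such as /proc/kallsyms) the total is independent of the chunk size; on /proc/cluster/services it is not.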
  
Actual results: Some service group lines missing from output.

Expected results: All service group lines, irrespective of the size of the
read call issued.

Additional info:

The header is always displayed, even with a read size of 1.  The reader program
works fine with other large /proc entries, such as /proc/kallsyms.  There is no
obvious pattern to which types of service entries go missing, but rerunning
with the same read size always yields the same results.

[root@blue ~]# ./reader 1 /proc/kallsyms 
total = 736632
[root@blue ~]# ./reader 4 /proc/kallsyms 
total = 736632
[root@blue ~]# ./reader 4096 /proc/kallsyms 
total = 736632

Comment 1 Lon Hohberger 2005-12-09 16:27:17 UTC
Created attachment 122078 [details]
Source code to little reader program.

Comment 2 Christine Caulfield 2005-12-14 09:27:20 UTC
The pointer was not being initialised in sm_seq_start when reading was resumed
in the middle of the file. 
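The contract being violated can be illustrated in userspace: each time a read resumes, the seq_file start() callback is handed the saved position and must re-derive its iterator from that position, rather than reusing a pointer cached from the previous read.  A minimal sketch of the corrected pattern follows; the names are illustrative and this is not the actual sm_misc.c code.

```c
#include <stddef.h>

/* A stand-in for one line of /proc/cluster/services output. */
struct service {
    const char *name;
    struct service *next;
};

/* Correct "start": walk the list from the head to position `pos` on
 * every call.  The bug was skipping this re-initialisation when a read
 * resumed mid-file, leaving a stale pointer and dropping lines. */
static struct service *sm_seq_start_fixed(struct service *head, size_t pos)
{
    struct service *s = head;
    while (s && pos--)
        s = s->next;
    return s;   /* NULL once past the end: iteration stops */
}
```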

This checkin fixes it on -rSTABLE

Checking in sm_misc.c;
/cvs/cluster/cluster/cman-kernel/src/sm_misc.c,v  <--  sm_misc.c
new revision: 1.2.2.1.6.3; previous revision: 1.2.2.1.6.2
done

Comment 3 Christine Caulfield 2005-12-20 12:00:53 UTC
And on -rRHEL4

Checking in sm_misc.c;
/cvs/cluster/cluster/cman-kernel/src/sm_misc.c,v  <--  sm_misc.c
new revision: 1.2.2.4; previous revision: 1.2.2.3
done

Comment 6 Red Hat Bugzilla 2006-08-10 21:32:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0559.html