Bug 126531 - dlm slowness with shared file IO from multiple nodes
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware/OS: i386 Linux
Priority: medium  Severity: medium
Assigned To: Ken Preslan
QA Contact: Derek Anderson
Reported: 2004-06-22 17:15 EDT by Dean Jansa
Modified: 2010-01-11 21:52 EST
Doc Type: Bug Fix
Last Closed: 2004-09-14 10:55:42 EDT
Attachments: None
Description Dean Jansa 2004-06-22 17:15:20 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.1; Linux)

Description of problem:
Shared file IO using lock_dlm is slow.

A test running against a single file in a GFS filesystem from a single node gets ~5000 requests/sec -- ~2,560,000 bytes/sec (each request is a 512-byte read/write).

Add a second node going after the same file, and performance drops sharply.

The two processes, started at the same time, run for a short while at about 250 ops/sec, perhaps a few seconds; then one of the nodes stops (the IO rate drops to 1 op every 4+ hours) and the other node runs in fits and spurts, perhaps 1-2 ops per minute.

For reference:

IO to a clvm vol from one node:
1200 req/sec ~ 614,400 bytes/sec

clvm from two nodes to same vol:
1000 req/sec ~ 512,000 bytes/sec - 512 k/sec

IO to raw disk from one node:
10,000 ops/sec  5,120,000 bytes/sec  ~ 5000 k /sec

from 2 nodes:
8,000 ops/sec  4,096,000 bytes/sec  ~ 4000 k /sec
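The byte rates above follow directly from the 512-byte request size reported in the description; a quick sanity check of the arithmetic:

```python
REQ_SIZE = 512  # bytes per read/write request, per the test description

def throughput_bytes_per_sec(requests_per_sec):
    """Convert a request rate into a byte rate at 512 bytes per request."""
    return requests_per_sec * REQ_SIZE

# Figures from the report:
assert throughput_bytes_per_sec(5000) == 2_560_000    # single-node GFS
assert throughput_bytes_per_sec(1200) == 614_400      # clvm vol, one node
assert throughput_bytes_per_sec(1000) == 512_000      # clvm vol, two nodes
assert throughput_bytes_per_sec(10_000) == 5_120_000  # raw disk, one node
assert throughput_bytes_per_sec(8000) == 4_096_000    # raw disk, two nodes
```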


Version-Release number of selected component (if applicable):
Lock_DLM (built Jun 17 2004 10:54:06) installed 

How reproducible:
Always

Steps to Reproduce:
1. Assume you are currently cd'ed into a gfs fs.
2. Run: iogen -t 1b -T 1b 10000b:sharefile | doio -m 1 on each node.
Start with a single node, then try 2 nodes starting at once.
3. b_iogen -t 1b -T 1b -d /dev/dean/lvol0 | b_doio -m 100
and b_iogen -t 1b -T 1b -d /dev/sda | b_doio -m 1000 were used for the clvm and raw disk IO numbers.


Additional info:
Comment 1 Christine Caulfield 2004-07-14 12:59:12 EDT
An email from Ken:

It's a problem I'd seen once or twice before, but it didn't seem to happen
too much and fell to the bottom of the todo pile.  But apparently there's
something about the way that the DLM threads work that triggers it.

The solution to the problem should be part of GFS.  Either GFS refuses to
respond to a callback for some minimum amount of time after a page fault,
or GFS somehow hooks into the scheduling code so the lock isn't released
until after the process that faulted gets run for a timeslice.

Either way, I think you can blame the bug on me and stop looking at it.
:-)
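The first option Ken describes -- refusing to respond to a demote callback for some minimum time after a page fault, so the faulting process gets to do real work before the lock bounces to the other node -- can be sketched as a toy model. This is purely illustrative; the `Glock` class, `min_hold` value, and method names are invented for the example and are not GFS code.

```python
import threading
import time

class Glock:
    """Toy model of a minimum-hold-time lock: a remote demote callback
    received shortly after a local acquisition is deferred until the
    minimum hold time has elapsed, preventing lock ping-pong between
    nodes where neither process ever completes a timeslice of work."""

    def __init__(self, min_hold=0.05):
        self.min_hold = min_hold   # minimum hold time in seconds (assumed value)
        self.acquired_at = None
        self._mutex = threading.Lock()

    def acquire_local(self):
        """Record when the lock was last acquired locally (e.g. on a page fault)."""
        with self._mutex:
            self.acquired_at = time.monotonic()

    def handle_demote_callback(self):
        """Honor a remote node's demote request, but not before min_hold
        has elapsed since the local acquisition."""
        with self._mutex:
            held = time.monotonic() - self.acquired_at
        if held < self.min_hold:
            # Refuse to respond "for some minimum amount of time".
            time.sleep(self.min_hold - held)
        return "released"
```

Under contention, each node then holds the lock long enough to issue a batch of 512-byte requests instead of one, which is the behavior the fix aims for.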
Comment 2 Ken Preslan 2004-09-13 18:53:40 EDT
I just checked in code that should fix this.

Would QA please verify that this solves the problem for them?
Thanks.

Comment 3 Ken Preslan 2004-09-13 18:55:47 EDT
Oops.  Didn't mean to close the bug.

Comment 4 Dean Jansa 2004-09-14 10:55:42 EDT
Looks good now...  I ran up to 6 nodes doing read/write to the shared file
and, other than the expected slowdown as each node was added, it
seemed OK.
Comment 5 Kiersten (Kerri) Anderson 2004-11-16 14:03:29 EST
Updating version to the right level in the defects.  Sorry for the storm.
