Bug 113490

Summary: LVM Volumes hang processes under load
Product: Red Hat Enterprise Linux 3
Reporter: nathan r. hruby <nhruby>
Component: lvm
Assignee: Stephen Tweedie <sct>
Status: CLOSED WORKSFORME
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium    
Version: 3.0   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Last Closed: 2004-01-14 22:40:26 UTC
Attachments:
  output of echo t > /proc/sysrq-trigger

Description nathan r. hruby 2004-01-14 16:54:00 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1)
Gecko/20030225

Description of problem:
This may or may not be a bug with LVM... Not having a ton of experience
with Linux LVM, I'm not sure.

Hardware: 
 - Dell PE 6450
 - (4) P3 Katmai @ 550 MHz
 - 3GB RAM 
 - Emulex 9802 FC HBA w/ EMC approved OSS lpfcdd 1.32a driver
 - Perc2/Si with 3 8GB disks for OS

Software:
 - RHEL AS 3 (real, not white box or a beta) with all updates
including kernel

alfredo.cc# uname -a
Linux alfredo.cc.uga.edu 2.4.21-4.0.2.ELsmp #1 SMP Thu Dec 18 19:18:04 EST 2003 i686 i686 i386 GNU/Linux
alfredo.cc# 

So we have this machine hooked to our EMC SAN.  It can see and use the
60 drives we've assigned to it; we've put them all into a single large
LVM VG and created one LV from that:

alfredo.cc# vgdisplay
--- Volume group ---
VG Name               mirror_vg
VG Access             read/write
VG Status             available/resizable
VG #                  0
MAX LV                256
Cur LV                1
Open LV               1
MAX LV Size           511.98 GB
Max PV                256
Cur PV                60
Act PV                60
VG Size               504.84 GB
PE Size               8 MB
Total PE              64620
Alloc PE / Size       64620 / 504.84 GB
Free  PE / Size       0 / 0
VG UUID               eCQ1OK-kbQE-B4KQ-4IM8-P83Z-ij0p-UXwP88

alfredo.cc# lvdisplay /dev/mirror_vg/mirror_lv 
--- Logical volume ---
LV Name                /dev/mirror_vg/mirror_lv
VG Name                mirror_vg
LV Write Access        read/write
LV Status              available
LV #                   1
# open                 1
LV Size                504.84 GB
Current LE             64620
Allocated LE           64620
Stripes                60
Stripe size (KByte)    32
Allocation             next free
Read ahead sectors     1024
Block device           58:0

# Also made an ext3 filesystem on this with mke2fs defaults plus the -j option

alfredo.cc# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda3              3020172    223408   2643344   8% /
/dev/sda1               202220     22019    169761  12% /boot
none                   1547400         0   1547400   0% /dev/shm
/dev/sda2              3700100    828016   2684124  24% /usr
/dev/sda5              1573272     60364   1432988   5% /var
/dev/mirror_vg/mirror_lv
                     521060480     48084 510425056   1% /array1
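
(For the archives, not part of the original report: the layout above could
have been built with roughly the following LVM1 commands, reconstructed from
the vgdisplay/lvdisplay output above; the /dev/sd* device names are
placeholders for the 60 SAN LUNs.)

alfredo.cc# pvcreate /dev/sdb /dev/sdc ...                          # one PV per SAN LUN, 60 in total
alfredo.cc# vgcreate -s 8M mirror_vg /dev/sdb /dev/sdc ...          # 8 MB physical extents
alfredo.cc# lvcreate -i 60 -I 32 -l 64620 -n mirror_lv mirror_vg    # 60-way stripe, 32k stripe size, all PEs
alfredo.cc# mke2fs -j /dev/mirror_vg/mirror_lv
alfredo.cc# mount /dev/mirror_vg/mirror_lv /array1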


What happens is that any time we put any sort of load on the LVM
volume, whatever process is doing the writing gets stuck in a
wait-for-I/O loop and is unkillable.  We end up needing to hard reboot
the box to get it to go away.  That has happened with bonnie++ as well
as rsync.

alfredo.cc# ps ax | grep D
  PID TTY      STAT   TIME COMMAND
   14 ?        DW     0:00 [kupdated]
 3079 ?        DW     0:00 [kjournald]
 3094 pts/0    D      0:01 rsync -avz aldente.cc.uga.edu::ftp ./
 3250 pts/1    S      0:00 grep D
alfredo.cc# 
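
(Editorial aside, not from the original report: a slightly more targeted way
to list only the processes stuck in uninterruptible sleep, assuming the stock
procps that ships with RHEL 3.)

alfredo.cc# ps -eo pid,stat,wchan,args | awk 'NR==1 || $2 ~ /^D/'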

Note that the rsync has been "running" for an hour while I searched
bugzilla about this :)  Thinking that we had just built the LV wrong,
we recreated it today with the values above; previously the LV had a
4k stripe size.  With the 4k stripe size it would take about an hour
to get into an I/O wait state; with the 32k stripe size it takes about
10 minutes.

I've tried to get a system-wide trace (as prescribed in another
bugzilla ticket) with "echo -n t > /proc/sysrq-trigger", but nothing
seems to be written.  This machine is in the data center so I don't
have easy console access; if the output is there, I'm a dumbass and
will walk over there and get it, just let me know :)
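
(Sketch added for clarity, not part of the original comment: the sysrq-t dump
goes to the kernel log rather than back to the shell, so assuming klogd/syslog
are running it can usually be pulled from the ring buffer or /var/log/messages
like this.)

alfredo.cc# echo 1 > /proc/sys/kernel/sysrq     # make sure sysrq is enabled
alfredo.cc# echo t > /proc/sysrq-trigger        # dump all task stacks to the kernel log
alfredo.cc# dmesg > /tmp/sysrq-t.txt            # or grep the trace out of /var/log/messages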

So to me this looks like we're just asking too much from LVM and it's
just hanging processes.  Any thoughts?

Version-Release number of selected component (if applicable):
lvm-1.0.3-15

How reproducible:
Always

Steps to Reproduce:
1. Create a large (500 GB+) LVM LV
2. Write to it (see the sketch below for a representative load)
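
(Representative write load, not taken from the report; either of the tools
mentioned above would do, e.g.)

alfredo.cc# dd if=/dev/zero of=/array1/testfile bs=1M count=10240   # ~10 GB sequential write
alfredo.cc# bonnie++ -d /array1 -u root                             # or a filesystem benchmark run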
    

Actual Results:  Process doing the write hangs in I/O

Expected Results:  Process writes data as needed happily and quickly :)

Additional info:

Comment 1 Stephen Tweedie 2004-01-14 17:46:14 UTC
We really need the stack data from alt-sysrq-t to diagnose this. 
Netconsole and serial console may both be ways you could get this. 
LVM should not be responsible for this --- I've seen perfectly
reliable disk arrays using ~800GB of LVM on top of software raid5 ---
so it's legitimate to ask LVM to work in this case!  But the problem
may be elsewhere, and we'll need console logs and a driver list to
diagnose it.
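
(Editorial sketch, not part of the original comment: one way to capture the
sysrq-t output is a serial console; assuming grub on RHEL 3, append console
parameters to the kernel line in /boot/grub/grub.conf and capture the output
on a second machine over a null-modem cable.)

# /boot/grub/grub.conf -- kernel line with a serial console added
kernel /vmlinuz-2.4.21-4.0.2.ELsmp ro root=/dev/sda3 console=tty0 console=ttyS0,9600n8
# then capture the other end with minicom, or: cu -l /dev/ttyS0 -s 9600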


Comment 2 nathan r. hruby 2004-01-14 20:20:50 UTC
Created attachment 96991 [details]
output of echo t > /proc/sysrq-trigger

Comment 3 nathan r. hruby 2004-01-14 20:25:09 UTC
Trace attached above.  Assuming when you say "driver list" you mean
the following?  If not, let me know :)  Also, just to be pedantic, the
disk array in question is an EMC Symmetrix connected over Fibre
Channel (via the lpfcdd module), not plain old direct-attached SCSI disks.

alfredo.cc# lsmod
Module                  Size  Used by    Not tainted
netconsole             16428   0  (unused)
e100                   59140   1 
floppy                 59056   0  (autoclean)
microcode               5248   0  (autoclean)
loop                   12888   0  (autoclean)
ext3                   95784   4 
jbd                    56856   4  [ext3]
lvm-mod                65312   3 
lpfcdd                283368  60 
megaraid               31212   5 
sd_mod                 13744 130 
scsi_mod              116904   3  [lpfcdd megaraid sd_mod]
alfredo.cc# 


Comment 4 Stephen Tweedie 2004-01-14 22:40:26 UTC
Looks from the trace like there are a few processes stuck waiting for
I/O completion, but there's no sign of an LVM footprint.  This looks
more like a driver bug than anything else, and I'm afraid we can't
support EMC's own drivers, especially not via bugzilla.

For help in getting a supported driver up and running, please lodge a
support ticket.

Comment 5 nathan r. hruby 2004-01-15 03:12:25 UTC
[general note in case this ever shows up in someone else's bugzilla
searches]

Well, they're Emulex's drivers certified for use on RHEL 2.1 and RHL
8.0/9 with EMC gear, so it's a little bit of everybody :)

Thanks for the analysis, and sorry for the noise.  I'm going to try
rebuilding the driver with an alternate option set and see if that
helps; if not, I'll have a nice chat with RH Support as well as Emulex
and EMC.

Thanks again for the quick response!

Comment 6 Stephen Tweedie 2004-01-15 11:38:56 UTC
Yep, but I've seen plenty of cases where vendor-supplied drivers have
had curious and interesting problems under load. :-)  The support
folks will be able to connect you to the people most likely to have
had experience with drivers for that specific hardware, in case there
are known problems; it's not a setup I've ever used myself.

Comment 7 nathan r. hruby 2004-01-18 20:11:08 UTC
Just to update the archives...

After much fussing with the driver, SAN fabric, and other fun things,
on a whim I blew away the LV and VG, recreated the VG with a 32M PE
size (instead of the default 8MB), and made the LV a concat instead of
a stripe.  All is well (rsync is happily filling the disk as I type
this).
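
(For the archives, not part of the original comment: the rebuilt layout
described above corresponds roughly to these LVM1 commands; the device names
are placeholders, and with 32M PEs the same ~505 GB works out to about 16155
extents.)

alfredo.cc# vgcreate -s 32M mirror_vg /dev/sdb /dev/sdc ...    # 32M PE instead of the earlier 8M
alfredo.cc# lvcreate -l 16155 -n mirror_lv mirror_vg           # no -i/-I: linear (concat), not striped
alfredo.cc# mke2fs -j /dev/mirror_vg/mirror_lv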

I think the LV being a concat and not a stripe has more to do with it
than the bigger PE size, as the stripe was probably just overrunning
our Symm.

Thanks again!

-n