Bug 113490
Summary: | LVM Volumes hang processes under load | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | nathan r. hruby <nhruby> | ||||
Component: | lvm | Assignee: | Stephen Tweedie <sct> | ||||
Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 3.0 | ||||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | i686 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2004-01-14 22:40:26 UTC | Type: | --- | ||||
Description
nathan r. hruby
2004-01-14 16:54:00 UTC
We really need the stack data from alt-sysrq-t to diagnose this. Netconsole and serial console may both be ways you could get this.

LVM should not be responsible for this --- I've seen perfectly reliable disk arrays using ~800GB of LVM on top of software raid5 --- so it's legitimate to ask LVM to work in this case! But the problem may be elsewhere, and we'll need console logs and a driver list to diagnose it.

Created attachment 96991 [details]
output of echo t > /proc/sysrq-trigger

Trace attached above. Assuming when you say "driver list" you mean the following? If not, let me know :) Also, just to be pedantic, the disk array in question is an EMC Symmetrix connected over Fiber Channel (via the lpfcdd module), not plain old direct-attached SCSI disks.

alfredo.cc# lsmod
Module                  Size  Used by    Not tainted
netconsole             16428   0  (unused)
e100                   59140   1
floppy                 59056   0  (autoclean)
microcode               5248   0  (autoclean)
loop                   12888   0  (autoclean)
ext3                   95784   4
jbd                    56856   4  [ext3]
lvm-mod                65312   3
lpfcdd                283368  60
megaraid               31212   5
sd_mod                 13744 130
scsi_mod              116904   3  [lpfcdd megaraid sd_mod]
alfredo.cc#

Looks from the trace like there are a few processes stuck waiting for IO completion, but there's no sign of an LVM footprint. This looks more like a driver bug than anything else, and I'm afraid we can't support EMC's own drivers, especially not via bugzilla. For help in getting a supported driver up and running, please lodge a support ticket.

[general note in case this ever shows up in someone else's bugzilla searches] Well, they're Emulex's drivers certified for use on RHEL 2.1 and RHL 8.0/9 with EMC gear, so it's a little bit of everybody :) Thanks for the analysis and sorry for the noise. I'm going to try rebuilding the driver with an alternate option set and see if that helps; if not, I'll have a nice chat with RH Support as well as Emulex and EMC. Thanks again for the quick response!

Yep, but I've seen plenty of cases where vendor-supplied drivers have had curious and interesting problems under load. :-) The support folks will be able to connect you to the people most likely to have had experience of drivers for that specific hardware, in case there are known problems; it's not a setup I've ever used myself.

Just to update the archives...
After much fussing with the driver, SAN fabric and other fun things, on a whim I blew away the LV and VG, recreated the VG with a 32MB PE size (instead of the default 8MB) and made the LV a concat instead of a stripe, and all is well (rsync happily filling the disk as I type this). I think the LV being a concat and not a stripe has more to do with it than the bigger PE size, as the stripe was probably just overrunning our Symm. Thanks again! -n
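
[editorial sketch] The concat-versus-stripe guess above can be illustrated with a toy model of where sequential writes land. This is only a simplified picture, not LVM's actual allocator, and it does not confirm the root cause. The number of PVs, the PV size and the stripe size are all hypothetical, since the thread never states the real layout. The function names are likewise invented for illustration.

```python
# Toy model of how a logical volume's offsets land on physical volumes (PVs).
# All sizes below are hypothetical; the thread does not give the real layout.

NUM_PVS = 4                       # assumed number of PVs in the VG
PV_SIZE_KB = 100 * 1024 * 1024    # assumed 100 GB per PV
STRIPE_KB = 64                    # assumed stripe size


def linear_pv(offset_kb, pv_size_kb=PV_SIZE_KB):
    """Concatenated (linear) LV: PV 0 is filled first, then PV 1, and so on."""
    return offset_kb // pv_size_kb


def striped_pv(offset_kb, stripe_kb=STRIPE_KB, num_pvs=NUM_PVS):
    """Striped LV: consecutive stripe-sized chunks rotate across the PVs."""
    return (offset_kb // stripe_kb) % num_pvs


def pvs_touched(locate, start_kb, length_kb, step_kb=STRIPE_KB):
    """Return the sorted list of distinct PVs hit by a sequential write."""
    offsets = range(start_kb, start_kb + length_kb, step_kb)
    return sorted({locate(off) for off in offsets})


if __name__ == "__main__":
    one_mb = 1024  # length of the write in KB
    print("linear :", pvs_touched(linear_pv, 0, one_mb))
    print("striped:", pvs_touched(striped_pv, 0, one_mb))
```

Under these assumed numbers, a 1 MB sequential write stays on a single PV in the linear case and is spread over all four PVs in the striped case. A bulk copy such as rsync would therefore keep every path to the array busy at once when the LV is striped. In LVM itself the layout is chosen at creation time: lvcreate makes a linear LV unless stripes are requested with -i, and vgcreate -s sets the PE size.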