Bug 254185 - lvcreate causes "kernel: general protection fault" then future lvm processes hang
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-08-24 16:46 UTC by Philip Spencer
Modified: 2007-11-30 22:12 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-09-05 21:04:05 UTC
Type: ---
Embargoed:


Attachments
Log file of kernel gpf and lvcreate process stack trace (4.06 KB, text/plain)
2007-08-24 16:46 UTC, Philip Spencer

Description Philip Spencer 2007-08-24 16:46:55 UTC
Not sure if this belongs under kernel or lvm2 -- I'm putting it under kernel
since I *think* from looking at the logs that the problem is occurring in
kernelspace (and it happened shortly after updating the kernel), but it's
triggered by lvcreate so please move it if needed.

Description of problem:

When the nightly backup script tried to create a snapshot of /var with
"lvcreate -L 1G -s -n snapvar /dev/vg1/var", the lvcreate command hung in
an unkillable wait state (or perhaps the first lvcreate segfaulted and exited,
the script tried it again, and the second one hung in the unkillable wait
state) and the kernel logged

"kernel: general protection fault: 0000 [1] SMP"

followed by more details and stack trace of lvcreate (attached). A reboot was
needed to get the system back into a state where lvm/dm commands would work
again. There was heavy disk usage on /var at the time due to a mail loop. The
file system type is reiserfs.

The backups on the previous night and the subsequent nights proceeded without
errors so this is not yet a reproducible problem. Prior to this, with earlier
kernels, backups proceeded nightly for years with no issues like this.

Version-Release number of selected component (if applicable):

kernel: kernel-2.6.22.2-42.fc6
lvm2:   lvm2-2.02.17-1.fc6

How reproducible:

So far, it has happened only once. However, we've only been running this kernel
version for three days, so if this is a new problem introduced with the 2.6.22
kernel we may see it again. If it does happen again, I'll update this bug report.

Steps to Reproduce:

It may be possible to reproduce by creating a reiserfs filesystem on a 2-CPU
system, putting it under heavy disk load, then using lvcreate to make a
snapshot, remove it again, and repeat until the problem occurs. So far, though,
I have not been able to reproduce it, so I am reporting this bug just in case
someone with more kernel or lvm knowledge has some ideas based on the log file
attached.
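
The create/remove loop described above could be sketched roughly as follows.
This is only an illustration, not a confirmed reproducer: the function names
(snapshot_cycle_commands, run_cycles) and the iteration count are made up for
this sketch, the volume and snapshot names are taken from the lvcreate command
quoted in the description, and the heavy disk load on the origin volume is
assumed to be generated separately. It defaults to a dry run that only prints
the commands; a real run needs root and an existing /dev/vg1/var.

```python
import subprocess
from typing import List


def snapshot_cycle_commands(origin: str = "/dev/vg1/var",
                            snap_name: str = "snapvar",
                            size: str = "1G") -> List[List[str]]:
    """Return the create/remove command pair for one snapshot cycle."""
    # The snapshot lives in the same volume group as its origin,
    # e.g. /dev/vg1/var -> /dev/vg1/snapvar.
    vg_path = origin.rsplit("/", 1)[0]
    create = ["lvcreate", "-L", size, "-s", "-n", snap_name, origin]
    remove = ["lvremove", "-f", f"{vg_path}/{snap_name}"]
    return [create, remove]


def run_cycles(iterations: int, dry_run: bool = True) -> int:
    """Repeat the snapshot cycle; return the number of completed cycles.

    With dry_run=True the commands are only printed. With dry_run=False
    they are executed, and a non-zero exit status raises
    subprocess.CalledProcessError. If lvcreate hangs in an unkillable
    state as in this report, the call below simply never returns.
    """
    completed = 0
    for _ in range(iterations):
        for cmd in snapshot_cycle_commands():
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)
        completed += 1
    return completed


if __name__ == "__main__":
    # Dry run by default: prints three create/remove pairs.
    run_cycles(3)
```

Calling run_cycles(n, dry_run=False) while something else keeps the origin
volume busy is the closest approximation of the conditions described here;
whether it actually triggers the fault is unknown.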

Additional info:

See attached log file.

Note: I set this as severity:high since it does render many lvm commands
unusable until a reboot, but priority:low since I realize more information or
reproducibility may be needed first.

Comment 1 Philip Spencer 2007-08-24 16:46:55 UTC
Created attachment 172435
Log file of kernel gpf and lvcreate process stack trace

Comment 2 Philip Spencer 2007-08-31 19:20:41 UTC
It's been a week (running the same set of lvcreate commands every night) and the
problem has not recurred, so perhaps this isn't a new problem in
kernel-2.6.22.2-42.fc6 after all, but something obscure and nonreproducible.

I'll wait a week longer and see if it recurs or if anyone else reports it
happening to them as well.

Comment 3 Chuck Ebbert 2007-09-05 21:04:05 UTC
There are some very obscure bugs in the sysfs_hash_and_remove/kref_put code that
won't be fully fixed until 2.6.23. Will close this bug, as the fixes are already
upstream (and will be in Fedora 8.) The code was completely rewritten to solve
this problem.


