Bug 254185 - lvcreate causes "kernel: general protection fault" then future lvm processes hang
Product: Fedora
Classification: Fedora
Component: kernel
Hardware: x86_64 Linux
Priority: low
Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2007-08-24 12:46 EDT by Philip Spencer
Modified: 2007-11-30 17:12 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-09-05 17:04:05 EDT

Attachments
Log file of kernel gpf and lvcreate process stack trace (4.06 KB, text/plain)
2007-08-24 12:46 EDT, Philip Spencer

Description Philip Spencer 2007-08-24 12:46:55 EDT
Not sure if this belongs under kernel or lvm2 -- I'm putting it under kernel
since I *think* from looking at the logs that the problem is occurring in
kernelspace (and it happened shortly after updating the kernel), but it's
triggered by lvcreate so please move it if needed.

Description of problem:

When our nightly backup script tried to create a snapshot of /var with
"lvcreate -L 1G -s -n snapvar /dev/vg1/var", the lvcreate command hung in
an unkillable wait state (or perhaps the first lvcreate segfaulted and exited,
the script tried it again, and the second one hung in the unkillable wait
state) and the kernel logged

"kernel: general protection fault: 0000 [1] SMP"

followed by more details and stack trace of lvcreate (attached). A reboot was
needed to get the system back into a state where lvm/dm commands would work
again. There was heavy disk usage on /var at the time due to a mail loop. The
file system type is reiserfs.

The backups on the previous night and the subsequent nights proceeded without
errors so this is not yet a reproducible problem. Prior to this, with earlier
kernels, backups proceeded nightly for years with no issues like this.

Version-Release number of selected component (if applicable):

kernel: kernel-
lvm2:   lvm2-2.02.17-1.fc6

How reproducible:

So far, it has happened only once. However, we've only been running this kernel
version for three days, so if this is a new problem introduced with the 2.6.22
kernel we may see it again. If it does happen again, I'll update this bug report.

Steps to Reproduce:

It may be possible to reproduce by creating a reiserfs filesystem on a 2-CPU
system, putting it under heavy disk load, then using lvcreate to make a
snapshot, remove it again, and repeat until the problem occurs. So far, though,
I have not been able to reproduce it, so I am reporting this bug just in case
someone with more kernel or lvm knowledge has some ideas based on the log file.
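
The create/remove cycle described above could be scripted roughly as follows. This is only a sketch, not something that has been shown to trigger the fault: the function names (run, snapshot_cycle, repro_loop) and the ORIGIN, SNAP_NAME, SNAP_SIZE and DRY_RUN variables are made up for illustration, with defaults taken from the lvcreate command quoted in the description. It does not generate the disk load itself; that would have to be supplied separately. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical reproduction loop: create a snapshot, remove it, repeat.
# Defaults mirror the lvcreate command from the report; all names are illustrative.
ORIGIN="${ORIGIN:-/dev/vg1/var}"      # origin logical volume
SNAP_NAME="${SNAP_NAME:-snapvar}"     # snapshot name
SNAP_SIZE="${SNAP_SIZE:-1G}"          # snapshot size
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One cycle: create the snapshot, then remove it again.
snapshot_cycle() {
    vg_dir=$(dirname "$ORIGIN")
    run lvcreate -L "$SNAP_SIZE" -s -n "$SNAP_NAME" "$ORIGIN" || return 1
    run lvremove -f "$vg_dir/$SNAP_NAME" || return 1
}

# Repeat the cycle COUNT times, stopping at the first failing command.
repro_loop() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        snapshot_cycle || { echo "cycle $i failed" >&2; return 1; }
        i=$((i + 1))
    done
}

# When invoked with an argument, treat it as the number of cycles.
if [ "$#" -gt 0 ]; then
    repro_loop "$1"
fi
```

Running the script with an argument of 2 and the default DRY_RUN=1 just prints two lvcreate/lvremove pairs; setting DRY_RUN=0 (as root, on a test machine) would execute them. Note that a hang like the one reported leaves lvcreate in an unkillable state, so the loop would simply stall at that point rather than report a failure.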

Additional info:

See attached log file.

Note: I set this as severity:high since it does render many lvm commands
unusable until a reboot, but priority:low since I realize more information or
reproducibility may be needed first.
Comment 1 Philip Spencer 2007-08-24 12:46:55 EDT
Created attachment 172435 [details]
Log file of kernel gpf and lvcreate process stack trace
Comment 2 Philip Spencer 2007-08-31 15:20:41 EDT
It's been a week (running the same set of lvcreate commands every night) and the
problem has not recurred, so perhaps this isn't a new problem in
kernel- after all, but something obscure and nonreproducible.

I'll wait a week longer and see if it recurs or if anyone else reports it
happening to them as well.
Comment 3 Chuck Ebbert 2007-09-05 17:04:05 EDT
There are some very obscure bugs in the sysfs_hash_and_remove/kref_put code that
won't be fully fixed until 2.6.23. Will close this bug, as the fixes are already
upstream (and will be in Fedora 8). The code was completely rewritten to solve
this problem.
