Bug 254185

Summary: lvcreate causes "kernel: general protection fault" then future lvm processes hang
Product: [Fedora] Fedora
Reporter: Philip Spencer <pspencer>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low    
Version: 6   
Hardware: x86_64   
OS: Linux   
Last Closed: 2007-09-05 21:04:05 UTC
Attachments:
Log file of kernel gpf and lvcreate process stack trace (flags: none)

Description Philip Spencer 2007-08-24 16:46:55 UTC
Not sure if this belongs under kernel or lvm2 -- I'm putting it under kernel
since I *think* from looking at the logs that the problem is occurring in
kernelspace (and it happened shortly after updating the kernel), but it's
triggered by lvcreate so please move it if needed.

Description of problem:

When our nightly backup script tried to create a snapshot of /var with
"lvcreate -L 1G -s -n snapvar /dev/vg1/var", the lvcreate command hung in
an unkillable wait state (or perhaps the first lvcreate segfaulted and exited,
the script tried again, and the second one hung in the unkillable wait
state) and the kernel logged

"kernel: general protection fault: 0000 [1] SMP"

followed by more details and stack trace of lvcreate (attached). A reboot was
needed to get the system back into a state where lvm/dm commands would work
again. There was heavy disk usage on /var at the time due to a mail loop. The
file system type is reiserfs.
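
For anyone checking whether they are seeing the same hang, a rough way to list
processes stuck in uninterruptible sleep (STAT beginning with "D") is sketched
below. The helper name filter_dstate is hypothetical and not part of any lvm2
tooling; it only filters the output of "ps -eo pid,stat,comm".

```shell
# Hypothetical helper: read "ps -eo pid,stat,comm" output on stdin and
# print the PID and command name of every process whose STAT starts with D
# (uninterruptible sleep, which cannot be killed).
filter_dstate() {
    # NR > 1 skips the ps header line.
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}
```

Usage: "ps -eo pid,stat,comm | filter_dstate". An lvcreate entry that stays in
this list would match the symptom described above.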

The backups on the previous night and the subsequent nights proceeded without
errors so this is not yet a reproducible problem. Prior to this, with earlier
kernels, backups proceeded nightly for years with no issues like this.

Version-Release number of selected component (if applicable):

kernel: kernel-2.6.22.2-42.fc6
lvm2:   lvm2-2.02.17-1.fc6

How reproducible:

So far, it has happened only once. However, we've only been running this kernel
version for three days, so if this is a new problem introduced with the 2.6.22
kernel we may see it again. If it does happen again, I'll update this bug report.

Steps to Reproduce:

It may be possible to reproduce by creating a reiserfs filesystem on a 2-CPU
system, putting it under heavy disk load, then using lvcreate to make a
snapshot, remove it again, and repeat until the problem occurs. So far, though,
I have not been able to reproduce it, so I am reporting this bug just in case
someone with more kernel or lvm knowledge has some ideas based on the log file
attached.
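
A minimal sketch of the snapshot/remove loop described above, assuming the same
volume names as in this report (vg1, var, snapvar). The DRY_RUN switch and the
function names are illustrative assumptions, and generating the disk load is
left to the person testing. This is untested against the actual fault.

```shell
#!/bin/sh
# Hypothetical reproduction loop: create a snapshot, remove it, repeat.
# VG, ORIGIN and SNAP default to the names used in this report.
VG=${VG:-vg1}
ORIGIN=${ORIGIN:-var}
SNAP=${SNAP:-snapvar}

# Run a command, or only print it when DRY_RUN=1 (illustrative safety switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create and remove the snapshot $1 times, stopping at the first failure.
snapshot_cycle() {
    i=1
    while [ "$i" -le "$1" ]; do
        run lvcreate -L 1G -s -n "$SNAP" "/dev/$VG/$ORIGIN" || return 1
        run lvremove -f "/dev/$VG/$SNAP" || return 1
        i=$((i + 1))
    done
}
```

"DRY_RUN=1 snapshot_cycle 3" only prints the commands. Run as root without
DRY_RUN, "snapshot_cycle 100" would execute them while the filesystem is under
load. If lvcreate hangs as described, the loop simply stops making progress,
so watch the kernel log alongside it.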

Additional info:

See attached log file.

Note: I set this as severity:high since it does render many lvm commands
unusable until a reboot, but priority:low since I realize more information or
reproducibility may be needed first.

Comment 1 Philip Spencer 2007-08-24 16:46:55 UTC
Created attachment 172435 [details]
Log file of kernel gpf and lvcreate process stack trace

Comment 2 Philip Spencer 2007-08-31 19:20:41 UTC
It's been a week (running the same set of lvcreate commands every night) and the
problem has not recurred, so perhaps this isn't a new problem in
kernel-2.6.22.2-42.fc6 after all, but something obscure and nonreproducible.

I'll wait a week longer and see if it recurs or if anyone else reports it
happening to them as well.

Comment 3 Chuck Ebbert 2007-09-05 21:04:05 UTC
There are some very obscure bugs in the sysfs_hash_and_remove/kref_put code that
won't be fully fixed until 2.6.23. Will close this bug, as the fixes are already
upstream (and will be in Fedora 8). The code was completely rewritten to solve
this problem.