Bug 215237

Summary: lvcreate --snapshot crashes machine
Product: Red Hat Enterprise Linux 4
Component: lvm2
Version: 4.4
Hardware: i386
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: medium
Reporter: Kirchner <w.kirchner>
Assignee: Milan Broz <mbroz>
QA Contact: Corey Marthaler <cmarthal>
CC: agk, dwysocha, jbrassow, mbroz, m.hanisch, prockai, pvrabec
Doc Type: Bug Fix
Last Closed: 2007-01-29 17:20:55 UTC

Description Kirchner 2006-11-12 18:50:06 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1)

Description of problem:
The backup script sometimes crashes the machine (about once a month) when it creates an LVM snapshot with lvcreate.


Version-Release number of selected component (if applicable):
lvm2-2.02.06-6.0.RHEL4 kernel-2.6.9-42.0.3.EL

How reproducible:
Couldn't Reproduce


Steps to Reproduce:
1. suspend the mail server so that the mail store is consistent
2. /usr/sbin/lvcreate --size 3G --snapshot --name snap /dev/VolGroup00/LogVol03
3. mount -r /dev/VolGroup00/snap /mnt/snap
4. resume the mail server
5. start the tar backup
6. unmount the partition
7. lvremove

Actual Results:
According to the log that the backup script writes, the server seems to crash after step 2 or step 3.
- The crash is so severe that nothing appears on the console. I cannot even see a kernel message.

Expected Results:
no crash

Additional info:
Dell 2650 with a Perc4DC, RAID 5 on 4 disks, latest firmware on the computer and the controller (I even flashed the hard drives!)

The backup script has been running for 188 days and we had 7 crashes during that time, with a higher frequency towards the end. To rule out a hardware issue we replaced the complete server (except the hard drives), but got another crash 2 days later, this time with mail store corruption.

I found no hints on the net that anyone else is having this problem. There were issues with RHEL and LVM, but it looks like they have been fixed since RHEL4 Update 3. We thought about buying expensive backup software, but it also uses LVM snapshots.

Looking forward to any help.


Here is the relevant part of the backup script:
Because of the RAID controller and OS caches, I added a lot of sync and sleep calls so that a crash shows up more clearly in the logfile.
=================================
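echo_and_log "suspending Scalix operation"
# omsuspend is started in the background with '&', so "$?" below is the status
# of launching the job (always 0), not of the suspend itself; this check can never fire.
/opt/scalix/bin/omsuspend -s 299 &
[ "$?" != "0" ] && exit_with_error "unable to suspend scalix operation"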
sleep 150
sync
sleep 10
sync
echo_and_log "creating LVM snapshot"
sync;sleep 5
/usr/sbin/lvcreate --size 3G --snapshot --name snap /dev/VolGroup00/LogVol03
[ "$?" != "0" ] && exit_with_error "unable to create LVM snapshot"
echo_and_log "mounting snapshot"
sync; sleep 20
mount -r /dev/VolGroup00/snap /mnt/snap
[ "$?" != "0" ] && exit_with_error "unable to mount /mnt/snap read only"
echo_and_log "resume Scalix operation"
sync; sleep 5
/opt/scalix/bin/omsuspend -r
[ "$?" != "0" ] && exit_with_error "unable to resume scalix operation"
echo_and_log "creating backup"
sync; sleep 5
backup_file="$BACKUP_DIR/snapshots/snap-${DATE}-mail.tgz"
tar -czf $backup_file /mnt/snap
[ "$?" != "0" ] && echo_and_log "tar of /mnt/snap ended with an error"
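
The excerpt above stops before steps 6 and 7 (umount and lvremove). A minimal sketch of that cleanup follows. It assumes the mount point and snapshot name from the steps to reproduce; the function name cleanup_snapshot and the UMOUNT_CMD / LVREMOVE_CMD overrides are hypothetical and exist only so the sketch can be dry-run. It is not the reporter's actual code.

```shell
# Hypothetical overrides for dry runs; the defaults are the real tools.
UMOUNT_CMD="${UMOUNT_CMD:-umount}"
LVREMOVE_CMD="${LVREMOVE_CMD:-/usr/sbin/lvremove}"

# cleanup_snapshot MOUNTPOINT SNAPSHOT_DEVICE
# Unmounts the snapshot, then removes it. If unmounting fails, the snapshot
# is left in place so that nothing is removed while it is still in use.
cleanup_snapshot() {
    local mnt="$1" snap="$2"
    if ! $UMOUNT_CMD "$mnt"; then
        echo "unable to unmount $mnt, leaving $snap in place" >&2
        return 1
    fi
    # -f skips lvremove's interactive confirmation prompt.
    if ! $LVREMOVE_CMD -f "$snap"; then
        echo "unable to remove $snap" >&2
        return 1
    fi
    echo "removed $snap"
}

# Example call at the end of the backup script:
#   cleanup_snapshot /mnt/snap /dev/VolGroup00/snap
```

Checking each step separately means a failed umount is logged as such instead of surfacing later as an lvremove error on a busy device.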

df -h
=====================
[root@mail backup]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      6.0G  3.9G  1.8G  69% /
/dev/sda2              99M   39M   56M  42% /boot
none                 1014M     0 1014M   0% /dev/shm
/dev/mapper/VolGroup00-LogVol02
                      3.0G  514M  2.4G  18% /home
/dev/mapper/VolGroup00-LogVol01
                      3.0G  2.4G  492M  83% /var
/dev/mapper/VolGroup00-LogVol03
                      366G   45G  302G  13% /var/opt/scalix
/dev/mapper/VolGroup01-backup
                      275G  218G   43G  84% /mnt/backup


mount
==============================
[root@mail backup]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/sda2 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/mapper/VolGroup00-LogVol02 on /home type ext3 (rw)
/dev/mapper/VolGroup00-LogVol01 on /var type ext3 (rw)
/dev/mapper/VolGroup00-LogVol03 on /var/opt/scalix type ext3 (rw)
/dev/mapper/VolGroup01-backup on /mnt/backup type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

Comment 1 Milan Broz 2006-11-14 16:23:44 UTC
Can you please provide more information about this crash?

- anything related in syslog
- is it an SMP system? How much memory is there?

- run the script with debugging info: add 'activation = 1' in the log section of
lvm.conf and run the lvm commands with -vvvv to log debug messages

Please attach the output of the following (before and after the crash, if that is possible):
dmsetup info -c
dmsetup table
dmsetup status

and, if possible, full process and memory info after the crash (written to syslog):
  echo t > /proc/sysrq-trigger
  echo m > /proc/sysrq-trigger
(or enable the SysRq key and use it from the console)

(You can try creating one volume group for the root volume and a separate one for
the data volume + snapshots, to avoid some locking issues.
The amount of free physical memory, not swap, is also significant.)
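
The dmsetup output requested above could be captured automatically around the snapshot step, for example with a small helper like the one sketched below. The function name collect_dm_state, the DM_CMD override, and the output directory are assumptions made for illustration; only the three dmsetup invocations come from this comment.

```shell
# Hypothetical override for dry runs; the default is the real dmsetup.
DM_CMD="${DM_CMD:-dmsetup}"

# collect_dm_state LABEL OUTDIR
# Saves the output of "dmsetup info -c", "dmsetup table" and "dmsetup status"
# to OUTDIR/<timestamp>-LABEL-<name>.txt, then syncs so the files reach the
# disk before the next (possibly crashing) step runs.
collect_dm_state() {
    local label="$1" outdir="$2" stamp name args
    stamp="$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$outdir" || return 1
    for name in info table status; do
        if [ "$name" = "info" ]; then
            args="info -c"
        else
            args="$name"
        fi
        # stderr goes to the same file, so a failing command is visible too.
        $DM_CMD $args > "$outdir/$stamp-$label-$name.txt" 2>&1
    done
    sync
}

# Example use around the snapshot step (the directory is an assumption):
#   collect_dm_state before /var/log/lvm-debug
#   /usr/sbin/lvcreate -vvvv --size 3G --snapshot --name snap /dev/VolGroup00/LogVol03
#   collect_dm_state after /var/log/lvm-debug
```

If the machine crashes during lvcreate, the "before" files are already on disk and the missing "after" files show how far the script got.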

Comment 2 Milan Broz 2007-01-29 17:20:55 UTC
Since this report does not contain sufficient details, I am closing this bug.
If you see this issue again, please reopen the bug and provide the requested
information. Thank you.