Red Hat Bugzilla – Bug 215237
lvcreate --snapshot crashes machine
Last modified: 2013-02-28 23:04:22 EST
Description of problem:
The backup script occasionally (about once a month) crashes the machine when creating an LVM snapshot with lvcreate.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Suspend the mail server so that the mail store is consistent
2. /usr/sbin/lvcreate --size 3G --snapshot --name snap /dev/VolGroup00/LogVol03
3. mount -r /dev/VolGroup00/snap /mnt/snap
4. Resume the mail server
5. Start the tar backup
6. Unmount the snapshot partition
According to the log that the backup script writes, the server appears to crash after step 2 or 3.
- The crash is so severe that the console shows nothing. I cannot even see a kernel message.
Dell 2650 with Perc4DC, RAID 5 on 4 disks, latest firmware on the computer and the controller (I even flashed the hard drives!)
The backup script has been running for 188 days and we had 7 crashes during that time, with a higher frequency toward the end. To rule out a hardware issue we replaced the complete server (except the hard drives), but got another crash 2 days later, this time with mail store corruption.
I found no reports on the net of anyone else having this problem. There were issues with RHEL and LVM, but they appear to have been fixed as of RHEL4 Update 3. We considered buying expensive backup software, but it also uses LVM snapshots.
Looking forward to any help.
Here is the relevant part of the backup script:
I added a lot of sync and sleep calls so that a crash shows up more clearly in the log file despite the RAID controller and OS caches.
echo_and_log "suspending Scalix operation"
/opt/scalix/bin/omsuspend -s 299 &
[ "$?" != "0" ] && exit_with_error "unable to suspend scalix operation"
echo_and_log "creating LVM snapshot"
/usr/sbin/lvcreate --size 3G --snapshot --name snap /dev/VolGroup00/LogVol03
[ "$?" != "0" ] && exit_with_error "unable to create LVM snapshot"
echo_and_log "mounting snapshot"
sync; sleep 20
mount -r /dev/VolGroup00/snap /mnt/snap
[ "$?" != "0" ] && exit_with_error "unable to mount /mnt/snap read only"
echo_and_log "resume Scalix operation"
sync; sleep 5
[ "$?" != "0" ] && exit_with_error "unable to resume scalix operation"
echo_and_log "creating backup"
sync; sleep 5
tar -czf $backup_file /mnt/snap
[ "$?" != "0" ] && echo_and_log "tar of /mnt/snap ended with an error"
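A note on the script above: in a shell, `$?` read right after a command started with `&` is always 0, so the check following omsuspend cannot detect a failed suspend. For steps that run in the foreground, a minimal sketch of a wrapper that logs each step, flushes the log to disk, and reports the command's real exit status could look like this. `run_step` and `LOGFILE` are hypothetical names, and `echo_and_log` is re-declared here as a simple stand-in because its original definition is not shown.

```shell
#!/bin/bash
# Hypothetical log location; the original script's log path is not shown.
LOGFILE="${LOGFILE:-/tmp/backup.log}"

# Stand-in for the helper used in the script: print a timestamped
# message and append it to the log file.
echo_and_log() {
    echo "$(date '+%F %T') $*" | tee -a "$LOGFILE"
}

# run_step DESCRIPTION COMMAND [ARGS...]
# Logs the description, syncs so the entry is on disk before the command
# starts, runs the command in the foreground, and returns its exit status.
run_step() {
    local description="$1"
    shift
    echo_and_log "$description"
    sync
    "$@"
    local rc=$?
    if [ "$rc" -ne 0 ]; then
        echo_and_log "FAILED ($rc): $description"
    fi
    return "$rc"
}
```

With this in place a step would read, for example, `run_step "creating LVM snapshot" /usr/sbin/lvcreate --size 3G --snapshot --name snap /dev/VolGroup00/LogVol03 || exit 1`. Because the log line is written and synced before the command runs, the last entry in the log identifies the step that was in progress when the machine went down.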
[root@mail backup]# df -h
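Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  6.0G  3.9G  1.8G  69% /
/dev/sda2                         99M   39M   56M  42% /boot
none                            1014M     0 1014M   0% /dev/shm
/dev/mapper/VolGroup00-LogVol02  3.0G  514M  2.4G  18% /home
/dev/mapper/VolGroup00-LogVol01  3.0G  2.4G  492M  83% /var
/dev/mapper/VolGroup00-LogVol03  366G   45G  302G  13% /var/opt/scalix
/dev/mapper/VolGroup01-backup    275G  218G   43G  84% /mnt/backup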
[root@mail backup]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/sda2 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/mapper/VolGroup00-LogVol02 on /home type ext3 (rw)
/dev/mapper/VolGroup00-LogVol01 on /var type ext3 (rw)
/dev/mapper/VolGroup00-LogVol03 on /var/opt/scalix type ext3 (rw)
/dev/mapper/VolGroup01-backup on /mnt/backup type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
Could you please provide more information about this crash?
- Is there anything related in syslog?
- Is it an SMP system? How much memory does it have?
- Run the script with debugging enabled: add 'activation = 1' to the log section of
lvm.conf and run the lvm commands with -vvvv to log debug messages.
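A minimal sketch of how the -vvvv run could be wired into the backup script so that the debug messages survive a crash. `run_lvm_debug` and `LVM_DEBUG_LOG` are hypothetical names chosen for illustration, and the log path is an assumption. Both stdout and stderr are captured so the sketch does not depend on which stream the messages use.

```shell
#!/bin/bash
# Hypothetical log location; ideally on a filesystem that is not part of
# the volume being snapshotted.
LVM_DEBUG_LOG="${LVM_DEBUG_LOG:-/var/log/lvm-debug.log}"

# run_lvm_debug LVM_COMMAND [ARGS...]
# Runs the given LVM command with -vvvv, appends all of its output to the
# debug log, then syncs so the entries are on disk if the machine goes
# down right afterwards. Returns the command's exit status.
run_lvm_debug() {
    local cmd="$1"
    shift
    echo "=== $(date '+%F %T') $cmd -vvvv $*" >> "$LVM_DEBUG_LOG"
    "$cmd" -vvvv "$@" >> "$LVM_DEBUG_LOG" 2>&1
    local rc=$?
    sync
    return "$rc"
}
```

In the backup script the snapshot step would then become `run_lvm_debug /usr/sbin/lvcreate --size 3G --snapshot --name snap /dev/VolGroup00/LogVol03`.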
Please attach the output of the following command, taken before and after the crash if possible:
dmsetup info -c
If possible, also send full process and memory info to syslog after the crash:
echo t > /proc/sysrq-trigger
echo m > /proc/sysrq-trigger
(or enable the sysrq key and use it from the console)
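Since the machine gives no warning before it goes down, the "before" state could be recorded automatically on every backup run. The following is only a sketch: `capture_dm_state`, `enable_sysrq_key`, and `DIAG_LOG` are hypothetical names, and the log path is an assumption.

```shell
#!/bin/bash
# Hypothetical log location for the diagnostic output.
DIAG_LOG="${DIAG_LOG:-/var/log/snapshot-diag.log}"

# capture_dm_state LABEL
# Appends a labelled, timestamped copy of `dmsetup info -c` to the log
# and syncs so it is on disk before the next step runs.
capture_dm_state() {
    {
        echo "=== $(date '+%F %T') dmsetup info -c ($1)"
        dmsetup info -c
    } >> "$DIAG_LOG" 2>&1
    sync
}

# enable_sysrq_key
# Enables the SysRq key so that task and memory dumps can be requested
# from the console keyboard. Must be run as root.
enable_sysrq_key() {
    echo 1 > /proc/sys/kernel/sysrq
}
```

Calling `capture_dm_state "before lvcreate"` just ahead of the snapshot step, and `capture_dm_state "after lvcreate"` just after it, leaves the last known device-mapper state in the log for comparison after a reboot.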
(You can try creating one volume group for the root volume and a separate one for the
data volume and snapshots, to avoid some locking issues.
The amount of free physical memory, not swap, is also significant.)
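To have the free physical memory on record at the moment the snapshot is taken, the backup script could log it right before the lvcreate step. A minimal sketch, assuming the value of interest is the MemFree line of /proc/meminfo; `mem_free_kb` and `log_mem_free` are hypothetical helper names.

```shell
#!/bin/bash
# mem_free_kb [MEMINFO_FILE]
# Prints the MemFree value in kB. This is free physical memory only;
# swap is reported on separate lines and is not included.
mem_free_kb() {
    awk '/^MemFree:/ { print $2 }' "${1:-/proc/meminfo}"
}

# log_mem_free LABEL [MEMINFO_FILE]
# Prints a timestamped line suitable for appending to the backup log.
log_mem_free() {
    echo "$(date '+%F %T') $1: MemFree=$(mem_free_kb "$2") kB"
}
```

For example, `log_mem_free "before lvcreate" >> "$LOGFILE"` would add one line per backup run, where `LOGFILE` stands for whatever log file the script already writes to.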
Since insufficient details were provided in this report, I am closing this bug.
If you see this issue again, please reopen the bug and provide the requested
information. Thank you.