From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
From IT#49956: The customer is seeing an intermittent kernel panic on their production web course teaching system. Every panic has occurred while they were backing up an LVM snapshot volume to an LTO tape drive. Not all backups produce a panic. There are 4 file systems dumped, all of them on LVM volumes, but the panic has only occurred while dumping the one file system that is a snapshot volume.

The panic trace:

EIP is at lvm_find_exception_table [lvm-mod] 0x7b (2.4.21-15.0.3.ELsmp/i686)
eax: f8a52fe8   ebx: f909c220   ecx: f8a7ac60   edx: 00000000
esi: 02dd2280   edi: 00000009   ebp: 00000801   esp: d0581d60
ds: 0068   es: 0068   ss: 0068
Process dump (pid: 16383, stackpage=d0581000)
Stack: 0001ffff f6b85e18 f52eca00 00000008 00000000 d0581de6 f89441fe 00000801
       02dd2280 f52eca00 02dd2280 00000000 02dd0008 02dd2280 f52eca00 00000001
       f894065d d0581de6 d0581de0 02dd2280 f52eca00 c0435280 00000000 c0426200
Call Trace:
[<f89441fe>] lvm_snapshot_remap_block [lvm-mod] 0x7e (0xd0581d78)
[<f894065d>] lvm_map [lvm-mod] 0x20d (0xd0581da0)
[<f8940a57>] lvm_make_request_fn [lvm-mod] 0x17 (0xd0581df8)
[<c01cd29a>] generic_make_request [kernel] 0xea (0xd0581e04)
[<c01cd339>] submit_bh_rsector [kernel] 0x49 (0xd0581e2c)
[<c01642d6>] block_read_full_page [kernel] 0x266 (0xd0581e48)
[<c01459da>] add_to_page_cache_unique [kernel] 0x5a (0xd0581e98)
[<c0145c31>] page_cache_read [kernel] 0xe1 (0xd0581eac)
[<c0168c40>] blkdev_get_block [kernel] 0x0 (0xd0581eb4)
[<c01465f7>] generic_file_readahead [kernel] 0xd7 (0xd0581ed4)
[<c0146b89>] do_generic_file_read [kernel] 0x489 (0xd0581ef0)
[<c0147435>] generic_file_new_read [kernel] 0xc5 (0xd0581f30)
[<c0147270>] file_read_actor [kernel] 0x0 (0xd0581f40)
[<c021e926>] sock_read [kernel] 0x96 (0xd0581f50)
[<c014755f>] generic_file_read [kernel] 0x2f (0xd0581f7c)
[<c01607b7>] sys_read [kernel] 0x97 (0xd0581f94)

Version-Release number of selected component (if applicable):
2.4.21-15.0.3.ELsmp

How reproducible:
Didn't try

Additional info:
The system died while running the "dump" command, in lvm_find_exception_table, during a list entry removal: the "next" field of one entry in the hash_table chain was 0. The faulting store at 10ab is lvm_find_exception_table+0x7b (108c is +0x5c), which matches the EIP in the oops, and edx is 00000000 in the register dump.

Source of lvm_find_exception_table, up to the list_del/list_add pair:

    lvm_find_exception_table(kdev_t org_dev, unsigned long org_start, lv_t * lv)
    {
            struct list_head * hash_table = lv->lv_snapshot_hash_table, * next;
            unsigned long mask = lv->lv_snapshot_hash_mask;
            int chunk_size = lv->lv_chunk_size;
            lv_block_exception_t * ret;
            int i = 0;

            if (!hash_table)
                    BUG();
            hash_table = &hash_table[hashfn(org_dev, org_start, mask, chunk_size)];
            ret = NULL;
            for (next = hash_table->next; next != hash_table; next = next->next)
            {
                    lv_block_exception_t * exception;

                    exception = list_entry(next, lv_block_exception_t, hash);
                    if (exception->rsector_org == org_start &&
                        exception->rdev_org == org_dev)
                    {
                            if (i)
                            {
                                    /* fun, isn't it? :) */
    #ifdef list_move
                                    list_move(next, hash_table);
    #else
                                    list_del(next);
                                    list_add(next, hash_table);

Disassembly of the inlined list_del/list_add, interleaved with the list source:

    static inline void __list_del(struct list_head *prev, struct list_head *next)
    {
        109e:  8b 41 04              mov    0x4(%ecx),%eax
        10a1:  8b 11                 mov    (%ecx),%edx
            next->prev = prev;
            prev->next = next;
        10a3:  89 10                 mov    %edx,(%eax)
    }

    /**
     * list_del - deletes entry from list.
     * @entry: the element to delete from the list.
     * Note: list_empty on entry does not return true after this, the entry is in an undefined state.
     */
    static inline void list_del(struct list_head *entry)
    {
            __list_del(entry->prev, entry->next);
            entry->next = (void *) 0;
        10a5:  c7 01 00 00 00 00     movl   $0x0,(%ecx)
        10ab:  89 42 04              mov    %eax,0x4(%edx)    <----- edx is 0
        10ae:  8b 03                 mov    (%ebx),%eax
        10b0:  89 48 04              mov    %ecx,0x4(%eax)
        10b3:  89 01                 mov    %eax,(%ecx)
        10b5:  89 59 04              mov    %ebx,0x4(%ecx)
        10b8:  89 0b                 mov    %ecx,(%ebx)
        10ba:  89 c8                 mov    %ecx,%eax
        10bc:  eb ce                 jmp    108c <lvm_find_exception_table+0x5c>
        10be:  0f 0b                 ud2a
        10c0:  75 00                 jne    10c2 <lvm_find_exception_table+0x92>
        10c2:  0d 00 00 00 eb        or     $0xeb000000,%eax
        10c7:  97                    xchg   %eax,%edi
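As a rough illustration of this failure mode, the following userspace C sketch models the list_del/list_add pair from the listing above. It is not the kernel code: list_init, list_del_checked, move_to_front_serial and move_to_front_interleaved are hypothetical names, and the interleaving of two callers on the same entry is only an assumption about how entry->next could be 0 when list_del runs. Nothing in this report confirms that as the root cause.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the list_head operations quoted above (not the kernel header). */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert new_entry right after head, as list_add() does. */
static void list_add(struct list_head *new_entry, struct list_head *head)
{
    struct list_head *first;

    assert(head != NULL && head->next != NULL);
    first = head->next;
    first->prev = new_entry;
    new_entry->next = first;
    new_entry->prev = head;
    head->next = new_entry;
}

/*
 * Same steps as the quoted list_del(), but it reports a NULL link instead of
 * dereferencing it. Returns 0 on success, -1 where the kernel would oops.
 */
static int list_del_checked(struct list_head *entry)
{
    struct list_head *prev = entry->prev;
    struct list_head *next = entry->next;

    if (next == NULL || prev == NULL)
        return -1;          /* next->prev = prev would fault here */
    next->prev = prev;
    prev->next = next;
    entry->next = NULL;     /* entry is now in an undefined state */
    return 0;
}

/* Move-to-front done by a single caller with nothing in between. */
static int move_to_front_serial(void)
{
    struct list_head bucket, a, b;

    list_init(&bucket);
    list_add(&a, &bucket);
    list_add(&b, &bucket);          /* bucket -> b -> a */
    if (list_del_checked(&a) != 0)
        return -1;
    list_add(&a, &bucket);          /* bucket -> a -> b */
    return bucket.next == &a ? 0 : -1;
}

/*
 * Hypothetical interleaving (an assumption, see the text): caller A has
 * unlinked the entry but not yet re-added it when caller B tries to unlink
 * the same entry.
 */
static int move_to_front_interleaved(void)
{
    struct list_head bucket, a, b;
    int second;

    list_init(&bucket);
    list_add(&a, &bucket);
    list_add(&b, &bucket);          /* bucket -> b -> a */
    (void)list_del_checked(&a);     /* caller A: a.next is now NULL */
    second = list_del_checked(&a);  /* caller B: sees a.next == NULL */
    list_add(&a, &bucket);          /* caller A finishes its move */
    return second;
}
```

Under these assumptions move_to_front_serial() returns 0, while move_to_front_interleaved() returns -1 at the point where the real list_del would store through a NULL next pointer (the instruction at 10ab above).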
1. The vmcore is located at (logged in with the anonymous ftp account): ftp://enterprise.redhat.com/incoming/vmcore-361320.gz
2. Found an identical issue at: http://www.spinics.net/lists/lvm/msg11750.html
3. Just built a fresh test kernel with option #1, as discussed in the link above, for the customer to test (as a workaround).
Adding alias to IT ticket of another customer having the same problem.
A fix for this problem has just been committed to the RHEL3 U5 patch pool this afternoon (in kernel version 2.4.21-27.9.EL).
*** Bug 152959 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-294.html