Bug 113702
Summary: | LTC5820-Segmentation fault while lvremove --autobackup y /dev/test/snap001 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | IBM Bug Proxy <bugproxy> |
Component: | lvm | Assignee: | Heinz Mauelshagen <heinzm> |
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.0 | CC: | sct |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | powerpc | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2004-07-02 19:51:54 UTC | Type: | --- |
Description
IBM Bug Proxy
2004-01-16 17:46:42 UTC
----- Additional Comments From zhouwu.com 2004-01-16 22:16 -------
Hi, Thinh/Glen

I am doing some debugging on this defect. I found that after the call

    while ( ( c = getopt_long ( argc, argv, options, long_options, NULL)) != EOF)

at line 84 of lvremove.c, the value of optarg is 0, which triggers the segmentation fault at line 93:

    if ( strcmp ( optarg, "y") == 0);

Following is the debugging process:

    [root@plinuxt9 tools]# gdb -q ./lvremove
    Using host libthread_db library "/lib/tls/libthread_db.so.1".
    (gdb) set args --autobackup y /dev/test/snap001
    (gdb) b main
    Breakpoint 1 at 0x10001da0: file lvremove.c, line 57.
    (gdb) r
    Starting program: /usr/src/redhat/BUILD/LVM/1.0.3/tools/lvremove --autobackup y /dev/test/snap001

    Breakpoint 1, main (argc=4, argv=0xffffe9b4) at lvremove.c:57
    57      int c = 0;
    (gdb) l
    52      int opt_d = 0;
    53      #endif
    54
    55      int main ( int argc, char **argv)
    56      {
    57      int c = 0;
    58      int c1 = 0;
    59      int opt_A = 1;
    60      int opt_A_set = 0;
    61      int opt_f = 0;
    (gdb)
    ......
    ......
    (gdb)
    82      LVMTAB_CHECK;
    83
    84      while ( ( c = getopt_long ( argc, argv, options,
    85      long_options, NULL)) != EOF) {
    86      switch ( c) {
    87      case 'A':
    88      opt_A_set++;
    89      if ( opt_A > 1) {
    90      fprintf ( stderr, "%s -- A option already given ", cmd);
    91      return LVM_EINVALID_CMD_LINE;
    (gdb) b 84
    Breakpoint 2 at 0x10001f60: file lvremove.c, line 84.
    (gdb) c
    Continuing.

    Breakpoint 2, main (argc=4, argv=0xffffe9b4) at lvremove.c:84
    84      while ( ( c = getopt_long ( argc, argv, options,
    (gdb) n
    86      switch ( c) {
    (gdb) n
    88      opt_A_set++;
    (gdb) n
    89      if ( opt_A > 1) {
    (gdb) n
    93      if ( strcmp ( optarg, "y") == 0);
    (gdb) n

    Program received signal SIGSEGV, Segmentation fault.
    0x0fec6a60 in strcmp () from /lib/tls/libc.so.6
    (gdb) p optarg
    $1 = 0x0
    (gdb) b 93
    Breakpoint 3 at 0x10002020: file lvremove.c, line 93.
    (gdb) r
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /usr/src/redhat/BUILD/LVM/1.0.3/tools/lvremove --autobackup y /dev/test/snap001

    Breakpoint 1, main (argc=4, argv=0xffffe9b4) at lvremove.c:57
    57      int c = 0;
    (gdb) c
    Continuing.

    Breakpoint 2, main (argc=4, argv=0xffffe9b4) at lvremove.c:84
    84      while ( ( c = getopt_long ( argc, argv, options,
    (gdb) c
    Continuing.

    Breakpoint 3, main (argc=4, argv=0xffffe9b4) at lvremove.c:93
    93      if ( strcmp ( optarg, "y") == 0);
    (gdb) p optarg
    $2 = 0x0
    (gdb)

----- Additional Comments From zhouwu.com 2004-01-17 00:03 -------
Hi, Thinh/Glen

Investigating the source of lvremove.c, I found a mismatch between the source and "man lvremove": lvremove.c defines the "autobackup" option as taking no argument, but "man lvremove" shows that it needs an argument. Following is the definition of long_options in lvremove.c (starting from line 66):

    struct option long_options[] = {
        { "autobackup", no_argument, NULL, 'A'},
        DEBUG_LONG_OPTION
        { "force", no_argument, NULL, 'f'},
        { "help", no_argument, NULL, 'h'},
        { "verbose", no_argument, NULL, 'v'},
        { NULL, 0, NULL, 0}
    };

But in lvrename.c, which also has an "autobackup" option, the option is defined as required_argument:

    struct option long_options[] = {
        { "autobackup", required_argument, NULL, 'A'},
        DEBUG_LONG_OPTION
        { "help", no_argument, NULL, 'h'},
        { "verbose", no_argument, NULL, 'v'},
        { "version", no_argument, NULL, 22},
        { NULL, 0, NULL, 0}
    };

After changing "no_argument" to "required_argument" and recompiling lvremove, snap001 can now be removed:

    [root@plinuxt9 tools]# ./lvremove --autobackup y /dev/test/snap001
    lvremove -- do you really want to remove "/dev/test/snap001"? [y/n]: y
    lvremove -- doing automatic backup of volume group "test"
    lvremove -- logical volume "/dev/test/snap001" successfully removed
    [root@plinuxt9 tools]#

----- Additional Comments From zhouwu.com 2004-01-17 00:28 -------
Oops, the "vfree() nonexistent vm area" message still exists with the above patch applied.
After successfully removing the snapshot LV, "dmesg" still shows the following messages:

    Trying to vfree() nonexistent vm area (d0000000002a9000)
    Trying to vfree() nonexistent vm area (d0000000002a9000)
    Trying to vfree() nonexistent vm area (d0000000002a9000)

----- Additional Comments From gjlynx.com(prefers email via gjohnson.com) 2004-01-17 10:27 -------
Is this project/patch OSSC approved?

----- Additional Comments From zhouwu.com 2004-01-18 21:05 -------
Hi, Glen

This patch has not received OSSC approval yet. It is intended to resolve the segmentation fault in lvremove.c. Could you please help me get OSSC approval, or tell me how to get it? Please advise, thanks very much!

Wu

----- Additional Comments From thinh.com 2004-01-21 16:25 -------
Wu, you are right. It does require an argument. Your patch is good. The vfree() errors are something else that may come from one of the LVM functions or macros being used, but that is a different issue from this bug.

----- Additional Comments From khoa.com 2004-01-23 09:30 -------
I agree that the patch above addresses the root cause of the segmentation fault and that it is separate from the vfree() warning, so in this sense the patch is valid. But I hope that we address the vfree() warning as well. Wu/Thinh - are you going to look into this as well?

----- Additional Comments From zhouwu.com 2004-02-01 20:26 -------
Hi, Khoa/Thinh

Thanks for reviewing this patch, and sorry for replying late; I am just back from the Chinese Spring Festival. If time is available, I am quite willing to look into this defect. Should there be any new findings, I will add my comments as soon as possible.

BTW, I have one question that needs your kind advice. Do I need to get the OSSC approval Glen suggested? And how do I get such an approval? Your advice will be highly appreciated! Thanks again!

Thanks for the effort on this.
The sooner you can get a patch cleared and submitted to Red Hat bugzilla, the better. For now, we have no access to the patch you refer to, so the conversation that we're viewing in our bugzilla lacks a bit of context!

----- Additional Comments From khoa.com 2004-02-06 09:54 -------
I've put this on the RHEL3 QU3 list as a Sev 2.

----- Additional Comments From zhouwu.com 2004-02-06 20:24 -------
The vfree() nonexistent message is printed at line 268 of mm/vmalloc.c. To determine the call trace, I added a "BUG();" after that line. After recompiling the kernel, enabling xmon, and rerunning the reproduction steps above, an additional "lvdisplay -v /dev/test/lv001" panics the kernel, and I get this backtrace in /var/log/messages:

===================== Error message start here ===========================
    Feb 5 05:57:02 plinuxt9 kernel: Trying to vfree() nonexistent vm area (d0000000002bd000)
    Feb 5 05:57:02 plinuxt9 kernel: kernel BUG at vmalloc.c:269!
    Feb 5 05:57:02 plinuxt9 kernel: loop lvm-mod autofs e100 ipt_state iptable_filter iptable_nat ip_conntrack ip_tables sg ext3 jbd sym53c8xx sd_mod scsi_mod
    Feb 5 05:57:02 plinuxt9 kernel: NIP: c000000000092974 XER: 0000000000000000 LR: c000000000092960 REGS: c0000001ef77f4c0 TRAP: 0700 Not tainted
    Feb 5 05:57:02 plinuxt9 kernel: NIP is at .vfree [kernel] 0x104 (2.4.21-9.ELlvm)
    Feb 5 05:57:02 plinuxt9 kernel: MSR: 9000000000089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
    Feb 5 05:57:03 plinuxt9 kernel: TASK = c0000001ef77c000[2130] 'lvdisplay' Last syscall: 54
    Feb 5 05:57:03 plinuxt9 kernel: last math 0000000000000000 CPU: 1
    Feb 5 05:57:03 plinuxt9 kernel: GPR00: 0000000000000000 c0000001ef77f740 c0000000005f2000 000000000000001d
    Feb 5 05:57:03 plinuxt9 kernel: GPR04: 0000000000000003 c0000001fa768000 0000000000000000 c0000000005853b0
    Feb 5 05:57:03 plinuxt9 kernel: GPR08: c0000000005853a8 c0000000003b0a20 c000000000582410 c000000000582410
    Feb 5 05:57:03 plinuxt9 kernel: GPR12: c00000000068feb3 c0000001ef77c000 00000000ffffe7f8 0000000000000000
    Feb 5 05:57:03 plinuxt9 kernel: GPR16: 000000000ff97fb8 0000000000000000 0000000000000000 0000000000000000
    Feb 5 05:57:03 plinuxt9 kernel: GPR20: 0000000000000001 c00000000050eec0 b000000000009032 0000000000000003
    Feb 5 05:57:03 plinuxt9 kernel: GPR24: c0000001ef77fcd0 0000000000000000 000000000ffec9a4 c0000001ef77f8c0
    Feb 5 05:57:03 plinuxt9 kernel: GPR28: 00000000ffffe5e8 d0000000002bd000 c0000000003e4b58 0000000000000000
    Feb 5 05:57:03 plinuxt9 kernel: Call Trace:
    Feb 5 05:57:03 plinuxt9 kernel: [<c000000000092960>] .vfree [kernel] 0xf0
    Feb 5 05:57:03 plinuxt9 kernel: [<c000000000032044>] .put_lv_t [kernel] 0x50
    Feb 5 05:57:04 plinuxt9 kernel: [<c0000000000328fc>] .do_lvm_ioctl [kernel] 0x20c
    Feb 5 05:57:04 plinuxt9 kernel: [<c000000000034a44>] .sys32_ioctl [kernel] 0x128
=============================================================================

So the error is in function put_lv_t of arch/ppc64/kernel/ioctl32.c:

    static void put_lv_t(lv_t *l)
    {
        if (l->lv_current_pe) vfree(l->lv_current_pe);
        if (l->lv_block_exception) vfree(l->lv_block_exception);
        kfree(l);
    }

----- Additional Comments From zhouwu.com 2004-02-06 20:40 -------
So I continued to investigate where l->lv_current_pe and l->lv_block_exception are vmalloc'ed. That happens in function get_lv_t of the same ioctl32.c, a file that converts between 32-bit and 64-bit native ioctls. In this file, function do_lvm_ioctl handles the ioctl conversion for LVM. It first gets arg from 32-bit user space and converts it to a 64-bit structure, then calls sys_ioctl to handle the ioctl command:

============ start from line 2600 ======================================
    old_fs = get_fs();
    set_fs (KERNEL_DS);
    err = sys_ioctl (fd, cmd, (unsigned long)karg);
    set_fs (old_fs);

and then converts the resulting "karg" back to a 32-bit structure. The vmalloc is done in function get_lv_t before this sys_ioctl, and the vfree is done after the sys_ioctl.
After adding some printk calls, I found that lv_block_exception was not vmalloc'ed, but after the sys_ioctl it holds a non-zero value, hence the "vfree() nonexistent" error.

----- Additional Comments From zhouwu.com 2004-02-06 21:00 -------
Continuing the investigation into lvm_chr_ioctl of drivers/md/lvm.c, I found that it goes into lvm_do_lv_status_byindex(vg_ptr, arg) when the ioctl "cmd" equals "LV_STATUS_BYINDEX". This is the function:

====================== code of function lvm_do_lv_status_byindex ===========
    /*
     * character device support function logical volume status by index
     */
    static int lvm_do_lv_status_byindex(vg_t *vg_ptr,void *arg)
    {
        lv_status_byindex_req_t lv_status_byindex_req;
        void *saved_ptr1;
        void *saved_ptr2;
        lv_t *lv_ptr;

        if (vg_ptr == NULL) return -ENXIO;
        if (copy_from_user(&lv_status_byindex_req, arg,
                           sizeof(lv_status_byindex_req)) != 0)
            return -EFAULT;
        if (lv_status_byindex_req.lv == NULL)
            return -EINVAL;
        if ( ( lv_ptr = vg_ptr->lv[lv_status_byindex_req.lv_index]) == NULL)
            return -ENXIO;

        /* Save usermode pointers */
        if (copy_from_user(&saved_ptr1, &lv_status_byindex_req.lv->lv_current_pe, sizeof(void*)) != 0)
            return -EFAULT;
        if (copy_from_user(&saved_ptr2, &lv_status_byindex_req.lv->lv_block_exception, sizeof(void*)) != 0)
            return -EFAULT;

        if (copy_to_user(lv_status_byindex_req.lv, lv_ptr, sizeof(lv_t)) != 0)
            return -EFAULT;
        if (saved_ptr1 != NULL) {
            if (copy_to_user(saved_ptr1,
                             lv_ptr->lv_current_pe,
                             lv_ptr->lv_allocated_le *
                             sizeof(pe_t)) != 0)
                return -EFAULT;
        }

        /* Restore usermode pointers */
        if (copy_to_user(&lv_status_byindex_req.lv->lv_current_pe, &saved_ptr1, sizeof(void *)) != 0)
            return -EFAULT;

        return 0;
    } /* lvm_do_lv_status_byindex() */
======================================================================

From the above, we can see that it first saves the lv_current_pe member of lv_status_byindex_req.lv into saved_ptr1 and lv_block_exception into saved_ptr2; it then copies the indexed lv_ptr member of vg_ptr into lv_status_byindex_req.lv; after that it copies saved_ptr1 back into lv_status_byindex_req.lv->lv_current_pe. Judging from the structure of this function, it should also copy saved_ptr2 back into lv_status_byindex_req.lv->lv_block_exception, but there is no code to do this, so the caller's lv_block_exception is left holding whatever pointer value was in the driver's lv_t.

Because I don't clearly know what "lv_current_pe" and "lv_block_exception" are used for, I can only try adding similar copy-back code for lv_block_exception. After adding that copy-back code, recompiling the kernel, and rerunning the reproduction steps above, there is no "vfree() nonexistent" error this time. But I just don't know why. Could anyone help on this? Thanks very much!

BTW, from the changelog at the head of lvm.c:

=======================================================================
     * 28/07/1999 - implemented snapshot logical volumes
     *            - lvm_chr_ioctl
     *            - LV_STATUS_BYINDEX
     *            - LV_STATUS_BYNAME

maybe this could provide some clues.

----- Additional Comments From zhouwu.com 2004-03-24 22:58 -------
This defect still exists in the RHEL3 U2 Beta released on 03/16.

----- Additional Comments From bherren.com(prefers email via benh.com) 2004-05-06 03:20 -------
What is the status here? Are we waiting for Red Hat, or shall one of us try to find a fix?

The code on ppc64 to copy LV metadata in/out is to be found in linux/arch/ppc64/kernel/ioctl32.c rather than in linux/drivers/md/lvm.c. With no access to ppc64 right now, I'll carry on spotting...

----- Additional Comments From zhouwu.com 2004-05-28 05:41 -------
Hello all,

Latest status for BZ #5820 and #5821 from my side:

1. Neither defect is fixed yet in the official RHEL3 U2.
2. BZ #5820 is an application bug; it is fixed by the patch I provided in Comment #9. Judging from the CVS of lvm-1.08, this defect is also fixed there, in the same way as in my patch. It seems that the Red Hat guys can't see the patch on our side; could anyone who has the right privileges put it on Red Hat's side?
3. BZ #5821 is a kernel bug as far as I can tell.
It was found at the same time as this one, so much of the discussion about 5821 also took place here. But to help resolve the problem, I'd like to keep the two separate: the application defect is discussed here, and the kernel defect at 5821. So I will only report briefly here on the kernel defect: I have created a patch which remedies the vfree and kernel panic errors. I am now running the LTP testcases on the patched kernel; as I write this comment, they are running smoothly.

Could anyone tell me what you think about this plan? These two defects have been open for 4 months already, and in fact we could have made some progress on them.

Thanks for reading this long comment :)

- Wu

1. Yes, planned for RHEL U3.
2. Yes, fixed in lvm-1_0_8-2. No further activity needed on our side IMO.
3. Please provide the kernel patch once you're done with your LTP testing and I'll integrate it.

Thanks, Heinz.

----- Additional Comments From zhouwu.com 2004-05-30 22:26 -------
Hello Heinz,

I have attached the patch in LTC BZ# 5821 (RH113704). You can have a look at it there. Please note that the patch has not gone through any review, so please double-check it before integrating.

In the above comment, you mention a new version of the LVM utilities. Would you please tell me where I could get it? The SEGV fault still exists after applying that patch, so I wish to have a look at the new version of LVM. Thanks a lot.

- Wu

----- Additional Comments From khoa.com 2004-06-07 04:37 -------
We need Red Hat to review and accept the patch from Wu. Thanks.

----- Additional Comments From thinh.com 2004-06-28 13:34 -------
Can anyone verify/update this bug? Is it fixed?

----- Additional Comments From zhouwu.com 2004-06-30 22:13 -------
No segv fault with the latest 1.0.8-3 lvm package. It is fixed. Close it. Thanks.