From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.4) Gecko/20050318 Red Hat/1.4.4-1.3.5

Description of problem:
When an iscsi-ls -l is issued to a configuration consisting of EMCPower-4.3.2 and 212 devices spread across two arrays, I get a segmentation fault. After the fault the stack is corrupted, so if service PowerPath stop is performed the segmentation fault will continue to occur. If PowerPath is not started and an iscsi-ls -l is performed against the same 212 devices, it works fine.

Below is an analysis from one of our developers:

May 18 2005 2:20PM Zhimin Jiang:
Two bugs:

1) Array overflow. This is the bug that causes the segmentation fault.

Line 206 of scsi-info.c (in function do_scsi_83_inquiry) should be changed from
    *tmpid = malloc(sizeof(char) * (id_length * 2));
to
    *tmpid = malloc(sizeof(char) * (id_length * 2) + 1);

and line 244 of scsi-info.c (in do_scsi_80_inquiry) should be changed from
    info->page80 = malloc(sizeof(char) * (id_length * 2));
to
    info->page80 = malloc(sizeof(char) * (id_length * 2) + 1);

The above change provides an extra byte of space to accommodate the string terminator ('\0'). Otherwise an overflow will occur.

The segmentation fault happens when iscsi-ls's memory usage reaches a certain amount. With PP, iscsi-ls internally allocates more memory to process PP devices such as /dev/emcpower**, and this triggers the segmentation fault. I am pretty sure that if you add more devices to the host, the command will seg fault even without PP. In fact, I malloc'ed 10KB at the beginning of the command's main function and it caused it to seg fault right away.

2) Memory leak in function _get_devs_from_proc (defined in iscsi-ls.c). This function should release the memory allocated to the local variable 'procbuf' when it exits. So its last statement (line 193 in iscsi-ls.c) should be changed from
    return 1;
to
    if(procbuf) free(procbuf);
    return 1;

One interesting thing was that the segmentation fault also happened when I added the line of code to free 'procbuf' before fixing bug #1. So freeing 'procbuf' will also trigger bug #1. Wondering why the free(procbuf) was missing in the first place.

May 18 2005 2:38PM Zhimin Jiang:
BTW, the iscsi-ls I built on 4/29/2005 is an older version (3.6.2-4). The version that comes with RHEL 3.0 U5 is 3.6.2-7. Version 3.6.2-7 processes /dev/sd?X devices (X is a letter of the alphabet), which are not processed in the older version. This means the new version uses more memory, so the segmentation fault happens, while the old version uses less memory, so the segmentation fault has not been triggered.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. configure 256 LUNs - system disks
2. install EMCpower.LINUX-4.3.2-011
3. perform iscsi-ls -l

Actual Results:
SEGMENTATION FAULT

Expected Results:
You should get a target list and the associated LUNs to each target.

Additional info:
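For illustration only, here is a minimal sketch of the two fixes described in the analysis above. It is not the actual scsi-info.c or iscsi-ls.c code: hex_encode and measure_with_scratch are hypothetical helpers, and the sketch assumes the identifier is stored as two hex digits per raw byte, which the id_length * 2 sizing suggests but the report does not state.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the real do_scsi_83_inquiry code): hex-encode
 * `len` raw bytes into a newly allocated, NUL-terminated string. */
static char *hex_encode(const unsigned char *raw, size_t len)
{
    /* Two hex digits per byte, plus one byte for the terminating '\0'.
     * Allocating only len * 2 would leave no room for the terminator. */
    char *out = malloc(len * 2 + 1);
    size_t i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        /* Each call writes two digits and a '\0' (3 bytes). On the last
         * iteration the '\0' lands in the extra byte allocated above. */
        snprintf(out + i * 2, 3, "%02x", raw[i]);
    }
    out[len * 2] = '\0'; /* also covers the len == 0 case */
    return out;
}

/* Hypothetical stand-in for a function shaped like _get_devs_from_proc:
 * allocate a scratch buffer, use it, and free it before returning. */
static int measure_with_scratch(const char *text, size_t *out_len)
{
    char *procbuf = malloc(strlen(text) + 1);

    if (procbuf == NULL)
        return 0;

    strcpy(procbuf, text);
    *out_len = strlen(procbuf);

    free(procbuf); /* the release the original function was missing */
    return 1;
}
```

Under these assumptions, calling hex_encode on the four bytes 0x60 0x06 0x01 0xab returns the 8-character string "600601ab" in a 9-byte allocation. With only 8 bytes allocated, the final terminator would be written one byte past the end of the buffer, which is the kind of heap corruption that may go unnoticed until the allocation pattern changes.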
Tom, please attach the straces I sent you last week to this BZ. Thank you, Wayne.
Great - I'll get those fixes into the tree. I'm a bit confused by the comment about the stack being corrupted, though. Once iscsi-ls has ended, its stack space has been released. Are you saying the iscsi driver's stack space is getting corrupted by iscsi-ls?
In testing I found that if I removed PowerPath after the segmentation fault occurred, subsequent iscsi-ls -l calls would still yield a segmentation fault. However, if I started with a fresh server without PowerPath running, iscsi-ls -l worked fine in the same configuration without any faults.
Created attachment 114768
iscsi-ls -l trace before PowerPath is started
Created attachment 114769
iscsi-ls -l trace after PowerPath is started
This is an strace of the failure with PowerPath running.
Created attachment 114770
iscsi-ls -l trace after PowerPath is stopped but after the seg fault
This is a trace of the seg fault after PowerPath has been stopped but the server has not been rebooted. iscsi-ls -l will continue to seg fault until the server has been rebooted.
Memory allocation fixes committed to the 3.6 upstream tree.
Patch tested successfully. RHEL 3.0 U6 beta (lk 2.4.21-35) in test to confirm fix.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-548.html
This issue is on Red Hat Engineering's list of planned work items for the upcoming Red Hat Enterprise Linux 3.8 release. Engineering resources have been assigned and barring unforeseen circumstances, Red Hat intends to include this item in the 3.8 release.