Bug 549771 - brcm_iscsiuio daemon segfaults after boot
Summary: brcm_iscsiuio daemon segfaults after boot
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: iscsi-initiator-utils
Version: 5.4
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Chris Leech
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-12-22 16:17 UTC by Phillip Sorensen
Modified: 2014-10-13 17:41 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-02 13:07:31 UTC
Target Upstream Version:
Embargoed:
pas37: needinfo-


Attachments
/var/log/messages file (74.55 KB, application/octet-stream)
2010-01-07 19:22 UTC, Phillip Sorensen
/var/log/brcm_iscsi.log file (798.44 KB, application/octet-stream)
2010-01-07 19:23 UTC, Phillip Sorensen
Output of the 'ps -e' command after boot (3.47 KB, application/octet-stream)
2010-01-14 17:20 UTC, Phillip Sorensen
Result of dmesg command after boot (24.36 KB, application/octet-stream)
2010-01-14 17:21 UTC, Phillip Sorensen

Description Phillip Sorensen 2009-12-22 16:17:43 UTC
While debugging Bug #545999, I noticed that the brcm_iscsiuio daemon used by the bnx2i interface had disappeared and that a segfault was reported in /var/log/messages.


Version-Release number of selected component (if applicable):

iscsi-initiator-utils-6.2.0.871-0.10.el5
iscsi-initiator-utils-6.2.0.871-0.12.el5


Additional info:

These are the relevant entries from /var/log/messages. The machine was rebooted multiple times while debugging the SELinux issue.

Dec 16 22:53:43 iscsi-test kernel: brcm_iscsiuio[2099]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 0000000040d6fec8 error 4
Dec 16 22:58:07 iscsi-test kernel: brcm_iscsiuio[2079]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 0000000043b00ec8 error 4
Dec 16 23:02:47 iscsi-test kernel: brcm_iscsiuio[2118]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 00000000434c3ec8 error 4
Dec 16 23:17:53 iscsi-test kernel: brcm_iscsiuio[2096]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 00000000444a5ec8 error 4
Dec 16 23:24:55 iscsi-test kernel: brcm_iscsiuio[2096]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 00000000414caec8 error 4
Dec 16 23:41:49 iscsi-test kernel: brcm_iscsiuio[2096]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 0000000041360ec8 error 4
Dec 17 00:21:10 iscsi-test kernel: brcm_iscsiuio[2100]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 000000004172bec8 error 4
Dec 17 00:34:15 iscsi-test kernel: brcm_iscsiuio[2094]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 00000000430c7ec8 error 4
Dec 17 01:04:50 iscsi-test kernel: brcm_iscsiuio[2106]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 00000000431e3ec8 error 4
Dec 17 17:47:48 iscsi-test kernel: brcm_iscsiuio[2102]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 0000000043cbfec8 error 4
Dec 17 18:00:22 iscsi-test kernel: brcm_iscsiuio[2441]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 0000000046616ec8 error 4
Dec 17 18:33:27 localhost kernel: brcm_iscsiuio[2340]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 0000000044653ec8 error 4
Dec 18 10:42:07 localhost kernel: brcm_iscsiuio[4865]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 0000000045297ec8 error 4
Dec 18 10:43:23 localhost kernel: brcm_iscsiuio[4972]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 00000000457fdec8 error 4
Dec 18 10:50:55 localhost kernel: brcm_iscsiuio[2529]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 0000000045e4dec8 error 4
Dec 18 10:59:37 iscsi-test kernel: brcm_iscsiuio[2451]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 0000000040b14ec8 error 4
Dec 18 12:09:32 iscsi-test kernel: brcm_iscsiuio[2102]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 0000000043794ec8 error 4
Dec 21 11:27:33 iscsi-test kernel: brcm_iscsiuio[1881]: segfault at 0000000000000000 rip 000000000040ea01 rsp 00007fff2b5efe60 error 6
Dec 21 13:42:11 iscsi-test kernel: brcm_iscsiuio[2100]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 000000004184cec8 error 4
Dec 21 14:43:46 iscsi-test kernel: brcm_iscsiuio[1906]: segfault at 0000000000000000 rip 000000000040eda1 rsp 00007fff319ad9d0 error 6
Dec 21 14:56:28 iscsi-test kernel: brcm_iscsiuio[2124]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 00000000428cfec8 error 4
Dec 21 15:12:41 iscsi-test kernel: brcm_iscsiuio[1906]: segfault at 0000000000000000 rip 000000000040eda1 rsp 00007fff17170bf0 error 6
Dec 21 15:31:58 iscsi-test kernel: brcm_iscsiuio[2120]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 0000000042ec6ec8 error 4
Dec 21 15:42:27 iscsi-test kernel: brcm_iscsiuio[2100]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 00000000431e6ec8 error 4
Dec 21 16:10:51 iscsi-test kernel: brcm_iscsiuio[2098]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 00000000439c1ec8 error 4
Dec 21 16:19:44 iscsi-test kernel: brcm_iscsiuio[2119]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 0000000043a06ec8 error 4
Dec 22 10:58:01 iscsi-test kernel: brcm_iscsiuio[2128]: segfault at 00002aaaaaadb000 rip 000000383b27bf0b rsp 0000000041345ec8 error 4
Dec 22 11:02:27 iscsi-test kernel: brcm_iscsiuio[2994]: segfault at 00002aaaaaac4000 rip 000000383b27bf0b rsp 0000000043a14ec8 error 4
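The entries fall into two groups. In 25 of the 28, the fault is error 4 at rip 0x383b27bf0b, which the backtrace in Comment 4 identifies as glibc's memcpy, and the faulting address is one of the page-aligned addresses 0x2aaaaaac4000 or 0x2aaaaaadb000. The remaining 3 are error 6 at address 0, from rips in the low 0x40xxxx range where the brcm_iscsiuio frames of that backtrace also sit.

The Python 3 sketch below tallies such lines and decodes the x86 page-fault error code:
- bit 0: protection violation vs. page not present
- bit 1: write vs. read
- bit 2: user vs. kernel mode

The helper names are hypothetical, not part of any existing tool. The regex assumes the single-line message format shown above.

```python
import re
from collections import Counter

# Matches the kernel's user-space segfault report in the single-line
# format shown above (hypothetical helper, not an existing tool).
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>\d+)"
)


def decode_error(code):
    """Decode the low three bits of the x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def summarize(lines):
    """Count segfaults per (rip, fault address, error code) triple."""
    counts = Counter()
    for line in lines:
        m = SEGFAULT_RE.search(line)
        if m:
            key = (int(m["rip"], 16), int(m["addr"], 16), int(m["err"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Dec 16 22:53:43 iscsi-test kernel: brcm_iscsiuio[2099]: segfault at "
        "00002aaaaaac4000 rip 000000383b27bf0b rsp 0000000040d6fec8 error 4",
        "Dec 21 11:27:33 iscsi-test kernel: brcm_iscsiuio[1881]: segfault at "
        "0000000000000000 rip 000000000040ea01 rsp 00007fff2b5efe60 error 6",
    ]
    for (rip, addr, err), n in sorted(summarize(sample).items()):
        print(f"rip={rip:#x} addr={addr:#x} x{n}: {decode_error(err)}")
```

Decoded this way, error 4 is a user-mode read of a page that is not present. Error 6 is a user-mode write to a page that is not present, here at address 0, so a NULL pointer write. The log therefore appears to show two distinct failure modes rather than one.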

Comment 1 Phillip Sorensen 2009-12-22 16:20:24 UTC
The correct bug number for the SELinux problem is Bug #548599.

Comment 2 Mike Christie 2009-12-23 02:05:55 UTC
Adding broadcom's Ben Li.

Comment 3 Benjamin Li 2009-12-23 02:34:44 UTC
Hi Phillip,

Could you provide more context on what was happening when the segfault occurred?  Also, would you have the core file or a stack trace?  Thanks again.

-Ben

Comment 4 Phillip Sorensen 2010-01-04 20:54:28 UTC
I have not been able to narrow down the context much.  It seems to happen whether I reboot and never log in, or log in and run various programs.  I installed the debuginfo packages for iscsi-initiator-utils and glibc and then attached gdb to the daemon.

I got the following backtrace (bt full):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x4458b940 (LWP 2114)]
0x000000383b27bf0b in memcpy () from /lib64/libc.so.6
(gdb) bt full
#0  0x000000383b27bf0b in memcpy () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000000040d75a in cnic_read ()
No symbol table info available.
#2  0x0000000000405adb in process_packets ()
No symbol table info available.
#3  0x0000000000406242 in nic_loop ()
No symbol table info available.
#4  0x000000383c206617 in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
	__res = <value optimized out>
	pd = Could not find the frame base for "start_thread".
	unwind_buf = Could not find the frame base for "start_thread".
	not_first_call = <value optimized out>
	robust = <value optimized out>
#5  0x000000383b2d3c2d in clone () from /lib64/libc.so.6
	fstab_state = {fs_fp = 0x0, fs_buffer = 0x0, fs_mntres = {
    mnt_fsname = 0x0, mnt_dir = 0x0, mnt_type = 0x0, mnt_opts = 0x0, 
    mnt_freq = 0, mnt_passno = 0}, fs_ret = {fs_spec = 0x0, fs_file = 0x0, 
    fs_vfstype = 0x0, fs_mntops = 0x0, fs_type = 0x0, fs_freq = 0, 
    fs_passno = 0}}
	__elf_set___libc_subfreeres_element_fstab_free__ = (
    const void *) 0x383b30a9e0



I generated a core file with the gdb command generate-core-file and am attaching it, although I don't know how usable it is. gdb printed the following warnings when I ran the command:

warning: Memory read failed for corefile section, 77824 bytes at 0x00002aaaaaaad000.
warning: Memory read failed for corefile section, 77824 bytes at 0x00002aaaaaac4000.
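The sizes in these warnings can be checked with a little arithmetic. 77824 bytes is 0x13000, or 19 pages of 4 KiB. The second unreadable section starts exactly at 0x2aaaaaac4000, which is also one of the two fault addresses in the kernel log.

The Python 3 sketch below performs that check, assuming 4 KiB pages. Mapping addresses can differ from boot to boot, so a match like this is suggestive, not proof of which region memcpy was reading. The helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: standard 4 KiB pages on x86_64


def region(start, length):
    """Return the half-open [start, end) range of a core-file section."""
    return start, start + length


def page_aligned(addr):
    """True if addr falls exactly on a page boundary."""
    return addr % PAGE_SIZE == 0


# Values copied from the gdb warnings and the kernel log in this bug.
unreadable = [region(0x00002AAAAAAAD000, 77824), region(0x00002AAAAAAC4000, 77824)]
fault_addrs = [0x00002AAAAAAC4000, 0x00002AAAAAADB000]

if __name__ == "__main__":
    print(f"77824 bytes = {77824:#x} = {77824 // PAGE_SIZE} pages")
    for addr in fault_addrs:
        hits = [f"{s:#x}-{e:#x}" for s, e in unreadable if s <= addr < e]
        print(f"{addr:#x}: page-aligned={page_aligned(addr)}, "
              f"in unreadable section: {hits or 'none'}")
```

Both fault addresses are page-aligned. Only 0x2aaaaaac4000 lies inside a section gdb could not read. The section ends at 0x2aaaaaad7000, so the other fault address, 0x2aaaaaadb000, lies in neither section.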

Comment 5 Phillip Sorensen 2010-01-04 21:07:03 UTC
The core file is too big to attach.  It can be downloaded from http://staff.chess.cornell.edu/~sorensen/core.1899

Comment 6 Benjamin Li 2010-01-06 18:46:08 UTC
Unfortunately, gdb could not make use of the core file provided.  I think we will have to debug this via the system logs and the brcm_iscsiuio logs.  Phillip, could you also attach the /var/log/messages* and /var/log/brcm_iscsiuio log files?  Thanks again.

Comment 7 Phillip Sorensen 2010-01-07 19:22:21 UTC
Created attachment 382315 [details]
/var/log/messages file

Comment 8 Phillip Sorensen 2010-01-07 19:23:35 UTC
Created attachment 382316 [details]
/var/log/brcm_iscsi.log file

Attaching /var/log/messages and /var/log/brcm_iscsi.log

Comment 9 Benjamin Li 2010-01-08 00:49:40 UTC
Also added Emory to see whether he has seen anything like this during his testing.

Comment 10 Benjamin Li 2010-01-08 00:55:21 UTC
After looking through the /var/log/messages and /var/log/brcm_iscsi.log files, I didn't see anything suspicious.  However, I did notice that you were using an older version of brcm_iscsiuio.  The logs show version 0.4.3, which must mean you are using iscsi-initiator-utils-6.2.0.871-0.10.el5.

If you get a chance, could you try iscsi-initiator-utils-6.2.0.871-0.12.el5?  There were a number of bug fixes between 0.4.3 and 0.4.8.  Some of them concern resource allocation and cleanup.

Thanks again.

Comment 11 Phillip Sorensen 2010-01-08 15:43:36 UTC
I am seeing the same thing with the updated iscsi-initiator-utils-6.2.0.871-0.12.el5.  The backtrace shows the same call sequence.

Comment 12 Benjamin Li 2010-01-12 00:10:28 UTC
Thanks, Phillip, for trying the later version of iscsi-initiator-utils.  However, I think we will need to reproduce this problem in the Broadcom lab to better understand what is going on.  Could you describe your test machine configuration (number of test machines, RHEL configuration, iSCSI iface files, etc.) so that we can mimic it in our lab?  Could you also provide the steps that led to this segfault (daemons running, commands executed)?

Emory, do you have a machine in the lab where we can try Phillip's configuration?

Thanks again.

Comment 13 Gideon Naim 2010-01-14 14:27:07 UTC
Phillip,

As Ben wrote, can you please provide the exact details he requested?

Thanks,
Gidi

Comment 14 Phillip Sorensen 2010-01-14 17:19:24 UTC
My initial test machine is a Cybertron-built SuperMicro system based on the PDSML+ motherboard with an Intel X3220 processor. We are using the HP NC382T card for iSCSI.  The install is standard RHEL 5.4 with iscsi-initiator-utils-6.2.0.871-0.12.el5.  I am connecting to a RHEL5 host running scsi-target-utils-0.0-0.20070620snap.  I will attach the output of ps -e and dmesg after reboot and login.

The network settings are:

::::::::::::::
/etc/sysconfig/network-scripts/ifcfg-eth2
::::::::::::::
# Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express
DEVICE=eth2
BOOTPROTO=dhcp
HWADDR=18:A9:05:78:B7:FC
ONBOOT=yes
::::::::::::::
/etc/sysconfig/network-scripts/ifcfg-eth3
::::::::::::::
# Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express
DEVICE=eth3
BOOTPROTO=static
HWADDR=18:A9:05:78:B7:FE
IPADDR=192.168.182.132
NETMASK=255.255.255.0
ONBOOT=yes
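For anyone reproducing this setup, note that the ifcfg files are simple KEY=VALUE lists. Below is a minimal Python 3 parser. It assumes nothing beyond stripping surrounding quotes, which is a simplification, because the init scripts source real ifcfg files as shell. The parser can confirm, for example, that eth3's static address puts it on 192.168.182.0/24, the same subnet as the 192.168.182.133 address of the second bnx2i iface. The helper names are hypothetical.

```python
import ipaddress

# ifcfg-eth3 exactly as posted in this comment.
ETH3 = """\
# Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express
DEVICE=eth3
BOOTPROTO=static
HWADDR=18:A9:05:78:B7:FE
IPADDR=192.168.182.132
NETMASK=255.255.255.0
ONBOOT=yes
"""


def parse_ifcfg(text):
    """Parse a simple ifcfg-* file: KEY=VALUE lines, '#' comments skipped.

    Simplification: no shell expansion, only surrounding quotes removed."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def static_network(cfg):
    """Return the IPv4 interface for a static ifcfg, or None otherwise."""
    if cfg.get("BOOTPROTO") != "static":
        return None
    return ipaddress.IPv4Interface(f"{cfg['IPADDR']}/{cfg['NETMASK']}")


if __name__ == "__main__":
    iface = static_network(parse_ifcfg(ETH3))
    print(iface.network, ipaddress.IPv4Address("192.168.182.133") in iface.network)
```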


The iface files look like this:

::::::::::::::
ifaces/bnx2i.18:a9:05:78:b7:fd
::::::::::::::
# BEGIN RECORD 2.0-871
iface.iscsi_ifacename = bnx2i.18:a9:05:78:b7:fd
iface.ipaddress = 128.84.182.243
iface.hwaddress = 18:a9:05:78:b7:fd
iface.transport_name = bnx2i
# END RECORD
::::::::::::::
ifaces/bnx2i.18:a9:05:78:b7:ff
::::::::::::::
# BEGIN RECORD 2.0-871
iface.iscsi_ifacename = bnx2i.18:a9:05:78:b7:ff
iface.ipaddress = 192.168.182.133
iface.hwaddress = 18:a9:05:78:b7:ff
iface.transport_name = bnx2i
# END RECORD
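In this configuration, each bnx2i iface hwaddress is exactly one greater than the HWADDR of a BCM5709 port:
- 18:a9:05:78:b7:fd is one greater than eth2's 18:A9:05:78:B7:FC.
- 18:a9:05:78:b7:ff is one greater than eth3's 18:A9:05:78:B7:FE.

The Python 3 sketch below parses the iface record format and pairs ifaces with ports by that offset. The +1 relationship is only an observation about this report, not an assumed rule for bnx2i hardware, and the helper names are hypothetical.

```python
# Second iface record exactly as posted in this comment.
RECORD_FF = """\
# BEGIN RECORD 2.0-871
iface.iscsi_ifacename = bnx2i.18:a9:05:78:b7:ff
iface.ipaddress = 192.168.182.133
iface.hwaddress = 18:a9:05:78:b7:ff
iface.transport_name = bnx2i
# END RECORD
"""

# HWADDR values from ifcfg-eth2 and ifcfg-eth3 in this comment.
NETDEVS = {"eth2": "18:A9:05:78:B7:FC", "eth3": "18:A9:05:78:B7:FE"}


def parse_iface_record(text):
    """Collect 'key = value' pairs; '#' lines (BEGIN/END RECORD) are skipped."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(" = ")
        if sep:
            fields[key] = value
    return fields


def mac_to_int(mac):
    """Convert a colon-separated MAC address (any case) to an integer."""
    return int(mac.replace(":", ""), 16)


def offset_by_one_netdev(iface_hw, netdevs):
    """Return the netdev whose HWADDR is one less than iface_hw, if any.

    The +1 offset is only what this particular report shows."""
    target = mac_to_int(iface_hw) - 1
    for name, hw in netdevs.items():
        if mac_to_int(hw) == target:
            return name
    return None


if __name__ == "__main__":
    rec = parse_iface_record(RECORD_FF)
    print(rec["iface.iscsi_ifacename"], "->",
          offset_by_one_netdev(rec["iface.hwaddress"], NETDEVS))
```

With the data above, bnx2i.18:a9:05:78:b7:ff pairs with eth3, and bnx2i.18:a9:05:78:b7:fd pairs with eth2. eth2's L2 interface uses DHCP, while its iface is configured with 128.84.182.243.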

The segfault occurs under different conditions.  All I have to do is boot and wait.  It happens without my even logging in.  Sometimes it seems to happen right after boot; sometimes it takes 10 or 15 minutes.
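Because the crash can come anywhere from right after boot to 10 or 15 minutes later, it may help to record exactly when the daemon disappears. Below is a minimal polling sketch. It scans /proc/<pid>/stat for a process named brcm_iscsiuio and reports the system uptime from /proc/uptime once that process is gone.

It assumes Python 3, which a stock RHEL 5 system does not provide (it ships Python 2.4), so treat it as an illustration to adapt. The function names are hypothetical.

```python
import os
import time

DAEMON = "brcm_iscsiuio"


def comm_from_stat(stat_line):
    """Extract the command name from a /proc/<pid>/stat line.

    The name sits between the first '(' and the last ')' and may itself
    contain spaces or parentheses, hence rfind for the closing one."""
    return stat_line[stat_line.find("(") + 1 : stat_line.rfind(")")]


def find_pids(name, proc="/proc"):
    """Return the PIDs whose command name equals name."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                if comm_from_stat(f.read()) == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while we were scanning
    return pids


def watch(name=DAEMON, interval=5.0):
    """Poll until name disappears; return seconds since boot at that point."""
    while find_pids(name):
        time.sleep(interval)
    with open("/proc/uptime") as f:
        return float(f.read().split()[0])


if __name__ == "__main__":
    if find_pids(DAEMON):
        print(f"{DAEMON} disappeared at uptime {watch():.0f} s")
    else:
        print(f"{DAEMON} is not running")
```

Comparing the reported uptime with the matching segfault entry in /var/log/messages could show whether the crash lines up with other events during boot.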



Yesterday I set up one of our Dell R210 production servers (a Xeon 3440-based system with the Dell BCM5709 card).  I have not done much testing with it yet, but initial testing shows a segfault.  I still need to check whether it looks the same.

Let me know if there are additional details.

Comment 15 Phillip Sorensen 2010-01-14 17:20:38 UTC
Created attachment 383853 [details]
Output of the 'ps -e' command after boot

Comment 16 Phillip Sorensen 2010-01-14 17:21:28 UTC
Created attachment 383876 [details]
Result of dmesg command after boot

Comment 17 Benjamin Li 2010-01-19 00:19:54 UTC
Hi Phillip,

I was wondering whether Emory and I could get access to Bugzilla

https://bugzilla.redhat.com/show_bug.cgi?id=545999

to see if there is any additional setup we would need to make the brcm_iscsiuio daemon segfault.

Thanks again.

-Ben

Comment 18 Michael Chan 2010-01-19 00:25:02 UTC
Ben, it's actually bug #548599.

Comment 20 Ludek Smid 2010-03-11 12:19:33 UTC
Since it is too late to address this issue in RHEL 5.5, it has been proposed for RHEL 5.6.  Contact your support representative if you need to escalate this issue.

Comment 23 RHEL Program Management 2010-09-02 03:56:56 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 26 RHEL Program Management 2014-03-07 12:43:58 UTC
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 27 RHEL Program Management 2014-06-02 13:07:31 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

