Bug 459969

Summary: [NetApp 4.8 bug] OOM gets triggered while discovering a large number of LUNs
Product: Red Hat Enterprise Linux 4
Reporter: Tanvi <tanvi>
Component: udev
Assignee: Harald Hoyer <harald>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Version: 4.7
CC: andriusb, babu.moger, bmarzins, coughlan, cward, dkovalsk, dmair, marting, michael.hagmann, pknirsch, rajashekhar.a, rsarraf, syeghiay, tanvi, xdl-redhat-bugzilla, ykopkova
Target Milestone: rc
Target Release: 4.8
Keywords: OtherQA, ZStream
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-05-18 20:28:46 UTC
Bug Depends On: 487502
Bug Blocks: 445585, 450897, 481277
Attachments: pstree -p output

Description Tanvi 2008-08-25 12:35:33 UTC
Description of problem:

On a RHEL 4.7 box, I mapped 256 LUNs with 4 paths each (256 * 4 = 1024 devices). When I discover the newly mapped LUNs by rescanning the iSCSI sessions, I see an OOM getting triggered on the host. I see the following messages in syslog when I hit this issue -

kernel: oom-killer: gfp_mask=0x1d2
kernel: Mem-info:
kernel: Node 0 DMA per-cpu:
kernel: cpu 0 hot: low 2, high 6, batch 1
kernel: cpu 0 cold: low 0, high 2, batch 1

After hitting this issue, the kernel starts killing processes -

kernel: Out of Memory: Killed process 27088 (pidof).
kernel: rpc.idmapd: page allocation failure. order:0, mode:0x1d2
kernel:
kernel: Call Trace:<ffffffff8015fd61>{__alloc_pages+789} <ffffffff80162b41>{do_page_cache_readahead+209}
kernel:        <ffffffff8015c9f5>{filemap_nopage+338} <ffffffff8016bf93>{do_no_page+1111}
kernel:        <ffffffff80318368>{thread_return+0} <ffffffff8016c388>{handle_mm_fault+373}
kernel:        <ffffffff80140a8c>{__mod_timer+293} <ffffffff801248d0>{do_page_fault+520}
kernel:        <ffffffff80135fc0>{remove_wait_queue+16} <ffffffff801a149f>{sys_epoll_wait+1115}
kernel:        <ffffffff80141e12>{recalc_sigpending+15} <ffffffff80110e1d>{error_exit+0}
kernel:


Version-Release number of selected component (if applicable):

# rpm -qa | grep kernel
kernel-2.6.9-78.EL
kernel-utils-2.4-14.1.117
kernel-smp-2.6.9-78.EL
# rpm -qa | grep udev
udev-039-10.22.el4


How reproducible:

Intermittent (not always, but frequent; can be reproduced)

Steps to Reproduce:

1. Map 256 LUNs with 4 paths each to a RHEL 4.7 box.
2. Discover the mapped LUNs by doing an iSCSI rescan:
	# iscsi-rescan

  
Actual results:

While discovering the devices, the host hits an OOM state and processes get killed.
The host eventually hangs.

Expected results:

All the devices should get configured on the host successfully.


Additional info:

We did an initial root-cause analysis and the following is our finding.
udev in RHEL 4.7 uses the Trivial Database (TDB), which relies on a locking mechanism. When heavily loaded with many requests, we see the following messages in syslog:

Aug 25 16:16:18 lnx200-172 udev[18562]: tdb_brlock failed (fd=3) at offset 652 rw_type=1 lck_type=7
Aug 25 16:16:18 lnx200-172 udev[18562]: tdb_brlock failed (fd=3) at offset 652 rw_type=1 lck_type=7
Aug 25 16:16:27 lnx200-172 kernel: SCSI device sdju: 2097152 512-byte hdwr sectors (1074 MB)
Aug 25 16:16:27 lnx200-172 kernel: SCSI device sdju: 2097152 512-byte hdwr sectors (1074 MB)
Aug 25 16:16:27 lnx200-172 kernel: SCSI device sdju: 2097152 512-byte hdwr sectors (1074 MB)
Aug 25 16:16:30 lnx200-172 udev[13883]: tdb_brlock failed (fd=3) at offset 164 rw_type=1 lck_type=7
Aug 25 16:16:30 lnx200-172 udev[13883]: tdb_brlock failed (fd=3) at offset 164 rw_type=1 lck_type=7
Aug 25 16:16:31 lnx200-172 udev[18562]: tdb_lock failed on list 121 ltype=1 (Interrupted system call)
Aug 25 16:16:31 lnx200-172 udev[18562]: tdb_lock failed on list 121 ltype=1 (Interrupted system call)

The database then ends up in a deadlock and all remaining requests block; they keep consuming memory, eventually driving the OS out of memory.
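
A minimal way to watch this happen, assuming pgrep and awk from procps/gawk are available, is to count the udev event processes and track free memory while the rescan runs:

while :; do
    # one sample line every 5 seconds: time, udev process count, free memory
    printf '%s  udev procs: %s  MemFree kB: %s\n' "$(date +%T)" \
        "$(pgrep -c udev)" \
        "$(awk '/^MemFree:/ {print $2}' /proc/meminfo)"
    sleep 5
done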

Comment 1 RHEL Program Management 2008-09-05 17:25:08 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 16 Harald Hoyer 2009-02-02 11:35:56 UTC
Please test:
http://people.redhat.com/harald/downloads/udev/udev-039-10.25.el4/

Comment 17 Tanvi 2009-02-03 14:07:54 UTC
I tested with 512 devices (256 LUNs * 2 paths). The OOM issue is still present.

Comment 19 Harald Hoyer 2009-02-03 14:29:20 UTC
can you do:

# killall udevd
# export UDEVD_MAX_CHILDS=10
# export UDEVD_MAX_CHILDS_RUNNING=10
# udevd &

and then retrigger 

# iscsi-rescan
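
To confirm the restarted daemon actually picked up the lowered limits, one quick check (illustrative, not part of the original instructions) is to inspect its environment:

# tr '\0' '\n' < /proc/$(pidof udevd)/environ | grep CHILDS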

Comment 20 Tanvi 2009-02-04 11:15:53 UTC
After following the above steps, OOM was not seen, but it took around 3 hours to add 1024 devices.

Comment 21 Harald Hoyer 2009-02-04 12:27:00 UTC
OK, it seems I have to lower the child limits.

Can you try to find out your maximum limit? How much memory does the machine have?

Comment 22 Tanvi 2009-02-04 13:05:08 UTC
Following is the memory information for the machine:

[root@localhost ~]# free
             total       used       free     shared    buffers     cached
Mem:       2055296     520644    1534652          0      35596     127512
-/+ buffers/cache:     357536    1697760
Swap:      4095992      70944    4025048

Comment 23 Harald Hoyer 2009-02-04 13:33:01 UTC
when you set 
udev_log="yes"
in /etc/udev/udev.conf

and do:
# unset UDEVD_MAX_CHILDS
# unset UDEVD_MAX_CHILDS_RUNNING
# killall udevd
# udevd &

you will see in /var/log/messages
Feb  4 14:23:21 rhel4 udevd[3333]: initialize max_childs to 190
Feb  4 14:23:21 rhel4 udevd[3333]: initialize max_childs_running to 16

can you report what udev sets for your system?
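
For example, assuming default syslog settings, the two initialization lines can be pulled out with:

# grep 'initialize max_childs' /var/log/messages | tail -n 2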

Comment 24 Tanvi 2009-02-04 13:49:30 UTC
Following is what I get on my machine

Feb  4 19:19:05 localhost udevd[11761]: initialize max_childs to 629
Feb  4 19:19:05 localhost udevd[11761]: initialize max_childs_running to 24

Comment 25 Harald Hoyer 2009-02-04 13:55:28 UTC
So, I would suggest experimenting with lower UDEVD_MAX_CHILDS values to find an acceptable one that does not OOM (try e.g. 400, 300, 200, ...).

# killall udevd
# export UDEVD_MAX_CHILDS=xyz
# udevd &
 
and then retrigger 

# iscsi-rescan
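
One way to check after each run whether the OOM killer fired (a sketch, assuming the kernel messages land in /var/log/messages):

# grep -c 'oom-killer' /var/log/messages
# dmesg | grep 'Out of Memory' | tail -n 5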

Comment 26 Harald Hoyer 2009-02-04 13:58:43 UTC
oh, and you might disable logging again in /etc/udev/udev.conf

Comment 27 Tanvi 2009-02-05 12:10:37 UTC
I tried various values. With UDEVD_MAX_CHILDS=100, I did not see any OOM, but it still took around 1 hour to add all the SCSI (/dev/sd*) and multipath (/dev/mapper/*) entries for 1024 devices (512 LUNs * 2 paths).

Comment 28 Harald Hoyer 2009-02-05 12:18:39 UTC
oh, wow.. 100 is very low.. is this the highest you can get?

Comment 29 Harald Hoyer 2009-02-05 12:25:10 UTC
I hope you disabled logging, which may have slowed down the process

Comment 30 Tanvi 2009-02-05 12:38:42 UTC
Yes, I did disable the logging. I tried with 600, 500, 400, 300, 200 and 100. Saw OOM for 200. Will try with 150 and will update.

Comment 31 Tanvi 2009-02-05 13:34:16 UTC
OOM triggered with 150 as well.

Comment 32 Harald Hoyer 2009-02-05 14:07:18 UTC
So, I will fix the formula to give you ~100 max_childs. Additionally, I will add a kernel command line parameter to set the maximum value manually.
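
For illustration only (this is not the actual udev formula), a memory-derived limit along these lines would land near 100 on the 2 GB machine above, at roughly one worker per 20 MB of RAM:

# derive a child limit from total RAM -- illustrative sketch, not udev's code
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
max_childs=$(( mem_kb / (20 * 1024) ))
echo "suggested UDEVD_MAX_CHILDS=$max_childs"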

Comment 33 Harald Hoyer 2009-02-06 14:13:31 UTC
please test
http://people.redhat.com/harald/downloads/udev/udev-039-10.26.el4

Comment 34 Tanvi 2009-02-07 07:07:00 UTC
I did not see any OOM if the iscsi service is already running and we simply rescan the sessions after mapping all the LUNs. But there is one scenario in which OOM gets triggered: when all the devices are already discovered and the iscsi service is restarted. Again, if we stop the iscsi service and wait for some time before starting it again, OOM does not happen. Following are the relevant values:

Feb  7 12:14:40 localhost udevd[30513]: initialize max_childs to 99
Feb  7 12:14:40 localhost udevd[30513]: initialize max_childs_running to 24

Do we have to live with the long time taken to discover the devices and the sluggishness of the host during that period?
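
Based on the observation above, a rough workaround sketch for the restart case (assuming the standard iscsi init script) is to pause between stop and start so that outstanding udev events can drain; the settle time here is arbitrary:

# service iscsi stop
# sleep 300
# service iscsi start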

Comment 35 Harald Hoyer 2009-02-09 11:00:23 UTC
Does it help to lower the UDEVD_MAX_CHILDS limit for an iscsi restart?

And yes, I guess, you will have to live with the slow process to prevent OOM.

Comment 36 Ritesh Raj Sarraf 2009-02-09 14:30:19 UTC
(In reply to comment #35)
> Does it help to lower the UDEVD_MAX_CHILDS limit for an iscsi restart?
> 
> And yes, I guess, you will have to live with the slow process to prevent OOM.

That would end up making the OS startup time (1+ hour of device discovery plus fsck time) very large.

Comment 37 Tanvi 2009-02-10 10:17:39 UTC
(In reply to comment #35)
> Does it help to lower the UDEVD_MAX_CHILDS limit for an iscsi restart?
> 
> And yes, I guess, you will have to live with the slow process to prevent OOM.

I decreased the value of UDEVD_MAX_CHILDS to 80. Did not see any OOM for iscsi restart.

Comment 38 Harald Hoyer 2009-02-10 11:05:17 UTC
Last question: if you raise UDEVD_MAX_CHILDS_RUNNING (the other limit), does it get faster without OOM?

Comment 39 Tanvi 2009-02-11 06:05:04 UTC
I raised the value of UDEVD_MAX_CHILDS_RUNNING to 80. The OOM did not happen, but it did not help in decreasing the time either.

Comment 40 Harald Hoyer 2009-02-12 10:26:15 UTC
Ok, hopefully this runs out of the box without parameter tweaking. 
Please test:
http://people.redhat.com/harald/downloads/udev/udev-039-10.27.el4

What also interests me is which processes bring your system down.
Can you provide a task list when the iSCSI rescan is triggered, e.g. with

# pstree -p

Comment 41 Tanvi 2009-02-13 12:52:02 UTC
Created attachment 331825 [details]
pstree -p output

Comment 42 Tanvi 2009-02-13 12:53:20 UTC
No OOM was seen for the iscsi restart. Following are the values which udev sets on the machine:

Feb 13 18:18:56 localhost udevd[6950]: initialize max_childs to 78
Feb 13 18:18:56 localhost udevd[6950]: initialize max_childs_running to 24

Attached is the pstree -p output.

Comment 45 Ritesh Raj Sarraf 2009-02-26 09:58:44 UTC
Harald,

In RHEL4's udev, is this necessary?

if /sbin/lsmod | /bin/grep "^dm_multipath" > /dev/null 2>&1 && \
   [ ${mpath_ctl_fail} -eq 0 ] && \
   [ "${ACTION}" = "add" -a "${DEVPATH:7:3}" != "dm-" ] ; then
        /sbin/multipath -v0 ${DEVNAME} > /dev/null 2>&1
fi

This is part of multipath.dev script.

There was a similar situation in RHEL5.x (Bugzilla #460301). This looks similar to what was described in Comment #6 there.

If disabling the above helps, we could just document it in the Rel. Notes.
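
One way to test that, assuming udev only runs /etc/dev.d scripts carrying a .dev suffix, is to move the hook aside for a single rescan and restore it afterwards:

# mv /etc/dev.d/block/multipath.dev /etc/dev.d/block/multipath.dev.disabled
# iscsi-rescan
# mv /etc/dev.d/block/multipath.dev.disabled /etc/dev.d/block/multipath.dev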

Comment 46 Harald Hoyer 2009-02-26 11:04:58 UTC
(In reply to comment #45)
> Harald,
> 
> In RHEL4's udev, is this necessary?

It is not _udev_ ... this is device-mapper-multipath

$ rpm -qf /etc/dev.d/block/multipath.dev
device-mapper-multipath-....

> 
> if /sbin/lsmod | /bin/grep "^dm_multipath" > /dev/null 2>&1 && \
>    [ ${mpath_ctl_fail} -eq 0 ] && \
>    [ "${ACTION}" = "add" -a "${DEVPATH:7:3}" != "dm-" ] ; then
>         /sbin/multipath -v0 ${DEVNAME} > /dev/null 2>&1
> fi
> 
> This is part of multipath.dev script.
> 
> There was a similar situation in RHEL5.x (Bugzilla #460301). This looks similar
> to what has been commented in Comment #6 there.
> 
> If disabling the above helps, we could just document it in the Rel. Notes.

Does it help? If yes, please open a bug against device-mapper-multipath

Comment 47 Ritesh Raj Sarraf 2009-02-26 11:39:20 UTC
Okay!
I've cloned the RHEL5 bugzilla to target RHEL4.
This is for the delayed discovery issue.

https://bugzilla.redhat.com/show_bug.cgi?id=487502

Comment 48 Chris Ward 2009-03-10 13:58:25 UTC
Netapp, could you please test the package builds available at the URL below as well? This package fixes this bug for 4.7.z (EUS z-stream), bug 481277. If you need additional arch packages to test, other than what is provided, please let me know. Thanks.

http://people.redhat.com/cward/4.7.z/udev/

Comment 49 Andrius Benokraitis 2009-03-10 15:06:03 UTC
Ritesh, can you get this tested for both 4.8 and 4.7.z, in both BZs?

Comment 50 Chris Ward 2009-03-13 14:03:59 UTC
~~ Attention Partners!  ~~
RHEL 4.8 Beta has been released on partners.redhat.com. There should
be a fix present which addresses this bug. Please test and report your results on this OtherQA Partner bug at your earliest convenience.

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe any issues you encountered. If you have found a NEW bug, clone this bug and describe the issues you've encountered. Further questions can be directed to your Red Hat Partner Manager.

If you have VERIFIED the bug fix, please select your PartnerID from the Verified field above and leave a comment with your test result details. Include which arches were tested, the package version, and any applicable logs.

 - Red Hat QE Partner Management

Comment 52 Tanvi 2009-03-19 13:05:26 UTC
I tested on both RHEL 4.7 and RHEL 4.8. The OOM is no longer seen. 1024 devices (512 LUNs * 2 paths) were mapped to the host. It took almost 1 hour to create all the SCSI and multipath entries. A point to note is that it takes around 20 minutes to add all the /dev/sd* entries; the rest of the time is spent creating the multipath entries.
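
A rough way to see where that time goes, counting SCSI and multipath device nodes once a minute while discovery runs (illustrative sketch):

while :; do
    # time, number of /dev/sd* nodes, number of /dev/mapper/ entries
    printf '%s  sd nodes: %s  mapper nodes: %s\n' "$(date +%T)" \
        "$(ls /dev/sd* 2>/dev/null | wc -l)" \
        "$(ls /dev/mapper/ 2>/dev/null | wc -l)"
    sleep 60
done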

Comment 54 errata-xmlrpc 2009-05-18 20:28:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1004.html