Description of problem:

On a RHEL 4.7 box, I mapped 256 LUNs with 4 paths each (256 * 4 = 1024 devices). When I discover the newly mapped LUNs by rescanning the iSCSI sessions, an OOM is triggered on the host. I see the following messages in syslog when I hit this issue:

kernel: oom-killer: gfp_mask=0x1d2
kernel: Mem-info:
kernel: Node 0 DMA per-cpu:
kernel: cpu 0 hot: low 2, high 6, batch 1
kernel: cpu 0 cold: low 0, high 2, batch 1

After hitting this issue, the kernel starts killing processes:

kernel: Out of Memory: Killed process 27088 (pidof).
kernel: rpc.idmapd: page allocation failure. order:0, mode:0x1d2
kernel:
kernel: Call Trace:<ffffffff8015fd61>{__alloc_pages+789} <ffffffff80162b41>{do_page_cache_readahead+209}
kernel:        <ffffffff8015c9f5>{filemap_nopage+338} <ffffffff8016bf93>{do_no_page+1111}
kernel:        <ffffffff80318368>{thread_return+0} <ffffffff8016c388>{handle_mm_fault+373}
kernel:        <ffffffff80140a8c>{__mod_timer+293} <ffffffff801248d0>{do_page_fault+520}
kernel:        <ffffffff80135fc0>{remove_wait_queue+16} <ffffffff801a149f>{sys_epoll_wait+1115}
kernel:        <ffffffff80141e12>{recalc_sigpending+15} <ffffffff80110e1d>{error_exit+0}

Version-Release number of selected component (if applicable):

# rpm -qa | grep kernel
kernel-2.6.9-78.EL
kernel-utils-2.4-14.1.117
kernel-smp-2.6.9-78.EL
# rpm -qa | grep udev
udev-039-10.22.el4

How reproducible:
Intermittent (not always, but frequent; can be reproduced)

Steps to Reproduce:
1. Map 256 LUNs with 4 paths each to a RHEL 4.7 box.
2. Discover the mapped LUNs by doing an iSCSI rescan:
   # iscsi-rescan

Actual results:
While discovering the devices, the host hits an OOM state and processes get killed. The host eventually hangs.

Expected results:
All the devices should get configured on the host successfully.

Additional info:
We did an initial root-cause analysis; the following is our finding. udev in RHEL 4.7 uses Trivial Database (TDB), which relies on a locking mechanism. When heavily loaded with many requests, we see the following messages in syslog:

Aug 25 16:16:18 lnx200-172 udev[18562]: tdb_brlock failed (fd=3) at offset 652 rw_type=1 lck_type=7
Aug 25 16:16:18 lnx200-172 udev[18562]: tdb_brlock failed (fd=3) at offset 652 rw_type=1 lck_type=7
Aug 25 16:16:27 lnx200-172 kernel: SCSI device sdju: 2097152 512-byte hdwr sectors (1074 MB)
Aug 25 16:16:27 lnx200-172 kernel: SCSI device sdju: 2097152 512-byte hdwr sectors (1074 MB)
Aug 25 16:16:27 lnx200-172 kernel: SCSI device sdju: 2097152 512-byte hdwr sectors (1074 MB)
Aug 25 16:16:30 lnx200-172 udev[13883]: tdb_brlock failed (fd=3) at offset 164 rw_type=1 lck_type=7
Aug 25 16:16:30 lnx200-172 udev[13883]: tdb_brlock failed (fd=3) at offset 164 rw_type=1 lck_type=7
Aug 25 16:16:31 lnx200-172 udev[18562]: tdb_lock failed on list 121 ltype=1 (Interrupted system call)
Aug 25 16:16:31 lnx200-172 udev[18562]: tdb_lock failed on list 121 ltype=1 (Interrupted system call)

The database then goes into a deadlock situation and all remaining requests block on it; they keep consuming memory, eventually driving the OS out of memory.
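A small loop like the following can be used to watch the udev fan-out and memory while reproducing this (a sketch only; it assumes the event helpers show up as processes named "udev" and that the tdb errors land in /var/log/messages):

while :; do
    echo "udev workers: $(ps --no-headers -C udev | wc -l)"
    free -m | grep ^Mem
    echo "tdb lock errors so far: $(grep -c 'tdb_.*lock failed' /var/log/messages)"
    sleep 5
done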
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Please test: http://people.redhat.com/harald/downloads/udev/udev-039-10.25.el4/
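(For testing, something along these lines should do -- the exact file name depends on the arch you download, and a reboot before retesting works just as well:)

# rpm -Uvh udev-039-10.25.el4.<arch>.rpm
# killall udevd
# udevd &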
I tested with 512 devices (256 LUNs * 2 paths). The OOM issue is still present.
Can you do:

# killall udevd
# export UDEVD_MAX_CHILDS=10
# export UDEVD_MAX_CHILDS_RUNNING=10
# udevd &

and then retrigger:

# iscsi-rescan
After following the above steps, OOM was not seen, but it took around 3 hours to add 1024 devices.
Ok, it seems I have to lower the child limits. Can you try to find out your maximum workable limit? How much memory does the machine have?
Following is the memory information of the machine:

[root@localhost ~]# free
             total       used       free     shared    buffers     cached
Mem:       2055296     520644    1534652          0      35596     127512
-/+ buffers/cache:      357536    1697760
Swap:      4095992      70944    4025048
When you set udev_log="yes" in /etc/udev/udev.conf and do:

# unset UDEVD_MAX_CHILDS
# unset UDEVD_MAX_CHILDS_RUNNING
# killall udevd
# udevd &

you will see in /var/log/messages:

Feb  4 14:23:21 rhel4 udevd[3333]: initialize max_childs to 190
Feb  4 14:23:21 rhel4 udevd[3333]: initialize max_childs_running to 16

Can you report what udev sets for your system?
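(If it is quicker, something like this should pull out just those two lines -- assuming the default syslog target of /var/log/messages:)

# grep "initialize max_childs" /var/log/messages | tail -n 2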
Following is what I get on my machine:

Feb  4 19:19:05 localhost udevd[11761]: initialize max_childs to 629
Feb  4 19:19:05 localhost udevd[11761]: initialize max_childs_running to 24
So, I suggest experimenting with lower UDEVD_MAX_CHILDS values to find an acceptable one that does not OOM (try e.g. 400, 300, 200, ...):

# killall udevd
# export UDEVD_MAX_CHILDS=xyz
# udevd &

and then retrigger:

# iscsi-rescan
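If you want to script that search instead of doing it by hand, here is a rough sketch (hypothetical helper; it assumes the kernel logs "Out of Memory" on a hit, that the LUNs are re-mapped or the iscsi service restarted between runs so the rescan actually generates events again, and that a fixed sleep is enough for things to settle):

#!/bin/sh
# Hypothetical test loop -- adjust the candidate values and settle time as needed.
for limit in 400 300 200 100; do
    killall udevd
    export UDEVD_MAX_CHILDS=$limit
    udevd &
    dmesg -c > /dev/null    # clear the kernel ring buffer before the run
    iscsi-rescan
    sleep 600               # crude settle time; a full rescan here took an hour or more
    if dmesg | grep -q "Out of Memory"; then
        echo "UDEVD_MAX_CHILDS=$limit still OOMs"
    else
        echo "UDEVD_MAX_CHILDS=$limit looks OK"
        break
    fi
done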
oh, and you might disable logging again in /etc/udev/udev.conf
I tried with various values. With UDEVD_MAX_CHILDS=100, I did not see any OOM, but it still took around 1 hour to add all the SCSI (/dev/sd*) and multipath (/dev/mapper/*) entries for 1024 devices (512 LUNs * 2 paths).
oh, wow.. 100 is very low.. is this the highest you can get?
I hope you disabled logging, which may have slowed down the process
Yes, I did disable the logging. I tried with 600, 500, 400, 300, 200 and 100. Saw OOM for 200. Will try with 150 and will update.
OOM triggered with 150 as well.
So, I will fix the formula to give you ~100 max_childs. Additionally, I will add a kernel command line parameter to set the maximum value manually.
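(Purely as an illustration of the kind of memory-based cap meant here -- not the actual udev code -- the scaling could look roughly like this, with the divisor chosen so that ~2 GB of RAM yields about 100 children:)

# Illustration only; the real formula lives in udevd and may differ.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
max_childs=$(( mem_kb / 20480 ))   # ~100 on a 2 GB box
echo "computed max_childs cap: $max_childs"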
please test http://people.redhat.com/harald/downloads/udev/udev-039-10.26.el4
I did not see any OOM if the iscsi service is already running and we simply rescan the sessions after mapping all the LUNs. But there is one scenario where OOM gets triggered: when all the devices are already discovered and the iscsi service is restarted. Again, if we stop the iscsi service and wait for some time before starting it again, OOM does not happen.

Following are the relevant values:

Feb  7 12:14:40 localhost udevd[30513]: initialize max_childs to 99
Feb  7 12:14:40 localhost udevd[30513]: initialize max_childs_running to 24

Do we have to live with the long time taken to discover the devices and the sluggishness of the host during that period?
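For clarity, the failing and the working sequences are roughly the following (a sketch; the 300-second wait is an arbitrary stand-in for "wait for some time" above, and the initscript name assumed is the stock RHEL 4 one):

OOM case:
# service iscsi restart

No-OOM case:
# service iscsi stop
# sleep 300
# service iscsi start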
Does it help to lower the UDEVD_MAX_CHILDS limit for an iscsi restart?

And yes, I guess you will have to live with the slow process to prevent OOM.
(In reply to comment #35)
> Does it help to lower the UDEVD_MAX_CHILDS limit for an iscsi restart?
>
> And yes, I guess you will have to live with the slow process to prevent OOM.

That would end up making the OS startup time (1+ hour of device discovery plus fsck time) very large.
(In reply to comment #35)
> Does it help to lower the UDEVD_MAX_CHILDS limit for an iscsi restart?
>
> And yes, I guess you will have to live with the slow process to prevent OOM.

I decreased the value of UDEVD_MAX_CHILDS to 80 and did not see any OOM for the iscsi restart.
Last question: if you raise UDEVD_MAX_CHILDS_RUNNING (the other limit), does it get faster without OOM?
I raised the value of UDEVD_MAX_CHILDS_RUNNING to 80. The OOM did not happen, but it did not help in decreasing the time either.
Ok, hopefully this runs out of the box without parameter tweaking. Please test:

http://people.redhat.com/harald/downloads/udev/udev-039-10.27.el4

What also interests me is which processes bring your system down. Can you provide a task list while the iSCSI devices are being triggered, e.g. with:

# pstree -p
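If one snapshot is not enough, something like this should also capture which processes hold the most memory at that point (a sketch; standard procps options assumed):

# pstree -p > /tmp/pstree.$(date +%s)
# ps aux --sort=-rss | head -n 20 > /tmp/ps-rss.$(date +%s)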
Created attachment 331825: pstree -p output
No OOM was seen for the iscsi restart. Following are the values which udev sets on the machine:

Feb 13 18:18:56 localhost udevd[6950]: initialize max_childs to 78
Feb 13 18:18:56 localhost udevd[6950]: initialize max_childs_running to 24

The pstree -p output is attached.
Harald,

In RHEL4's udev, is this necessary?

if /sbin/lsmod | /bin/grep "^dm_multipath" > /dev/null 2>&1 && \
   [ ${mpath_ctl_fail} -eq 0 ] && \
   [ "${ACTION}" = "add" -a "${DEVPATH:7:3}" != "dm-" ] ; then
        /sbin/multipath -v0 ${DEVNAME} > /dev/null 2>&1
fi

This is part of the multipath.dev script.

There was a similar situation in RHEL 5.x (bug 460301); this looks similar to what is described in comment #6 there.

If disabling the above helps, we could just document it in the Release Notes.
(In reply to comment #45)
> Harald,
>
> In RHEL4's udev, is this necessary?

It is not _udev_ ... this is device-mapper-multipath:

$ rpm -qf /etc/dev.d/block/multipath.dev
device-mapper-multipath-....

> if /sbin/lsmod | /bin/grep "^dm_multipath" > /dev/null 2>&1 && \
>    [ ${mpath_ctl_fail} -eq 0 ] && \
>    [ "${ACTION}" = "add" -a "${DEVPATH:7:3}" != "dm-" ] ; then
>         /sbin/multipath -v0 ${DEVNAME} > /dev/null 2>&1
> fi
>
> This is part of the multipath.dev script.
>
> There was a similar situation in RHEL 5.x (bug 460301); this looks similar
> to what is described in comment #6 there.
>
> If disabling the above helps, we could just document it in the Release Notes.

Does it help? If yes, please open a bug against device-mapper-multipath.
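(One quick way to check, without editing the script, is to move it out of the udev dev.d directory for the duration of the test and put it back afterwards -- a sketch:)

# mv /etc/dev.d/block/multipath.dev /root/multipath.dev.disabled
# iscsi-rescan
# mv /root/multipath.dev.disabled /etc/dev.d/block/multipath.dev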
Okay! I've cloned the RHEL5 Bugzilla to target RHEL4; this covers the delayed discovery issue:

https://bugzilla.redhat.com/show_bug.cgi?id=487502
NetApp, could you please test the package builds available at the URL below as well? This package fixes this bug for 4.7.z (EUS z-stream), bug 481277. If you need additional arch packages to test, other than what is provided, please let me know. Thanks.

http://people.redhat.com/cward/4.7.z/udev/
Ritesh, can you get this tested on both 4.8 and 4.7.z, and have the results reported in both BZs?
~~ Attention Partners! ~~

RHEL 4.8 Beta has been released on partners.redhat.com. A fix should be present which addresses this bug. Please test and report back results on this OtherQA Partner bug at your earliest convenience.

If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. If you have found a NEW bug, clone this bug and describe the issues there. Further questions can be directed to your Red Hat Partner Manager.

If you have VERIFIED the bug fix, please select your PartnerID from the Verified field above and leave a comment with your test result details, including which arches were tested, the package version, and any applicable logs.

- Red Hat QE Partner Management
I tested on both RHEL 4.7 and RHEL 4.8. The OOM is no longer seen. 1024 devices (512 LUNs * 2 paths) were mapped to the host; it took almost 1 hour to create all the SCSI and multipath entries. A point to note is that it takes around 20 minutes to add all the /dev/sd* entries; the rest of the time is spent creating the multipath entries.
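For anyone who wants to reproduce that timing split, a loop like this prints progress once a minute (a sketch; it assumes the multipath maps appear under /dev/mapper and that /dev/mapper/control should not be counted):

while :; do
    echo "$(date +%T)  sd nodes: $(ls /dev/sd* 2>/dev/null | wc -l)  multipath maps: $(ls /dev/mapper/ | grep -vc control)"
    sleep 60
done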
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1004.html