Bug 750357 - In firebird booting of Fedora16 is failing if the lpar is allocated with 2 GB RAM
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: ppc64
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-10-31 19:40 UTC by IBM Bug Proxy
Modified: 2012-03-05 14:30 UTC
CC List: 7 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2012-02-29 00:18:47 UTC
Type: ---
Embargoed:


Attachments
2gb boot output (109.71 KB, text/plain), 2011-10-31 19:41 UTC, IBM Bug Proxy
4Gb messages (40.71 KB, text/plain), 2011-10-31 19:41 UTC, IBM Bug Proxy


Links
IBM Linux Technology Center 76185

Description IBM Bug Proxy 2011-10-31 19:40:57 UTC
1. We have an LPAR on a Firebird (system type 7895) allocated 2GB of RAM.
2. We installed Fedora 16 in that LPAR, which succeeded.
3. At the end of installation, when the system boots back up, the reboot hangs with the message

Kernel panic - not syncing: Out of memory and no killable processes

Once we allocate 4GB of RAM to the LPAR, rebooting succeeds. On Juno IOCL (8246) LPARs, rebooting succeeds even with 2GB of memory.

Setup details:
fbfirebird02 - 2GB - booting unsuccessful
fbfirebird07 - 4GB - booting successful

We have attached two documents:
1. The /var/log/messages of the 4GB LPAR, where Fedora boots successfully.
2. The terminal console messages that appear while Fedora boots on the 2GB LPAR.

We tried patching the kernel with the oom-killer patch from RH Bug 741207, but the issue was not resolved.

Comment 1 IBM Bug Proxy 2011-10-31 19:41:11 UTC
Created attachment 531028 [details]
2gb boot output

Comment 2 IBM Bug Proxy 2011-10-31 19:41:22 UTC
Created attachment 531029 [details]
4Gb messages

Comment 3 IBM Bug Proxy 2011-11-02 09:50:33 UTC
------- Comment From anton.com 2011-11-02 05:40 EDT-------
I had a look at this issue by booting one of my POWER6 Fedora16 boxes with mem=2G. A few observations:

1. The initramfs isn't removed after boot:

# du -s /run/initramfs
127040	/run/initramfs/

That might be by design, but it does use up over 100MB of memory.

2. There is over 1GB of memory in slab:

OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
118745  13237  11%    2.30K   2159       55    276352K kmalloc-2048
292000  17944   6%    0.80K   3650       80    233600K kmalloc-512
402714   4723   1%    0.55K   3442      117    220288K kmalloc-256
349948   6545   1%    0.36K   1966      178    125824K kmalloc-64
3596    550  15%   32.00K    230       16    117760K thread_info
4650   4171  89%   16.30K    150       31     76800K kmalloc-16384

Notice in particular how low the utilisation is for the top 5: 15% or below. This box has 4 NUMA nodes and 64 HW threads. If I boot with smt=0 nr_cpus=1 such that we only start 1 HW thread, we see a much nicer picture:

OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
4095   4093  99%   16.30K    273       15     69888K kmalloc-16384
10719  10697  99%    2.30K    397       27     25408K kmalloc-2048
17150  17105  99%    1.30K    350       49     22400K kmalloc-1024
25705  25608  99%    0.66K    265       97     16960K blkdev_requests
24684  24561  99%    0.62K    242      102     15488K skbuff_head_cache
6420   6413  99%    2.07K    214       30     13696K blkdev_queue

Only 200MB in slab usage, much better.

I would have expected some wastage due to slub SMP optimisations (per cpu and per node pools), but utilisations of 6% and even 1% are pretty awful. It looks like there is an issue with slub and SMP.
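The waste Anton describes can be estimated directly from the slabtop columns above: each cache's total size times its unused fraction. A minimal awk sketch over three of the 2GB-boot lines (columns: OBJS, ACTIVE, USE, OBJ SIZE, SLABS, OBJ/SLAB, CACHE SIZE, NAME):

```shell
# Estimate allocated-but-unused slab memory per cache from
# slabtop-style output. Sample lines copied from the 2GB boot above.
awk '{
    use = $3 + 0        # utilisation percentage ("11%" -> 11)
    size_k = $7 + 0     # total cache size in KB ("276352K" -> 276352)
    printf "%-14s %7.0fK wasted\n", $8, size_k * (100 - use) / 100
}' <<'EOF'
118745  13237  11%    2.30K   2159       55    276352K kmalloc-2048
292000  17944   6%    0.80K   3650       80    233600K kmalloc-512
402714   4723   1%    0.55K   3442      117    220288K kmalloc-256
EOF
```

For kmalloc-2048 alone this works out to about 246000K, roughly 240MB of slab memory sitting allocated but unused.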

Comment 4 Dave Jones 2011-11-03 19:27:44 UTC
Filed bug 751189 for the initramfs issue.

I trust you'll bring up the slub problem with upstream?

Thanks.

Comment 5 Harald Hoyer 2011-11-04 15:10:20 UTC
(In reply to comment #3)
> ------- Comment From anton.com 2011-11-02 05:40 EDT-------
> I had a look at this issue by booting one of my POWER6 Fedora16 boxes with
> mem=2G. A few observations:
> 
> 1. The initramfs isn't removed after boot:
> 
> # du -s /run/initramfs
> 127040 /run/initramfs/
> 
> That might be by design, but it does use up over 100MB of memory.
> 

To turn it off:
# echo "unset prefix" >> /etc/dracut.conf.d/99-my.conf
# dracut -f

Comment 6 IBM Bug Proxy 2011-11-07 10:51:15 UTC
------- Comment From anton.com 2011-11-07 05:43 EDT-------
I found a memory leak in the SCSI layer that was responsible for about 15MB on my POWER7 box.

commit f7c9c6bb14f3 ([SCSI] Fix block queue and elevator memory leak in scsi_alloc_sdev), marked stable@ so it should make its way back to 3.1.X.

Dave: the slub issue is next on my list.

Comment 7 IBM Bug Proxy 2011-11-07 11:00:32 UTC
------- Comment From anton.com 2011-11-07 05:54 EDT-------
The ehea driver (IBM 1G/10G ethernet) is always filling the jumbo ring. That's worth about 60-70M of memory per interface.

Since jumbo frame usage should be very rare, we should only fill the jumbo ring when we increase the MTU. I've brought this issue up with the ehea maintainer.
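The 60-70M figure is consistent with a ring of a few thousand jumbo-sized receive buffers; a quick back-of-the-envelope check (the ring depth and buffer size here are illustrative assumptions, not ehea's actual parameters):

```shell
# Rough memory cost of a pre-filled jumbo receive ring:
# entries x per-buffer size. 4096 entries and 16KB buffers are
# illustrative values, chosen to land near the 60-70M observed.
awk 'BEGIN {
    entries = 4096
    buf_kb  = 16
    printf "jumbo ring: %d MB per interface\n", entries * buf_kb / 1024
}'
```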

Comment 8 IBM Bug Proxy 2011-12-03 00:20:34 UTC
------- Comment From anton.com 2011-12-02 19:14 EDT-------
I worked out why my low memory tests caused slub to consume much more memory. By clamping memory to 2GB, I had one NUMA node of memory and 4 NUMA nodes of CPUs. For the 3 nodes without memory the slub code would always go through the remote node alloc and free path.

The long-term fix is possibly HAVE_MEMORYLESS_NODES, but there is no chance of that for FC16. It would be nice to fix it more generally, though; I would think other architectures see this with unbalanced CPU/memory layouts.
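The layout Anton describes (CPUs on a node but no memory behind it) shows up directly in `numactl --hardware` output. A small awk sketch over sample output flags such nodes; the node numbers and sizes below are illustrative, not taken from the machine in this bug:

```shell
# Flag NUMA nodes that have CPUs but no local memory, the layout
# that sends slub through the remote alloc/free path described above.
# Input mimics "numactl --hardware" output (illustrative values).
awk '
/node [0-9]+ cpus:/ { node = $2; has_cpus[node] = (NF > 3) }
/node [0-9]+ size:/ { node = $2; mem_mb[node] = $4 }
END {
    for (n in has_cpus)
        if (has_cpus[n] && mem_mb[n] == 0)
            printf "node %s: CPUs but no memory\n", n
}' <<'EOF'
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 2048 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 0 MB
EOF
```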

Comment 9 Josh Boyer 2012-02-29 00:18:47 UTC
Comments #4 and #5 address the initramfs thing.

The scsi fix mentioned in comment #6 is in the 3.2 kernel.

The ehea fix in comment #7 seems to be aa9084a01a7893a9f4bed98aa29081f15d403a88, which is also in 3.2.

Comment #8 basically suggests this is a NUMA balance problem that isn't going to be fixed anytime soon.

Now that F16 is on the 3.2 kernel, I think this bug can be closed out.

