491363 – fwparam_ppc() causes stack corruption

Bug 491363 - fwparam_ppc() causes stack corruption

Summary: fwparam_ppc() causes stack corruption

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	iscsi-initiator-utils
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Mike Christie
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2009-03-20 16:15 UTC by Hans de Goede
Modified:	2012-06-26 16:08 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-04-27 20:23:23 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
dont use strncat for ppc OF boot path setup (1.12 KB, application/octet-stream) 2009-03-20 19:09 UTC, Mike Christie	no flags	Details
fix double close and unint var (1.23 KB, application/octet-stream) 2009-04-12 03:36 UTC, Mike Christie	no flags	Details
PATCH fixing crash when parsing ppc firmware for a second time. (1.67 KB, patch) 2009-04-27 18:40 UTC, Hans de Goede	no flags	Details \| Diff
Show Obsolete (2) View All

Description Hans de Goede 2009-03-20 16:15:08 UTC

On systems which have a /proc/device-tree directory, and where fwparam_ppc() thus
is supposed to actually do something, it causes memory (probably stack) corruption, see: bug 490515

Comment 1 Mike Christie 2009-03-20 17:28:46 UTC

(In reply to comment #0)
> On systems which have a /proc/device-tree directory, and where fwparam_ppc()
> thus
> is supposed to actually do something, it causes memory (probably stack)
> corruption, see: bug 490515  

In those setups, is iscsi being used? Is nfs or some other net based boot used?

Comment 2 Hans de Goede 2009-03-20 17:45:09 UTC

(In reply to comment #1)
> (In reply to comment #0)
> > On systems which have a /proc/device-tree directory, and where fwparam_ppc()
> > thus
> > is supposed to actually do something, it causes memory (probably stack)
> > corruption, see: bug 490515  
> 
> In those setups, is iscsi being used? Is nfs or some other net based boot used?  

Adding jlaska to the CC, He should know, James can you answer this please?

Comment 3 James Laska 2009-03-20 17:55:03 UTC

I've not yet tested any open-firmware enabled iSCSI systems.  I believe these might be the QS22 (ps3 cell) blades, but I have yet to test this hardware.  The systems I see this on all have local storage

Comment 4 Mike Christie 2009-03-20 19:09:43 UTC

Created attachment 336106 [details]
dont use strncat for ppc OF boot path setup

It looks like the strncat use in fwparam_ppc is not right. Not sure if this is the problem, but if we called fwparam_ppc multiple times I think it could.

1. if fwparam_ppc is run multiple times, it looks like we would continue adding  on to the previous use's string.

2. if filepath is not passed in (it is not by default), then it looks like we pass in the size of the buffer, because fplen is never set in that path.

This patch just converts the code to use snprintf. I also memsetted the buffer to make sure it was clear before each call.

Comment 5 Hans de Goede 2009-03-23 19:31:23 UTC

jlaska,

I've made a new updates.img, which contains a version of libiscsi, which reverts
my workaround and instead uses mchristie's patch, can you give this one a try please? :

http://people.atrpms.net/~hdegoede/updates.img

Comment 6 James Laska 2009-03-24 13:05:55 UTC

Testing latest rawhide with updates=http://people.atrpms.net/~hdegoede/updates.img and connecting to VNC shows only a black screen.

Comment 7 Hans de Goede 2009-03-24 14:30:04 UTC

(In reply to comment #6)
> Testing latest rawhide with
> updates=http://people.atrpms.net/~hdegoede/updates.img and connecting to VNC
> shows only a black screen.  

mchristie: translated: your patch attached here does not fix the problem.

Comment 8 Mike Christie 2009-04-12 03:36:03 UTC

Created attachment 339207 [details]
fix double close and unint var

Maybe this is it. I found a double close in the ppc code. This fixes that and the uninitialized string.

Comment 9 Hans de Goede 2009-04-17 09:27:41 UTC

Mike, I doubt this new patch is going to help. If you really want me too I can create an updates.img with a new libiscsi with this patch in for jlaska to test, but I don't think that will get us anywhere. closing an already closed fd will
simply make the kernel return EBADF. Now if this was an fclose() then this might
be the cause, but repeating a normal close I seriously doubt causing this.

What you can do is try to create a simple test.c witha  main which calls fwparam_ppc(), and run that through valgrind on ppc (do we have valgrind on ppc?)

jlaska, could you give us ssh access to this box with a recent F-11 install on it, then I can poke around and see if I can reproduce this.

Comment 10 Mike Christie 2009-04-17 15:25:15 UTC

(In reply to comment #9)
> Mike, I doubt this new patch is going to help. If you really want me too I can
> create an updates.img with a new libiscsi with this patch in for jlaska to
> test, but I don't think that will get us anywhere. closing an already closed fd
> will
> simply make the kernel return EBADF. Now if this was an fclose() then this
> might
> be the cause, but repeating a normal close I seriously doubt causing this.
> 

Yeah, not sure what I was thinking. I thought this had happened before.

> What you can do is try to create a simple test.c witha  main which calls
> fwparam_ppc(), and run that through valgrind on ppc (do we have valgrind on
> ppc?)

I was actually running valgrind on iscsistart.  I was only able to get a ppc box with RHEL 5 (the rhts ppc fedora installs kept failing at the time). I then ran the fedora code.

Let me try to get a ppc box with F-11 beta.


> 
> jlaska, could you give us ssh access to this box with a recent F-11 install on
> it, then I can poke around and see if I can reproduce this.

Comment 11 Hans de Goede 2009-04-27 18:37:11 UTC

Ok,

I've been debugging this today with remote access to jlaska's ibm ppc machine
(thanks James) and I've fixed this.

The problem was both the use of strncat where strncpy (or better snprintf) should be used and the fact that it is using a global counter for dev_count and nic_count, and doesn't reset that when fwparam_ppc() gets called a second time.

To make things worse it also used the dev_count global var as a local var in
fwparam_ppc(), so that it got set to 1 even if there were no devices triggering
the bug even on machines without iscsi.

The trick to reproduce was to call fwparam_ppc() twice. The segfault then
was a simple null pointer deref inside fwparam_ppc.c due to the wrong dev_count.

I'll attach a patch which fixes this, and I'll build a fixed version for F-12, for F-11 its better to stick with the workaround given how close we are to the release.

Comment 12 Hans de Goede 2009-04-27 18:40:04 UTC

Created attachment 341469 [details]
PATCH fixing crash when parsing ppc firmware for a second time.

Mike, can you please make sure this patch gets applied to RHEL-5.4 too ? Let me know if you need a bug for that. Given that I plan to use libiscsi in anaconda in 5.4, we really need this fix.

Comment 13 Hans de Goede 2009-04-27 19:59:23 UTC

Here is an updates.img with a patched ppc build of libiscsi in it:
http://people.atrpms.net/~hdegoede/updates-491363-ppc.img

I would be much obliged if you could do a test install of F-11 on that ibm ppc machine, this removes the do not try to read iscsi config from firmware on ppc hack, and replaces it with a proper fix.

Comment 14 James Laska 2009-04-27 20:22:16 UTC

I have tested this updates.img with latest rawhide (anaconda-11.5.0.47) and I am able to connect to the vnc server ,and proceed through a vnc install.

Comment 15 Hans de Goede 2009-04-27 20:23:23 UTC

closing.

Note You need to log in before you can comment on or make changes to this bug.