Bug 78952

Summary: LTC1453-CS/Linux fails to start on RedHat Adv. Server 2.1
Product: Red Hat Enterprise Linux 2.1 Reporter: Need Real Name <khake>
Component: kernel Assignee: Larry Woodman <lwoodman>
Status: CLOSED NOTABUG QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: high    
Version: 2.1 CC: khoa
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2003-01-22 23:04:34 UTC Type: ---
Attachments:
Description                                                                          Flags
Attachment (425.txt) is Message log for CS/Linux showing sna drivers not initiating  none
Traces and source code as per update                                                 none

Description Need Real Name 2002-12-03 21:27:10 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4)
Gecko/20011128 Netscape6/6.2.1

Description of problem:
    LTC1453-CS/Linux fails to start on RedHat Adv. Server 2.1

Version-Release number of selected component (if applicable):


How reproducible:
Always

Hardware Environment: IA-32, RH AS 2.1

Software Environment: RH AS 2.1


Steps to Reproduce:
1. Install Red Hat Advanced Server
2. Install Open Streams LiS (STREAMS)
3. Install Communications Server for Linux V6.001
4. Try to start Communications Server (sna start); it fails

Actual Results:  A failure is reported in the error log. Traces show the SNA
device drivers reporting "Device or resource busy" (EBUSY).

Expected Results:  CS/Linux should start. It has started successfully on the
following distributions and Linux kernels:
- RH7.2 (2.4.7)
- RH7.3 (2.4.18)
- RH8.0 (2.4.18)
- SuSE 8.0 (2.4.18)
- SuSE 8.1 (2.4.19)
- SuSE Linux Enterprise Server (2.4.19)

Additional info:

This problem is blocking a larger rollout of the Branch Server product for
Linux project. CS/Linux is needed on the Red Hat Adv. Server 2.1 system
alongside WAS, DB2 and a suite of other applications. Here are more details.
The question at the end NEEDS TO BE ADDRESSED by Red Hat Development:

We have reproduced the problem here with the same symptoms as you have seen.
However, the cause appears to be OS-related, and we would like you to use
your contacts within Red Hat to see whether you can get assistance before we
investigate further.

The drivers are loaded correctly (with insmod). The first driver that we
talk to is the non-STREAMS trace driver /dev/sna_trace (snapixt). The
snaerrlog daemon issues an open followed by a number of ioctls before the
STREAMS part of the product is started. On RH AS2.1 we see the following
behaviour:
- The open appears to be passed correctly from the daemon to our driver, and
an OK return code of 0 causes a file handle to be assigned.
- The first ioctl that we issue is not received by our driver's ioctl code
(as shown by printks we have added). However, the OS replies with an OK (0)
return code.
- The second and subsequent ioctl calls arrive at our driver's open routine
rather than its ioctl routine, and the open routine rejects them with EBUSY
(since we allow only one open of this driver). This return code is reported
back to the daemon, which causes CS/Linux to fail to start.
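
The return-code pattern in the list above (open returns 0, the first ioctl returns 0 without reaching the ioctl handler, later ioctls land in open and fail with EBUSY) is what a driver's function-pointer table would produce if its entries were shifted by one slot. The user-space C sketch below models that one possible mechanism: a positional initializer written for a layout that, in the kernel actually running, has an extra field. This report does not confirm that cause, and struct fake_ops, new_slot, drv_open, drv_ioctl, sys_open, sys_ioctl and run_sequence are all hypothetical; none of them is the real CS/Linux driver or the kernel's struct file_operations.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel file-operations table. Every slot
 * shares one signature, so a shifted initializer compiles silently, much
 * as a module built against mismatched headers might load without complaint. */
struct fake_file;
typedef long (*op_fn)(struct fake_file *file, unsigned int cmd);

struct fake_ops {
    op_fn new_slot;  /* hypothetical field the driver author did not expect */
    op_fn ioctl;
    op_fn open;
};

static int trace_in_use;  /* the driver allows only one open */
static int ioctl_calls;   /* number of times the real ioctl handler ran */

static long drv_open(struct fake_file *file, unsigned int cmd)
{
    (void)file;
    (void)cmd;
    if (trace_in_use)
        return -EBUSY;    /* a second "open" is rejected, as in the report */
    trace_in_use = 1;
    return 0;
}

static long drv_ioctl(struct fake_file *file, unsigned int cmd)
{
    (void)file;
    (void)cmd;
    ioctl_calls++;
    return 0;
}

/* Positional: written as if the layout were { ioctl, open }, so drv_ioctl
 * lands in new_slot, drv_open lands in ioctl, and open is left NULL. */
static const struct fake_ops positional_ops = { drv_ioctl, drv_open };

/* Designated: each pointer goes to the named field whatever the layout. */
static const struct fake_ops designated_ops = { .ioctl = drv_ioctl, .open = drv_open };

/* Simplified dispatch: a NULL open succeeds, a NULL ioctl gives ENOTTY. */
static long sys_open(const struct fake_ops *ops)
{
    return ops->open ? ops->open(NULL, 0) : 0;
}

static long sys_ioctl(const struct fake_ops *ops, unsigned int cmd)
{
    return ops->ioctl ? ops->ioctl(NULL, cmd) : -ENOTTY;
}

/* Replays the daemon's sequence: one open, then three ioctls. */
static void run_sequence(const struct fake_ops *ops, long rc[4])
{
    trace_in_use = 0;
    ioctl_calls = 0;
    rc[0] = sys_open(ops);
    for (unsigned int i = 1; i < 4; i++)
        rc[i] = sys_ioctl(ops, 0x100u + i);  /* hypothetical command codes */
}
```

run_sequence(&positional_ops, rc) leaves rc as {0, 0, -EBUSY, -EBUSY} with ioctl_calls still 0, the same return codes the daemon saw, whereas designated_ops gives four zeros and three real ioctl calls. Designated initializers (C99 `.open = ...`, or GCC's older `open: ...` form common in 2.4 drivers) tie each pointer to a field name, so inserted or reordered fields cannot shift them.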

You can reproduce the problem by using snaldmod, or by just loading snapix0
and snapixt (and creating /dev/sna_trace with mknod) and running snaerrlog
directly as root. A failure is indicated by the program exiting. I have
modified snaerrlog to ignore one bad return code and to introduce 1-second
delays between the ioctl calls; I enclose the strace output.
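
The reproduction described above (open /dev/sna_trace, then issue ioctls one second apart while tolerating one bad return code) can be approximated by a small user-space probe. This is a hypothetical sketch, not the modified snaerrlog: the function name probe_trace_device, the O_RDWR open mode and the command values are assumptions, since the real CS/Linux command codes are private and not given here.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Opens `path`, issues each command in `cmds` with `delay_s` seconds
 * between calls, and tolerates up to `max_bad` failing ioctls.
 * Returns 0 on success, or -errno of the failure that ended the run. */
int probe_trace_device(const char *path, const unsigned long *cmds,
                       size_t ncmds, unsigned int delay_s, int max_bad)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        int err = errno;
        fprintf(stderr, "open(%s): %s\n", path, strerror(err));
        return -err;
    }

    int bad = 0;
    for (size_t i = 0; i < ncmds; i++) {
        if (i > 0 && delay_s > 0)
            sleep(delay_s);                    /* spacing between ioctls */
        if (ioctl(fd, cmds[i], NULL) < 0) {
            int err = errno;
            fprintf(stderr, "ioctl #%zu (0x%lx): %s\n", i + 1, cmds[i], strerror(err));
            if (++bad > max_bad) {             /* too many failures: give up */
                close(fd);
                return -err;
            }
        } else {
            printf("ioctl #%zu (0x%lx): OK\n", i + 1, cmds[i]);
        }
    }
    close(fd);
    return 0;
}
```

Calling probe_trace_device("/dev/sna_trace", cmds, n, 1, 1) mirrors the modified daemon's 1-second delays and single tolerated failure; running it under `strace -e trace=ioctl` shows the return code the kernel gave for each call.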

Before I go to the next diagnostic step, which would be to write a cut-down
driver and daemon that show this problem, I wanted to find out whether this
is a known issue with ioctls. Note that we are not using the recommended _IO*
macros to define command codes but our own private values; my understanding
is that this should be OK.
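
Private command values do work, but the kernel's generic ioctl path handles a few codes itself (FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC, for example) before a driver's handler is called, so a private value that happens to equal one of them never reaches the driver even though the caller still gets a return code. The _IO/_IOR/_IOW macros avoid such collisions by packing a direction, a per-driver type byte, a command number and an argument size into each code. A minimal sketch, assuming a hypothetical magic number 0xB5 and hypothetical command names (a real driver should pick its magic number from the kernel's ioctl-number documentation):

```c
#include <linux/ioctl.h>

/* Hypothetical magic number and command set for the trace device; the
 * real CS/Linux values are private and not shown in this report. */
#define SNA_TRC_MAGIC   0xB5
#define SNA_TRC_RESET   _IO(SNA_TRC_MAGIC, 1)
#define SNA_TRC_SETMASK _IOW(SNA_TRC_MAGIC, 2, unsigned int)
#define SNA_TRC_GETMASK _IOR(SNA_TRC_MAGIC, 3, unsigned int)

/* A check a driver's ioctl entry point could make first: any code whose
 * type byte is not ours is rejected, so a misrouted or colliding command
 * shows up immediately in a printk instead of being silently handled. */
int sna_trc_cmd_is_ours(unsigned int cmd)
{
    return _IOC_TYPE(cmd) == SNA_TRC_MAGIC;
}
```

In a driver, the same test would typically return -ENOTTY for foreign codes; _IOC_NR, _IOC_DIR and _IOC_SIZE decode the remaining fields for logging.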

Question: Is there something different about the RH AS 2.1 (2.4.9) kernel that
prevents ioctls from working in drivers?

 Additional comments from the submitter
We have found additional Linux kernel information regarding this problem. It
appears that the 2.5.47 kernel has changed so that the SNA drivers cannot
start. We need to find out what changed so that we can isolate a fix based on
the new mechanism for starting drivers in the new kernel.

Included below are two e-mails from developers, and I am attaching the logs they supplied:

I have been playing around with a 2.5.47 kernel (generic
development kernel, not in any released distribution) to
see how CS/Linux would do on it.  The latest LiS does fine
on this kernel, so after making some changes to snalinux.c
and cc_snalinux I was able to generate an isolation module
and load the CS/Linux drivers.  Then I did 'sna start' and
got the same "Device Busy" message that we are seeing on
RHAS2.1.

Can you send me the debug snaerrlog and instructions so I
can confirm that it is the same cause?

If it is the same cause we would know that it is because of
a kernel change made by RedHat for AS2.1 that is being rolled
into the 2.5.* development kernels and which would eventually
find its way into a 2.6 production kernel.

------- Additionally ----------
Looks the same to me:


RedHat AS 2.1 started with the same base as RedHat 7.2,
but obviously they changed it (otherwise why have an AS2.1
to begin with).  Whether the changes flowed from AS2.1
into the 2.5 kernel or the other way around is irrelevant,
I think examples of both could be found.  The point is that
the change which affects CS/Linux was a deliberate one,
and one that was accepted by Linus & others for inclusion
in 2.5 and later.  Now if we can find what that change was
we would have a better chance of knowing how to work with it.

additional comment re attachment

Comment 1 Need Real Name 2002-12-03 21:29:16 UTC
Created attachment 87269 [details]
Attachment (425.txt) is Message log for CS/Linux showing sna drivers not initiating

Comment 2 Christoph Hellwig 2002-12-04 22:23:53 UTC
> The question at the end NEEDS TO BE ADDRESSED by Red Hat Development:

Do you think that's the right attitude to get a bug fixed?  Anyway, your
description is very inaccurate, so if you want your problem debugged I'd suggest
you post some descriptive straces and a pointer to the sources of your module.

Comment 3 Need Real Name 2002-12-05 11:26:59 UTC
Created attachment 87484 [details]
Traces and source code as per update

Comment 4 Need Real Name 2002-12-05 11:27:17 UTC
I am the originator of this problem. I originally submitted it to the IBM Linux
Technology Center in October; sorry that they did not pass on the strace file
that I gave them. I include a gzip archive containing:
- the strace showing the opens and ioctls that I described (the device 
is /dev/sna_trace) right at the end
- the messages file showing the printk messages that I added to prove that the 
ioctl commands ended up in our open function
- a part of the user space code that issues the open and ioctls (see line 236 
of svmtdaem.c and following)
- a part of the driver that expects the open and ioctls (see lines 1188 and 818 
of svmtrcdd.c).
It is not possible to send the full code: CS/Linux is a very large product
comprising some 12,500 files, with kernel drivers in excess of 2.5 MB.
Richard Hilditch
SNAP-IX Group
Data Connection Ltd.
Tel:	+44  20 8366 1177	Mail:	richard
Fax:	+44  20 8367 8501	Web:	http://www.dataconnection.com

Comment 5 Christoph Hellwig 2002-12-05 17:13:02 UTC
I guess it would be enough if you sent a pointer to the location of that code
(e.g. website, FTP site, SourceForge project page, CVS repository).

Comment 6 Arjan van de Ven 2002-12-05 17:22:33 UTC
LiS is not supported

Comment 7 Jeff L Smith 2002-12-07 04:44:46 UTC
LiS is not being used when these calls fail. The kernel is not passing the
CS/Linux ioctls to the CS/Linux trace device driver correctly. The problem has
nothing to do with LiS.

Jeff L Smith  Comm. Server Development IBM

Comment 8 Need Real Name 2002-12-18 22:09:54 UTC
Can you reinvestigate this, since LiS is not being used at the time of the call
failure? Reopening for your response. Thanks.

Comment 9 Need Real Name 2003-01-09 15:20:02 UTC
Raising priority as this is very important to the Comm Server development team.

Comment 10 Need Real Name 2003-01-13 17:10:36 UTC
Adding comments from Comm Serv development
------- Additional Comment #20 From Paul Landay (landay.com) 2003-01-13 07:24 -------

The problem occurs with the RHAS2.1 2.4.9-e.9 kernels:
  http://rhn.redhat.com/errata/RHSA-2002-227.html
We have not tried the 2.4.9-e.10 kernels yet.
------- Additional Comment #21 From Paul Landay (landay.com) 2003-01-13 09:34 -------

I've now tried the 2.4.9-e.10 kernels and it still happens
with that kernel also.

Comment 11 Need Real Name 2003-01-22 23:04:34 UTC
Cancelling bug, as the problem has been determined not to be in Linux code.