Bug 139949

Summary:

sym driver creates voluminous /var/log/messages entries

Product:

Red Hat Enterprise Linux 4

Reporter:

Marc Williams <marcjw53>

Component:

kernel

Assignee:

Mike Christie <mchristi>

Status:

CLOSED ERRATA

QA Contact:

Severity:

high

Docs Contact:

Priority:

medium

Version:

4.0

CC:

asheriff, coughlan, cugase2, davej, djuran, ekanter, galens, jturner, noc, sasha, seauyeen, smann, stefan.skopnik, steved, trevor, wtogami

Hardware:

i686

OS:

Linux

URL:

http://www.redhat.com/archives/fedora-list/2004-November/msg04654.html

Whiteboard:

Fixed In Version:

RHSA-2006-0132

Doc Type:

Bug Fix

Last Closed:

2006-03-07 18:31:58 UTC

Bug Blocks:

168348, 168429

Attachments:

Description	Flags
/var/log/messages as described earlier	none
updated patch with fewer parentheses	none

Description Marc Williams 2004-11-18 21:35:10 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041107 Firefox/1.0

Description of problem:
Booting into this new kernel this morning produced a continuous stream of
/var/log/messages entries, both during boot and afterwards.  It wouldn't stop.
So I rebooted into a previous kernel.  Here's a sample of what it produced:

Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2e37a784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:1:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2e336b84 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:2:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:2:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:1:0:phase change 6-7 11 2e336b84 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:2:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:1:0:phase change 6-7 11 2e336b84 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2c972784 resid=6.

This went on for tens of thousands of lines.  The log file was growing
at a phenomenal rate and showed no signs of letting up.  After about
30 minutes and a 10MB log file, I rebooted into the previous kernel.
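
To see how the messages are spread across devices, a log like the sample above can be tallied with a short script. The following is a minimal Python sketch and not part of the original report: the function name `count_phase_changes` and the regular expression are assumptions written to match the message formats quoted in this bug, and they may need adjusting for other kernels.

```python
import re
from collections import Counter

# Assumed pattern: "kernel: <device>:phase change <from>-<to> ...".
# It tolerates both "sym0:0:0:phase change" and "sr 0:0:3:0: phase change".
PHASE_RE = re.compile(
    r"kernel:\s+(?P<dev>\S.*?):?\s*phase change (?P<phases>\d+-\d+)"
)


def count_phase_changes(lines):
    """Count phase change messages per (device, phase pair)."""
    counts = Counter()
    for line in lines:
        match = PHASE_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("phases"))] += 1
    return counts


# Sample lines taken from the report, plus one unrelated kernel message.
sample = [
    "Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2e37a784 resid=6.",
    "Nov 18 08:15:39 server1 kernel: sym0:1:0:phase change 6-7 11 2c972784 resid=6.",
    "Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2e336b84 resid=6.",
    "Nov 18 07:37:42 server1 kernel: Brought up 2 CPUs",
]

counts = count_phase_changes(sample)
for (dev, phases), n in counts.most_common():
    print(dev, phases, n)
```

On a real system the lines would come from the log file itself, for example `count_phase_changes(open("/var/log/messages"))`, which usually requires root privileges.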

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-1.3_FC2

How reproducible:
Always

Steps to Reproduce:
1. reboot into the new kernel.


Actual Results:  the log file entries described above.

Additional info:

I did not try the non-smp kernel.

Comment 1 Marc Williams 2004-11-18 23:26:50 UTC

Here's something else that may (or may not) be useful.  It's a
section of the log from when these entries first started appearing after a
reboot:

Nov 18 07:37:42 server1 kernel: Total of 2 processors activated (1773.56 BogoMIPS).
Nov 18 07:37:42 server1 kernel: ENABLING IO-APIC IRQs
Nov 18 07:37:42 server1 kernel: ..TIMER: vector=0x31 pin1=2 pin2=0
Nov 18 07:37:42 server1 kernel: checking TSC synchronization across 2 CPUs: passed.
Nov 18 07:37:42 server1 kernel: Brought up 2 CPUs
Nov 18 07:37:42 server1 kernel: zapping low mappings.
Nov 18 07:37:42 server1 kernel: checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Nov 18 07:37:42 server1 kernel: Freeing initrd memory: 323k freed
Nov 18 07:37:42 server1 kernel: NET: Registered protocol family 16
Nov 18 07:37:42 server1 kernel: PCI: PCI BIOS revision 2.10 entry at 0xfdaf0, last bus=0
Nov 18 07:37:42 server1 kernel: PCI: Using configuration type 1
Nov 18 07:37:42 server1 kernel: mtrr: v2.0 (20020519)
Nov 18 07:37:42 server1 kernel: ACPI: Subsystemabf84 resid=6.
Nov 18 07:37:42 server1 kernel: sym0:1:0:phase change 6-7 11@2f2abf84 resid=6.
Nov 18 07:37:42 server1 kernel: sym0:2:0:phase change 6-7 11@2f2abf84 resid=6.
Nov 18 07:37:42 server1 kernel: sym0:0:0:phase change 6-7 11@2f2abf84 resid=6.
Nov 18 07:37:43 server1 kernel: sym0:2:0:phase change 6-7 11@2f398384 resid=6.

Almost makes me wonder if it's not an ACPI thing...

Comment 2 Marc Williams 2004-11-19 14:41:07 UTC

I experimented with the command line a bit this morning.  I tried
acpi=on|off, noapic and various combinations.  None made any
difference.  Fwiw, I run acpi=off with my other (working) kernels.

Comment 3 Marc Williams 2004-11-22 15:33:26 UTC

I just updated to the new kernel-smp-2.6.9-1.6_FC2 kernel and it
exhibits the exact same problem behavior as kernel-smp-2.6.9-1.3_FC2 did.

I've got a somewhat slimmed-down copy of /var/log/messages (19k gzipped)
showing all the errors, which I'll attempt to upload as an attachment.

Comment 4 Marc Williams 2004-11-22 15:35:48 UTC

Created attachment 107183 [details]
/var/log/messages as described earlier

messages file showing described errors for kernel-smp-2.6.9-1.6_FC2

Comment 5 Dave Jones 2004-11-25 06:38:24 UTC

*** Bug 140530 has been marked as a duplicate of this bug. ***

Comment 6 Marc Williams 2005-01-08 13:15:53 UTC

I just tried the newest kernel 2.6.9.1-11 and it too fails with the
same results.  In other words, the last released working kernel is
still 2.6.8-1.521, which means I am missing out on other bug and
security fixes by not being able to upgrade.

I have now had two other people contact me, including one Suse user,
saying that they too have the same problem and wondering about solutions.

Comment 7 Marc Williams 2005-01-12 01:07:53 UTC

Just brought my system up to date including the latest 2.6.10 kernel
and once again I have to report that the same bug still exists.  It's
still back to 2.6.8-1.521 for a working machine.

How come there has been no progress with this?

Comment 8 Marc Williams 2005-01-16 14:33:14 UTC

Dang it!

Once again I tried the latest kernel that appeared in my up2date.
This time it was 2.6.10-1.9 and it too produced the same results as
all 4 previous kernels.  So it's back to 2.6.8-1.521.

Comment 9 Marc Williams 2005-02-05 14:27:28 UTC

The 2.6.10-1.12 kernel that I just tried this morning still doesn't
fix this.

Comment 10 Marc Williams 2005-02-06 01:29:46 UTC

Googling for "linux kernel phase change 2.6.9" returned oodles of hits
about this very problem on several different mailing lists.  It seems
this problem was known about since at least 2.6.9-rc2 way back in
Sept, 04.  Weird how this has managed to stay unsolved this long.  FYI

Comment 11 Steffen Mann 2005-02-18 08:13:08 UTC

*** Bug 140023 has been marked as a duplicate of this bug. ***

Comment 12 Marc Williams 2005-02-18 12:44:33 UTC

Rats.  I just tried 2.6.10-1.14 the results of which are exactly the same, i.e.
still not fixed.

Comment 13 Jim Hook 2005-03-05 13:04:44 UTC

I am a newbie to Linux, and I have a similar system log issue.  This
is my first posting to Bugzilla, so I am not 100% sure of the
procedure.  I hope the information below is what is needed.

The system log has line after line of messages similar to:

Mar 5 04:38:34 localhost kernel: sym1:0:0:phase change 6-7 
11@01e59f84 resid=2.

The number: 11@01e59f84 changes.

From the Gnome Log Viewer - RPM Packages log I show:

kernel-2.6.10-1.760_FC3.i386.rpm
and
kernel-smp-2.6.10-1.760_FC3.i386.rpm

I am using an older (circa 1997) Intergraph TDZ2000 GT1 computer with
dual 400 MHz chips.  I have two buddies who purchased similar
machines (fire sale, $10 each); one uses Red Hat 9 and the other uses
Gentoo.  They do not have this issue.

(As an aside, what is the best procedure for clearing this file (i.e. 
deleting it) from time to time?)

Comment 14 Marc Williams 2005-03-16 12:34:48 UTC

Trying yet another, latest kernel - 2.6.10-1.770_FC2 - still has the same
problem.  The latest kernel to not exhibit this issue is still 2.6.8-1.521 and
that's what I am still reverting back to after trying each of the latest kernels.

Quite discouraging thinking that I'll never get to upgrade the kernel on this
very nice machine.

Comment 15 Sasha Borodin 2005-03-21 16:33:38 UTC

> Trying yet another, latest kernel - 2.6.10-1.770_FC2 - still has the same
> problem.  The latest kernel to not exhibit this issue is still 2.6.8-1.521 and
> that's what I am still reverting back to after trying each of the latest kernels.

Please excuse the novice nature of this comment... but should we report this on some kernel 
development list?  Or has this already been done?

Comment 16 Dave Jones 2005-03-21 20:39:29 UTC

the folks at linux-scsi.org will probably be interested, though I'm
not sure if this has already been reported.  It's worth trying the .11 test kernel
(http://people.redhat.com/davej/kernels/Fedora/FC3/) first to be sure it hasn't
already been fixed.

Comment 17 Sasha Borodin 2005-03-21 22:55:58 UTC

Are there any commonalities between those reporting the problem other than SMP machines?  Are any/all
of you using software RAID for the root partition?

Comment 18 Marc Williams 2005-03-22 00:19:27 UTC

(In reply to comment #17)
> Are there any commonalities between those reporting the problem other than SMP
> machines?  Are any/all of you using software RAID for the root partition?

Yes, I am running software raid.  Here's some info in case it's helpful:

[root@server1 marcw]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md3               4126912   2632512   1284764  68% /
/dev/md0                100890     13305     82376  14% /boot
none                    387272         0    387272   0% /dev/shm
/dev/md4              10056520    196912   9348760   3% /home
/dev/md2               2071224    809996   1156012  42% /var
/dev/hda1            115377640  18140984  91375744  17% /extra1

[root@server1 marcw]# mount
/dev/md3 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/md0 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/md4 on /home type ext3 (rw)
/dev/md2 on /var type ext3 (rw)
/dev/hda1 on /extra1 type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)

[root@server1 marcw]# cat /proc/mdstat
Personalities : [raid1] [raid5]
md3 : active raid5 sdc2[2] sdb2[1] sda2[0]
      4192768 blocks level 5, 256k chunk, algorithm 0 [3/3] [UUU]

md2 : active raid5 sdc3[2] sdb3[1] sda3[0]
      2104320 blocks level 5, 256k chunk, algorithm 0 [3/3] [UUU]

md1 : active raid5 sdc5[2] sdb5[1] sda5[0]
      1043968 blocks level 5, 256k chunk, algorithm 0 [3/3] [UUU]

md4 : active raid5 sdc6[2] sdb6[1] sda6[0]
      10216960 blocks level 5, 256k chunk, algorithm 0 [3/3] [UUU]

md0 : active raid1 sdc1[2] sdb1[1] sda1[0]
      104192 blocks [3/3] [UUU]

unused devices: <none>

Comment 19 Marc Williams 2005-03-22 00:28:37 UTC

(In reply to comment #16)
> the folks at linux-scsi.org will probably be interested, though I'm
> not sure if this has already been reported.  It's worth trying the .11 test kernel
> (http://people.redhat.com/davej/kernels/Fedora/FC3/) first to be sure it hasn't
> already been fixed.

As much as I'd love to try the new FC3 kernel, I wrote up this bug for my FC2
machine.  There's no way I can even try FC3 until this bug is squashed.  At that
point, I would love nothing better than to finally be able to migrate.  Perhaps
you could advise when .11 is made available for us FC2 types.

Comment 20 Sasha Borodin 2005-03-22 05:32:44 UTC

I apologize for being slightly off topic, but along the lines of DEALING with this bug:

Marc, you mentioned that the latest "working" kernel is 2.6.8.  Did you upgrade to this version via the 
usual channels (yum or up2date), or did you have to dig up an archived version of this older kernel and 
tweak anything to make it work with FDC (FC3?).

Thanks,

-Sasha

Comment 21 Marc Williams 2005-03-22 12:07:44 UTC

(In reply to comment #20)
> Marc, you mentioned that the latest "working" kernel is 2.6.8.  Did you
> upgrade to this version via the usual channels (yum or up2date), or did you
> have to dig up an archived version of this older kernel and tweak anything to
> make it work with FDC (FC3?).

I don't know where you got the idea that I'm running FC3, unless from a typo.
As is stated several times in this thread, mine is an FC2 machine.  I never had
to "dig up" 2.6.8; it was current at one time.  It was simply that all subsequent
kernels failed.

Comment 22 Steffen Mann 2005-03-22 12:44:30 UTC

davej,
I tested your kernel kernel-2.6.11-1.7_FC3.i686.rpm and it gets rid of the
phase changes in /var/log/messages.  I will need to verify later, when I am at home,
whether I see speed increases as well.  I would assume so, as the nasty logging
took time...

Cheers,

Steff

Comment 23 Marc Williams 2005-03-29 16:33:08 UTC

I just tried 2.6.10-1.771_FC2 announced yesterday with the same results as all
the rest after 2.6.8-1.521.

Since Steffen Mann indicated some promise in using 2.6.11 under FC3, I would
still love to try this with FC2 whenever (if?) it becomes available for FC2.

Comment 24 Dave Jones 2005-04-16 05:10:50 UTC

Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.

Comment 25 Eugene Kanter 2005-04-17 00:56:22 UTC

Dave, per your instructions I am reopening this bug, since it is 100%
reproducible on a fresh Enterprise Linux v4 install.

kernel 2.6.9-5.ELsmp #1 SMP

root filesystem is on a RAID1 volume, similar to (comment #18)

Comment 26 Eugene Kanter 2005-04-17 02:30:49 UTC

Installing latest FC3 kernel 2.6.11-1.14_FC3smp fixes the issue.

Comment 27 Stnet NOC 2005-04-17 07:51:14 UTC

(In reply to comment #26)
> Installing latest FC3 kernel 2.6.11-1.14_FC3smp fixes the issue.

Sorry, I can't agree. Just did it and the file /var/log/messages still
reports hundreds of: 
Apr 17 09:39:39 ncc1701 kernel: sym0:5:0:phase change 6-7 9@36e4bb84 resid=7.

However, performance does not seem to be affected.

Comment 28 Eugene Kanter 2005-04-18 22:16:52 UTC

I tested two Compaq DL1600 servers with different internals.

On the older machine, with the original EL v4 kernel, hdparm -t /dev/sdb (SEAGATE  Model:
SX150176LC) shows around 3 MB/sec with tons of messages; 2.6.11-1.14_FC3 shows
over 20 MB/sec with no messages at all.

On the newer machine there are still messages, but far fewer. I'll move the disk to it
and test drive speed later. With the original kernel, speed was the same 3 MB/sec on this
particular disk. What is interesting is that the speed of an old 9 GB drive is over 10
MB/sec. There are definitely some problems with the driver/controller recognizing
big disks. But the controllers reported by lspci are the same on both.

Comment 29 Eugene Kanter 2005-04-19 13:58:21 UTC

(In reply to comment #28)
Correction: On both systems kernel 2.6.11-1.14_FC3smp works without filling log
with "phase change" messages.

Comment 30 Trevor Cordes 2005-04-19 22:26:20 UTC

I'm getting this exact same problem.  My experience might help track down the
bug.  I was running FC3 with NO ERRORS up to and including version
kernel-2.6.10-1.770_FC3.  I just tried to boot into kernel-2.6.11-1.14_FC3 and
this is the first time I've seen this bug.  I get about 15 log entries a second
in 2.6.11.  If I switch back to 2.6.10 I have zero problems.

I have an IBM OEM sym53c8xx card (with no BIOS, from an old RS/6000!) that runs
alongside an Adaptec aic7xxx 29160 (with BIOS).  I run 3 CD, 2 tape and 1 ZIP
drives off the sym card.

At this point I would say that 2.6.11 is unusable with the sym card and it
DEFINITELY affects the latest FC3 kernel.

Comment 31 Trevor Cordes 2005-06-26 15:07:39 UTC

Just tested with kernel-2.6.11-1.35_FC3 and the problem persists.  I'm still
stuck having to run kernel-2.6.10-1.770_FC3 so as to not fill my disk with the
errors.  

Is this problem still being looked into?  It seems to have wide appeal.  Anyone
with this problem still, make some noise here so we know you're still there.

Comment 32 Marc Williams 2005-06-26 15:39:44 UTC

(In reply to comment #31)
> Just tested with kernel-2.6.11-1.35_FC3 and the problem persists.  I'm still
> stuck having to run kernel-2.6.10-1.770_FC3 so as to not fill my disk with the
> errors.  
> 
> Is this problem still being looked into?  It seems to have wide appeal.  Anyone
> with this problem still, make some noise here so we know you're still there.
> 

Hey at least you got into 2.6.10.  Because everything newer causes errors, I'm
stuck at 2.6.8-1.521 which means I can't move off of FC2.

Comment 33 Mike Christie 2005-06-28 00:29:44 UTC

I think this is fixed in kernel.org/vanilla 2.6.12. Could you guys try that
kernel out and report back so we can see what we need to do next? Thanks and
sorry for the troubles!

Comment 34 Trevor Cordes 2005-07-29 22:11:39 UTC

I just installed the latest FC3 kernel-2.6.12-1.1372_FC3 and this problem
persists.  I have no idea how to do the "vanilla" kernel -- I only know how to
rpmbuild the FC one.  Is the change mentioned supposed to be in the 1372 kernel?
 If not, will it be in the next one?

Comment 35 Mike Christie 2005-07-29 22:41:19 UTC

No need to test vanilla. The fix that I thought would solve your problem went into
vanilla 2.6.12, and as the kernel-2.6.12-1.1372_FC3 name suggests, we base that
kernel on vanilla 2.6.12, so the fix was there.

Thanks for testing and replying. I will continue looking into this now that I
know it does not help everyone.

Comment 36 Trevor Cordes 2005-08-02 02:59:11 UTC

I'm keeping my sym card in my system for the time being but now rmmod it after
boot so I can run 2.6.12 without the bug biting me.  Of course I can't access my
SCSI devices on that bus, but they're not critical.  As such, I can easily test
the bug on each new FC3 kernel release and report back.  If you think something
else might fix it, let me know.  I often recompile the FC kernel from source so
I can also maybe try the odd patch.

Comment 46 Galen Seitz 2005-11-01 08:17:46 UTC

Is it possible that this bug is related to the one that is discussed below?

http://marc.theaimsgroup.com/?t=110048286400004&r=1&w=2
http://comments.gmane.org/gmane.comp.freedesktop.hal/2456
https://bugs.freedesktop.org/show_bug.cgi?id=1852

The quick summary is that some tools perform PCI config reads to addresses
that are really meant to be internal to the Symbios controller.

My system has a Tekram DC-390U2W (SYM53C895) that gives a scsi parity
error coincident with haldaemon starting.  If I prevent haldaemon from starting,
the parity error does not occur.  The kernel is 2.6.9-22.0.1.EL.

Comment 47 Jeff Layton 2005-11-01 14:42:27 UTC

I don't think that problem is related to the one reported here. The phase change
6-7 errors seem to be due to the inappropriate use of PPR on the bus. The
problem you describe seems to be unrelated to this one.

Comment 49 Trevor Cordes 2005-11-02 17:10:32 UTC

I agree, comment #46 links appear to be unrelated.  Interesting though how they
talk about sym problems in the other distros but don't mention anything about
phase change errors.  Perhaps this indicates the problem is RH/FC specific and
not vanilla kernel?  Just a guess.

Comment 50 Trevor Cordes 2005-11-02 17:23:42 UTC

Hey, I hadn't tested this bug on an updated kernel in a while.  I just
modprobe'd in my sym mod and watched the logs and there doesn't appear to be any
phase change errors (yet).

Hey, I think I'm onto something here.  I started the module with my external
SCSI case off.  No phase errors.  So with just my 3 internal SCSI CD drives and
1 SCSI tape drive, there are no errors.

I rmmod'd the sym.  I turned on my external SCSI box, which has another tape
drive and a ZIP drive.  I modprobed the sym back in and instantly I get phase
errors.

So the problem seems related only to when my SCSI bus is using the external
devices.  I just double-checked and my external box is properly connected by an
actual plug-in terminator on the out port of the box.  All the connections seem
secure.

The total SCSI bus length, according to proper SCSI-2 rules is under 6'.

The logs shed some light.  The phase error appears the instant the module starts
poking at the crappy tape drive in my external box.  The drive was working
before these phase issues came into the kernel, but it is a piece of crap QIC or
TR2 or something drive that I don't use anymore.  See log output below.  I hope
I'm not reading the timing of the phase change wrong.  In the output, the phase
change shown is the very first one showing up.

Perhaps the other people with this bug could list what devices appear to trigger
this bug when they first load the module?

I will see if I can remove that tape drive from my system sometime soon and re-test.

As I mentioned way back in this bug, my setup was phase-error free in an older
kernel.  My SCSI setup hasn't changed at all.

Nov  2 11:16:01 pog kernel:   Vendor: IOMEGA    Model: ZIP 100           Rev: E.08
Nov  2 11:16:01 pog kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
Nov  2 11:16:01 pog kernel:  target8:0:4: Beginning Domain Validation
Nov  2 11:16:01 pog kernel:  8:0:4:0: phase change 6-7 9@1a77a3a0 resid=7.
Nov  2 11:16:01 pog last message repeated 3 times
Nov  2 11:16:01 pog scsi.agent[26672]: cdrom at /devices/pci0000:00/0000:00:1e.0/0000:02:04.0/host8/target8:0:3/8:0:3:0
Nov  2 11:16:01 pog kernel:  target8:0:4: Ending Domain Validation
Nov  2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3a0 resid=7.
Nov  2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3ac resid=7.
Nov  2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3a0 resid=7.
Nov  2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3ac resid=7.
Nov  2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3a0 resid=7.
Nov  2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3ac resid=7.
Nov  2 11:16:01 pog kernel: Attached scsi removable disk sdc at scsi8, channel 0, id 4, lun 0
Nov  2 11:16:01 pog kernel:   Vendor: CONNER    Model: CTT8000-S         Rev: 1.17
Nov  2 11:16:01 pog kernel:   Type:   Sequential-Access                  ANSI SCSI revision: 02
Nov  2 11:16:01 pog kernel:  target8:0:5: Beginning Domain Validation
Nov  2 11:16:01 pog kernel:  target8:0:5: asynchronous.
Nov  2 11:16:01 pog kernel:  target8:0:5: Domain Validation skipping write tests
Nov  2 11:16:01 pog scsi.agent[26702]: disk at /devices/pci0000:00/0000:00:1e.0/0000:02:04.0/host8/target8:0:4/8:0:4:0
Nov  2 11:16:01 pog kernel:  target8:0:5: FAST-5 SCSI 5.0 MB/s ST (200 ns, offset 8)
Nov  2 11:16:01 pog kernel:  target8:0:5: Ending Domain Validation
Nov  2 11:16:01 pog kernel: Attached scsi tape st0 at scsi8, channel 0, id 5, lun 0
Nov  2 11:16:01 pog kernel: st0: try direct i/o: yes (alignment 512 B), max page reachable by HBA 1048575
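
One way to check which device the first message names is to scan the log in order while remembering which target is currently in Domain Validation. A minimal Python sketch follows; the function name `first_phase_change` and both regular expressions are illustrative assumptions based only on the message format in the excerpt above, so older kernels that print `sym0:0:0:phase change` would need a different pattern.

```python
import re

# Assumed formats, taken from the excerpt above:
#   "target8:0:4: Beginning Domain Validation"
#   "8:0:4:0: phase change 6-7 ..."
DV_BEGIN_RE = re.compile(r"target(\d+:\d+:\d+): Beginning Domain Validation")
PHASE_RE = re.compile(r"(\d+:\d+:\d+:\d+): phase change")


def first_phase_change(lines):
    """Return (device, target_in_validation) for the first phase change line.

    target_in_validation is None when no Domain Validation was in progress.
    Returns None if the log contains no phase change message at all.
    """
    current_target = None
    for line in lines:
        begin = DV_BEGIN_RE.search(line)
        if begin:
            current_target = begin.group(1)
            continue
        if "Ending Domain Validation" in line:
            current_target = None
            continue
        phase = PHASE_RE.search(line)
        if phase:
            return phase.group(1), current_target
    return None


# The first lines of the excerpt above.
sample = [
    "Nov  2 11:16:01 pog kernel:   Vendor: IOMEGA    Model: ZIP 100           Rev: E.08",
    "Nov  2 11:16:01 pog kernel:  target8:0:4: Beginning Domain Validation",
    "Nov  2 11:16:01 pog kernel:  8:0:4:0: phase change 6-7 9@1a77a3a0 resid=7.",
    "Nov  2 11:16:01 pog kernel:  target8:0:4: Ending Domain Validation",
]

result = first_phase_change(sample)
print(result)
```

For the excerpt above this reports device 8:0:4:0 while target 8:0:4 is in Domain Validation. The IDs can then be matched against the "Attached scsi ..." lines to identify the physical device.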

Comment 51 Jeff Layton 2005-11-03 21:13:19 UTC

Created attachment 120703 [details]
updated patch with fewer parentheses

At the suggestion of Pete Z, cut down the use of parentheses.

Comment 54 Andreas Sheriff 2005-11-15 05:26:35 UTC

Any further info on a fix?

Comment 55 Jeff Layton 2005-11-15 11:24:53 UTC

The patch attached to this case has been tested by a couple of customers with
this issue, and seems to take care of it. It's currently on track for the next
update release, but that is (of course) subject to QA testing and further
review. So, no guarantee when the fix will make it into an official update.

If you have an official entitlement, I'd suggest opening a case through the
normal support channels and referencing this ticket. The more customers we have
reporting this issue, the more weight it is given.

Comment 56 Bruce Bigby 2005-11-20 16:41:33 UTC

Note: This bug exists in the FC4 2.6.14-1.1637_FC4smp kernel, too.  I have a
Symbios SCSI controller with one sole device on it -- an Iomega Zip 100 Drive. 
It worked fine under RedHat 9.

Comment 57 Andreas Sheriff 2005-11-20 18:34:49 UTC

I've applied said patch to the file mentioned in the beginning of the patch
(linux-2.6.9/drivers/scsi/sym53c8xx_2/sym_hipd.c  # I couldn't find the other
file: linux-2.6.9/drivers/scsi/sym53c8xx_2/sym_hipd.c.ppr-on-se), recompiled the
kernel, copied over the resultant .ko in the driver directory, and rebooted the
machine, but I still see SYM phase change error messages.  Am I doing something
wrong?
Please advise the proper way to incorporate this patch.

Thanks.

Comment 58 Trevor Cordes 2005-11-20 22:32:13 UTC

Comment #56: that appears to be the commonality... we both have ZIP drives.  As
per my comment #50 it appears the problem may be isolated to something weird the
ZIP is doing.  Perhaps if others who have this problem can list the devices they
have on their bus?  If you have a ZIP device, see if you can test this bug with
that device unplugged temporarily.

Comment 59 Andreas Sheriff 2005-11-21 19:53:15 UTC

I've applied the patch, recompiled the kernel, and copied only those files that
modprobe -v --show-depends reported sym53c8xx depends on, but I still get the
phase change error.  I've even changed the error message that spits out 'phase
change' to something else, but it's still saying phase change.  Obviously, the
messages are coming from somewhere else.

Comment 62 Stefan Skopnik 2005-12-20 23:35:46 UTC

I have the same problem using a Yamaha cdr 400 SCSI CDrom Writer, using kernel
version 2.6.13. The message:

kernel: sr 0:0:3:0: phase change 6-7 9@2f4b1ba0 resid=7.

Tested patches for 2.6.15 (Disable IU and QAS negotiation) without success.

Comment 63 Stefan Skopnik 2005-12-20 23:39:40 UTC

Controller is a Tekram 390.

Comment 64 Trevor Cordes 2005-12-22 16:31:54 UTC

If you take the Yamaha off the bus (unplug the SCSI cable from it), do the phase
errors disappear?  Is there anything else on the bus?

Comment 65 Stefan Skopnik 2005-12-22 23:49:28 UTC

Well, I tested the current driver from 2.6.15-rcX: I simply took the
sym53c8xx_2 directory from the git tree (www.kernel.org) and compiled it into my
2.6.13 kernel.  Sorry, I'm no kernel expert ;-) I just tried it:

AND THE ERRORS ARE GONE!

I'm no SCSI expert either, but I think it's related to asynchronous devices. The
Yamaha is one. I think the driver developer (Matthew Wilcox) has found the
problem... There are some comments about this in the logs.

Comment 66 Trevor Cordes 2005-12-25 00:37:28 UTC

Really cool!  So we may see this solved in FC5.

Comment 67 Jay Turner 2006-01-03 19:35:13 UTC

Changes have been committed to our latest beta kernel.  Please test with
2.6.9-27.EL from our public beta and confirm if the issue is resolved there. 
Thanks!

Comment 68 Jeff Layton 2006-02-07 17:25:14 UTC

I've had another customer report a performance issue relating to PPR
negotiation. Their problem is apparently due to a bug that has existed since the
GA release of RHEL4, but fixing it touches the same section of code as the patch
to correct this issue.

Can anyone who was suffering from the "phase change" problem reported above,
and who has had positive results with the latest beta kernels, please also test
the kernels at:

http://people.redhat.com/jlayton/BZ180366/

and post the results here. I'm particularly interested to see if the addition of
the patch to correct the performance issue causes any regression.

Comment 72 Trevor Cordes 2006-02-10 11:59:34 UTC

I would test, but I only have FC3, no RHEL.  I'm eagerly awaiting FC5 to see if
that fixes this bug and will report back.

Comment 77 ThG 2006-03-03 22:50:28 UTC

I have FC4 with the kernel 2.6.15-1.1831_FC4. On my SCSI controller (sym53c8xx)
I have two internal HDs and three optical drives connected.
lsscsi shows:
[0:0:0:0]    disk    IBM      DCAS-34330       S65A  /dev/sda
[0:0:1:0]    disk    IBM      DCAS-34330       S61A  /dev/sdb
[0:0:3:0]    cd/dvd  PLEXTOR  CD-ROM PX-40TS   1.04  /dev/scd0
[0:0:4:0]    cd/dvd  PLEXTOR  CD-R   PX-R820T  1.08  /dev/scd1
[0:0:5:0]    cd/dvd  PIONEER  DVD-ROM DVD-303  1.09  /dev/scd2
[0:0:6:0]    process HP       C5110A           3701  -
...etc.
In /var/log/messages I get a lot of phase change messages, only if I put a disc
in the corresponding drive e.g. for sr 0:0:3:0

Mar  3 15:30:26 duesentrieb kernel: sr 0:0:3:0: phase change 2-3 12@1ecc6b60 resid=2.
Mar  3 15:30:28 duesentrieb kernel: sr 0:0:3:0: phase change 2-3 12@1ecc6b60 resid=2.
Mar  3 15:30:30 duesentrieb kernel: sr 0:0:3:0: phase change 2-3 12@1f7faf60 resid=2.
Mar  3 15:30:38 duesentrieb last message repeated 4 times
... etc.

Reading CDs in all three drives is no problem, but watching DVDs and burning CDs
is impossible. I observe exactly the same behavior if I connect an external SCSI
scanner to the controller. The scanner as well as the HDs are working properly.
ALL devices seem to be asynchronous:

# less /var/log/messages|grep synchron

Mar  3 10:19:02 duesentrieb kernel:  target0:0:0: asynchronous.
Mar  3 10:19:03 duesentrieb kernel:  target0:0:1: asynchronous.
Mar  3 10:19:03 duesentrieb kernel:  target0:0:3: asynchronous.
Mar  3 10:19:03 duesentrieb kernel:  target0:0:4: asynchronous.
Mar  3 10:19:03 duesentrieb kernel:  target0:0:5: asynchronous.
Mar  3 23:47:36 duesentrieb kernel:  target0:0:6: asynchronous.
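
The same filter can be written as a small script when the grep output needs further processing, for instance to compare it against the devices that log phase changes. This is a minimal Python sketch under the assumption that the kernel prints lines of the form `targetH:C:T: asynchronous.` as shown above; `asynchronous_targets` is a hypothetical helper name.

```python
import re

# Assumed format: "kernel:  target0:0:3: asynchronous."
ASYNC_RE = re.compile(r"target(\d+:\d+:\d+): asynchronous\.")


def asynchronous_targets(lines):
    """Return the targets reported as asynchronous, in first-seen order."""
    seen = []
    for line in lines:
        match = ASYNC_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Lines taken from the log output above, plus one phase change message.
sample = [
    "Mar  3 10:19:02 duesentrieb kernel:  target0:0:0: asynchronous.",
    "Mar  3 10:19:03 duesentrieb kernel:  target0:0:3: asynchronous.",
    "Mar  3 15:30:26 duesentrieb kernel: sr 0:0:3:0: phase change 2-3 12@1ecc6b60 resid=2.",
    "Mar  3 23:47:36 duesentrieb kernel:  target0:0:6: asynchronous.",
]

targets = asynchronous_targets(sample)
print(targets)
```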

Comment 78 Red Hat Bugzilla 2006-03-07 18:31:58 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html

Comment 81 Su Seau Yeen 2008-03-07 01:25:37 UTC

Hi, the solution in the erratum doesn't work for my Itanium machine, which was running
2.6.9-55.EL #1 SMP. After updating my kernel to 2.6.9-67.0.4.EL #1 SMP, those
messages still appear every 15 seconds without fail. Is there any workaround? The
strange thing is that it happens on only one of my 5 servers that have
2.6.9-55.EL #1 SMP installed on them. The rest are fine.