|Summary:||sym driver creates voluminous /var/log/messages entries|
|Product:||Red Hat Enterprise Linux 4||Reporter:||Marc Williams <marcjw53>|
|Component:||kernel||Assignee:||Mike Christie <mchristi>|
|Status:||CLOSED ERRATA||QA Contact:|
|Version:||4.0||CC:||asheriff, coughlan, cugase2, davej, djuran, ekanter, galens, jturner, noc, sasha, seauyeen, smann, stefan.skopnik, steved, trevor, wtogami|
|Fixed In Version:||RHSA-2006-0132||Doc Type:||Bug Fix|
|Last Closed:||2006-03-07 18:31:58 UTC||Type:||---|
|Bug Depends On:|
|Bug Blocks:||168348, 168429|
Description Marc Williams 2004-11-18 21:35:10 UTC
From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem: Booting into this new kernel this morning produced a continuous flood of /var/log/messages entries, both during boot and afterwards. It wouldn't stop, so I rebooted into a previous kernel. Here's a sample of what it produced:

Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2e37a784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:1:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2e336b84 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:2:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:2:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:1:0:phase change 6-7 11 2e336b84 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:2:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2c972784 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:1:0:phase change 6-7 11 2e336b84 resid=6.
Nov 18 08:15:39 server1 kernel: sym0:0:0:phase change 6-7 11 2c972784 resid=6.

This went on for tens of thousands of lines. The log file was growing at a phenomenal rate and showed no signs of letting up. After about 30 minutes and a 10MB log file, I rebooted into the previous kernel.

Version-Release number of selected component (if applicable): kernel-smp-2.6.9-1.3_FC2

How reproducible: Always

Steps to Reproduce:
1. reboot into the new kernel.

Actual Results: the log file entries described above.

Additional info: I did not try the non-smp kernel.
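To see which SCSI devices account for the flood, the messages can be tallied per device. The following is only a rough sketch: the regular expression is an assumption derived from the line shapes quoted in this report (both the `sym0:0:0:phase change` form and the later `sd 8:0:4:0: phase change` form), and the function name `count_phase_changes` is made up for illustration. Because syslog collapses repeats into "last message repeated N times", the counts are a lower bound.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the samples quoted in this report:
#   "... kernel: sym0:1:0:phase change 6-7 11@2f2abf84 resid=6."
#   "... kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3a0 resid=7."
# An optional driver prefix ("sd ", "sr ") may precede the device id.
PHASE_RE = re.compile(r"kernel: (?:\w+ )?(?P<dev>[\w:]+?):\s*phase change ")


def count_phase_changes(lines):
    """Return a Counter mapping device id -> number of phase change lines."""
    counts = Counter()
    for line in lines:
        match = PHASE_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts
```

A file object can be passed directly, for example `count_phase_changes(open("/var/log/messages", errors="replace"))`, and `.most_common()` on the result lists the noisiest devices first.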
Comment 1 Marc Williams 2004-11-18 23:26:50 UTC
Here's something else that may (or may not) be useful. It's a section of the log when these entries first started appearing after a reboot:

Nov 18 07:37:42 server1 kernel: Total of 2 processors activated (1773.56 BogoMIPS).
Nov 18 07:37:42 server1 kernel: ENABLING IO-APIC IRQs
Nov 18 07:37:42 server1 kernel: ..TIMER: vector=0x31 pin1=2 pin2=0
Nov 18 07:37:42 server1 kernel: checking TSC synchronization across 2 CPUs: passed.
Nov 18 07:37:42 server1 kernel: Brought up 2 CPUs
Nov 18 07:37:42 server1 kernel: zapping low mappings.
Nov 18 07:37:42 server1 kernel: checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Nov 18 07:37:42 server1 kernel: Freeing initrd memory: 323k freed
Nov 18 07:37:42 server1 kernel: NET: Registered protocol family 16
Nov 18 07:37:42 server1 kernel: PCI: PCI BIOS revision 2.10 entry at 0xfdaf0, last bus=0
Nov 18 07:37:42 server1 kernel: PCI: Using configuration type 1
Nov 18 07:37:42 server1 kernel: mtrr: v2.0 (20020519)
Nov 18 07:37:42 server1 kernel: ACPI: Subsystemabf84 resid=6.
Nov 18 07:37:42 server1 kernel: sym0:1:0:phase change 6-7 11@2f2abf84 resid=6.
Nov 18 07:37:42 server1 kernel: sym0:2:0:phase change 6-7 11@2f2abf84 resid=6.
Nov 18 07:37:42 server1 kernel: sym0:0:0:phase change 6-7 11@2f2abf84 resid=6.
Nov 18 07:37:43 server1 kernel: sym0:2:0:phase change 6-7 11@2f398384 resid=6.

Almost makes me wonder if it's not an ACPI thing...
Comment 2 Marc Williams 2004-11-19 14:41:07 UTC
I experimented with the command line a bit this morning. I tried acpi=on|off, noapic and various combinations. None made any difference. Fwiw, I run acpi=off with my other (working) kernels.
Comment 3 Marc Williams 2004-11-22 15:33:26 UTC
I just updated to the new kernel-smp-2.6.9-1.6_FC2 kernel and it exhibits exactly the same problem as kernel-smp-2.6.9-1.3_FC2 did. I've got a somewhat slimmed-down copy of log/messages (19k gzipped) showing all the errors, which I'll attempt to upload as an attachment.
Comment 4 Marc Williams 2004-11-22 15:35:48 UTC
Created attachment 107183: /var/log/messages as described earlier. Messages file showing the described errors for kernel-smp-2.6.9-1.6_FC2.
Comment 5 Dave Jones 2004-11-25 06:38:24 UTC
*** Bug 140530 has been marked as a duplicate of this bug. ***
Comment 6 Marc Williams 2005-01-08 13:15:53 UTC
I just tried the newest kernel 22.214.171.124-11 and it too fails with the same results. In other words, the last released working kernel is still 2.6.8-1.521, which means I am missing out on other bug and security fixes by not being able to upgrade. I have now had two other people contact me, including one Suse user, saying that they too have the same problem and wondering about solutions.
Comment 7 Marc Williams 2005-01-12 01:07:53 UTC
Just brought my system up to date including the latest 2.6.10 kernel and once again I have to report that the same bug still exists. It's still back to 2.6.8-1.521 for a working machine. How come there has been no progress with this?
Comment 8 Marc Williams 2005-01-16 14:33:14 UTC
Dang it! Once again I tried with the latest kernel that appeared in my up2date. This time it was 2.6.10-1.9 and it too produced the same results as all 4 previous kernels. So it's back to 2.6.8-1.521.
Comment 9 Marc Williams 2005-02-05 14:27:28 UTC
The 2.6.10-1.12 kernel that I just tried this morning still doesn't fix this.
Comment 10 Marc Williams 2005-02-06 01:29:46 UTC
Googling for "linux kernel phase change 2.6.9" returned oodles of hits about this very problem on several different mailing lists. It seems this problem was known about since at least 2.6.9-rc2 way back in Sept, 04. Weird how this has managed to stay unsolved this long. FYI
Comment 11 Steffen Mann 2005-02-18 08:13:08 UTC
*** Bug 140023 has been marked as a duplicate of this bug. ***
Comment 12 Marc Williams 2005-02-18 12:44:33 UTC
Rats. I just tried 2.6.10-1.14 the results of which are exactly the same, i.e. still not fixed.
Comment 13 Jim Hook 2005-03-05 13:04:44 UTC
I am a newbie to Linux and I have a similar system log issue. This is my first posting to Bugzilla so I am not 100% sure of the procedure - I hope the information below is what is needed.

The system log has line after line of messages similar to:

Mar 5 04:38:34 localhost kernel: sym1:0:0:phase change 6-7 11@01e59f84 resid=2.

The number 11@01e59f84 changes from line to line. In the Gnome Log Viewer, the RPM Packages log shows kernel-2.6.10-1.760_FC3.i386.rpm and kernel-smp-2.6.10-1.760_FC3.i386.rpm.

I am using an older (circa 1997) Intergraph TDZ2000 GT1 computer with dual 400 MHz chips. I have two buddies who purchased similar machines (firesale - $10 each) - one uses Redhat9 and the other uses Gentoo. They do not have this issue.

(As an aside, what is the best procedure for clearing this file (i.e. deleting it) from time to time?)
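On the aside about clearing the file: the distribution's log rotation (typically logrotate) normally takes care of this on a schedule. For a one-off cleanup, emptying the file in place is generally safer than deleting it, because the logging daemon keeps the file open and would otherwise continue writing to the deleted file. A minimal sketch, assuming it is run as root and that the daemon appends to the file (the helper name `truncate_log` is made up for illustration):

```python
def truncate_log(path):
    """Empty the file in place.

    The file is truncated rather than removed and recreated, so it keeps
    the same inode and a daemon that already has it open keeps logging to it.
    """
    with open(path, "r+b") as log:
        log.truncate(0)
```

For example, `truncate_log("/var/log/messages")` leaves an empty file with the same owner and permissions.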
Comment 14 Marc Williams 2005-03-16 12:34:48 UTC
Trying yet another, latest kernel - 2.6.10-1.770_FC2 - still has the same problem. The latest kernel to not exhibit this issue is still 2.6.8-1.521 and that's what I am still reverting back to after trying each of the latest kernels. Quite discouraging thinking that I'll never get to upgrade the kernel on this very nice machine.
Comment 15 Sasha Borodin 2005-03-21 16:33:38 UTC
> Trying yet another, latest kernel - 2.6.10-1.770_FC2 - still has the same
> problem. The latest kernel to not exhibit this issue is still 2.6.8-1.521 and
> that's what I am still reverting back to after trying each of the latest kernels.

Please excuse the novice nature of this comment... but should we report this on some kernel development list? Or has this already been done?
Comment 16 Dave Jones 2005-03-21 20:39:29 UTC
the folks at firstname.lastname@example.org will probably be interested, though I'm not sure if this has already been reported. It's worth trying the .11 test kernel (http://people.redhat.com/davej/kernels/Fedora/FC3/) first to be sure it hasn't already been fixed.
Comment 17 Sasha Borodin 2005-03-21 22:55:58 UTC
Are there any commonalities between those reporting the problem other than SMP machines? Are any/all of you using software RAID for the root partition?
Comment 18 Marc Williams 2005-03-22 00:19:27 UTC
(In reply to comment #17)
> Are there any commonalities between those reporting the problem other than SMP machines? Are any/all
> of you using software RAID for the root partition?

Yes, I am running software RAID. Here's some info in case it's helpful:

[root@server1 marcw]# df -k
Filesystem   1K-blocks      Used  Available  Use%  Mounted on
/dev/md3       4126912   2632512    1284764   68%  /
/dev/md0        100890     13305      82376   14%  /boot
none            387272         0     387272    0%  /dev/shm
/dev/md4      10056520    196912    9348760    3%  /home
/dev/md2       2071224    809996    1156012   42%  /var
/dev/hda1    115377640  18140984   91375744   17%  /extra1

[root@server1 marcw]# mount
/dev/md3 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/md0 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/md4 on /home type ext3 (rw)
/dev/md2 on /var type ext3 (rw)
/dev/hda1 on /extra1 type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)

[root@server1 marcw]# cat /proc/mdstat
Personalities : [raid1] [raid5]
md3 : active raid5 sdc2 sdb2 sda2
      4192768 blocks level 5, 256k chunk, algorithm 0 [3/3] [UUU]
md2 : active raid5 sdc3 sdb3 sda3
      2104320 blocks level 5, 256k chunk, algorithm 0 [3/3] [UUU]
md1 : active raid5 sdc5 sdb5 sda5
      1043968 blocks level 5, 256k chunk, algorithm 0 [3/3] [UUU]
md4 : active raid5 sdc6 sdb6 sda6
      10216960 blocks level 5, 256k chunk, algorithm 0 [3/3] [UUU]
md0 : active raid1 sdc1 sdb1 sda1
      104192 blocks [3/3] [UUU]
unused devices: <none>
Comment 19 Marc Williams 2005-03-22 00:28:37 UTC
(In reply to comment #16) > the folks at email@example.com will probably be interested, though I'm > not sure if this has already been reported. It's worth trying the .11 test kernel > (http://people.redhat.com/davej/kernels/Fedora/FC3/) first to be sure it hasn't > already been fixed. As much as I'd love to try the new FC3 kernel, I wrote up this bug for my FC2 machine. There's no way I can even try FC3 until this bug is squashed. At that point, I would love nothing better than to finally be able to migrate. Perhaps you could advise when .11 is made available for us FC2 types.
Comment 20 Sasha Borodin 2005-03-22 05:32:44 UTC
I apologize for being slightly off topic, but along the lines of DEALING with this bug: Marc, you mentioned that the latest "working" kernel is 2.6.8. Did you upgrade to this version via the usual channels (yum or up2date), or did you have to dig up an archived version of this older kernel and tweak anything to make it work with FDC (FC3?)? Thanks, -Sasha
Comment 21 Marc Williams 2005-03-22 12:07:44 UTC
(In reply to comment #20) > Marc, you mentioned that the latest "working" kernel is 2.6.8. Did you upgrade to this version via the > usual channels (yum or up2date), or did you have to dig up an archived version of this older kernel and > tweak anything to make it work with FDC (FC3?). I don't know where you got the idea that I'm running FC3, unless from a typo. As is stated several times in this thread, mine is a fc2 machine. I never had to "dig up" 2.6.8 - it was current at one time. It was simply all subsequent kernels that failed.
Comment 22 Steffen Mann 2005-03-22 12:44:30 UTC
davej, I tried your kernel kernel-2.6.11-1.7_FC3.i686.rpm and it gets rid of the phase change messages in /var/log/messages. I will need to verify later, when I am at home, whether I see speed increases as well. I would assume so, as the nasty logging took time... Cheers, Steff
Comment 23 Marc Williams 2005-03-29 16:33:08 UTC
I just tried 2.6.10-1.771_FC2, announced yesterday, with the same results as all the rest after 2.6.8-1.521. Since Steffen Mann indicated some promise in using 2.6.11 under FC3, I would still love to try this with FC2 whenever (if?) it becomes available for FC2.
Comment 24 Dave Jones 2005-04-16 05:10:50 UTC
Fedora Core 2 has now reached end of life, and no further updates will be provided by Red Hat. The Fedora legacy project will be producing further kernel updates for security problems only. If this bug has not been fixed in the latest Fedora Core 2 update kernel, please try to reproduce it under Fedora Core 3, and reopen if necessary, changing the product version accordingly. Thank you.
Comment 25 Eugene Kanter 2005-04-17 00:56:22 UTC
Dave, per your instructions I am reopening this bug, since it is 100% reproducible on a fresh Enterprise Linux v4 install. Kernel: 2.6.9-5.ELsmp #1 SMP. The root filesystem is on a RAID1 volume, similar to comment #18.
Comment 26 Eugene Kanter 2005-04-17 02:30:49 UTC
Installing latest FC3 kernel 2.6.11-1.14_FC3smp fixes the issue.
Comment 27 Stnet NOC 2005-04-17 07:51:14 UTC
(In reply to comment #26)
> Installing latest FC3 kernel 2.6.11-1.14_FC3smp fixes the issue.

Sorry, I can't agree. I just did it and /var/log/messages still reports hundreds of:

Apr 17 09:39:39 ncc1701 kernel: sym0:5:0:phase change 6-7 9@36e4bb84 resid=7.

However, performance does not seem to be affected.
Comment 28 Eugene Kanter 2005-04-18 22:16:52 UTC
I tested two Compaq DL1600 servers with different internals. On the older machine, with the original EL v4 kernel, hdparm -t /dev/sdb (SEAGATE Model: SX150176LC) shows around 3 MB/sec with tons of messages; 2.6.11-1.14_FC3 shows over 20 MB/sec with no messages at all. On the newer machine there are still messages, but far fewer. I'll move the disk to it and test drive speed later. With the original kernel, speed was the same 3 MB/sec on this particular disk. What is interesting is that an old 9 GB drive manages over 10 MB/sec. There are definitely some problems with the driver/controller recognizing big disks, yet the controllers reported by lspci are the same on both.
Comment 29 Eugene Kanter 2005-04-19 13:58:21 UTC
(In reply to comment #28) Correction: On both systems kernel 2.6.11-1.14_FC3smp works without filling log with "phase change" messages.
Comment 30 Trevor Cordes 2005-04-19 22:26:20 UTC
I'm getting this exact same problem. My experience might help track down the bug. I was running FC3 with NO ERRORS up to and including version kernel-2.6.10-1.770_FC3. I just tried to boot into kernel-2.6.11-1.14_FC3 and this is the first time I've seen this bug. I get about 15 log entries a second in 2.6.11. If I switch back to 2.6.10 I have zero problems. I have an IBM OEM sym53c8xx card (with no BIOS, from an old RS/6000!) that runs alongside an Adaptec aic7xxx 29160 (with BIOS). I run 3 CD, 2 tape and 1 ZIP drives off the sym card. At this point I would say that 2.6.11 is unusable with the sym card and it DEFINITELY affects the latest FC3 kernel.
Comment 31 Trevor Cordes 2005-06-26 15:07:39 UTC
Just tested with kernel-2.6.11-1.35_FC3 and the problem persists. I'm still stuck having to run kernel-2.6.10-1.770_FC3 so as to not fill my disk with the errors. Is this problem still being looked into? It seems to have wide appeal. Anyone with this problem still, make some noise here so we know you're still there.
Comment 32 Marc Williams 2005-06-26 15:39:44 UTC
(In reply to comment #31) > Just tested with kernel-2.6.11-1.35_FC3 and the problem persists. I'm still > stuck having to run kernel-2.6.10-1.770_FC3 so as to not fill my disk with the > errors. > > Is this problem still being looked into? It seems to have wide appeal. Anyone > with this problem still, make some noise here so we know you're still there. > Hey at least you got into 2.6.10. Because everything newer causes errors, I'm stuck at 2.6.8-1.521 which means I can't move off of FC2.
Comment 33 Mike Christie 2005-06-28 00:29:44 UTC
I think this is fixed in kernel.org/vanilla 2.6.12. Could you guys try that kernel out and report back so we can see what we need to do next? Thanks and sorry for the troubles!
Comment 34 Trevor Cordes 2005-07-29 22:11:39 UTC
I just installed the latest FC3 kernel-2.6.12-1.1372_FC3 and this problem persists. I have no idea how to do the "vanilla" kernel -- I only know how to rpmbuild the FC one. Is the change mentioned supposed to be in the 1372 kernel? If not, will it be in the next one?
Comment 35 Mike Christie 2005-07-29 22:41:19 UTC
No need to test vanilla. The fix that I thought would solve your problem went into vanilla 2.6.12, and as the kernel-2.6.12-1.1372_FC3 name suggests, we base that kernel on vanilla 2.6.12, so the fix was there. Thanks for testing and replying. I will continue looking into this now that I know it does not help everyone.
Comment 36 Trevor Cordes 2005-08-02 02:59:11 UTC
I'm keeping my sym card in my system for the time being but now rmmod it after boot so I can run 2.6.12 without the bug biting me. Of course I can't access my SCSI devices on that bus, but they're not critical. As such, I can easily test the bug on each new FC3 kernel release and report back. If you think something else might fix it, let me know. I often recompile the FC kernel from source so I can also maybe try the odd patch.
Comment 46 Galen Seitz 2005-11-01 08:17:46 UTC
Is it possible that this bug is related to the one that is discussed below? http://marc.theaimsgroup.com/?t=110048286400004&r=1&w=2 http://comments.gmane.org/gmane.comp.freedesktop.hal/2456 https://bugs.freedesktop.org/show_bug.cgi?id=1852 The quick summary is that some tools perform PCI config reads to addresses that are really meant to be internal to the Symbios controller. My system has a Tekram DC-390U2W (SYM53C895) that gives a scsi parity error coincident with haldaemon starting. If I prevent haldaemon from starting, the parity error does not occur. The kernel is 2.6.9-22.0.1.EL.
Comment 47 Jeff Layton 2005-11-01 14:42:27 UTC
I don't think that problem is related to the one reported here. The phase change 6-7 errors seem to be due to the inappropriate use of PPR on the bus. The problem you describe seems to be unrelated to this one.
Comment 49 Trevor Cordes 2005-11-02 17:10:32 UTC
I agree, comment #46 links appear to be unrelated. Interesting though how they talk about sym problems in the other distros but don't mention anything about phase change errors. Perhaps this indicates the problem is RH/FC specific and not vanilla kernel? Just a guess.
Comment 50 Trevor Cordes 2005-11-02 17:23:42 UTC
Hey, I hadn't tested this bug on an updated kernel in a while. I just modprobe'd in my sym mod and watched the logs and there don't appear to be any phase change errors (yet).

Hey, I think I'm onto something here. I started the module with my external SCSI case off. No phase errors. So with just my 3 internal SCSI CD drives and 1 SCSI tape drive, there are no errors. I rmmod'd the sym. I turned on my external SCSI box, which has another tape drive and a ZIP drive. I modprobed the sym back in and instantly I get phase errors. So the problem seems related only to when my SCSI bus is using the external devices.

I just double-checked and my external box is properly connected by an actual plug-in terminator on the out port of the box. All the connections seem secure. The total SCSI bus length, according to proper SCSI-2 rules, is under 6'.

The logs shed some light. The phase error appears the instant the module starts poking at the crappy tape drive in my external box. The drive was working before these phase issues came into the kernel, but it is a piece of crap QIC or TR2 or something drive that I don't use anymore. See log output below. I hope I'm not reading the timing of the phase change wrong. In the output, the phase change shown is the very first one showing up.

Perhaps the other people with this bug could list what devices appear to trigger this bug when they first load the module? I will see if I can remove that tape drive from my system sometime soon and re-test. As I mentioned way back in this bug, my setup was phase-error free in an older kernel. My SCSI setup hasn't changed at all.

Nov 2 11:16:01 pog kernel: Vendor: IOMEGA Model: ZIP 100 Rev: E.08
Nov 2 11:16:01 pog kernel: Type: Direct-Access ANSI SCSI revision: 02
Nov 2 11:16:01 pog kernel: target8:0:4: Beginning Domain Validation
Nov 2 11:16:01 pog kernel: 8:0:4:0: phase change 6-7 9@1a77a3a0 resid=7.
Nov 2 11:16:01 pog last message repeated 3 times
Nov 2 11:16:01 pog scsi.agent: cdrom at /devices/pci0000:00/0000:00:1e.0/0000:02:04.0/host8/target8:0:3/8:0:3:0
Nov 2 11:16:01 pog kernel: target8:0:4: Ending Domain Validation
Nov 2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3a0 resid=7.
Nov 2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3ac resid=7.
Nov 2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3a0 resid=7.
Nov 2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3ac resid=7.
Nov 2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3a0 resid=7.
Nov 2 11:16:01 pog kernel: sd 8:0:4:0: phase change 6-7 9@1a77a3ac resid=7.
Nov 2 11:16:01 pog kernel: Attached scsi removable disk sdc at scsi8, channel 0, id 4, lun 0
Nov 2 11:16:01 pog kernel: Vendor: CONNER Model: CTT8000-S Rev: 1.17
Nov 2 11:16:01 pog kernel: Type: Sequential-Access ANSI SCSI revision: 02
Nov 2 11:16:01 pog kernel: target8:0:5: Beginning Domain Validation
Nov 2 11:16:01 pog kernel: target8:0:5: asynchronous.
Nov 2 11:16:01 pog kernel: target8:0:5: Domain Validation skipping write tests
Nov 2 11:16:01 pog scsi.agent: disk at /devices/pci0000:00/0000:00:1e.0/0000:02:04.0/host8/target8:0:4/8:0:4:0
Nov 2 11:16:01 pog kernel: target8:0:5: FAST-5 SCSI 5.0 MB/s ST (200 ns, offset 8)
Nov 2 11:16:01 pog kernel: target8:0:5: Ending Domain Validation
Nov 2 11:16:01 pog kernel: Attached scsi tape st0 at scsi8, channel 0, id 5, lun 0
Nov 2 11:16:01 pog kernel: st0: try direct i/o: yes (alignment 512 B), max page reachable by HBA 1048575
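For anyone wanting to answer the question of which device first triggers the messages after loading the module, a small script can pair the first phase change line with the most recent "Vendor: ... Model: ..." inquiry line before it. This is a heuristic sketch only: the patterns are assumptions based on the log excerpt above, the function name `first_phase_change` is hypothetical, and because kernel and hotplug messages interleave, the device id in the phase change line itself is the more reliable indicator.

```python
import re

# Assumed shapes, based on the excerpt above:
#   "... kernel: Vendor: IOMEGA Model: ZIP 100 Rev: E.08"
#   "... kernel: 8:0:4:0: phase change 6-7 9@1a77a3a0 resid=7."
INQUIRY_RE = re.compile(
    r"Vendor:\s*(?P<vendor>\S+)\s+Model:\s*(?P<model>.+?)\s+Rev:"
)
PHASE_RE = re.compile(r"kernel: (?:\w+ )?(?P<dev>[\w:]+?):\s*phase change ")


def first_phase_change(lines):
    """Return (device, vendor, model) for the first phase change line.

    vendor and model come from the nearest preceding inquiry line and are
    None if no inquiry line was seen. Returns None if no phase change occurs.
    """
    vendor = model = None
    for line in lines:
        inquiry = INQUIRY_RE.search(line)
        if inquiry:
            vendor, model = inquiry.group("vendor"), inquiry.group("model")
            continue
        phase = PHASE_RE.search(line)
        if phase:
            return phase.group("dev"), vendor, model
    return None
```

Run against the excerpt above, the first phase change is attributed to device 8:0:4:0, which the log identifies as the IOMEGA ZIP 100 at id 4 rather than the tape drive at id 5.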
Comment 51 Jeff Layton 2005-11-03 21:13:19 UTC
Created attachment 120703: updated patch with fewer parentheses. At the suggestion of Pete Z, cut down the use of parentheses.
Comment 54 Andreas Sheriff 2005-11-15 05:26:35 UTC
Any further info on a fix?
Comment 55 Jeff Layton 2005-11-15 11:24:53 UTC
The patch attached to this case has been tested by a couple of customers with this issue, and seems to take care of it. It's currently on track for the next update release, but that is (of course) subject to QA testing and further review. So, no guarantee when the fix will make it into an official update. If you have an official entitlement, I'd suggest opening a case through the normal support channels and referencing this ticket. The more customers we have reporting this issue, the more weight it is given.
Comment 56 Bruce Bigby 2005-11-20 16:41:33 UTC
Note: This bug exists in the FC4 2.6.14-1.1637_FC4smp kernel, too. I have a Symbios SCSI controller with one sole device on it -- an Iomega Zip 100 Drive. It worked fine under RedHat 9.
Comment 57 Andreas Sheriff 2005-11-20 18:34:49 UTC
I've applied said patch to the file mentioned in the beginning of the patch (linux-2.6.9/drivers/scsi/sym53c8xx_2/sym_hipd.c # I couldn't find the other file: linux-2.6.9/drivers/scsi/sym53c8xx_2/sym_hipd.c.ppr-on-se), recompiled the kernel, copied over the resultant .ko in the driver directory, and rebooted the machine, but I still see SYM phase change error messages. Am I doing something wrong? Please advise the proper way to incorporate this patch. Thanks.
Comment 58 Trevor Cordes 2005-11-20 22:32:13 UTC
Comment #56: that appears to be the commonality... we both have ZIP drives. As per my comment #50 it appears the problem may be isolated to something weird the ZIP is doing. Perhaps if others who have this problem can list the devices they have on their bus? If you have a ZIP device, see if you can test this bug with that device unplugged temporarily.
Comment 59 Andreas Sheriff 2005-11-21 19:53:15 UTC
I've applied the patch, recompiled the kernel, and copied only those files that modprobe -v --show-depends reported sym53c8xx depends on, but I still get the phase change error. I've even changed the error message that spits out 'phase change' to something else, but it's still saying phase change. Obviously, the messages are coming from somewhere else.
Comment 62 Stefan Skopnik 2005-12-20 23:35:46 UTC
I have the same problem using a Yamaha cdr 400 SCSI CD-ROM writer with kernel version 2.6.13. The message: kernel: sr 0:0:3:0: phase change 6-7 9@2f4b1ba0 resid=7. I tested the patches for 2.6.15 (disable IU and QAS negotiation), with no go.
Comment 63 Stefan Skopnik 2005-12-20 23:39:40 UTC
The controller is a Tekram 390.
Comment 64 Trevor Cordes 2005-12-22 16:31:54 UTC
If you take the Yamaha off the bus (unplug the SCSI cable from it), do the phase errors disappear? Is there anything else on the bus?
Comment 65 Stefan Skopnik 2005-12-22 23:49:28 UTC
Well, I tested the current driver from 2.6.15-rcX: I simply took the sym53c8xx_2 directory from the git tree (www.kernel.org) and compiled it into my 2.6.13 kernel (sorry, I'm no kernel expert ;-), I just tried it): AND THE ERRORS ARE GONE! I'm no SCSI expert either, but I think it's related to asynchronous devices. The Yamaha is one. I think the driver developer (Matthew Wilcox) has found the problem... There are some comments about this in the logs.
Comment 66 Trevor Cordes 2005-12-25 00:37:28 UTC
Really cool! So we may see this solved in FC5.
Comment 67 Jay Turner 2006-01-03 19:35:13 UTC
Changes have been committed to our latest beta kernel. Please test with 2.6.9-27.EL from our public beta and confirm if the issue is resolved there. Thanks!
Comment 68 Jeff Layton 2006-02-07 17:25:14 UTC
I've had another customer report a performance issue relating to PPR negotiation. Their problem is apparently due to a bug that has existed since the GA release of RHEL4, but fixing it touches the same section of code as the patch to correct this issue. Can anyone who was suffering from the "phase change" problem reported above, and who has had positive results with the latest beta kernels, please also test the kernels at http://people.redhat.com/jlayton/BZ180366/ and post the results here? I'm particularly interested to see if the addition of the patch to correct the performance issue causes any regression.
Comment 72 Trevor Cordes 2006-02-10 11:59:34 UTC
I would test, but I only have FC3, no RHEL. I'm eagerly awaiting FC5 to see if that fixes this bug and will report back.
Comment 77 ThG 2006-03-03 22:50:28 UTC
I have FC4 with the kernel 2.6.15-1.1831_FC4. On my SCSI controller (sym53c8xx) I have two internal HDs and three optical drives connected. lsscsi shows:

[0:0:0:0] disk IBM DCAS-34330 S65A /dev/sda
[0:0:1:0] disk IBM DCAS-34330 S61A /dev/sdb
[0:0:3:0] cd/dvd PLEXTOR CD-ROM PX-40TS 1.04 /dev/scd0
[0:0:4:0] cd/dvd PLEXTOR CD-R PX-R820T 1.08 /dev/scd1
[0:0:5:0] cd/dvd PIONEER DVD-ROM DVD-303 1.09 /dev/scd2
[0:0:6:0] process HP C5110A 3701 -
...etc.

In /var/log/messages I get a lot of phase change messages, but only if I put a disc in the corresponding drive, e.g. for sr 0:0:3:0:

Mar 3 15:30:26 duesentrieb kernel: sr 0:0:3:0: phase change 2-3 12@1ecc6b60 resid=2.
Mar 3 15:30:28 duesentrieb kernel: sr 0:0:3:0: phase change 2-3 12@1ecc6b60 resid=2.
Mar 3 15:30:30 duesentrieb kernel: sr 0:0:3:0: phase change 2-3 12@1f7faf60 resid=2.
Mar 3 15:30:38 duesentrieb last message repeated 4 times
... etc.

Reading CDs in all three drives is no problem; watching DVDs and burning CDs is impossible. I observe exactly the same behavior if I connect an external SCSI scanner to the controller. The scanner as well as the HDs are working properly. ALL devices seem to be asynchronous:

# less /var/log/messages|grep synchron
Mar 3 10:19:02 duesentrieb kernel: target0:0:0: asynchronous.
Mar 3 10:19:03 duesentrieb kernel: target0:0:1: asynchronous.
Mar 3 10:19:03 duesentrieb kernel: target0:0:3: asynchronous.
Mar 3 10:19:03 duesentrieb kernel: target0:0:4: asynchronous.
Mar 3 10:19:03 duesentrieb kernel: target0:0:5: asynchronous.
Mar 3 23:47:36 duesentrieb kernel: target0:0:6: asynchronous.
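Since comment 65 suspects a link to asynchronous devices, it may help to check which targets are both negotiated as asynchronous and producing phase change messages. The sketch below makes several assumptions: it only understands the newer `host:channel:id:lun` message form shown in this comment (not the older `sym0:0:0:` form), the regular expressions are derived from the lines quoted here, and the function name is invented for illustration.

```python
import re

# Assumed shapes, based on the lines quoted in this comment:
#   "... kernel: target0:0:3: asynchronous."
#   "... kernel: sr 0:0:3:0: phase change 2-3 12@1ecc6b60 resid=2."
ASYNC_RE = re.compile(r"kernel: target(\d+:\d+:\d+): asynchronous\.")
PHASE_RE = re.compile(r"kernel: (?:\w+ )?(\d+:\d+:\d+):\d+: phase change ")


def async_targets_with_phase_changes(lines):
    """Return the set of host:channel:id targets that were reported as
    asynchronous and also logged at least one phase change message."""
    async_targets = set()
    noisy_targets = set()
    for line in lines:
        match = ASYNC_RE.search(line)
        if match:
            async_targets.add(match.group(1))
            continue
        match = PHASE_RE.search(line)
        if match:
            noisy_targets.add(match.group(1))
    return async_targets & noisy_targets
```

An empty result would argue against the asynchronous-device theory for a given machine; a non-empty one is only a correlation, not proof of the cause.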
Comment 78 Red Hat Bugzilla 2006-03-07 18:31:58 UTC
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0132.html
Comment 81 Su Seau Yeen 2008-03-07 01:25:37 UTC
Hi, the solution in the erratum doesn't work for my Itanium, which was running 2.6.9-55.EL #1 SMP. After updating my kernel to 2.6.9-67.0.4.EL #1 SMP, those messages still appear every 15 seconds without fail. Is there any workaround? The strange thing is that it happens on only one of my 5 servers that have 2.6.9-55.EL #1 SMP installed. The rest are fine.