Bug 743275 - system not usable - "Machine check in kernel mode."
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: powerpc
OS: Linux
Priority: high
Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Keywords: Regression
Reported: 2011-10-04 09:12 EDT by Jiri Kastner
Modified: 2012-09-04 09:51 EDT
CC List: 11 users

Doc Type: Bug Fix
Last Closed: 2012-09-04 09:51:57 EDT


Attachments
.config from my G4 machine for linux-3.2.1 kernel (63.34 KB, text/plain)
2012-02-19 13:11 EST, Paul Osmialowski

Description Jiri Kastner 2011-10-04 09:12:07 EDT
Description of problem:
When I tried the 3.0 kernels from ppc.koji on an Apple PowerBook G4, I got the messages shown in the 'Actual results' section for a few minutes. The Fedora 12 kernel is the last Fedora kernel that is usable. I also tried Debian testing, which ships a 3.0 kernel, and it works, so the problem is specific to the Fedora kernel. I was able to boot Fedora when using the Debian kernel and modules.

Version-Release number of selected component (if applicable):
3.0.0+ on ppc

How reproducible:
always

Steps to Reproduce:
1. Burn the ppc ISO.
2. Boot the PowerBook from the CD burned in step 1.
3. Check the output.
  
Actual results:
Machine check in kernel mode.
Caused by (from SRR1=149030): Transfer error ack signal
After about 4 minutes the machine freezes completely.

Expected results:
The kernel boots and the system works, as it does with Fedora 12 or with the Debian kernel.

Additional info:
https://picasaweb.google.com/111531644309523180100/Powerpc#5659623210999413394
Comment 1 Josh Boyer 2011-10-04 09:19:18 EDT
F16 is using the 3.1-rcX kernels at this point.  Can you recreate with those?
Comment 2 Jiri Kastner 2011-10-04 09:28:18 EDT
tested kernels:

kernel-3.0.0-1.fc16.ppc (DOES NOT WORK)
kernel-3.1.0-0.rc8.git0.0.fc16.kh.ppc (DOES NOT WORK)
and 
3.0.0-1-powerpc (copied from debian partition - WORKS)
Comment 3 benoît barthés 2011-10-26 20:45:24 EDT
I have precisely the same bug on my Powerbook G4(7455) DVI.

Caused by (from SRR1=149030): Transfer error ack signal
Comment 4 Josh Boyer 2011-11-02 13:03:45 EDT
Ben, any ideas?
Comment 5 Brig C. McCoy 2011-12-23 21:54:16 EST
Seeing the same problem with Beta and Release Netinst ISOs.

PowerBook G4
1GHz, 2GB, 60GB, Combo, BT

Runs Fedora 12, Yellow Dog 6.2, and Ubuntu 10.04 LTS fine.

Please let me know if there's anything I can do to help resolve this.
Comment 6 Benjamin Herrenschmidt 2011-12-23 23:20:49 EST
I've tested recent kernels on PowerBooks without many problems, so it must be something in Fedora: either something in the kernel configuration or in userspace.

The machine check described above usually happens when something tries to access non-existent hardware (ioremap + access of a non-existent MMIO address or PIO port). Could it be the keyboard stuff in userspace whacking /proc/ioports that was reported a while back on pseries?

I'll try to dig here but it may take a few days.
Comment 7 Britton Dodd 2012-01-01 01:57:45 EST
Confirmed on DVD iso using the following:

PowerBook G4 1.67GHz
1.25GB RAM


Worked with Ubuntu 11.04/PPC
Comment 8 Paul Osmialowski 2012-02-18 19:35:10 EST
Guys, I've been having this problem for about a year and so far haven't had time to deal with it.

Details:
- The PowerBook G4 worked fine with linux 2.6.32, even though the following messages could be seen in dmesg:

[    6.600384] Intel ISA PCIC probe: Machine check in kernel mode.
[    6.608879] Caused by (from SRR1=141030): Transfer error ack signal
[    6.617416] IN from bad port 3e1 at c0367fec
[    6.617431] Machine check in kernel mode.
[    6.625865] Caused by (from SRR1=141030): Transfer error ack signal
[    6.634339] IN from bad port 3e1 at c0367fec
[    6.634347] Machine check in kernel mode.
[    6.642834] Caused by (from SRR1=141030): Transfer error ack signal
[    6.651492] IN from bad port 3e1 at c0367fec
[    6.651499] Machine check in kernel mode.
[    6.660151] Caused by (from SRR1=141030): Transfer error ack signal
[    6.668947] IN from bad port 3e3 at c0367fec
[    6.668953] Machine check in kernel mode.
[    6.677672] Caused by (from SRR1=141030): Transfer error ack signal
[    6.686479] IN from bad port 3e3 at c0367fec
[    6.686486] not found.

Disabling the yenta PCMCIA socket driver in the kernel configuration is sufficient to make this message go away. Unfortunately, I need the PCMCIA slot on this particular machine.

- After upgrading to linux-2.6.36, this G4 machine can no longer boot: the console is flooded with these messages:

[    6.617431] Machine check in kernel mode.
[    6.625865] Caused by (from SRR1=141030): Transfer error ack signal
[    6.634339] IN from bad port 3e1 at c0367fec
(last three lines repeated millions of times)

- I've tried linux-3.2.1; it's still the same.
Comment 9 Benjamin Herrenschmidt 2012-02-18 22:15:16 EST
Ok, looks like I'll have to dig, though I won't have time today.

From what I can tell, various x86'isms seem to have made their way into
the PCMCIA code. Those messages are caused by some bit of driver code trying
to access I/O ports where no device responds, typically hard-coded ISA I/O
ports that don't exist on most non-x86 machines.

I thought we had eradicated most of that crap from the kernel long ago
but it looks like some of it managed to come back while I wasn't looking :-)
Comment 10 Benjamin Herrenschmidt 2012-02-18 22:15:39 EST
BTW, could somebody please attach (or email me) the relevant .config as well?
Comment 11 Paul Osmialowski 2012-02-19 13:11:22 EST
Created attachment 564203 [details]
.config from my G4 machine for linux-3.2.1 kernel

This is my very custom kernel with no module loading support; everything is built straight into the kernel binary.
Comment 12 Paul Osmialowski 2012-02-19 13:23:56 EST
I have some progress with this.

First, the "Transfer error ack signal" messages preceded by the "Intel ISA PCIC probe" message, which are present in 2.6.39, aren't from yenta_socket. When I disabled yenta_socket on 3.2.1, the system started properly (as expected), but these messages were still shown: they come from i82365.c, a different PCMCIA socket driver that I happened to enable in the kernel config.
Anyway, I was curious what is printed before "Machine check in kernel mode"/"Transfer error ack signal", so I went to arch/powerpc/kernel/traps.c and removed both printk calls. To my surprise, linux-3.2.1 booted for the first time with yenta_socket enabled! Unfortunately, the log was still flooded with "IN from bad port 7ffddd at c03a252c", so the full dmesg didn't fit into the memory buffer; however, I was able to catch this:

[   37.606975] IN from bad port 7ffff7 at c03a252c
[   37.606979] IN from bad port 7ffff8 at c03a252c
[   37.606983] IN from bad port 7ffff9 at c03a252c
[   37.606987] IN from bad port 7ffffa at c03a252c
[   37.606990] IN from bad port 7ffffb at c03a252c
[   37.606994] IN from bad port 7ffffc at c03a252c
[   37.606998] IN from bad port 7ffffd at c03a252c
[   37.607001] IN from bad port 7ffffe at c03a252c
[   37.607005] IN from bad port 7fffff at c03a252c
[   37.607008] 
[   37.608781] yenta_cardbus 0001:10:13.0: pcmcia: parent PCI bridge window: [mem 0xf3000000-0xf3ffffff]
[   37.611003] pcmcia_socket pcmcia_socket0: cs: memory probe 0xf3000000-0xf3ffffff: clean.
[   37.613388] yenta_cardbus 0001:10:13.0: pcmcia: parent PCI bridge window: [mem 0x80000000-0xafffffff]
[   37.616022] pcmcia_socket pcmcia_socket0: cs: memory probe 0x80000000-0xafffffff: excluding 0x80000000-0x807fffff 0x84000000-0x8bffffff 0xa0000000-0xa07fffff
[   37.619980] Intel ISA PCIC probe: 
[   37.620112] IN from bad port 3e1 at c03a662c
[   37.623228] IN from bad port 3e1 at c03a662c
[   37.623232] IN from bad port 3e1 at c03a662c
[   37.623236] IN from bad port 3e3 at c03a662c
[   37.623240] IN from bad port 3e3 at c03a662c
[   37.623243] not found.
[   37.626556] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   37.630055] ehci_hcd 0001:10:1b.2: enabling device (0004 -> 0006)
[   37.633643] ehci_hcd 0001:10:1b.2: EHCI Host Controller

Maybe this will help you.

BTW, the PCMCIA card that I'm using (a 16-bit RS-232 card with a real, full-blown UART) works just fine with this kernel:

fish ~ # lspcmcia -v
Socket 0 Bridge:        [yenta_cardbus]         (bus ID: 0001:10:13.0)
        Configuration:  state: on       ready: yes
                        Voltage: 5.0V Vcc: 5.0V Vpp: 0.0V
Socket 0 Device 0:      [serial_cs]             (bus ID: 0.0)
        Configuration:  state: on
        Product Name:   ARGOSY RS-COM 1P 2V1
        Identification: function: 2 (serial)
                        prod_id(1): "ARGOSY" (0x78f308dc)
                        prod_id(2): "RS-COM 1P" (0x860de295)
                        prod_id(3): "2V1" (0x9ad0cb16)
                        prod_id(4): --- (---)
Comment 13 Paul Osmialowski 2012-02-19 14:31:30 EST
Disabling the ISA bus completely, together with the i82365 socket driver, solves this problem. The 16-bit PCMCIA card is still usable.
Comment 14 Dave Jones 2012-03-22 12:43:19 EDT
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.
Comment 17 benoît barthés 2012-04-11 17:26:22 EDT
The problem seems to be solved in Fedora 17.
