Bug 1724276

Summary: arm-smmu e0800000.smmu: Unexpected global fault, this could be serious
Product: [Fedora] Fedora
Reporter: Paul Whalen <pwhalen>
Component: kernel
Assignee: Mark Salter <msalter>
Status: CLOSED CANTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: airlied, bskeggs, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, mchehab, mjg59, msalter, steved
Target Milestone: ---   
Target Release: ---   
Hardware: aarch64   
OS: Unspecified   
Whiteboard:
Last Closed: 2020-06-22 15:11:21 UTC
Type: Bug
Bug Depends On:    
Bug Blocks: 245418    
Attachments:
Description                Flags
Seattle nonacpi boot       none
Seattle with acpi=force    none

Description Paul Whalen 2019-06-26 16:03:58 UTC
Created attachment 1584845 [details]
Seattle nonacpi boot

1. Please describe the problem:

Repeated messages on the Seattle Overdrive serial console when booting without ACPI (nonacpi):

[  292.023147] arm-smmu e0600000.smmu: Unexpected global fault, this could be serious
[  292.030736] arm-smmu e0600000.smmu: 	GFSR 0x00000001, GFSYNR0 0x00000000, GFSYNR1 0x00000003, GFSYNR2 0x00000000
[  292.040961] arm-smmu e0600000.smmu: 	GFSR 0x00000001, GFSYNR0 0x00000000, GFSYNR1 0x00000005, GFSYNR2 0x00000000
[  292.051190] arm-smmu e0600000.smmu: 	GFSR 0x00000001, GFSYNR0 0x00000000, GFSYNR1 0x00000007, GFSYNR2 0x00000000
[  292.061414] arm-smmu e0600000.smmu: 	GFSR 0x00000001, GFSYNR0 0x00000000, GFSYNR1 0x00000000, GFSYNR2 0x00000000
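
When comparing the two attached boot logs, it may help to pull the register dumps out of the kernel log programmatically. The following is a minimal sketch that assumes only the line format quoted above; the function names are hypothetical, and it does not decode the registers. As far as I understand the SMMU architecture, GFSYNR1 carries the stream ID of the faulting transaction, but treat that as an assumption and check it against the specification.

```python
import re

# Pattern for the register dump line printed by the arm-smmu driver,
# based only on the log format quoted in this report.
FAULT_RE = re.compile(
    r"arm-smmu (?P<device>\S+):\s+"
    r"GFSR (?P<gfsr>0x[0-9a-fA-F]+), "
    r"GFSYNR0 (?P<gfsynr0>0x[0-9a-fA-F]+), "
    r"GFSYNR1 (?P<gfsynr1>0x[0-9a-fA-F]+), "
    r"GFSYNR2 (?P<gfsynr2>0x[0-9a-fA-F]+)"
)


def parse_global_faults(log_text):
    """Return one dict per register dump line found in log_text."""
    faults = []
    for match in FAULT_RE.finditer(log_text):
        fault = {"device": match.group("device")}
        for reg in ("gfsr", "gfsynr0", "gfsynr1", "gfsynr2"):
            # Register values are printed as hexadecimal literals.
            fault[reg] = int(match.group(reg), 16)
        faults.append(fault)
    return faults


def gfsynr1_values_by_device(faults):
    """Group the distinct GFSYNR1 values seen for each SMMU instance."""
    grouped = {}
    for fault in faults:
        grouped.setdefault(fault["device"], set()).add(fault["gfsynr1"])
    return {dev: sorted(vals) for dev, vals in grouped.items()}
```

With a log saved as dmesg.txt (as in step 7 of the template), `gfsynr1_values_by_device(parse_global_faults(open("dmesg.txt").read()))` lists which GFSYNR1 values each SMMU reported.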


2. What is the Version-Release number of the kernel:

5.2.0-0.rcX

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

5.2.0-0.rc0.git6.1.fc31.aarch64

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Install a 5.2 rcX kernel on Seattle and boot nonacpi. When booting with acpi=force, the messages do not appear.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Yes. 


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Paul Whalen 2019-06-26 16:04:49 UTC
Created attachment 1584846 [details]
Seattle with acpi=force

Comment 2 Mark Salter 2019-06-27 17:32:22 UTC
commit 954a03be033c:

    iommu/arm-smmu: Break insecure users by disabling bypass by default

    If you're bisecting why your peripherals stopped working, it's
    probably this CL.  Specifically if you see this in your dmesg:
      Unexpected global fault, this could be serious
    ...then it's almost certainly this CL.

I'll look into this...

Comment 3 Mark Salter 2019-07-25 16:48:03 UTC
I wasn't able to reproduce this with the upstream edk2 firmware I had on my Seattle. I finally found my Dediprog (lost during a recent move), installed the AMI firmware, and reproduced the problem. So the IOMMUs are *not* running in bypass mode, yet global faults are still triggered. It appears to be related to the legacy arm,smmu DT bindings: the upstream edk2 firmware uses the newer bindings, whereas the older AMI firmware uses the legacy ones. I'm still not sure whether the problem is in the firmware tables or in the arm-smmu driver. In any case, it can be worked around by booting with "arm-smmu.disable_bypass=n".
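
To confirm that the workaround parameter actually reached the running kernel, something like the following sketch could be used. The helper names are hypothetical; the only assumptions are that the command line is readable from /proc/cmdline and that the kernel treats '-' and '_' as equivalent in parameter names, so both arm-smmu.disable_bypass and arm_smmu.disable_bypass are accepted.

```python
def module_param_value(cmdline, module, param):
    """Return the value given for module.param on a kernel command line, or None.

    Dashes and underscores are treated as equivalent in the parameter
    name, matching how the kernel compares parameter names.
    """
    wanted = f"{module}.{param}".replace("-", "_")
    value = None
    for token in cmdline.split():
        name, sep, val = token.partition("=")
        if sep and name.replace("-", "_") == wanted:
            value = val  # keep the last occurrence if the parameter is repeated
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel."""
    with open(path) as f:
        return f.read()
```

For example, `module_param_value(read_cmdline(), "arm-smmu", "disable_bypass")` returns "n" when the workaround is active and None when the parameter was not passed. On Fedora the parameter can typically be added persistently with `grubby --update-kernel=ALL --args="arm-smmu.disable_bypass=n"`.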