Bug 2171155 - No kernel video on 6.1.11-200.fc37 with mgag200
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 37
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jocelyn Falempe
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-02-18 11:54 UTC by Jens Neu
Modified: 2023-06-20 14:26 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-20 14:26:45 UTC
Type: Bug
Embargoed:


Attachments
kernel boot log (121.11 KB, text/plain)
2023-02-18 11:54 UTC, Jens Neu
Add Pll logs (1.28 KB, patch)
2023-04-28 12:01 UTC, Jocelyn Falempe
0001-mgag200-Add-PLL-logs-before-regression.patch (1.43 KB, patch)
2023-04-28 12:02 UTC, Jocelyn Falempe
0001-mgag200-Add-PLL-logs.patch (1.79 KB, patch)
2023-04-28 12:05 UTC, Jocelyn Falempe
mgag200-Add-PLL-logs-for-v6.1.10.patch (1.75 KB, patch)
2023-05-02 14:29 UTC, Jocelyn Falempe
mgag200-Add-register-init-debug-logs.patch (2.78 KB, patch)
2023-05-02 16:08 UTC, Jocelyn Falempe
drm-mgag200-Test-fix-release-BMC-before-enabling-dis.patch (1021 bytes, patch)
2023-05-04 19:48 UTC, Jocelyn Falempe
drm-mgag200-always-setup-gamma.patch (987 bytes, application/mbox)
2023-05-09 15:13 UTC, Jocelyn Falempe


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1902519 0 unspecified CLOSED Matrox MGA G200e Pilot ServerEngines - no kernel video in kernel 5.9 2023-02-18 11:54:54 UTC
Red Hat Issue Tracker FC-817 0 None None None 2023-04-27 12:33:30 UTC

Description Jens Neu 2023-02-18 11:54:55 UTC
Created attachment 1944935
kernel boot log

1. Please describe the problem:
Kernel video is lost right after entering the LUKS passphrase. The machine is a Dell R510 with Matrox video:

06:03.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)

To me this looks like a recurrence of, or something related to, bug 1902519: https://bugzilla.redhat.com/show_bug.cgi?id=1902519

Log message is:
Feb 17 17:27:49 server-3 kernel: mgag200 0000:06:03.0: vgaarb: deactivate vga console
Feb 17 17:27:49 server-3 kernel: Console: switching to colour dummy device 80x25


2. What is the Version-Release number of the kernel:
 6.1.11-200.fc37

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Works fine on 6.0.12-100.fc35.x86_64

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Yes, reproducible. Works fine on 6.0.12 from fc35 (see 3)

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
I have only one machine and it is in production, so I can't try this at the moment.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
no

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Trevor Cordes 2023-04-14 21:49:15 UTC
Hi, I'm the submitter of bug 1902519 which you reference.  Strangely enough, I just ran into a similar bug on another box, with nvidia instead of matrox video.  The system does its grub / early kernel stuff then video goes off.  However, this box is graphical and if I wait then type my login info into the blank screen, then startx, then X starts perfectly fine and the video is fine.  It's just kernel console mode that's screwed up.

6.1.10 was fine, 6.2.9 is broken.  However I am using the nvidia binary driver so I'm reluctant to report a separate bug.

On the other box, we never did solve bug 1902519 and have just used the vga= kludge this whole time.  However, I'm glad to hear it sounds like you did have it working at least for a while.  Maybe if this bug gets fixed we can try normal video again.

Comment 2 Phil O 2023-04-24 19:02:40 UTC
I'm hitting this as well, and tracked it down to: 

commit 7e6739b9336e61fe23ca4e2c8d1fda8f19f979bf
Merge: a47e60729d96 65898687cf73
Author: Linus Torvalds <torvalds>
Date:   Wed Oct 5 11:24:12 2022 -0700

    Merge tag 'drm-next-2022-10-05' of git://anongit.freedesktop.org/drm/drm

If I check out the prior kernel (a47e60729d9624e931f988709ab76e043e2ee8b9), video works fine.  Unfortunately, my attempts at a git bisect have failed, as I think there are 2 bad commits in that merge.

Comment 3 Phil O 2023-04-24 23:57:46 UTC
I had to resort to starting at a47e60729d96 and cherry-picking the mgag200 patches included in 7e6739b9336e.  Once I got to 1baf9127c482, I lost video.  The sad part is that patch claims to not make any functional changes.

commit 1baf9127c482a3a58aef81d92ae751798e2db202
Author: Thomas Zimmermann <tzimmermann>
Date:   Thu Jul 28 14:40:56 2022 +0200

    drm/mgag200: Replace simple-KMS with regular atomic helpers
    
    Drop simple-KMS in favor of regular atomic helpers. Makes the code
    more modular and hence better to adapt to per-model requirements.
    
    The simple-KMS helpers provide few extra features, so the patch is
    mostly about open-coding what simple-KMS does. The simple-KMS helpers
    do mix up plane and CRTC state. Changing to regular atomic helpers
    requires to split some of the simple-pipe functions into per-plane
    and per-CRTC code
    
    No functional changes.
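
The cherry-pick approach described in this comment can be sketched as a small helper that builds the git commands involved. This is only an illustration, not the exact procedure that was used: the function names are made up, and the `drivers/gpu/drm/mgag200` path filter is an assumption about how the driver's commits were selected. Only the commit hashes come from this report.

```python
import shlex

GOOD = "a47e60729d96"    # commit before the drm-next merge; video works
MERGE = "7e6739b9336e"   # drm-next-2022-10-05 merge; video is lost
DRIVER_PATH = "drivers/gpu/drm/mgag200"  # assumed path filter for the driver


def list_driver_commits_cmd(good=GOOD, bad=MERGE, path=DRIVER_PATH):
    """Build a git command listing non-merge driver commits in good..bad, oldest first."""
    return [
        "git", "log", "--reverse", "--no-merges", "--format=%H %s",
        f"{good}..{bad}", "--", path,
    ]


def cherry_pick_cmd(commit):
    """Build the command that applies one commit on top of the current checkout."""
    return ["git", "cherry-pick", commit]


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch has no side effects.
    print(shlex.join(list_driver_commits_cmd()))
    print(shlex.join(cherry_pick_cmd("1baf9127c482")))
```

Each listed commit would then be cherry-picked in order on top of the good commit, with a rebuild and boot test after each one, until video is lost.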

Comment 4 Jocelyn Falempe 2023-04-25 15:34:39 UTC
Thanks for opening the bugs, and doing the git bisect.

I don't see anything obviously wrong in the offending commit, so I will try to reproduce it, if I can find the right machine.

Comment 5 Jens Neu 2023-04-26 19:33:52 UTC
Hi Jocelyn,
I'm in the process of moving my setup from the Dell R510 to a "new to me" Dell R720, so my R510 will become available for testing (timeframe: a couple of weeks). I can let you know when the machine is free if you're interested.
regards
-Jens

Comment 6 Jocelyn Falempe 2023-04-27 10:42:27 UTC
Can you please provide additional details:

 * When you say "no video", is that on a directly attached VGA monitor, or through remote access (iDRAC, XClarity, ...)?
 * Are you using BIOS or UEFI boot?

I have a Dell T310 with MGA G200eW WPCM450 [102b:0532] (rev 0a) that is not affected.
I just tried on a remote system PowerEdge R640/0W23H8 with Integrated Matrox G200eW3 Graphics Controller [102b:0536] (rev 04) and I can't reproduce it.

So the known affected systems are:
Dell R510
Dell R815 (from dri-devel mailing list thread).

So I will try to get access to one of those.

Comment 7 Jens Neu 2023-04-27 11:59:13 UTC
(In reply to Jocelyn Falempe from comment #6)
>  * when you said "no video" is it on a VGA monitor plugged, or is it through
> the remote access (iDRAC, XClarity, ...).
A VGA monitor (and USB keyboard) plugged in directly. The vga=793 workaround in GRUB works, though. I don't know whether the iDRAC video is also affected, since it only works through the nasty JNLP Java Web Start client, which is effectively broken due to ancient security settings.

>  * Are you using BIOS or UEFI boot ?
BIOS boot

Comment 8 Phil O 2023-04-27 14:20:13 UTC
I also have HP DL360P G8s which are impacted.  They have MGA G200EH [102b:0533].  

Both iDRAC and iLO video are also impacted, so it is not only a physical monitor being hooked up (though that is also blank).  

I am also using BIOS boot.  

Is there any kind of additional debugging I can enable in the driver to give you any useful information?

Comment 9 Jocelyn Falempe 2023-04-27 14:41:41 UTC
Thanks for the replies.

On my side, I only tested with UEFI, so I will try again in BIOS mode.

Is it possible for you to also test with UEFI?

Comment 10 Phil O 2023-04-27 23:28:25 UTC
I tested with UEFI on an HP G9, and it worked fine.  On the same system in BIOS mode, no video.

Comment 11 Jocelyn Falempe 2023-04-28 12:00:30 UTC
I tried on my T310 in BIOS mode, and it works fine, so I can't reproduce it.

At this point, I'm not sure I will be able to reproduce it, so I made two patches to log the PLL settings, which are often the cause of a black screen.

0001-mgag200-Add-PLL-logs.patch must be applied on top of 1baf9127c482a3a58aef81d92ae751798e2db202 (bad commit)

0001-mgag200-Add-PLL-logs-before-regression.patch must be applied on top of 4f4dc37e374c957b2bbcd3b1f3dad73afeb647a5 (last good commit)

If you can run "dmesg | grep MGA" in both cases and report the output, I can check whether the settings differ.

Comment 12 Jocelyn Falempe 2023-04-28 12:01:34 UTC
Created attachment 1960799
Add Pll logs

Comment 13 Jocelyn Falempe 2023-04-28 12:02:28 UTC
Created attachment 1960800
0001-mgag200-Add-PLL-logs-before-regression.patch

Comment 14 Jocelyn Falempe 2023-04-28 12:05:07 UTC
Created attachment 1960801
0001-mgag200-Add-PLL-logs.patch

Sorry, the "Add Pll logs" attachment was missing one printk.

Comment 15 Phil O 2023-04-28 18:20:22 UTC
Results below.  

4f4dc37e374c957b2bbcd3b1f3dad73afeb647a5
[Fri Apr 28 07:38:37 2023] mgag200 0000:01:00.1: vgaarb: deactivate vga console
[Fri Apr 28 07:38:37 2023] Console: switching to colour dummy device 80x25
[Fri Apr 28 07:38:37 2023] [drm] Initialized mgag200 1.0.0 20110418 for 0000:01:00.1 on minor 0
[Fri Apr 28 07:38:37 2023] fbcon: mgag200drmfb (fb0) is primary device
[Fri Apr 28 07:38:37 2023] MGA New PLL settings for clock 121750
[Fri Apr 28 07:38:37 2023] MGA Update PLL settings 13 190 4 0
[Fri Apr 28 07:38:37 2023] MGA Resolution 1400 x 1050
[Fri Apr 28 07:38:37 2023] Console: switching to colour frame buffer device 175x65
[Fri Apr 28 07:38:37 2023] mgag200 0000:01:00.1: [drm] fb0: mgag200drmfb frame buffer device

1baf9127c482a3a58aef81d92ae751798e2db202
[Fri Apr 28 08:45:19 2023] mgag200 0000:01:00.1: vgaarb: deactivate vga console
[Fri Apr 28 08:45:19 2023] Console: switching to colour dummy device 80x25
[Fri Apr 28 08:45:19 2023] [drm] Initialized mgag200 1.0.0 20110418 for 0000:01:00.1 on minor 0
[Fri Apr 28 08:45:19 2023] fbcon: mgag200drmfb (fb0) is primary device
[Fri Apr 28 08:45:19 2023] MGA Resolution 1400 x 1050
[Fri Apr 28 08:45:19 2023] MGA New PLL settings for clock 121750
[Fri Apr 28 08:45:19 2023] MGA Update PLL settings 13 190 4 0
[Fri Apr 28 08:45:19 2023] Console: switching to colour frame buffer device 175x65
[Fri Apr 28 08:45:19 2023] mgag200 0000:01:00.1: [drm] fb0: mgag200drmfb frame buffer device
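
One way to compare the two captures above is to strip the timestamps and compare the MGA lines both as a set and in order. The following is a minimal sketch: the helper name `mga_lines` is made up for illustration, and the log text is an excerpt copied from the output above.

```python
import re

# Matches a leading "[...]" dmesg timestamp, e.g. "[Fri Apr 28 07:38:37 2023] ".
TIMESTAMP = re.compile(r"^\[[^\]]*\]\s*")


def mga_lines(log):
    """Return the 'MGA ...' debug messages from a dmesg capture, in order."""
    messages = (TIMESTAMP.sub("", line.strip()) for line in log.splitlines())
    return [msg for msg in messages if msg.startswith("MGA ")]


GOOD = """\
[Fri Apr 28 07:38:37 2023] fbcon: mgag200drmfb (fb0) is primary device
[Fri Apr 28 07:38:37 2023] MGA New PLL settings for clock 121750
[Fri Apr 28 07:38:37 2023] MGA Update PLL settings 13 190 4 0
[Fri Apr 28 07:38:37 2023] MGA Resolution 1400 x 1050
"""

BAD = """\
[Fri Apr 28 08:45:19 2023] fbcon: mgag200drmfb (fb0) is primary device
[Fri Apr 28 08:45:19 2023] MGA Resolution 1400 x 1050
[Fri Apr 28 08:45:19 2023] MGA New PLL settings for clock 121750
[Fri Apr 28 08:45:19 2023] MGA Update PLL settings 13 190 4 0
"""

# Same messages regardless of order?
same_values = sorted(mga_lines(GOOD)) == sorted(mga_lines(BAD))
# Same messages in the same order?
same_order = mga_lines(GOOD) == mga_lines(BAD)
print(f"same values: {same_values}, same order: {same_order}")
```

On these excerpts the logged values match and only the position of the resolution line differs.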

Comment 16 Phil O 2023-04-28 18:46:30 UTC
As I alluded to in my comment 2, there do seem to be 2 bugs at play here.  If I check out 4f4dc37e374c957b2bbcd3b1f3dad73afeb647a5 directly, I get video, but the video has sync issues.  Screenshot here: https://snipboard.io/jrx7eG.jpg .  If I then check out 1baf9127c482a3a58aef81d92ae751798e2db202, I don't even get that - I get no video.  That is what makes bisecting this so complicated, and why I had to just cherry-pick the MGA commits to find the one which caused the screen to go blank.  

I thought I would try to bisect the sync issue independently, by starting a bisect at 4f4dc37e374 (bad) and a47e60729d96 (good), but the complicated merges in 7e6739b9336e61f made that not a straightforward task, and I ended up with "Bisecting: a merge base must be tested" and gave up.

Comment 17 Jocelyn Falempe 2023-05-02 14:27:33 UTC
Thanks for the reply,
So the PLL settings are the same, and the resolution is still 1400x1050.

If you can reproduce the sync issue, it would be interesting to find the commit where it is introduced.
I will also add a patch to get the same PLL logs on v6.1.10, to make sure.

As the PLL looks good and the culprit patch doesn't change the register values, I suspect the order in which they are set is causing the problem.
I will prepare a patch for this shortly.

Comment 18 Jocelyn Falempe 2023-05-02 14:29:22 UTC
Created attachment 1961693
mgag200-Add-PLL-logs-for-v6.1.10.patch

Comment 19 Jocelyn Falempe 2023-05-02 16:08:07 UTC
Created attachment 1961744
mgag200-Add-register-init-debug-logs.patch

This patch applies cleanly on top of

4f4dc37e374c957b2bbcd3b1f3dad73afeb647a5 and 1baf9127c482a3a58aef81d92ae751798e2db202

If you can boot your machine with the patch applied on top of each commit and provide the kernel logs, I hope the two will look different this time.

Thanks again for your help.

Comment 20 Jocelyn Falempe 2023-05-04 19:48:25 UTC
Created attachment 1962349
drm-mgag200-Test-fix-release-BMC-before-enabling-dis.patch

In fact, I ran the debug patch on my own machine and got this:

4f4dc37e374c957b2bbcd3b1f3dad73afeb647a5 (Good)

[   12.023609] MGA Init registers
[   20.824251] MGA Set start addr
[   20.824269] MGA set offset 192
[   20.824273] MGA Hold BMC
[   21.125401] MGA set format regs
[   21.125406] MGA set mode regs 768 771 777 806
[   21.126598] MGA Release BMC
[   21.126707] MGA Enable display
[   21.173589] MGA Set start addr
[   21.173607] MGA set offset 192

1baf9127c482a3a58aef81d92ae751798e2db202 (Bad)

[   11.903070] MGA Init registers
[   20.731892] MGA Set start addr
[   20.731911] MGA set offset 192
[   20.731914] MGA Hold BMC
[   21.033038] MGA set format regs
[   21.033043] MGA set mode regs 768 771 777 806
[   21.034234] MGA Enable display
[   21.059510] MGA Release BMC
[   21.071270] MGA Set start addr
[   21.071288] MGA set offset 192

The difference is that, before the regression, the BMC was released before the display was enabled; now it is released afterwards.
So I made this patch to revert to the previous order.
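
The ordering difference can also be checked mechanically. The sketch below assumes nothing beyond the log excerpts shown above; the helper names `events` and `bmc_released_before_enable` are hypothetical.

```python
import re

# Matches a leading "[...]" dmesg timestamp, e.g. "[   21.126598] ".
TIMESTAMP = re.compile(r"^\[[^\]]*\]\s*")


def events(log):
    """Return the log messages in order, with timestamps removed."""
    return [TIMESTAMP.sub("", line.strip())
            for line in log.splitlines() if line.strip()]


def bmc_released_before_enable(log):
    """True if 'MGA Release BMC' is logged before 'MGA Enable display'."""
    seq = events(log)
    return seq.index("MGA Release BMC") < seq.index("MGA Enable display")


GOOD = """\
[   20.824273] MGA Hold BMC
[   21.125401] MGA set format regs
[   21.125406] MGA set mode regs 768 771 777 806
[   21.126598] MGA Release BMC
[   21.126707] MGA Enable display
"""

BAD = """\
[   20.731914] MGA Hold BMC
[   21.033038] MGA set format regs
[   21.033043] MGA set mode regs 768 771 777 806
[   21.034234] MGA Enable display
[   21.059510] MGA Release BMC
"""

print("good:", bmc_released_before_enable(GOOD))
print("bad: ", bmc_released_before_enable(BAD))
```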

Can you apply this patch on top of 1baf9127c482a3a58aef81d92ae751798e2db202 and report whether it fixes your issue?

Thanks,

Comment 21 Phil O 2023-05-08 15:18:16 UTC
I applied attachment 1962349 on top of 1baf9127c482 and it made no difference.

Comment 22 Phil O 2023-05-08 18:29:36 UTC
Below are the results of applying attachment 1961744:

4f4dc37e374c:
[Mon May  8 11:07:40 2023] MGA Init registers
[Mon May  8 11:07:40 2023] MGA Set start addr
[Mon May  8 11:07:40 2023] MGA set offset 262
[Mon May  8 11:07:40 2023] MGA set format regs -922322432
[Mon May  8 11:07:40 2023] MGA set mode regs 1050 1053 1057 1089
[Mon May  8 11:07:40 2023] MGA Enable display
[Mon May  8 11:07:40 2023] MGA Set start addr
[Mon May  8 11:07:40 2023] MGA set offset 262
<prior two lines repeat endlessly>

1baf9127c482:
[Mon May  8 11:24:00 2023] MGA Init registers
[Mon May  8 11:24:00 2023] MGA Set start addr
[Mon May  8 11:24:00 2023] MGA set offset 262
[Mon May  8 11:24:00 2023] MGA set format regs 217538176
[Mon May  8 11:24:00 2023] MGA set mode regs 1050 1053 1057 1089
[Mon May  8 11:24:00 2023] MGA Enable display
[Mon May  8 11:24:00 2023] MGA Set start addr
[Mon May  8 11:24:00 2023] MGA set offset 262
<prior two lines repeat endlessly>
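
The only value that differs between the two runs is the number after "set format regs", which is printed as a signed decimal. This thread does not say what that number represents, so the following is just a small sketch for viewing both values as unsigned 32-bit hex, which may make them easier to compare. The helper name `as_u32_hex` is made up, and the 32-bit width is an assumption.

```python
def as_u32_hex(value):
    """Format a (possibly negative) integer as an unsigned 32-bit hex string."""
    return f"0x{value & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    # Values copied from the two logs above.
    for label, value in (("4f4dc37e374c", -922322432),
                         ("1baf9127c482", 217538176)):
        print(label, as_u32_hex(value))
```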

Comment 23 Phil O 2023-05-09 14:54:26 UTC
I did some further testing to figure out the sync issue I mentioned in comment 16.  If I check out 4f4dc37e374c, then cherry-pick 5415bec18c69 ("drm/mgag200: Force 32 bpp on the console"), the sync issue goes away and video is perfect.  If I apply the same 5415bec18c69 on top of 1baf9127c482, I still have no video.  But this finding makes little sense to me, since the issue 5415bec18c69 claims to fix (73f54d5d9682) is present in 6.0, which works perfectly.

Comment 24 Jocelyn Falempe 2023-05-09 15:13:15 UTC
Created attachment 1963563
drm-mgag200-always-setup-gamma.patch

Thanks for testing.

I investigated further and found that the gamma settings may not be set after the culprit commit.

Can you try the patch drm-mgag200-always-setup-gamma.patch on top of 1baf9127c482a3a58aef81d92ae751798e2db202?

Comment 25 Phil O 2023-05-09 16:08:49 UTC
Applied on top of 1baf9127c482, it goes back to having the sync issue (because 5415bec18c69 is missing).  Applied on top of v6.1 final, it works perfectly!  Thanks Jocelyn!

Comment 26 Jocelyn Falempe 2023-05-09 16:29:02 UTC
Thanks for confirming that this fix works, I will send it upstream soon.

Comment 27 Jocelyn Falempe 2023-06-20 14:23:01 UTC
The fix is now merged in v6.4-rc4, v6.3.5 and v6.1.31.

So I think this bug can be closed.
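
To check whether a given kernel already carries the fix, the versions listed above can be compared against a `uname -r` style string. This is a rough sketch, not an official tool: the function name `has_gamma_fix` is made up, and it returns None for any series this report does not mention (for example 6.2, or 6.0, which predates the regression).

```python
import re

# First stable releases carrying the fix, per this comment, keyed by (major, minor).
FIXED_IN = {(6, 1): 31, (6, 3): 5}
# The fix landed in v6.4-rc4, so final 6.4 and later series include it.
FIRST_FIXED_MAINLINE = (6, 4)


def has_gamma_fix(release):
    """Return True/False if the release is known to have/lack the fix, else None."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    series = (int(m.group(1)), int(m.group(2)))
    patch = int(m.group(3) or 0)
    if series in FIXED_IN:
        return patch >= FIXED_IN[series]
    if series >= FIRST_FIXED_MAINLINE:
        # Assumes a final or stable release, not a release candidate before rc4.
        return True
    return None  # series not covered by this report


if __name__ == "__main__":
    for release in ("6.1.11-200.fc37", "6.1.31", "6.3.5", "6.2.9"):
        print(release, has_gamma_fix(release))
```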

