Bug 1366842 - Xorg does not start in virtual guest when using video device virtio on kernel 4.8
Summary: Xorg does not start in virtual guest when using video device virtio on kernel...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 25
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-13 10:51 UTC by Joachim Frieben
Modified: 2016-10-16 12:04 UTC (History)
CC List: 13 users

Fixed In Version: kernel-4.8.1-1.fc25
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-10 17:41:50 UTC


Attachments
Xorg.0.log for video device "virtio" under Fedora 25 (6.38 KB, text/plain)
2016-08-13 10:51 UTC, Joachim Frieben
Xorg.0.log for video device "virtio" under Fedora 24 (28.89 KB, text/plain)
2016-08-15 15:27 UTC, Joachim Frieben
Updated file xserver-autobind-hotplug.patch (4.10 KB, patch)
2016-09-29 15:39 UTC, Joachim Frieben
XML configuration file of virtual machine (5.32 KB, application/xml)
2016-09-29 16:32 UTC, Joachim Frieben
Xorg.0.log for video device "virtio" under Fedora 25 (kernel-4.7.0-2.fc25) (23.49 KB, text/plain)
2016-09-29 17:16 UTC, Joachim Frieben
bisection log for c624c86..d52bd54 in drivers/gpu/drm/virtio/ (1.15 KB, text/plain)
2016-09-30 02:18 UTC, Laszlo Ersek
output of "lspci -v -v -v" (9.30 KB, text/plain)
2016-09-30 10:34 UTC, Laszlo Ersek
Patch for virtio_gpu kernel module solving the busid issue (3.43 KB, patch)
2016-10-04 09:34 UTC, Joachim Frieben
Xorg.0.log for video device "virtio" after applying patch from comment 31 (28.01 KB, text/plain)
2016-10-04 09:37 UTC, Joachim Frieben


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1256933 None CLOSED Radeon driver is unable to find card without BusID snippet of xorg.conf 2019-09-27 04:47:27 UTC

Internal Links: 1256933

Description Joachim Frieben 2016-08-13 10:51:48 UTC
Created attachment 1190638 [details]
Xorg.0.log for video device "virtio" under Fedora 25

Description of problem:
For current Fedora 25, Xorg is denied permission to set IOPL for I/O and aborts when launched:

X.Org X Server 1.18.4
Release Date: 2016-07-19
X Protocol Version 11, Revision 0
Build Operating System:  4.6.3-300.fc24.x86_64
Current Operating System: Linux noname 4.8.0-0.rc1.git3.1.fc25.x86_64 #1 SMP Thu Aug 11 04:08:28 UTC 2016 x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-4.8.0-0.rc1.git3.1.fc25.x86_64 root=/dev/mapper/noname-root ro rd.lvm.lv=noname/root rd.lvm.lv=noname/swap rhgb quiet LANG=en_US.UTF-8 enforcing=0 3
Build Date: 19 July 2016  06:00:51PM
Build ID: xorg-x11-server 1.18.4-1.fc25
Current version of pixman: 0.34.0
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/home/frieben/.local/share/xorg/Xorg.0.log", Time: Sat Aug 13 07:28:27 2016
(==) Using config directory: "/etc/X11/xorg.conf.d"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
vesa: Ignoring device with a bound kernel driver
(EE) Fatal server error:
(EE) no screens found
(EE)
(EE) Please consult the Fedora Project support at http://wiki.x.org for help.
(EE) Please also check the log file at "/home/frieben/.local/share/xorg/Xorg.0.log" for additional information.
(EE)
(EE) Server terminated with error (1). Closing log file.
xinit: giving up
xinit: unable to connect to X server: Connection refused
xinit: server error

Version-Release number of selected component (if applicable):
xorg-x11-server-1.18.4-1.fc25

How reproducible:
Always

Steps to Reproduce:
1. Boot Fedora 25 on virtual guest with video device "virtio".
2. Run command 'startx' at run level 3.

Actual results:
Xorg aborts because it is denied permission to set IOPL for I/O.

Expected results:
Xorg starts up successfully for video device "virtio".

Additional info:
Xorg starts up successfully for video device "qxl".

Comment 1 Hans de Goede 2016-08-15 06:51:36 UTC
Hi,

Not sure what is going on here, but the "xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)" error is normal, since nowadays we run Xorg as a normal user instead of root, and the IOPL call is not necessary when using a kernel-modesetting driver.

Regards,

Hans

Comment 2 Joachim Frieben 2016-08-15 07:35:44 UTC
(In reply to Hans de Goede from comment #1)
Note that the attached Xorg.0.log file exhibits repeated error messages:
  "[   XXX.YYY] (EE) open /dev/fb0: Permission denied"

Comment 3 Hans de Goede 2016-08-15 09:49:26 UTC
(In reply to Joachim Frieben from comment #2)
> (In reply to Hans de Goede from comment #1)
> Note that the attached Xorg.0.log file exhibits repeated error messages:
>   "[   XXX.YYY] (EE) open /dev/fb0: Permission denied"

That is the X server trying to fall back to the fbdev driver, which means things go wrong earlier.

Looking closer at the log file, it seems that this is the real problem:

[   127.271] (EE) Screen 0 deleted because of no matching config section.

So it looks like you have some custom xorg.conf file lying around, either under /etc/X11/xorg.conf.d or under /usr/share/X11/xorg.conf.d/, probably referring to the qxl driver?

If you have a config file telling X to load the qxl driver, the modesetting driver will not work, and the virtio gfx device should use the modesetting driver.
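
One way to check for such an override is to search the usual Xorg configuration locations for Driver lines. This is only a minimal sketch: the helper name find_driver_overrides is made up for illustration, and the pattern only catches uncommented Driver entries.

```shell
# List every uncommented 'Driver "..."' line found in the given Xorg
# configuration files or directories (hypothetical helper name).
find_driver_overrides() {
    # -r: recurse into directories, -s: stay quiet about missing paths,
    # -H: always print the file name, -E: extended regular expression.
    # Lines starting with '#' do not match the anchored pattern.
    grep -rsHE '^[[:space:]]*Driver[[:space:]]+"' "$@"
}
```

Example call: find_driver_overrides /etc/X11/xorg.conf /etc/X11/xorg.conf.d /usr/share/X11/xorg.conf.d. Matches inside InputClass sections (input drivers) may well show up and are unrelated; only a Device section naming a video driver such as qxl would matter here.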

Comment 4 Joachim Frieben 2016-08-15 15:27:14 UTC
Created attachment 1190931 [details]
Xorg.0.log for video device "virtio" under Fedora 24

(In reply to Hans de Goede from comment #3)
There is no custom xorg.conf or related file in any of the locations that you have mentioned, and I have not created any custom xorg.conf either. Interestingly, things work perfectly under current Fedora 24, as shown by the attached log file Xorg.0.log, despite the nearly identical set of X-related packages.

Comment 5 Gerd Hoffmann 2016-09-05 12:42:00 UTC
F24:

[    69.601] (II) modeset(0): using drv /dev/dri/card0
[ ... ]
[    69.625] (II) modeset(0): Creating default Display subsection in Screen section
	"Default Screen Section" for depth/fbbpp 24/32
[    69.625] (==) modeset(0): Depth 24, (==) framebuffer bpp 32
[    69.625] (==) modeset(0): RGB weight 888
[    69.625] (==) modeset(0): Default visual is TrueColor

[ ... goes on to set up the hardware ... ]

F25:

[   127.261] (II) modeset(G0): using drv /dev/dri/card0

[ ... no more (error) messages from the modeset driver ... ]

Hmm.

Possibly related: bug 1256933

Comment 6 Laszlo Ersek 2016-09-29 09:14:06 UTC
The patch fixing bug 1256933 has been committed to upstream:

https://cgit.freedesktop.org/xorg/xserver/commit/?id=ca8d88e50310a0d440a127c22a0a383cc149f408

Retesting this bug (i.e., bug 1366842) against upstream Xorg/Xserver, or with the patch in question backported to F25, might make sense.

Comment 7 Joachim Frieben 2016-09-29 15:39:40 UTC
Created attachment 1205979 [details]
Updated file xserver-autobind-hotplug.patch

After adding the changes from https://cgit.freedesktop.org/xorg/xserver/commit/?id=ca8d88e50310a0d440a127c22a0a383cc149f408 to the patch xserver-autobind-hotplug.patch of package xorg-x11-server-1.18.4-6.fc25, the Xorg.0.log file of the rebuilt X server shows the same failure as before. I have verified that xf86platformBus.c in the build tree has actually been patched correctly, and Xorg.0.log indeed reports the correct build date (29 September 2016) and revision (xorg-x11-server 1.18.4-6.fc25).

Comment 8 Laszlo Ersek 2016-09-29 16:15:37 UTC
Thanks for checking, Joachim. Can you please confirm whether the X server continues to refer to the card as "G0" in the log, rather than just "0"? (Please search the log file for the pattern "using drv /dev/dri/card0".)
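
The search can be done with a small sed filter. This is a sketch only: the function name is hypothetical, and the pattern assumes the log line format quoted in comment 5. In Xorg logs, a label such as "G0" denotes a GPU screen, while a plain "0" denotes a regular screen.

```shell
# Print the screen label(s) that the modesetting driver reports for card0,
# for example "0" or "G0". Argument: path to an Xorg log file.
modeset_screen_labels() {
    # Keep only the text between "modeset(" and ")" on matching lines.
    sed -n 's|.*modeset(\([^)]*\)): using drv /dev/dri/card0.*|\1|p' "$1"
}
```

For a user session started with startx, the log would typically be passed as modeset_screen_labels ~/.local/share/xorg/Xorg.0.log.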

Also, can you please attach your complete libvirt domain XML or QEMU command line? Thanks.

Comment 9 Joachim Frieben 2016-09-29 16:26:10 UTC
Xorg.0.log-20160813: [   127.261] (II) modeset(G0): using drv /dev/dri/card0
Xorg.0.log-20160929: [    88.280] (II) modeset(G0): using drv /dev/dri/card0

Comment 10 Joachim Frieben 2016-09-29 16:32:45 UTC
Created attachment 1205985 [details]
XML configuration file of virtual machine

Comment 11 Laszlo Ersek 2016-09-29 16:38:46 UTC
Thanks. I'll try to look at this in the next week(s).

Comment 12 Laszlo Ersek 2016-09-29 16:54:01 UTC
For now, I diffed the attachments from comment 4 and comment 0 (that is, the X.org log under F24 and F25). The version number is practically identical:

  Build ID: xorg-x11-server 1.18.4-1.fc24

vs.

  Build ID: xorg-x11-server 1.18.4-1.fc25

what differs is the underlying kernel:

  Current Operating System: Linux noname 4.6.5-300.fc24.x86_64 #1 SMP
  Thu Jul 28 01:10:12 UTC 2016 x86_64

versus

  Current Operating System: Linux noname 4.8.0-0.rc1.git3.1.fc25.x86_64 #1 SMP
  Thu Aug 11 04:08:28 UTC 2016 x86_64

Also, there's an interesting message only present in the F25 log:

Joachim, sorry for asking you to jump through hoops; is there any chance you can boot your F24 guest (where X.org used to work) with a 4.8-based kernel?

Hm... I can't seem to find such a build in Koji. Okay, let's look the other way: is there a 4.6-based kernel for F25? Apparently there is:

http://koji.fedoraproject.org/koji/buildinfo?buildID=765346

My point is that the difference in X.org's behavior could be rooted in the different underlying kernel. For example, X.org relies on the kernel to inquire whether a graphics card counts as "boot VGA" or not. I guess it's worth a shot. Thanks.
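
For reference, the kernel exposes the "boot VGA" flag as a boot_vga attribute in sysfs, so it can be compared between the working and the failing kernel. The following is a minimal sketch; the function name is made up, and the optional directory argument exists only so that the function can be pointed at a test tree.

```shell
# Print "<pci-address> boot_vga=<value>" for each PCI device that exposes
# the boot_vga attribute. Optional argument: sysfs PCI device directory.
list_boot_vga() {
    dir=${1:-/sys/bus/pci/devices}
    for attr in "$dir"/*/boot_vga; do
        [ -r "$attr" ] || continue   # glob did not match, or not readable
        dev=${attr%/boot_vga}        # strip the attribute name
        printf '%s boot_vga=%s\n' "${dev##*/}" "$(cat "$attr")"
    done
}
```

Running list_boot_vga without arguments in the guest under both kernels and diffing the output would show whether this particular attribute changed.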

Comment 13 Laszlo Ersek 2016-09-29 16:54:50 UTC
(In reply to Laszlo Ersek from comment #12)

> Also, there's an interesting message only present in the F25 log:

I forgot to add "vesa: Ignoring device with a bound kernel driver" here, but it's irrelevant anyway, for the reason stated by Hans in comment 3.

Comment 14 Joachim Frieben 2016-09-29 17:16:17 UTC
Created attachment 1205989 [details]
Xorg.0.log for video device "virtio" under Fedora 25 (kernel-4.7.0-2.fc25)

Xorg starts up successfully for kernel-4.7.0-2.fc25 available at http://koji.fedoraproject.org/koji/buildinfo?buildID=785559.

Comment 15 Laszlo Ersek 2016-09-29 17:24:11 UTC
Awesome, Joachim, kudos! So this is a kernel regression in 4.8.

Comment 16 Laszlo Ersek 2016-09-29 17:32:13 UTC
$ git log --oneline --reverse --no-merges v4.7..v4.8-rc1 -- \
      drivers/gpu/drm/virtio

d1e372c4fbdf virtio-gpu: fix output lookup
bd884b74bb29 drm/virtio: Use lockless gem BO free callback
a288c1ea8e77 drm/virtio: use drm_crtc_send_vblank_event()
d3767d49f16b virtio-gpu: fix output lookup
e7cf0963f816 virtio-gpu: add atomic_commit function
bbbed8884f8e virtio-gpu: switch to atomic cursor interfaces
86f752d2cc6b virtio-gpu: pick up hotspot from framebuffer
0062795e3069 virtio-gpu: use src not crtc
5e84c2690b80 drm/atomic-helper: Massage swap_state signature somewhat
a50dcc500170 drm: virtgpu: Rely on the default ->best_encoder() behavior
0d841ac0ec21 drm/virtio: Don't reinvent a flipping wheel
a325725633c2 drm: Lobotomize set_busid nonsense for !pci drivers
88932a7be27d drm/ttm: add wait for idle in all drivers bo_move functions
5345a5ab513b drm/virtgpu: Delete unnecessary checks before
             drm_gem_object_unreference_unlocked()
0b6320dfdfea drm/virtio: make fbdev support really optional
a9853117d841 drm/virtio: Fix non static symbol warning

Nothing stands out to me -- hopefully something will stand out to the kernel maintainers.

Alternatively, you could bisect the upstream kernel (in the guest) between v4.7 and v4.8-rc1. Kernel bisection is usually quite painful though...
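
A bisection restricted to one subdirectory can be sketched as follows. The function name, its argument order, and the use of "git bisect run" are assumptions made for illustration; for a kernel, every step requires building and booting the candidate in the guest, so in practice each step would be marked by hand with "git bisect good" or "git bisect bad" instead.

```shell
# Bisect <repo> between a known-good and a known-bad revision, considering
# only commits that touch <subdir>. The remaining arguments form the test
# command, which must exit 0 for "good" and 1 (or any code from 1 to 127
# other than 125) for "bad". Prints the first bad commit on stdout.
bisect_path() {
    repo=$1; good_rev=$2; bad_rev=$3; subdir=$4
    shift 4
    git -C "$repo" bisect start "$bad_rev" "$good_rev" -- "$subdir" >&2 || return 1
    git -C "$repo" bisect run "$@" >&2
    git -C "$repo" rev-parse refs/bisect/bad   # first bad commit
    git -C "$repo" bisect reset >&2            # return to the original HEAD
}
```

For the range discussed here, the call would look like bisect_path ~/linux v4.7 v4.8-rc1 drivers/gpu/drm/virtio ./test-guest.sh, where both the repository location and test-guest.sh are hypothetical.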

Comment 17 Hans de Goede 2016-09-29 17:43:19 UTC
(In reply to Laszlo Ersek from comment #16)
> $ git log --oneline --reverse --no-merges v4.7..v4.8-rc1 -- \
>       drivers/gpu/drm/virtio
> 
> d1e372c4fbdf virtio-gpu: fix output lookup
> bd884b74bb29 drm/virtio: Use lockless gem BO free callback
> a288c1ea8e77 drm/virtio: use drm_crtc_send_vblank_event()
> d3767d49f16b virtio-gpu: fix output lookup
> e7cf0963f816 virtio-gpu: add atomic_commit function
> bbbed8884f8e virtio-gpu: switch to atomic cursor interfaces
> 86f752d2cc6b virtio-gpu: pick up hotspot from framebuffer
> 0062795e3069 virtio-gpu: use src not crtc
> 5e84c2690b80 drm/atomic-helper: Massage swap_state signature somewhat
> a50dcc500170 drm: virtgpu: Rely on the default ->best_encoder() behavior
> 0d841ac0ec21 drm/virtio: Don't reinvent a flipping wheel


> a325725633c2 drm: Lobotomize set_busid nonsense for !pci drivers

This ^^^ one may be related. I'm not sure at all, but you could try building a kernel with it reverted, assuming it reverts cleanly ...


> 88932a7be27d drm/ttm: add wait for idle in all drivers bo_move functions
> 5345a5ab513b drm/virtgpu: Delete unnecessary checks before
>              drm_gem_object_unreference_unlocked()
> 0b6320dfdfea drm/virtio: make fbdev support really optional
> a9853117d841 drm/virtio: Fix non static symbol warning

Comment 18 Joachim Frieben 2016-09-29 18:52:09 UTC
Comparing Fedora 25 kernels alone reveals:
kernel-4.8.0-0.rc0.git3.1.fc25 (v4.7-6438-gc624c86): SUCCESS
kernel-4.8.0-0.rc1.git0.1.fc25 (v4.8-rc1): FAILURE

However, the first Fedora 26 kernel succeeding version 4.8.0-0.rc0.git3.1 results in:
kernel-4.8.0-0.rc0.git5.1.fc26 (v4.7-11470-gd52bd54): FAILURE

This suggests that this issue occurred between kernel versions 4.8.0-0.rc0.git3.1 (v4.7-6438-gc624c86) and 4.8.0-0.rc0.git5.1 (v4.7-11470-gd52bd54).
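
The strings in parentheses appear to follow the "git describe" format (tag, number of commits since the tag, then "g" plus the abbreviated commit hash), so the hashes usable as endpoints of a git range can be extracted mechanically. A small sketch, with a made-up function name:

```shell
# Extract the abbreviated commit hash from "git describe" style output such
# as "v4.7-6438-gc624c86". Prints nothing for a plain tag such as "v4.8-rc1".
describe_to_commit() {
    printf '%s\n' "$1" |
        sed -n 's/^.*-[0-9][0-9]*-g\([0-9a-f][0-9a-f]*\)$/\1/p'
}
```

With the two values above, this yields c624c86 and d52bd54, that is, the range c624c86..d52bd54.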

Comment 19 Laszlo Ersek 2016-09-29 19:00:42 UTC
Thanks for checking. Unfortunately this recent info doesn't narrow it down any better: all 16 patches listed in comment 16 are in the c624c86..d52bd54 range that you identified.

Comment 20 Laszlo Ersek 2016-09-30 02:18:40 UTC
Created attachment 1206097 [details]
bisection log for c624c86..d52bd54 in drivers/gpu/drm/virtio/

Good eye, Hans:

> a325725633c26aa66ab940f762a6b0778edf76c0 is the first bad commit
> commit a325725633c26aa66ab940f762a6b0778edf76c0
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Tue Jun 21 14:08:33 2016 +0200
>
>     drm: Lobotomize set_busid nonsense for !pci drivers
>
>     We already have a fallback in place to fill out the unique from
>     dev->unique, which is set to something reasonable in drm_dev_alloc.
>
>     Which means we only need to have a special set_busid for pci devices,
>     to be able to care the backwards compat code for drm 1.1 around, which
>     libdrm still needs.
>
>     While developing and testing this patch things blew up in really
>     interesting ways, and the code is rather confusing in naming things
>     between the kernel code, ioctl #defines and libdrm. For the next brave
>     dragon slayer, document all this madness properly in the userspace
>     interface section of gpu.tmpl.
>
>     v2: Make drm_dev_set_unique static and update kerneldoc.
>
>     v3: Entire rewrite, plus document what's going on for posterity in the
>     gpu docbook uapi section.
>
>     v4: Drop accidental amdgpu hunk (Emil).
>
>     v5: Drop accidental omapdrm vblank counter change (Emil).
>
>     v6: Rebase on top of the sphinx conversion.
>
>     Cc: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
>     Cc: Emil Velikov <emil.l.velikov@gmail.com>
>     Tested-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> (virt_gpu)
>     Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
>     Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Bisection log attached.

Comment 21 Laszlo Ersek 2016-09-30 02:23:05 UTC
Strangely enough, the Tested-by: line on the patch even names "virt_gpu", which is what breaks here. I guess the testing wasn't done with X.org.

Comment 22 Laszlo Ersek 2016-09-30 02:55:38 UTC
I performed the bisection in a Fedora 24 guest, with X.org "Build ID: xorg-x11-server 1.18.4-4.fc24".

Comment 23 Laszlo Ersek 2016-09-30 03:07:19 UTC
Current HEAD for upstream Linux is

  53061afee43b Merge branch 'akpm' (patches from Andrew)

and there are no commits from d52bd54 (== the known-broken range-end of the bisection) until 53061afee43b that would affect the "drivers/gpu/drm/virtio" subdirectory:

$ git log --oneline --reverse --no-merges d52bd54..master -- \
      drivers/gpu/drm/virtio/

so this regression has not been fixed meanwhile.

Comment 24 Laszlo Ersek 2016-09-30 03:16:19 UTC
Reported this issue on dri-devel:

Subject: Regression: drm: Lobotomize set_busid nonsense for !pci drivers (a325725633c2)
Message-ID: <503b3377-5e61-75c7-1d33-a44c89028a79@redhat.com>
Date: Fri, 30 Sep 2016 05:09:18 +0200
In-Reply-To: <1466510913-17958-1-git-send-email-daniel.vetter@ffwll.ch>

I'd rather put a mailing list archive link here, but <https://lists.freedesktop.org/archives/dri-devel/2016-September/thread.html> doesn't seem to have picked up my message yet (or maybe dri-devel doesn't permit posting without a subscription).

Comment 25 Hans de Goede 2016-09-30 08:32:30 UTC
Hi Laszlo and Joachim,

Let's continue discussing this bug on the list / mail-thread Laszlo started and add a comment with a summary of that here when the discussion is done.

I've just replied to the mail-thread.

Regards,

Hans

Comment 26 Laszlo Ersek 2016-09-30 10:34:38 UTC
Created attachment 1206202 [details]
output of "lspci -v -v -v"

Hans asked for lspci outputs on both working and regressed guest kernels. I saved both outputs, and they are identical. I'm uploading it.

Comment 27 Hans de Goede 2016-09-30 10:39:07 UTC
(In reply to Laszlo Ersek from comment #26)
> Created attachment 1206202 [details]
> output of "lspci -v -v -v"
> 
> Hans asked for lspci outputs on both working and regressed guest kernels. I
> saved both outputs, and they are identical. I'm uploading it.

They are identical? Hmm, not what I expected. But the kernel change may be causing some changes in sysfs which lspci does not see, while the xserver depends on them.

Anyway, please give the xserver patches I just posted in the mailing list thread a try.

Comment 28 Gerd Hoffmann 2016-09-30 11:24:29 UTC
> They are identical ? Hmm, not what I expected. But the kernel change may be
> causing some changes in sysfs which lspci does not see, while the xserver
> depends on them.

Probably the busid; that is what the patch removes from the virtio driver.

The virtio device hangs on a virtio bus, which is connected to a virtio-pci
device (in the virtio-vga or virtio-gpu-pci case discussed here; it could
also be hooked up via virtio-mmio).

So, when the virtio-gpu driver figured out that the parent was a PCI device, it used to call drm_pci_set_busid(), so the device got a pci-style busid.  The patch zapped that logic.
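
To illustrate the parent relationship, the resolved sysfs path of the DRM device can be inspected for a PCI ancestor. The sketch below is an illustration under stated assumptions, not the kernel's logic: the function name is made up, and the "pci:<domain>:<bus>:<slot>.<function>" output form is assumed to match what a pci-style busid looks like.

```shell
# Derive a pci-style bus ID from a resolved sysfs device path by taking the
# last PCI address component (domain:bus:slot.function) found on the path.
# Prints nothing and returns non-zero if the path has no PCI ancestor.
pci_busid_from_path() {
    addr=$(printf '%s\n' "$1" |
        grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' | tail -n 1)
    [ -n "$addr" ] && printf 'pci:%s\n' "$addr"
}
```

In a guest, the input would typically come from readlink -f /sys/class/drm/card0/device, which for a virtio-pci setup resolves to a path like /sys/devices/pci0000:00/0000:00:02.0/virtio0 (the address shown is an example only). A virtio-mmio path has no such component, so nothing is printed.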

Comment 29 Laszlo Ersek 2016-10-03 21:06:53 UTC
Posted the upstream patch:
http://www.spinics.net/lists/stable/msg146434.html

(the link points to the Cc:stable instance of the message; the dri-devel list software seems to be severely delayed and it doesn't show the "main" copy of the message yet)

Comment 30 Hans de Goede 2016-10-04 09:07:51 UTC
Hi,

(In reply to Laszlo Ersek from comment #29)
> Posted the upstream patch:
> http://www.spinics.net/lists/stable/msg146434.html
> 
> (the link points to the Cc:stable instance of the message; the dri-devel
> list software seems to be severely delayed and it doesn't show the "main"
> copy of the message yet)

Thank you for all your work on this. Fedora kernel team, can you please cherry-pick the linked patch into the Fedora kernel 4.8.x pkgs for now? It has been accepted into drm-next, so it should hit a stable 4.8.x release eventually.

Regards,

Hans

Comment 31 Joachim Frieben 2016-10-04 09:34:29 UTC
Created attachment 1207119 [details]
Patch for virtio_gpu kernel module solving the busid issue

The patch applies to the linux-4.8.0-0.rc8.git0.1.fc25 build tree and actually solves the busid issue.

Comment 32 Joachim Frieben 2016-10-04 09:37:10 UTC
Created attachment 1207120 [details]
Xorg.0.log for video device "virtio" after applying patch from comment 31

Comment 33 Laszlo Ersek 2016-10-04 11:21:41 UTC
(In reply to Hans de Goede from comment #30)

> Fedora kernel team can you please cherry-pick the linked patch into the
> Fedora kernel 4.8.x pkgs for now? It has been accepted into drm-next, so
> should hit a stable 4.8.x release eventually.

Seconded; I just tried installing an aarch64 Rawhide nightly compose, in a
guest:

  Fedora-Server-dvd-aarch64-Rawhide-20161003.n.1.iso

X fails to start, with the following messages:

> X.Org X Server 1.18.99.901 (1.19.0 RC 1)
> Release Date: 2016-09-19
> [    68.139] X Protocol Version 11, Revision 0
> [    68.139] Build Operating System:  4.7.2-201.fc24.aarch64
> [    68.139] Current Operating System: Linux localhost.localdomain
>              4.8.0-0.rc8.git3.1.fc26.aarch64 #1 SMP Fri Sep 30 22:18:23
>              UTC 2016 aarch64
> [    68.140] Kernel command line: BOOT_IMAGE=/images/pxeboot/vmlinuz
>              inst.stage2=hd:LABEL=Fedora-S-dvd-aarch64-rawh rd.live.check
> [    68.140] Build Date: 29 September 2016  06:26:36PM
> [    68.140] Build ID: xorg-x11-server 1.19.0-0.1.20160929.fc26
> ...
> [    68.168] (--) PCI:*(0:4:0:0) 1af4:1050:1af4:1100 rev 1, Mem @
>              0x10600000/4096, 0x8001000000/8388608, BIOS @
>              0x????????/65536
> [    68.232] (EE) No devices detected.
> [    68.232] (EE)
> Fatal server error:
> [    68.232] (EE) no screens found(EE)

Note "Current Operating System": 4.8.0-0.rc8.git3.1.fc26.aarch64.

So, in order to get graphical installation to work in aarch64 guests, not
only bug 1256933 should be fixed in Xorg (already done) but the kernel too
needs the patch from comment 31. (Thanks Joachim for copying the patch from
drm-next to an attachment here; I checked the commit hash and it's the one
that Dave pushed.)

Thanks!

Comment 34 Laszlo Ersek 2016-10-04 11:22:53 UTC
CC Marcin re comment 33.

Comment 35 Josh Boyer 2016-10-04 18:24:29 UTC
(In reply to Laszlo Ersek from comment #33)
> (In reply to Hans de Goede from comment #30)
> 
> > Fedora kernel team can you please cherry-pick the linked patch into the
> > Fedora kernel 4.8.x pkgs for now? It has been accepted into drm-next, so
> > should hit a stable 4.8.x release eventually.
> 
> Seconded; I just tried installing an aarch64 Rawhide nightly compose, in a

Added in F25.  Rawhide should pick it up shortly.  If it doesn't land in the merge window we can pull it in at 4.9-rc1.

Comment 36 Laszlo Ersek 2016-10-06 16:40:01 UTC
(In reply to Josh Boyer from comment #35)

> Added in F25.  Rawhide should pick it up shortly.  If it doesn't land in
> the merge window we can pull it in at 4.9-rc1.

Thanks!

Unfortunately, there don't seem to be nightly builds for F25 (any longer?),
and I'd have to test this change with the installer ISO.

Regarding Rawhide, I just tried

  Fedora-Server-dvd-aarch64-Rawhide-20161006.n.0.iso

Apparently, since comment 33 (i.e.,
Fedora-Server-dvd-aarch64-Rawhide-20161003.n.1.iso) Rawhide has moved to
"4.9.0-0.rc0.git2.1" (v4.8-2283-ga3443cd); I guess due to the release of the
upstream 4.8 kernel. The problem is that the new kernel promptly crashes for
me in my (identical config) VM when I boot the ISO:

> ------------[ cut here ]------------
> kernel BUG at lib/ioremap.c:64!
> Internal error: Oops - BUG: 0 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       4.9.0-0.rc0.git2.1.fc26.aarch64 #1
> Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> task: fffffe00e0061400 task.stack: fffffe00e8070000
> PC is at ioremap_page_range+0x168/0x2f8
> LR is at pci_remap_iospace+0x68/0x88
> pc : [<fffffc0008487c48>] lr : [<fffffc00084fc6e0>] pstate: 80000145
> sp : fffffe00e8073a80
> x29: fffffe00e8073a80 x28: 00000200c01f0000
> x27: 0140000000000800 x26: fffffdff7ee10000
> x25: 0400000000000001 x24: 00e8000000000707
> x23: fffffc0009ff7fd8 x22: fffffdff7ee10000
> x21: fffffdff7ee00000 x20: fffffdff7ee0ffff
> x19: 000000003eff0000 x18: 0000000000000010
> x17: 0000000000000000 x16: 0000000000000000
> x15: 0000000000000006 x14: fffffc0089cc42f7
> x13: fffffc0009cc4305 x12: 0000000000000237
> x11: 0000000000000006 x10: 0000000000000238
> x9 : 0000000000000001 x8 : fffffe00e8070000
> x7 : fffffc0009002e68 x6 : 00000200c01f0000
> x5 : fffffe000104f700 x4 : 000000000000ffff
> x3 : fffffc0008ee5eb0 x2 : 0000000040000000
> x1 : 0000000041040000 x0 : 00e800003eff0707
>
> Process swapper/0 (pid: 1, stack limit = 0xfffffe00e8070020)
> Stack: (0xfffffe00e8073a80 to 0xfffffe00e8074000)
> 3a80: fffffe00e8073b00 fffffc00084fc6e0 fffffe00f4872a00 000000003eff0000
> 3aa0: 0000000000000000 fffffc0008f8d908 fffffe00f4870d00 fffffc0008c251c8
> 3ac0: fffffe00e8073b90 0000000000000000 fffffe00f4872a00 0000000000000000
> 3ae0: fffffe00e8073b20 fffffc000851f60c fffffdff7e801384 fffffe00ec1a3810
> 3b00: fffffe00e8073b20 fffffc000851f6e4 fffffe00f487c680 fffffe00ec1a3810
> 3b20: fffffe00e8073be0 fffffc000851f7e4 fffffe00ec1a3800 fffffe00ec1a3810
> 3b40: fffffc0008f8dab8 fffffc0009f88000 0000000000000000 0000000000000000
> 3b60: fffffc0008d063c0 fffffc0008d86398 fffffc0008e520e0 fffffc000894d8cc
> 3b80: fffffe00e8073bb0 fffffc00087b5bc0 fffffe00f487f280 fffffe00f4873780
> 3ba0: 000000003eff0000 fffffe00ffffc488 fffffe00e8073be0 fffffc000851f7d8
> 3bc0: fffffe00ec1a3800 fffffe00ec1a3810 fffffc0008f8dab8 fffffc0009f88000
> 3be0: fffffe00e8073c00 fffffc000862a760 00000000fffffffe fffffe00ec1a3810
> 3c00: fffffe00e8073c30 fffffc0008627ac8 fffffe00ec1a3810 0000000000000000
> 3c20: fffffc0008f8dae0 0000000000000000 fffffe00e8073c70 fffffc0008627e54
> 3c40: fffffe00ec1a3810 fffffe00ec1a3870 fffffc0008f8dae0 0000000000000000
> 3c60: fffffc0008fc4000 fffffc0008d20464 fffffe00e8073ca0 fffffc00086252a8
> 3c80: 0000000000000000 fffffc0008f8dae0 fffffc0008627d28 fffffc0008d86398
> 3ca0: fffffe00e8073ce0 fffffc0008627130 fffffc0008f8dae0 fffffe00fd84f600
> 3cc0: fffffc0008fc5768 0000000000000000 fffffe00f9366d30 fffffe00ec111b98
> 3ce0: fffffe00e8073d00 fffffc0008626b50 fffffc0008f8dae0 fffffe00fd84f600
> 3d00: fffffe00e8073d40 fffffc0008628ee0 fffffc0008f8dae0 0000000000000000
> 3d20: 0000000000000000 0000000000000006 fffffc0009060000 0000000000000000
> 3d40: fffffe00e8073d60 fffffc000862a684 fffffc0008f8dab8 0000000000000000
> 3d60: fffffe00e8073d80 fffffc0008d5c5ac fffffc0008d5c594 fffffe00e8070000
> 3d80: fffffe00e8073d90 fffffc0008083594 fffffe00e8073e00 fffffc0008d20dec
> 3da0: 0000000000000102 fffffc0009060000 fffffc0008d863a8 0000000000000006
> 3dc0: fffffc0008e51c00 0000000000000000 fffffc0008f048f8 fffffc0008bbe670
> 3de0: 0000000000000000 0000000600000006 fffffc0008d20464 fffffc0008d063c0
> 3e00: fffffe00e8073ea0 fffffc0008945568 fffffc0008945550 0000000000000000
> 3e20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3e40: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
> 3e60: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
> 3e80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3ea0: 0000000000000000 fffffc0008083330 fffffc0008945550 0000000000000000
> 3ec0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3ee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3f20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3fa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 3fc0: 0000000000000000 0000000000000005 0000000000000000 0000000000000000
> 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Call trace:
> Exception stack(0xfffffe00e80738a0 to 0xfffffe00e80739d0)
> 38a0: 000000003eff0000 0000040000000000 fffffe00e8073a80 fffffc0008487c48
> 38c0: 0000000080000145 000000000000003d fffffe00e8073960 fffffc0008137f44
> 38e0: d7104ea0cfa63c7e fffffe00e0061cd8 0000000000000000 0000000000000000
> 3900: fffffe00e0061400 0000000000000000 0000000000000001 fffffc0009cc4000
> 3920: 0000000000000002 fffffe00e0061cb0 fffffe00ec1a3bd0 0000000000000000
> 3940: fffffe00e0061400 0000000000000000 00e800003eff0707 0000000041040000
> 3960: 0000000040000000 fffffc0008ee5eb0 000000000000ffff fffffe000104f700
> 3980: 00000200c01f0000 fffffc0009002e68 fffffe00e8070000 0000000000000001
> 39a0: 0000000000000238 0000000000000006 0000000000000237 fffffc0009cc4305
> 39c0: fffffc0089cc42f7 0000000000000006
> [<fffffc0008487c48>] ioremap_page_range+0x168/0x2f8
> [<fffffc00084fc6e0>] pci_remap_iospace+0x68/0x88
> [<fffffc000851f6e4>] pci_host_common_probe+0x24c/0x318
> [<fffffc000851f7e4>] gen_pci_probe+0x34/0x40
> [<fffffc000862a760>] platform_drv_probe+0x60/0xc8
> [<fffffc0008627ac8>] driver_probe_device+0x240/0x4a0
> [<fffffc0008627e54>] __driver_attach+0x12c/0x130
> [<fffffc00086252a8>] bus_for_each_dev+0x70/0xb0
> [<fffffc0008627130>] driver_attach+0x30/0x40
> [<fffffc0008626b50>] bus_add_driver+0x200/0x2b8
> [<fffffc0008628ee0>] driver_register+0x68/0x100
> [<fffffc000862a684>] __platform_driver_register+0x54/0x60
> [<fffffc0008d5c5ac>] gen_pci_driver_init+0x18/0x20
> [<fffffc0008083594>] do_one_initcall+0x44/0x138
> [<fffffc0008d20dec>] kernel_init_freeable+0x23c/0x2dc
> [<fffffc0008945568>] kernel_init+0x18/0x110
> [<fffffc0008083330>] ret_from_fork+0x10/0x20
> Code: 54fff801 52800000 1400005e d503201f (d4210000)
> ---[ end trace eb27dece942f441a ]---
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
> SMP: stopping secondary CPUs
> Kernel Offset: disabled
> Memory Limit: none
> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Is this a known bug?

(I'm so going to *not* bisect this. I'm incredibly fed up with upstream
kernel regressions. Looks like I can't test my fix (with Rawhide at least)
for a regression that appeared in 4.8-rc1 -- and that I *had* bisected, see
comment 20 -- because the most recent upstream kernel doesn't even boot, due
to a *different* regression.)

... Actually, I shouldn't even have tried to test Rawhide just yet; my patch
hasn't been pulled from drm-next into Linus's tree, looks like. Well, at
least I found a new crash. Yay.

Comment 37 Joachim Frieben 2016-10-06 16:55:00 UTC
(In reply to Laszlo Ersek from comment #33)
This bug is for x86_64 alone as indicated in section "Hardware". I really recommend reporting a new bug for specific aarch64 issues.
The present issue was shown to be fixed by building an updated kernel module against kernel-4.8.0-0.rc8.git0.1.fc25 with the patch from comment 31. Thanks!

Comment 38 Josh Boyer 2016-10-06 17:09:20 UTC
(In reply to Laszlo Ersek from comment #36)
> (In reply to Josh Boyer from comment #35)
> 
> > Added in F25.  Rawhide should pick it up shortly.  If it doesn't land in
> > the merge window we can pull it in at 4.9-rc1.
> 
> Thanks!
> 
> Unfortunately, there don't seem to be nightly builds for F25 (any longer?),
> and I'd have to test this change with the installer ISO.
> 
> Regarding Rawhide, I just tried
> 
>   Fedora-Server-dvd-aarch64-Rawhide-20161006.n.0.iso
> 
> Apparently, since comment 33 (i.e.,
> Fedora-Server-dvd-aarch64-Rawhide-20161003.n.1.iso) Rawhide has moved to
> "4.9.0-0.rc0.git2.1" (v4.8-2283-ga3443cd); I guess due to the release of the
> upstream 4.8 kernel. The problem is that the new kernel promptly crashes for
> me in my (identical config) VM when I boot the ISO:

<snip>
 
> Is this a known bug?

Yes.  There are other bugs open for it.

> (I'm so going to *not* bisect this. I'm incredibly fed up with upstream
> kernel regressions. Looks like I can't test my fix (with Rawhide at least)
> for a regression that appeared in 4.8-rc1 -- and that I *had* bisected, see
> comment 20 -- because the most recent upstream kernel doesn't even boot, due
> to a *different* regression.)

Frustration understandable, but upstream is the next Fedora is the next RHEL.  Writing it off will only bite you (and others) later on.

> ... Actually, I shouldn't even have tried to test Rawhide just yet; my patch
> hasn't been pulled from drm-next into Linus's tree, looks like. Well, at
> least I found a new crash. Yay.

Right.  Rawhide doesn't have the kernel patch.  Actually, none of the available kernels in F25 have the patch either.  It's only been committed in Fedora git at this point.

The next F25 build will contain it and bodhi will leave a comment here when it is available in an update.  Rawhide will pick it up one way or another likely around the 4.9-rc1/rc2 timeframe.

Comment 39 Laszlo Ersek 2016-10-06 17:13:50 UTC
Joachim, you may have reported the bug (the kernel regression breaking virtio-gpu-pci and virtio-vga in Xorg) only for x86_64, but it definitely affects other architectures. In fact it affects aarch64 *more*, because for x86_64 guests, you can pick a bunch of other video card device models, such as std VGA, QXL, even Cirrus if you are into antique stuff. But, for aarch64 KVM guests, if you want a graphics card that works with *both* guest firmware and guest kernel, you can only use virtio-gpu-pci. So my wanting to test the DRM virtio-gpu fix against aarch64 makes sense.

Anyway, I agree I shouldn't pollute this BZ with an independent aarch64 kernel crash. I don't intend to. I just wanted to ask Josh if he was aware of this specific crash (so that I could go to the existent BZ if he was), and for that -- i.e. for identifying the crash -- I had to paste the backtrace. If there's no BZ yet for this crash, I agree I should report one. (My googling didn't turn up anything relevant.)

Comment 40 Laszlo Ersek 2016-10-06 17:29:25 UTC
(In reply to Josh Boyer from comment #38)
> (In reply to Laszlo Ersek from comment #36)

> > Is this a known bug?
> 
> Yes.  There are other bugs open for it.

Can you please provide a BZ number?

> > (I'm so going to *not* bisect this. I'm incredibly fed up with upstream
> > kernel regressions. Looks like I can't test my fix (with Rawhide at least)
> > for a regression that appeared in 4.8-rc1 -- and that I *had* bisected, see
> > comment 20 -- because the most recent upstream kernel doesn't even boot, due
> > to a *different* regression.)
> 
> Frustration understandable, but upstream is the next Fedora is the next
> RHEL.  Writing it off will only bite you (and others) later on.

You would be right if kernel development and graphics development were among my responsibilities. They aren't. Everything I've been doing with xorg, kernel, and firefox recently is strictly extra-curricular for me. In these areas I'm really nothing more than a random contributor, not entirely unfamiliar with general debugging, who tries (well, tried) to help. I've simply picked some BZs that piqued my interest (due to their relation to virtualization) and seemed stuck.

(

> > ... Actually, I shouldn't even have tried to test Rawhide just yet; my patch
> > hasn't been pulled from drm-next into Linus's tree, looks like. Well, at
> > least I found a new crash. Yay.
> 
> Right.  Rawhide doesn't have the kernel patch.  Actually, none of the
> available kernels in F25 have the patch either.  It's only been committed in
> Fedora git at this point.
> 
> The next F25 build will contain it and bodhi will leave a comment here when
> it is available in an update.  Rawhide will pick it up one way or another
> likely around the 4.9-rc1/rc2 timeframe.

My ultimate goal with this work is to help enable aarch64 Fedora Installer and Live CDs to launch with a GUI out-of-the-box when booted in QEMU/KVM virtual machines. (I work in virt; that's my ulterior motive.) For some reason, Installer and Live media are not updated periodically for released Fedora OSes (unlike for released Debian OSes, for example). This could well be justified; I just don't know the reasons. The end result is that, whenever I want to test a bugfix in the Installer or Live environment, I can only use Rawhide. Only Rawhide gets nightly Installer / Live media builds.

)

Thank you for the quick action picking up the patch in F25!

Comment 41 Josh Boyer 2016-10-06 17:31:42 UTC
(In reply to Laszlo Ersek from comment #40)
> (In reply to Josh Boyer from comment #38)
> > (In reply to Laszlo Ersek from comment #36)
> 
> > > Is this a known bug?
> > 
> > Yes.  There are other bugs open for it.
> 
> Can you please provide a BZ number?

1382318 I think.  There are a number of problems with the initial 4.9 kernels that have been reported at any rate.

Comment 42 Fedora Update System 2016-10-09 12:29:36 UTC
kernel-4.8.1-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-b762b15e29

Comment 43 Fedora Update System 2016-10-10 17:41:50 UTC
kernel-4.8.1-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 44 Laszlo Ersek 2016-10-12 14:43:57 UTC
(In reply to Joachim Frieben from comment #31)
> Created attachment 1207119 [details]
> Patch for virtio_gpu kernel module solving the busid issue
> 
> The patch applies to the linux-4.8.0-0.rc8.git0.1.fc25 build tree and
> actually solves the busid issue.

The patch is now in upstream Linux:

commit c2cbc38b9715bd8318062e600668fc30e5a3fbfa
Author: Laszlo Ersek <lersek@redhat.com>
Date:   Mon Oct 3 19:43:03 2016 +0200

    drm: virtio: reinstate drm_virtio_set_busid()

Comment 45 Laszlo Ersek 2016-10-13 21:38:36 UTC
At long last!

Tonight I tested

  Fedora-Server-dvd-aarch64-Rawhide-20161013.n.0.iso

in an aarch64 guest using virtio-gpu-pci.

This ISO comes with

  kernel-4.9.0-0.rc0.git7.1.fc26

That is the first Rawhide kernel that simultaneously:

- includes the downstream workaround for bug 1382530 -- i.e., it disables
  CONFIG_DEBUG_TEST_DRIVER_REMOVE in the config, so it boots at all --,

- and is a descendant of upstream commit c2cbc38b9715 ("drm: virtio:
  reinstate drm_virtio_set_busid()").

I almost can't believe my eyes, but this ISO boots into the graphical
installer environment, without kernel command line parameters or other
tweaks.

The installation completes fine, but it doesn't include a desktop
environment by default. (Likely expected for a Server installation.) The
commands

  dnf group install 'Xfce Desktop'
  systemctl set-default graphical.target
  reboot

do the trick.
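
The two kernel conditions listed above could be checked along the following lines. This is only a rough sketch: the helper names are made up for illustration, and the config path and git checkout location in the usage comments are assumptions about the local setup. A distribution kernel is not literally a git ref, so the ancestry check only makes sense against an upstream tree that corresponds to the build in question.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only and
# are not part of any Fedora or kernel tooling.

# Succeeds (exit status 0) if the kernel config file given as $1 enables
# CONFIG_DEBUG_TEST_DRIVER_REMOVE, the option that the downstream
# workaround for bug 1382530 disables.
driver_remove_test_enabled() {
    grep -q '^CONFIG_DEBUG_TEST_DRIVER_REMOVE=y$' "$1"
}

# Succeeds if commit $2 is an ancestor of ref $3 in the kernel git
# checkout located at $1.
contains_commit() {
    git -C "$1" merge-base --is-ancestor "$2" "$3"
}

# Example usage (paths are assumptions):
#   driver_remove_test_enabled "/boot/config-$(uname -r)" \
#       && echo "CONFIG_DEBUG_TEST_DRIVER_REMOVE is still enabled"
#   contains_commit "$HOME/src/linux" c2cbc38b9715 HEAD \
#       && echo "drm_virtio_set_busid() fix is present"
```

Both helpers only report status through their exit code, so they can be combined with `&&` or used in an `if` as shown in the comments.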

Comment 46 Laszlo Ersek 2016-10-16 12:04:29 UTC
4.8.1-1.fc25.aarch64 works fine too; thanks.

