Bug 1665433 - [Hyper-V][RHEL 7.6]Startx will have segment fault with hyper-V environment
Summary: [Hyper-V][RHEL 7.6]Startx will have segment fault with hyper-V environment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: xorg-x11-server
Version: 7.6
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
: ---
Assignee: Adam Jackson
QA Contact: HuijingHei
URL:
Whiteboard:
Depends On:
Blocks: 1704513 1717309 1722524
Reported: 2019-01-11 11:38 UTC by jqdeng
Modified: 2019-10-30 05:46 UTC
CC List: 21 users

Fixed In Version: xorg-x11-server-1.20.4-7.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1704513 1717309
Environment:
Last Closed: 2019-08-06 12:42:44 UTC
Target Upstream Version:




Links
System                  | ID             | Priority | Status | Summary | Last Updated
Red Hat Product Errata  | RHSA-2019:2079 | None     | None   | None    | 2019-08-06 12:43:10 UTC

Description jqdeng 2019-01-11 11:38:21 UTC
Description of problem:
Running startx on a Red Hat Enterprise Linux 7.6 virtual machine on the Hyper-V hypervisor causes a segmentation fault.

This happens because Xorg 1.20.1 changed the logic that obtains the bus ID. Previously it was:
    buf = drmGetBusid(fd);
    xf86_platform_odev_attributes(delayed_index)->busid = XNFstrdup(buf);
    drmFreeBusid(buf);
Now it is:
    if (!strncmp(attribs->syspath, pci_prefix, strlen(pci_prefix))) {
        char *dbdf = attribs->syspath + strlen(pci_prefix) + strlen("XXXX:XX") + 1;
        asprintf(&xf86_platform_odev_attributes(delayed_index)->busid,
                 "pci:%.12s", dbdf);
        LogMessage(X_INFO, "Platform PCI device at %s\n",
                   xf86_platform_odev_attributes(delayed_index)->busid);
    }
Here pci_prefix is "/sys/devices/pci", but on the Hyper-V platform the pci_hyperv driver creates the device under this directory:
/sys/devices/LNXSYSTM:00/device:00/ACPI0004:00/VMBUS:00/74f89322-92bd-400f-a8bb-008a102a3e4d/pcia8bb:00/a8bb:00:00.0/drm/card0
so the strncmp check fails and the busid is left NULL, which then causes a segmentation fault in xf86platformProbe.
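The prefix check described above can be reproduced outside the server. This is a minimal C sketch, not the actual Xorg code: busid_from_syspath_prefix is a hypothetical standalone wrapper around the quoted logic, and snprintf into a malloc'd buffer stands in for the GNU asprintf() call. It returns NULL for the pci_hyperv path, which corresponds to the NULL busid that the report says is later dereferenced in xf86platformProbe.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical standalone sketch of the prefix-based bus ID scrape quoted
 * above. Returns a malloc'd "pci:DDDD:BB:dd.f" string, or NULL when the
 * sysfs path does not start with "/sys/devices/pci" (the pci_hyperv case). */
static char *busid_from_syspath_prefix(const char *syspath)
{
    static const char pci_prefix[] = "/sys/devices/pci";
    /* Skip "/sys/devices/pci", the root bus "XXXX:XX", and the '/'. */
    const size_t skip = strlen(pci_prefix) + strlen("XXXX:XX") + 1;
    const size_t busid_len = sizeof("pci:") + 12; /* "pci:" + 12 chars + NUL */
    char *busid;

    assert(syspath != NULL);
    if (strncmp(syspath, pci_prefix, strlen(pci_prefix)) != 0)
        return NULL; /* busid stays NULL, as in the Hyper-V case */
    if (strlen(syspath) < skip)
        return NULL; /* guard added for this sketch; not in the quoted code */

    busid = malloc(busid_len);
    if (busid == NULL)
        return NULL;
    /* Like the quoted asprintf(), copy at most 12 characters ("DDDD:BB:dd.f"). */
    snprintf(busid, busid_len, "pci:%.12s", syspath + skip);
    return busid;
}
```

For an illustrative root-bridge path such as /sys/devices/pci0000:00/0000:00:02.0/drm/card0 (not taken from this report), the function yields "pci:0000:00:02.0"; for the pci_hyperv path above it yields NULL.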

Comment 2 Yaju Cao 2019-01-14 06:40:20 UTC
Hi, thanks for reporting this issue. I tried a RHEL 7.6 VM on Hyper-V but did not see anything abnormal after starting the GUI. Could you describe the detailed steps to trigger this issue? Thanks!

Comment 3 jqdeng 2019-01-21 07:13:25 UTC
Hi,
   Thanks for your response. 
   I passed through one AMD graphics device to the virtual machine and loaded the amdgpu driver. Then I ran "init 3" and startx. The root cause is that current Xorg assumes the PCI device's directory starts with "/sys/devices/pci", but on the Hyper-V platform the PCI device's directory is similar to "/sys/devices/LNXSYSTM:00/device:00/ACPI0004:00/VMBUS:00/74f89322-92bd-400f-a8bb-008a102a3e4d/pci".
   You can check the function "get_drm_info" in hw/xfree86/os-support/linux/lnx-platform.c.

Comment 4 jqdeng 2019-01-23 07:08:23 UTC
Hi,
   Is there any update? The issue is related to the following change:
From b96e7972e90144a697401f393ae8e1e12b3e767c Mon Sep 17 00:00:00 2001
From: Adam Jackson <ajax@redhat.com>
Date: Tue, 18 Sep 2018 14:37:51 -0400
Subject: [PATCH] linux: Make platform device probe less fragile

If we have platform devices - and we usually do - we would really want
them to bind through the platform bus code not PCI. At the point where
get_drm_info runs, however, we haven't yet taken our own VT, which means
we can't perform drm "master" operations on the device. This is tragic,
because the operation we need to perform here is fishing the bus id out
of the kernel, which we can only do after drmSetInterfaceVersion, which
for some reason stores that knowledge on the device not the file handle
and thus needs master access. Since we fail, the probe logic gets very
confused.

Fortunately we know the format of the busid string (it's our own, drm
copied it from xfree86), so we can scrape that out of the sysfs path. We
do still potentially do the whole SetInterfaceVersion dance later on,
but it's harmless at that point because we've taken the VT by then.

This should all be vastly simplified, but that is not the cat we're
skinning today.
---
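The commit message relies on the bus ID string having the same layout as the sysfs PCI device name. As a sketch, assuming the kernel's DRM bus ID format "pci:%04x:%02x:%02x.%d" and the sysfs PCI device name format "DDDD:BB:dd.f", the hypothetical helper busid_from_pci_name below parses a sysfs device name and re-emits it as a bus ID; for four-digit domains the result matches what the "pci:%.12s" copy produces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: convert a sysfs PCI device name ("DDDD:BB:dd.f",
 * optionally followed by "/...") into a DRM-style bus ID
 * "pci:%04x:%02x:%02x.%u". Returns 0 on success, -1 if the name is not a
 * PCI address or the output buffer is too small. */
static int busid_from_pci_name(const char *name, char *out, size_t out_len)
{
    unsigned int domain, bus, slot, func;
    int consumed = 0;
    int n;
    size_t len;

    assert(name != NULL && out != NULL);
    /* The device name ends at the first '/' (or at the end of the string). */
    len = strcspn(name, "/");
    if (sscanf(name, "%x:%x:%x.%u%n", &domain, &bus, &slot, &func,
               &consumed) != 4 || (size_t)consumed != len)
        return -1;

    n = snprintf(out, out_len, "pci:%04x:%02x:%02x.%u", domain, bus, slot, func);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Parsing and re-formatting, rather than copying a fixed 12 characters, also rejects components that are not PCI addresses, such as "VMBUS:00".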

Comment 5 Dexuan Cui 2019-01-23 09:26:07 UTC
I don't know the background of the patch "[PATCH] linux: Make platform device probe less fragile", and I have never read the Xorg code, but let me share my thoughts about the sysfs paths.

On KVM or a physical host, I suppose the GPU device is directly attached to the PCI root bridge, which is why you see the short path /sys/devices/pci0000:00/. Note that I think the kernel itself (rather than the Hyper-V drivers) handles the PCI root bridge ("PNP0A03") specially; see arch/x86/pci/acpi.c: acpi_pci_root_add() -> pci_acpi_scan_root() -> acpi_pci_root_create() -> pci_create_root_bus(NULL, …)

struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
                                    struct pci_ops *ops, void *sysdata,
                                    struct list_head *resources);

That is to say, the kernel hardcodes the root bridge’s "parent" to NULL, meaning the devices on the PCI root bridge appear in /sys/devices/pci0000:00/.

On the other hand, pci-hyperv’s parent is not NULL:

drivers/pci/controller/pci-hyperv.c:
static int create_root_hv_pci_bus(struct hv_pcibus_device *hbus)
{
        /* Register the device */
        hbus->pci_bus = pci_create_root_bus(&hbus->hdev->device, …)

The sysfs name of “hbus->hdev->device” is “LNXSYSTM:00/device:00/ACPI0004:00/VMBUS:00/74f89322-92bd-400f-a8bb-008a102a3e4d” in your case. That’s why you see the device in /sys/devices/LNXSYSTM:00/device:00/ACPI0004:00/VMBUS:00/74f89322-92bd-400f-a8bb-008a102a3e4d/pciXXXX:XX/.

You may ask why we can’t use a NULL parent in pci-hyperv. 

My understanding is that a NULL parent would not correctly reflect how the device is connected to the system when the guest runs on Hyper-V; it could cause bus/device name conflicts, and as a result we might fail to register the bus/device.

In summary, I don’t think there is anything we can fix in the kernel/driver. 

I suggest that Red Hat not assume the GPU device always resides on the PCI root bridge.
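One way to drop the root-bridge assumption is to take the bus ID from the directory that owns the drm node instead of from a fixed offset after /sys/devices/pci. The sketch below is hypothetical and is not the fix that shipped in xorg-x11-server-1.20.4-7.el7; busid_from_drm_syspath is an illustrative name, and it assumes the card's parent directory in sysfs is the PCI device itself, which holds for both paths seen in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: derive a DRM-style bus ID from a sysfs path without
 * assuming the PCI root bus lives directly under /sys/devices. */
static char *busid_from_drm_syspath(const char *syspath)
{
    enum { BUSID_MAX = 48 }; /* room for "pci:" plus four %x/%u fields */
    const char *drm;
    const char *start;
    unsigned int domain, bus, slot, func;
    int consumed = 0;
    char *busid;

    assert(syspath != NULL);
    drm = strstr(syspath, "/drm/");
    if (drm == NULL)
        return NULL; /* not a DRM device node path */

    /* Walk back from "/drm/" to the start of the owning device's directory. */
    start = drm;
    while (start > syspath && start[-1] != '/')
        start--;

    /* The owning directory must be a complete PCI name "DDDD:BB:dd.f". */
    if (sscanf(start, "%x:%x:%x.%u%n", &domain, &bus, &slot, &func,
               &consumed) != 4 || start + consumed != drm)
        return NULL;

    busid = malloc(BUSID_MAX);
    if (busid == NULL)
        return NULL;
    snprintf(busid, BUSID_MAX, "pci:%04x:%02x:%02x.%u", domain, bus, slot, func);
    return busid;
}
```

Because it only inspects the component directly above /drm/, the result does not depend on whether the root bus sits under /sys/devices or under a VMBus device. A non-PCI parent yields NULL, so a caller would still need to check the result before using it.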

Comment 7 Jack Hammons 2019-04-09 19:03:29 UTC
Business Justification: This is an issue that is blocking a vendor as they perform bring-up of new hardware. This work will enable future scenarios in Azure that Red Hat Enterprise Linux 7.6 will be unable to leverage without a fix.

Requesting a bump to urgent and flagging for z-stream review.

Comment 9 Yaju Cao 2019-04-11 09:55:16 UTC
hhei@ has reproduced the issue on Hyper-V. She will post the details later.

Comment 10 HuijingHei 2019-04-12 01:43:16 UTC
@yacao, thanks!

Here are the logs:
1. Pass through the GPU card to the RHEL 7.6 guest, then run startx: it fails with "Segmentation fault at address 0x0". If the GPU card is removed, startx succeeds.

# lspci -v
be3f:00:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device 100a
	Flags: bus master, fast devsel, latency 0, IRQ 24
	Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
	Memory at fe0000000 (64-bit, prefetchable) [size=128M]
	Memory at fe8000000 (64-bit, prefetchable) [size=32M]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Kernel driver in use: nouveau
	Kernel modules: nouveau


# startx
X.Org X Server 1.20.1
X Protocol Version 11, Revision 0
[  1603.323] Build Operating System:  3.10.0-862.2.3.el7.x86_64 
[  1603.323] Current Operating System: Linux bootp-73-131-209.rhts.eng.pek2.redhat.com 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64
[  1603.324] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-957.el7.x86_64 root=/dev/mapper/rhel_bootp--73--199--7-root ro crashkernel=auto rd.lvm.lv=rhel_bootp-73-199-7/root rd.lvm.lv=rhel_bootp-73-199-7/swap rhgb quiet LANG=en_US.UTF-8
[  1603.324] Build Date: 24 September 2018  06:30:46PM
[  1603.324] Build ID: xorg-x11-server 1.20.1-3.el7 
[  1603.324] Current version of pixman: 0.34.0
[  1603.324] 	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
[  1603.324] Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[  1603.324] (==) Log file: "/var/log/Xorg.3.log", Time: Fri Apr 12 09:32:11 2019
[  1603.325] (==) Using config directory: "/etc/X11/xorg.conf.d"
[  1603.325] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[  1603.340] (==) No Layout section.  Using the first Screen section.
[  1603.340] (==) No screen section available. Using defaults.
[  1603.340] (**) |-->Screen "Default Screen Section" (0)
[  1603.340] (**) |   |-->Monitor "<default monitor>"
[  1603.341] (==) No monitor specified for screen "Default Screen Section".
	Using a default monitor configuration.
[  1603.341] (==) Automatically adding devices
[  1603.341] (==) Automatically enabling devices
[  1603.341] (==) Automatically adding GPU devices
[  1603.341] (==) Automatically binding GPU devices
[  1603.341] (==) Max clients allowed: 256, resource mask: 0x1fffff
[  1603.341] (==) FontPath set to:
	catalogue:/etc/X11/fontpath.d,
	built-ins
[  1603.341] (==) ModulePath set to "/usr/lib64/xorg/modules"
[  1603.341] (II) The server relies on udev to provide the list of input devices.
	If no devices become available, reconfigure udev or disable AutoAddDevices.
[  1603.341] (II) Loader magic: 0x55c7fb36b020
[  1603.341] (II) Module ABI versions:
[  1603.341] 	X.Org ANSI C Emulation: 0.4
[  1603.341] 	X.Org Video Driver: 24.0
[  1603.341] 	X.Org XInput driver : 24.1
[  1603.341] 	X.Org Server Extension : 10.0
[  1603.341] (II) xfree86: Adding drm device (/dev/dri/card0)
[  1603.342] (II) Platform probe for /sys/devices/LNXSYSTM:00/device:00/ACPI0004:00/VMBUS:00/9557dce2-607d-45a7-be3f-6be1a3dee24e/pcibe3f:00/be3f:00:00.0/drm/card0
[  1603.353] (EE) 
[  1603.353] (EE) Backtrace:
[  1603.353] (EE) 0: /usr/bin/X (xorg_backtrace+0x55) [0x55c7fb0dd155]
[  1603.353] (EE) 1: /usr/bin/X (0x55c7faf2c000+0x1b4dd9) [0x55c7fb0e0dd9]
[  1603.353] (EE) 2: /lib64/libpthread.so.0 (0x7fdc54ce9000+0xf5d0) [0x7fdc54cf85d0]
[  1603.353] (EE) 3: /usr/bin/X (0x55c7faf2c000+0xb5d28) [0x55c7fafe1d28]
[  1603.353] (EE) 4: /usr/bin/X (xf86BusProbe+0x9) [0x55c7fafbb0b9]
[  1603.353] (EE) 5: /usr/bin/X (InitOutput+0x718) [0x55c7fafc8d58]
[  1603.353] (EE) 6: /usr/bin/X (0x55c7faf2c000+0x601b0) [0x55c7faf8c1b0]
[  1603.353] (EE) 7: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7fdc5493e3d5]
[  1603.353] (EE) 8: /usr/bin/X (0x55c7faf2c000+0x4a4ce) [0x55c7faf764ce]
[  1603.354] (EE) 
[  1603.354] (EE) Segmentation fault at address 0x0
[  1603.354] (EE) 
Fatal server error:
[  1603.354] (EE) Caught signal 11 (Segmentation fault). Server aborting

Comment 12 Yaju Cao 2019-05-14 09:33:14 UTC
From the description, this is a regression in RHEL 7.6.

Comment 15 Adam Jackson 2019-05-14 19:14:20 UTC
Please try this test build (against 7.6):

https://people.redhat.com/ajackson/1665433/

Comment 16 Rick Barry 2019-05-14 20:54:00 UTC
(In reply to Adam Jackson from comment #15)
> Please try this test build (against 7.6):
> 
> https://people.redhat.com/ajackson/1665433/

Thanks, Adam. I assume this is for external testing.

Jack, is there anyone @Microsoft who is waiting/able to try out a test build with this fix?

Comment 17 Jack Hammons 2019-05-14 21:10:21 UTC
We are proxying this request for an IHV. I have provided them with the link to your test build and will update this Bugzilla with any information that they provide.

Comment 18 HuijingHei 2019-05-15 03:22:20 UTC
(In reply to Adam Jackson from comment #15)
> Please try this test build (against 7.6):
> 
> https://people.redhat.com/ajackson/1665433/

Hi, I have tried this build. With the GPU card passed through to the RHEL 7.6 guest, the terminal freezes after startx and no GUI is displayed, although the OS keeps working, and there are no errors in /var/log/Xorg.0.log. Is there anything I missed for displaying with the GPU? Additional info: if I remove the GPU, startx succeeds; if I then add the GPU back, errors are logged and the console returns to the terminal. The logs are below in the hope that they are helpful.

 
1) # rpm -qa | grep -i xorg-x11-server
xorg-x11-server-common-1.20.4-7.el7_6.x86_64
xorg-x11-server-Xorg-1.20.4-7.el7_6.x86_64
xorg-x11-server-utils-7.7-20.el7.x86_64

2) # lspci -v
9c0f:00:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device 100a
	Flags: bus master, fast devsel, latency 0, IRQ 24
	Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
	Memory at fe0000000 (64-bit, prefetchable) [size=128M]
	Memory at fe8000000 (64-bit, prefetchable) [size=32M]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Kernel driver in use: nouveau
	Kernel modules: nouveau

3) # startx

# cat /var/log/Xorg.0.log
[   112.075] 
X.Org X Server 1.20.4
X Protocol Version 11, Revision 0
[   112.075] Build Operating System:  3.10.0-862.2.3.el7.x86_64 
[   112.075] Current Operating System: Linux bootp-73-131-209.rhts.eng.pek2.redhat.com 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64
[   112.076] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-957.el7.x86_64 root=/dev/mapper/rhel_bootp--73--199--7-root ro crashkernel=auto rd.lvm.lv=rhel_bootp-73-199-7/root rd.lvm.lv=rhel_bootp-73-199-7/swap rhgb quiet LANG=en_US.UTF-8
[   112.076] Build Date: 14 May 2019  07:08:50PM
[   112.076] Build ID: xorg-x11-server 1.20.4-7.el7_6 
[   112.076] Current version of pixman: 0.34.0
[   112.076] 	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
[   112.076] Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[   112.076] (==) Log file: "/var/log/Xorg.0.log", Time: Wed May 15 10:55:28 2019
[   112.077] (==) Using config directory: "/etc/X11/xorg.conf.d"
[   112.077] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[   112.083] (==) No Layout section.  Using the first Screen section.
[   112.083] (==) No screen section available. Using defaults.
[   112.083] (**) |-->Screen "Default Screen Section" (0)
[   112.083] (**) |   |-->Monitor "<default monitor>"
[   112.083] (==) No monitor specified for screen "Default Screen Section".
	Using a default monitor configuration.
[   112.083] (==) Automatically adding devices
[   112.083] (==) Automatically enabling devices
[   112.083] (==) Automatically adding GPU devices
[   112.083] (==) Automatically binding GPU devices
[   112.083] (==) Max clients allowed: 256, resource mask: 0x1fffff
[   112.083] (==) FontPath set to:
	catalogue:/etc/X11/fontpath.d,
	built-ins
[   112.083] (==) ModulePath set to "/usr/lib64/xorg/modules"
[   112.083] (II) The server relies on udev to provide the list of input devices.
	If no devices become available, reconfigure udev or disable AutoAddDevices.
[   112.083] (II) Loader magic: 0x5630ab068020
[   112.083] (II) Module ABI versions:
[   112.083] 	X.Org ANSI C Emulation: 0.4
[   112.083] 	X.Org Video Driver: 24.0
[   112.083] 	X.Org XInput driver : 24.1
[   112.083] 	X.Org Server Extension : 10.0
[   112.084] (II) xfree86: Adding drm device (/dev/dri/card0)
[   112.131] (--) PCI:*(0@39951:0:0) 10de:11bf:10de:100a rev 161, Mem @ 0xfa000000/16777216, 0xfe0000000/134217728, 0xfe8000000/33554432, BIOS @ 0x????????/131072
[   112.131] (II) LoadModule: "glx"
[   112.138] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[   112.175] (II) Module glx: vendor="X.Org Foundation"
[   112.175] 	compiled for 1.20.4, module version = 1.0.0
[   112.175] 	ABI class: X.Org Server Extension, version 10.0
[   112.346] (==) Matched modesetting as autoconfigured driver 0
[   112.346] (==) Matched fbdev as autoconfigured driver 1
[   112.346] (==) Matched vesa as autoconfigured driver 2
[   112.346] (==) Assigned the driver to the xf86ConfigLayout
[   112.346] (II) LoadModule: "modesetting"
[   112.347] (II) Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so
[   112.360] (II) Module modesetting: vendor="X.Org Foundation"
[   112.360] 	compiled for 1.20.4, module version = 1.20.4
[   112.360] 	Module class: X.Org Video Driver
[   112.360] 	ABI class: X.Org Video Driver, version 24.0
[   112.360] (II) LoadModule: "fbdev"
[   112.360] (II) Loading /usr/lib64/xorg/modules/drivers/fbdev_drv.so
[   112.365] (II) Module fbdev: vendor="X.Org Foundation"
[   112.365] 	compiled for 1.20.0, module version = 0.5.0
[   112.365] 	Module class: X.Org Video Driver
[   112.365] 	ABI class: X.Org Video Driver, version 24.0
[   112.365] (II) LoadModule: "vesa"
[   112.365] (II) Loading /usr/lib64/xorg/modules/drivers/vesa_drv.so
[   112.451] (II) Module vesa: vendor="X.Org Foundation"
[   112.451] 	compiled for 1.20.0, module version = 2.4.0
[   112.451] 	Module class: X.Org Video Driver
[   112.451] 	ABI class: X.Org Video Driver, version 24.0
[   112.451] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[   112.451] (II) FBDEV: driver for framebuffer: fbdev
[   112.451] (II) VESA: driver for VESA chipsets: vesa
[   112.451] (++) using VT number 1

[   112.460] (II) modeset(0): using drv /dev/dri/card0
[   112.460] (WW) Falling back to old probe method for fbdev
[   112.460] (II) Loading sub module "fbdevhw"
[   112.460] (II) LoadModule: "fbdevhw"
[   112.460] (II) Loading /usr/lib64/xorg/modules/libfbdevhw.so
[   112.467] (II) Module fbdevhw: vendor="X.Org Foundation"
[   112.467] 	compiled for 1.20.4, module version = 0.0.2
[   112.467] 	ABI class: X.Org Video Driver, version 24.0
[   112.509] (II) modeset(0): Creating default Display subsection in Screen section
	"Default Screen Section" for depth/fbbpp 24/32
[   112.509] (==) modeset(0): Depth 24, (==) framebuffer bpp 32
[   112.509] (==) modeset(0): RGB weight 888
[   112.509] (==) modeset(0): Default visual is TrueColor
[   112.509] (II) Loading sub module "glamoregl"
[   112.509] (II) LoadModule: "glamoregl"
[   112.509] (II) Loading /usr/lib64/xorg/modules/libglamoregl.so
[   112.558] (II) Module glamoregl: vendor="X.Org Foundation"
[   112.558] 	compiled for 1.20.4, module version = 1.0.1
[   112.558] 	ABI class: X.Org ANSI C Emulation, version 0.4
[   113.377] (II) modeset(0): glamor X acceleration enabled on NVE4
[   113.377] (II) modeset(0): glamor initialized
[   113.418] (II) modeset(0): Output VGA-1 has no monitor section
[   113.450] (II) modeset(0): Output DVI-I-1 has no monitor section
[   113.482] (II) modeset(0): Output DVI-I-2 has no monitor section
[   113.514] (II) modeset(0): Output DVI-D-1 has no monitor section
[   113.546] (II) modeset(0): Output DVI-D-2 has no monitor section
[   113.587] (II) modeset(0): EDID for output VGA-1
[   113.619] (II) modeset(0): EDID for output DVI-I-1
[   113.651] (II) modeset(0): EDID for output DVI-I-2
[   113.683] (II) modeset(0): EDID for output DVI-D-1
[   113.715] (II) modeset(0): EDID for output DVI-D-2
[   113.715] (II) modeset(0): Output VGA-1 disconnected
[   113.715] (II) modeset(0): Output DVI-I-1 disconnected
[   113.715] (II) modeset(0): Output DVI-I-2 disconnected
[   113.715] (II) modeset(0): Output DVI-D-1 disconnected
[   113.715] (II) modeset(0): Output DVI-D-2 disconnected
[   113.715] (WW) modeset(0): No outputs definitely connected, trying again...
[   113.715] (II) modeset(0): Output VGA-1 disconnected
[   113.715] (II) modeset(0): Output DVI-I-1 disconnected
[   113.715] (II) modeset(0): Output DVI-I-2 disconnected
[   113.715] (II) modeset(0): Output DVI-D-1 disconnected
[   113.715] (II) modeset(0): Output DVI-D-2 disconnected
[   113.715] (WW) modeset(0): Unable to find connected outputs - setting 1024x768 initial framebuffer
[   113.715] (==) modeset(0): Using gamma correction (1.0, 1.0, 1.0)
[   113.715] (==) modeset(0): DPI set to (96, 96)
[   113.715] (II) Loading sub module "fb"
[   113.715] (II) LoadModule: "fb"
[   113.715] (II) Loading /usr/lib64/xorg/modules/libfb.so
[   113.726] (II) Module fb: vendor="X.Org Foundation"
[   113.726] 	compiled for 1.20.4, module version = 1.0.0
[   113.726] 	ABI class: X.Org ANSI C Emulation, version 0.4
[   113.726] (II) UnloadModule: "fbdev"
[   113.726] (II) Unloading fbdev
[   113.726] (II) UnloadSubModule: "fbdevhw"
[   113.726] (II) Unloading fbdevhw
[   113.726] (II) UnloadModule: "vesa"
[   113.726] (II) Unloading vesa
[   113.762] (==) modeset(0): Backing store enabled
[   113.762] (==) modeset(0): Silken mouse enabled
[   113.763] (II) modeset(0): Initializing kms color map for depth 24, 8 bpc.
[   113.763] (==) modeset(0): DPMS enabled
[   113.770] (II) modeset(0): [DRI2] Setup complete
[   113.770] (II) modeset(0): [DRI2]   DRI driver: nouveau
[   113.770] (II) modeset(0): [DRI2]   VDPAU driver: nouveau
[   113.770] (II) Initializing extension Generic Event Extension
[   113.770] (II) Initializing extension SHAPE
[   113.770] (II) Initializing extension MIT-SHM
[   113.770] (II) Initializing extension XInputExtension
[   113.770] (II) Initializing extension XTEST
[   113.770] (II) Initializing extension BIG-REQUESTS
[   113.771] (II) Initializing extension SYNC
[   113.771] (II) Initializing extension XKEYBOARD
[   113.771] (II) Initializing extension XC-MISC
[   113.771] (II) Initializing extension SECURITY
[   113.771] (II) Initializing extension XFIXES
[   113.771] (II) Initializing extension RENDER
[   113.772] (II) Initializing extension RANDR
[   113.772] (II) Initializing extension COMPOSITE
[   113.772] (II) Initializing extension DAMAGE
[   113.772] (II) Initializing extension MIT-SCREEN-SAVER
[   113.772] (II) Initializing extension DOUBLE-BUFFER
[   113.772] (II) Initializing extension RECORD
[   113.773] (II) Initializing extension DPMS
[   113.773] (II) Initializing extension Present
[   113.773] (II) Initializing extension DRI3
[   113.773] (II) Initializing extension X-Resource
[   113.773] (II) Initializing extension XVideo
[   113.773] (II) Initializing extension XVideo-MotionCompensation
[   113.773] (II) Initializing extension SELinux
[   113.774] (II) SELinux: Disabled by boolean
[   113.774] (II) Initializing extension GLX
[   113.777] (II) AIGLX: Loaded and initialized nouveau
[   113.777] (II) GLX: Initialized DRI2 GL provider for screen 0
[   113.777] (II) Initializing extension XFree86-VidModeExtension
[   113.777] (II) Initializing extension XFree86-DGA
[   113.777] (II) Initializing extension XFree86-DRI
[   113.777] (II) Initializing extension DRI2
[   113.779] (II) modeset(0): Damage tracking initialized
[   114.110] (II) config/udev: Adding input device AT Translated Set 2 keyboard (/dev/input/event1)
[   114.110] (**) AT Translated Set 2 keyboard: Applying InputClass "evdev keyboard catchall"
[   114.110] (**) AT Translated Set 2 keyboard: Applying InputClass "system-keyboard"
[   114.110] (II) LoadModule: "evdev"
[   114.110] (II) Loading /usr/lib64/xorg/modules/input/evdev_drv.so
[   114.146] (II) Module evdev: vendor="X.Org Foundation"
[   114.146] 	compiled for 1.19.5, module version = 2.10.6
[   114.146] 	Module class: X.Org XInput Driver
[   114.146] 	ABI class: X.Org XInput driver, version 24.1
[   114.146] (II) Using input driver 'evdev' for 'AT Translated Set 2 keyboard'
[   114.146] (**) AT Translated Set 2 keyboard: always reports core events
[   114.146] (**) evdev: AT Translated Set 2 keyboard: Device: "/dev/input/event1"
[   114.146] (--) evdev: AT Translated Set 2 keyboard: Vendor 0x1 Product 0x1
[   114.146] (--) evdev: AT Translated Set 2 keyboard: Found keys
[   114.146] (II) evdev: AT Translated Set 2 keyboard: Configuring as keyboard
[   114.146] (**) Option "config_info" "udev:/sys/devices/LNXSYSTM:00/device:00/ACPI0004:00/VMBUS:00/d34b2567-b9b6-42b9-8778-0a4ec0b955bf/serio0/input/input1/event1"
[   114.146] (II) XINPUT: Adding extended input device "AT Translated Set 2 keyboard" (type: KEYBOARD, id 6)
[   114.146] (**) Option "xkb_rules" "evdev"
[   114.146] (**) Option "xkb_layout" "us"
[   114.146] (II) config/udev: Adding input device PC Speaker (/dev/input/event2)
[   114.146] (II) No input driver specified, ignoring this device.
[   114.146] (II) This device may have been added with another device file.
[   114.147] (II) config/udev: Adding input device Microsoft Vmbus HID-compliant Mouse (/dev/input/event0)
[   114.147] (**) Microsoft Vmbus HID-compliant Mouse: Applying InputClass "evdev pointer catchall"
[   114.147] (II) Using input driver 'evdev' for 'Microsoft Vmbus HID-compliant Mouse'
[   114.147] (**) Microsoft Vmbus HID-compliant Mouse: always reports core events
[   114.147] (**) evdev: Microsoft Vmbus HID-compliant Mouse: Device: "/dev/input/event0"
[   114.147] (--) evdev: Microsoft Vmbus HID-compliant Mouse: Vendor 0x45e Product 0x621
[   114.147] (--) evdev: Microsoft Vmbus HID-compliant Mouse: Found 9 mouse buttons
[   114.147] (--) evdev: Microsoft Vmbus HID-compliant Mouse: Found scroll wheel(s)
[   114.147] (--) evdev: Microsoft Vmbus HID-compliant Mouse: Found relative axes
[   114.147] (--) evdev: Microsoft Vmbus HID-compliant Mouse: Found absolute axes
[   114.147] (--) evdev: Microsoft Vmbus HID-compliant Mouse: Found x and y absolute axes
[   114.147] (--) evdev: Microsoft Vmbus HID-compliant Mouse: Found absolute touchscreen
[   114.147] (II) evdev: Microsoft Vmbus HID-compliant Mouse: Configuring as touchscreen
[   114.147] (II) evdev: Microsoft Vmbus HID-compliant Mouse: Adding scrollwheel support
[   114.147] (**) evdev: Microsoft Vmbus HID-compliant Mouse: YAxisMapping: buttons 4 and 5
[   114.147] (**) evdev: Microsoft Vmbus HID-compliant Mouse: EmulateWheelButton: 4, EmulateWheelInertia: 10, EmulateWheelTimeout: 200
[   114.147] (**) Option "config_info" "udev:/sys/devices/virtual/input/input0/event0"
[   114.147] (II) XINPUT: Adding extended input device "Microsoft Vmbus HID-compliant Mouse" (type: TOUCHSCREEN, id 7)
[   114.147] (WW) evdev: Microsoft Vmbus HID-compliant Mouse: touchpads, tablets and touchscreens ignore relative axes.
[   114.147] (II) evdev: Microsoft Vmbus HID-compliant Mouse: initialized for absolute axes.
[   114.147] (**) Microsoft Vmbus HID-compliant Mouse: (accel) keeping acceleration scheme 1
[   114.147] (**) Microsoft Vmbus HID-compliant Mouse: (accel) acceleration profile 0
[   114.147] (**) Microsoft Vmbus HID-compliant Mouse: (accel) acceleration factor: 2.000
[   114.147] (**) Microsoft Vmbus HID-compliant Mouse: (accel) acceleration threshold: 4
[   114.148] (II) config/udev: Adding input device Microsoft Vmbus HID-compliant Mouse (/dev/input/js0)
[   114.148] (II) No input driver specified, ignoring this device.
[   114.148] (II) This device may have been added with another device file.
[   114.148] (II) config/udev: Adding input device Microsoft Vmbus HID-compliant Mouse (/dev/input/mouse0)
[   114.148] (II) No input driver specified, ignoring this device.
[   114.148] (II) This device may have been added with another device file.

==============================================================================================
The following steps reproduce the errors below:
1. Remove the GPU card and run startx.
2. Add the GPU card to the VM.
3. Errors are logged and the console returns to the terminal.

Error log after adding the GPU card to the VM while the GUI is running:

[   188.830] (II) config/udev: removing GPU device /sys/devices/LNXSYSTM:00/device:00/ACPI0004:00/VMBUS:00/70d46452-5098-45a8-b529-c3ef52ddfc59/pcib529:00/b529:00:00.0/drm/card0 /dev/dri/card0
[   188.830] (II) config/udev: Adding drm device (/dev/dri/card0)
[   188.830] (II) xfree86: Adding drm device (/dev/dri/card0)
[   188.839] (II) LoadModule: "modesetting"
[   188.840] (II) Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so
[   188.840] (II) Module modesetting: vendor="X.Org Foundation"
[   188.840] 	compiled for 1.20.4, module version = 1.20.4
[   188.840] 	Module class: X.Org Video Driver
[   188.840] 	ABI class: X.Org Video Driver, version 24.0
[   188.858] (II) modeset(G0): using drv /dev/dri/card0
[   188.904] (II) modeset(G0): Creating default Display subsection in Screen section
	"Default Screen Section" for depth/fbbpp 24/32
[   188.904] (==) modeset(G0): Depth 24, (==) framebuffer bpp 32
[   188.904] (==) modeset(G0): RGB weight 888
[   188.904] (==) modeset(G0): Default visual is TrueColor
[   188.905] (II) Loading sub module "glamoregl"
[   188.905] (II) LoadModule: "glamoregl"
[   188.905] (II) Loading /usr/lib64/xorg/modules/libglamoregl.so
[   188.917] (II) Module glamoregl: vendor="X.Org Foundation"
[   188.917] 	compiled for 1.20.4, module version = 1.0.1
[   188.917] 	ABI class: X.Org ANSI C Emulation, version 0.4
[   189.186] (II) modeset(G0): glamor X acceleration enabled on NVE4
[   189.186] (II) modeset(G0): glamor initialized
[   189.227] (II) modeset(G0): Output VGA-1-1 has no monitor section
[   189.259] (II) modeset(G0): Output DVI-I-1-1 has no monitor section
[   189.291] (II) modeset(G0): Output DVI-I-1-2 has no monitor section
[   189.324] (II) modeset(G0): Output DVI-D-1-1 has no monitor section
[   189.356] (II) modeset(G0): Output DVI-D-1-2 has no monitor section
[   189.525] (==) modeset(G0): Using gamma correction (1.0, 1.0, 1.0)
[   189.525] (==) modeset(G0): DPI set to (96, 96)
[   189.525] (II) Loading sub module "fb"
[   189.525] (II) LoadModule: "fb"
[   189.525] (II) Loading /usr/lib64/xorg/modules/libfb.so
[   189.525] (II) Module fb: vendor="X.Org Foundation"
[   189.525] 	compiled for 1.20.4, module version = 1.0.0
[   189.525] 	ABI class: X.Org ANSI C Emulation, version 0.4
[   189.526] (EE) 
[   189.526] (EE) Backtrace:
[   189.526] (EE) 0: /usr/bin/X (xorg_backtrace+0x55) [0x555c1f3b8465]
[   189.526] (EE) 1: /usr/bin/X (0x555c1f207000+0x1b50e9) [0x555c1f3bc0e9]
[   189.526] (EE) 2: /lib64/libpthread.so.0 (0x7fa40bdac000+0xf5d0) [0x7fa40bdbb5d0]
[   189.526] (EE) 3: /lib64/libc.so.6 (gsignal+0x37) [0x7fa40ba15207]
[   189.526] (EE) 4: /lib64/libc.so.6 (abort+0x148) [0x7fa40ba168f8]
[   189.526] (EE) 5: /lib64/libc.so.6 (0x7fa40b9df000+0x2f026) [0x7fa40ba0e026]
[   189.526] (EE) 6: /lib64/libc.so.6 (0x7fa40b9df000+0x2f0d2) [0x7fa40ba0e0d2]
[   189.526] (EE) 7: /usr/bin/X (dixRegisterPrivateKey+0x1cf) [0x555c1f28155f]
[   189.526] (EE) 8: /usr/lib64/xorg/modules/libglamoregl.so (glamor_init+0x17c) [0x7fa3fb74e01c]
[   189.526] (EE) 9: /usr/lib64/xorg/modules/drivers/modesetting_drv.so (0x7fa3fb976000+0x11340) [0x7fa3fb987340]
[   189.526] (EE) 10: /usr/lib64/xorg/modules/drivers/modesetting_drv.so (0x7fa3fb976000+0x8cfc) [0x7fa3fb97ecfc]
[   189.526] (EE) 11: /usr/bin/X (AddGPUScreen+0x7d) [0x555c1f26383d]
[   189.526] (EE) 12: /usr/bin/X (0x555c1f207000+0xb64d0) [0x555c1f2bd4d0]
[   189.526] (EE) 13: /usr/bin/X (0x555c1f207000+0xbb8f1) [0x555c1f2c28f1]
[   189.526] (EE) 14: /usr/bin/X (0x555c1f207000+0xb7c45) [0x555c1f2bec45]
[   189.526] (EE) 15: /usr/bin/X (0x555c1f207000+0xb84ce) [0x555c1f2bf4ce]
[   189.526] (EE) 16: /usr/bin/X (0x555c1f207000+0x1b5c02) [0x555c1f3bcc02]
[   189.526] (EE) 17: /usr/bin/X (WaitForSomething+0x1bb) [0x555c1f3b5deb]
[   189.526] (EE) 18: /usr/bin/X (0x555c1f207000+0x5c281) [0x555c1f263281]
[   189.526] (EE) 19: /usr/bin/X (0x555c1f207000+0x6049a) [0x555c1f26749a]
[   189.526] (EE) 20: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7fa40ba013d5]
[   189.527] (EE) 21: /usr/bin/X (0x555c1f207000+0x4a58e) [0x555c1f25158e]
[   189.527] (EE) 
[   189.527] (EE) 
Fatal server error:
[   189.527] (EE) Caught signal 6 (Aborted). Server aborting
[   189.527] (EE) 
[   189.527] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[   189.527] (EE) Please also check the log file at "/var/log/Xorg.1.log" for additional information.
[   189.527] (EE) 
[   189.530] (EE) Server terminated with error (1). Closing log file.

Comment 19 HuijingHei 2019-05-15 07:27:39 UTC
(In reply to HuijingHei from comment #18)
> (In reply to Adam Jackson from comment #15)
> > Please try this test build (against 7.6):
> > 
> > https://people.redhat.com/ajackson/1665433/
> 
> Hi, I have tried this build, after startx, the terminal freeze but the os
> works, can not display GUI after pass through GPU card to rhel7.6 guest,
> check there is no error logs in /var/log/Xorg.0.log. Is there anything I
> missed when display with GPU? Additional info, when I remove the GPU and
> startx successfully, then add gpu, get error logs and console return back to
> terminal, add the logs and hope can be helpful 
> 
  
I also tried RHEL 7.5, and the result is the same as with the test build (xorg-x11-server-Xorg-1.20.4-7.el7_6.x86_64). Perhaps it is expected that the GUI cannot be displayed with the GPU card passed through. We can wait for the test results from the vendor. Thanks!

Comment 20 richard.weedon.ctr 2019-05-17 18:44:06 UTC
Seeing this issue in Azure Government using the marketplace 7.6 image on NV6 hardware.  GUI access is impossible while GPU is enabled.

Comment 21 Michael A. Milazzo 2019-05-17 22:31:07 UTC
A similar issue was posted to GitHub for CentOS (https://github.com/MicrosoftDocs/azure-docs/issues/26014).  We're definitely experiencing this in Azure CentOS/RHEL 7.6 images.

Comment 22 Yuxin Sun 2019-05-20 10:42:34 UTC
Hi Michael A. Milazzo,

Could you please share how to use the GUI in Azure? I can reproduce this issue in Azure, but I don't know exactly how to use the GUI there. Is it via "ssh -X", or some other GUI tool? Thank you so much!

Comment 23 richard.weedon.ctr 2019-05-20 15:28:23 UTC
(In reply to yuxisun@redhat.com from comment #22)
> Hi Michael A. Milazzo,
> 
> Could you please share how to use GUI in Azure? I can reproduce this issue
> in Azure but I don't know exactly how use GUI in Azure. Is that "ssh -X"? Or
> any GUI tools? Thank you so much!

Yuxisun,

Milazzo is working with us as well as doing his own testing. We have tried every method we can think of: SSH to X, TigerVNC, and xRDP. They all rely on startx (xorg-x11 1.20), which will not start with the NVIDIA GPU enabled.

Comment 24 Michael A. Milazzo 2019-05-20 15:30:47 UTC
Running startx in an SSH session will get you the segfault. As Richard mentioned, trying to initiate an xRDP session hangs on a blue screen because the X server cannot be contacted. The errors we see in the logs are the same whether you run startx or try to access the GUI through xRDP.

Comment 25 Yuxin Sun 2019-05-21 06:51:35 UTC
Thank you so much Richard and Michael!

I've tried with xRDP. If I select the Xvnc session it works well, but if I select the Xorg session it hangs on a blue screen. After installing the test build from https://people.redhat.com/ajackson/1665433/:
# rpm -ivh xorg-x11-server-common-1.20.4-7.el7_6.x86_64.rpm xorg-x11-server-Xorg-1.20.4-7.el7_6.x86_64.rpm --force
opening xRDP with the Xorg session works well.

Comment 26 HuijingHei 2019-05-21 08:41:48 UTC
Thanks for Yuxin's help! I also tried RHEL 7.6 with a GPU card on a Hyper-V 2016 host; the result is the same as in comment #25.

Comment 27 John Jarvis 2019-05-23 17:28:18 UTC
Microsoft and the IHV have tested the proposed fix and confirmed it fixes the issues:
======================
 it does solve the xorg issue that the IHV was seeing.

Comment 28 Rick Barry 2019-05-28 14:29:18 UTC
John, the xorg-x11-server team may need this bug approved as a blocker in order to
get this into RHEL 7.7 at this point.

Can you (or Jack) provide the business justification to support getting the blocker
approved?

Comment 29 John Jarvis 2019-05-28 14:36:08 UTC
Business justification is provided in https://bugzilla.redhat.com/show_bug.cgi?id=1665433#c7
====================
Business Justification: This is an issue that is blocking a vendor as they perform bring up of new hardware. This work will enable future scenarios in Azure that RedHat 7.6 will be unable to leverage without a fix.

Requesting a bump to urgent and flagging for z-stream review.

Comment 33 HuijingHei 2019-06-13 06:25:08 UTC
Passed through an NVIDIA GRID K2 card to a RHEL 7.7 VM; running startx no longer produces a segmentation fault. Changing status to VERIFIED.

Host: windows server 2016
Guest: gen2, xorg-x11-server-Xorg-1.20.4-7.el7.x86_64

Comment 34 Rick Barry 2019-06-25 13:45:09 UTC
Hi Adam,

Do you think a rhel-7.7 documentation/release-note is needed to describe any known issue related to the remaining part of the fix (7.7.z bug 1722524)?

Comment 36 errata-xmlrpc 2019-08-06 12:42:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2079

