Bug 1227193 - Segfault in vmwgfx_dri.so
Summary: Segfault in vmwgfx_dri.so
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 22
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1222312 1260157
Depends On:
Blocks:
Reported: 2015-06-02 06:24 UTC by Sebastian Bergmann
Modified: 2016-02-15 15:36 UTC (History)

Fixed In Version: 4.1.6-201.fc22
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-15 15:36:22 UTC
Type: Bug
Embargoed:


Attachments
Rework device initialization. Prerequisite for the next patch (27.25 KB, patch)
2015-08-27 11:40 UTC, Thomas Hellström
Allow render node functionality for dropped masters (1.94 KB, patch)
2015-08-27 11:44 UTC, Thomas Hellström
New version of the patch to enable render functionality for dropped masters (2.00 KB, patch)
2015-08-27 16:58 UTC, Thomas Hellström


Links
System ID Private Priority Status Summary Last Updated
GNOME Bugzilla 753531 0 Normal RESOLVED GNOME shell crashing on VMware 2020-02-11 07:00:18 UTC
Red Hat Bugzilla 1242458 0 unspecified CLOSED [abrt] WARNING: CPU: 3 PID: 660 at kernel/sched/core.c:7389 __might_sleep+0x7d/0x90() 2021-02-22 00:41:40 UTC

Internal Links: 1242458

Description Sebastian Bergmann 2015-06-02 06:24:22 UTC
Fresh install of Fedora 22 in VMWare Workstation 11.1 (Windows 7 host). About every ten minutes gnome-shell crashes:

[  171.763807] [drm:vmw_generic_ioctl [vmwgfx]] *ERROR* Dropped master trying to access ioctl that requires authentication.
[  171.763812] [drm] IOCTL ERROR Command 65, Error -13.
[  171.763878] [drm:vmw_generic_ioctl [vmwgfx]] *ERROR* Dropped master trying to access ioctl that requires authentication.
[  171.763880] [drm] IOCTL ERROR Command 65, Error -13.
[  171.763897] [drm:vmw_generic_ioctl [vmwgfx]] *ERROR* Dropped master trying to access ioctl that requires authentication.
[  171.763901] [drm] IOCTL ERROR Command 87, Error -13.
[  171.763906] show_signal_msg: 210 callbacks suppressed
[  171.763908] gnome-shell[1003]: segfault at 20 ip 00007f2a8b1aa130 sp 00007ffc468b9268 error 4 in vmwgfx_dri.so[7f2a8ac03000+7b7000]
[  174.085165] hrtimer: interrupt took 3471920 ns
[  174.159730] gdm[707]: segfault at 8 ip 00007fcdaf421a20 sp 00007fffa3a923b0 error 4 in gdm[7fcdaf406000+5c000]
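
For anyone reading the log above: the kernel reports error codes negated, so "Error -13" is errno 13 (EACCES, "Permission denied"). A minimal sketch for decoding such lines follows. The helper name decode_ioctl_errors is made up for illustration, and the script only interprets the errno value, not the ioctl command numbers.

```python
import errno
import os
import re

# Matches kernel lines such as "[drm] IOCTL ERROR Command 65, Error -13."
IOCTL_ERROR = re.compile(r"IOCTL ERROR Command (\d+), Error (-?\d+)")


def decode_ioctl_errors(log_text):
    """Return (command, errno_name, message) for each IOCTL ERROR line."""
    decoded = []
    for match in IOCTL_ERROR.finditer(log_text):
        command = int(match.group(1))
        code = abs(int(match.group(2)))  # the kernel prints errno negated
        name = errno.errorcode.get(code, "UNKNOWN")
        decoded.append((command, name, os.strerror(code)))
    return decoded


# Two lines copied from the dmesg output in the description.
sample = """\
[  171.763812] [drm] IOCTL ERROR Command 65, Error -13.
[  171.763901] [drm] IOCTL ERROR Command 87, Error -13.
"""

for command, name, message in decode_ioctl_errors(sample):
    print(command, name, message)
```

This is consistent with the "Dropped master trying to access ioctl that requires authentication" message that precedes each error.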

Comment 1 Thomas Hellström 2015-06-22 14:52:48 UTC
From a quick look, it seems that a process that had previously been DRM master dropped master and now tries to render without reclaiming mastership. (Is this gdm on mutter/wayland?)

That's a security problem and the vmwgfx driver catches that.
Unfortunately the dri driver doesn't seem to cope well with the error.

/Thomas

Comment 2 Sebastian Bergmann 2015-06-22 15:13:10 UTC
Default install of Fedora 22 (GNOME desktop).

Comment 3 Jimmy Jones 2015-06-23 18:43:01 UTC
I've been seeing this for a while, in bug 1222312 - looks as if I have similar errors in the log.

Comment 4 Jimmy Jones 2015-06-23 19:04:00 UTC
*** Bug 1222312 has been marked as a duplicate of this bug. ***

Comment 5 Jimmy Jones 2015-06-23 19:08:25 UTC
May 17 15:01:18 localhost.localdomain gnome-session[987]: VMware: vmw_ioctl_command error Permission denied.
May 17 15:01:18 localhost.localdomain kernel: [drm:vmw_generic_ioctl [vmwgfx]] *ERROR* Dropped master trying to access ioctl that requires authentication.
May 17 15:01:18 localhost.localdomain kernel: [drm] IOCTL ERROR Command 73, Error -13.
May 17 15:01:18 localhost.localdomain kernel: [drm:vmw_generic_ioctl [vmwgfx]] *ERROR* Dropped master trying to access ioctl that requires authentication.
May 17 15:01:18 localhost.localdomain kernel: [drm] IOCTL ERROR Command 76, Error -13.
May 17 15:01:18 localhost.localdomain gnome-session[987]: (gnome-shell:1015): Cogl-ERROR **: Out of memory
May 17 15:01:18 localhost.localdomain audit[1015]: <audit-1701> auid=4294967295 uid=42 gid=42 ses=4294967295 subj=system_u:system_r:xdm_t:s0-s0:c0.c1023 pid=1015 comm="gnome-shell" exe="/usr/bin/gnome-shell" sig=5
May 17 15:01:18 localhost.localdomain kernel: [drm:vmw_generic_ioctl [vmwgfx]] *ERROR* Dropped master trying to access ioctl that requires authentication.
May 17 15:01:18 localhost.localdomain kernel: [drm] IOCTL ERROR Command 73, Error -13.
May 17 15:01:18 localhost.localdomain kernel: do_trap: 15 callbacks suppressed
May 17 15:01:18 localhost.localdomain kernel: traps: gnome-shell[1015] trap int3 ip:7f2ae0a2ed3b sp:7fffe603e050 error:0
May 17 15:01:19 localhost.localdomain polkitd[700]: Unregistered Authentication Agent for unix-session:c1 (system bus name :1.16, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_GB.UTF-8) (disconnected from bus)
May 17 15:01:19 localhost.localdomain gnome-session[987]: Unrecoverable failure in required component gnome-shell-wayland.desktop
May 17 15:01:19 localhost.localdomain /usr/libexec/gdm-wayland-session[978]: Activating service name='ca.desrt.dconf'

Comment 6 Jimmy Jones 2015-06-23 19:10:13 UTC
If I remember rightly, GDM was using wayland, and I was in an X11 GNOME session.

Comment 7 Thomas Hellström 2015-06-23 19:27:59 UTC
While I think the main problem lies in Gnome-Shell (mutter/wayland), you could perhaps try the following to hopefully work around the problem:

https://fedoraproject.org/wiki/Documentation_Desktop_Beat

Comment 8 Sinclair Yeh 2015-06-30 21:50:00 UTC
I installed F22 and then did a "dnf upgrade", logged into the Gnome Classic session (vs Gnome Wayland).  So far I have not been able to reproduce this.  Are there any other steps I should be doing?

Comment 9 Sebastian Bergmann 2015-07-01 04:48:39 UTC
(In reply to Thomas Hellström from comment #7)
> While I think the main problem lies in Gnome-Shell (mutter/wayland), you
> could perhaps try the following to hopefully work around the problem:
> 
> https://fedoraproject.org/wiki/Documentation_Desktop_Beat

The workaround seems to work; the problem is gone since I disabled Wayland for GDM.
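
The thread does not spell out the exact steps. One common way to disable Wayland for GDM (an assumption here, not taken from the linked page) is to set WaylandEnable=false in the [daemon] section of /etc/gdm/custom.conf and then reboot. A minimal sketch of that edit, operating on the file's text only; the function name disable_gdm_wayland is hypothetical:

```python
def disable_gdm_wayland(conf_text):
    """Return conf_text with WaylandEnable=false set in the [daemon] section."""
    out = []
    in_daemon = False
    done = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [daemon] without having seen the key: add it first.
            if in_daemon and not done:
                out.append("WaylandEnable=false")
                done = True
            in_daemon = stripped == "[daemon]"
        elif in_daemon and stripped.lstrip("#").strip().startswith("WaylandEnable"):
            if done:
                continue  # drop duplicate settings of the same key
            # Replace an existing or commented-out setting.
            line = "WaylandEnable=false"
            done = True
        out.append(line)
    if not done:
        if not in_daemon:
            out.append("[daemon]")
        out.append("WaylandEnable=false")
    return "\n".join(out) + "\n"


# Example input resembling a stock file with the key commented out.
sample = "[daemon]\n#WaylandEnable=false\n\n[security]\n"
print(disable_gdm_wayland(sample))
```

Writing the result back to /etc/gdm/custom.conf requires root, and keeping a backup of the original file is advisable.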

Comment 10 Sebastian Bergmann 2015-07-01 04:49:50 UTC
(In reply to Sinclair Yeh from comment #8)
> I installed F22 and then did a "dnf upgrade", logged into the Gnome Classic
> session (vs Gnome Wayland).  So far I have not been able to reproduce this. 
> Are there any other steps I should be doing?

Interesting, because that is how I got the errors.

Comment 11 Sinclair Yeh 2015-07-06 22:17:29 UTC
Hmm... I still cannot reproduce this issue.  Can you provide the following info?

1.  What is the kernel version in the VM?
2.  What is the host, e.g. Windows, Mac, or Linux?
3.  What is the Workstation (or Fusion) version?

Comment 12 Jimmy Jones 2015-07-07 20:25:35 UTC
Running VMware Player 6.0.7 on Windows 7, installed Fedora 22 from the Live image, rebooted, ran dnf update. Left the machine to it; after ~15 mins I came back and it had happened. Then rebooted with fresh updates, did dnf install elasticsearch from a terminal and loaded firefox, and the same thing happened again.

Did this just now, and the log errors are the same as above. Xwayland is also spinning at 100% CPU, like in my linked bug report.

Comment 13 Sebastian Bergmann 2015-07-11 07:53:11 UTC
(In reply to Sinclair Yeh from comment #11)
> 1.  What is the kernel version in the VM?

$ uname -a
Linux localhost.localdomain 4.0.7-300.fc22.x86_64 #1 SMP Mon Jun 29 22:15:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

> 2.  What is the host, e.g. Windows, Mac, or Linux?

Windows 7 Ultimate, 64-bit 6.1.7601, Service Pack 1

> 3.  What is the Workstation (or Fusion) version?

VMware Workstation 11.1.2 build-2780323

Comment 14 Christopher Meng 2015-07-13 12:33:11 UTC
Just rebooted from a crashed VM with the same symptom:

------------------------dmesg--------------------------

[   16.851764] CPU: 2 PID: 644 Comm: vmtoolsd Not tainted 4.2.0-0.rc1.git3.1.fc23.x86_64 #1
[   16.851765] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[   16.851766]  0000000000000000 00000000d0527beb ffff8800695dfac8 ffffffff818641f5
[   16.851768]  0000000000000000 ffff8800695dfb20 ffff8800695dfb08 ffffffff810ab446
[   16.851770]  ffff8800679ebe80 ffffffff81c63bc0 0000000000000061 0000000000000000
[   16.851772] Call Trace:
[   16.851775]  [<ffffffff818641f5>] dump_stack+0x4c/0x65
[   16.851778]  [<ffffffff810ab446>] warn_slowpath_common+0x86/0xc0
[   16.851779]  [<ffffffff810ab4d5>] warn_slowpath_fmt+0x55/0x70
[   16.851782]  [<ffffffff8112551d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[   16.851783]  [<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
[   16.851784]  [<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
[   16.851786]  [<ffffffff810da2bd>] __might_sleep+0x7d/0x90
[   16.851788]  [<ffffffff812163b3>] __might_fault+0x43/0xa0
[   16.851791]  [<ffffffff81430477>] copy_from_iter+0x87/0x2a0
[   16.851795]  [<ffffffffa03a960a>] __qp_memcpy_to_queue+0x9a/0x1b0 [vmw_vmci]
[   16.851797]  [<ffffffffa03a9740>] ? qp_memcpy_to_queue+0x20/0x20 [vmw_vmci]
[   16.851799]  [<ffffffffa03a9757>] qp_memcpy_to_queue_iov+0x17/0x20 [vmw_vmci]
[   16.851801]  [<ffffffffa03a9d50>] qp_enqueue_locked+0xa0/0x140 [vmw_vmci]
[   16.851803]  [<ffffffffa03aa93f>] vmci_qpair_enquev+0x4f/0xd0 [vmw_vmci]
[   16.851805]  [<ffffffffa047d7bb>] vmci_transport_stream_enqueue+0x1b/0x20 [vmw_vsock_vmci_transport]
[   16.851807]  [<ffffffffa0473e05>] vsock_stream_sendmsg+0x2c5/0x320 [vsock]
[   16.851808]  [<ffffffff810fabd0>] ? wake_atomic_t_function+0x70/0x70
[   16.851812]  [<ffffffff81702af8>] sock_sendmsg+0x38/0x50
[   16.851813]  [<ffffffff81702ff4>] SYSC_sendto+0x104/0x190
[   16.851816]  [<ffffffff8126e25a>] ? vfs_read+0x8a/0x140
[   16.851818]  [<ffffffff817042ee>] SyS_sendto+0xe/0x10
[   16.851820]  [<ffffffff8186d9ae>] entry_SYSCALL_64_fastpath+0x12/0x76
[   16.851821] ---[ end trace 26e121d00ee25d27 ]---

On:

Windows 10 10166 + VMWare 11.1.2-2780323 + Fedora 23 Rawhide (this still happens on Fedora 22! It has interrupted upgrading 3 times)

Comment 15 Gaith Taha 2015-08-14 10:18:59 UTC
I can reproduce this within 15 minutes every time I restart fc22 on
VMware workstation 11.1.2 build-2780323
VMplayer 7.1.2 build-2780323
Running under Windows 8.1 

[root@gt log]# uname -a
Linux localhost 4.1.4-200.fc22.x86_64 #1 SMP Tue Aug 4 03:22:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


The machine is fully up-to-date according to 'dnf update'

I have also seen Xwayland stuck at 100%; but not sure if this is the direct cause for every crash I have seen. 

Is there any log file you'd like me to include?

Comment 16 Jeff Kayser 2015-08-15 15:08:50 UTC
Hi, all.

I updated to kernel 4.1.4 yesterday, and the Gnome GUI has been stable since, knock on wood.

Comment 17 Jeff Kayser 2015-08-17 17:21:43 UTC
Hi, all.

False alarm.  I am still getting the Gnome issue with VMware Workstation 11 with kernel 4.1.4.  It seems to be happening less frequently, though.

Sorry.

Comment 18 Brent 2015-08-19 13:05:59 UTC
I get the same behavior on OSX.
Host info:
  OSX 10.10.4
  VMWare Fusion 7.1.2

VM Info:
   Fedora 22, fresh install.
   uname -a
      --Linux localhost.localdomain 4.1.4-200.fc22.x86_64 #1 SMP Tue Aug 4 03:22:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Comment 19 tom.bamford 2015-08-21 13:46:23 UTC
Same error here.

  OSX 10.10.4
  VMWare Fusion 7.1.2

  Linux 4.1.5-200.fc22.x86_64

Comment 20 Thomas Hellström 2015-08-25 09:47:58 UTC
What happens is the following,

1) systemd-logind drops drm master on behalf of the gnome-shell --gdm process.
2) gnome-shell --gdm apparently oblivious of this tries to render:

#13 0x00007f49ce33b2af in _cogl_journal_flush_modelview_and_entries (batch_start=<optimized out>, batch_len=4, 
    data=0x7ffd21333e00) at ./cogl-journal.c:314
#14 0x00007f49ce33b064 in _cogl_journal_flush_texcoord_vbo_offsets_and_entries (batch_start=0x3c88830, batch_len=4, 
    data=0x7ffd21333e00) at ./cogl-journal.c:565
#15 0x00007f49ce33a9a5 in batch_and_call (entries=<optimized out>, n_entries=<optimized out>, 
    can_batch_callback=0x7f49ce33b4a0 <compare_entry_layer_numbers>, 
    batch_callback=0x7f49ce33afc0 <_cogl_journal_flush_texcoord_vbo_offsets_and_entries>, data=0x7ffd21333e00)
    at ./cogl-journal.c:266
#16 0x00007f49ce33aef1 in _cogl_journal_flush_vbo_offsets_and_entries (batch_start=0x3c88830, batch_len=54, data=<optimized out>)
    at ./cogl-journal.c:672
#17 0x00007f49ce33c4e2 in _cogl_journal_flush (journal=0x2739010) at ./cogl-journal.c:1399
#18 0x00007f49ce33d5bc in _cogl_framebuffer_flush_journal (framebuffer=<optimized out>) at ./cogl-framebuffer.c:636
#19 0x00007f49ce311068 in cogl_flush () at ./cogl.c:321
#20 0x00007f49ce34028a in cogl_onscreen_swap_buffers_with_damage (onscreen=0x2718960, 
    rectangles=rectangles@entry=0x7ffd21333ff0, n_rectangles=n_rectangles@entry=1) at ./cogl-onscreen.c:312
#21 0x00007f49cec094df in clutter_stage_cogl_redraw (stage_window=0x2738090) at cogl/clutter-stage-cogl.c:637
#22 0x00007f49cec78317 in clutter_stage_do_redraw (stage=0x2730f70) at clutter-stage.c:1130
#23 _clutter_stage_do_update (stage=0x2730f70) at clutter-stage.c:1186
#24 0x00007f49cec5e7d9 in master_clock_update_stages (master_clock=0x7f49a8004e80, stages=0x3d3a5a0)
    at clutter-master-clock-default.c:437
#25 clutter_clock_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>)
    at clutter-master-clock-default.c:561
#26 0x00007f49cbd32a8a in g_main_dispatch (context=0x2593610) at gmain.c:3122
#27 g_main_context_dispatch (context=context@entry=0x2593610) at gmain.c:3737
#28 0x00007f49cbd32e20 in g_main_context_iterate (context=0x2593610, block=block@entry=1, dispatch=dispatch@entry=1, 
    self=<optimized out>) at gmain.c:3808
#29 0x00007f49cbd33142 in g_main_loop_run (loop=0x2764320) at gmain.c:4002
#30 0x00007f49cf87f766 in meta_run () at core/main.c:437
#31 0x00000000004023fd in main (argc=1, argv=0x7ffd21334398) at main.c:462

Depending on the operations in this render, gnome-shell may or may not segfault.

3) The user X server is intended to take over and grabs master.


Now, about 2): if we were to allow this, we'd also allow a malicious unprivileged process to become drm master, authenticate itself, drop master and then sit around opening other processes' and users' exported buffer objects through drm_gem_open(), which is a bad security hole.

/Thomas
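
The three steps above can be sketched as a toy model. This is a hypothetical illustration of the sequence, not the vmwgfx or DRM core code; the class names, method names and the EBUSY return value are assumptions made for the example. Only the EACCES (-13) result corresponds to what the logs in this bug show.

```python
import errno


class Client:
    """Toy stand-in for a process holding the DRM device open."""

    def __init__(self, name):
        self.name = name
        self.is_master = False   # currently holds DRM master
        self.was_master = False  # has held master at some point


class ToyDevice:
    """Hypothetical model of the master check; not the driver's code."""

    def __init__(self):
        self.master = None

    def set_master(self, client):
        if self.master is not None and self.master is not client:
            return -errno.EBUSY  # error code chosen for the toy model
        self.master = client
        client.is_master = True
        client.was_master = True
        return 0

    def drop_master(self, client):
        # In this bug, systemd-logind does this on the client's behalf.
        if self.master is client:
            self.master = None
            client.is_master = False

    def render(self, client):
        # An ioctl that requires authentication: a dropped master is refused.
        if client.was_master and not client.is_master:
            return -errno.EACCES
        return 0


device = ToyDevice()
gdm_shell = Client("gnome-shell --gdm")
x_server = Client("user X server")

steps = [
    ("greeter becomes master", device.set_master(gdm_shell)),
    ("greeter renders", device.render(gdm_shell)),
]
device.drop_master(gdm_shell)  # step 1
steps.append(("greeter renders after drop", device.render(gdm_shell)))  # step 2
steps.append(("X server grabs master", device.set_master(x_server)))    # step 3

for label, result in steps:
    print(f"{label}: {result}")
```

In this model the greeter's render after the drop is the only call that fails, which matches the "Error -13" lines in the kernel log. Whether the client then crashes depends on how it handles that error.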

Comment 21 Sebastian Bergmann 2015-08-27 06:22:41 UTC
Problem persists with VMware Workstation Pro 12 (12.0.0 build-2985596) and a fresh install of Fedora 23.

Comment 22 Thomas Hellström 2015-08-27 11:40:50 UTC
Created attachment 1067691 [details]
Rework device initialization. Prerequisite for the next patch

Comment 23 Thomas Hellström 2015-08-27 11:44:11 UTC
Created attachment 1067692 [details]
Allow render node functionality for dropped masters

This patch is the patch that actually works around the problem. Will be sent to dri-devel for inclusion in 4.3.

Comment 24 Thomas Hellström 2015-08-27 11:49:35 UTC
OK, so I've added two patches that together appear to work around the problem on my side.

The first one is already queued in drm-next for linux 4.3. It's needed to block render-node buffers from entering VRAM when the master is gone, as buffers in VRAM may accidentally get cleared by the device.

The second one will be posted on dri-devel shortly, and is the one that enables render-node functionality for dropped masters.

Owen, could you help reassigning so that these patches actually make it to the fedora kernel team?


Thanks,
Thomas

Comment 25 Josh Boyer 2015-08-27 13:53:46 UTC
(In reply to Thomas Hellström from comment #24)
> OK, so I've added two patches that together appear to work around the
> problem on my side.
> 
> The first one is already queued in drm-next for linux 4.3. It's needed to
> block render-node buffers from entering VRAM when the master is gone, as
> buffers in VRAM may accidentally get cleared by the device.
> 
> The second one will be posted on dri-devel shortly, and is the one that
> enables render-node functionality for dropped masters.
> 
> Owen, could you help reassigning so that these patches actually make it to
> the fedora kernel team?

Thanks Thomas!  I can probably get these applied later today.

I do wonder if/how you plan to fix this in the upstream stable kernels though.  The second patch that works around the problem relies on the first, but that first patch seems unsuitable for stable.

josh

Comment 26 Thomas Hellström 2015-08-27 14:10:47 UTC
Hi Josh.

First a quick update, there is a forgotten return statement in patch #2, I'll send a v2 later today.

For the first patch, as you say I'm not sure we'll get it into stable. Perhaps I'll test it more extensively with 4.1.y and see if I can get it into that tree.

Thomas

Comment 27 Thomas Hellström 2015-08-27 16:58:04 UTC
Created attachment 1067837 [details]
New version of the patch to enable render functionality for dropped masters

A forgotten return statement was fixed.

Comment 28 Josh Boyer 2015-08-27 19:20:16 UTC
OK, both added to all Fedora branches.  If something changes based on upstream feedback, please let us know.

Comment 29 Sebastian Bergmann 2015-08-28 04:54:47 UTC
Good to see that progress is being made. Does "added to all Fedora branches" mean that updated packages with the fix(es) will be released soon?

Comment 30 Fedora Update System 2015-09-01 14:59:39 UTC
kernel-4.2.0-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-14782

Comment 31 Fedora Update System 2015-09-01 20:21:58 UTC
kernel-4.2.0-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-14782

Comment 32 Fedora Update System 2015-09-04 03:23:21 UTC
kernel-4.2.0-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 33 Josh Boyer 2015-09-04 20:51:01 UTC
*** Bug 1260157 has been marked as a duplicate of this bug. ***

Comment 34 Fedora Update System 2015-09-05 01:03:30 UTC
kernel-4.1.6-201.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-15130

Comment 35 Jimmy Jones 2015-09-05 14:42:47 UTC
Just tried latest F23 and looks good, no crashes. Thanks Thomas, much appreciated!

Comment 36 Fedora Update System 2015-09-06 18:52:20 UTC
kernel-4.1.6-201.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-15130

Comment 37 Fedora Update System 2015-09-11 17:21:23 UTC
kernel-4.1.6-201.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 38 Fedora Update System 2015-09-15 17:35:50 UTC
kernel-4.1.7-100.fc21 has been submitted as an update to Fedora 21. https://bodhi.fedoraproject.org/updates/FEDORA-2015-15933

Comment 39 Fedora Update System 2015-09-17 01:02:21 UTC
kernel-4.1.7-100.fc21 has been pushed to the Fedora 21 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-15933

Comment 40 Fedora Update System 2015-09-23 00:20:38 UTC
kernel-4.1.7-100.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.

