Bug 1312551 - [abrt] ardour4: _xgetbv(): ardour-4.6.387 killed by SIGILL
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: ardour4
Version: 23
Hardware: i686 Linux
Priority: unspecified
Severity: unspecified
Assigned To: Nils Philippsen
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:957ab2c752940920d19097d86d6...
Duplicates: 1289351

Reported: 2016-02-27 04:57 EST by Jérôme Audu
Modified: 2016-03-26 14:17 EDT
CC List: 5 users
Fixed In Version: ardour4-4.7.0-2.fc23 ardour4-4.7.0-2.fc22 ardour4-4.7.0-2.fc24
Doc Type: Bug Fix
Last Closed: 2016-03-17 16:51:09 EDT

Attachments
File: backtrace (58.09 KB, text/plain), 2016-02-27 04:57 EST, Jérôme Audu
File: cgroup (187 bytes, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
File: core_backtrace (1.71 KB, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
File: dso_list (11.97 KB, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
File: environ (3.25 KB, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
File: exploitable (187 bytes, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
File: limits (1.29 KB, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
File: maps (38.99 KB, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
File: mountinfo (3.13 KB, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
File: namespaces (85 bytes, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
File: open_fds (187 bytes, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
File: proc_pid_status (916 bytes, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
File: var_log_messages (272 bytes, text/plain), 2016-02-27 04:58 EST, Jérôme Audu
Replace __cpuid() function + extra debug & check (1.92 KB, patch), 2016-03-06 10:08 EST, Jérôme Audu

Description Jérôme Audu 2016-02-27 04:57:52 EST
Description of problem:
Simply starting Ardour4 on my old i686 PC (Centrino Duo) makes it exit with an illegal-instruction error, triggered by a specific ASM instruction used when checking the FPU:

 ../libs/pbd/fpu.cc
line 93:             __asm__ volatile ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (xcr));
 
If I start Ardour4 with "ardour4 --no-hw-optimizations", it starts fine.

Version-Release number of selected component:
ardour4-4.7.0-1.fc23

Additional info:
reporter:       libreport-2.6.4
backtrace_rating: 4
cmdline:        /usr/lib/ardour4/ardour-4.6.387
crash_function: _xgetbv
executable:     /usr/lib/ardour4/ardour-4.6.387
global_pid:     7259
kernel:         4.3.5-300.fc23.i686
runlevel:       N 5
type:           CCpp
uid:            1000

Truncated backtrace:
Thread no. 1 (5 frames)
 #0 _xgetbv at ../libs/pbd/fpu.cc:93
 #1 PBD::FPU::FPU at ../libs/pbd/fpu.cc:182
 #2 PBD::FPU::instance at ../libs/pbd/fpu.cc:126
 #3 setup_hardware_optimization at ../libs/ardour/globals.cc:167
 #4 ARDOUR::init at ../libs/ardour/globals.cc:505
Comment 1 Jérôme Audu 2016-02-27 04:57:58 EST
Created attachment 1131030
File: backtrace
Comment 2 Jérôme Audu 2016-02-27 04:58:00 EST
Created attachment 1131031
File: cgroup
Comment 3 Jérôme Audu 2016-02-27 04:58:01 EST
Created attachment 1131032
File: core_backtrace
Comment 4 Jérôme Audu 2016-02-27 04:58:02 EST
Created attachment 1131033
File: dso_list
Comment 5 Jérôme Audu 2016-02-27 04:58:04 EST
Created attachment 1131034
File: environ
Comment 6 Jérôme Audu 2016-02-27 04:58:05 EST
Created attachment 1131035
File: exploitable
Comment 7 Jérôme Audu 2016-02-27 04:58:06 EST
Created attachment 1131036
File: limits
Comment 8 Jérôme Audu 2016-02-27 04:58:08 EST
Created attachment 1131037
File: maps
Comment 9 Jérôme Audu 2016-02-27 04:58:10 EST
Created attachment 1131038
File: mountinfo
Comment 10 Jérôme Audu 2016-02-27 04:58:11 EST
Created attachment 1131039
File: namespaces
Comment 11 Jérôme Audu 2016-02-27 04:58:12 EST
Created attachment 1131040
File: open_fds
Comment 12 Jérôme Audu 2016-02-27 04:58:13 EST
Created attachment 1131041
File: proc_pid_status
Comment 13 Jérôme Audu 2016-02-27 04:58:14 EST
Created attachment 1131042
File: var_log_messages
Comment 14 Jérôme Audu 2016-02-27 05:09:15 EST
This crash does not occur on x86_64!
Comment 15 Jérôme Audu 2016-02-27 15:12:44 EST
This is the same bug as the one filed in BZ#1289351.

In fact, using "--no-hw-optimizations" avoids the crash at startup, but, as in BZ#1289351, Ardour then aborts later, just after the first-time configuration screens.

It seems that "xgetbv" is part of the AVX instruction set, which is not present on this i686 hardware.
Comment 16 Nils Philippsen 2016-03-04 03:49:18 EST
*** Bug 1289351 has been marked as a duplicate of this bug. ***
Comment 17 Nils Philippsen 2016-03-04 03:53:50 EST
What stumps me a bit is that the caller of _xgetbv(), FPU::FPU(), only calls it after checking with cpuid() that the processor should support the xgetbv instruction:

        __cpuid (cpu_info, 1);

        if ((cpu_info[2] & (1<<27)) /* OSXSAVE */ &&
            (cpu_info[2] & (1<<28) /* AVX */) &&
            ((_xgetbv (_XCR_XFEATURE_ENABLED_MASK) & 0x6) == 0x6)) { /* OS really supports XSAVE */
            info << _("AVX-capable processor") << endmsg;
            _flags = Flags (_flags | (HasAVX) );
        }
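
For illustration, the two guards in the snippet above can be written as pure bit tests. This is a minimal sketch, not Ardour's code: the names kOsxsaveBit, kAvxBit, may_query_xcr0 and os_saves_avx_state are hypothetical, and only the bit positions and the 0x6 mask are taken from the snippet. It also shows why the guards are only as good as their input: if the cpuid wrapper returns garbage, both bits may appear set by accident.

```cpp
#include <cstdint>

// CPUID leaf 1 reports feature flags in ECX (bit positions as in the snippet above).
constexpr std::uint32_t kOsxsaveBit = 1u << 27; // OS has enabled XSAVE, so xgetbv is usable
constexpr std::uint32_t kAvxBit     = 1u << 28; // CPU implements AVX

// True only when both bits are set, i.e. when the original code would go on to call xgetbv.
constexpr bool may_query_xcr0(std::uint32_t leaf1_ecx) {
    return (leaf1_ecx & kOsxsaveBit) != 0 && (leaf1_ecx & kAvxBit) != 0;
}

// XCR0 bit 1 (SSE state) and bit 2 (AVX state) must both be enabled by the OS.
constexpr bool os_saves_avx_state(std::uint64_t xcr0) {
    return (xcr0 & 0x6) == 0x6;
}
```

With correct register values, a CPU without AVX never reaches the second test, so xgetbv is never executed on it.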
Comment 18 Nils Philippsen 2016-03-04 04:05:39 EST
Jérôme, as in the other bug report, cpu_string and cpu_vendor in PBD::FPU::FPU() look garbled:

        cpu_string = "ntel\320\214Y\267ineI\tS\374\264\004\000\000\000\200\227\021\265\270\032\330\266$\006\213\277Ot\374\264\213\230q\267\000\000\337\266\220\202\245\n"
        cpu_vendor = "ntel\320\214Y\267ineI"

For comparison, here's what they look like on my (64-bit) laptop:

(gdb) print cpu_string
$2 = "GenuineIntel\377\177\000\000\000\020\000\000\000\000\000\000@\207B\365\377\177\000\000h/\016\001\000\000\000\000z/\016\001\000\000\000"
(gdb) print cpu_vendor
$3 = "GenuineIntel"
(gdb)

Can you tell me the exact CPU model of the machine ("model name" in /proc/cpuinfo)?
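
As a point of reference, the vendor ID is the 12 bytes returned by CPUID leaf 0 in EBX, EDX and ECX, in that order. The sketch below assembles the string from three register values; vendor_from_regs and append_le are hypothetical helper names, not functions from fpu.cc. In the garbled dump above the fragments "ntel" and "ineI" are still recognizable, which would be consistent with only some of the registers being stored correctly.

```cpp
#include <cstdint>
#include <string>

// Append the four bytes of a register, least significant byte first.
static void append_le(std::string& out, std::uint32_t reg) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<char>((reg >> (8 * i)) & 0xff));
    }
}

// Build the vendor ID from CPUID leaf 0 output: bytes come from EBX, then EDX, then ECX.
std::string vendor_from_regs(std::uint32_t ebx, std::uint32_t edx, std::uint32_t ecx) {
    std::string vendor;
    append_le(vendor, ebx);
    append_le(vendor, edx);
    append_le(vendor, ecx);
    return vendor;
}
```

For an Intel CPU the three registers hold "Genu", "ineI" and "ntel", which join to "GenuineIntel".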
Comment 19 Nils Philippsen 2016-03-04 11:59:16 EST
Both of you: would you please download and try the binary version from ardour.org? I mean the 32-bit Linux "demo" version, just to check if it starts up correctly, and what messages it prints.
Comment 20 Jérôme Audu 2016-03-04 16:10:40 EST
(In reply to Nils Philippsen from comment #18)
> Can you tell me the exact CPU model of the machine ("model name" in
> /proc/cpuinfo)?

I still have 2 laptops with F23-i386 where this crash happens:

 1 - ACER Aspire One: Intel(R) Atom(TM) CPU N270   @ 1.60GHz
     cpu family : 6
     model      : 28
     flags      : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 xtpr pdcm movbe lahf_lm dtherm

 2 - Centrino Duo: Genuine Intel(R) CPU T2300  @ 1.66GHz
     cpu family : 6
     model      : 14
     flags      : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc arch_perfmon bts aperfmperf pni monitor est tm2 xtpr pdcm dtherm


I will also check your comment #17 using a debugger.
Comment 21 Jérôme Audu 2016-03-04 16:37:11 EST
(In reply to Nils Philippsen from comment #17)
> What stumps me a bit is that the caller of _xgetbv(), FPU::FPU() only calls
> it after checking with cpuid() that the processor should support the xgetbv
> instruction:
> 
>         __cpuid (cpu_info, 1);
> 
>         if ((cpu_info[2] & (1<<27)) /* OSXSAVE */ &&
>             (cpu_info[2] & (1<<28) /* AVX */) &&
>             ((_xgetbv (_XCR_XFEATURE_ENABLED_MASK) & 0x6) == 0x6)) { /* OS
> really supports XSAVE */
>             info << _("AVX-capable processor") << endmsg;
>             _flags = Flags (_flags | (HasAVX) );
>         }

Yes, but "xgetbv" is part of the AVX instruction set, which is not supported by my hardware.
And the code may not check that before executing "_xgetbv".
 Code: "if (A && B && C)": the compiler may decide to evaluate C before A or B...
 => this is what is happening on my hardware.
  => I've just put a breakpoint before it, then forced a jump to the next test (HasSSE), and there is no more crash!

I would like to recompile using something like:
  if (A && B) { if (C) { set HasAVX } }
Comment 22 Jérôme Audu 2016-03-05 10:36:42 EST
(In reply to Nils Philippsen from comment #19)
> Both of you: would you please download and try the binary version from
> ardour.org? I mean the 32-bit Linux "demo" version, just to check if it
> starts up correctly, and what messages it prints.

Demo version 4.7 starts without crashing...

$ /usr/local/bin/Ardour4 
bind txt domain [gtk2_ardour4] to /opt/Ardour-4.7.0/share/locale
Ardour4.7.0 (built using 4.7 and GCC version 4.4.7)
ardour: [INFO]: Your system is configured to limit Ardour to only 4096 open files
ardour: [INFO]: Loading system configuration file /opt/Ardour-4.7.0/etc/system_config
Loading user configuration file /home/la/.config/ardour4/config
CPU vendor: ntel	�ineI
ardour: [INFO]: CPU brand: 	�Genuine Inte	� CPU        	�2300  @ 1.66
ardour: [INFO]: No H/W specific optimizations in use
ardour: [INFO]: Loading default ui configuration file /opt/Ardour-4.7.0/etc/default_ui_config
Comment 23 Jérôme Audu 2016-03-06 10:04:51 EST
After more investigation, it seems that the current "__cpuid()" implementation (Ardour4, fpu.cc) is the main issue
=> it does not report the right values when the GCC 5.3.1 optimisation level is higher than -O0
   I've made a small piece of code to show that (based on https://gist.github.com/hi2p-perim/7855506) using Ardour4's "__cpuid()"

Now, if I just use the __cpuid() from https://gist.github.com/hi2p-perim/7855506 in ardour4 (fpu.cc), the "CPU vendor" string is no longer corrupt, AVX is reported correctly, and there is no more crash!
Comment 24 Jérôme Audu 2016-03-06 10:08 EST
Created attachment 1133531
Replace __cpuid() function + extra debug & check

Patch which replaces __cpuid() and also adds extra debug output, plus a check that runs only if AVX is supported
Comment 25 Nils Philippsen 2016-03-08 05:50:24 EST
(In reply to Jérôme Audu from comment #21)
> And the code may not check it before execute "_xgetbv"
>  Code: "if ( A && B && C)", Compiler may decide to execute C before A or B...

No, thankfully logical AND evaluates its operands left to right and short-circuits, and we can depend on it ;).
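
A small sketch of that guarantee: each operand records its name when it is evaluated, and C stands in for the _xgetbv() call. The names operand, guarded and trace are hypothetical and exist only for this illustration.

```cpp
#include <string>

// Records which operands were evaluated, in order.
static std::string g_trace;

static bool operand(char name, bool value) {
    g_trace.push_back(name);
    return value;
}

// Same shape as `A && B && C`; C is only evaluated if A and B are both true.
bool guarded(bool a, bool b) {
    g_trace.clear();
    return operand('A', a) && operand('B', b) && operand('C', true);
}

// Operands evaluated by the most recent call to guarded().
const std::string& trace() {
    return g_trace;
}
```

When A is false the trace contains only "A", so a correct A and B are enough to keep C from running; the language does not allow the compiler to change that observable order.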

(In reply to Jérôme Audu from comment #23)
> After more investigation, seem that current "__cpuid()" (Ardour4 / FPU.cc)
> implementation is the main issue
> => it does not report "right" value when GCC 5.3.1 optimisation is > O0

Yeah, that's what Florian Weimer and I found, too. The patch you linked/attached may clobber the upper half of %ebx on x86_64 however, so I'll use Florian's version ;).
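
One way to sidestep hand-written asm constraints altogether is the compiler's own <cpuid.h> helper, which leaves register handling to GCC or Clang. The sketch below is an assumption for illustration only: it is neither Florian's version nor the change in the PR, and read_cpu_vendor is a hypothetical name. On non-x86 builds it simply returns an empty string.

```cpp
#include <string>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h> // GCC/Clang helper header providing __get_cpuid()
#endif

// Returns the 12-character CPU vendor ID, or an empty string when
// CPUID leaf 0 cannot be queried (for example on a non-x86 build).
std::string read_cpu_vendor() {
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        return std::string();
    }
    // Vendor bytes are stored in EBX, EDX, ECX order, low byte first.
    const unsigned int regs[3] = {ebx, edx, ecx};
    std::string vendor;
    for (unsigned int reg : regs) {
        for (int i = 0; i < 4; ++i) {
            vendor.push_back(static_cast<char>((reg >> (8 * i)) & 0xff));
        }
    }
    return vendor;
#else
    return std::string();
#endif
}
```

Printing the result at both -O0 and -O2 is a quick way to check that the value does not depend on the optimisation level.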
Comment 26 Florian Weimer 2016-03-08 05:55:59 EST
(In reply to Nils Philippsen from comment #25)
> Yeah, that's what Florian Weimer and I found, too. The patch you
> linked/attached may clobber the upper half of %ebx on x86_64 however, so

The upper half of %rbx, to be precise.
Comment 27 Nils Philippsen 2016-03-08 06:06:52 EST
Uhm yes. :)

Anyway, here's the PR on github: https://github.com/Ardour/ardour/pull/218
Comment 28 Fedora Update System 2016-03-09 04:15:17 EST
ardour4-4.7.0-2.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-d74aac213b
Comment 29 Fedora Update System 2016-03-09 04:15:25 EST
ardour4-4.7.0-2.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-d571c1869c
Comment 30 Fedora Update System 2016-03-09 04:15:30 EST
ardour4-4.7.0-2.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-497828eaf0
Comment 31 Jérôme Audu 2016-03-09 15:35:39 EST
Just tested on my two i386 machines, and both are working using ardour4-4.7.0-2.fc23.
No more corrupt "CPU Vendor" string at startup, and no crash!
I've cross-checked the (A && B && C) question and you are 100% right! :)
Comment 32 Fedora Update System 2016-03-09 16:23:22 EST
ardour4-4.7.0-2.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-497828eaf0
Comment 33 Fedora Update System 2016-03-09 17:55:06 EST
ardour4-4.7.0-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-d74aac213b
Comment 34 Fedora Update System 2016-03-09 20:56:24 EST
ardour4-4.7.0-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-d571c1869c
Comment 35 Fedora Update System 2016-03-17 16:51:05 EDT
ardour4-4.7.0-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
Comment 36 Fedora Update System 2016-03-17 17:19:40 EDT
ardour4-4.7.0-2.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
Comment 37 Kevin Perros 2016-03-18 06:41:55 EDT
Hi,
I had the same problem of CPUID or whatever, with the 32-bit packages. I solved it by downloading the source RPM and rebuilding it.
It was a compile-time problem in the set of config scripts, which enabled optimizations for the build machine's CPU even when forced not to do so. The config scripts autodetect the vectorization instructions to use.
In the end, with my Core 2 6550 (MMX, SSE, SSE2, SSE3, SSSE3), I experienced illegal-instruction crashes.

I'll try the new packages.
Comment 38 Kevin Perros 2016-03-18 06:44:37 EDT
It works fine now, thank you.
Kevin
Comment 39 Nils Philippsen 2016-03-18 15:02:47 EDT
It's supposed to detect at runtime what set of instructions to use; that check was broken on i386.
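
As a rough sketch of what runtime detection means here (not Ardour's actual mechanism): the binary ships both code paths and picks one at startup based on a CPU probe. The example uses the GCC/Clang builtins __builtin_cpu_init() and __builtin_cpu_supports(); cpu_has_avx, pick_kernel and the kernel names "avx" and "generic" are hypothetical.

```cpp
#include <string>

// Runtime probe; reports false on compilers or targets without the x86 builtins.
bool cpu_has_avx() {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}

// Choose an implementation by name; a real program would select function pointers.
std::string pick_kernel(bool has_avx) {
    return has_avx ? "avx" : "generic";
}
```

A call such as pick_kernel(cpu_has_avx()) then selects the generic path on CPUs like the T2300 or N270 listed earlier, provided the probe itself reports correct values.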
Comment 40 Kevin Perros 2016-03-18 15:45:12 EDT
Ooops
I checked what I did and indeed, I actually had to tweak the build/config scripts to make it work. I didn't remember that. I replaced the gcc parameters in the wscript by hand so as to have more or less only -O3 -march=core2. That disabled the build of the SSE4 code and bypassed the bug you solved.

Nevertheless, thanks a lot for the fix.
Comment 41 Fedora Update System 2016-03-26 14:17:13 EDT
ardour4-4.7.0-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.
