Bug 814915 - Cannot clean close jackd and client programs
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: jack-audio-connection-kit
Version: 17
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Orcan Ogetbil
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-04-21 10:03 UTC by Guido Aulisi
Modified: 2013-08-01 06:07 UTC (History)

Doc Type: Bug Fix
Last Closed: 2013-08-01 06:07:18 UTC
Type: Bug


Description Guido Aulisi 2012-04-21 10:03:06 UTC
Description of problem:
Cannot close ardour and qjackctl after upgrade to jack-audio-connection-kit-1.9.8-8.fc17.x86_64 and kernel-3.3.2-1.fc17.x86_64. The previous version (sorry can't remember version number) worked fine with kernel-3.3.0-8.fc17.x86_64.

Version-Release number of selected component (if applicable):
1.9.8-8.fc17.x86_64

How reproducible:
Open a medium/big project with ardour 2.8.12, work on the project, then try to close the project or exit ardour. ardour hangs and has to be killed with SIGTERM; qjackctl hangs and has to be killed with SIGTERM; jackd hangs and has to be killed with SIGKILL.

Steps to Reproduce:
1. Start jackd with qjackctl and ardour
2. Work on a medium/big project
3. Try to close ardour or qjackctl or jackd
  
Actual results:
Cannot close ardour, qjackctl, jackd

Expected results:
Can close ardour, qjackctl, jackd

Additional info:
I'm running jackd with RT priority 20 (not the new 70)
Kernel 3.3.2-1.fc17.x86_64

Comment 1 Orcan Ogetbil 2012-04-21 11:41:02 UTC
Interesting. As far as I remember, the only changes in the jack package between Fedora 16 and Fedora 17 are the priority-related changes and the backport of the ffado runtime buffersize change patch. Ardour is the same. Since you are using the old priority, you are either hit by the above ffado-related change, or this is happening outside jack.

What happens if you replace the above steps by
1. Start jackd with qjackctl and ardour
2. Open the medium big project
3. Immediately close ardour or qjackctl or jackd once the project is loaded

Comment 2 Guido Aulisi 2012-04-21 12:34:31 UTC
I did as you said. I managed to close the project right after loading twice. Then, when I tried to reopen it, ardour, qjackctl and jackd hung.
Ardour is stuck in the window that reports "Setup signal and plugins".
In qjackctl, none of the buttons respond.

I never installed F16 on this system and I'm sure that previous F17 versions of jackd and kernel did work ok.

By the way I installed F17 because I think the transition from F16 to F17 is hard because of /usr merge.

Comment 3 Orcan Ogetbil 2012-04-21 14:26:17 UTC
It is difficult for me to debug this without being able to reproduce it. Do you get similar behavior when you launch other software that uses jack, e.g. muse, hydrogen, audacity, qsynth, ...? Also, just to be sure, did you add yourself to the "jackuser" group?

Comment 4 Guido Aulisi 2012-04-22 15:34:04 UTC
Yes, I'm in the jackuser group. I also tried to use the new priority (70).

I understand the difficulty. I can work with ardour for a day without any problem; the problems occur when starting or closing.

I'll try some other jack clients. I don't use hydrogen, but I'll try it and also audacity. I'm also considering compiling jack myself from git to see if something changes.

Thanks for everything

Comment 5 Guido Aulisi 2012-04-22 21:31:31 UTC
Some other info for this bug:

when jackd hangs I can strace its threads and I can see these calls cycling in one thread:

ioctl(9, SNDRV_SEQ_IOCTL_GET_QUEUE_STATUS, 0x7f3daaa78ce0) = 0
read(9, 0x202c6a0, 14000)               = -1 EAGAIN (Resource temporarily unavailable)
ioctl(9, SNDRV_SEQ_IOCTL_GET_QUEUE_STATUS, 0x7f3daaa78d30) = 0
write(8, " \0\0\0", 4)                  = -1 EAGAIN (Resource temporarily unavailable)
write(8, "\377\377\377\377", 4)         = -1 EAGAIN (Resource temporarily unavailable)
write(8, "\3\0\0\0", 4)                 = -1 EAGAIN (Resource temporarily unavailable)
write(8, "\0\0\0\0", 4)                 = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f3dadfe5000, FUTEX_WAKE, 1)    = 0
poll([{fd=6, events=POLLOUT|POLLERR|POLLNVAL}, {fd=7, events=POLLIN|POLLERR|POLLNVAL}], 2, 17413) = 2 ([{fd=6, revents=POLLOUT}, {fd=7, revents=POLLIN}])
ioctl(9, SNDRV_SEQ_IOCTL_GET_QUEUE_STATUS, 0x7f3daaa78ce0) = 0
read(9, 0x202c6a0, 14000)               = -1 EAGAIN (Resource temporarily unavailable)
ioctl(9, SNDRV_SEQ_IOCTL_GET_QUEUE_STATUS, 0x7f3daaa78d30) = 0
write(8, " \0\0\0", 4)                  = -1 EAGAIN (Resource temporarily unavailable)
write(8, "\377\377\377\377", 4)         = -1 EAGAIN (Resource temporarily unavailable)
write(8, "\3\0\0\0", 4)                 = -1 EAGAIN (Resource temporarily unavailable)
write(8, "\0\0\0\0", 4)                 = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f3dadfe5000, FUTEX_WAKE, 1)    = 0
poll([{fd=6, events=POLLOUT|POLLERR|POLLNVAL}, {fd=7, events=POLLIN|POLLERR|POLLNVAL}], 2, 17413) = 2 ([{fd=6, revents=POLLOUT}, {fd=7, revents=POLLIN}])
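
The `write(8, ...) = -1 EAGAIN` lines above are what a non-blocking descriptor returns when its buffer is full and nothing is draining the other end. The trace doesn't show what fd 8 is, so the following is only a generic C sketch of that behaviour using a pipe (an assumption, not jackd's code); `fill_nonblocking_pipe` is a hypothetical name:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Fill a non-blocking pipe that nobody reads from until write() fails.
 * Returns the errno of the failing write (EAGAIN once the pipe is full),
 * or -1 if the pipe could not be set up. */
int fill_nonblocking_pipe(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    /* Put the write end into non-blocking mode, keeping its other flags. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    const char word[4] = {0, 0, 0, 0};  /* 4-byte writes, as in the trace */
    int err = 0;
    for (;;) {
        ssize_t n = write(fds[1], word, sizeof word);
        if (n == -1) {
            err = errno;                /* the pipe filled up, or another error */
            break;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

A writer that keeps retrying in this state without the reader ever waking up would produce a repeating pattern like the one in the trace.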

My main card is an RME Raydat. ASPM is disabled.

Output of lspci:

00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b5)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1c.6 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
00:1d.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation Z68 Express Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
03:00.0 USB Controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
04:00.0 Multimedia audio controller: Xilinx Corporation RME Hammerfall DSP MADI (rev d3)
05:00.0 USB Controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
06:00.0 PCI bridge: ASMedia Technology Inc. Device 1080 (rev 01)

Kernel is kernel-3.3.2-8.fc17.x86_64

Comment 6 Guido Aulisi 2012-04-25 14:21:30 UTC
I downgraded jack to version jack-audio-connection-kit-1.9.8-3.fc16.x86_64 and it works fine, I can close ardour and qjackctl. Something between these 2 releases is causing the hang. I'll try to investigate further. By the way I tried to compile jack from git repository, but version 1.9.9 is not working with F17 ardour or qjackctl, I don't know why.

Comment 7 Orcan Ogetbil 2012-04-29 03:25:58 UTC
Please let us know if you find the root of the problem.

Comment 8 Guido Aulisi 2012-04-29 10:43:11 UTC
I'm trying to bisect revisions of jack, but testing takes some time because I cannot always reproduce the issue: I have to open a medium ardour project, do some work and try to close it. Also, I have to do it at home because at work I have an F16 system.

Jack version 1.9.8-6.fc17 x86_64 seems to work ok

I compiled it from
commit 6f32490dcc561d4cec08db16b57ba6f0ee9a57e6
in jack-audio-connection-kit git repository.

I'm using the latest F17 updates from updates-testing as of April 29th.

Comment 9 Orcan Ogetbil 2012-04-29 15:49:31 UTC
The latest build 1.9.8-8.fc17 does not add anything new. It is just a rebuild against the latest ffado.

From 1.9.8-6.fc17 to 1.9.8-7.fc17, the changes in the RPM are:

1- Compile via -DJACK_32_64 RHBZ#803865 
2- Adjust rtprio limit to 70. Adjust jack default priority to 60. RHBZ#795094


In an effort to isolate the faulty change, I made a 1.9.8-6.5.fc17 build with only the first change above (-DJACK_32_64 flag)

http://koji.fedoraproject.org/koji/taskinfo?taskID=4033063

Could you try the RPMs from the above link and see how it goes?

Comment 10 Guido Aulisi 2012-05-01 13:26:34 UTC
I tried your version (1.9.8-6.5.fc17) and it works, while version 7 from
commit 3ccc2fda51346f669c4085d0f76c36b573ce246b

doesn't, so I think the problem is the priority change.

I'll try to read the source code, I can understand C coding a little, but I'm not a jack expert.

Comment 11 Guido Aulisi 2012-05-01 19:29:58 UTC
Now I remember that I'm using a modified version of the IR LV2 plugin, so that it compiles against the latest zita convolver library, version 3. This plugin is used in all my projects, so maybe it is causing the problems with the jack priority. The patched version uses the priority and scheduler parameters, because zita convolver now wants to know the priority the thread is running at.

I'll try latest jack without ir plugin and let you know.

The original source is http://factorial.hu/system/files/ir.lv2-1.2.1.tar.gz

The patch is:

diff --git a/ir.cc b/ir.cc
index 64c1056..5b545bf 100644
--- a/ir.cc
+++ b/ir.cc
@@ -22,6 +22,7 @@
 #include <stdio.h>
 #include <math.h>
 #include <time.h>
+#include <sched.h>
 
 #include <glib.h>
 #include <gtk/gtk.h>
@@ -160,7 +161,7 @@ void free_conv_safely(Convproc * conv) {
                treq.tv_nsec = 10000000;
                nanosleep(&treq, &trem);
 
-               conv->check();
+               conv->check_stop();
                state = conv->state();
        }
        delete conv;    
@@ -466,6 +467,8 @@ void init_conv(IR * ir) {
 
        Convproc * conv;
        int req_to_use;
+     int sched_policy, return_value;
+     struct sched_param the_sched_param;
 
        if (!ir->ir_samples || !ir->ir_nfram || !ir->nchan) {
                return;
@@ -552,7 +555,15 @@ void init_conv(IR * ir) {
                        ir->nchan);
        }
 
-       conv->start_process(0);
+     sched_policy = sched_getscheduler(0);
+     if (sched_policy == -1)
+          printf("IR init_conv: error, unable to get sched policy\n");
+
+     return_value = sched_getparam(0, &the_sched_param);
+     if (return_value)
+          printf("IR init_conv: error, unable to get sched params\n");
+
+       conv->start_process(sched_policy, the_sched_param.sched_priority);
        ir->conv_req_to_use = req_to_use;
 }
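
For reference, the scheduler query in the patch above can be written with explicit error handling. This is a minimal C sketch, assuming Linux (where a pid of 0 refers to the caller); `get_caller_sched` is a hypothetical helper name, not part of IR or zita-convolver:

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fetch the caller's scheduling policy and priority.
 * Returns 0 on success and -1 on failure; the outputs are only written
 * on success, so a caller never sees half-initialised values. */
int get_caller_sched(int *policy, int *priority)
{
    struct sched_param sp;

    int pol = sched_getscheduler(0);    /* 0 = the caller */
    if (pol == -1) {
        fprintf(stderr, "sched_getscheduler: %s\n", strerror(errno));
        return -1;
    }
    if (sched_getparam(0, &sp) == -1) {
        fprintf(stderr, "sched_getparam: %s\n", strerror(errno));
        return -1;
    }

    *policy = pol;
    *priority = sp.sched_priority;
    return 0;
}
```

If the signature quoted in Comment 16, `int start_process (int abspri, int policy);`, is accurate, the priority would go first and the policy second, e.g. `conv->start_process(priority, policy)`. The patch above passes them in the opposite order, which may be worth double-checking.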

Comment 12 Orcan Ogetbil 2012-05-02 02:48:09 UTC
Hmm okay, the pieces started to come together. We currently use the patch [1] in the jack package to increase the jack realtime priority to 60. At the same time, we set the realtime priority limit for the jackuser group to be 70 in limits config [2]. I believe your custom priority limit of 20 is not enough for jack to operate cleanly. You might want to modify the patch [1] for your needs and rebuild.

[1] http://pkgs.fedoraproject.org/gitweb/?p=jack-audio-connection-kit.git;a=blob;f=jack-realtime-compat.patch;h=9be69246bd03a283cf078c100efd16e7df9674f9;hb=HEAD

[2] /etc/security/limits.d/95-jack.conf
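
Whether a given real-time priority is permitted for the current user can be checked against the process's `RLIMIT_RTPRIO`. A minimal C sketch, assuming Linux; `rtprio_allowed` is a hypothetical name, and the values 20, 60 and 70 mentioned in this thread would simply be passed in as `wanted`:

```c
#include <sys/resource.h>

/* Returns 1 if the soft RLIMIT_RTPRIO of this process permits the requested
 * real-time priority, 0 if it does not, and -1 if the limit cannot be read.
 * Note: a privileged process (CAP_SYS_NICE) is not bound by this limit,
 * so a 0 here is only meaningful for ordinary users. */
int rtprio_allowed(int wanted)
{
    struct rlimit rl;

    if (wanted < 0)
        return 0;                       /* negative priorities are never valid */
    if (getrlimit(RLIMIT_RTPRIO, &rl) == -1)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return (rlim_t)wanted <= rl.rlim_cur ? 1 : 0;
}
```

For example, `rtprio_allowed(60)` would be expected to return 0 in a session whose rtprio limit is 20.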

Comment 13 Guido Aulisi 2012-05-02 08:16:02 UTC
I don't think so, because your package version 1.9.8-6.5.fc17 did work correctly with IR plugin at priority 20. The system works well at priority 20. I'll test if IR plugin is the problem ASAP, and let you know. Also I'll modify my patch to check for errors in a better way.

Comment 14 Orcan Ogetbil 2012-05-02 12:26:11 UTC
Yes, but that's expected. The 1.9.8-6.5.fc17 version does not have the patch that raises the realtime priority for Fedora kernels to 60 (it does raise the realtime priority for rt kernels though, e.g. the PlanetCCRMA kernels). 

If 1.9.8-6.5.fc17 works with your realtime priority 20, then this is consistent with my conjecture.

Comment 15 Brendan Jones 2012-05-02 15:35:58 UTC
Hi Guido,

I'm curious as to why you are patching lv2-ir the way you do. What is the issue you are having with lv2-ir? Is your host application running at a lower priority, so that you need to bump it up?

I've currently got lv2-ir under review (bug 788717), so that's why I'm curious

Comment 16 Guido Aulisi 2012-05-02 15:55:29 UTC
lv2-ir uses zita convolver, and Fedora 17 now ships version 3 of zita convolver. start_process has a new definition in version 3:

int start_process (int abspri, int policy);

while version 2 was (if I remember well) without parameters.

With this patch I want to start the convolver process with the same scheduler and priority IR is running at. Maybe it needs some more testing, but it worked well until the jack priority change.

Further, I'm patching IR to work correctly in freewheel mode; zita convolver now has the new function

int process (bool sync = false);

In freewheel mode sync should be true, so the convolver waits for all its threads to finish...

I hope you understand, sorry for my bad English...

Comment 17 Brendan Jones 2012-05-02 20:29:07 UTC
I understand the zita3 patch, you'll see I'm doing the same in the SRPM above, I'm just trying to understand why you are trying to change the priority of the plugin with respect to the convolver process.

By the way, your English is fine :)

Comment 18 Guido Aulisi 2012-05-03 08:20:03 UTC
I'm not trying to change priority, I'm trying to use the same priority and scheduler lv2-ir is running at. I only get scheduler and priority and pass them to start_process. This worked well till jack changed default priority.
If you read zita docs, they say that:

"In all other cases (other than BATCH), the
priority and scheduling class values _must_ be those of the
thread that will be calling Convproc::process(), and the
scheduling class _must_ be a real-time one (FIFO or RR)."
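
The requirement quoted above can be turned into a small pre-check before the values are handed to the convolver. A minimal C sketch, assuming Linux; `is_realtime_sched` is a hypothetical name and is not part of zita-convolver:

```c
#include <sched.h>

/* Returns 1 when (policy, priority) describe a real-time thread in the sense
 * of the quoted text: scheduling class FIFO or RR, with a priority inside
 * the range the system allows for that class. Returns 0 otherwise. */
int is_realtime_sched(int policy, int priority)
{
    if (policy != SCHED_FIFO && policy != SCHED_RR)
        return 0;
    return priority >= sched_get_priority_min(policy) &&
           priority <= sched_get_priority_max(policy);
}
```

A plugin could log a warning, or decline to start the convolver, when this returns 0 for the thread that will call `Convproc::process()`.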

Now the only problem I have is that I can't close ardour cleanly: I have to kill jackd with SIGKILL, and even qjackctl stops responding. Sometimes I can't open Ardour projects; it hangs when it says "Setup signal flow and plugins".

I still have to try latest jack without patched lv2-ir, I'll do it ASAP.

I'm also trying to patch lv2-ir to make sync calls in freewheel mode, but it's more difficult, because I don't know how to tell from the plugin's point of view when jack is in freewheel mode.

Comment 19 Guido Aulisi 2012-05-06 16:43:23 UTC
I can confirm that lv2-ir doesn't work well with new jack priority. I also tried Brendan Jones' patch but without any luck. I also tried to start zita convolver like this:

conv->start_process(10, SCHED_FIFO)

but it didn't work. When I try to close ardour it hangs, so does jack.
For now I have to use the old priorities.

Comment 20 Guido Aulisi 2012-07-17 18:49:54 UTC
It seems that the latest build of jack (revision 9-fc17, "Non-optimized build to workaround the compiler bug") solved this problem. I can now start and close ardour with lv2-ir without problems.

I will do more testing and post the results, but the problem seems to be gone.
Thanks

Comment 21 Guido Aulisi 2012-11-19 13:44:42 UTC
With jack-audio-connection-kit-1.9.8-10.fc17.x86_64 the problem seems to be here again. Somehow the optimized build breaks jack in an obscure way.

Comment 22 Guido Aulisi 2013-01-06 13:15:44 UTC
jack-audio-connection-kit-1.9.8-11.fc17.x86_64 seems to work fine, it was compiled with -O0.
There's a similar bug in the ardour Mantis tracker, maybe related to this: http://tracker.ardour.org/view.php?id=5257

Comment 23 Orcan Ogetbil 2013-01-06 15:11:21 UTC
I am glad the unoptimized build works for you. For a developer, these kinds of bugs are extremely difficult to fix when one cannot reproduce them. It is extra difficult for package maintainers (or anyone who did not write the code themselves).

Comment 24 Fedora End Of Life 2013-07-04 01:32:50 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 25 Fedora End Of Life 2013-08-01 06:07:23 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

