Bug 1205880
| Summary: | vlc dlopen hang with 2.21.90-8 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Yanko Kaneti <yaneti> | ||||
| Component: | glibc | Assignee: | Carlos O'Donell <codonell> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | rawhide | CC: | arjun, codonell, fweimer, jakub, law, pfrankli, spoyarek, vdanjean.ml | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2015-05-07 21:35:27 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
The process backtrace doesn't provide any really useful information. It could just be that the threads have hit a pre-existing race condition. The next step is for someone to look through the differences in the source trees between -7 and -8. Have you seen any interesting differences that might cause the problem you're seeing?

Actually, I noticed something I missed: the first thread has called dlopen recursively by invoking dlopen in a constructor, followed by calls to non-async-signal-safe functions. libavcodec_plugin.so needs to be rearchitected to avoid initializing OpenCL during startup. That has to be done at some later time, after the plugin itself has been fully loaded.

To be clear:
- Recursive calls to dlopen at present are only allowed to call async-signal-safe functions. This is violated by having libavcodec_plugin attempt to load and initialize the OpenCL runtime in a constructor (during dlopen), which then calls dlopen itself.

More investigation is needed to determine whether this can be fixed upstream.

FWIW this is still happening in -9 compared to -7.

The libavcodec_plugin uses ffmpeg, which itself does something with OpenCL. While your suggested re-architecture of this tangled web might be feasible, I am not the right person to investigate it.

It's interesting to me what specific change brought this about, and how it might affect other software...

JFTR this is still happening the same way with -11.

(In reply to Yanko Kaneti from comment #3)
> FWIW this is still happening in -9 compared to -7
>
> the libavcodec_plugin uses ffmpeg which itself does something with OpenCL,
> while your suggestion of re-architecture of this tangled web might be
> feasible, I am not the right person to investigate it.
>
> Its interesting to me what specific change brought this about, and how it
> might affect other software...

My opinion is that this is a fundamental design flaw in ocl-icd. It must not use dlopen from a constructor that runs when it itself is being loaded. It must use late binding or not support shared linkage.

*** This bug has been marked as a duplicate of bug 1219646 ***

Just for the record (the info is already in the other, duplicated, bug report): the bug is still there after removing the ocl-icd constructor. The bug appears when ocl-icd, in a regular exported function, calls dlopen() on a library that uses pthread_join() in its constructor.
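For readers unfamiliar with the "late binding" suggestion made earlier in this thread, here is a minimal sketch of deferring a dlopen() call from a load-time constructor to first use. The names `backend_get`, `backend_init_once` and `backend_handle` are hypothetical, and the library name `libOpenCL.so.1` is only an assumption for illustration; this is not code from ocl-icd, ffmpeg or vlc. As the last comment notes, moving the call out of the constructor did not by itself make the hang go away in this case.

```c
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper names; not taken from any project in this report. */
static pthread_once_t backend_once = PTHREAD_ONCE_INIT;
static void *backend_handle;

/* Runs at most once, on first use, i.e. after the dlopen() that loaded
 * this code has already returned. Nothing here runs in a constructor. */
static void backend_init_once(void)
{
    /* Assumed library name, for illustration only. */
    backend_handle = dlopen("libOpenCL.so.1", RTLD_NOW | RTLD_LOCAL);
}

/* Returns the handle, or NULL if the library could not be loaded. */
void *backend_get(void)
{
    pthread_once(&backend_once, backend_init_once);
    return backend_handle;
}
```

Callers would invoke `backend_get()` from regular exported functions and handle a NULL result, instead of relying on a constructor to have loaded the runtime.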
Created attachment 1006467 [details]
pstack

Description of problem:
After the update in rawhide to glibc-2.21.90-8, vlc's threads doing dlopen module loading deadlock when attempting to play some files, e.g. an *.mp4 video. The thread responsible for the UI keeps working. Downgrading glibc to 2.21.90-7 fixes the problem.

Version-Release number of selected component (if applicable):
glibc-2.21.90-8.fc23.x86_64

How reproducible:
Always

Attaching a pstack taken at the time of the lock.
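For anyone trying to reproduce this outside vlc, the pattern named in the thread (a library whose constructor creates a thread and calls pthread_join() on it) can be sketched roughly as follows. All names here (`worker`, `lib_init`, `lib_worker_ran`) are hypothetical, and `__attribute__((constructor))` is a GCC/Clang extension; this is an illustration of the shape of the code, not a confirmed reproducer.

```c
#include <pthread.h>
#include <stddef.h>

/* Set by the worker thread; read only after pthread_join() has returned. */
static int worker_ran;

static void *worker(void *arg)
{
    (void)arg;
    worker_ran = 1;
    return NULL;
}

/* Constructor: runs when the object is loaded. If the object is loaded
 * via dlopen(), this runs while that dlopen() call is still in progress. */
__attribute__((constructor))
static void lib_init(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, NULL) == 0)
        pthread_join(tid, NULL);
}

/* Hypothetical accessor so a test program can check the constructor ran. */
int lib_worker_ran(void)
{
    return worker_ran;
}
```

To exercise the dlopen() path, this would be built as a shared object (for example with `cc -shared -fPIC -pthread`) and loaded with dlopen() from a small test program. Whether it hangs depends on the glibc build in use.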