Bug 2190013

Summary: OpenCV 4.7.0 + -Wp,-D_GLIBCXX_ASSERTIONS seems to break DNN functionality
Product: [Fedora] Fedora
Reporter: Jens Georg <mail>
Component: opencv
Assignee: Nicolas Chauvet (kwizart) <kwizart>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 38
CC: hhorak, jkucera, jridky, karlthered, klember, kwizart, sergio
Keywords: Desktop, Reopened
Hardware: x86_64
OS: Linux
Fixed In Version: opencv-4.7.0-9.fc39 opencv-4.7.0-9.fc38
Last Closed: 2023-06-22 02:26:31 UTC

Description Jens Georg 2023-04-26 19:33:20 UTC
When trying to use face recognition in Shotwell on Fedora 38, I get:

/usr/include/c++/12/bits/stl_vector.h:1123: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = float; _Alloc = std::allocator<float>; reference = float&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
/usr/include/c++/12/bits/stl_vector.h:1123: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = float; _Alloc = std::allocator<float>; reference = float&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.

This used to work on Fedora 37.

There is a similar upstream OpenCV ticket, coming from Arch Linux: https://github.com/opencv/opencv/issues/23323

That ticket suggests undefining _GLIBCXX_ASSERTIONS, which I find highly suspicious, to be honest.

Reproducible: Always

Steps to Reproduce:
1. Compile Shotwell 0.32.0 from source with face detection enabled
2. Import an image with faces
3. Open that image
4. Click on faces
5. Click on "detect faces"
Actual Results:  
Shotwell's face-detection helper aborts with the assertions mentioned above.

Expected Results:  
At the very least, no assertions should fire. Ideally, face detection should work as it did before.

This is probably more of an upstream issue; as I mentioned, I find the proposed workaround highly suspicious.
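For context, the failing check is libstdc++'s precondition on std::vector::operator[]: when _GLIBCXX_ASSERTIONS is defined, an index n that is not less than size() aborts the process instead of silently reading out of bounds. Undefining the macro only hides the symptom; the out-of-range access itself remains undefined behaviour. The sketch below is a hypothetical illustration, not OpenCV code: `checked_index`, `sum_range` and `sum_all` are made-up helpers, and `&v[v.size()]` is just one common way to trip this assertion. The actual offending access in OpenCV's DNN module is not identified here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accessor mirroring the precondition libstdc++ checks in
// std::vector::operator[] under -D_GLIBCXX_ASSERTIONS: n must be < size().
// Without the macro, operator[] does no check at all.
// (This assert() is controlled by NDEBUG, not by _GLIBCXX_ASSERTIONS.)
float& checked_index(std::vector<float>& v, std::size_t n) {
    assert(n < v.size() && "n < size() violated");
    return v[n];
}

// Sums the floats in the half-open range [first, last).
float sum_range(const float* first, const float* last) {
    float total = 0.0f;
    for (const float* p = first; p != last; ++p) {
        total += *p;
    }
    return total;
}

// Forming an end pointer as &v[v.size()] (or &v[0] on an empty vector)
// calls operator[] out of range and triggers '__n < this->size()'.
// data() + size() yields the same one-past-the-end pointer without that
// call, and is valid even for an empty vector.
float sum_all(const std::vector<float>& v) {
    return sum_range(v.data(), v.data() + v.size());
}
```

With `std::vector<float> buf = {1.0f, 2.0f, 3.0f};`, `sum_all(buf)` returns 6 whether or not _GLIBCXX_ASSERTIONS is defined, whereas `sum_range(&buf[0], &buf[buf.size()])` would abort in an assertion-enabled build.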

Comment 1 Nicolas Chauvet (kwizart) 2023-05-04 14:03:11 UTC
I'm not aware of Fedora shipping OpenCV with face detection enabled (patent issue), so I don't see why it would have worked, unless with a self-compiled OpenCV...

Comment 2 Jens Georg 2023-05-04 16:28:18 UTC
The important part is DNN, not face recognition. Face recognition just uses OpenCV's DNN module, and that was definitely enabled in F37.

Comment 3 Sergio Basto 2023-05-19 00:01:51 UTC
Yeah, but we don't ship nonfree algorithms like SIFT and SURF, or the xfeatures2d module.

Following the upstream issue, I think we can undefine _GLIBCXX_ASSERTIONS to fix Shotwell, as others did, until upstream fixes the bug...

Comment 4 Nicolas Chauvet (kwizart) 2023-06-12 12:08:37 UTC
Scratch builds (unofficial) of Shotwell with face-detection support, in order to reproduce:

f38: https://koji.fedoraproject.org/koji/taskinfo?taskID=102054373
f37: https://koji.fedoraproject.org/koji/taskinfo?taskID=102055070
PR for Shotwell: https://src.fedoraproject.org/rpms/shotwell/pull-request/2

Comment 5 Nicolas Chauvet (kwizart) 2023-06-12 15:07:00 UTC
I can at least confirm that running Shotwell with face detection enabled seems to work with the fc37 opencv-4.6.0 (although I still need to run shotwell-facedetect manually):

$ /usr/libexec/shotwell/shotwell-facedetect
Attempting to upgrade batch norm layers using deprecated params: /usr/share/shotwell/facedetect/deploy.prototxt
Successfully upgraded batch norm layers using deprecated params.
[ INFO:0] global /builddir/build/BUILD/opencv-4.6.0/modules/core/src/parallel/registry_parallel.impl.hpp (96) ParallelBackendRegistry core(parallel): Enabled backends(2, sorted by priority): TBB(1000); OPENMP(990)
[ INFO:0] global /builddir/build/BUILD/opencv-4.6.0/modules/core/include/opencv2/core/parallel/backend/parallel_for.tbb.hpp (54) ParallelForBackend Initializing TBB parallel backend: TBB_INTERFACE_VERSION=11103
[ INFO:0] global /builddir/build/BUILD/opencv-4.6.0/modules/core/src/parallel/parallel.cpp (77) createParallelForAPI core(parallel): using backend: TBB (priority=1000)
[ INFO:0] global /builddir/build/BUILD/opencv-4.6.0/modules/core/src/ocl.cpp (1186) haveOpenCL Initialize OpenCL runtime...
[ INFO:0] global /builddir/build/BUILD/opencv-4.6.0/modules/core/src/ocl.cpp (1192) haveOpenCL OpenCL: found 1 platforms
[ INFO:0] global /builddir/build/BUILD/opencv-4.6.0/modules/core/src/ocl.cpp (984) getInitializedExecutionContext OpenCL: initializing thread execution context
[ INFO:0] global /builddir/build/BUILD/opencv-4.6.0/modules/core/src/ocl.cpp (994) getInitializedExecutionContext OpenCL: creating new execution context...
[ INFO:0] global /builddir/build/BUILD/opencv-4.6.0/modules/core/src/ocl.cpp (1012) getInitializedExecutionContext OpenCL: device=Quadro K620
[ INFO:0] global /builddir/build/BUILD/opencv-4.6.0/modules/core/src/ocl.cpp (5370) __init_buffer_pools OpenCL: Initializing buffer pool for context@0 with max capacity: poolSize=0 poolSizeHostPtr=0

It seems to use the TBB backend and OpenCL (via NVIDIA) on my machine...

Can you provide the same output?

Comment 6 Sergio Basto 2023-06-12 15:16:41 UTC
BTW, I opened this PR, which temporarily disables -Wp,-D_GLIBCXX_ASSERTIONS: https://src.fedoraproject.org/rpms/opencv/pull-request/22

Let me know if I can proceed.

Thank you

Comment 7 Jens Georg 2023-06-12 15:48:28 UTC
Sorry, what exactly do you need from me?

Comment 8 Jens Georg 2023-06-12 15:54:13 UTC
jgeorg@z400: ~/Source/shotwell [git:shotwell-0.32 $=] $ ./build/subprojects/shotwell-facedetect/shotwell-facedetect
Attempting to upgrade batch norm layers using deprecated params: /home/jgeorg/Source/shotwell/subprojects/shotwell-facedetect/deploy.prototxt
Successfully upgraded batch norm layers using deprecated params.
[ INFO:0] global registry_parallel.impl.hpp:96 ParallelBackendRegistry core(parallel): Enabled backends(2, sorted by priority): TBB(1000); OPENMP(990)
[ INFO:0] global parallel_for.tbb.hpp:54 ParallelForBackend Initializing TBB parallel backend: TBB_INTERFACE_VERSION=11103
[ INFO:0] global parallel.cpp:77 createParallelForAPI core(parallel): using backend: TBB (priority=1000)
[ INFO:0] global ocl.cpp:1186 haveOpenCL Initialize OpenCL runtime...
[ INFO:0] global ocl.cpp:1192 haveOpenCL OpenCL: found 1 platforms
[ INFO:0] global ocl.cpp:984 getInitializedExecutionContext OpenCL: initializing thread execution context
[ INFO:0] global ocl.cpp:994 getInitializedExecutionContext OpenCL: creating new execution context...
[ INFO:0] global ocl.cpp:1012 getInitializedExecutionContext OpenCL: device=NVIDIA GeForce GTX 1060 6GB
[ INFO:0] global ocl.cpp:5370 __init_buffer_pools OpenCL: Initializing buffer pool for context@0 with max capacity: poolSize=0 poolSizeHostPtr=0

** (shotwell-facedetect:87759): WARNING **: 17:53:23.593: Face recognition failed: OpenCV(4.7.0) /builddir/build/BUILD/opencv-4.7.0/modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp:147: error: (-215:Assertion failed) _FX_WINO_IBLOCK == 3 && _FX_WINO_KBLOCK == 4 in function '_fx_winograd_accum_f32'


** (shotwell-facedetect:87759): WARNING **: 17:53:23.595: Face recognition failed: OpenCV(4.7.0) /builddir/build/BUILD/opencv-4.7.0/modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp:147: error: (-215:Assertion failed) _FX_WINO_IBLOCK == 3 && _FX_WINO_KBLOCK == 4 in function '_fx_winograd_accum_f32'

malloc(): unaligned tcache chunk detected
Aborted (core dumped)

Comment 9 Jens Georg 2023-06-12 15:57:42 UTC
Not sure why this now crashes differently...

Comment 10 Jens Georg 2023-06-12 16:00:43 UTC
That seems to be https://github.com/opencv/opencv/pull/23112

Comment 11 Nicolas Chauvet (kwizart) 2023-06-12 16:38:48 UTC
Thanks for the hint. I've started a scratch build before submitting a real build (still building; it should take about an hour):
https://koji.fedoraproject.org/koji/taskinfo?taskID=102067629

Comment 12 Nicolas Chauvet (kwizart) 2023-06-12 18:09:49 UTC
Fixed patch application: https://koji.fedoraproject.org/koji/taskinfo?taskID=102070534

Comment 13 Fedora Update System 2023-06-12 20:20:36 UTC
FEDORA-2023-e01ec3ce94 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-e01ec3ce94

Comment 14 Fedora Update System 2023-06-12 20:25:07 UTC
FEDORA-2023-e01ec3ce94 has been pushed to the Fedora 39 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 15 Sergio Basto 2023-06-12 20:35:11 UTC
We should push the same fix to F38.

Comment 16 Fedora Update System 2023-06-12 20:37:48 UTC
FEDORA-2023-79a0041426 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-79a0041426

Comment 17 Fedora Update System 2023-06-13 01:42:42 UTC
FEDORA-2023-79a0041426 has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-79a0041426`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-79a0041426

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 18 Fedora Update System 2023-06-22 02:26:31 UTC
FEDORA-2023-79a0041426 has been pushed to the Fedora 38 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 19 Fedora Update System 2023-07-23 11:13:55 UTC
FEDORA-2023-c8fa60873d has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-c8fa60873d

Comment 20 Fedora Update System 2023-07-23 11:15:00 UTC
FEDORA-2023-c8fa60873d has been pushed to the Fedora 39 stable repository.
If problem still persists, please make note of it in this bug report.