2119762 – Evince does not use utf-8 in search strings

Summary: Evince does not use utf-8 in search strings

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	evince
Sub Component:
Version:	37
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Marek Kašík
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	F37FinalBlocker

Reported:	2022-08-19 10:53 UTC by Lukas Ruzicka
Modified:	2022-08-22 17:07 UTC
CC List:	10 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2022-08-22 17:07:09 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
See how searches are treated. (515.67 KB, image/png) 2022-08-19 10:53 UTC, Lukas Ruzicka

Links
System	ID	Private	Priority	Status	Summary	Last Updated
GNOME Gitlab	GNOME evince issues 1839	0	None	opened	Evince seems not to use the utf-8 encoding in searches.	2022-08-19 10:54:11 UTC

Description Lukas Ruzicka 2022-08-19 10:53:44 UTC

Created attachment 1906523
See how searches are treated.

Description of problem:
Evince does not use UTF-8 in search strings and is therefore unable to find occurrences in languages that use non-ASCII characters.

Version-Release number of selected component (if applicable):
evince-43~alpha-4.fc37.x86_64 

How reproducible:
Always

Steps to Reproduce:

The latest version of Evince on Fedora does not seem to use UTF-8 encoding in searches, which limits searching in all languages that use non-ASCII characters. The following examples were made on a Czech system.
Reproducer (you can see the illustration below):

* Open the search bar (Ctrl-F).
* Type řekla (meaning [she] said)
* Notice that řekla is not found, which is indicated in red.
* Notice that if Julie is searched for instead, an occurrence of řekla Julie (Julie said) is found, but the leading character is not recognized correctly and is shown incorrectly in the search results.
* In the text itself, all characters are correctly shown.
* When I copy the text using Ctrl-C, I get øekla Julie instead of řekla Julie.

I believe that the strings might not be treated as UTF-8 in the places where the correct characters are missing. It would be nice if the application used the correct encoding in searches and in copied strings as well.
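The øekla symptom is consistent with one specific kind of mismatch, sketched below in Python. This is only an illustration under an assumption that the report itself does not confirm: that the text layer stores ř as the single byte 0xF8 (its ISO 8859-2 code point) and that this byte is then read as Latin-1, where 0xF8 is ø. The helper name `misread` is hypothetical and is not part of Evince or any library.

```python
# Illustration only. Assumption: the document stores "ř" as the byte 0xF8
# (ISO 8859-2) and the text layer is interpreted as Latin-1 instead.

def misread(text: str, stored: str = "iso8859_2", assumed: str = "latin_1") -> str:
    """Encode text the way it is assumed to be stored, then decode it the wrong way."""
    return text.encode(stored).decode(assumed)

term = "řekla"
extracted = misread(term + " Julie")

print(extracted)             # øekla Julie
print(term in extracted)     # False: the typed search term no longer matches
print("Julie" in extracted)  # True: pure ASCII text is unaffected
```

Under this assumption, a search for Julie still matches while a search for řekla does not, which is the behavior described in the steps above.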

Actual results:
Incorrect search results for languages that use non-ASCII characters.

Expected results:
Searches should also work for strings containing non-ASCII characters.

Additional info:
Also reported upstream: https://gitlab.gnome.org/GNOME/evince/-/issues/1839

Comment 1 Fedora Blocker Bugs Application 2022-08-19 10:55:47 UTC

Proposed as a Blocker for 37-final by Fedora user lruzicka using the blocker tracking app because:

 I am proposing this for a discussion of whether the problem is a blocker in the scope of Basic Functionality.

Comment 2 Kamil Páral 2022-08-19 11:24:13 UTC

Let's have the conversation in upstream, so that we don't split it into several places. I added a comment there.

Comment 3 Adam Williamson 2022-08-22 17:07:09 UTC

Per upstream discussion, this turned out to be a bug in the PDF file, not in Evince. Acrobat also can't find the string.
