From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Firefox/1.0.2 Fedora/1.0.2-1.3.1

Description of problem:
If you search a PDF file created with 'pdflatex' from a LaTeX source file for a string containing an underscore character ('_'), the search fails. The PDF viewer can be 'xpdf' or Adobe Acroread - the result is the same.

Version-Release number of selected component (if applicable):
tetex-latex-2.0.2-21.3

How reproducible:
Always

Steps to Reproduce:
1. Create a LaTeX source file containing an underscore.
2. Run "pdflatex file.tex" (up to 3 times as necessary).
3. Run "xpdf file.pdf".
4. Search for a string containing an underscore character by pressing the 'f' key and entering the search string.
5. Press return.

Actual Results:
Nothing - the string was not found.

Expected Results:
xpdf should have found the string and highlighted it.

Additional info:
To recreate the problem, put the 4 lines below in a file called "file.tex", and follow the steps above:

\documentclass[12pt]{article}
\begin{document}
hello\_world.
\end{document}
______________

If you use this document, you can search for "hello" and this will be found. You can search for "world" and this will be found. However, if you search for "hello_world", this will *NOT* be found.

I initially suspected a problem with xpdf. However, I now believe the problem is with the pdflatex command, since I downloaded a PDF from http://www.w3c.org and searched for underscores, and these _are_ found by xpdf. This is what I did:

1. curl -O http://www.w3.org/TR/html401/html40.pdf.gz
2. gunzip html40.pdf.gz
3. xpdf html40.pdf
4. Search for the string "section_2" by typing 'f' and then typing "section_2" followed by return.
5. The string will be found on page 20 in the section "2.1.2 Fragment identifiers".

I then repeated the steps above using Adobe Acroread ("rpm -q acroread" shows "acroread-5.07-2"). Again, the string was found.
Note: "pdfinfo file.pdf" returns:

Creator:        TeX
Producer:       pdfTeX-1.10b
CreationDate:   Tue Apr 5 11:47:00 2005
Tagged:         no
Pages:          1
Encrypted:      no
Page size:      595.276 x 841.89 pts (A4)
File size:      6919 bytes
Optimized:      no
PDF version:    1.4

...whilst "pdfinfo html40.pdf" shows:

Title:          HTML 4.01 Specification
Subject:
Keywords:
Author:
Creator:        html2ps version 1.0 beta2 patched by Arnaud Le Hors 19990806
Producer:       GNU Ghostscript 5.10
CreationDate:   Fri Dec 24 18:35:43 1999
Tagged:         no
Pages:          389
Encrypted:      no
Page size:      612 x 792 pts (letter)
File size:      3009579 bytes
Optimized:      no
PDF version:    1.2

Is the PDF version relevant I wonder? Is pdflatex not generating correct PDF version 1.4 output???

This bug is a major irritant as I've got some very large PDF documents that have a lot of underscores in them and it's a real pain having to scan them by hand to find the sections I want.
Fedora Core 3 is now maintained by the Fedora Legacy project for security updates only. If this problem is a security issue, please reopen and reassign to the Fedora Legacy product. If it is not a security issue and hasn't been resolved in the current FC5 updates or in the FC6 test release, reopen and change the version to match. Thank you!
Yep, it's still a problem. Here are the current versions of my PDF viewers:

xpdf-3.01-12.1
gpdf-2.8.2-4.2
kdegraphics-3.5.3-0.2.fc5 (kpdf)

All 3 pdf readers suffer from the same problem:

- Search for "hello" - finds it
- Search for "world" - finds it
- Search for "hello_world" - doesn't find it
- Search for "_" - doesn't find it.
The problem is that teTeX draws the underscore as a graphic rather than as a character from the font, so one cannot search for the underscore directly. Note that even pdftotext outputs a space character instead of an underscore, so the underscore is not visible to other PDF utilities either.
I have an update on this issue. Quoting Karl Berry from the Mac-Tex user group:

> the TeX engine generates a weird graphic rather than using
> the underscore character

You are correct about that (except I wouldn't call it "weird"). The standard definition of \_ is

    \def\_{\leavevmode \kern.06em \vbox{\hrule width.3em}}

> (maybe for a good reason).

Yes, the reason is that it would have been crazy for Knuth to waste a precious slot in the original 1980s fonts (limited to 128 chars) on a character that could perfectly well be created by a rule.

The answer is, don't use \_. Instead, put your address in \tt and use the actual _ character. In plain TeX:

    $<${\tt first\char`\_last}$>$

Then the _ will be pastable (and the output will look better, too).

I'm not sure if you're using LaTeX. If you are, and you load url or hyperref, you'll have a command \url that will let you type it without the extra \char sequence:

    \url{first_last}

(And you'll get better line breaking behavior, too.) Of course a personal definition could be made to do the same thing with plain.

Similar things could be done with other fonts that provide an _ character if you want something other than typewriter, but I don't have recipes at hand. Almost everything besides Knuth's original cm* fonts does have an _ character.

If you feel like reposting this to any of the bug systems, feel free. Hope this helps.

------

I have a recipe, as mentioned above, for using something other than the default CM font. This recipe was contributed by Herb Schulz, also from MacTex support:

Try using the Latin Modern font with T1 encoding; that font is an updated design of CM with more characters and built-in, rather than constructed, accented characters. To use the Latin Modern font with T1 encoding add the lines

    \usepackage{lmodern}
    \usepackage[T1]{fontenc}
    \usepackage{textcomp}

to your preamble.

------
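For reference, here is a minimal sketch of the original test file with Herb Schulz's three preamble lines added. It rests on one assumption beyond what is quoted above: that under T1 encoding LaTeX maps \_ to a real underscore glyph in the font instead of building it from a rule. It has not been checked against the teTeX versions named in this report.

```latex
% file.tex: the original test case plus the Latin Modern / T1 recipe.
\documentclass[12pt]{article}
\usepackage{lmodern}      % Latin Modern, an updated design of CM
\usepackage[T1]{fontenc}  % T1 encoding, assumed to provide a real '_' glyph
\usepackage{textcomp}
\begin{document}
% Same text as the original report; \_ should now come from the font.
hello\_world.
\end{document}
```

Run "pdflatex file.tex" and repeat the search for "hello_world" in xpdf, or check whether "pdftotext file.pdf" now shows an underscore instead of a space. If the search still fails, the \url{hello_world} approach Karl Berry describes is the other thing to try.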