Bug 2215894

Summary: [abrt] python3-pdfminer: main(): pdf2txt:312:main:BrokenPipeError: [Errno 32] Broken pipe
Product: [Fedora] Fedora Reporter: Mihai Lazarescu <mihai>
Component: python-pdfminer Assignee: Ben Beasley <code>
Status: ASSIGNED --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 38 CC: code, mihai, python-packagers-sig
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
URL: https://retrace.fedoraproject.org/faf/reports/bthash/ccc1af0ff2970d42381ff99928011706d055faf
Whiteboard: abrt_hash:1a0d47567aafb3fafceff8663d8fc5642b7ffde8;VARIANT_ID=workstation;
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description        Flags
File: os_info      none
File: mountinfo    none
File: open_fds     none
File: namespaces   none
File: cpuinfo      none
File: backtrace    none
File: environ      none

Description Mihai Lazarescu 2023-06-19 10:43:07 UTC
Version-Release number of selected component:
python3-pdfminer-20221105-2.fc38

Additional info:
reporter:       libreport-2.17.10
kernel:         6.2.15-300.fc38.x86_64
cgroup:         0::/user.slice/user-1000.slice/user/app.slice/vte-spawn-b83e0764-7757-4222-adb9-2dcc1c72c98f.scope
uid:            1000
reason:         pdf2txt:312:main:BrokenPipeError: [Errno 32] Broken pipe
executable:     /usr/bin/pdf2txt
type:           Python3
package:        python3-pdfminer-20221105-2.fc38
runlevel:       N 5
exception_type: BrokenPipeError
crash_function: main
interpreter:    python3-3.11.3-2.fc38.x86_64
cmdline:        /usr/bin/python3 -sP /usr/bin/pdf2txt /home/user/latex/didactica/cursuri/elettronica-applicata/mtl-ttpu/exams/20230615/reports/graded/U09002_1622966.pdf

Truncated backtrace:
pdf2txt:312:main:BrokenPipeError: [Errno 32] Broken pipe

Traceback (most recent call last):
  File "/usr/bin/pdf2txt", line 317, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/bin/pdf2txt", line 312, in main
    outfp.close()
BrokenPipeError: [Errno 32] Broken pipe

Local variables in innermost frame:
args: None
parsed_args: Namespace(files=['/home/user/latex/didactica/cursuri/elettronica-applicata/mtl-ttpu/exams/20230615/reports/graded/U09002_1622966.pdf'], debug=False, disable_caching=False, page_numbers=None, pagenos=None, maxpages=0, password='', rotation=0, no_laparams=False, detect_vertical=False, line_overlap=0.5, char_margin=2.0, word_margin=0.1, line_margin=0.5, boxes_flow=0.5, all_texts=False, outfile='-', output_type='text', codec='utf-8', output_dir=None, layoutmode='normal', scale=1.0, strip_control=False, laparams=<LAParams: char_margin=2.0, line_margin=0.5, word_margin=0.1 all_texts=False>)
outfp: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>

Comment 1 Mihai Lazarescu 2023-06-19 10:43:11 UTC
Created attachment 1971536
File: os_info

Comment 2 Mihai Lazarescu 2023-06-19 10:43:12 UTC
Created attachment 1971537
File: mountinfo

Comment 3 Mihai Lazarescu 2023-06-19 10:43:14 UTC
Created attachment 1971538
File: open_fds

Comment 4 Mihai Lazarescu 2023-06-19 10:43:16 UTC
Created attachment 1971539
File: namespaces

Comment 5 Mihai Lazarescu 2023-06-19 10:43:17 UTC
Created attachment 1971540
File: cpuinfo

Comment 6 Mihai Lazarescu 2023-06-19 10:43:19 UTC
Created attachment 1971541
File: backtrace

Comment 7 Mihai Lazarescu 2023-06-19 10:43:21 UTC
Created attachment 1971542
File: environ

Comment 8 Ben Beasley 2023-06-19 15:03:41 UTC
This happens when the output of pdf2txt or dumppdf is directed to a pipe, but the pipe reader closes the pipe before the command has written the complete output (for example, because the pipe reader is the head command). I asked upstream if they wanted to handle this more cleanly in https://github.com/pdfminer/pdfminer.six/issues/875, but it’s been about seven months since the last upstream activity.
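
For illustration only, here is a minimal sketch of how a Python command-line tool could handle an early-closing pipe reader without a traceback. It is not pdf2txt's code and not a patch proposed upstream; the names write_text, silence_stdout, and main are hypothetical. The idea is to flush inside a try block so the error surfaces there instead of in close(), and then to redirect stdout to os.devnull so the interpreter's flush at exit cannot fail again, along the lines of the SIGPIPE note in the Python signal module documentation.

```python
import os
import sys


def write_text(lines, outfp):
    """Write lines to outfp; return False if the pipe reader exited early."""
    try:
        for line in lines:
            outfp.write(line + "\n")
        # Flush here so a pending EPIPE surfaces inside this try block
        # rather than later in close() or at interpreter shutdown.
        outfp.flush()
        return True
    except BrokenPipeError:
        return False


def silence_stdout():
    """Point stdout's file descriptor at os.devnull.

    Python flushes the standard streams on exit; after a broken pipe that
    flush would raise again, so the remaining output is discarded instead.
    """
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())


def main(lines):
    # Hypothetical entry point; the real pdf2txt main() is structured differently.
    if not write_text(lines, sys.stdout):
        silence_stdout()
        return 1  # assumption: report the truncated output as a failure
    return 0
```

A script that ends with sys.exit(main(...)) and is piped into head would then be expected to exit quietly once head closes the pipe. Whether the exit status in that case should be nonzero is a design choice left open here.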

This duplicates an older report, bug 2186554, which you also reported. Since this bug report is public and the older one is private, I will mark the older bug as a duplicate of this one so the issue can be tracked more publicly.

Comment 9 Ben Beasley 2023-06-19 15:04:55 UTC
*** Bug 2186554 has been marked as a duplicate of this bug. ***

Comment 10 Ben Beasley 2023-06-28 13:25:07 UTC
*** Bug 2218093 has been marked as a duplicate of this bug. ***