Description of problem:
I was downloading the page https://www.randstad.nl/vacatures/1562519/medewerker-data-services, which is a job opening. The command was "httrack https://www.randstad.nl/vacatures/1562519/medewerker-data-services", without options.

Version-Release number of selected component:
httrack-3.48.17-1.fc20

Additional info:
reporter: libreport-2.2.3
backtrace_rating: 4
cmdline: httrack https://www.randstad.nl/vacatures/1562519/medewerker-data-services
crash_function: abortf_
executable: /usr/bin/httrack
kernel: 3.15.5-200.fc20.x86_64
runlevel: N 5
type: CCpp
uid: 1000

Truncated backtrace:
Thread no. 1 (10 frames)
 #2 abortf_ at htssafe.h:100
 #3 fil_normalized at htslib.c:3458
 #4 key_adrfil_hashes_generic at htshash.c:121
 #5 key_adrfil_hashes at htshash.c:203
 #6 coucal_calc_hashes at coucal.c:482
 #7 coucal_fetch_value at coucal.c:1212
 #8 coucal_read_value at coucal.c:1218
 #9 coucal_read at coucal.c:1171
 #10 hash_read at htshash.c:304
 #11 hts_acceptlink_ at htswizard.c:153
Created attachment 921437 [details] File: backtrace
Created attachment 921438 [details] File: cgroup
Created attachment 921439 [details] File: core_backtrace
Created attachment 921440 [details] File: dso_list
Created attachment 921441 [details] File: environ
Created attachment 921442 [details] File: limits
Created attachment 921443 [details] File: maps
Created attachment 921444 [details] File: open_fds
Created attachment 921445 [details] File: proc_pid_status
Created attachment 921446 [details] File: var_log_messages
Can't reproduce the issue with the given URL - do you get the crash each time you attempt to crawl this URL?
I just reproduced it. Here is a copy of the output:

[gbonnema@mahatma 2-site-httrack]$ httrack https://www.randstad.nl/vacatures/1562519/medewerker-data-services
Mirror launched on Mon, 28 Jul 2014 21:08:17 by HTTrack Website Copier/3.48-17 [XR&CO'2014]
mirroring https://www.randstad.nl/vacatures/1562519/medewerker-data-services with the wizard help..
strlen(copyBuff) == qLen failed at htslib.c:3458edewerker-data-services (22995 bytes) - OK
Aborted (core dumped)
[gbonnema@mahatma 2-site-httrack]$

I include a copy of the hts-log.txt:

HTTrack3.48-17 launched on Mon, 28 Jul 2014 21:08:17 at https://www.randstad.nl/vacatures/1562519/medewerker-data-services
(httrack https://www.randstad.nl/vacatures/1562519/medewerker-data-services )

Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private

21:08:18 Warning: Note: due to https://www.randstad.nl remote robots.txt rules, links beginning with these path will be forbidden: /vacatures?*, /werknemers/intern/, /klm/, /mwp2/faces/confidential/inschrijven.jspx, /ldo/, *.pdf, *.doc, *.docx, *.xls, *.xlsx, *.ppt, *.pptx, /content-snippets/, /system/, /admin/, /roxen-files/, /ifar/, /werknemers/duurzame-inzetbaarheid-secure.html, /unilever (see in the options to disable this)

Let me know if this is enough info. I am perfectly willing to do other tests.

Kind regards, Guus.
Could it be caused by the "forbidden: /vacatures?*, ....." part on the last line?
I am running Fedora 20 (updated). The output from gdb as requested:

(gdb) set args https://www.randstad.nl/vacatures/1562519/medewerker-data-services
(gdb) run
Starting program: /usr/bin/httrack https://www.randstad.nl/vacatures/1562519/medewerker-data-services
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Mirror launched on Mon, 28 Jul 2014 21:38:56 by HTTrack Website Copier/3.48-17 [XR&CO'2014]
mirroring https://www.randstad.nl/vacatures/1562519/medewerker-data-services with the wizard help..
strlen(copyBuff) == qLen failed at htslib.c:3458edewerker-data-services (22995 bytes) - OK

Program received signal SIGABRT, Aborted.
0x00007ffff6b2dc39 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.9-1.fc20.x86_64 krb5-libs-1.11.5-5.fc20.x86_64 libcom_err-1.42.8-3.fc20.x86_64 nss-mdns-0.10-13.fc20.x86_64 openssl-libs-1.0.1e-38.fc20.x86_64 zlib-1.2.8-3.fc20.x86_64
(gdb) up 3
#3 0x00007ffff7b83c18 in fil_normalized (source=source@entry=0x7ffffffe00b0 "/intent/tweet?url=https%3A%2F%2Fwww.randstad.nl%2Fvacatures%2F1562519%2Fmedewerker-data-services&text=Medewerker+Data+Services&via=randstadnl", dest=0x7ffffffe729f "/intent/tweet") at htslib.c:3458
3458 assertf(strlen(copyBuff) == qLen);
(gdb) set print elements 8000
(gdb) p source
$1 = 0x7ffffffe00b0 "/intent/tweet?url=https%3A%2F%2Fwww.randstad.nl%2Fvacatures%2F1562519%2Fmedewerker-data-services&text=Medewerker+Data+Services&via=randstadnl"
(gdb) p copyBuff
$2 = 0x67d510 "?text=Medewerker+Data+Services&url=https%3A%2F%2Fwww.randstad.nl%2Fvac&via=randstadnl"
(gdb) p amps[0]
$3 = 0x7ffffffe72ff ""
(gdb) p amps[1]
$4 = 0x7ffffffe72ac ""
(gdb) p amps[2]
$5 = 0x7ffffffe731d ""
(gdb)

Hope this helps, kind regards, Guus.
I still had the gdb session open, so np:

(gdb) p amps[0]+1
$6 = 0x7ffffffe7300 "text=Medewerker+Data+Services"
(gdb) p amps[1]+1
$7 = 0x7ffffffe72ad "url=https%3A%2F%2Fwww.randstad.nl%2Fvacatures%2F1562519%2Fmedewerker-data-services"
(gdb) p amps[2]+1
$8 = 0x7ffffffe731e "via=randstadnl"
(gdb)

Regards, Guus.
Can you tell me which gcc version is in use? It seems that the second string hasn't been copied correctly (!) and I'm still scratching my head to find out why :)
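For readers following the gdb output: the session suggests that fil_normalized rebuilds the query string with its arguments sorted (url, text, via becomes text, url, via) and then asserts that the rebuilt string is as long as the original. The block below is only a minimal sketch of that idea, not HTTrack's code; the names `sort_query_args` and `cmp_args` are hypothetical, and the real function does more than this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_args(const void *a, const void *b) {
  return strcmp(*(char *const *) a, *(char *const *) b);
}

/* Hypothetical sketch, NOT HTTrack's fil_normalized(): returns a newly
 * allocated copy of query (which must start with '?') with its
 * '&'-separated arguments sorted, or NULL on allocation failure.
 * The caller frees the result. */
char *sort_query_args(const char *query) {
  const size_t q_len = strlen(query);
  size_t count = 1, n = 0, pos = 0, i;
  char *work, *out, **args;

  assert(query[0] == '?');

  /* One argument, plus one more per '&' separator. */
  for (i = 1; i < q_len; i++) {
    if (query[i] == '&') count++;
  }

  work = malloc(q_len);                 /* query without '?', plus NUL */
  args = malloc(count * sizeof(*args));
  out = malloc(q_len + 1);
  if (work == NULL || args == NULL || out == NULL) {
    free(work);
    free(args);
    free(out);
    return NULL;
  }
  memcpy(work, query + 1, q_len);       /* q_len - 1 chars and the NUL */

  /* Split in place: each '&' becomes a terminator. */
  args[n++] = work;
  for (i = 0; i + 1 < q_len; i++) {
    if (work[i] == '&') {
      work[i] = '\0';
      args[n++] = &work[i + 1];
    }
  }

  qsort(args, count, sizeof(*args), cmp_args);

  /* Rebuild "?arg&arg&..." from the sorted arguments. */
  out[pos++] = '?';
  for (i = 0; i < count; i++) {
    const size_t len = strlen(args[i]);
    if (i > 0) out[pos++] = '&';
    memcpy(out + pos, args[i], len);
    pos += len;
  }
  out[pos] = '\0';

  /* Sorting only reorders arguments, so the length must be unchanged.
   * This mirrors the strlen(copyBuff) == qLen check seen in the backtrace. */
  assert(pos == q_len);

  free(args);
  free(work);
  return out;
}
```

With this sketch, a shortened argument such as the truncated url value shown in copyBuff would make the final length check fail, which is the kind of failure the assertion in the report caught.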
[gbonnema@mahatma ~]$ gcc --version
gcc (GCC) 4.8.3 20140624 (Red Hat 4.8.3-1)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[gbonnema@mahatma ~]$
TBH I do not know which gcc version was used to build the Fedora 20 httrack package, because I probably just got the binary version of httrack, as Fedora packages it.
Ahah! I can reproduce the issue with this GCC release!!!
Good show! I will log out in a minute, unless you have more tests to do?
No, thank you - I should be able to understand a bit more what's going on from now. Thanks again for your precious help!
Issue spotted and fixed inside src/htssafe.h. strncat(A, B, (size_t) -1) is NOT safe at all, and does not appear to behave like strcat(A, B), apparently because of the optimized version of strncat.
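To illustrate the point for readers: a minimal sketch, assuming a hypothetical helper named `bounded_append` (this is not the code in src/htssafe.h nor the attached patch), of appending with a limit derived from the real destination capacity instead of passing `(size_t) -1` to strncat.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only; not the code from htssafe.h.
 * Appends src to the NUL-terminated string in dest, whose total capacity
 * (including the terminating NUL) is dest_size bytes.
 * Returns 0 on success, -1 if the result would not fit (dest is unchanged). */
int bounded_append(char *dest, size_t dest_size, const char *src) {
  size_t dest_len, src_len;

  assert(dest != NULL && src != NULL && dest_size > 0);

  dest_len = strlen(dest);
  src_len = strlen(src);

  /* Refuse rather than truncate: a shortened copy is what tripped the
   * strlen(copyBuff) == qLen assertion in this report. */
  if (dest_len >= dest_size || src_len >= dest_size - dest_len) {
    return -1;
  }

  /* Copy the bytes plus the terminating NUL. The length is the real one,
   * never a "no limit" sentinel such as (size_t) -1. */
  memcpy(dest + dest_len, src, src_len + 1);
  return 0;
}
```

The design choice in this sketch is to compute both lengths explicitly and use memcpy, so the result does not depend on how a particular strncat implementation treats a huge length argument.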
Created attachment 921928 [details] Patch to fix the issue
Created attachment 921942 [details] Final patch for htssafe.h
Fixed in 3.48.19 (http://mirror.httrack.com/historical/httrack-3.48.19.tar.gz)
httrack-3.48.19-1.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/httrack-3.48.19-1.fc20
Package httrack-3.48.19-1.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing httrack-3.48.19-1.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-9199/httrack-3.48.19-1.fc20
then log in and leave karma (feedback).
Package httrack-3.48.19-2.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing httrack-3.48.19-2.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-9199/httrack-3.48.19-2.fc20
then log in and leave karma (feedback).
httrack-3.48.19-2.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
Probable upstream bug in glibc, reported as Bug 17279 (https://sourceware.org/bugzilla/show_bug.cgi?id=17279)