Description of problem:
readpst -r gives an output file a name that does not match its content type: Recoverable Items/Calendar Logging/mbox

When extracting the folders recursively with:

readpst -w -o /tmp/readpst//root -r -D /root/mailtst.pst

the result has a mismatch between the name and the type at Recoverable Items/Calendar Logging/mbox

Version-Release number of selected component (if applicable):
0.6.67

How reproducible:
100%

Steps to Reproduce:
readpst -w -o /tmp/readpst//root -r -D /root/mailtst.pst

This particular source file was not easy to find (my previous test files didn't contain this folder). It is huge and contains private information, so I can't send it, but I can run whatever you need to debug and test the problem.

Actual results:
The result has a mismatch between the name and the type at Recoverable Items/Calendar Logging/mbox:

# find /tmp/readpst/root/mailtst/ -type f -print0 | xargs -0 file
...
/tmp/readpst/root/mailtst/Recoverable Items/Calendar Logging/mbox: vCalendar calendar file
/tmp/readpst/root/mailtst/Recoverable Items/Deletions/mbox: UTF-8 Unicode mail text, with very long lines
...
/tmp/readpst/root/mailtst/Календарь/calendar: vCalendar calendar file
...

Expected results:
The file is expected to be named "calendar".

Additional info:
This prevents the output from being used correctly as a mail_location for dovecot, as in:

doveadm -Dv sync -u USER -1 -R mbox:/tmp/readpst//root/mailtst:UTF-8:DIRNAME=mbox:INDEX=/tmp/readpst//root/mailtst/.dovecot-index

My mail to the libpst maintainer (carl at five-ten-sg etc) gets rejected. Perhaps the mail from bugzilla will somehow reach him...
I must not have explained the problem clearly enough: the types reported by file are correct. (I have looked inside the file, and it does contain calendar cards, not mail.)

/tmp/readpst/root/mailtst/Recoverable Items/Calendar Logging/mbox: vCalendar calendar file

It is the name chosen by readpst -r that is wrong. The file is named "mbox", but that does not correspond to its actual type. It should be named "calendar".
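For anyone who wants to look for the same mismatch in their own output tree, here is a minimal sketch. It assumes that a well-formed mbox file starts with a "From " separator line, and it reports every non-empty file named "mbox" that does not. The function name check_mbox_names is made up for this example and is not part of libpst; file names containing newlines are not handled.

```sh
#!/bin/sh
# Hypothetical helper (not part of libpst): print every non-empty file named
# "mbox" under the tree "$1" whose first line is not an mbox "From " separator.
check_mbox_names() {
    find "$1" -type f -name mbox | while IFS= read -r f; do
        # An empty mbox is a valid (empty) mailbox, so skip it.
        [ -s "$f" ] || continue
        first=$(head -n 1 "$f")
        case "$first" in
            "From "*) ;;                 # looks like mail, nothing to report
            *) printf '%s\n' "$f" ;;     # e.g. a file starting with BEGIN:VCALENDAR
        esac
    done
}
```

It could be called as check_mbox_names /tmp/readpst/root/mailtst; any path it prints is a candidate for the naming problem described here.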
Created attachment 1193404
fgrep -1 'I have a' ../tmp/readpst/root/mailtst.readpst.log

> Can you re-run that readpst command with
>
> -d some.log.file.txt
>
> and then
>
> grep "I have" some.log.file.txt

I've done that, although the log file became really huge (27G).

I grepped for "I have a" rather than "I have" to avoid picking up extra content from the message bodies. (I've just checked the readpst.c source code, and this narrower pattern does not seem to miss any debugging message.) I've also included one line of context.
I now believe that the problem is a more general one: some folders contain items of several different types.

I've run readpst -e and see that in this specific case the first item parsed is probably not an email, while the folder type is probably undefined. The output file is therefore created as mbox, but then the type gets overridden and calendar items are saved there:

$ ls readpst/root.extensions/mailtst/Recoverable\ Items/Calendar\ Logging/ | head
1.ics
10.ics
100.eml
101.ics
102.eml
103.ics
104.eml
105.ics
106.ics
107.eml
$

This is not nice: something is named "mbox", but what is saved there is not well-formed email.

I believe this specific issue requires a fix.

More generally, I'm thinking about a solution where readpst -r would create several files in the same folder (mbox, calendar) in such cases. That would make it possible to save everything without mixing things up.

(The modes other than readpst -r are not very attractive for me, because they do not create mbox, the only output format understood completely by dovecot and hence by doveadm sync (for reading and importing whole user accounts). MH is not supported by dovecot.)

Here one can see all the folders together with the types of items they contain:

$ find readpst/root.extensions/ -type d -print -exec sh -c 'ls "{}" | fgrep . | cut -d. --fields=2 | sort -u' ';'
readpst/root.extensions/
readpst/root.extensions/mailtst
readpst/root.extensions/mailtst/Заметки
readpst/root.extensions/mailtst/МСЭД
eml
readpst/root.extensions/mailtst/Входящие
eml
readpst/root.extensions/mailtst/Входящие/103
eml
readpst/root.extensions/mailtst/Входящие/Миграция ADEX
eml
readpst/root.extensions/mailtst/Входящие/Оперативка
eml
readpst/root.extensions/mailtst/Входящие/СПО - ФСТЭК
eml
readpst/root.extensions/mailtst/Входящие/СУТП
eml
readpst/root.extensions/mailtst/Входящие/Схемы ЛВС
eml
readpst/root.extensions/mailtst/Нежелательная почта
readpst/root.extensions/mailtst/ТТ на согласование
eml
readpst/root.extensions/mailtst/Junk
eml
readpst/root.extensions/mailtst/Задачи
readpst/root.extensions/mailtst/Прочитать и дать ответ!
eml
readpst/root.extensions/mailtst/Прочитать и дать ответ!/Что-то важное
eml
readpst/root.extensions/mailtst/Отправленные
eml
readpst/root.extensions/mailtst/aрхив
eml
readpst/root.extensions/mailtst/Контакты
vcf
readpst/root.extensions/mailtst/Контакты/Recipient Cache
vcf
readpst/root.extensions/mailtst/Черновики
readpst/root.extensions/mailtst/Календарь
ics
readpst/root.extensions/mailtst/Sent
eml
readpst/root.extensions/mailtst/Удаленные
eml
ics
readpst/root.extensions/mailtst/ЕКП
eml
readpst/root.extensions/mailtst/Recoverable Items
readpst/root.extensions/mailtst/Recoverable Items/Deletions
eml
ics
readpst/root.extensions/mailtst/Recoverable Items/Calendar Logging
eml
ics
readpst/root.extensions/mailtst/Предлагаемые контакты
vcf
readpst/root.extensions/mailtst/Ошибки синхронизации
readpst/root.extensions/mailtst/Ошибки синхронизации/Конфликты
eml
$

So the problematic folders, the ones with more than one item type, are not ordinary ones:

* Удаленные (means "Deleted" in Russian)
* Recoverable Items/Deletions
* Recoverable Items/Calendar Logging
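The listing above has to be read by eye to spot the mixed folders. A minimal sketch that prints only the folders holding more than one item type is below. It assumes a tree produced by readpst -e, where each item is a regular file with an extension such as .eml, .ics or .vcf. The function name mixed_type_folders is made up for this example, -maxdepth is a GNU/BSD find extension, and file names containing newlines are not handled.

```sh
#!/bin/sh
# Hypothetical helper: for each directory under "$1", collect the distinct
# extensions of the regular files directly inside it, and print the directory
# only if it holds more than one extension.
mixed_type_folders() {
    find "$1" -type d | while IFS= read -r d; do
        # Take the text after the last dot of each file name, deduplicated.
        exts=$(find "$d" -maxdepth 1 -type f -name '*.*' \
               | sed 's/.*\.//' | sort -u | tr '\n' ' ')
        n=$(printf '%s' "$exts" | wc -w)
        if [ "$n" -gt 1 ]; then
            printf '%s: %s\n' "$d" "${exts% }"
        fi
    done
}
```

Unlike the ls-based one-liner, this looks only at regular files, so a subfolder whose name contains a dot is not counted as an item type.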
Created attachment 1193451
libpst-no-bad-mboxes.patch

I have fixed the problem with bad mboxes in the attached patch.

A related question that came up: in Thunderbird mode, is it OK that zero is written to .type for such folders? (I have not changed this.)

I checked the result by comparing it with the previous output (a big account, around 6G):

-bash-4.3$ diff -rq --exclude='*.log' -Iboundary -Iname -IDTSTAMP root root.nobad
Files root/mailtst/Recoverable Items/Calendar Logging/mbox and root.nobad/mailtst/Recoverable Items/Calendar Logging/mbox differ
-bash-4.3$ find root.nobad/ -name mbox -print0 | xargs -0 file
root.nobad/mailtst/МСЭД/mbox: ISO-8859 mail text, with very long lines
root.nobad/mailtst/Входящие/Оперативка/mbox: UTF-8 Unicode mail text
root.nobad/mailtst/Входящие/Схемы ЛВС/mbox: Non-ISO extended-ASCII mail text, with very long lines
root.nobad/mailtst/Входящие/103/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Входящие/mbox: ISO-8859 mail text, with very long lines
root.nobad/mailtst/Входящие/СПО - ФСТЭК/mbox: Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/Входящие/Миграция ADEX/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Входящие/СУТП/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/ТТ на согласование/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Junk/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Прочитать и дать ответ!/Что-то важное/mbox: Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/Прочитать и дать ответ!/mbox: Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/Отправленные/mbox: UTF-8 Unicode HTML document text, with very long lines
root.nobad/mailtst/aрхив/mbox: Non-ISO extended-ASCII mail text, with very long lines
root.nobad/mailtst/Sent/mbox: UTF-8 Unicode text
root.nobad/mailtst/Удаленные/mbox: Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/ЕКП/mbox: UTF-8 Unicode mail text
root.nobad/mailtst/Recoverable Items/Deletions/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Recoverable Items/Calendar Logging/mbox: UTF-8 Unicode mail text
root.nobad/mailtst/Ошибки синхронизации/Конфликты/mbox: UTF-8 Unicode mail text
-bash-4.3$

Everything is fine: the only file that differs is the Calendar Logging mbox, and file now reports it as mail text.

BTW, the previous output seems to have saved the calendars to "root/mailtst/Recoverable Items/Calendar Logging/mbox"; those same calendars are also present as attachments in the new, correct output mbox.
http://git.altlinux.org/gears/l/libpst.git?p=libpst.git;a=commitdiff;h=e063160b07910ab1cbdf6eb57562201fad4a2068
fixed.
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle. Changing version to '27'.
fixed in 0.6.69