Bug 1738063

Summary: officeparser depends on Python 2
Product: [Fedora] Fedora
Reporter: Lumír Balhar <lbalhar>
Component: officeparser
Assignee: Michal Ambroz <rebus>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 32
CC: mhroncok, pviktori, rebus
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: officeparser-0.20180820-1.fc32
Last Closed: 2021-04-26 22:15:45 UTC
Type: Bug
Bug Blocks: 1698500

Description Lumír Balhar 2019-08-06 12:47:11 UTC
Python 2.7 will reach end-of-life in January 2020, over 9 years after it was released. This falls within the Fedora 31 lifetime.
Packages that depend on Python 2 are being switched to Python 3 or removed from Fedora: https://fedoraproject.org/wiki/Changes/F31_Mass_Python_2_Package_Removal#Information_on_Remaining_Packages
Python 2 will be retired in Fedora 32: https://fedoraproject.org/wiki/Changes/RetirePython2

To help planning, we'd like to know the plans for officeparser's future. Specifically:


- What is the reason for the Python2 dependency? (Is it software written in Python, or does it just provide Python bindings, or use Python in the build system or test runner?) 

- What are the upstream/community plans/timelines regarding Python 3?

- What is the guidance for porting to Python 3? (Assuming that there is someone who generally knows how to port to Python 3, but doesn't know anything about the particular package, what are the next steps to take?)


This bug is filed semi-automatically, and might not have all the context specific to officeparser.
If you need anything from us, or something is unclear, please mention it here.

Thank you.

Comment 1 Ben Cotton 2019-08-13 16:58:46 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 2 Ben Cotton 2019-08-13 17:44:49 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

Comment 3 Lumír Balhar 2019-08-14 09:01:18 UTC
Please answer the above questions. If you don't, the package can be orphaned: https://fedoraproject.org/wiki/Changes/F31_Mass_Python_2_Package_Removal#Information_on_Remaining_Packages

If you need any information or help, please let us know.

Comment 4 Lumír Balhar 2019-08-21 13:28:38 UTC
Please answer the above questions. If you don't, the package can be orphaned: https://fedoraproject.org/wiki/Changes/F31_Mass_Python_2_Package_Removal#Information_on_Remaining_Packages

If you need any information or help, or if you need some more time, please let us know.

Comment 5 Lumír Balhar 2019-08-29 05:31:20 UTC
Please answer the above questions. If you don't, the package can be orphaned: https://fedoraproject.org/wiki/Changes/F31_Mass_Python_2_Package_Removal#Information_on_Remaining_Packages

If you need any information or help, or if you need some more time, please let us know.

Comment 6 Miro Hrončok 2019-09-05 10:41:47 UTC
According to the procedure described in https://fedoraproject.org/wiki/Changes/F31_Mass_Python_2_Package_Removal#Information_on_Remaining_Packages the package has now been orphaned. If you think this was a mistake, you can provide the answers and claim the package back.

Let us know if you need any help or just need more time.

Comment 7 Michal Ambroz 2019-09-19 16:15:51 UTC
Hello,
I am sorry, I just learned about this ticket, and about the python2 cleanup in general. I need more time to check the options for officeparser.

>- What is the reason for the Python2 dependency? (Is it software written in Python, or does it just provide Python bindings, or use Python in the build system or test runner?) 
This parser is written in Python 2.


>- What are the upstream/community plans/timelines regarding Python 3?
I have asked this now in the upstream github repository https://github.com/unixfreak0037/officeparser/issues/18

>- What is the guidance for porting to Python 3? (Assuming that there is someone who generally knows how to port to Python 3, but doesn't know anything about the particular package, what are the next steps to take?)
The struct module is used to parse binary data from MS Office OLE documents.
As this is a single script, it should be quite straightforward to switch to Python 3.
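
To illustrate the kind of change such a port involves, here is a minimal sketch (not officeparser's actual code) of decoding the start of an OLE compound-file header with struct under Python 3. The helper name parse_ole_header_prefix and the dictionary keys are assumptions made for this example; the field layout follows the published compound file format. The main point is that the data has to be handled as bytes rather than str.

```python
import struct

# Signature that starts every OLE compound file (e.g. a Word 97-2003 .doc).
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")

# Little-endian: 8-byte signature, 16-byte CLSID, then four unsigned 16-bit
# fields (minor version, major version, byte order mark, sector shift).
HEADER_PREFIX = struct.Struct("<8s16sHHHH")


def parse_ole_header_prefix(data: bytes) -> dict:
    """Hypothetical helper: decode the first 32 bytes of an OLE header."""
    if len(data) < HEADER_PREFIX.size:
        raise ValueError("need at least %d bytes" % HEADER_PREFIX.size)
    magic, _clsid, minor, major, byte_order, sector_shift = (
        HEADER_PREFIX.unpack_from(data, 0)
    )
    # In Python 3 this compares bytes with bytes. Comparing against a str
    # literal would simply evaluate to False instead of matching.
    if magic != OLE_MAGIC:
        raise ValueError("not an OLE compound file")
    return {
        "minor_version": minor,
        "major_version": major,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,
    }


if __name__ == "__main__":
    # Synthetic header prefix built in memory; a real file would be opened
    # with mode "rb" so that read() returns bytes.
    sample = OLE_MAGIC + bytes(16) + struct.pack("<HHHH", 0x3E, 3, 0xFFFE, 9)
    print(parse_ole_header_prefix(sample))
```

Under Python 2 the same struct calls accepted str, which is why code that mixes text and binary data tends to be the part that needs attention when porting.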

Michal Ambroz

Comment 8 Lumír Balhar 2019-09-27 10:13:53 UTC
The upstream doesn't seem to be active. Do you plan to port this tool?

If you can provide some tests or describe how this tool works, I can help you with porting it to Python 3.

Comment 9 Lumír Balhar 2019-10-04 12:25:40 UTC
No response from upstream developers for more than two weeks. Do you have any plans? Could you provide any tests which might help with porting?

Comment 10 Lumír Balhar 2019-10-14 08:04:40 UTC
Could you please respond?

Comment 11 Michal Ambroz 2019-10-14 10:36:25 UTC
Hello,
> The upstream doesn't seem to be active. Do you plan to port this tool?
Yes I have plans to migrate this one to python3, just I need to prioritize the 54 packages I maintain currently in my free time you know. 

> If you can provide some tests or describe how this tool works, I can help you with porting it to Python 3.
Help would be appreciated if you have some spare capacity.
But I guess only incident responders and malware analysts have any use for this package, so I would not want to push you into doing it; you certainly have priorities based on the number of users as well.

This would be the simplest test:
1) libreoffice / New / Text Document 
2) write "blabla"
3) save as / test01.doc / format Word 97-2003

4) list the streams in the document
$ officeparser.py --print-streams test01.doc 
1: CompObj
2: Ole
3: 1Table
4: SummaryInformation
5: WordDocument
6: DocumentSummaryInformation

5) dump single stream
$ officeparser.py --dump-stream-by-name=WordDocument test01.doc |xxd -a

6) extract all the streams at once
$ officeparser.py --create-manifest --extract-streams test01.doc
=> will produce the files manifest stream_1_0.dat stream_2_0.dat stream_3_0.dat stream_4_0.dat stream_5_0.dat stream_6_0.dat

I guess if this works after the switch to python3, then the rest should work too.
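
Step 4 above could be automated roughly as follows. This is only a sketch: the helper names parse_stream_listing and list_streams are hypothetical, the script is assumed to be on PATH as officeparser.py, and the expected names are simply the ones from the listing above.

```python
import subprocess

# Stream names taken from the --print-streams listing shown above.
EXPECTED_STREAMS = [
    "CompObj",
    "Ole",
    "1Table",
    "SummaryInformation",
    "WordDocument",
    "DocumentSummaryInformation",
]


def parse_stream_listing(text: str) -> dict:
    """Turn lines such as '5: WordDocument' into {5: 'WordDocument'}."""
    streams = {}
    for line in text.splitlines():
        index, sep, name = line.partition(":")
        # Skip anything that is not of the form '<number>: <name>'.
        if sep and index.strip().isdigit():
            streams[int(index)] = name.strip()
    return streams


def list_streams(path: str, script: str = "officeparser.py") -> dict:
    """Run the script with --print-streams and parse its output.

    Assumes the script is executable and found on PATH.
    """
    result = subprocess.run(
        [script, "--print-streams", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_stream_listing(result.stdout)


def has_expected_streams(streams: dict) -> bool:
    """Check that every expected stream name appears in the listing."""
    return set(EXPECTED_STREAMS) <= set(streams.values())
```

Calling has_expected_streams(list_streams("test01.doc")) before and after the port would give a quick regression check for the listing step.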

Michal Ambroz

Comment 12 Petr Viktorin (pviktori) 2019-10-15 07:46:45 UTC
> Yes I have plans to migrate this one to python3, just I need to prioritize the 54 packages I maintain currently in my free time you know.

For prioritization: Please deal with **libewf** first, specifically its "fuse-python" dependency which is unmaintained and will be removed from the distro soon (this week?) unless it finds a maintainer.
We know next to nothing about fuse-python. What should happen with it? Who knows it and wants to maintain it in Fedora? Should we rescue it and maintain it for a few months, to give you more time?


A longer answer:

> Yes I have plans to migrate this one to python3, just I need to prioritize the 54 packages I maintain currently in my free time you know.

Yup, and maintaining isn't easy. We maintain Python 2, and we're tired of maintaining it, so we want to remove it. It seems no one else wants to maintain it instead of us. What do we do?
Would it work for you if we just removed it, and didn't bother you with all these warnings?
For most packagers it wouldn't work, so we want to remove Python 2 more gently than that. But we are overwhelmed as well -- there are now about 500 packages on Python 2 (468 at last count), most of which we know nothing about. We're not trying to attack you personally.

The main thing we are trying to do in *this* issue is to ask the questions, and "I need more time" is still a valid answer. Sorry we couldn't reach you earlier -- the Python 2 "witch hunt" has been going on since 2013 ( https://fedoraproject.org/wiki/Changes/Python_3_as_Default ), with changes announced regularly on the mailing lists that packagers should monitor, and Python 2 EOL being talked about in the wider Python community since 2008. If that didn't reach you, I don't know a better way to reach you than a bugzilla like this, sorry.


So, should we go with an exception for all your packages? If I see correctly, these are:

- officeparser (Python 2 only, no additional dependencies)
- python-volatility (Python 2 only, will need coordination with the maintainer of python2-crypto)
- libewf (Python 2 only, but depends on fuse-python, which is **orphaned** (!) -- there is nobody who wants to maintain python-fuse in Fedora, so it will be removed soon. AFAICS, all users moved to python-fuse. If you need python-fuse for libewf, let us know. Getting a Python 2 exception for it shouldn't be hard, but it needs a maintainer **soon**.)
- python-olefile, python-oletools - these have a Python 3 version. Do you need the Python 2 version as well?
- afflib, capstone, dionaea, openvas-manager, python-hexdump, python-impacket, python-pefile, python-pyev, python-smbpasswd, python-yara, scapy - these are Python3-only.

Again, I know next to nothing about these.

Comment 13 Michal Ambroz 2019-10-17 06:37:08 UTC
> For prioritization: Please deal with **libewf** first, specifically its "fuse-python" dependency which is unmaintained and will be removed from the distro soon (this week?) unless it finds a maintainer.
Thanks for the hint - I took over the ticket #1738945 now. I will follow-up there.

> So, should we go with an exception for all your packages?
Nah ... I am happy to get rid of the python2 sub-packages in most of the cases - where there exists python3 alternative.

>- officeparser
I propose to change the package now to split python2 and python3, and to not generate anything on f31+ and rhel8+ for the time being.
This way I can bring in the python3 subpackage once it is working, without polluting f31 with python2.
This is just one file with no dependencies, so it should be possible to migrate it to python3 easily.

>- libewf
I already have a working version without python2 - let's follow up on #1738945.

>- python-olefile, python-oletools - these have a Python 3 version. Do you need the Python 2 version as well?
No we do not need python2 version in f31+ ... I will change the conditions if python2 is still there.

> - afflib, capstone, dionaea, openvas-manager, python-hexdump, python-impacket, python-pefile, python-pyev, python-smbpasswd, python-yara, scapy - these are Python3-only
dionaea needs a rebuild with the re-added libemu, but should work after that (it was not using the python2 binding, just the library).
openvas-* needs a significant rebuild - I am working on that, as huzaifas cheese and other maintainers have gone silent on it.

> Would it work for you if we just removed it, and didn't bother you with all these warnings?
Do not get me wrong, I am glad for the warnings that the python2 sub-package is to go.
Just it was surprising to me to see also the base packages orphaned/deleted when bug says "please provide info" it is then followed by "package removed from fedora" like in libemu.
I can't count how many bugs I have raised and there was no solution to those - still the packages were not removed from the Fedora blamed on the maintainer non-responsiveness.

Personally I would prefer some less demotivating way than removing whole packages just because they have a python2 subpackage.
I would propose mass changing the remaining packages with conditional %bcond_with python2 which removes only the python2 subpackages/py2_build/py2_install/python2 files for f31+ and rhel8+.
This way the code base would remain for keeping old releases working and maintainable from the single spec in rawhide, keeps the binary non-python libraries in fedora and leaves room to come with the python3 subpackage once it is working.

> I don't know a better way to reach you than a bugzilla like this, sorry.
I acknowledge it is formally my fault not being able to read my Fedora lists/bugzilla all the time - but it is like 150+ emails each day.
I always thought that this is exactly the reason why we fill personal email and phone number in FAS register, so it can be used for contacting me personally in case somebody can't reach me in bugzilla.
Before removing the so-far working package from Fedora I would expect something like that to happen.
Am I really the only one having issues with sorting all the bugzilla and devel lists into one folder, or do you guys hear that from other packagers as well?


Best regards
Michal Ambroz

Comment 14 Petr Viktorin (pviktori) 2019-10-17 14:51:04 UTC
> Nah ... I am happy to get rid of the python2 sub-packages in most of the cases - where there exists python3 alternative.

I am very happy we're communicating now -- that list is exactly the info we need.


> Just it was surprising to me to see also the base packages orphaned/deleted when bug says "please provide info" it is then followed by "package removed from fedora" like in libemu.
> I can't count how many bugs I have raised and there was no solution to those - still the packages were not removed from the Fedora blamed on the maintainer non-responsiveness.

Sorry!
If the maintainer is silent, we know of no good way to separate useful packages from ones nobody cares about. Removing them and seeing if that becomes a problem is drastic, but keep in mind we already tried other options. (If you know something we didn't try, that would also work in practice, let us know!)


> I would propose mass changing the remaining packages with conditional %bcond_with python2 which removes only the python2 subpackages/py2_build/py2_install/python2 files for f31+ and rhel8+.

That's your packaging style, which I personally disagree with a bit -- I'd have distinct specfiles for Fedora and EL.
It also can't be automated, which is a problem when there are hundreds of packages.
Both issues illustrate the main problem: we can't fix the packages in a way that *all* the maintainers would like. Even disregarding personal preferences, we don't know what most of these packages do and what is best for them -- which is why we're trying to communicate with the maintainers.


> I always thought that this is exactly the reason why we fill personal email and phone number in FAS register, so it can be used for contacting me personally in case somebody can't reach me in bugzilla.

I believe those contacts are for worse emergencies, and also, how far does this go? Some people will want us to look at the e-mail in changelog messages or Git commits :/
IMO, if you aren't reachable in Bugzilla, you're not maintaining packages well.
My Bugzilla workflow is:
- If I don't have time for a particular bugzilla, reply that "I won't have time for this; if you do, please send a pull request or help co-maintaining the package".
- If I'm getting so many bugzillas I can't even *read* them, then I'm not maintaining the packages, and I should orphan or retire them.
(Which is not to say that I'm perfect, but if mistakes are made, they can be undone. It's just more work.)


> Am I really the only one having issues with sorting all the bugzilla and devel lists into one folder, or do you guys hear that from other packagers as well?

There aren't many people this late in the process, but you're not alone.

A very frustrating thing from our side is that there are so, so many people that just don't care about Fedora anymore, and leave their packages rotting, buggy and useless, while other maintainers do useless work maintaining their dependencies. We'd like to help people who care but don't have enough time now, or who care but are missing notifications, but it's hard to identify them.


Anyway, we're communicating now!
We'll try to watch out for your packages, but we have automated processes which are quite drastic at this point. If more trouble happens, please don't take it personally and notify us so we can undo it.

Comment 15 Lumír Balhar 2019-10-24 10:54:18 UTC
I've managed to port officeparser. PR: https://github.com/unixfreak0037/officeparser/pull/19

Could you please test it and provide some feedback to the PR? If the upstream admin won't merge it, you can use it as a downstream patch.

Comment 16 Michal Ambroz 2019-10-25 18:24:10 UTC
Thank you very much for the help, it is cool.
I am adding one more patch for xrange, one more for the "to_hex" output where the usage of binary strings would clash, and one small cosmetic patch to get rid of the annoying error about not passing a filename when officeparser is executed without parameters.
Still testing.
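
The patches themselves are not reproduced in this report. As a rough illustration of the two Python 3 issues mentioned (xrange no longer exists, and iterating over binary strings behaves differently), a sketch with the hypothetical helpers bytes_to_hex and chunk_offsets could look like this:

```python
def bytes_to_hex(data: bytes) -> str:
    """Hypothetical helper: render binary data as space-separated hex pairs.

    Python 2 code typically wrote "%02x" % ord(c) for each character c.
    In Python 3, iterating over bytes already yields integers, so calling
    ord() on each item raises TypeError.
    """
    return " ".join("%02x" % byte for byte in data)


def chunk_offsets(length: int, size: int) -> list:
    """Hypothetical helper: start offsets of fixed-size chunks.

    Python 2 code would usually use xrange() here; in Python 3 the lazy
    equivalent is simply range().
    """
    return list(range(0, length, size))


if __name__ == "__main__":
    print(bytes_to_hex(b"\xd0\xcf\x11\xe0"))
    print(chunk_offsets(10, 4))
```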

Comment 17 Ben Cotton 2020-02-11 15:47:48 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle.
Changing version to 32.