Bug 465243

Summary: rsync-friendly deflate in zlib
Product: [Fedora] Fedora Reporter: Wojciech Pilorz <wpilorz>
Component: zlibAssignee: Ivana Varekova <varekova>
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: 10CC: egmont, varekova, wpilorz
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-04-21 08:49:26 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
patches to add 1. rsyncable support for zlib 2. minizip binaries 3. -q for minizip
none
Slightly modified patch adding 1. rsyncable support to zlib; 2. minizip binaries; 3. -q for minizip
none
new patch adding rsyncable support to zlib none

Description Wojciech Pilorz 2008-10-02 12:15:05 UTC
Description of problem:
zlib, as installed on Fedora 8, Fedora 9 or Fedora 10beta, does not provide
rsync-friendly compression (like that given by gzip --rsyncable)
I have applied slightly modified patches from UHU-linux to add the possibility 
of rsync-friendly deflate compression to zlib (and minizip).
The proposed patches also cause minizip and miniunzip binaries
to be included to minizip binary RPM package

Version-Release number of selected component (if applicable):
zlib-1.2.3-14.fc8, zlib-1.2.3-18.fc9

How reproducible:
N/A

Steps to Reproduce:
N/A
  
Actual results:
N/A

Expected results:
N/A

Additional info:
Please note that rsync-friendly means also xdelta-friendly.
If zlib with rsync patches is used, code modifications in deflate
to generate rsync-friendly output are used if environment variable
ZLIB_RSYNC is defined to an non-empty string different from "0"

This is needed for example:
- for backup software using zlib, if we need results to be transferred
  with rsync+ssh over (relatively) slow link
- if we need to generate ZIP format files and be able to get
  smaller deltas (rsync or xdelta)

Please note that while minizip is fairly simplistic compared to zip,
it can generate rsync-friendly ZIP archived, while current version
of zip cannot.
Some limitations of minizip can be worked around using xargs, for example
find find_spec -type f -print0 | env LANG=C sort -z | env ZLIB_RSYNC=1 xargs -r0 minizip -a destfile.zip

Please consider including the zlib patches I will attach for all fedora and enterprise linux versions.

Comment 1 Wojciech Pilorz 2008-10-02 12:52:08 UTC
Created attachment 319225 [details]
patches to add 1. rsyncable support for zlib 2. minizip binaries 3. -q for minizip

I have rebuilt successfully, installed and tested resulting 
zlib-1.2.3-21.fc8 and minizip-1.2.3-21.fc8
on fully-patched Fedora8 system (Pentium 4 machine).
The base for patches was zlib-1.2.3-18.fc9.src.rpm
(which is identical to Fedora10 beta zlib src.rpm)

Files included in zlib_123_20_patch_spec.tgz :
zlib.spec is spec file
zlib.spec.diff is diff from zlib.spec in zlib-1.2.3-18.fc9.src.rpm
zlib-1.2.3-rsyncable.patch - adds rsyncable support to zlib - details in zlib.h
zlib-1.2.3-quietminizip.patch - adds -q to minizip and miniunzip

rsyncable support is turned on if ZLIB_RSYNC=1
environment variable is set when deflateInit (or eqv.) is called for the first time.

60cab76c1c223860a5d9d6456c1c3cdde1afb62b *zlib_123_20_patch_spec.tgz
f2a15c5ccc46be8f93f163a0498cd79ab2f92d8e *zlib-1.2.3-quietminizip.patch
35959e640a79629a90ff367f0ce87ef028daaa4d *zlib-1.2.3-rsyncable.patch
a286de2a1a2059f5e4fb2cc2bd446fbba67a203e *zlib.spec
32e513a05a19f91ad42cc42fa531cc0019a34204 *zlib.spec.diff

zlib-1.2.3-rsyncable.patch is adapted from
https://svn.uhulinux.hu/packages/dev/zlib/patches/02-rsync.patch
and
https://svn.uhulinux.hu/packages/dev/zlib/patches/03-rsync-from-env.patch

Getting option from environment allows that public interface of zlib
is not changed and that the feature can be used without changing
software using zlib.

I have modified the 03* patch so that it does not rely on gcc specific feature,
in the hope that it would be easier to be accepted to zlib upstream.

Also, please note that the added benefit is that you can call
setenv("ZLIB_RSYNC", "1", 1)
is program before first deflate* call

Comment 2 Ivana Varekova 2008-10-06 11:01:41 UTC
Hello,
I think it will be best to add  rsync-friendly support to upstream version (as you mention at the end of comment #1), so please have you sent the patches there too?

Comment 3 Wojciech Pilorz 2008-10-06 20:25:56 UTC
I have not sent them the patches yet.
I assumed that it would be easier to them accepted upstream once there is some use cases in the wild (and even that might be hard; gzip has --rsyncable in several Linux distribution for years, and it is not in GNU/Savannah gzip alpha or CVS ...)
So I thought first to try to push the patches to Fedora and friends.

Do you suggest me to send the patches to the zlib team right now?

Comment 4 Egmont Koblinger 2008-10-06 23:04:51 UTC
Hi folks,

I've taken another look at the old 02-rsync.patch (the one we've found somewhere on the net, created by Rusty Russell IIRC) and 03-rsync-from-env.patch (created by me). I also looked at Wojtek's patch.

Some technical notes:

The gcc-specific hack initialized the library when it was loaded; Wojtek's solution initializes it when it is first used.

The original patch introduced the external symbol zlib_rsync which enables applications to control the behavior of the library. Wojtek's patch introduces zlib_rsync_set as another external symbol. (External means that it pollutes the global namespace, no other library or the main application itself can use this symbol for other purposes, but they can always easily read or write this variable - "nm libz.so" prints an uppercase letter for this symbol.) If an application wants to explicitly control the behavior, now it has to set zlib_rsync to the desired value as well as set zlib_rsync_set to 1 - to make sure that it works no matter if the library was already initialized or not. Also it's not possible to query the value if the library wasn't initialized. This is not a very nice design.

Probably it would be way better to have these variables internal to the library ("static" in .c and not mentioned in .h) and expose a getter and setter, e.g. int get_zlib_rsync() and void set_zlib_rsync(int). The getter would initialize the value based on the env var if it wasn't initialized, and the setter would also set the initialized flag. This way it would be completely hidden from the application when initialization from the env var happens. These functions probably wouldn't be thread-safe, though.

zlib_rsync_init() is currently a non-static function, but not mentioned in the header file. If you'd like to make it public, you should have its signature in the header. If not, however, then the function should be static so that its name is not exported as a symbol. (The gcc-specific init trick was a special case in this regard - it needs to be externally available but you don't want developers to write code that calls this function.)

I'm not familiar with zlib internals at all, not even with its interface. In an ideal world, you would probably have instances ("objects") of a zipper, multiple instances could have different rsync configuration, etc... I also don't know if zlib is thread-safe at all (but I would be surprised if it wasn't).

So if you aim at mainstream inclusion, it would be nice to probably think on a cleaner design that allows more widespread usage of the feature. I think the current solution (both the external int to control the behavior, and the env var to initialize it) is quite hackish, I just don't know how to do this absolutely right.

Comment 5 Wojciech Pilorz 2008-10-06 23:34:17 UTC
I have asked zlib team whether they would consider adding the patches (modified as needed) to official zlib.

Regarding previous comment from Egmont, I wanted to avoid any change
in public interface of zlib, so that no program would require
recompiling, and to be able to 'implant' the rsync-friendly
feature 'externally' into a program using zlib, without modifying
such a program.

My Fedora 8 system works with no problem after I have installed modified
zlib rpm, so I hope it is not a complete disaster.

I will think more about points raised here this week.

Comment 6 Wojciech Pilorz 2008-10-07 10:30:30 UTC
To address some of the issues raised by Egmont
I would propose the following:

- zlib_rsync_init() changed to static
- zlib_rsync_set  changed to static
- zlib_rsync      left as external int

So we would have
a) one external name added zlib_rsync
b) ability to set zlib_rsync through ZLIB_RSYNC environment variable
c) ability to set zlib_rsync through putenv before 1st call to deflate
d) ability to directly manipulate zlib_rsync after 1st call to deflate
e) binary compatibility with programs using zlib

d) should perhaps only be used in statically linked applications;
What is most important for me is e) and b)

Is that acceptable?

Note:
Storing rsyncable flag in z_stream_s would be cleaner solution, because
1. it would allow setting to be specified for each stream independently
2. it would let us eliminate globals, namespace pollution while being thread-safe
BUT
it would require changing both public and binary interface to zlib, which
is not worth the trouble. 
Perhaps if upstream agrees to have that in some future version of zlib...

Comment 7 Wojciech Pilorz 2008-10-14 06:46:29 UTC
Unfortunately there is no response from zlib team ...

Comment 8 Ivana Varekova 2008-10-14 08:40:24 UTC
Hello,
thanks for your work, I will try to connect with upstream guys myself, I hope it will help.

Comment 9 Wojciech Pilorz 2008-10-15 08:18:05 UTC
Created attachment 320405 [details]
Slightly modified patch adding 1. rsyncable support to zlib; 2. minizip binaries; 3. -q for minizip

I have added a slightly modified patch adding rsyncable support
to zlib.
Changes are as follows:
- changed two external names to local (static)
  now zlib_rsync_set and zlib_rsync_init() are static
- changed spelling of help messages describing -q in minizip and miniunzip
  to be consistent with descriptions of other options

Comment 10 Wojciech Pilorz 2008-10-15 08:47:08 UTC
I have looked more closely at zlib source code and will try
to create a cleaner patch, with rsyncable flag set per stream

Comment 11 Wojciech Pilorz 2008-10-21 13:59:27 UTC
Created attachment 321016 [details]
new patch adding rsyncable support to zlib

This is a new patchset for adding rsyncable support to zlib:
bd33d7d2ee9b35bd4b6d456129ff2a6494a83216 *zlib123_rs23_patches.tgz
Inside the tar archive:
caac48d2215ac45c3ffd333e638d52089c6472a1 *org/minizip-1.2.3-autotools.patch
fa7daa6d5918df990aba5c7185493794bbff2c02 *org/zlib-1.2.3-autotools.patch
ce1a2337c7bdcb1f2ebe804ceb4f27bade2e53da *org/zlib-1.2.3-minizip.patch
57ad1cbd4827eccaf2d9823b7059329e39488f58 *org/zlib.spec.org
c01b34328923b2c98b524ec705f44e5510ad45b3 *zlib-1.2.3-changelog.patch
e3c23341995209d3c5f7af7ac593efa53fbfd9cd *zlib-1.2.3-minigziprs.patch
aaddfd87bf7c54d9870409116d10dc94d50d003e *zlib-1.2.3-miniziprs.patch
89019e0f1dc3b4cf1d70794632c63412ec722d4a *zlib-1.2.3-rsyncable.patch
27ce85bcf2e513a3c49bfc93bb92541e899b253b *zlib.spec
cc99669f00b838e2be2f430705f022c9498e21dd *zlib.spec.rs23.diff

org/ contains original files for reference, which I did not modify
org/*patch, *patch should go to SOURCES/ (and zlib-1.2.3.tar.gz which I do not include)
zlib.spec should go to SPECS/
zlib.spec.rs23.diff is for reference only.

In this new patch I have addressed issues raised by Egmont
as far as I was able to.
More details in a comment,

Comment 12 Wojciech Pilorz 2008-10-21 15:18:34 UTC
The rs23 patchset for zlib v1.2.3 adds rsync-friendly deflate support
for zlib, as well as:

- adds minizip and miniunzip binaries to minizip RPM
- adds -q (quiet) option to minizip and miniunzip
- adds -s option to minizip and minigzip (rsync-friendly deflate)
- adds --rsyncable option to minigzip (same function as -s, for gzip compatibility)
  Thus rsync-friendly deflate output from minigzip and minizip can be requested either
  via -s option or via exported environment variable ZLIB_RSYNC=1
- adds -- option to minizip (end of options indicator)
- adds --help option to minigzip (-h has already been used)
- allows for longer pathnames on unix and win32 in minizip and minigzip
- fixes some buffer overflows in minigzip


The current version should be almost thread-safe;
The default state of requested rsync-friendly deflate output is kept
in a static variable (shared between threads), but is it being
used only for deflate stream initialization;
So the only caveat I am aware of is that if one calls
deflateSetRsyncDflt() in one thread, deflate stream initialiazed
later in all threads use the new value.

Please look at zlib.h file for details.

The basic idea is that we have
zlib_rsync_dflt static integer, initially set to -1;
When a deflate stream it being initialized, deflateGetRsyncDflt() gets called
from lm_init; If zlib_rsync_dflt is not set at that moment to a value >= 0,
ZLIB_RSYNC environment is looked at, and if it exists and is non-empty string
different from "0", then zlib_rsync_dflt is set to 1, otherwise to 0.

The value returned by deflateGetRsyncDflt() is stored in strm->state->zlib_rsync;
That value is used during deflate operation.

It is also possible to directly set and get state->zlib_rsync of a stream using 
deflateGetStrRsync() and deflateSetStrRsync().
The requirement here (not enforced by the code) is that deflateGetStrRsync()
is called directly after deflateInit() or deflateInit2() or deflateReset()
(actually deflateReset() is called at the end of deflateInit() or deflateInit2())

I believe the design is reasonably thread safe and makes it possible
to use the rsyncable feature of zlib even with code that is not modified in any way
(just linked with new zlib).

zlib binary interace has not been changed by these patches, so programs using
zlib shared library continue to work.

I am aware there are some software using zlib (e.g. raqbackup) people are using, which
would be much more useful when rsync-friendly feature of zlib is implemented.

I would kindly request that these patches be considered for inclusion in Fedora.

Please send comment, criticism, etc.

NOTES:
 - rsync-friendly deflate is added to C code only; I did not touch (or even look at) assembler variants;
 - it might deserve some explanation why I have patches minigzip, when gzip on Fedora already has --rsyncable;
   This is because there are slight differences between gzip output and zlib output;
   If one needs to create compressed data with, say perl zlib modules and gzip (from similar source data)
   and wants small rsync delta, minigzip -s is needed for that.
 - minigzip is not packaged, but is easy to build:
   gcc -o minigzip /usr/share/doc/zlib-devel-1.2.3/minigzip.c -lz

Comment 13 Wojciech Pilorz 2008-10-31 07:05:04 UTC
Any comments / news ?
I am running Fedora 8 on i386 (P4 machine with HT) with the zlib verseion posted here for 10 days, and so far no  problems I am aware of.

Comment 14 Ivana Varekova 2008-11-07 08:58:19 UTC
Hello, 
Mark Adler from zlib upstream reply to my question regarding this bug. 
He will look to the patch you sent here but he planed to invent his own rsyncable breakpoint generation algorithm, not to use some based on gzip - and if you are willing to help him with testing of this feature then it would be helpful.

Comment 15 Ivana Varekova 2008-11-11 07:40:36 UTC
If you like to participate to the zlib testing Mark Adler just announce pigz version on zlib mailing list and there are some comments.

Comment 16 Wojciech Pilorz 2008-11-13 11:24:05 UTC
At the monent rsyncable support for pigz (Parallel Implementation of GZip) is not implemented (planned as future development).
pigz features will possibly be moved into zlib at some point if future.

Comment 17 Wojciech Pilorz 2009-03-08 23:10:26 UTC
There seems to be no progress on upstream zlib or pigz.
Could you possibly reconsider adding the patches to fedora
zlib package (just as rsyncable patches have been added to fedora gzip
package) ?

Comment 18 Ivana Varekova 2009-04-20 10:11:57 UTC
Hello,
please what is the reason why upstream does not use this patches?

Comment 19 Wojciech Pilorz 2009-04-20 22:36:16 UTC
Upstream seemed to plan a different implementation.
I am not aware of any progress, though.
I personally use the patches on Fedora10 to create
rsync-friendly zip archives (minizip) and have found
no problems with them.

Comment 20 Ivana Varekova 2009-04-21 08:49:26 UTC
Hello, 
I don't like to put it to fedora if upstream will have another solution.
I ask upstream maintainer again about the situation - for now I will close this bug.

Comment 21 Ivana Varekova 2009-05-04 09:17:09 UTC
No progress on upstream side :(.