Discussion:
[jira] [Created] (COMPRESS-380) Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
Dawid Weiss (JIRA)
2017-01-25 15:14:26 UTC
Permalink
Dawid Weiss created COMPRESS-380:
------------------------------------

Summary: Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
Key: COMPRESS-380
URL: https://issues.apache.org/jira/browse/COMPRESS-380
Project: Commons Compress
Issue Type: New Feature
Reporter: Dawid Weiss


Some of the (large) ZIP files we try to process currently will throw this:
{code}
UnsupportedZipFeatureException: unsupported feature method 'ENHANCED_DEFLATED'
{code}
which is a bummer since the JDK's implementation doesn't support Deflate64 either. This seems to be a PKWARE extension, although code to decompress it exists in zlib (and is appropriately licensed, I believe).

https://github.com/madler/zlib/tree/master/contrib/infback9



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Stefan Bodewig (JIRA)
2017-01-25 16:57:26 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838117#comment-15838117 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

Sounds like a good idea.

Anybody who wants to give it a try may benefit from the {{lz77support}} package added after the 1.13 release which should contain most of what is needed for the LZ77 part of DEFLATE64.
Dawid Weiss (JIRA)
2017-01-25 17:12:26 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838156#comment-15838156 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Thanks Stefan! I don't know if I'll have the time to look into it, just wanted to mark the problem.
Stefan Bodewig (JIRA)
2017-01-26 09:21:24 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839449#comment-15839449 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

Of course. I just wanted to leave a pointer for anybody who wants to implement this (which might even be my future self).

A pure Java DEFLATE64 will certainly be slower than the JNI based DEFLATE built into the JDK (which we re-use as well).
Dawid Weiss (JIRA)
2017-01-26 10:10:24 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839507#comment-15839507 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

bq. A pure Java DEFLATE64 will certainly be slower than the JNI based DEFLATE built into the JDK (which we re-use as well).

I don't think Java supports DEFLATE64 -- tried Java8 and Java9 and they don't (Oracle and IBM). This tool does, supposedly (and is Apache licensed):

http://www.lingala.net/zip4j/

But I haven't tried it.
Dawid Weiss (JIRA)
2017-01-26 10:15:24 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839517#comment-15839517 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

zip4j doesn't support DEFLATE64 either, just checked.
Stefan Bodewig (JIRA)
2017-01-26 10:19:24 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839521#comment-15839521 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

Ah, bad wording.

No, the JDK's zip package doesn't support DEFLATE64, only DEFLATE, the little brother. DEFLATE64 really only has a bigger buffer size, so back-references may go back a bit further, AFAIK.

What I was trying to say is that a DEFLATE64 we implement in Java will be slower than DEFLATE itself, probably quite a bit. But of course a slow DEFLATE64 will be better than none at all for people trying to read archives created with it. Once we implement DEFLATE64 we probably should advise people not to use it when creating new archives.
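
For readers following along, the "bigger buffer" point can be pictured with a minimal sketch of how an LZ77-style decoder resolves a back-reference against a sliding window. The class `SlidingWindow` and its methods are hypothetical names made up for this illustration; they are not Commons Compress or zlib API. The window sizes used (32 KiB for DEFLATE, 64 KiB for DEFLATE64) are the commonly documented values and should be treated as assumptions here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not part of Commons Compress or zlib. */
class SlidingWindow {
    private final byte[] window;  // circular buffer holding the most recent output
    private final int mask;       // window.length - 1 (length is a power of two)
    private int pos;              // next write position inside the circular buffer
    private long written;         // total number of bytes produced so far
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    SlidingWindow(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        window = new byte[size];
        mask = size - 1;
    }

    /** Emits a single literal byte and remembers it in the window. */
    void literal(int b) {
        window[pos] = (byte) b;
        pos = (pos + 1) & mask;
        written++;
        out.write(b);
    }

    /** Emits {@code length} bytes starting {@code distance} bytes back in the output. */
    void copy(int distance, int length) {
        if (distance < 1 || distance > window.length || distance > written) {
            throw new IllegalArgumentException("distance " + distance + " is outside the window");
        }
        int src = (pos - distance) & mask;
        // Byte-by-byte so that overlapping copies (distance < length) repeat correctly.
        for (int i = 0; i < length; i++) {
            literal(window[src]);
            src = (src + 1) & mask;
        }
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }

    public static void main(String[] args) {
        SlidingWindow w = new SlidingWindow(32 * 1024); // assumed DEFLATE window size
        w.literal('a');
        w.literal('b');
        w.copy(2, 4); // overlapping back-reference
        System.out.println(new String(w.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

With a 64 KiB window the same `copy` call accepts distances that a 32 KiB window has to reject, which is the practical difference described above.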
Dawid Weiss (JIRA)
2017-01-26 10:29:25 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839531#comment-15839531 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Not necessarily bad wording -- I may be slow in the morning, Stefan, no worries. Indeed, native ZIP in Java is faster, but it comes with its own quirks, so not everything is rosy. We use commons-codec as a fallback and I thought DEFLATE64, even though considered a "proprietary" extension, is widespread enough that it'd be nice to have it for completeness. I assume that if zlib has public sources to decompress it, the lawyers wouldn't get involved... (?).
Christian Marquez Grabia (JIRA)
2017-11-09 00:29:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245003#comment-16245003 ]

Christian Marquez Grabia commented on COMPRESS-380:
---------------------------------------------------

I have tried many other options to solve this issue but haven't been able to... The only replacement I could find is the sevenzipjbinding project, but it seems to have caused some random JVM crashes when running unit tests, so I'm a bit concerned about it... I'm currently spawning a separate process to execute the native code to avoid any major issues here...

Any update / suggestions on the topic would be greatly appreciated.
Stefan Bodewig (JIRA)
2017-11-09 07:48:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245322#comment-16245322 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

[~chalmagr84] I'm not aware of anybody actively working on a DEFLATE64 implementation for Commons Compress. It's on my personal TODO list, but that's a rather long list so I wouldn't mind if anybody else gave it a try.
Christian Marquez Grabia (JIRA)
2017-11-16 02:22:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254645#comment-16254645 ]

Christian Marquez Grabia commented on COMPRESS-380:
---------------------------------------------------

I was able to handle decompression for my current need. I will see if I can clean it up and eventually abide by the apache-compress way. At least I don't need the native library solution for now...
Dawid Weiss (JIRA)
2017-12-02 08:40:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275481#comment-16275481 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

The workflow would be to provide a patch file (diff) or clone the repository somewhere (github) and provide access to a branch there. A github pull request is one way to do it.
Stefan Bodewig (JIRA)
2017-12-11 16:44:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286212#comment-16286212 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

[~chalmagr84] having "only" decompression support would be a big step forward. The same is true for bzip2 inside of zip archives as well, for example.

WRT developer access, what Dawid said. You can attach a patch here or create your own fork on github.
Dawid Weiss (JIRA)
2017-12-12 18:35:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288026#comment-16288026 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

I'm not a committer, but I looked at the patch and while I didn't verify its correctness I think it's overall looking very neat -- congratulations. Stefan may correct me, but I think you'll need to add Apache headers to source files if you'd like this to be merged in. Are those classes all authored by you (they're looking awfully nice)!?

Also, is there any reason your comments are not visible to the world? It'll look strange to non-jira users to just see our replies.
Christian Marquez Grabia (JIRA)
2017-12-13 15:30:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289388#comment-16289388 ]

Christian Marquez Grabia commented on COMPRESS-380:
---------------------------------------------------

[~dawid.weiss] Thanks for the updates. I have created these classes based on research into multiple deflate decompression samples I was able to find, using the masking concepts for the literal / distance tables as well as the state concept, all brought into Java 'style'.

I left it all to jira: since this is more about providing patches to the apache library, I figured it would be applicable mostly to jira users (didn't want to add noise around it).
Christian Marquez Grabia (JIRA)
2017-12-13 15:31:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289388#comment-16289388 ]

Christian Marquez Grabia edited comment on COMPRESS-380 at 12/13/17 3:30 PM:
-----------------------------------------------------------------------------

[~dawid.weiss] Thanks for the updates. I have created these classes based on research into multiple deflate decompression samples I was able to find, using the masking concepts for the literal / distance tables as well as the state concept, all brought into Java 'style'.

I left it all to jira: since this is more about providing patches to the apache library, I figured it would be applicable mostly to jira users (didn't want to add noise around it).

-- EDIT --
About the Apache Headers, I'm ok with it. I will try to get those from other classes and update the patch later today.


Dawid Weiss (JIRA)
2017-12-13 17:12:01 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289544#comment-16289544 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Unless it's some kind of vulnerability (which should be reported using different channels anyway) the JIRA issues should be visible for everyone so that people can see the whole history of the issue, no problem with that.

Like I said, the patch is looking great to me.
Stefan Bodewig (JIRA)
2017-12-22 18:11:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301745#comment-16301745 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

Many thanks [~chalmagr84] and sorry for the delay, I should be able to look at the patch during the next week (but not during Christmas :-)

It would be good to have a real test case - a ZIP archive using DEFLATE64 that we can distribute - in addition to the real unit tests. My version of InfoZIP zip seems to be compiled without support for it.

Also I wonder whether we want to move the Deflate64 code to a package of its own.
Stefan Bodewig (JIRA)
2017-12-27 10:52:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Bodewig updated COMPRESS-380:
------------------------------------
Fix Version/s: 1.16
Stefan Bodewig (JIRA)
2017-12-28 09:00:19 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305188#comment-16305188 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

Is any of the people who voted for this issue able to provide a (preferably small) ZIP archive that uses DEFLATE64 that we can add to our repository as a test case and distribute under the Apache Software License?
Dawid Weiss (JIRA)
2017-12-28 09:15:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated COMPRESS-380:
---------------------------------
Attachment: archive-deflate.zip
archive-deflate64.zip
hello.world

Hi Stefan. {{hello.world}} compressed with {{7za}}; deflate and deflate64 versions attached. I see the binary dump does show a different compression method being used, although the compressed stream seems to be identical.

I can run some randomized regression tests once we have a patch in place.
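
The "different compression method" visible in the binary dump can also be checked programmatically. The sketch below reads the 16-bit little-endian compression method field at offset 8 of a local file header; per the ZIP format description, 8 is DEFLATE and 9 is Deflate64. The class `ZipMethodPeek` and its method names are hypothetical and use only the JDK. It looks at the first local file header only and assumes the file starts with one, which holds for simple archives but is not guaranteed in general.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of Commons Compress. */
class ZipMethodPeek {
    static final int LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50; // bytes "PK\3\4"
    static final int DEFLATED = 8;
    static final int ENHANCED_DEFLATED = 9;

    /** Returns the compression method stored in the first local file header. */
    static int firstEntryMethod(byte[] zipBytes) {
        if (zipBytes.length < 10) {
            throw new IllegalArgumentException("too short for a local file header");
        }
        ByteBuffer buf = ByteBuffer.wrap(zipBytes).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_FILE_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        // Layout: signature (4), version needed (2), flags (2), method (2).
        return buf.getShort(8) & 0xFFFF;
    }

    static String describe(int method) {
        switch (method) {
            case DEFLATED:
                return "DEFLATED";
            case ENHANCED_DEFLATED:
                return "ENHANCED_DEFLATED";
            default:
                return "method " + method;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ZipMethodPeek <archive.zip>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(describe(firstEntryMethod(bytes)));
    }
}
```

Run against the two attached archives, this would be expected to report a different method for each, even where the compressed bytes themselves look the same.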
Stefan Bodewig (JIRA)
2017-12-28 09:17:03 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305196#comment-16305196 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

@chalmagr84 many thanks. I'm still comparing your code with the one in zlib and not completely through. This looks really good.

I'd probably change the package name from ...compressors.zip to compressors.deflate64 (and maybe add a standalone {{CompressorInputStream}} as well) - and certainly add the license headers and some javadocs. I can do all of these steps myself unless you want to update the patch.
Stefan Bodewig (JIRA)
2017-12-28 09:22:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305207#comment-16305207 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

[~dawid.weiss], great. I'm not surprised the streams look identical for small inputs. Still it will be a good start for a testcase that executes the integration with the ZipFile.
Stefan Bodewig (JIRA)
2017-12-28 09:23:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305215#comment-16305215 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

I just realized the patch only works for ZipFile and not for ZipArchiveInputStream which is fine with me. It shouldn't be too difficult to add support to the stream as well, I'll take care of that.
Dawid Weiss (JIRA)
2017-12-28 10:25:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated COMPRESS-380:
---------------------------------
Attachment: archive.zip
archive64.zip
input2

These streams aren't binary-identical. I generated a random sequence and folded it around another random sequence (the first block, a second random block, then the first block again). The input is also attached for reference.

{code}
rm -f archive*.zip

dd if=/dev/urandom bs=1024 count=1 2>/dev/null > input1

cat input1 > input2
dd if=/dev/urandom bs=1024 count=1 2>/dev/null >> input2
cat input1 >> input2
rm input1

7za a -mm=deflate64 archive64.zip input2
7za a -mm=deflate archive.zip input2

ls -l archive*.zip
{code}
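For readers without {{dd}} or {{7za}} at hand, the same "folded" input can be sketched in plain Java. This is only an illustration of the data layout: the class and method names are made up, a seeded {{java.util.Random}} stands in for {{/dev/urandom}}, and the JDK's regular Deflate is used only to show that the repeated block is compressible (the JDK has no Deflate64 encoder).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper mirroring the input1/input2 steps of the shell script. */
class FoldedInput {
    /** Returns block + filler + block, each part blockSize random bytes. */
    static byte[] folded(long seed, int blockSize) {
        Random rnd = new Random(seed);
        byte[] block = new byte[blockSize];
        byte[] filler = new byte[blockSize];
        rnd.nextBytes(block);
        rnd.nextBytes(filler);
        byte[] out = new byte[3 * blockSize];
        System.arraycopy(block, 0, out, 0, blockSize);
        System.arraycopy(filler, 0, out, blockSize, blockSize);
        System.arraycopy(block, 0, out, 2 * blockSize, blockSize);
        return out;
    }

    /** Compresses with the JDK's regular Deflate (zlib wrapper). */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            bos.write(buf, 0, n);
        }
        deflater.end();
        return bos.toByteArray();
    }

    /** Decompresses at most expectedLength bytes; wraps format errors as unchecked. */
    static byte[] inflate(byte[] compressed, int expectedLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[expectedLength];
        int off = 0;
        try {
            while (!inflater.finished() && off < out.length) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0) {
                    break; // no progress, e.g. truncated input
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return Arrays.copyOf(out, off);
    }
}
```

The third block repeats the first at a distance of twice the block size, so an LZ77-style compressor can encode it as back-references while the two random blocks stay essentially incompressible.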
Dawid Weiss (JIRA)
2017-12-28 10:26:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305302#comment-16305302 ]

Dawid Weiss edited comment on COMPRESS-380 at 12/28/17 10:25 AM:
-----------------------------------------------------------------

These attached streams (archive.zip, archive64.zip, input2) aren't binary-identical; perhaps they'll be of greater help. I generated a random sequence and folded it around another random sequence. The input is also attached for reference.

{code}
rm -f archive*.zip

dd if=/dev/urandom bs=1024 count=1 2>/dev/null > input1

cat input1 > input2
dd if=/dev/urandom bs=1024 count=1 2>/dev/null >> input2
cat input1 >> input2
rm input1

7za a -mm=deflate64 archive64.zip input2
7za a -mm=deflate archive.zip input2

ls -l archive*.zip
{code}


Stefan Bodewig (JIRA)
2017-12-28 11:28:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305358#comment-16305358 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

[~dawid.weiss] thanks!

[~chalmagr84] thanks again, I've been able to follow the code very well. I've got two minor issues around error checking. First, the result of {{readBits}} is never checked and could be -1 if we reach the end of the input prematurely. It would probably be better to throw an exception immediately in that case than to work with bogus offsets and end up with corrupted output or exceptions later on. Second, the zlib code has some additional checks, like verifying that the number of length symbols is not too big when reading dynamic Huffman tables. I can easily adjust this myself if this is fine with you.

Once the code is in I'd like to try some micro-benchmarks and combine your {{DecodingMemory}} class with parts of {{org.apache.commons.compress.compressors.lz77support.AbstractLZ77CompressorInputStream}}, as the latter contains some special-case code that may be useful. But this really is only an optimization step after the rest fits.
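
A minimal sketch of the two checks discussed above, assuming a hypothetical LSB-first bit reader. None of the names below are taken from the patch; {{requireBits}} and {{readLiteralLengthCodeCount}} are invented for illustration. The limit of 286 literal/length codes comes from the Deflate format, where the 5-bit HLIT field plus 257 could otherwise encode up to 288.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical LSB-first bit reader; not the reader used by the patch. */
class CheckedBitReader {
    private final InputStream in;
    private long bitBuffer; // bits not yet consumed, least significant first
    private int bitCount;   // number of valid bits in bitBuffer

    CheckedBitReader(InputStream in) {
        this.in = in;
    }

    /** Returns the next count bits (1..32), or -1 if the input ends first. */
    long readBits(int count) throws IOException {
        while (bitCount < count) {
            int next = in.read();
            if (next < 0) {
                return -1;
            }
            bitBuffer |= ((long) next) << bitCount;
            bitCount += 8;
        }
        long result = bitBuffer & ((1L << count) - 1);
        bitBuffer >>>= count;
        bitCount -= count;
        return result;
    }

    /** Like readBits, but fails fast instead of handing -1 to the caller. */
    long requireBits(int count) throws IOException {
        long bits = readBits(count);
        if (bits == -1) {
            throw new EOFException("Truncated stream while reading " + count + " bits");
        }
        return bits;
    }

    /** Reads HLIT from a dynamic block header and rejects counts above 286. */
    int readLiteralLengthCodeCount() throws IOException {
        int count = (int) requireBits(5) + 257;
        if (count > 286) {
            throw new IOException("Too many literal/length symbols: " + count);
        }
        return count;
    }
}
```

With a reader like this, callers that compute offsets or table sizes would use {{requireBits}} throughout, so a truncated input surfaces as an {{EOFException}} at the point of the read rather than as a bogus offset later on.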
Stefan Bodewig (JIRA)
2017-12-28 11:29:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305196#comment-16305196 ]

Stefan Bodewig edited comment on COMPRESS-380 at 12/28/17 11:28 AM:
--------------------------------------------------------------------

[~chalmagr84] many thanks. I'm still comparing your code with the one in zlib and not completely through. This looks really good.

I'd probably change the package name from ...compressors.zip to compressors.deflate64 (and maybe add a standalone {{CompressorInputStream}} as well) - and certainly add the license headers and some javadocs. I can do all of these steps myself unless you want to update the patch.


Stefan Bodewig (JIRA)
2018-01-03 14:23:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309708#comment-16309708 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

I've added the original patch, together with the first changes I suggested (license headers and the renamed package), to the COMPRESS-380 branch. The next step will be to add test cases based on Dawid's archives.
Stefan Bodewig (JIRA)
2018-01-04 08:39:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310990#comment-16310990 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

I've added a first test case and it passes. The next steps will be adding support to {{ZipArchiveInputStream}}, making the stream a full {{CompressorInputStream}}, and even adding a read-only codec for the 7z package.

Many thanks so far.
Stefan Bodewig (JIRA)
2018-01-04 09:56:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311099#comment-16311099 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

[~chalmagr84] I'd like to remove the {{uncompressedSize}} from the stream's constructor for two reasons:

* when using the stream in a {{ZipArchiveInputStream}} context, the uncompressed size ({{ze.getSize()}}) may be unknown, as it may be stored inside of a data descriptor rather than the local file header
* the stream only uses it inside of {{available}}, which is supposed to return the number of bytes that can be read without blocking. The implementation of {{available}} is probably not correct for general {{InputStream}}s, as we may well block while trying to read bits from them; it is probably OK for the seekable input underlying {{ZipFile}}

I'd make {{available}} return 0 unconditionally. Alternatively the {{DecoderState}}s may know a bit more about data they have already read and could provide a less pessimistic answer.

Any objections?
Dawid Weiss (JIRA)
2018-01-04 10:29:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311128#comment-16311128 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Makes sense to me. Available must not block -- that's the whole point of it -- so unless there's an internal buffer with already decompressed data (or an estimate of how many bytes can be decompressed without fetching new bytes) 0 is the best choice.
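
The alternative mentioned here, answering from an internal buffer of already decoded data, could look roughly like the sketch below. It is an illustration only: {{BufferedDecodingStream}} is a made-up class, and its {{fill}} method simply copies bytes from the source as a stand-in for real decoding.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical decoding stream that keeps already decoded bytes in a buffer. */
class BufferedDecodingStream extends InputStream {
    private final InputStream source;
    private final byte[] decoded = new byte[16];
    private int pos;   // next byte to hand out
    private int limit; // number of valid bytes in decoded

    BufferedDecodingStream(InputStream source) {
        this.source = source;
    }

    /** Stand-in for real decoding: copies up to one buffer of bytes from the source. */
    private boolean fill() throws IOException {
        int n = source.read(decoded, 0, decoded.length);
        if (n <= 0) {
            return false;
        }
        pos = 0;
        limit = n;
        return true;
    }

    @Override
    public int read() throws IOException {
        if (pos == limit && !fill()) {
            return -1;
        }
        return decoded[pos++] & 0xFF;
    }

    /** Reports only bytes that are already decoded, so it never touches the source. */
    @Override
    public int available() {
        return limit - pos;
    }
}
```

Note that this {{available}} returns 0 whenever the buffer is drained, even if the source still has data, which is allowed: the value is a lower bound on what can be read without blocking, not an end-of-stream signal.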
Stefan Bodewig (JIRA)
2018-01-04 15:55:01 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311543#comment-16311543 ]

Stefan Bodewig edited comment on COMPRESS-380 at 1/4/18 3:54 PM:
-----------------------------------------------------------------

https://github.com/apache/commons-compress/commit/07cc1a278b217d45cb090ff6cb3a7934105cb2d0 changes {{available}}. Does this look OK? I'll certainly have to add a few more tests.


Dawid Weiss (JIRA)
2018-01-04 21:01:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312024#comment-16312024 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Looks good to me. I thought you'd just return 0 and be done with it, to be honest :) It's nice to see it can be estimated better, but in all honesty I don't think I ever saw {{available}} in any practical use case that made sense. Those times I did see it, it was typically used in the wrong way (code only calling {{read}} when available returned non-zero, etc.).
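
The misuse described above can be shown with a small sketch. The class below is hypothetical; {{pessimistic}} wraps a byte array in a stream whose {{available}} always returns 0, which is a legal answer for any {{InputStream}}.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helpers contrasting an available()-driven loop with a read()-driven one. */
class ReadLoops {
    /** Stream that never promises any bytes in advance. */
    static InputStream pessimistic(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int available() {
                return 0;
            }
        };
    }

    /** Misuse: treats available() == 0 as if it meant end of stream. */
    static byte[] drainWithAvailable(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (in.available() > 0) {
            out.write(in.read());
        }
        return out.toByteArray();
    }

    /** Correct: keeps reading until read() reports end of stream with -1. */
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

On the pessimistic stream the first loop returns no data at all, while the second returns everything, which is why returning 0 from {{available}} is safe only for callers that do not gate {{read}} on it.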
Stefan Bodewig (JIRA)
2018-01-05 06:40:01 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312574#comment-16312574 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

Support for {{ZipArchiveInputStream}} is in now, but only if no data descriptor is used. I need to verify with InfoZip whether they'd support one for Deflate64. Also I stumbled over COMPRESS-436 while implementing it.
Stefan Bodewig (JIRA)
2018-01-05 13:51:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313126#comment-16313126 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

I've changed some of the close logic so that the underlying stream is no longer closed once all deflated data has been read, and I've added a few tests and some documentation. As it stands, I am fine with merging the branch to master unless anybody yells. Reviews of the COMPRESS-380 branch are more than welcome.

I know we need to add a few more tests.
Dawid Weiss (JIRA)
2018-01-05 21:48:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313959#comment-16313959 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

I'll take a look and run some tests over the weekend, hopefully.
Dawid Weiss (JIRA)
2018-01-05 22:05:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314000#comment-16314000 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Ran a super-quick sanity check on a huge zip of my local files (~8 GB) and got this:
{code}
Exception in thread "main" java.lang.NullPointerException
at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder.buildTree(HuffmanDecoder.java:417)
at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder.populateDynamicTables(HuffmanDecoder.java:343)
at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder.decode(HuffmanDecoder.java:156)
at org.apache.commons.compress.compressors.deflate64.Deflate64CompressorInputStream.read(Deflate64CompressorInputStream.java:77)
at java.io.InputStream.read(InputStream.java:101)
at dweiss.Check.main(Check.java:20)
{code}

I didn't even check what it is and have to run now.
Dawid Weiss (JIRA)
2018-01-06 16:19:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314726#comment-16314726 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

I isolated a smaller example file that still fails at runtime, with a more complex exception. Total Commander (Info-ZIP) decompresses this file just fine (it's a PNG file), so it has to be something in the decoding routine.

https://github.com/apache/commons-compress/pull/58
Dawid Weiss (JIRA)
2018-01-06 16:20:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314726#comment-16314726 ]

Dawid Weiss edited comment on COMPRESS-380 at 1/6/18 4:19 PM:
--------------------------------------------------------------

I isolated a smaller example file that still fails at runtime, with a more complex exception. Total Commander (Info-ZIP) decompresses this file just fine (it's a PNG file), so it has to be something in the decoding routine.

https://github.com/apache/commons-compress/pull/58

{code}
java.lang.IllegalStateException: Attempt to read beyond memory: dist=5955
at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder$DecodingMemory.recordToBuffer(HuffmanDecoder.java:471)
at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder$HuffmanCodes.decodeNext(HuffmanDecoder.java:292)
at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder$HuffmanCodes.read(HuffmanDecoder.java:264)
at org.apache.commons.compress.compressors.deflate64.HuffmanDecoder.decode(HuffmanDecoder.java:165)
at org.apache.commons.compress.compressors.deflate64.Deflate64CompressorInputStream.read(Deflate64CompressorInputStream.java:77)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.commons.compress.compressors.deflate64.Deflate64BugTest.readBeyondMemoryException(Deflate64BugTest.java:23)
{code}
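
For context on the message in this trace: an LZ77 back-reference copies bytes from a history window, and the distance must not reach back further than the data recorded so far. The sketch below shows that check on a hypothetical 64 KiB window (the maximum distance in Deflate64). It is not the {{DecodingMemory}} class from the patch, and it says nothing about whether the history or the decoded distance is at fault in the failing file.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical 64 KiB history window; not the DecodingMemory class from the patch. */
class SlidingWindow {
    private static final int SIZE = 1 << 16; // Deflate64 allows distances up to 64 KiB
    private static final int MASK = SIZE - 1;
    private final byte[] memory = new byte[SIZE];
    private long written; // total number of bytes recorded so far

    /** Records one decoded byte, overwriting the oldest one once the window is full. */
    void add(byte b) {
        memory[(int) (written & MASK)] = b;
        written++;
    }

    /** Copies length bytes from distance bytes back, appending them to the window and to out. */
    void copyBack(int distance, int length, ByteArrayOutputStream out) {
        if (distance < 1 || distance > SIZE || distance > written) {
            throw new IllegalStateException(
                "Illegal distance " + distance + " with " + written + " bytes of history");
        }
        for (int i = 0; i < length; i++) {
            // Read before writing so overlapping copies (distance < length) repeat the pattern.
            byte b = memory[(int) ((written - distance) & MASK)];
            add(b);
            out.write(b);
        }
    }
}
```

Copying byte by byte keeps overlapping references correct: with "abc" in the window, a copy of length 6 at distance 3 yields "abcabc".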


Dawid Weiss (JIRA)
2018-01-06 16:32:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314733#comment-16314733 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Just an additional comment -- stats for the large archive, files that decompressed OK vs. files that threw an exception:
{code}
OK: 27847
Exception: 4183
{code}

So it's not bad, but could be better. ;)
Stefan Bodewig (JIRA)
2018-01-07 09:42:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315128#comment-16315128 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

Thanks a lot Dawid. With any luck the failing archives are due to a handful of root causes so having you test stuff is incredibly valuable. I'm glad I've created a branch for now :-)

I'll need to dig into the code deeper than I've done before, so fixing this may take some time.

Your support is very much appreciated.
Stefan Bodewig (JIRA)
2018-01-07 10:08:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315139#comment-16315139 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

It looks as if I have been able to at least fix this particular bug which likely stems from porting C code that assumed unsigned bytes.
Dawid Weiss (JIRA)
2018-01-07 20:16:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315460#comment-16315460 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Hi Stefan. Seems to be this assertion causing this particular problem:
{code}
if (memory[start] == -1) {
    throw new IllegalStateException("Attempt to read beyond memory: dist=" + distance);
}
{code}

{{memory}} is a byte array and Java bytes are signed, so a legitimate data byte of {{0xff}} reads back as {{-1}} and trips this check. I commented it out and the test then passes. I'd remove this check entirely (and the {{Arrays.fill}} with {{-1}} too).
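
To illustrate the point about the check quoted above: the sketch below shows the sentinel comparison misfiring on valid data, and one possible alternative that counts written bytes instead of relying on a fill value. The class and method names are hypothetical and are not taken from the branch.

```java
final class WindowBoundsSketch {
    private final byte[] memory;
    private int written; // bytes stored so far, capped at memory.length
    private int next;    // next write position in the circular buffer

    WindowBoundsSketch(int size) {
        memory = new byte[size];
    }

    void add(byte b) {
        memory[next] = b;
        next = (next + 1) % memory.length;
        if (written < memory.length) {
            written++;
        }
    }

    /** Returns the byte written {@code distance} positions ago (1 = most recent). */
    byte get(int distance) {
        // Bounds check based on a counter, so every byte value is valid data.
        if (distance < 1 || distance > written) {
            throw new IllegalStateException("Attempt to read beyond memory: dist=" + distance);
        }
        return memory[Math.floorMod(next - distance, memory.length)];
    }

    /** The problematic comparison: a signed byte holding 0xff equals -1. */
    static boolean sentinelCheckTrips(byte value) {
        return value == -1;
    }

    public static void main(String[] args) {
        WindowBoundsSketch window = new WindowBoundsSketch(8);
        window.add((byte) 0xff);
        System.out.println("sentinel check trips on 0xff: " + sentinelCheckTrips((byte) 0xff));
        System.out.println("counter-based read: " + (window.get(1) & 0xff));
    }
}
```

The counter keeps the "read beyond memory" diagnostic without reserving any byte value as a marker.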

There are still plenty of other exceptions. For example, the header of a non-compressed (stored) block fails the LEN/NLEN consistency check. I think this code is wrong:

{code}
case 0:
readBits(Byte.SIZE - 3);
{code}

because the spec says:

{code}
Any bits of input up to the next byte boundary are ignored.
{code}

Skipping {{Byte.SIZE - 3}} bits is correct for the first block, whose three header bits start at a byte boundary, but it may not be for subsequent blocks (I think). The bit reader would have to be able to align to a byte boundary here.
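
A minimal sketch of the alignment idea, assuming an LSB-first bit reader that caches whole bytes. {{AligningBitReader}} and {{alignToByte}} are hypothetical names for illustration, not the API on the branch.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class AligningBitReader {
    private final InputStream in;
    private long cache;     // unread bits, least significant bit first
    private int cachedBits; // number of valid bits in cache

    AligningBitReader(InputStream in) {
        this.in = in;
    }

    /** Reads up to 32 bits, LSB first. */
    long readBits(int count) throws IOException {
        while (cachedBits < count) {
            int next = in.read();
            if (next < 0) {
                throw new EOFException("truncated stream");
            }
            cache |= ((long) next) << cachedBits;
            cachedBits += 8;
        }
        long result = cache & ((1L << count) - 1);
        cache >>>= count;
        cachedBits -= count;
        return result;
    }

    /** Drops the unread bits of the partially consumed byte, if any. */
    void alignToByte() {
        int toSkip = cachedBits % 8; // whole bytes are cached, so this is the remainder
        cache >>>= toSkip;
        cachedBits -= toSkip;
    }

    public static void main(String[] args) throws IOException {
        // Six bits of an earlier block, a 3-bit stored-block header, then LEN and NLEN.
        byte[] data = {0x00, 0x00, 0x03, 0x00, (byte) 0xFC, (byte) 0xFF};
        AligningBitReader reader = new AligningBitReader(new ByteArrayInputStream(data));
        reader.readBits(6);
        reader.readBits(3);   // BFINAL + BTYPE, straddling a byte boundary
        reader.alignToByte(); // skips 7 bits here, not Byte.SIZE - 3
        long len = reader.readBits(16);
        long nlen = reader.readBits(16);
        System.out.println("LEN=" + len + " consistent=" + ((len ^ 0xFFFF) == nlen));
    }
}
```

The number of bits to skip depends on where the header happened to end, which is why a fixed skip only works when the block starts on a byte boundary.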

I didn't check for other errors (there's still quite a few of them), but the above is definitely a start for looking deeper.
Stefan Bodewig (JIRA)
2018-01-08 06:34:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315732#comment-16315732 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

Thanks Dawid, the first part is what I fixed yesterday already.

With commit 2d25368d stored blocks should now be read starting at a byte boundary.
Dawid Weiss (JIRA)
2018-01-08 08:17:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315853#comment-16315853 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

I'll check again once I get back home, thanks Stefan. I did look at the changes and they look good. I can't decide whether those extensive sanity checks should be in there at all times (and shouldn't be just in assertion-enabled mode); I think it'll slow down decompression by a large factor. Whether this is of practical importance, I've no idea.

I also looked at the bit reader yesterday and wondered if it'd be a better idea to have a bit reader factory keyed on byte order; the returned implementation could then be optimized for that byte order and avoid so many conditionals. While this would mean virtual method calls, in practice the call sites would be predominantly homogeneous (monomorphic) and would inline fairly quickly.
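
A rough sketch of what such a factory could look like, assuming a hypothetical {{BitSourceSketch}} base class (not the existing BitInputStream API). Each subclass hard-codes one byte order, so {{readBits}} contains no byte-order conditional.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteOrder;

abstract class BitSourceSketch {
    protected final InputStream in;
    protected long cache;     // cached bits; layout depends on the subclass
    protected int cachedBits; // number of valid bits in cache

    BitSourceSketch(InputStream in) {
        this.in = in;
    }

    /** Reads up to 32 bits. */
    abstract long readBits(int count) throws IOException;

    protected int nextByte() throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("truncated stream");
        }
        return b;
    }

    /** Factory: picks the implementation once, at construction time. */
    static BitSourceSketch forOrder(InputStream in, ByteOrder order) {
        return order == ByteOrder.LITTLE_ENDIAN ? new LittleEndian(in) : new BigEndian(in);
    }

    private static final class LittleEndian extends BitSourceSketch {
        LittleEndian(InputStream in) {
            super(in);
        }

        @Override
        long readBits(int count) throws IOException {
            while (cachedBits < count) {
                cache |= ((long) nextByte()) << cachedBits; // new byte goes above old bits
                cachedBits += 8;
            }
            long result = cache & ((1L << count) - 1);
            cache >>>= count;
            cachedBits -= count;
            return result;
        }
    }

    private static final class BigEndian extends BitSourceSketch {
        BigEndian(InputStream in) {
            super(in);
        }

        @Override
        long readBits(int count) throws IOException {
            while (cachedBits < count) {
                cache = (cache << 8) | nextByte(); // new byte goes below old bits
                cachedBits += 8;
            }
            cachedBits -= count;
            return (cache >>> cachedBits) & ((1L << count) - 1);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xA5};
        System.out.println(forOrder(new ByteArrayInputStream(data), ByteOrder.LITTLE_ENDIAN).readBits(4));
        System.out.println(forOrder(new ByteArrayInputStream(data), ByteOrder.BIG_ENDIAN).readBits(4));
    }
}
```

Whether this beats a single class with a constant byte-order field would need measuring, since the JIT may already fold that conditional.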

Maybe worth taking a look at - if you agree and file an issue, I may take a look in a spare moment. I am fond of bit-fiddling.
Stefan Bodewig (JIRA)
2018-01-08 08:46:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315885#comment-16315885 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

The C code contains even more checks, so I'd rather leave them in for now.

As for BitInputStream, I'm sure it could be improved, and it may really be worth it as the class is used by bzip2 as well - which is probably used by more people than we'll see for Deflate64. The JIT might be doing a good job anyway, though, as the byte order for every instance of the class is constant. If you want to give it a try, then please just open an enhancement request yourself.
Dawid Weiss (JIRA)
2018-01-08 11:51:01 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316165#comment-16316165 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Sure, I'll experiment and if it shows any improvement, I'll file an issue/ patch. With regard to deflate64 -- what C code are you referring to, btw.? I was curious to see how deflate was different to deflate64 and looked at 7zip sources; the decompressor is virtually the same, just passed with different constants. 7zip is LGPL though, so we can't just borrow the code from there. info-zip and zlib would be probably better candidates.
Stefan Bodewig (JIRA)
2018-01-08 12:26:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316203#comment-16316203 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

I'm looking at the zlib infback9 files linked in the description of this issue.

DEFLATE64 isn't documented officially by PKWARE, http://binaryessence.com/dct/imp/en000225.htm is useful to see the differences to DEFLATE. A DEFLATE64 decoder should be able to decode a DEFLATE stream that doesn't use the length code 285 (i.e. with no distance of exactly 258 bytes).
Stefan Bodewig (JIRA)
2018-01-08 12:26:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316203#comment-16316203 ]

Stefan Bodewig edited comment on COMPRESS-380 at 1/8/18 12:25 PM:
------------------------------------------------------------------

I'm looking at the zlib infback9 files linked in the description of this issue.

DEFLATE64 isn't documented officially by PKWARE, http://binaryessence.com/dct/imp/en000225.htm is useful to see the differences to DEFLATE. A DEFLATE64 decoder should be able to decode a DEFLATE stream that doesn't use the length code 285 (i.e. with no back-reference length of exactly 258 bytes).
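
To make the length code 285 remark concrete, here is a small sketch of how the two formats differ for that code and for the two extra distance codes. The constants reflect my reading of the zlib infback9 tables and should be checked against that source; the class and method names are hypothetical.

```java
final class Deflate64Differences {
    /** Number of extra bits that follow length code 285. */
    static int extraBitsForCode285(boolean deflate64) {
        return deflate64 ? 16 : 0;
    }

    /** Match length encoded by length code 285 plus its extra-bits value. */
    static int lengthForCode285(boolean deflate64, int extraValue) {
        // DEFLATE: code 285 always means 258. Deflate64 (assumed): base 3 plus 16 extra bits.
        return deflate64 ? 3 + extraValue : 258;
    }

    /** Base distance for the Deflate64-only distance codes 30 and 31 (14 extra bits each, assumed). */
    static int distanceBase(int code) {
        switch (code) {
            case 30:
                return 32769;
            case 31:
                return 49153;
            default:
                throw new IllegalArgumentException("only codes 30 and 31 are covered here: " + code);
        }
    }

    /** Largest distance reachable with code 31, i.e. the assumed 64 KiB window. */
    static int maxDistance() {
        return distanceBase(31) + (1 << 14) - 1;
    }

    public static void main(String[] args) {
        System.out.println("DEFLATE code 285 length: " + lengthForCode285(false, 0));
        System.out.println("Deflate64 code 285 max length: " + lengthForCode285(true, 0xFFFF));
        System.out.println("Deflate64 max distance: " + maxDistance());
    }
}
```

Because code 285 is the only length code whose meaning changes, a plain DEFLATE stream that avoids it decodes identically under both rules.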

Dawid Weiss (JIRA)
2018-01-08 22:07:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317182#comment-16317182 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

I've decompressed the entire ~6GB archive without any errors using the code from the current head of the COMPRESS-380 branch (I just decompressed it, I didn't diff the output against the original!). This is great, although the performance penalty is huge at the moment (read: it takes forever... an order of magnitude slower than the C version; the archive took ~3 hours just to decompress, while Total Commander manages the same in a few minutes).

I looked at zlib's implementation of inflate (it builds smart lookup tables for larger bit windows) and found a Java implementation of this concept in [1]. The speed this implementation achieves is quite nice (see [2]); it could be something to think about in the future as an improvement. For the time being, even a slow implementation is better than none (although this one seems almost suspiciously slow).

[1] https://github.com/nayuki/DEFLATE-library-Java/blob/master/src/io/nayuki/deflate/InflaterInputStream.java
[2] https://www.nayuki.io/page/deflate-library-java
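
For readers unfamiliar with the lookup-table idea mentioned above, here is a simplified single-level sketch. It is not zlib's multi-level table and not the code from [1]; {{HuffmanLookupSketch}} is a hypothetical name. The decoder peeks {{maxBits}} bits, indexes the table once, and learns both the symbol and how many bits to consume.

```java
final class HuffmanLookupSketch {
    final int maxBits;         // longest code length; the caller peeks this many bits
    private final int[] table; // entry = (symbol << 4) | codeLength; 0 marks an unused slot

    /** Builds the table from per-symbol code lengths (0 = unused, at most 15). */
    HuffmanLookupSketch(int[] codeLengths) {
        int max = 0;
        for (int len : codeLengths) {
            max = Math.max(max, len);
        }
        maxBits = max;
        table = new int[1 << max];

        // Canonical code assignment as described in RFC 1951, section 3.2.2.
        int[] lengthCount = new int[max + 1];
        for (int len : codeLengths) {
            if (len > 0) {
                lengthCount[len]++;
            }
        }
        int[] nextCode = new int[max + 1];
        int code = 0;
        for (int bits = 1; bits <= max; bits++) {
            code = (code + lengthCount[bits - 1]) << 1;
            nextCode[bits] = code;
        }

        for (int symbol = 0; symbol < codeLengths.length; symbol++) {
            int len = codeLengths[symbol];
            if (len == 0) {
                continue;
            }
            // Codes arrive most significant bit first, while the reader peeks LSB first.
            int reversed = Integer.reverse(nextCode[len]++) >>> (32 - len);
            // Fill every slot whose low `len` bits match the reversed code.
            for (int i = reversed; i < table.length; i += 1 << len) {
                table[i] = (symbol << 4) | len;
            }
        }
    }

    /** Looks up the entry for the next maxBits bits of input (LSB first). */
    int lookup(int peekedBits) {
        return table[peekedBits & (table.length - 1)];
    }

    static int symbol(int entry) {
        return entry >>> 4;
    }

    static int length(int entry) {
        return entry & 0xF;
    }

    public static void main(String[] args) {
        // Code lengths from the example in RFC 1951: symbols A..H.
        HuffmanLookupSketch t = new HuffmanLookupSketch(new int[] {3, 3, 3, 3, 3, 2, 4, 4});
        int entry = t.lookup(0b1100);
        System.out.println("symbol=" + symbol(entry) + " length=" + length(entry));
    }
}
```

A single-level table grows as 2^maxBits, which is presumably why production decoders split long codes into secondary tables.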
Gary Gregory (JIRA)
2018-01-08 22:11:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317187#comment-16317187 ]

Gary Gregory commented on COMPRESS-380:
---------------------------------------

Note that your link points to code licensed under the MIT License, which is OK per https://www.apache.org/legal/resolved.html#category-a
Dawid Weiss (JIRA)
2018-01-08 22:19:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317203#comment-16317203 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Yup, I realize that. I wasn't trying to suggest just taking that implementation, though, because Stefan and Christian put significant effort into their own implementation. My point was more to give a ballpark figure for the speed that can be achieved by a pure-Java inflate implementation that is aligned with the ideas present in Mark Adler's original code.
Dawid Weiss (JIRA)
2018-01-08 22:29:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317237#comment-16317237 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

I just realized that a *lot* of data in that archive is probably zipped as stored blocks. The current code uses a costly bit-by-bit routine, while those stored blocks are (by the spec) byte-aligned, so UncompressedState could just read byte-by-byte from the underlying reader (remembering that some data could still be stored in the bit buffer and flushing it first). This alone would probably give a significant speed boost.
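
A minimal sketch of that idea, assuming the bit buffer is already byte-aligned and holds whole bytes with the oldest byte in the low bits. {{StoredBlockReader}} is a hypothetical helper, not the UncompressedState class on the branch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StoredBlockReader {
    private final InputStream in;
    private long cache;     // byte-aligned leftover bits, oldest byte in the low bits
    private int cachedBits; // multiple of 8 once the reader is aligned

    StoredBlockReader(InputStream in, long cache, int cachedBits) {
        this.in = in;
        this.cache = cache;
        this.cachedBits = cachedBits;
    }

    /** Copies up to len bytes of a stored block; returns the number of bytes copied. */
    int read(byte[] dst, int off, int len) throws IOException {
        int copied = 0;
        // First flush whole bytes that are still sitting in the bit buffer.
        while (cachedBits >= 8 && copied < len) {
            dst[off + copied++] = (byte) (cache & 0xFF);
            cache >>>= 8;
            cachedBits -= 8;
        }
        // Then bypass the bit-by-bit path and read in bulk from the stream.
        while (copied < len) {
            int n = in.read(dst, off + copied, len - copied);
            if (n < 0) {
                break; // truncated input; the caller decides how to report it
            }
            copied += n;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        byte[] rest = "CDE".getBytes(StandardCharsets.US_ASCII);
        // 'A' (0x41) and 'B' (0x42) were already pulled into the bit buffer.
        StoredBlockReader reader = new StoredBlockReader(new ByteArrayInputStream(rest), 0x4241L, 16);
        byte[] out = new byte[5];
        int n = reader.read(out, 0, out.length);
        System.out.println(new String(out, 0, n, StandardCharsets.US_ASCII));
    }
}
```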
Stefan Bodewig (JIRA)
2018-01-09 09:19:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318101#comment-16318101 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

It is good that we've got an implementation that seems to work OK by now; performance can certainly be improved later. Some of the current implementation could be augmented by stuff I've done in the LZ77 package.

Yes, I've also seen the treatment of stored blocks as bitstreams as something that should be possible to improve. I will give it a try before I merge the branch to master. After that I'd likely want to cut a release and spend time on improving performance in subsequent updates.
Stefan Bodewig (JIRA)
2018-01-09 11:03:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318260#comment-16318260 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

Dawid, could you give commit 32d507b0 a try? I think this should help. The first {{if}} branch may never even be reached; I just left it in to be completely sure.
Dawid Weiss (JIRA)
2018-01-09 17:29:02 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318776#comment-16318776 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Yup, much better.
{code}
before throughput: 826,371 bytes/sec
after throughput: 3,077,912 bytes/sec
{code}
Dawid Weiss (JIRA)
2018-01-09 17:38:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318790#comment-16318790 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Also... it's really bad if the input stream is not buffered at all. Buffering is typically the easiest performance booster in any application. I added a simple buffer in the constructor (on top of your patch):
{code}
public Deflate64CompressorInputStream(InputStream in) {
    this(new HuffmanDecoder(new BufferedInputStream(in)));
    originalStream = in;
}
{code}

and the throughput increased to 90,947,655 bytes per second (yes, you read that right!).
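
The effect can be illustrated without the decoder at all. The sketch below (a hypothetical {{ReadCallCounter}} wrapper using only JDK classes) counts how many calls reach the underlying stream when a consumer reads one byte at a time, with and without a {{BufferedInputStream}} in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadCallCounter extends FilterInputStream {
    int calls; // number of read calls that reached the wrapped stream

    ReadCallCounter(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }

    /** Mimics a decoder that pulls single bytes; returns the number of bytes seen. */
    static int drainByteByByte(InputStream in) throws IOException {
        int count = 0;
        while (in.read() >= 0) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];

        ReadCallCounter unbuffered = new ReadCallCounter(new ByteArrayInputStream(data));
        drainByteByByte(unbuffered);

        ReadCallCounter underBuffer = new ReadCallCounter(new ByteArrayInputStream(data));
        drainByteByByte(new BufferedInputStream(underBuffer));

        System.out.println("calls without buffering: " + unbuffered.calls);
        System.out.println("calls with buffering:    " + underBuffer.calls);
    }
}
```

With a file or socket underneath, each of those calls can be a system call, which would explain a jump of the size reported here. A caller can get the same effect without changing the class by passing an already buffered stream to the constructor.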
Dawid Weiss (JIRA)
2018-01-09 17:42:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318796#comment-16318796 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

In fact, I just decompressed the whole 6 gig archive in under a minute with this buffer patch (fast SSD drive, but otherwise a regular laptop). I think it should be the default for any kind of decoder to actually require a BufferedInputStream (unless it has an internal buffer and reads from the input in blocks).
Stefan Bodewig (JIRA)
2018-01-09 17:43:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Bodewig resolved COMPRESS-380.
-------------------------------------
Resolution: Fixed

Branch merged to master. Many thanks to everybody who helped with this.
Stefan Bodewig (JIRA)
2018-01-09 17:47:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318805#comment-16318805 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

We already recommend buffering input; I'd prefer to leave this to the caller so that we don't add a second layer of buffering.
Dawid Weiss (JIRA)
2018-01-09 18:52:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318920#comment-16318920 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

Yeah... I bet there'll be lots of people (like me) who overlook this somehow. I'd just enforce it by the type contract, even if this means in-memory streams get a double buffering layer (double buffering means a minor memory penalty; not buffering means an order-of-magnitude slowdown).
Gary Gregory (JIRA)
2018-01-09 19:04:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318963#comment-16318963 ]

Gary Gregory commented on COMPRESS-380:
---------------------------------------

Would adding a note to the Javadoc of each of our stream classes help? Something like: "For best performance, consider wrapping this stream in a buffered stream."

Gary
Dawid Weiss (JIRA)
2018-01-09 20:22:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319080#comment-16319080 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

The thing is: I really like it when the compiler makes it explicit for me. I do read the docs, but mistakes do happen. Also, this really doesn't look trappy in the current form:

{code}
try (ZipFile zfile = new ZipFile("/my/file.zip")) {
  Enumeration<ZipArchiveEntry> entries = zfile.getEntries();
  while (entries.hasMoreElements()) {
    ZipArchiveEntry e = entries.nextElement();
    try (InputStream is = zfile.getInputStream(e)) {
      // .. read "is" in blocks or wrap it in a BufferedInputStream... doesn't matter,
      // it'll be slow.
    }
  }
}
{code}

and it is trappy. That constructor on ZipFile creates an unbuffered stream, which makes reading 10x slower than it would be if the stream were buffered. I don't see how it can be fixed from the user side, actually, as even if you do wrap the stream returned by zfile.getInputStream in a buffered input stream (or read in large byte[] blocks), the performance will still be very, very poor.
Dawid Weiss (JIRA)
2018-01-09 20:25:02 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319085#comment-16319085 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

I'll file another issue for this, I think it's worth an improvement.
Dawid Weiss (JIRA)
2018-01-09 21:01:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319080#comment-16319080 ]

Dawid Weiss edited comment on COMPRESS-380 at 1/9/18 9:00 PM:
--------------------------------------------------------------

The thing is: I really like it when the compiler makes it explicit for me. I do read the docs, but mistakes do happen. Also, this really doesn't look trappy in the current form:

{code}
try (ZipFile zfile = new ZipFile("/my/file.zip")) {
  Enumeration<ZipArchiveEntry> entries = zfile.getEntries();
  while (entries.hasMoreElements()) {
    ZipArchiveEntry e = entries.nextElement();
    try (InputStream is = zfile.getInputStream(e)) {
      // .. read "is" in blocks or wrap it in a BufferedInputStream... doesn't matter,
      // it'll be slow.
    }
  }
}
{code}

and it is trappy. That constructor on ZipFile creates an unbuffered stream, which makes reading 10x slower than it would be if the stream were buffered. I don't see how it can be fixed from the user side, actually: even if you do wrap the stream returned by zfile.getInputStream in a buffered input stream (or read in large byte[] blocks), the performance will still be very, very poor.



Gary Gregory (JIRA)
2018-01-09 23:31:02 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319404#comment-16319404 ]

Gary Gregory commented on COMPRESS-380:
---------------------------------------

+1
Stefan Bodewig (JIRA)
2018-01-10 07:59:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319857#comment-16319857 ]

Stefan Bodewig commented on COMPRESS-380:
-----------------------------------------

There is an additional danger when we add buffering inside of the compressor streams unconditionally: we may read too far.

Take DEFLATE64, {{ZipArchiveInputStream}} and the case where a data descriptor is used as an example. https://github.com/apache/commons-compress/blob/master/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveInputStream.java#L347

When the inflater has found the "end of stream" marker it will return -1 and we are done. An additional {{BufferedInputStream}} may have consumed additional bytes from the original ZIP stream that we would now need to "push back" into the original stream so the rest of the archive can be read properly.

Having multiple consecutive streams isn't that uncommon (multiple attachments of an email, for example). That's why we need the caller to be able to control things.

COMPRESS-438 is a completely different case and I fully agree with it.
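
The "read too far" effect can be reproduced with plain JDK streams. The following is a minimal sketch, not code from Commons Compress: `ReadTooFarDemo` and its method are hypothetical names, and the byte counts are arbitrary. It reads only the first few bytes through a {{BufferedInputStream}} and then checks how much is left in the underlying stream for whoever reads next.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ReadTooFarDemo {
    // Reads firstLen bytes, one at a time, through a BufferedInputStream and
    // returns how many bytes are still available in the underlying stream.
    static int remainingAfterBufferedRead(int totalLen, int firstLen, int bufferSize) {
        try {
            ByteArrayInputStream underlying = new ByteArrayInputStream(new byte[totalLen]);
            BufferedInputStream buffered = new BufferedInputStream(underlying, bufferSize);
            for (int i = 0; i < firstLen; i++) {
                buffered.read();
            }
            return underlying.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 20 bytes in total; the first "stream" is assumed to be 5 bytes long.
        int left = remainingAfterBufferedRead(20, 5, 16);
        System.out.println("bytes left for the next reader: " + left);
        System.out.println("bytes the next reader expected: " + (20 - 5));
    }
}
```

The difference between the two printed numbers is what would have to be pushed back, and the wrapped stream offers no way to do that unless the caller arranged for it.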
Dawid Weiss (JIRA)
2018-01-10 08:13:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319873#comment-16319873 ]

Dawid Weiss commented on COMPRESS-380:
--------------------------------------

I agree with you, Stefan. I was only talking about a buffer before the codec (this buffer's input is bounded externally anyway, so it shouldn't go past the compressed stream's limit).
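
A minimal sketch of that idea, again with JDK classes only: if the stream handed to the buffer is already limited to the entry's compressed length, the buffer cannot pull in bytes that belong to whatever follows. `LimitedInputStream` is a hypothetical stand-in written for this example, not the class Commons Compress uses, and the sketch assumes the compressed size is known up front, which is not the case in the data-descriptor scenario described earlier.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: exposes at most `limit` bytes of the wrapped stream.
class LimitedInputStream extends FilterInputStream {
    private long remaining;

    LimitedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int b = in.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        // Never request more than the bound allows.
        int n = in.read(b, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }
}

class BoundedBufferDemo {
    // Drains a buffered view of the first entryLen bytes and returns how many
    // bytes are still available in the underlying stream afterwards.
    static int remainingAfterBoundedRead(int totalLen, int entryLen, int bufferSize) {
        try {
            ByteArrayInputStream underlying = new ByteArrayInputStream(new byte[totalLen]);
            InputStream entry = new BufferedInputStream(
                    new LimitedInputStream(underlying, entryLen), bufferSize);
            while (entry.read() != -1) {
                // discard
            }
            return underlying.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes left after the entry: "
                + remainingAfterBoundedRead(20, 5, 16));
    }
}
```

Because the bound sits below the buffer, the order of wrapping matters: buffering first and limiting second would bring the over-read back.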
Stefan Bodewig (JIRA)
2018-01-10 09:48:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Bodewig updated COMPRESS-380:
------------------------------------
Component/s: Compressors
Stefan Bodewig (JIRA)
2018-01-10 09:48:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Bodewig updated COMPRESS-380:
------------------------------------
Component/s: Archivers
Stefan Bodewig (JIRA)
2018-01-10 09:49:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/COMPRESS-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Bodewig updated COMPRESS-380:
------------------------------------
Labels: zip (was: )
Post by Dawid Weiss (JIRA)
Support for ENHANCED_DEFLATED (Deflate64) in ZIP files
------------------------------------------------------
Key: COMPRESS-380
URL: https://issues.apache.org/jira/browse/COMPRESS-380
Project: Commons Compress
Issue Type: New Feature
Components: Archivers, Compressors
Reporter: Dawid Weiss
Labels: zip
Fix For: 1.16
Attachments: archive-deflate.zip, archive-deflate64.zip, archive.zip, archive64.zip, compress-380.diff, hello.world, input2
{code}
UnsupportedZipFeatureException: unsupported feature method 'ENHANCED_DEFLATED'
{code}
which is a bummer since JDK's implementation also doesn't support Deflate64. This seems to be PKWare's extensions, although code to decrypt it exists in zlib (and is appropriately licensed, I believe).
https://github.com/madler/zlib/tree/master/contrib/infback9
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
