zip 3.0.0: unzip over HTTP

Yesterday I published zip 3.0.0. In this post I discuss some of its coolest new features. See the change log for the complete list of changes.

Remote unzip

zip_list() and unzip() can now work directly with HTTP(S) URLs. zip_list() only downloads the directory of the entries from the zip file, and unzip() only downloads the directory and the requested entries. Listing the files and extracting a few entries does not download the whole file.

For this to work the web server needs to support range requests. Most web servers do, but not all of them. Notably, when downloading the contents of a GitHub repository as a zip file, the web server does not support range requests, so zip_list() and unzip() will always fall back to downloading the whole file.

zip needs the curl package to be installed for HTTP(S) URLs to work.

This was requested in issue #39.
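As a rough sketch of how this can be used: the helper below lists an archive and then extracts a single entry from it. The helper name peek_remote_zip() and its arguments are made up for this illustration and are not part of zip. It relies only on zip_list() and unzip() accepting a URL in place of a local path, as described above.

```r
# Sketch: list an archive, then extract one entry from it.
# `url` may be an HTTP(S) URL (needs zip >= 3.0.0 and curl) or a local path.
# peek_remote_zip() is a hypothetical helper, not a zip function.
peek_remote_zip <- function(url, wanted, exdir = tempfile()) {
  # With a URL, this reads only the directory of entries when the
  # server supports range requests
  entries <- zip::zip_list(url)
  if (!wanted %in% entries$filename) {
    stop("Entry not found in archive: ", wanted)
  }
  # With a URL, this reads the directory plus the requested entry
  zip::unzip(url, files = wanted, exdir = exdir)
  file.path(exdir, wanted)
}

# Hypothetical usage (the URL is a placeholder, not a real archive):
# path <- peek_remote_zip("https://example.com/data.zip", "README.md")
```

If the server does not support range requests, the same call should still work, it just downloads the whole file first.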

Password support

zip now supports passwords, both when compressing and uncompressing. It supports the (insecure) PKWARE ZipCrypto stream cipher and two (secure) AES ciphers. See the password argument of unzip() and zip().

This was requested in issue #38.
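A minimal round trip might look like the sketch below. It uses only the password argument mentioned above. The file names and the password are placeholders, and the example does not choose a cipher explicitly, so it uses whatever zip picks by default.

```r
# Create a small file to archive (names are placeholders)
src <- tempfile()
dir.create(src)
writeLines("secret", file.path(src, "notes.txt"))

# Compress with a password; needs zip >= 3.0.0
zipfile <- tempfile(fileext = ".zip")
zip::zip(zipfile, "notes.txt", root = src, password = "correct horse")

# Uncompress with the same password into a fresh directory
out <- tempfile()
zip::unzip(zipfile, exdir = out, password = "correct horse")
readLines(file.path(out, "notes.txt"))
```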

Vectorized, concurrent unzip()

unzip() can now handle a vector of zip files to uncompress. Moreover, unzip() will use a pool of threads to uncompress the files concurrently. You can set the zip_threads option or the ZIP_THREADS environment variable to control the size of the thread pool.
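For illustration, the sketch below builds two small archives and extracts both with one call. It assumes that a single exdir is accepted for all archives, which I have not spelled out above, and it sets the thread pool size through the ZIP_THREADS environment variable.

```r
# Build two small archives (file names are placeholders)
src <- tempfile()
dir.create(src)
writeLines("one", file.path(src, "one.txt"))
writeLines("two", file.path(src, "two.txt"))
zip1 <- tempfile(fileext = ".zip")
zip2 <- tempfile(fileext = ".zip")
zip::zip(zip1, "one.txt", root = src)
zip::zip(zip2, "two.txt", root = src)

# Limit the thread pool to two threads
Sys.setenv(ZIP_THREADS = "2")

# Extract both archives in one call; needs zip >= 3.0.0.
# Assumption: one exdir is used for every archive.
out <- tempfile()
zip::unzip(c(zip1, zip2), exdir = out)
sort(list.files(out))
```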

Progress bars

unzip() and zip() can now create progress bars when the cli package is installed. For zip() the progress bar is byte-level, so zipping a large file will produce a smooth progress bar. For unzip() the progress bar only counts the extracted entries.

Progress bars are currently opt-in: set the ZIP_PROGRESS=true environment variable or the zip.progress option to enable them. I did this to avoid unexpected progress bars when using zip downstream. E.g. pak has its own progress bars, and zip’s new progress bars would possibly garble them when pak calls zip to uncompress R package files.

This was requested in issue #48.
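Opting in for a single R session could look like the sketch below. It assumes that TRUE is an accepted value for the zip.progress option. The file name and size are arbitrary, and whether a bar is actually drawn also depends on cli being installed and on the terminal.

```r
# Opt in to progress bars; the environment variable is the alternative
options(zip.progress = TRUE)
# Sys.setenv(ZIP_PROGRESS = "true")

# Write a few megabytes so the byte-level bar has something to show
src <- tempfile()
dir.create(src)
writeBin(raw(5e6), file.path(src, "big.bin"))

zipfile <- tempfile(fileext = ".zip")
zip::zip(zipfile, "big.bin", root = src)
```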

unzip_process fallback

zip includes two small command line executables (cmdzip and cmdunzip) that are lightweight versions of zip() and unzip() and run independently of R. The main motivation for this is that pak uses zip to install (uncompress, really) binary packages on Windows. Starting a cmdunzip process is very fast compared to starting a new R process, and pak starts a number of concurrent cmdunzip processes to install many binary packages quickly.

This usually works great, but sometimes the cmdunzip process is blocked by system policies. It is quite reasonable to block executables that are included in R packages. Currently pak just fails in this case, and the only workaround is to avoid pak or to whitelist the cmdunzip process.

zip 3.0.0 includes a fallback mechanism for this: if cmdunzip cannot run, unzip_process() falls back to calling unzip() in an R subprocess. pak will update to use zip 3.0.0 in the next release.

This was requested in issue #135.
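For readers who have not used unzip_process() directly, here is a small sketch of the background extraction it provides. It needs the processx package, and the file names are placeholders. The calling code should look the same whether cmdunzip or the fallback does the work.

```r
# Build a small archive to extract (names are placeholders)
src <- tempfile()
dir.create(src)
writeLines("hello", file.path(src, "hello.txt"))
zipfile <- tempfile(fileext = ".zip")
zip::zip(zipfile, "hello.txt", root = src)

# unzip_process() returns a process class; $new() starts the extraction
out <- tempfile()
proc <- zip::unzip_process()$new(zipfile, exdir = out)

# Wait for the background process to finish, then inspect the result
proc$wait()
proc$get_exit_status()
list.files(out)
```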

Other notable changes

  • unzip() now returns a data frame with data about the uncompressed files, in the same format as zip_list(). (Issue #35.)
  • zip_list() and unzip() now do a much better job with file names in non-UTF-8 encodings. (Issue #101.)
  • zip_append() and zipr_append() now replace existing entries when appending a file whose archive path already exists in the zip file, instead of creating duplicate entries. (Issue #111.)
  • New keys argument to zip(), zipr() and zip_append() lets you specify custom paths for entries inside the archive. (Issue #50.)
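Two of these changes are easy to try together. The sketch below appends a file whose archive path already exists, then looks at what unzip() returns. File names are placeholders, and the comments describe the behavior listed above rather than verified output.

```r
# Archive a file, change it, then append it again under the same path
src <- tempfile()
dir.create(src)
writeLines("v1", file.path(src, "a.txt"))
zipfile <- tempfile(fileext = ".zip")
zip::zip(zipfile, "a.txt", root = src)

writeLines("v2", file.path(src, "a.txt"))
zip::zip_append(zipfile, "a.txt", root = src)

# With zip >= 3.0.0 there should be one a.txt entry, not two
zip::zip_list(zipfile)$filename

# unzip() should now return a data frame shaped like zip_list() output
out <- tempfile()
res <- zip::unzip(zipfile, exdir = out)
names(res)
readLines(file.path(out, "a.txt"))
```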

Thank you!

All the new features in zip 3.0.0 were requested by people in the community. I thank all contributors to zip so far, for opening issues, submitting pull requests, and providing feedback: @8qube, @alliesaizan, @AndreM84, @ArtemSokolov, @AshesITR, @babayoshihiko, @bart1, @batpigandme, @bersbersbers, @cboettig, @chainsawriot, @cimentadaj, @context-dependent, @cstepper, @daattali, @davidgohel, @dhersz, @dovydas88, @dracodoc, @egillax, @emmamendelsohn, @enricoschumann, @fproske, @FrancoisR95, @fsteinhi, @gacolitti, @hhmacedo, @jbfagotfede39, @jeanchristophechiem, @jefferis, @jennybc, @jeroen, @jeroenjanssens, @jimhester, @jwijffels, @k5cents, @lz100, @m-muecke, @madihahamza786-debug, @md0u80c9, @MichaelChirico, @Minhoux, @mirickmi, @MislavSag, @Moohan, @msgoussi, @nymphs97, @philipp-baumann, @QuLogic, @RadioPete24, @rosinnia, @schuemie, @scott-uses-git, @scschwa, @sda030, @sebffischer, @sjentsch, @skeydan, @stefanoborini, @tradeli, @Triamus, @weshinsley, @wibeasley, @WilDoane, @xhdong-umd, @yusuzech, and @zeehio.