Old Today, 12:42 AM   #1
exerceo
Member
 
Registered: Oct 2022
Posts: 128

Rep: Reputation: 30
How to split a gzip into valid smaller files?


How to split a gzip file to chunks of a pre-determined size that are valid gzip files themselves?

Something like split -b 4095M example.img.gz would not work, since it would cut through the internal structure of the gzip file. My goal is for the gzip chunks to be reassemblable from multiple devices, so each gzip part file has to be valid on its own.

The gzip file itself should be of a predetermined size, not the content it expands to.

For re-assembling, the following is useless because it requires all the files to be available at the same time, making it useless for splitting across devices:
Code:
cat example.img.gz.part* | gzip -d -c >> example.img
What I want is this:
Code:
gzip -d -c example.img.part*.gz >> example.img
I want each part of the gzip to be a valid gzip file of its own, so they can be assembled from multiple devices.

The pre-determined size doesn't have to be exact, but within a few megabytes of a desired value. For example, if I want 4096 MiB chunks, something like 4090 MiB is acceptable too.

This is already possible with bzip2 by "abusing" bzip2recover. bzip2recover is meant for salvaging the undamaged blocks of a damaged archive, but it can be used to break any bzip2 file down into one file per bzip2 block. This way, multiple blocks can be concatenated into parts of any desired approximate size.
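
For reference, a rough sketch of that bzip2recover trick (the rec* output names and the grouping loop are illustrative; check your bzip2 version):
Code:
bzip2 -k example.img                     # produces example.img.bz2
bzip2recover example.img.bz2             # writes one small .bz2 per compressed block (rec*example.img.bz2)

# group the per-block files into ~4 GiB parts; concatenated bzip2 streams
# still decompress as one, so each part stays a valid bzip2 file
size=0; part=1
for f in rec*example.img.bz2; do
    cat "$f" >> "example.img.part$part.bz2"
    size=$((size + $(stat -c%s "$f")))
    if [ "$size" -ge $((4096 * 1024 * 1024)) ]; then
        part=$((part + 1)); size=0
    fi
done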

The reason I prefer gzip for this purpose is that it is much faster.

Last edited by exerceo; Today at 12:47 AM. Reason: added clarification
 
Old Today, 03:38 AM   #2
lvm_
Senior Member
 
Registered: Jul 2020
Posts: 1,602

Rep: Reputation: 545
Can't be done. You'll have to use an archiver with multi-volume support - 7z or rar - and, if speed is so important, reduce the compression level. Alternatively, you can split a single gzip archive using split and combine it back through a named pipe attached at the other end to the decompressing process; that way it will wait for the parts. A technique to prevent the pipe from closing after each file is described e.g. here: https://superuser.com/questions/7664...to-named-pipes
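
A rough, untested sketch of that pipe approach (the fd-3 trick is one way to keep the pipe open between parts so the reader doesn't see EOF too early):
Code:
# source side: split the single gzip into raw chunks
split -b 4095M example.img.gz example.img.gz.part

# destination side: decompress through a named pipe, feeding parts as they become available
mkfifo gzpipe
gzip -d -c < gzpipe > example.img &      # reader blocks until data arrives
exec 3> gzpipe                           # hold the write end open between parts
for p in example.img.gz.part*; do
    cat "$p" > gzpipe                    # in practice, copy each part from its own drive first
done
exec 3>&-                                # close the pipe; gzip sees EOF and finishes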

Last edited by lvm_; Today at 03:50 AM.
 
Old Today, 04:13 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,418

Rep: Reputation: 4197
I was thinking along similar lines. Maybe lzip as well - I like the idea of lziprecover and ddrescue, but haven't pursued it as I should have.
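
If memory serves, lzip can even do the splitting at compression time. A sketch (the -S/--volume-size option and the 00001.lz volume naming are as I remember the lzip docs, so verify with your version):
Code:
# compress into ~4 GiB volumes, each a valid .lz file on its own
lzip -k -S 4Gi example.img               # example.img00001.lz, example.img00002.lz, ...

# on the destination: single-pass reassembly straight from the volumes
lzip -d -c example.img*.lz > example.img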
 
Old Today, 04:31 AM   #4
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 24,514

Rep: Reputation: 8064
Quote:
Originally Posted by exerceo View Post
How to split a gzip file to chunks of a pre-determined size that are valid gzip files themselves?

The gzip file itself should be of a predetermined size, not the content it expands to.
That is just wrong. The size of the "result" is determined by the user, by the parameters passed to gzip.
gzip produces a single compressed archive.
If you want smaller archives, you need to split the original files.
Additionally, the size of the result cannot be calculated without actually doing the compression (it depends on the content and on the algorithm), so you can never predict its exact size.
You can also run gzip file by file and juggle the results to construct your preferred layout, but that will take a very long time.
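
A minimal sketch of that file-by-file route (assuming the data exists as individual files rather than one big image):
Code:
# compress each file separately; every resulting .gz is a valid gzip on its own
find /data -type f -exec gzip -k {} +
# then distribute the .gz files across the drives however you like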

Even in the case of 7zip you need to have all the parts available to be able to unpack: https://askubuntu.com/questions/1342...t-7zip-archive.
By the way, there is a gzip recovery tool too, but that still won't help here.
 
Old Today, 07:28 AM   #5
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,994

Rep: Reputation: 2863
Quote:
Originally Posted by exerceo View Post
My goal is for the gzip chunks to be reassemblable from multiple devices
Why don't you describe what you're actually trying to achieve?

 
Old Today, 11:21 AM   #6
exerceo
Member
 
Registered: Oct 2022
Posts: 128

Original Poster
Rep: Reputation: 30
gzip splitting

Quote:
Originally Posted by lvm_ View Post
Can't be done.
Quote:
Originally Posted by pan64 View Post
That is just wrong. the size of the "result" is determined by the user, the parameters passed to it.
gzip produces a single compressed archive.
If you want to make smaller archives you need to split the original files.
Additionally the size of the result cannot be calculated without actually doing the compression (because it depends on the content and the algorithm too), therefore you can never predict the exact size of it.
You can use gzip file by file too, and play with them to construct your preferred archive, just it will take a very long time.
While it is indeed impossible to predict a compressed size without doing the compression work, it should be technically possible to check the size of the gzip while it is being created. Once it gets as close as possible to the preferred size without exceeding it, the compressor should close the file and start a new gzip file. My goal is that each gzip is a valid file of its own.

I want to store a huge tar file, compressed, across smaller flash drives so that it can be reassembled later by gunzipping the individual gzip files back into the original uncompressed file. I want a pre-determined size in order to waste as little space as possible.

Splitting after gzipping would require two passes to get back to the original data: first reassembling the gzip file, and only then decompressing it, because splitting a gzip at an arbitrary position corrupts the data near the cut points.

But if the data were split across intact gzip files, you could gzip -d -c each gzip file back into the original file in a single pass. You wouldn't have to reassemble the gzip file at all; you could reconstruct the original uncompressed file directly from the gzip pieces.

I am surprised nothing like this has been implemented in decades. It is clearly doable.
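
The closest approximation with standard tools seems to be GNU split's --filter option. A rough sketch (it caps the uncompressed chunk size rather than the compressed size, so the .gz parts come out somewhat smaller than the target, but each part is an independent, valid gzip file):
Code:
# compress fixed-size uncompressed chunks into separate, self-contained gzip files
split -b 4G --filter='gzip -c > $FILE.gz' example.img example.img.part

# later: reassemble the original file in a single pass, no intermediate .gz needed
gzip -d -c example.img.part*.gz > example.img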
 
  

