PreserveTube - A YouTube archival site.

With respect, it seems like you need to think about exactly what you expect your service to achieve. YouTube is estimated to have an exabyte of video data online right now. In a way, the situation you've found yourself in looks like becoming a victim of your own success.

One thought: If your goal is to preserve YouTube content against instances of censorship or deletion, then perhaps you should serve back content only when you detect the original version on YouTube is no longer online. You'd still have the storage overhead, but it's much cheaper if the bulk of your retained data is near- or off-line. Traffic costs should also reduce considerably, assuming those are non-negligible. Still doesn't solve the David vs. Goliath issue fully, but that problem seems fundamental in the operation of the service in any capacity.
 
This makes me think a a custom captcha like one of those traffic light selectors, but for if something should be archived, could work. Give the user a mix of known good archived videos ("Markiplier says nigger"), a known bad archive video("I'm a little teapot short animation for children"), and the rest are unseen videos and ask them to select which videos to delete or archive. You'll end up with those unseen videos categorized into 'treasure' and 'trash'.
 
One thought: If your goal is to preserve YouTube content against instances of censorship or deletion, then perhaps you should serve back content only when you detect the original version on YouTube is no longer online. You'd still have the storage overhead, but it's much cheaper if the bulk of your retained data is near- or off-line. Traffic costs should also reduce considerably, assuming those are non-negligible. Still doesn't solve the David vs. Goliath issue fully, but that problem seems fundamental in the operation of the service in any capacity.
How does this help storage costs? You can't monitor every video then grab it after it's been deleted. The solution is for people to have to log into their account and refresh their archive times. Add in limited storage space (pay for more) and then you have a real solution. People who archive and leave won't ever come back for their vids. People who use the service as intended will keep logging in and are likely to donate to back up more. And it's an income stream.

The idea of a free archive for everything is nice. But someone is always paying. The old nerd saying is "Free as in beer".
 
This is sad but understandable. A lot of the old internet history from my youth (pre-yt) is gone and I wish there was some way to make sure history won't repeat itself. But preservation is always in the hands of the people without the money. Local copies are of course the only true solution, but having some kind of distributed place helps as well.

In my opinion it's just so difficult to see what actually is worth backing up. The only real example I have is The Escapist. It shat the bed and got taken over by gamba jews. The old videos are still mostly up there on yt, but not all, and for how long. Jim Sterling and MovieBlob content is there. There's unlisted videos, so even if you check their channel you won't see everything. Is this something for archives? Who really wants to watch 15 year old vidya reviews by faggots. I would not be surprised if the preservetube copies have zero views.

Maybe some voting system where videos can be tagged for removal and you can reverse it by voting. I guess it's the same as counting views though, only you vote by watching. It'd be nice if you could vote for keeping certain channels and all their associated videos though, without watching every single one.
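To make the voting idea a bit more concrete, here's a rough sketch in Python. Everything in it (the `KeepVotes` class, its method names, the `threshold` parameter) is made up for illustration and has nothing to do with how the site actually works. It just shows a video being flagged for removal, and keep-votes on either the video or its whole channel reversing that.

```python
from collections import defaultdict


class KeepVotes:
    """Hypothetical tally of removal flags and keep-votes."""

    def __init__(self):
        self.flagged = set()                   # video IDs tagged for removal
        self.video_votes = defaultdict(int)    # keep-votes per video
        self.channel_votes = defaultdict(int)  # keep-votes per channel

    def flag_for_removal(self, video_id):
        self.flagged.add(video_id)

    def vote_keep_video(self, video_id):
        self.video_votes[video_id] += 1

    def vote_keep_channel(self, channel_id):
        # One vote here counts for every video on the channel.
        self.channel_votes[channel_id] += 1

    def should_remove(self, video_id, channel_id, threshold=1):
        """A flagged video is removed only if it has fewer keep-votes than threshold."""
        if video_id not in self.flagged:
            return False
        keep = self.video_votes.get(video_id, 0) + self.channel_votes.get(channel_id, 0)
        return keep < threshold
```

A channel-level vote means nobody has to watch or vote on every single video to keep a whole channel around.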
 
How does this help storage costs? You can't monitor every video then grab it after it's been deleted. The solution is for people to have to log into their account and refresh their archive times. Add in limited storage space (pay for more) and then you have a real solution. People who archive and leave won't ever come back for their vids. People who use the service as intended will keep logging in and are likely to donate to back up more. And it's an income stream.

The idea of a free archive for everything is nice. But someone is always paying. The old nerd saying is "Free as in beer".
Grab the video and put it in LTO cold storage, then monitor the archived link to see if the video is taken down. When it is, load the video from cold storage onto the server. That way, only videos that have been taken down take up server space.
 
Agreeing with many others itt. Make users log in. With logins you can track who is uploading good media and who is uploading "48 hours of rain noises," and could reward and punish users respectively. Limit user up/downloads to x#/day and then have different payment tiers to increase them. It would also be neat if there was a way to link KF accounts in the account making process so that users could recognize each other, but that might be stupid.
 
Preservetube already downloads low quality 480p or 360p VP9 streams from Youtube automatically. The fact that no one here knows this kind of proves that to date not a single person has watched a video on PreserveTube.
Alright yes, I didn't bother actually checking any archived videos. The newest videos are AV1 encoded, slightly older ones use VP9 but the very oldest are still encoded using H264 such as this Concorde video so there is room for improvement.
 
I don’t know what your hardware is like but maybe look into one of those tape storage thingies and only have the videos be readily accessible for like a month. Afterwards they would go into long term storage to be restored to the main website when a video gets taken down from YouTube itself.

I’ll send you a couple bucks for archiving my favorite retarded canadian.
Maybe magnetic tape is good for enterprise solutions but for an individual you're honestly just better off getting an external BD-R drive and backing up videos to a good quality BD-R disc. If you store them in a cool dark place like a box in your closet you can easily get fifteen years out of them, and the low cost means you can justify redundant copies. If you include backup information on the disc you can stretch that lifespan even further.
A Blu-ray external writer is under a hundred bucks, and with the cost of solid state drives nowadays it's actually competitive for personal use to just burn discs.

T. I run a plex server, a peertube instance, and have a big library of video game trolling videos backed up to Blu-ray disc so that the future generations can see the heroic exploits of Sebastian Brightbottom
 
If anyone has any ideas on how this can be worked on, I'd be more than open to them.
Given fast accessibility of files is a way lower priority than the preservation of them, I'd look at lowering the $/TB storage cost as much as possible. Whether it's grabbing an old JBOD like a NetApp DS4246 and loading it up with whatever trash you can find, or tiering off to magnetic storage, just try to get the costs way down even if it means people have to queue for files.

Could also make it possible for people to nominate videos with no preservation value to be removed?
Requires lab-like conditions to properly store magnetic tape. Also they don't really last that long. Interesting idea though
Just needs to be climate and humidity controlled. i.e. stored in a room with AC.
When I tried to search it, instead I found the following:
You can filter out modern slop using the before: filter, like "bunny yelling before:2009" just gives me ancient home videos of bunnies.
 
How does this help storage costs? You can't monitor every video then grab it after it's been deleted. The solution is for people to have to log into their account and refresh their archive times. Add in limited storage space (pay for more) and then you have a real solution. People who archive and leave won't ever come back for their vids. People who use the service as intended will keep logging in and are likely to donate to back up more. And it's an income stream.

The idea of a free archive for everything is nice. But someone is always paying. The old nerd saying is "Free as in beer".
I may have described it poorly. The site continues to do what it does now - Grab a copy of a YouTube video whenever asked to do so by a web client. But instead of then immediately making it available on PreserveTube for playback, it does not serve the video until it detects the original is no longer available.

Right now, PreserveTube keeps all of the videos it 'archives' online and available. Online storage is the most expensive tier of storage in a datacenter. Near-line storage (data is stored, but not instantly available for access) is much cheaper. Offline storage (tapes, powered-down disks) is cheaper again.

By detecting when media on YouTube becomes unavailable and bringing your archived copy back from Near- or Off-line storage to be served up, you minimize the amount of expensive Online storage you need. The trade-off is more site logic needed to detect unavailable content and bring cold copies back from storage to live.
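As a rough sketch of the extra site logic (not PreserveTube's actual code), something like the following could work. `Tier`, `ArchivedVideo`, `refresh_tiers` and the `is_available_upstream` callback are all hypothetical names. How you actually check whether the original is still up, and how a cold copy physically gets restored, are left out on purpose.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class Tier(Enum):
    COLD = "cold"      # near-line or offline copy only, not served
    ONLINE = "online"  # restored and served from the site


@dataclass
class ArchivedVideo:
    video_id: str
    tier: Tier = Tier.COLD  # new archives start out cold


def refresh_tiers(
    videos: Iterable[ArchivedVideo],
    is_available_upstream: Callable[[str], bool],  # assumed check against the original
) -> List[str]:
    """Promote cold videos whose original is gone; return the promoted IDs."""
    restored = []
    for video in videos:
        if video.tier is Tier.COLD and not is_available_upstream(video.video_id):
            # A real system would queue a restore job from tape or near-line here.
            video.tier = Tier.ONLINE
            restored.append(video.video_id)
    return restored
```

Run periodically, this keeps only taken-down videos on the expensive online tier while everything else stays cold.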
 
I may have described it poorly. The site continues to do what it does now - Grab a copy of a YouTube video whenever asked to do so by a web client. But instead of then immediately making it available on PreserveTube for playback, it does not serve the video until it detects the original is no longer available.
I think this is a great idea, @PreserveTube, do you think this would help the site? It would also prevent users from using the site as a downloader.
 
I doubt it, because the complaint was that the archive is 65 TB and growing. Storage is the problem. He was saying that one of the issues was that people weren't watching the videos, he wanted them to watch them.
 
3. Idk what your philosophy regarding your site is, but perhaps you could delete music videos with static backgrounds since that's music and not a video IMO or alternatively store them as audio files with an embedded thumbnail to further save space.
You can actually take these and make the video 1 frame per minute and get the same thing while saving 95% of the space. However I don't think this is going to be worth the effort. The potential space savings will be small since most videos aren't static background music videos. Detecting them will be annoying and there could be false positives that cause issues.

5. If push comes to shove, charge a small fee
Any fees run the risk of drawing Alphabet's ire, since you'd be "making money" off their copyrighted content.

With pricing like $0.00099 per GB ($0.99 per TB) per month, you could save on monthly costs while long term programmatic solutions are developed (pruning slop, etc). I think having to wait up to 12 hours to pull out a video from storage and have it accessible on the site is a fair compromise (for keeping costs down) if the content hasn't even been looked at in a year. If something becomes relevant again (after years), and such content is requested to be made available, are there really scenarios in which it must be available immediately?
Ah yes. Rent your archival data from amazon. And what a steal at 1$/mo./TB. After all buying storage is so messy and the costs! HDD storage is over 10$/TB now so... doing some napkin math here... carry the one... Look, I'm no math genius so "10 / 1" is hard for me but amazon is clearly cheaper for at least a couple years right? You also pay through the nose to retrieve data. You only looked at the monthly storage rate. Glacier is supposed to be insurance / never need it type deal. Braindead suggestion.
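For what it's worth, the napkin math can be written out. This is a minimal sketch using only figures quoted in this thread ($0.99 per TB per month rented, roughly $10 per TB to buy disks, a 65 TB archive). The function names are made up for illustration, and it deliberately ignores retrieval fees, redundancy, power and drive replacement, all of which matter in practice.

```python
def break_even_months(buy_per_tb: float, rent_per_tb_month: float) -> float:
    """Months after which cumulative rent passes a one-time purchase price."""
    return buy_per_tb / rent_per_tb_month


def cumulative_rent(size_tb: float, months: int, rent_per_tb_month: float) -> float:
    """Total rent paid for size_tb of storage over the given number of months."""
    return size_tb * months * rent_per_tb_month


if __name__ == "__main__":
    # Figures taken from the posts above; treat them as rough assumptions.
    months = break_even_months(buy_per_tb=10.0, rent_per_tb_month=0.99)
    print(f"Rent overtakes purchase after about {months:.1f} months")
    print(f"One year of rent for 65 TB: ${cumulative_rent(65, 12, 0.99):.2f}")
```

With those numbers renting only stays cheaper for under a year, and that's before any retrieval charges.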

One thought: If your goal is to preserve YouTube content against instances of censorship or deletion, then perhaps you should serve back content only when you detect the original version on YouTube is no longer online. You'd still have the storage overhead, but it's much cheaper if the bulk of your retained data is near- or off-line. Traffic costs should also reduce considerably, assuming those are non-negligible.
This is an excellent suggestion, and you could even just embed the existing YouTube video in the PreserveTube page. The only stumbling block is videos that were muted or edited on YouTube, but a "no, get me the archive" button would solve that.

I still think going AV1 only is the best first step. There's no trying to decided what is worth it to save, no logins, no token systems, etc. etc. It's clean and straightforward. Even if you end up having to prune things at some point this is a permanent buff. @geckogoy You still got that A310 you're willing to give away? It could be useful to my nigga @PreserveTube.
 
You still got that A310 you're willing to give away? It could be useful to my nigga @PreserveTube.
What, seriously? Sure I guess, anything for preserve tbh tube
 