Parler Data & Tools
Data & Tools: | |
Many contributors. Thanks to all. | |
ParlerAnalysis@protonmail.com - Do not expect timely replies. | |
Channel: #parlerparsers at https://webirc.hackint.org/ | |
#parlerparsers-video for video IDing | |
FBI Tips: https://tips.fbi.gov/digitalmedia/aad18481a3e8f02 | |
Want to help but don't know how? | |
Download copies of data and scripts, rehost them elsewhere, and seed torrents. | |
Help make this file easier for others to understand. | |
Develop ways to make data easy to visualize. | |
Come ask in IRC about current efforts. | |
================================ | |
(1) Metadata json files with EXIF data on all MP4 videos scraped from Parler: | |
donk.sh/metadata.tar.gz | |
magnet:?xt=urn:btih:1723e27bc79186c4574ff056ddb458d771c26e2f&dn=metadata.tar.gz&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fexplodie.org%3A6969&tr=udp%3A%2F%2 | |
SHA256: 66809d9ae0a5a6577a3c80bb623562274ceccd96b35519f15f568d09cefc56f8 metadata.tar.gz | |
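Checking a downloaded copy against the SHA256 above can be sketched in Python as follows. This is a minimal sketch using only the standard library; the function names and the local path mentioned in the note are assumptions, not part of the original tooling.

```python
import hashlib

# Published checksum for metadata.tar.gz, copied from the listing above.
EXPECTED_SHA256 = "66809d9ae0a5a6577a3c80bb623562274ceccd96b35519f15f568d09cefc56f8"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

Calling matches_published_hash("metadata.tar.gz") (a hypothetical local path) should return True for an intact download. A False result suggests a corrupted or altered file, so it is worth downloading again from a different mirror or the torrent.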
(2) Script to download WARCs from archive.org once they are processed: | |
https://github.com/ozywog/parler-data-tools | |
(3) Magnet URI for a torrent of a file containing 1.8 million texts scraped from | |
Parler; it is a subset of the full data. Originally hosted on https://parler-archive.deadops.de/ | |
This is parler_2020-01-06_posts-partial: | |
magnet:?xt=urn:btih:FF29970B902657A32D561C0720E70FACFB8C4284&dn=parler_2020-01-06_posts-partial&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.internetwarriors.net%3a1337%2fannounce | |
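If you rehost or re-seed, it can help to check what a magnet URI actually contains. The sketch below splits the magnet from (3) into its info hash, display name, and tracker list using only the Python standard library. The function name parse_magnet is an assumption made for illustration.

```python
from urllib.parse import urlparse, parse_qs


def parse_magnet(uri):
    """Split a magnet URI into info hash, display name, and tracker list."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parsed.query)  # also decodes %3a, %2f, and so on
    info_hash = None
    for topic in params.get("xt", []):
        # xt looks like "urn:btih:<hash>"; keep only the hash part.
        if topic.lower().startswith("urn:btih:"):
            info_hash = topic[len("urn:btih:"):].lower()
    return {
        "info_hash": info_hash,
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }


# The magnet URI from item (3), copied verbatim.
MAGNET = (
    "magnet:?xt=urn:btih:FF29970B902657A32D561C0720E70FACFB8C4284"
    "&dn=parler_2020-01-06_posts-partial"
    "&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce"
    "&tr=udp%3a%2f%2ftracker.internetwarriors.net%3a1337%2fannounce"
)

info = parse_magnet(MAGNET)
print(info["info_hash"])
print(info["name"])
for tracker in info["trackers"]:
    print(tracker)
```

Comparing the printed info hash with the one your torrent client reports is a quick way to confirm that a copied magnet link was not truncated or altered.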
(4) Script to generate a list of unique names and usernames, then collect all the | |
posts and associate them with the person who posted them. | |
Requires raw HTML source: | |
https://github.com/billstrobl/Prooter | |
https://github.com/billstrobl/Prooter/blob/master/prooter.py | |
(5) Script to scrape videos: | |
https://github.com/Nithindanday/parlervideoscraper | |
You will need the metadata.tar.gz from (1) to use this | |
(6) JSON / CSV / KML Scrapes: | |
https://gofile.io/d/p8RxUC - CSV with all non-zero lat/long from donk's JSON | |
https://gofile.io/d/WVmqhR - quick 'n dirty KML made from the CSV | |
View KML Data on map - See (9) | |
https://gofile.io/d/DsUUte - KML of posts made 1/6/2021, DC Area Only | |
https://gofile.io/d/zKTsWr - list of videos taken within 100 m of a law enforcement (LE) or gov't building, all-time | |
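A filter like the "within 100 m of a building" list can be approximated with a great-circle distance check. The sketch below is not the method used to produce the files above. The column names (id, lat, lon), the sample rows, and the building coordinates are all hypothetical and would need to be adapted to the real CSV.

```python
import csv
import io
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def rows_near(rows, buildings, radius_m=100.0):
    """Yield (row, building_name) for rows within radius_m of any building."""
    for row in rows:
        lat, lon = float(row["lat"]), float(row["lon"])
        if lat == 0.0 and lon == 0.0:
            continue  # skip entries with no usable GPS fix
        for name, (blat, blon) in buildings.items():
            if haversine_m(lat, lon, blat, blon) <= radius_m:
                yield row, name
                break  # report each row once, for the first matching building


# Hypothetical sample data: column names and coordinates are made up for illustration.
SAMPLE_CSV = "id,lat,lon\na,38.8905,-77.0090\nb,38.9000,-77.0090\nc,0,0\n"
BUILDINGS = {"example_building": (38.8900, -77.0090)}

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
matches = [(row["id"], name) for row, name in rows_near(rows, BUILDINGS)]
print(matches)
```

With many buildings and millions of rows, this nested loop gets slow. A spatial index or a coarse bounding-box pre-filter would be the usual next step.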
(7) Script to extract images/videos from WARCs: | |
https://gist.github.com/redd-dedd/9a200a9ba789f312faf53b25ac63e024 | |
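To show roughly what extraction involves, here is a minimal sketch of walking the records in an already decompressed WARC file with only the Python standard library. It is not the linked gist's code. The helper names and the synthetic sample records are assumptions, and a real project would more likely use a dedicated WARC library.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if line.strip() == b"":
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line[:20])
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header section
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Content-Length gives the exact size of the record block in bytes.
        payload = stream.read(int(headers.get("content-length", "0")))
        yield headers, payload


def make_record(warc_type, uri, block):
    """Build a synthetic WARC record, used here only for demonstration."""
    head = (
        "WARC/1.0\r\nWARC-Type: %s\r\nWARC-Target-URI: %s\r\nContent-Length: %d\r\n\r\n"
        % (warc_type, uri, len(block))
    ).encode("utf-8")
    return head + block + b"\r\n\r\n"


# Two made-up records standing in for a real file opened with open(path, "rb").
sample = make_record(
    "response", "https://example.invalid/a.jpg",
    b"HTTP/1.1 200 OK\r\nContent-Type: image/jpeg\r\n\r\nJPEGDATA",
) + make_record("metadata", "https://example.invalid/a.jpg", b"note: demo")

images = {}
for headers, payload in iter_warc_records(io.BytesIO(sample)):
    if headers.get("warc-type") != "response":
        continue
    # A response block is the raw HTTP response: header lines, blank line, body.
    http_head, _, body = payload.partition(b"\r\n\r\n")
    if b"content-type: image/" in http_head.lower():
        images[headers["warc-target-uri"]] = body
print(sorted(images))
```

The same loop works for video by matching a different Content-Type prefix. Bodies sent with chunked transfer encoding or content encoding would need extra decoding that this sketch leaves out.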
(8) Needs to be sorted. | |
http://donk.sh/06d639b2-0252-4b1e-883b-f275eff7e792/ | |
https://web.archive.org/web/timemap/?url=https%3A%2F%2Fimage-cdn.parler.com%2F&matchType=prefix&collapse=urlkey&output=json&fl=original%2Cuniqcount&filter=!statuscode%3A%5B45%5D | |
https://irc.gammaspectra.live/eaa6fa678444b5f4/videos.txt | |
https://gist.github.com/kylemcdonald/8fdabd6526924012c1f5afe538d7dc09 | |
(9) Maps, both interactive and static heatmaps | |
kylemcdonald.net/parler/map/ | |
https://fortress.maptive.com/ver4/a3486a6ab9a9a12aa9a9cb067839079c/410491 | |
https://nithindanday.github.io/earth/index.html | |
=================================== | |
Videos from the DC area, Jan 6th. Currently estimated to be only about 10% of what was available. | |
https://www.youtube.com/channel/UCZk6IiAVk2QwOdljEAYCPLw | |
https://mega.nz/file/Pkk2VSRT#x-Gnl1-FddGwHumBXAGsCJ2FL1VHE-Y-u2SFW48KpeQ | |
Some notable video IDs; the list is open to public contributions: | |
https://docs.google.com/spreadsheets/d/1ThPUH5HgTcVKCoyfr2oJ21AWKTGq-dR-cRZjPOER-Q0/edit#gid=0 | |
=================================== | |
HOW TO VIEW WARC/ZSTD from ArchiveTeam's Parler scrape | |
# How to View Parler Archive "megawarc.warc.zst" files. | |
These files use the standard zstd compression and WARC formats. | |
They are being uploaded to: https://archive.org/details/archiveteam_neparlepas | |
$ tar -I zstd -xvf archive.tar.zst | |
===Old. | |
1. Install Python 3.7 | |
2. Execute: pip install zstandard==0.10.2 | |
3. Download archive from here: https://archive.org/details/archiveteam_neparlepas?tab=collection | |
4. Copy this script into a new file called xtract.py: https://hastebin.com/bugedubaxi.py | |
5. Execute: python ./xtract.py /path/to/parler_blahblah.megawarc.warc.zst > dict | |
6. Execute: zstd -d /path/to/parler_blahblah.megawarc.warc.zst -D dict | |
7. Import the decompressed parler_blahblah.megawarc.warc file into this tool: https://github.com/webrecorder/webrecorder-desktop | |
If you cannot install Python 3.7 for some reason, a dockerfile is available at: | |
https://gist.github.com/shoghicp/6ce05806ffc805929667ec2d4c62aba2 |
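Steps 5 and 6 above first pull a dictionary out of the archive and then hand it to zstd. The xtract.py script is not reproduced here, so the following is only a sketch of one plausible approach. It assumes the dictionary is stored in a zstd skippable frame at the very start of the .warc.zst file. The function names are hypothetical, and the demo bytes are synthetic.

```python
import io
import struct

# zstd reserves magic numbers 0x184D2A50..0x184D2A5F for skippable frames.
SKIPPABLE_MAGIC_MIN = 0x184D2A50
SKIPPABLE_MAGIC_MAX = 0x184D2A5F
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"  # little-endian encoding of 0xFD2FB528


def read_leading_skippable_frame(stream):
    """Return the payload of a skippable frame at the start of stream, or None."""
    header = stream.read(8)
    if len(header) < 8:
        return None
    # Header layout: 4-byte magic, then 4-byte payload size, both little-endian.
    magic, size = struct.unpack("<II", header)
    if not (SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX):
        return None
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("truncated skippable frame")
    return payload


def looks_zstd_compressed(data):
    """True if data starts with the regular zstd frame magic."""
    return data[:4] == ZSTD_FRAME_MAGIC


# Synthetic demo: a skippable frame holding b"hello", followed by a normal frame start.
demo = struct.pack("<II", 0x184D2A5D, 5) + b"hello" + ZSTD_FRAME_MAGIC + b"..."
payload = read_leading_skippable_frame(io.BytesIO(demo))
print(payload, looks_zstd_compressed(payload))
```

If the extracted payload itself starts with the zstd magic, it is compressed and must be decompressed before being passed to zstd -D. That step needs the zstd tool or the zstandard package and is left out of this standard-library sketch.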
Here's a magnet of one of the WARCs obtained before it went noindex+private, from https://archive.org/download/archiveteam_parler_20210111054136_45579d03: |
Extract images and video from WARC: https://gist.github.com/redd-dedd/9a200a9ba789f312faf53b25ac63e024 |
|
Any additional magnets? |
|
shoghicp commented:
Made the instructions for extracting megawarc.warc.zst files into a small Docker container, for whoever can't just install Python 3.7 directly. https://gist.github.com/shoghicp/6ce05806ffc805929667ec2d4c62aba2
Nothing new in there, but should ease the setup of the extraction process.