⚠️ Warning:
This repository is in an experimental state. 90% of the code was generated by Sonnet 4 from the TypeScript code of the Tezos Archiver website.
A Python command-line tool for archiving Tezos NFTs to the Wayback Machine. This tool fetches NFT metadata from Tezos wallets using the TzKT API and automatically archives IPFS-hosted artifacts to ensure long-term preservation.
- Python 3.10+
- Internet Archive account with API access
- Required Python packages (see installation)
- Clone the repository:
```bash
git clone https://github.com/melon-dog/tz-archiver-cli.git
cd tz-archiver-cli
```
- Install dependencies:
```bash
pip install requests python-dotenv wayback-utils
```
- Create a `.env` file in the `src/` directory with your Internet Archive credentials:
```
ARCHIVE_ACCESS=your_access_key_here
ARCHIVE_SECRET=your_secret_key_here
```

Archive NFTs from a specific Tezos wallet:
```bash
python src/main.py -w tz1U7C2NVwbhdvG3fJixLLUWUyZHuXWNiF7V
```
Run without specifying a wallet to archive random tokens:
```bash
python src/main.py
```
Specify a custom limit for the number of tokens to process:
```bash
python src/main.py -w tz1U7C2NVwbhdvG3fJixLLUWUyZHuXWNiF7V -l 500
```
- `-w, --wallet` (optional): Tezos wallet address (e.g., `tz1...`). If not provided, the tool runs in spider mode.
- `-l, --limit` (optional): Number of tokens to process (default: 10,000)
- `-h, --help`: Show a detailed help message with examples
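The flags above map naturally onto Python's standard `argparse` module. The sketch below is a hypothetical reconstruction based only on this README. The `build_parser` name is illustrative and is not taken from `src/main.py`. It encodes the documented defaults, including spider mode when `-w` is omitted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the project's main.py)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Archive Tezos NFT artifacts to the Wayback Machine.",
    )
    # No wallet -> None, which the caller can treat as "spider mode".
    parser.add_argument(
        "-w", "--wallet",
        default=None,
        help="Tezos wallet address (tz1...). Omit to run in spider mode.",
    )
    # Documented default limit is 10,000 tokens.
    parser.add_argument(
        "-l", "--limit",
        type=int,
        default=10_000,
        help="Number of tokens to process (default: 10000).",
    )
    # argparse adds -h/--help automatically.
    return parser
```

For example, `build_parser().parse_args(["-w", "tz1...", "-l", "500"])` yields `wallet="tz1..."` and `limit=500`. An empty argument list gives `wallet=None` and `limit=10000`.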
The tool queries the TzKT API to find:
- Minted tokens: Tokens created by the wallet
- Owned tokens: Tokens currently in the wallet
- Contract tokens: Tokens from contracts associated with the wallet
Scans token metadata for artifactUri fields containing IPFS URLs (ipfs://...)
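Here is a minimal sketch of the artifact scan. It assumes each token is a TzKT token object whose `metadata` dict may carry an `artifactUri` string. The function name `extract_artifact_cids` is hypothetical. For results from the balances endpoint, the token object would presumably sit under each balance's `token` key.

```python
from typing import Any, Iterable

IPFS_PREFIX = "ipfs://"


def extract_artifact_cids(tokens: Iterable[dict[str, Any]]) -> list[str]:
    """Return unique IPFS references from metadata.artifactUri, in first-seen order."""
    seen: set[str] = set()
    cids: list[str] = []
    for token in tokens:
        metadata = token.get("metadata") or {}  # metadata may be missing or null
        uri = metadata.get("artifactUri")
        if not isinstance(uri, str) or not uri.startswith(IPFS_PREFIX):
            continue  # skip tokens without an IPFS-hosted artifact
        # Strip the scheme; anything after the CID (e.g. "/index.html") is kept.
        cid = uri[len(IPFS_PREFIX):]
        if cid and cid not in seen:
            seen.add(cid)
            cids.append(cid)
    return cids
```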
For each IPFS artifact:
- Pre-check: Checks whether the URL is already archived (does not count toward the rate limit)
- Rate limiting: Applied only to actual archiving operations
- URL conversion: Converts the IPFS CID to an HTTP URL via `ipfs.fileship.xyz`
- Wayback submission: Submits the URL for archiving with optimized parameters
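The conversion step amounts to stripping the `ipfs://` scheme and prefixing a gateway host. This sketch assumes the gateway serves content at `https://ipfs.fileship.xyz/<cid>`. The exact path layout the tool uses is not stated in this README, so treat it as an assumption; some gateways expect `/ipfs/<cid>` instead. The helper name is hypothetical.

```python
# Gateway host named in this README; the "/<cid>" path layout is an assumption.
GATEWAY = "https://ipfs.fileship.xyz"


def cid_to_gateway_url(cid_or_uri: str, gateway: str = GATEWAY) -> str:
    """Turn 'ipfs://<cid>' or a bare '<cid>' into an HTTP gateway URL."""
    cid = cid_or_uri.removeprefix("ipfs://")
    # Normalize slashes so exactly one separates host and CID.
    return f"{gateway.rstrip('/')}/{cid.lstrip('/')}"
```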
- Maintains up to 4 concurrent archiving processes
- Smart queue management with available slot detection
- Automatic retry logic for failed operations
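One way to get "up to 4 concurrent processes" in Python is a `ThreadPoolExecutor` with four workers. Tasks beyond the fourth wait in the executor's internal queue until a slot frees up. This is a hypothetical sketch, not the project's `archiver.py`. Here `archive_one` stands in for whatever callable archives a single URL and reports success. Failed URLs are returned so they can be retried or logged.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

MAX_CONCURRENT = 4  # matches the documented max_concurrent_processes


def archive_all(
    urls: list[str],
    archive_one: Callable[[str], bool],
    max_workers: int = MAX_CONCURRENT,
) -> tuple[list[str], list[str]]:
    """Run archive_one over urls with at most max_workers in flight; return (succeeded, failed)."""
    succeeded: list[str] = []
    failed: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(archive_one, url): url for url in urls}
        # Collect results as each job finishes (completion order, not input order).
        for future in as_completed(futures):
            url = futures[future]
            try:
                ok = future.result()
            except Exception:
                ok = False  # an exception counts as a failure so the URL can be retried
            (succeeded if ok else failed).append(url)
    return succeeded, failed
```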
All data is automatically saved to `src/data/`:
- `processed_cids.json`: Successfully processed IPFS CIDs
- `errors_cids.json`: CIDs that failed to archive (for manual retry)
The tool automatically:
- Loads previous session data on startup
- Skips already processed CIDs
- Continues from where it left off
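Resuming needs only two steps: load the saved CIDs on startup, then filter them out of the new work. The sketch below assumes each state file holds a plain JSON list of CIDs. The data-format example further down suggests a keyed object, so adapt as needed. Writes are atomic: the data goes to a temporary file first, then `os.replace()` moves it into place. The names `load_cids`, `save_cids` and `pending` are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_cids(path: Path) -> set[str]:
    """Load a JSON list of CIDs; a missing file means a fresh session."""
    if not path.exists():
        return set()
    with path.open("r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_cids(path: Path, cids: set[str]) -> None:
    """Write CIDs atomically: dump to a temp file in the same directory, then os.replace()."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(sorted(cids), fh, indent=2)
        os.replace(tmp_name, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_name)  # don't leave a half-written temp file behind
        raise


def pending(cids: list[str], processed: set[str]) -> list[str]:
    """Skip CIDs already processed in a previous session."""
    return [c for c in cids if c not in processed]
```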
Create a `.env` file in the `src/` directory:
```
ARCHIVE_ACCESS=your_access_key_here
ARCHIVE_SECRET=your_secret_key_here
```
Note: You can obtain your API keys at the following link:
https://archive.org/account/s3.php
The tool implements intelligent rate limiting:
- Wayback Machine limit: 12 captures/minute (configurable)
- Check operations: `wayback.indexed()` calls don't count toward the limit
- Archive operations: Only `wayback.save()` calls count toward the limit
- Sliding window: 60-second rolling window for accurate rate tracking
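The sliding window can be sketched with a deque of timestamps. An entry is evicted once it is 60 seconds old, and a new save is allowed only while fewer than 12 entries remain. The class name and its injectable `clock`/`sleep` parameters are illustrative assumptions, not the code in `processor.py`.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_events within any rolling window_s-second window."""

    def __init__(self, max_events: int = 12, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_events = max_events
        self.window_s = window_s
        self.clock = clock
        self._stamps: deque[float] = deque()  # times of recorded saves, oldest first

    def _evict(self, now: float) -> None:
        # Drop timestamps that have left the rolling window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()

    def current(self) -> int:
        """Number of saves recorded within the last window_s seconds."""
        self._evict(self.clock())
        return len(self._stamps)

    def wait_time(self) -> float:
        """Seconds until another save is allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.max_events:
            return 0.0
        return self.window_s - (now - self._stamps[0])

    def record(self) -> None:
        """Record one archive (save) request; status checks are never recorded."""
        self._stamps.append(self.clock())

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a slot is free, then record the save."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        self.record()
```

Calling `acquire()` only before archive requests, and never before status checks, keeps checks free of charge. `current()` supplies the "8" in a status line such as `rate: 8/12/min`.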
Optimized Wayback Machine settings:
- `js_behavior_timeout`: 7 seconds
- `delay_wb_availability`: False
- `if_not_archived_within`: 31,536,000 seconds (1 year)
- `max_concurrent_processes`: 4
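For reference, these settings can be grouped in one immutable object. The hypothetical `WaybackSettings` dataclass below mirrors the values above. Its `spn2_params()` helper assumes the field names of the Internet Archive's Save Page Now 2 (SPN2) API and is not taken from `wayback-utils`. The one-year value checks out: 365 × 24 × 3600 = 31,536,000 seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaybackSettings:
    """Hypothetical container for the documented settings (not the project's config.py)."""
    js_behavior_timeout: int = 7              # seconds the capture runs page JavaScript
    delay_wb_availability: bool = False       # False: don't request delayed availability
    if_not_archived_within: int = 31_536_000  # skip if a snapshot exists within this many seconds
    max_concurrent_processes: int = 4         # parallel archive jobs (used client-side)

    def spn2_params(self) -> dict[str, str]:
        """Form fields in SPN2 style (field names assumed from the public SPN2 API)."""
        params = {
            "js_behavior_timeout": str(self.js_behavior_timeout),
            "if_not_archived_within": str(self.if_not_archived_within),
        }
        if self.delay_wb_availability:
            params["delay_wb_availability"] = "1"
        return params
```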
```
tz-archiver-cli/
├── src/
│   ├── data/                 # Persistent state storage (auto-created)
│   │   ├── processed_cids.json   # Successfully processed CIDs
│   │   └── errors_cids.json      # Failed CIDs for retry
│   ├── utils/                # Utility modules
│   │   ├── __init__.py       # Package initialization
│   │   ├── logger.py         # Colored logging system
│   │   └── tzkt.py           # TzKT API client with full type hints
│   ├── main.py               # CLI entry point with argument parsing
│   ├── config.py             # Centralized configuration management
│   ├── processor.py          # Core business logic and rate limiting
│   ├── archiver.py           # Wayback Machine integration
│   ├── state_manager.py      # Persistent state management
│   └── .env                  # Environment variables (create this)
├── README.md                 # This documentation
└── requirements.txt          # Python dependencies (optional)
```
- `main.py`: CLI entry point with comprehensive argument validation
- `processor.py`: Token processing with smart rate limiting
- `archiver.py`: Wayback Machine integration with concurrency control
- `state_manager.py`: Atomic file operations for data persistence
- `config.py`: Centralized configuration with environment variable support
- `utils/logger.py`: Advanced logging with ANSI colors and Windows compatibility
- `utils/tzkt.py`: Fully typed TzKT API client with comprehensive dataclasses
Integrates with TzKT API for Tezos blockchain data:
- Mints: `/v1/tokens?firstMinter={address}&limit={limit}`
- Balances: `/v1/tokens/balances?account={address}&limit={limit}`
- Contract Tokens: `/v1/tokens?contract={address}&limit={limit}`
- Random Tokens: `/v1/tokens?select=*&limit={limit}&sort=random`
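Here is a stdlib-only sketch that builds the four query URLs listed above and fetches JSON. The base host `https://api.tzkt.io` is an assumption, because the README gives only paths. The helper names are hypothetical, and `utils/tzkt.py` may use `requests` and typed dataclasses instead.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

TZKT_BASE = "https://api.tzkt.io"  # assumed public TzKT host; the README lists only paths


def tzkt_url(path: str, **params: Any) -> str:
    """Build a TzKT URL, URL-encoding query parameters but keeping '*' literal."""
    query = urllib.parse.urlencode(params, safe="*")
    return f"{TZKT_BASE}{path}?{query}"


def mints_url(address: str, limit: int) -> str:
    return tzkt_url("/v1/tokens", firstMinter=address, limit=limit)


def balances_url(address: str, limit: int) -> str:
    return tzkt_url("/v1/tokens/balances", account=address, limit=limit)


def contract_tokens_url(address: str, limit: int) -> str:
    return tzkt_url("/v1/tokens", contract=address, limit=limit)


def random_tokens_url(limit: int) -> str:
    return tzkt_url("/v1/tokens", select="*", limit=limit, sort="random")


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_json(mints_url("tz1...", 500))` would return the decoded list of token objects minted by that address.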
Uses the `wayback-utils` library:
- Check Archive Status: `wayback.indexed()` (doesn't count toward the rate limit)
- Submit for Archiving: `wayback.save()` (counts toward the rate limit)
- Rate Limiting: 12 captures/minute with a sliding-window algorithm
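The check-then-save rule reduces to a few lines. In this sketch, `is_archived`, `save` and `acquire_slot` are hypothetical callables. They stand in for `wayback.indexed()`, `wayback.save()` and the rate limiter; their real signatures in `wayback-utils` are not assumed here.

```python
from typing import Callable


def archive_if_needed(
    url: str,
    is_archived: Callable[[str], bool],
    save: Callable[[str], bool],
    acquire_slot: Callable[[], None],
) -> str:
    """Return 'skipped', 'saved' or 'failed'. Only the save path consumes a rate-limit slot."""
    if is_archived(url):   # status check: free, no slot consumed
        return "skipped"
    acquire_slot()         # wait for room in the sliding window before saving
    return "saved" if save(url) else "failed"
```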
The tool automatically creates a data/ folder in the source directory to store:
- processed_cids.json: List of successfully processed IPFS CIDs with timestamps
- errors_cids.json: List of CIDs that failed to archive (for manual retry)
Data format:
```json
{
  "processed_cids": ["Qm...", "bafy..."],
  "errors_cids": ["Qm...", "bafy..."]
}
```
- Smart caching: Avoids reprocessing already handled CIDs
- Concurrent processing: Up to 4 parallel archiving operations
- Rate limit optimization: Only counts actual archiving requests
- Memory efficient: Streams data and processes in batches
- Resumable sessions: No work lost on interruption
- Fork the repository
- Make your changes with proper type hints
- Ensure code follows the established patterns
- Submit a pull request
MIT License - see LICENSE file for details
- Rate Limits: The tool respects Wayback Machine's 12 captures/minute limit
- Processing Time: Large collections may take significant time to process
- Asynchronous Results: Internet Archive archiving is asynchronous - results may not be immediately available
- Network Dependency: Requires stable internet connection for API calls
- Storage: Local state files grow with processed CID count
```bash
# Simply run the same command - the tool automatically resumes
python src/main.py -w tz1YourWalletAddress
```
```bash
# The tool displays current rate status:
# "Archiving CID (rate: 8/12/min): QmHashHere"
```
```bash
# Process different wallets sequentially
python src/main.py -w tz1FirstWallet -l 1000
python src/main.py -w tz2SecondWallet -l 1000
```
```bash
# Continuous random token discovery
python src/main.py
# Press Ctrl+C to stop gracefully
```
Generated with ❤️ for the Tezos NFT community