TZ Archiver CLI

⚠️ Warning:
This repository is in an experimental state. 90% of the code was generated by Sonnet 4 based on the TypeScript code from the Tezos Archiver website.

A Python command-line tool for archiving Tezos NFTs to the Wayback Machine. This tool fetches NFT metadata from Tezos wallets using the TzKT API and automatically archives IPFS-hosted artifacts to ensure long-term preservation.

Prerequisites

  • Python 3.10+
  • Internet Archive account with API access
  • Required Python packages (see installation)

Installation

  1. Clone the repository:
git clone https://github.com/melon-dog/tz-archiver-cli.git
cd tz-archiver-cli
  2. Install dependencies:
pip install requests python-dotenv wayback-utils
  3. Create a .env file in the src/ directory with your Internet Archive credentials:
ARCHIVE_ACCESS=your_access_key_here
ARCHIVE_SECRET=your_secret_key_here

Usage

Basic Usage

Archive NFTs from a specific Tezos wallet:

python src/main.py -w tz1U7C2NVwbhdvG3fJixLLUWUyZHuXWNiF7V

Spider Mode (Random Discovery)

Run without specifying a wallet to archive random tokens:

python src/main.py

Advanced Usage

Specify a custom limit for the number of tokens to process:

python src/main.py -w tz1U7C2NVwbhdvG3fJixLLUWUyZHuXWNiF7V -l 500

Command-line Options

  • -w, --wallet (optional): Tezos wallet address (e.g., tz1...). If not provided, runs in spider mode
  • -l, --limit (optional): Number of tokens to process (default: 10,000)
  • -h, --help: Show detailed help message with examples
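
A minimal sketch of how these flags could be wired up with argparse; the option names and defaults mirror the list above, but this is an illustration, not the project's actual main.py:

# Hypothetical sketch of the CLI options described above (argparse adds -h/--help automatically).
import argparse

parser = argparse.ArgumentParser(
    description="Archive Tezos NFT artifacts to the Wayback Machine."
)
parser.add_argument("-w", "--wallet",
                    help="Tezos wallet address (e.g., tz1...); omit to run in spider mode")
parser.add_argument("-l", "--limit", type=int, default=10_000,
                    help="number of tokens to process (default: 10,000)")
args = parser.parse_args()

if args.wallet is None:
    print(f"Spider mode: archiving up to {args.limit} random tokens")
else:
    print(f"Archiving up to {args.limit} tokens for wallet {args.wallet}")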

How It Works

1. Token Discovery

The tool queries the TzKT API to find:

  • Minted tokens: Tokens created by the wallet
  • Owned tokens: Tokens currently in the wallet
  • Contract tokens: Tokens from contracts associated with the wallet

2. IPFS Detection

The tool scans token metadata for artifactUri fields containing IPFS URLs (ipfs://...).
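
Roughly, that detection step amounts to the following sketch, which assumes the token metadata has already been parsed into a dict as returned by TzKT:

# Sketch: extract the IPFS CID (and any path suffix) from a token's artifactUri, if it uses the ipfs:// scheme.
def extract_ipfs_cid(metadata: dict) -> str | None:
    uri = (metadata or {}).get("artifactUri", "")
    if uri.startswith("ipfs://"):
        return uri[len("ipfs://"):]
    return None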

3. Smart Archiving Process

For each IPFS artifact:

  • Pre-check: Verifies whether the artifact is already archived (doesn't count toward the rate limit)
  • Rate limiting: Applied only to actual archiving operations
  • URL conversion: Converts the IPFS CID to an HTTP URL via ipfs.fileship.xyz (sketched below)
  • Wayback submission: Submits the URL for archiving with optimized parameters
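
The URL-conversion step can be pictured like this (a sketch assuming a plain CID-to-gateway mapping with no extra query parameters):

# Sketch: turn an ipfs:// URI into an HTTP URL on the ipfs.fileship.xyz gateway.
def ipfs_to_http(ipfs_uri: str) -> str:
    cid_and_path = ipfs_uri.removeprefix("ipfs://")
    return f"https://ipfs.fileship.xyz/{cid_and_path}"

# Example: ipfs://QmExampleCid -> https://ipfs.fileship.xyz/QmExampleCid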

4. Concurrent Processing

  • Maintains up to 4 concurrent archiving processes
  • Smart queue management with available slot detection
  • Automatic retry logic for failed operations
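
One way to picture the four-slot concurrency is a bounded thread pool; the sketch below is an assumption about the approach rather than the project's actual processor.py, and archive_one stands in for whatever callable performs a single capture:

# Sketch: cap archiving at 4 parallel operations with a bounded thread pool.
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT_PROCESSES = 4

def archive_all(urls, archive_one):
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_PROCESSES) as pool:
        futures = {pool.submit(archive_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:   # a failed operation could be queued for retry here
                results[url] = exc
    return results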

5. State Persistence

All data is automatically saved to src/data/:

  • processed_cids.json: Successfully processed IPFS CIDs
  • errors_cids.json: CIDs that failed to archive (for manual retry)

6. Resume Capability

The tool automatically:

  • Loads previous session data on startup
  • Skips already processed CIDs
  • Continues from where it left off
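
A minimal sketch of the load/skip/save cycle behind steps 5 and 6, assuming processed_cids.json stores its CIDs under the processed_cids key shown in the Data Persistence section below:

# Sketch: load previously processed CIDs, skip them, and persist any new ones.
import json
from pathlib import Path

DATA_DIR = Path("src/data")
PROCESSED_FILE = DATA_DIR / "processed_cids.json"

def load_processed() -> set[str]:
    if PROCESSED_FILE.exists():
        return set(json.loads(PROCESSED_FILE.read_text()).get("processed_cids", []))
    return set()

def save_processed(cids: set[str]) -> None:
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    PROCESSED_FILE.write_text(json.dumps({"processed_cids": sorted(cids)}, indent=2))

processed = load_processed()
for cid in ["QmExampleA", "QmExampleB"]:     # placeholder CIDs
    if cid in processed:
        continue                             # resume: skip work done in a previous session
    # ... archive the CID here ...
    processed.add(cid)
save_processed(processed)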

Configuration

Environment Variables

Create a .env file in the src/ directory:

ARCHIVE_ACCESS=your_access_key_here
ARCHIVE_SECRET=your_secret_key_here

Note: You can obtain your API keys at the following link:
https://archive.org/account/s3.php
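
Loading those values with python-dotenv looks roughly like this (a sketch; the project's config.py may differ in detail):

# Sketch: read the Internet Archive credentials from src/.env with python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv("src/.env")                      # reads ARCHIVE_ACCESS / ARCHIVE_SECRET
ARCHIVE_ACCESS = os.getenv("ARCHIVE_ACCESS")
ARCHIVE_SECRET = os.getenv("ARCHIVE_SECRET")

if not ARCHIVE_ACCESS or not ARCHIVE_SECRET:
    raise SystemExit("Missing ARCHIVE_ACCESS / ARCHIVE_SECRET in src/.env")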

Rate Limiting

The tool implements intelligent rate limiting:

  • Wayback Machine limit: 12 captures/minute (configurable)
  • Check operations: wayback.indexed() calls don't count towards limit
  • Archive operations: Only wayback.save() calls count towards limit
  • Sliding window: 60-second rolling window for accurate rate tracking
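
The sliding-window behavior can be sketched as follows; the 12-per-minute figure comes from the list above, while the implementation details are assumptions:

# Sketch: allow at most 12 archiving calls in any rolling 60-second window.
import time
from collections import deque

MAX_CAPTURES_PER_MINUTE = 12
WINDOW_SECONDS = 60
_recent: deque[float] = deque()

def wait_for_slot() -> None:
    """Block until another archive submission would stay within the limit."""
    while True:
        now = time.monotonic()
        while _recent and now - _recent[0] >= WINDOW_SECONDS:
            _recent.popleft()                # drop calls that have left the window
        if len(_recent) < MAX_CAPTURES_PER_MINUTE:
            _recent.append(now)
            return
        time.sleep(WINDOW_SECONDS - (now - _recent[0]))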

Archiving Parameters

Optimized Wayback Machine settings:

  • js_behavior_timeout: 7 seconds
  • delay_wb_availability: False
  • if_not_archived_within: 31,536,000 seconds (1 year)
  • max_concurrent_processes: 4
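
For illustration, an equivalent direct request to the Save Page Now 2 endpoint with these parameters might look like the sketch below. The project itself submits captures through the wayback-utils wrapper; the endpoint, headers, and field names here come from the public SPN2 documentation rather than this repository:

# Sketch: a direct Save Page Now 2 submission with the parameters listed above (illustration only).
import requests

def submit_capture(url: str, access: str, secret: str) -> dict:
    response = requests.post(
        "https://web.archive.org/save",
        headers={
            "Accept": "application/json",
            "Authorization": f"LOW {access}:{secret}",
        },
        data={
            "url": url,
            "js_behavior_timeout": 7,             # seconds
            "delay_wb_availability": 0,           # False
            "if_not_archived_within": 31536000,   # 1 year
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()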

Project Structure

tz-archiver-cli/
├── src/
│   ├── data/                     # Persistent state storage (auto-created)
│   │   ├── processed_cids.json   # Successfully processed CIDs
│   │   └── errors_cids.json      # Failed CIDs for retry
│   ├── utils/                    # Utility modules
│   │   ├── __init__.py           # Package initialization
│   │   ├── logger.py             # Colored logging system
│   │   └── tzkt.py               # TzKT API client with full type hints
│   ├── main.py                   # CLI entry point with argument parsing
│   ├── config.py                 # Centralized configuration management
│   ├── processor.py              # Core business logic and rate limiting
│   ├── archiver.py               # Wayback Machine integration
│   ├── state_manager.py          # Persistent state management
│   └── .env                      # Environment variables (create this)
├── README.md                     # This documentation
└── requirements.txt              # Python dependencies (optional)

Architecture

Core Components

  • main.py: CLI entry point with comprehensive argument validation
  • processor.py: Token processing with smart rate limiting
  • archiver.py: Wayback Machine integration with concurrency control
  • state_manager.py: Atomic file operations for data persistence
  • config.py: Centralized configuration with environment variable support
  • utils/logger.py: Advanced logging with ANSI colors and Windows compatibility
  • utils/tzkt.py: Fully typed TzKT API client with comprehensive dataclasses

API Integration

TzKT API

Integrates with TzKT API for Tezos blockchain data:

  • Mints: /v1/tokens?firstMinter={address}&limit={limit}
  • Balances: /v1/tokens/balances?account={address}&limit={limit}
  • Contract Tokens: /v1/tokens?contract={address}&limit={limit}
  • Random Tokens: /v1/tokens?select=*&limit={limit}&sort=random
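
As an illustration, the mints query could be issued directly with requests (a sketch; the project wraps these calls in utils/tzkt.py with typed dataclasses):

# Sketch: fetch tokens first minted by a wallet from the public TzKT API.
import requests

TZKT_API = "https://api.tzkt.io"

def fetch_minted_tokens(address: str, limit: int = 100) -> list[dict]:
    response = requests.get(
        f"{TZKT_API}/v1/tokens",
        params={"firstMinter": address, "limit": limit},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

for token in fetch_minted_tokens("tz1U7C2NVwbhdvG3fJixLLUWUyZHuXWNiF7V", limit=10):
    print((token.get("metadata") or {}).get("artifactUri"))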

Wayback Machine API

Uses wayback-utils library:

  • Check Archive Status: wayback.indexed() (doesn't count for rate limit)
  • Submit for Archiving: wayback.save() (counts for rate limit)
  • Rate Limiting: 12 captures/minute with sliding window algorithm

Data Persistence

The tool automatically creates a data/ folder in the source directory to store:

  • processed_cids.json: List of successfully processed IPFS CIDs with timestamps
  • errors_cids.json: List of CIDs that failed to archive (for manual retry)

Data format:

{
  "processed_cids": ["Qm...", "bafy..."],
  "errors_cids": ["Qm...", "bafy..."],
}

Performance Features

  • Smart caching: Avoids reprocessing already handled CIDs
  • Concurrent processing: Up to 4 parallel archiving operations
  • Rate limit optimization: Only counts actual archiving requests
  • Memory efficient: Streams data and processes in batches
  • Resumable sessions: No work lost on interruption

Contributing

  1. Fork the repository
  2. Make your changes with proper type hints
  3. Ensure code follows the established patterns
  4. Submit a pull request

License

MIT License - see LICENSE file for details

Important Notes

  • Rate Limits: The tool respects Wayback Machine's 12 captures/minute limit
  • Processing Time: Large collections may take significant time to process
  • Asynchronous Results: Archiving on the Internet Archive is asynchronous, so results may not be immediately available
  • Network Dependency: Requires stable internet connection for API calls
  • Storage: Local state files grow with processed CID count

Advanced Usage Examples

Resume a Previous Session

# Simply run the same command - the tool automatically resumes
python src/main.py -w tz1YourWalletAddress

Monitor Rate Limiting

# The tool displays current rate status:
# "Archiving CID (rate: 8/12/min): QmHashHere"

Process Multiple Wallets

# Process different wallets sequentially
python src/main.py -w tz1FirstWallet -l 1000
python src/main.py -w tz2SecondWallet -l 1000

Spider Mode for Discovery

# Continuous random token discovery
python src/main.py
# Press Ctrl+C to stop gracefully

Generated with ❤️ for the Tezos NFT community
