
Overview


This document provides a high-level introduction to PaperEater: what it is, how it is structured as a Tauri desktop application, and the main workflows it implements. Specific subsystems are documented in detail on their own pages.


What is PaperEater

PaperEater is a desktop application for browsing AI/ML research papers from arXiv. It fetches up to 2000 of the most recent computer science papers from arXiv's listing page, applies keyword-based filtering to identify AI/ML-related publications, and downloads PDFs with progress tracking. An optional DeepL integration translates paper titles into Japanese.

The application addresses the problem of information overload in academic research by automatically filtering arXiv's daily paper flood to surface only publications relevant to current AI/ML trends (LLMs, diffusion models, specific companies like OpenAI/Anthropic, etc.).

Sources: README.md 1-11

Target Use Case

| Scenario | Description |
| --- | --- |
| Daily Research Monitoring | Researchers check for new AI/ML papers each day without manually scanning hundreds of publications |
| Trend Tracking | Users interested in specific topics (e.g., "Claude", "Sora", "RAG") get automatic filtering |
| Japanese Researchers | Optional translation enables non-English speakers to quickly understand paper topics |
| Offline Access | PDF downloads allow reading papers without continuous internet connectivity |

Sources: README.md 15-45


System Architecture Overview

PaperEater is built on the Tauri framework, which provides a three-layer architecture: a web-based frontend (HTML/JavaScript), a native Rust backend, and a bridge layer that enables communication between them.

Three-Layer Architecture Diagram

Sources: package.json 12-16, README.md 87-96

Layer Responsibilities

| Layer | Technology | Primary Responsibilities |
| --- | --- | --- |
| Frontend | HTML, JavaScript | Business logic: arXiv parsing, keyword filtering, UI rendering, translation workflow |
| Bridge | Tauri Plugins | Capability-restricted APIs: HTTP requests, file system access, URL opening |
| Backend | Rust | Runtime management: window creation, plugin registration, native OS integration |

The frontend contains most of the application logic, while the backend primarily serves as a secure runtime environment that exposes controlled native capabilities through the plugin system.

Sources: package.json 1-22


Core Components and Workflows

Component Interaction Diagram

Sources: README.md 15-45

Workflow Sequence

  1. Application Launch: User starts PaperEater desktop application
  2. Data Acquisition: Frontend calls fetchPapers() which uses @tauri-apps/plugin-http to GET https://arxiv.org/list/cs/recent?show=2000
  3. Parsing: HTML response is parsed to extract paper metadata (title, authors, arXiv ID, submission date)
  4. Filtering: isInterestingPaper() evaluates each paper against KEYWORD_PATTERNS, checking for AI/ML-related terms
  5. Display: Filtered papers (typically 10-50 from the original 2000) are rendered to the #papers div
  6. Translation (Optional): If DEEPL_API_KEY is configured, each title is translated via POST to DeepL API
  7. User Interaction: User can click "Open in Browser" (via @tauri-apps/plugin-opener) or "Download PDF" (streaming download with progress)

Sources: README.md 17-45
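
The parsing step (step 3) can be illustrated with a minimal sketch. The function name `parseListing`, the returned field names, and the simplified markup it expects (a `<dt>` holding an `/abs/<id>` link followed by a `<dd>` holding a `list-title` div) are assumptions made for illustration, not the project's actual code, and the live arXiv page may differ in detail. Only the arXiv ID and title are extracted here; authors and submission date are left out for brevity.

```javascript
// Hypothetical parser for an arXiv listing page (assumed, simplified markup).
function parseListing(html) {
  const papers = [];
  // Each entry is assumed to be a <dt>...</dt><dd>...</dd> pair.
  const entryPattern = /<dt>([\s\S]*?)<\/dt>\s*<dd>([\s\S]*?)<\/dd>/g;
  for (const [, dt, dd] of html.matchAll(entryPattern)) {
    const idMatch = dt.match(/href\s*=\s*"\/abs\/([^"]+)"/);
    const titleMatch = dd.match(/<div class="list-title[^"]*">([\s\S]*?)<\/div>/);
    if (!idMatch || !titleMatch) continue; // skip entries that cannot be read
    const title = titleMatch[1]
      .replace(/<span[^>]*>[\s\S]*?<\/span>/g, "") // drop the "Title:" label
      .replace(/\s+/g, " ") // collapse line breaks and runs of spaces
      .trim();
    const id = idMatch[1];
    papers.push({
      id,
      title,
      absUrl: `https://arxiv.org/abs/${id}`, // abstract page
      pdfUrl: `https://arxiv.org/pdf/${id}`, // PDF download
    });
  }
  return papers;
}
```

In a browser or Tauri webview, a `DOMParser`-based version would usually be more robust than regular expressions; the regex form is used here only so the sketch runs without a DOM.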


Technology Stack

Frontend Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| UI Layer | HTML, JavaScript | Single-page application in index.html |
| Build Tool | Vite 6.x | Development server and production bundling |
| Type Checking | TypeScript 5.6.x | Static type verification (compilation only, no runtime) |
| Module System | ES Modules | Modern JavaScript module format ("type": "module") |
| Tauri API | @tauri-apps/api v2 | Frontend-to-backend communication interface |

Sources: package.json 1-22

Backend Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Runtime | Rust (stable) | Native application backend |
| Framework | Tauri v2 | Cross-platform desktop framework |
| HTTP Client | tauri-plugin-http v2 | Network requests with capability restrictions |
| File Opener | tauri-plugin-opener v2 | Launch external URLs and files |
| Build System | Cargo | Rust package manager and build orchestrator |

Sources: package.json 12-22, README.md 48-53

Cross-Platform Support

PaperEater runs natively on:

  • Windows: Uses WebView2 (Chromium-based) for rendering
  • macOS: Uses WebKit with Objective-C bindings
  • Linux: Uses WebKit2GTK (not actively tested but supported by Tauri)

The Tauri framework abstracts platform differences, allowing a single codebase to compile to native executables for each platform.

Sources: README.md 76-83


Project Organization

Directory Structure

```
PaperEater/
├── index.html                    # Frontend application (HTML + embedded JS)
├── package.json                  # npm package manifest
├── README.md                     # Project documentation (Japanese)
├── LICENSE                       # MIT License
├── .gitignore                    # Git exclusion patterns
├── .vscode/                      # VS Code workspace settings
│   └── extensions.json           # Recommended extensions
├── .github/
│   └── workflows/
│       └── tauri-build.yml       # CI/CD for Windows/macOS builds
└── src-tauri/                    # Rust backend directory
    ├── Cargo.toml                # Rust package manifest
    ├── tauri.conf.json           # Tauri application configuration
    ├── build.rs                  # Rust build script
    ├── src/
    │   └── lib.rs                # Main Rust entry point
    ├── capabilities/
    │   └── default.json          # Security permissions configuration
    ├── icons/                    # Application icon assets
    └── target/                   # Build output directory (gitignored)
```

Sources: README.md 87-96

Key Files by Purpose

| File | Purpose | Related Documentation |
| --- | --- | --- |
| index.html | Frontend application containing all business logic | Frontend Architecture |
| src-tauri/src/lib.rs | Rust backend entry point for Tauri runtime | Backend Architecture |
| src-tauri/capabilities/default.json | Capability-based security configuration | Security and Capabilities |
| package.json | Frontend dependency management and npm scripts | Frontend Dependencies |
| src-tauri/Cargo.toml | Backend dependency management and package metadata | Backend Dependencies |
| .github/workflows/tauri-build.yml | Automated build pipeline for releases | CI/CD Pipeline |

Sources: README.md 87-96, package.json 1-22


Data Flow Overview

End-to-End Data Pipeline

The data pipeline runs from application launch through to user interaction. Each node in the diagram represents a specific function or operation implemented in the codebase.

Sources: README.md 15-45
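
The final stage of this pipeline, the PDF download with progress tracking, could look roughly like the sketch below. The names `progressPercent`, `readWithProgress`, and the `onProgress` callback are hypothetical; the only thing assumed about the response is that it follows the standard Fetch API `Response` interface. Writing the resulting bytes to disk is left out.

```javascript
// Percentage of the download completed, or null when the server did not
// report a usable Content-Length.
function progressPercent(receivedBytes, totalBytes) {
  if (!Number.isFinite(totalBytes) || totalBytes <= 0) return null;
  return Math.min(100, Math.round((receivedBytes / totalBytes) * 100));
}

// Reads a fetch-style Response chunk by chunk and reports progress after
// each chunk. `onProgress` is a hypothetical callback supplied by the UI.
async function readWithProgress(response, onProgress) {
  // A missing header yields Number(null) === 0, which is treated as "unknown".
  const total = Number(response.headers.get("content-length"));
  const reader = response.body.getReader();
  const chunks = [];
  let received = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    received += value.byteLength;
    onProgress(received, progressPercent(received, total));
  }
  // Join the chunks into a single Uint8Array holding the whole PDF.
  const bytes = new Uint8Array(received);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return bytes;
}
```

Returning `null` when the total size is unknown lets the UI fall back to showing a byte count instead of a misleading percentage.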


Filtering System Overview

The core value proposition of PaperEater is its filtering system, which reduces 2000 papers to a manageable subset (typically 10-50 papers) based on AI/ML relevance.

Keyword Pattern Categories

| Category | Example Keywords | Typical Match Count |
| --- | --- | --- |
| General AI Terms | AI, AGI, ASI, LLM, VLM, multimodal | ~20-30 papers |
| Major Companies | OpenAI, Anthropic, DeepMind, Meta, x-ai | ~5-10 papers |
| Specific Models | GPT, ChatGPT, o1, Claude, Gemini, Mistral, LLaMA, Sora | ~15-25 papers |
| Technical Concepts | diffusion, text-to-video, RAG, alignment | ~10-15 papers |

The isInterestingPaper() function checks paper titles, author lists, and submission dates against regular expression patterns for each category. Papers matching any pattern are included in the filtered result set.

For complete details on pattern matching logic and filter implementation, see Paper Fetching and Filtering.

Sources: README.md 20-31
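
As a rough illustration of the "match any pattern" rule, the sketch below defines a small pattern list and a filter over it. The regular expressions are assumptions chosen from the example keywords in the table, not the project's actual `KEYWORD_PATTERNS`, and for simplicity this version inspects only the title. The `filterPapers` helper is hypothetical.

```javascript
// Illustrative pattern groups (assumed, not the project's real list).
// Short acronyms are matched case-sensitively with word boundaries so that
// "AI" does not match inside words such as "Training".
const KEYWORD_PATTERNS = [
  /\b(AI|AGI|ASI|LLMs?|VLMs?|RAG)\b/, // acronyms, case-sensitive
  /\bmultimodal\b/i, // general terms
  /\b(OpenAI|Anthropic|DeepMind)\b/i, // companies
  /\b(GPT|ChatGPT|Claude|Gemini|Mistral|LLaMA|Sora)\b/i, // models
  /\b(diffusion|text-to-video|alignment)\b/i, // technical concepts
];

// A paper is kept as soon as any single pattern matches its title.
function isInterestingPaper(paper) {
  return KEYWORD_PATTERNS.some((pattern) => pattern.test(paper.title));
}

// Hypothetical helper: apply the predicate to a whole list of papers.
function filterPapers(papers) {
  return papers.filter(isInterestingPaper);
}
```

None of the patterns use the `g` flag, so `test()` keeps no state between calls and the same pattern objects can be reused safely across papers.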


Build and Distribution

Local Development

```shell
# Install dependencies
npm install

# Run development mode (hot reload enabled)
npm run tauri dev

# Build production executable
npm run tauri build
```

Development mode launches both the Vite dev server (for frontend hot module replacement) and the Tauri runtime (for backend execution), providing rapid iteration during development.

Production builds generate platform-specific executables in src-tauri/target/release/bundle/, including installers (.msi for Windows, .dmg for macOS).

Sources: README.md 48-72, package.json 6-10

Automated CI/CD

The GitHub Actions workflow (.github/workflows/tauri-build.yml) builds the Windows and macOS versions when either of the following occurs:

  • Manually triggered via workflow_dispatch
  • A version tag matching v* is pushed

The CI pipeline produces two artifacts:

  • PaperEater-windows: Contains .exe and .msi installers
  • PaperEater-macos: Contains .app bundle and .dmg disk image

For complete CI/CD documentation, see CI/CD Pipeline.

Sources: README.md 76-83


Optional Features

DeepL Translation

Translation is entirely optional and disabled by default. To enable it:

  1. Obtain a DeepL API key (free tier available)
  2. Edit index.html and set const DEEPL_API_KEY = "your-key-here";
  3. Restart the application

When enabled, the application translates each filtered paper's title from English to Japanese and displays both versions in the UI. If the API key is empty, the application functions normally but displays only English titles.

The translation system is documented in detail at Translation Integration.

Sources: README.md 35-45
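
A minimal sketch of how one title translation request might be assembled is shown below. The helper names `buildDeepLRequest` and `extractTranslation` are hypothetical and the project's actual code may differ; the endpoint, `Authorization` header format, and JSON fields follow DeepL's public `/v2/translate` API. An empty key returns `null`, mirroring the "disabled by default" behaviour described above.

```javascript
// Builds the URL and fetch options for translating one title to Japanese.
// Returns null when no API key is configured (translation disabled).
function buildDeepLRequest(apiKey, title) {
  if (!apiKey) return null;
  // DeepL free-plan keys end in ":fx" and are served from the api-free host.
  const host = apiKey.endsWith(":fx") ? "api-free.deepl.com" : "api.deepl.com";
  return {
    url: `https://${host}/v2/translate`,
    init: {
      method: "POST",
      headers: {
        Authorization: `DeepL-Auth-Key ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        text: [title], // the API accepts an array of strings
        source_lang: "EN",
        target_lang: "JA",
      }),
    },
  };
}

// Pulls the translated string out of a parsed DeepL response body,
// or returns null if the response has an unexpected shape.
function extractTranslation(responseJson) {
  return responseJson?.translations?.[0]?.text ?? null;
}
```

The returned `url` and `init` can be passed to any fetch-compatible function. Keeping request construction separate from the network call makes the disabled case and the response handling easy to check in isolation.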


License

PaperEater is released under the MIT License, permitting free use, modification, and distribution with minimal restrictions.

Sources: LICENSE 1-21, README.md 100-101