Standing on the Shoulders of Giants

Published 2025/12/24
This article is the Day 24 entry of the Cryptocurrency (仮想通貨) Advent Calendar 2025.

Introduction

Hello, I'm ymd. I usually analyze stocks and crypto assets, and when the market heats up I pick up the money lying on the ground.

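Browsing this year's Advent Calendar, I found it packed with highly useful articles, such as DEX analysis and LLM-driven generation of automated trading strategies.
They reminded me of Newton's phrase, "standing on the shoulders of giants." In that spirit, this article borrows two "shoulders," the power of AI and the knowledge of those who came before, to look for ways to pick up money.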

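Standing on the Shoulders of AI

Three Approaches to AI-Driven Development

AI-assisted development falls broadly into three directions:

Automating information gathering: summarizing papers and API documentation

Automating strategy generation: generating multiple approaches in parallel

Automating coding: having AI write the code itself
Of these, automating information gathering is already at a practical level. With a frontier model, you can probably summarize a large number of papers in minutes. Automating strategy generation, by contrast, has a fundamental limitation. AI is good at enumerating diverse strategies, but judging which direction is promising requires deep domain knowledge of market structure and the nature of the data. Mechanically combining candidates, as in ensemble learning, does work, but forming a genuinely novel strategy hypothesis ultimately requires human insight.
What about automating coding? If AI can "accurately implement the method described in a paper," an ideal division of labor becomes possible: humans focus on "what to try" while AI handles "how to implement it." For that division to hold, however, AI must be able to reproduce scientific papers accurately. This article focuses on that most ambitious question.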

The question is: can AI really implement a scientific paper accurately?

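PaperBench: A Coding Benchmark for Reproducing Scientific Papers

PaperBench is the benchmark OpenAI designed to answer that question. The task: for 20 papers selected from ICML 2024, generate code that fully reproduces the empirical contributions from the paper PDF alone. Doing so requires understanding the paper's context and translating abstract descriptions into executable code while integrating equations, pseudocode, and scattered hyperparameters.
Evaluation criteria: PaperBench does not score a simple "match against figures and tables." Each paper comes with a detailed rubric co-developed with the paper's authors.
Each rubric is a hierarchical tree that breaks the reproduction task into finer pieces
Leaf nodes are judged pass/fail against a concrete criterion
Each node carries a weight reflecting its importance to the paper
The final score is the weighted average propagated up the tree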
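Example:
Reproduction of the paper's main contribution (weight: 1.0)
├─ Implementation correctness (weight: 0.4)
│  ├─ Algorithm implementation (0 or 1)
│  └─ Hyperparameter settings (0 or 1)
├─ Successful execution (weight: 0.3)
│  └─ reproduce.sh exits normally (0 or 1)
└─ Result reproduction (weight: 0.3)
   ├─ Table 1 values match (0 or 1)
   └─ Figure 2 trend matches (0 or 1)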
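
To make the scoring concrete, here is a minimal sketch of how a weighted rubric tree like the one above could be scored. It is not PaperBench's actual grader: the `RubricNode` class and `rubric_score` function are hypothetical names, leaves without an explicit weight are assumed to count equally among their siblings, and the pass/fail outcomes are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RubricNode:
    """One node of a rubric tree (hypothetical structure for illustration)."""
    name: str
    weight: float = 1.0              # assumption: unweighted siblings count equally
    passed: Optional[bool] = None    # only meaningful on leaf nodes
    children: List["RubricNode"] = field(default_factory=list)


def rubric_score(node: RubricNode) -> float:
    """Leaf: 1.0 if passed, else 0.0. Inner node: weighted average of its children."""
    if not node.children:
        return 1.0 if node.passed else 0.0
    total_weight = sum(child.weight for child in node.children)
    weighted = sum(child.weight * rubric_score(child) for child in node.children)
    return weighted / total_weight


# The example tree from the text, with invented pass/fail outcomes.
example = RubricNode("Reproduction of the main contribution", 1.0, children=[
    RubricNode("Implementation correctness", 0.4, children=[
        RubricNode("Algorithm implementation", passed=True),
        RubricNode("Hyperparameter settings", passed=False),
    ]),
    RubricNode("Successful execution", 0.3, children=[
        RubricNode("reproduce.sh exits normally", passed=True),
    ]),
    RubricNode("Result reproduction", 0.3, children=[
        RubricNode("Table 1 values match", passed=False),
        RubricNode("Figure 2 trend matches", passed=True),
    ]),
])

print(f"{rubric_score(example):.2f}")  # 0.4*0.5 + 0.3*1.0 + 0.3*0.5 -> 0.65
```

The point of the tree is partial credit: a submission that runs and gets half of the implementation and results right still earns 0.65 here, rather than a flat fail.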
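Scale: across PaperBench, the 20 papers carry a total of 8,316 evaluation nodes. AI is evaluated on the entire research-reproduction process: understanding the paper, implementing code, setting up the execution environment, and verifying results.
Notably, even recent tools such as Claude Code achieve only limited scores. This indicates that AI "code rot" surfaces in implementations that demand complex domain knowledge.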

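DeepCode: A Reproduction Tool That Surpasses Commercial Tools and Human Experts

Against that backdrop, the scores DeepCode posted were striking, if unstable.

① Comparison with human experts (3-paper subset)
Top ML PhD (Best@3): 72.4%
DeepCode (average): 75.9%
DeepCode achieves higher reproduction accuracy than machine learning PhDs from top universities.
② Comparison with commercial code agents (5-paper subset)
Codex: 40.0%
Claude Code: 58.7%
Cursor: 58.4%
DeepCode: 84.8% (+26.1 pt over Claude Code, the next best)
These results show that reproducing scientific papers with plain LLM prompting is extremely difficult. What is especially interesting is that DeepCode uses the same LLM as Claude Code and Cursor. This points to the importance of the LLM harness.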

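Reproduction 1

As a simple example, I tried it on the Batch Normalization paper.
The generated code is shown below.

The directory structure looks just like what a researcher would build.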
batch_normalization_reproduction/
├── batch_norm/              # Core BN implementation
│   ├── bn_layer.py          # BatchNorm1d (fully-connected)
│   ├── bn_conv.py           # BatchNorm2d (convolutional)
│   └── moving_stats.py      # Population statistics tracking
│
├── models/                  # Network architectures
│   ├── mnist_net.py         # 3-layer FC network
│   ├── inception_module.py  # Inception building block
│   └── inception_network.py # Full Inception architecture
│
├── training/                # Training infrastructure
│   ├── trainer.py           # Main training loop
│   ├── sgd_optimizer.py     # SGD with momentum
│   └── lr_scheduler.py      # Learning rate schedules
│
├── data/                    # Data loading
│   ├── mnist_loader.py      # MNIST pipeline
│   └── imagenet_loader.py   # ImageNet pipeline
│
├── experiments/             # Experiment runners
│   ├── run_mnist.py         # MNIST experiments
│   ├── run_inception_baseline.py  # Inception without BN
│   ├── run_inception_bn.py  # BN variants (x5, x30, sigmoid)
│   └── run_ensemble.py      # Ensemble training/evaluation
│
├── analysis/                # Result analysis
│   ├── plot_training_curves.py    # Figure 2
│   ├── plot_distributions.py      # Figure 1b,c
│   ├── compute_speedups.py        # Figure 3
│   └── visualize_results.py       # Master script
│
├── configs/                 # Hyperparameter configs
│   ├── mnist.yaml
│   ├── inception_base.yaml
│   ├── bn_x5.yaml
│   └── bn_x30.yaml
│
├── tests/                   # Unit tests
│   ├── test_bn_forward.py
│   ├── test_bn_backward.py
│   ├── test_scale_invariance.py
│   └── test_train_eval_modes.py
│
├── requirements.txt         # Dependencies
└── README.md               # This file
The generated code comes with the expected results attached.
"""
MNIST Experiment Runner - Section 4.1 Validation

Trains 3-layer fully-connected networks with and without Batch Normalization
to validate BN effectiveness on MNIST digit classification.

Expected Results (from paper):
- Without BN: ~92-93% test accuracy, slower convergence
- With BN: ~96-97% test accuracy, faster convergence
- BN should achieve ≥1% higher accuracy and converge in <80% steps

Usage:
    python -m experiments.run_mnist --with_bn --epochs 50
    python -m experiments.run_mnist --no_bn --epochs 50
"""

import numpy as np
import argparse
import os
import json
import time
from typing import Dict, List, Tuple
import pickle

# Import MNIST data loader
This code compares runs with and without BN on the MNIST classification problem.

The paper reports that with BN, convergence is faster, final accuracy is higher, and internal covariate shift is handled.
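
For readers who want to see what is being compared, the following is a minimal NumPy sketch of a batch-normalization forward pass for fully-connected inputs. It is not the code DeepCode generated: the class name `BatchNorm1d` only mirrors the comment in the directory listing, and the `momentum` and `eps` defaults are assumptions. As a simplification, it uses the biased batch variance in both training and inference.

```python
import numpy as np


class BatchNorm1d:
    """Forward-only batch normalization sketch for inputs of shape (batch, features)."""

    def __init__(self, num_features: int, momentum: float = 0.1, eps: float = 1e-5):
        self.gamma = np.ones(num_features)         # learnable scale
        self.beta = np.zeros(num_features)         # learnable shift
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum                   # assumed default
        self.eps = eps                             # assumed default

    def forward(self, x: np.ndarray, training: bool = True) -> np.ndarray:
        if training:
            # Normalize with the statistics of the current mini-batch.
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # Track population statistics for use at inference time.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # At inference, use the tracked statistics instead of the batch's.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))
y = BatchNorm1d(4).forward(x, training=True)
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))
```

In training mode each feature leaves the layer with roughly zero mean and unit variance regardless of the input scale, which is the property that tests such as `test_scale_invariance.py` in the listing above would presumably check.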

And here is the figure produced by running DeepCode.



The key points are almost all reproduced.

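Reproduction 2

Next, I attempt to reproduce a finance paper on pairs trading.
Because DeepCode currently has no reflection feature, stable code generation is difficult, but it does generate a document for re-implementing the paper, and that alone is useful. Once what must be reproduced and the parameters are made explicit, the rest can be coded with Antigravity or Claude Code.
Generated document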
complete_reproduction_plan:
  paper_info:
    title: "Pairs Trading: Performance of a Relative Value Arbitrage Rule"
    authors: "Evan Gatev, William N. Goetzmann, K. Geert Rouwenhorst"
    publication: "Review of Financial Studies, 2006"
    core_contribution: >
      Systematic empirical analysis of pairs trading strategy over 40 years (1962-2002),
      demonstrating average annualized excess returns of ~11% for top pairs portfolios
      using a minimum distance matching algorithm, with results robust to transaction
      costs and distinct from simple mean reversion strategies.

  file_structure: |
    pairs_trading/
    ├── config/
    │   ├── __init__.py
    │   ├── settings.py              # All hyperparameters (formation=12mo, trading=6mo, etc.)
    │   └── constants.py             # Industry mappings, size decile breakpoints
    │
    ├── data/
    │   ├── __init__.py
    │   ├── crsp_loader.py           # CRSP daily data loading
    │   ├── factor_loader.py         # Fama-French, momentum, reversal factors
    │   ├── normalizer.py            # Cumulative total return index computation
    │   └── filters.py               # Liquidity filtering (no missing trades)
    │
    ├── formation/
    │   ├── __init__.py
    │   ├── distance_calculator.py   # Sum of squared deviations matrix
    │   ├── pair_matcher.py          # Minimum distance pair selection
    │   └── sector_matcher.py        # Industry-restricted pairs (Table 3)
    │
    ├── trading/
    │   ├── __init__.py
    │   ├── signal_generator.py      # 2-sigma divergence/convergence signals
    │   ├── position_manager.py      # $1 long/$1 short position tracking
    │   └── delisting_handler.py     # DLRET handling for delistings
    │
    ├── returns/
    │   ├── __init__.py
    │   ├── portfolio_returns.py     # Value-weighted daily returns (Eq 2-3)
    │   ├── overlapping_portfolios.py # 6 staggered strategies aggregation
    │   └── excess_returns.py        # Fully invested vs committed capital
    │
    ├── analysis/
    │   ├── __init__.py
    │   ├── factor_regression.py     # 5-factor model with Newey-West
    │   ├── var_calculator.py        # Value at Risk (Table 5)
    │   ├── bootstrap_tester.py      # Random pairs bootstrap (Table 6)
    │   ├── long_short_decomp.py     # Long vs short analysis (Table 7)
    │   ├── subperiod.py             # Pre/post 1988 analysis (Table 8)
    │   └── short_recall.py          # High volume recall simulation (Table 9)
    │
    ├── utils/
    │   ├── __init__.py
    │   ├── newey_west.py            # HAC standard errors (6 lags)
    │   └── statistics.py            # Sharpe, skewness, kurtosis
    │
    ├── visualization/
    │   ├── __init__.py
    │   └── figures.py               # Reproduce Figures 1-4
    │
    ├── tests/
    │   └── test_*.py                # Unit tests for each module
    │
    ├── main.py                      # Main backtest execution
    ├── reproduce_tables.py          # Generate Tables 1-9
    ├── requirements.txt             # Dependencies (LAST)
    └── README.md                    # Documentation (LAST)

  implementation_components: |
    ═══════════════════════════════════════════════════════════════════════════════
    COMPONENT 1: Data Pipeline (data/*.py)
    ═══════════════════════════════════════════════════════════════════════════════
    Purpose: Load CRSP daily stock data and compute normalized price series
    
    Key Algorithm - Price Normalization:
    ```
    P_normalized[i,t] = Π_{τ=1}^{t} (1 + r_{i,τ})
    where r includes dividend returns (CRSP 'ret' field)
    Initialize: P_normalized[i,0] = 1.0
    ```
    
    Filtering Logic:
    - Exclude stocks with ANY missing trading day in 12-month formation period
    - Use CRSP daily file fields: PERMNO, DATE, RET, PRC, VOL, DLRET, SICCD
    
    ═══════════════════════════════════════════════════════════════════════════════
    COMPONENT 2: Pairs Formation Engine (formation/*.py)
    ═══════════════════════════════════════════════════════════════════════════════
    Purpose: Match stocks into pairs using minimum sum of squared deviations
    
    Core Algorithm - Distance Calculation:
    ```
    distance(i,j) = Σ_t (P_normalized[i,t] - P_normalized[j,t])²
    
    FOR each stock i in universe:
      FOR each stock j > i:
        ssd = SUM over all days t of (P_norm[i,t] - P_norm[j,t])^2
        distances[i,j] = ssd
    SORT pairs by distance ascending
    RETURN top N pairs (N = 5, 20, 100, 120, or all)
    ```
    
    Historical Spread Standard Deviation:
    ```
    spread_std[i,j] = std(P_normalized[i,:] - P_normalized[j,:]) over formation period
    ```
    
    Sector-Restricted Mode (Table 3):
    - Map SIC codes to S&P sectors: Utilities (4900-4999), Transportation (4000-4799),
      Financials (6000-6799), Industrials (all other)
    - Only match pairs within same sector
    
    ═══════════════════════════════════════════════════════════════════════════════
    COMPONENT 3: Trading Signal Generator (trading/signal_generator.py)
    ═══════════════════════════════════════════════════════════════════════════════
    Purpose: Generate entry/exit signals based on 2-sigma divergence rule
    
    Trading Algorithm:
    ```
    trigger_threshold = 2 × spread_std (from formation)
    
    FOR each day t in 6-month trading period:
      spread[t] = P_norm[stock_i,t] - P_norm[stock_j,t]
      
      IF position_closed AND |spread[t]| > trigger_threshold:
        IF spread[t] > 0:  # Stock i is winner (higher price)
          OPEN: SHORT stock_i ($1), LONG stock_j ($1)
        ELSE:  # Stock j is winner
          OPEN: LONG stock_i ($1), SHORT stock_j ($1)
        signal_date = t + wait_days  # wait_days = 0 or 1
        
      IF position_open AND sign(spread[t]) ≠ sign(spread[open_date]):
        CLOSE position (convergence - prices crossed)
        signal_date = t + wait_days
        
      IF end_of_trading_period AND position_open:
        FORCE CLOSE position
    ```
    
    Variants:
    - No waiting (Panel A): Execute same day as signal
    - One-day waiting (Panel B): Execute next day (reduces bid-ask bounce)
    
    ═══════════════════════════════════════════════════════════════════════════════
    COMPONENT 4: Position Manager & Returns Calculator (trading/, returns/)
    ═══════════════════════════════════════════════════════════════════════════════
    Purpose: Track positions and compute value-weighted portfolio returns
    
    Value-Weighted Return Formula (Equation 2-3):
    ```
    r_{P,t} = Σ_{i∈P} (w_{i,t} × r_{i,t}) / Σ_{i∈P} w_{i,t}
    
    Weight evolution (buy-and-hold):
    w_{i,t} = w_{i,t-1} × (1 + r_{i,t-1})
    w_{i,1} = 1.0 (initial $1 investment)
    ```
    
    Excess Return Calculation:
    - Fully Invested: portfolio_return / n_pairs_that_opened
    - Committed Capital: portfolio_return / n_pairs_selected (more conservative)
    
    Overlapping Portfolios (6 concurrent strategies):
    ```
    Monthly_return = (1/6) × Σ_{k=1}^{6} r_strategy_k
    Strategies staggered 1 month apart
    ```
    
    ═══════════════════════════════════════════════════════════════════════════════
    COMPONENT 5: Risk Factor Analysis (analysis/factor_regression.py)
    ═══════════════════════════════════════════════════════════════════════════════
    Purpose: Compute risk-adjusted returns using factor models
    
    Five-Factor Model (Table 4):
    ```
    R_pairs - R_f = α + β₁(MKT-RF) + β₂(SMB) + β₃(HML) + β₄(MOM) + β₅(REV) + ε
    
    Factors:
    - MKT-RF: Market excess return (Fama-French)
    - SMB: Small minus Big (Fama-French)
    - HML: High minus Low B/M (Fama-French)
    - MOM: Momentum (Carhart, 2-12 month)
    - REV: Short-term reversal = Top 3 deciles prior month - Bottom 3 deciles
    
    Standard Errors: Newey-West HAC with 6 lags
    ```
    
    Ibbotson Model (Alternative):
    - S&P 500 excess return, Small stock premium, Bond default premium, Bond horizon premium
    
    ═══════════════════════════════════════════════════════════════════════════════
    COMPONENT 6: Bootstrap & Robustness Tests (analysis/*.py)
    ═══════════════════════════════════════════════════════════════════════════════
    
    Bootstrap Algorithm (Table 6):
    ```
    FOR iteration = 1 to 200:
      FOR each actual pair (stock_i, stock_j) opened at date t:
        1. Get prior-month return for stock_i and stock_j
        2. Find return decile for each stock
        3. Sample random_stock_i from same decile as stock_i
        4. Sample random_stock_j from same decile as stock_j
        5. Compute return of (random_i, random_j) over same holding period
      Aggregate returns for this bootstrap sample
    Compare distribution to actual returns
    ```
    
    Long/Short Decomposition (Table 7):
    - Track long and short portfolio returns separately
    - Run factor regressions on each leg
    
    Short Recall Simulation (Table 9):
    ```
    IF daily_volume[short_stock] > avg_volume + 1*std_volume (18mo lookback):
      Force close position (simulated recall)
    ```
    
    ═══════════════════════════════════════════════════════════════════════════════
    COMPONENT 7: Newey-West Standard Errors (utils/newey_west.py)
    ═══════════════════════════════════════════════════════════════════════════════
    ```
    def newey_west_se(returns, lags=6):
        T = len(returns)
        mean = np.mean(returns)
        residuals = returns - mean
        
        # Autocovariances
        gamma = np.zeros(lags + 1)
        for j in range(lags + 1):
            gamma[j] = np.sum(residuals[j:] * residuals[:T-j]) / T
        
        # HAC variance (Bartlett kernel)
        V = gamma[0]
        for j in range(1, lags + 1):
            V += 2 * (1 - j/(lags+1)) * gamma[j]
        
        return np.sqrt(V / T)
    ```

  validation_approach: |
    ═══════════════════════════════════════════════════════════════════════════════
    PRIMARY VALIDATION: Reproduce Table 1 (Main Results)
    ═══════════════════════════════════════════════════════════════════════════════
    
    Panel A - No Waiting Rule:
    ┌─────────────┬──────────────────┬─────────────┬───────────┐
    │ Portfolio   │ Monthly Return   │ NW t-stat   │ Std Dev   │
    ├─────────────┼──────────────────┼─────────────┼───────────┤
    │ Top 5       │ 1.31%            │ 8.84        │ 2.28%     │
    │ Top 20      │ 1.44%            │ 11.56       │ 1.69%     │
    └─────────────┴──────────────────┴─────────────┴───────────┘
    
    Panel B - One-Day Waiting (Primary):
    ┌─────────────┬──────────────────┬─────────────┬───────────┐
    │ Portfolio   │ Monthly Return   │ NW t-stat   │ Std Dev   │
    ├─────────────┼──────────────────┼─────────────┼───────────┤
    │ Top 5       │ 0.75%            │ 6.26        │ -         │
    │ Top 20      │ 0.90%            │ 9.29        │ -         │
    └─────────────┴──────────────────┴─────────────┴───────────┘
    
    Tolerance: Within ±15% of reported values
    
    ═══════════════════════════════════════════════════════════════════════════════
    SECONDARY VALIDATION: Additional Tables
    ═══════════════════════════════════════════════════════════════════════════════
    
    Table 2 - Trading Statistics:
    - Average trigger deviation: 4.76% (top 5 pairs)
    - Average pairs traded: 4.81 out of 5
    - Average round trips: 2.02 per pair per 6 months
    - Average time open: 3.75 months
    - 71% utilities in top 20 pairs
    
    Table 3 - Industry Results (one-day wait, top 20):
    - Utilities: 1.08% (t=10.26)
    - Transportation: 0.58% (t=4.26)
    - Financials: 0.78% (t=7.60)
    - Industrials: 0.61% (t=6.93)
    
    Table 4 - Five-Factor Alpha (top 20):
    - Alpha: 0.76% monthly (t=7.08)
    - Market beta: -0.03 (insignificant)
    - R-squared: ~0.09
    
    Table 5 - Value at Risk (top 20 monthly):
    - 1% VaR: -1.94%
    - 5% VaR: -0.70%
    - Prob(negative): 18%
    
    Table 6 - Bootstrap:
    - Random pairs return: ≈ -0.11% (vs +0.90% actual)
    
    Table 7 - Long/Short Split:
    - Short (winner) portfolio drives alpha
    
    Table 8 - Subperiods:
    - 1963-1988: 1.18% raw, 0.67% alpha (t=4.41)
    - 1989-2002: 0.38% raw, 0.42% alpha (t=3.77)
    
    ═══════════════════════════════════════════════════════════════════════════════
    FIGURES TO REPRODUCE
    ═══════════════════════════════════════════════════════════════════════════════
    - Figure 1: Example pair (Kennecott vs Uniroyal, Aug 1963 - Jan 1964)
    - Figure 2: Monthly excess returns time series (1963-2002)
    - Figure 3: Cumulative returns vs S&P 500
    - Figure 4: 24-month rolling correlation (Top 20 vs pairs 101-120)
    
    ═══════════════════════════════════════════════════════════════════════════════
    SUCCESS CRITERIA
    ═══════════════════════════════════════════════════════════════════════════════
    MUST achieve:
    □ Top 20 monthly return (one-day wait) ≈ 0.90% (±15%)
    □ NW t-statistic > 5 for main result
    □ Positive significant alpha after 5-factor adjustment
    □ Bootstrap random pairs return << actual pairs return
    □ Profits positive across all four industry sectors

  environment_setup: |
    ═══════════════════════════════════════════════════════════════════════════════
    PYTHON ENVIRONMENT
    ═══════════════════════════════════════════════════════════════════════════════
    Python: 3.9+
    
    Core Dependencies:
    - numpy>=1.20.0          # Array operations
    - pandas>=1.3.0          # Data manipulation
    - scipy>=1.7.0           # Statistical functions
    - statsmodels>=0.13.0    # OLS regression, Newey-West
    
    Visualization:
    - matplotlib>=3.4.0      # Plotting
    - seaborn>=0.11.0        # Statistical plots
    
    Data Access:
    - wrds>=3.1.0            # CRSP data access (if using WRDS)
    - pyarrow>=6.0.0         # Parquet file handling
    
    Development:
    - pytest>=6.0.0          # Testing
    - jupyter>=1.0.0         # Notebooks
    - tqdm>=4.60.0           # Progress bars
    
    ═══════════════════════════════════════════════════════════════════════════════
    DATA REQUIREMENTS
    ═══════════════════════════════════════════════════════════════════════════════
    CRSP Daily Stock File (1962-2002):
    - PERMNO, DATE, RET, PRC, VOL, SHROUT, DLRET, SICCD
    
    Fama-French Factors (Kenneth French Data Library):
    - F-F_Research_Data_Factors_daily.csv
    - F-F_Momentum_Factor_daily.csv
    
    Alternative: Yahoo Finance for recent period testing (limited historical depth)

  implementation_strategy: |
    ═══════════════════════════════════════════════════════════════════════════════
    PHASE 1: Data Infrastructure (Days 1-3)
    ═══════════════════════════════════════════════════════════════════════════════
    1. Implement CRSP data loader with liquidity filtering
    2. Build price normalization (cumulative total return)
    3. Load Fama-French factors
    4. Set up configuration with all hyperparameters
    
    Test: Verify normalized prices start at $1, compound correctly
    
    ═══════════════════════════════════════════════════════════════════════════════
    PHASE 2: Pairs Formation (Days 4-6)
    ═══════════════════════════════════════════════════════════════════════════════
    1. Implement O(n²) distance matrix computation
    2. Build pair ranking and selection
    3. Compute historical spread standard deviations
    4. Add sector-restricted matching option
    
    Test: Top pairs should have lowest distance, verify with small dataset
    Optimization: Vectorize distance calculation for performance
    
    ═══════════════════════════════════════════════════════════════════════════════
    PHASE 3: Trading Engine (Days 7-10)
    ═══════════════════════════════════════════════════════════════════════════════
    1. Implement signal generator with 2-sigma rule
    2. Build position manager for long/short tracking
    3. Add delisting return handling
    4. Implement both wait_days=0 and wait_days=1 variants
    
    Test: Manually verify signals for example pair (Figure 1 style)
    
    ═══════════════════════════════════════════════════════════════════════════════
    PHASE 4: Returns Calculation (Days 11-13)
    ═══════════════════════════════════════════════════════════════════════════════
    1. Implement value-weighted daily returns (Eq 2-3)
    2. Build overlapping portfolio aggregator (6 strategies)
    3. Compute fully invested vs committed capital returns
    4. Monthly compounding and excess return calculation
    
    Test: Run single formation/trading period, verify math manually
    
    ═══════════════════════════════════════════════════════════════════════════════
    PHASE 5: Risk Analytics (Days 14-17)
    ═══════════════════════════════════════════════════════════════════════════════
    1. Implement Newey-West HAC standard errors
    2. Build 5-factor regression (use statsmodels)
    3. Construct reversal factor from return deciles
    4. Implement VaR calculation
    
    Test: Compare alpha/t-stats to Table 4 values
    
    ═══════════════════════════════════════════════════════════════════════════════
    PHASE 6: Robustness Tests (Days 18-21)
    ═══════════════════════════════════════════════════════════════════════════════
    1. Bootstrap random pairs test (200 iterations)
    2. Long/short decomposition
    3. Subperiod analysis (pre/post 1988)
    4. Short recall simulation
    
    Test: Bootstrap should yield ~0% vs ~0.90% actual
    
    ═══════════════════════════════════════════════════════════════════════════════
    PHASE 7: Full Backtest & Validation (Days 22-25)
    ═══════════════════════════════════════════════════════════════════════════════
    1. Run full 1962-2002 backtest
    2. Generate all 9 tables
    3. Create figures 1-4
    4. Compare results to paper, document deviations
    
    ═══════════════════════════════════════════════════════════════════════════════
    KEY IMPLEMENTATION NOTES
    ═══════════════════════════════════════════════════════════════════════════════
    - Liquidity filter is STRICT: no missing days allowed
    - Use CRSP 'ret' field (includes dividends) for total returns
    - One-day waiting rule is primary result (Panel B)
    - 6 overlapping portfolios require careful date management
    - Newey-West with 6 lags for all t-statistics
    - Self-financing: $1 long + $1 short = market neutral
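
Outside the generated plan, here is a minimal sketch of what Components 1 to 3 might look like once handed to a coding agent. It is an illustration under stated assumptions, not a faithful reproduction: the function names are hypothetical, it covers only the no-waiting variant (`wait_days = 0`), it ignores delistings and missing data, and the dense distance tensor is only practical for a small universe.

```python
import numpy as np


def normalize_prices(returns: np.ndarray) -> np.ndarray:
    """Cumulative total-return index; `returns` has shape (days, stocks)."""
    return np.cumprod(1.0 + returns, axis=0)


def rank_pairs_by_ssd(p_norm: np.ndarray, n_pairs: int) -> list:
    """Return the `n_pairs` index pairs (i, j), i < j, with the smallest
    sum of squared deviations between normalized price series."""
    n_stocks = p_norm.shape[1]
    # (days, stocks, stocks) tensor of pairwise differences; memory-heavy
    # for large universes, so a real run would need chunking.
    diff = p_norm[:, :, None] - p_norm[:, None, :]
    ssd = (diff ** 2).sum(axis=0)
    i_idx, j_idx = np.triu_indices(n_stocks, k=1)
    order = np.argsort(ssd[i_idx, j_idx], kind="stable")[:n_pairs]
    return [(int(i_idx[k]), int(j_idx[k])) for k in order]


def two_sigma_positions(spread: np.ndarray, spread_std: float, k: float = 2.0) -> np.ndarray:
    """Daily position in stock i (stock j takes the opposite side):
    -1 = short i / long j, +1 = long i / short j, 0 = flat."""
    threshold = k * spread_std
    position, open_sign = 0, 0.0
    out = np.zeros(len(spread), dtype=int)
    for t, s in enumerate(spread):
        if position == 0 and abs(s) > threshold:
            # Divergence: short the winner, long the loser.
            position = -1 if s > 0 else 1
            open_sign = np.sign(s)
        elif position != 0 and np.sign(s) != open_sign:
            # Convergence: the normalized prices crossed.
            position = 0
        out[t] = position
    if len(out):
        out[-1] = 0  # force-close at the end of the trading period
    return out


spread = np.array([0.0, 0.05, 0.25, 0.15, -0.01, -0.30, -0.10, 0.02])
print(two_sigma_positions(spread, spread_std=0.1).tolist())
# [0, 0, -1, -1, 0, 1, 1, 0]
```

In a fuller version, `spread_std` would be the standard deviation of the pair's spread over the 12-month formation period, and the positions would feed the return calculation in Component 4.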

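Standing on the Shoulders of Predecessors

Every December, the Cryptocurrency (仮想通貨) Advent Calendar, where practitioners share what they have learned, is a valuable source of information. Many articles were posted again this year, some with explanations so detailed that you wonder, "Is it really okay to share this much?"

This time, I fed the Advent Calendar 2025 articles into NotebookLM and analyzed them.
The upshot: technical skill "alone" no longer wins. What the authors say in unison is that choosing your battlefield matters. In other words, rather than wearing yourself out in a red ocean, go find a blue ocean.
So I asked it for a ranking of recommended blue oceans.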

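No. 1: Event hunting on emerging and minor protocols and on "demon realm" (魔界) DEXs and chains

Avoid head-on competition on the major chains (ETH, SOL) and fight where few people are.

Specifically:
Look for rehashes of trade mining schemes that were popular in the past (the Injective and Aster cases)
Monopolize liquidations on minor lending protocols (exploiting flaws in the official bot)
Find events that are still running on chains considered "already dead"
Competition is thin for a simple reason: everyone flocks to the major exchanges. In addition, newer entrants do not know the "dead" techniques, which creates a knowledge asymmetry.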

No. 2: Brute force through browser automation

Automate places where no API is offered, using Playwright or Selenium.
Competition is thin because engineers prefer "clean code."
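
As a rough illustration of the idea, the sketch below reads a value from a rendered page with Playwright's synchronous Python API and parses it into a number. The helper names `fetch_text` and `parse_number`, the URL, and the CSS selector are all hypothetical placeholders; a real target would also need login handling, retries, and a check of the site's terms of use.

```python
import re


def parse_number(text: str) -> float:
    """Extract the first numeric value from scraped text such as '$1,234.56'."""
    match = re.search(r"-?\d[\d,]*\.?\d*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group().replace(",", ""))


def fetch_text(url: str, selector: str, timeout_ms: int = 10_000) -> str:
    """Open `url` in headless Chromium and return the text of `selector`."""
    # Imported lazily so the parsing helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)
            return page.locator(selector).inner_text(timeout=timeout_ms)
        finally:
            browser.close()


if __name__ == "__main__":
    # Placeholder URL and selector; replace with the page you actually target.
    raw = fetch_text("https://example.com/dashboard", "#apr-value")
    print(parse_number(raw))
```

Keeping the parsing separate from the browser call makes the fragile part (the selector) easy to swap when the page layout changes, which it eventually will.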

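Conclusion

Analyzing the articles through NotebookLM, the point the Advent Calendar articles made most strongly was this: a bot is not passive income. A bot is overwork income, and that phrase says it all.
API specification changes: the bot suddenly stops working
Changes in market conditions: a strategy that worked yesterday is dead today
Intensifying competition: a blue ocean turns red in an instant
Continuous code fixes, strategy rework, late-night alert handling: there is no escaping these. Even if DeepCode can implement a paper and NotebookLM can find promising areas, in the end it is a human who does the maintenance, starting with checking whether the strategy is still alive, and who is the starting point for new strategies.
Conclusion: being a botter means overwork income, so you are better off not doing it.

If you expect to "earn while you sleep," index investing is the better choice. Even so, only the oddballs who find this grind fun survive in this world. Our predecessors were kindly telling us exactly that.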