Overview

The wikinder-rebase-2025-12-10 repository is a specialized Git history maintenance tool designed to clean up and optimize the commit history of the Wikinder wiki repository (https://github.com/wikinder/wikinder.wiki.git). The system executes a three-stage pipeline that transforms messy, verbose commit history into a consolidated, maintainable format by rewriting commit messages, squashing related commits, and removing empty commits while preserving chronological order.

This document describes the overall system architecture, processing workflow, and repository structure. For detailed information about the wiki content being processed, see The Wikinder Wiki. For technical details about individual processing scripts, see Processing Scripts. For usage instructions, see Usage Guide.

System Purpose

The system addresses the following Git history maintenance problems through a three-stage transformation pipeline:

| Problem | Solution Stage | Script/Tool | Input | Output |
| --- | --- | --- | --- | --- |
| Verbose "Revert X...Y on Title" messages with unstable hash references | Stage 1 | `01-revert-to-updated.pl` | `$FIRST_LOG` | `$FIRST_REBASE_TODO` with exec git commit --amend commands |
| Multiple sequential edits to same file by same author on same day (JST) | Stage 2 | `02-squash.pl` | `$SECOND_LOG` | `$SECOND_REBASE_TODO` with pick/fixup sequences |
| Empty commits after squashing, mismatched committer metadata | Stage 3 | git-filter-repo in `00.sh` (lines 101-105) | `$THIRD_LOG` | `$FOURTH_LOG` with pruned history |

The final result is a clean, linear history where each file has at most one consolidated commit per author per calendar day (JST timezone), with standardized "Updated Title (markdown)" messages and proper author/committer metadata alignment.

Sources: 00.sh:1-16, 00.sh:32-37, README.md:1-4

System Architecture

Diagram: Core System Components and Data Flow

Sources: 00.sh:1-124, .gitignore:1-2

Processing Pipeline

The system executes the following sequential stages, with user confirmation checkpoints between each stage:

Diagram: Checkpoint-Driven Execution Flow

Sources: 00.sh:66-116

Stage 1: Message Rewriting

The first stage normalizes commit messages by converting verbose "Revert" messages into standardized "Updated" messages.

Input format (from `01-git-log.txt`):

pick e1372d3 # "2025-03-10T10:49:49+09:00", "yuuki", "Revert 3135fb48...fb0a8ca7 on yuuki"
pick a6ccc7a # "2025-03-10T13:20:54+09:00", "yuuki", "Updated yuuki (markdown)"
pick 9adb70e # "2025-03-10T13:21:08+09:00", "yuuki", "Revert 8a0c6523...8062ba60 on bear"

Output format (to `01-git-rebase-todo.txt`):

pick e1372d3 # "2025-03-10T10:49:49+09:00", "yuuki", "Revert 3135fb48...fb0a8ca7 on yuuki"
exec GIT_COMMITTER_DATE='2025-03-10T10:49:49+09:00' git commit --amend --date='2025-03-10T10:49:49+09:00' --message='Updated yuuki (markdown)'
pick a6ccc7a # "2025-03-10T13:20:54+09:00", "yuuki", "Updated yuuki (markdown)"
pick 9adb70e # "2025-03-10T13:21:08+09:00", "yuuki", "Revert 8a0c6523...8062ba60 on bear"
exec GIT_COMMITTER_DATE='2025-03-10T13:21:08+09:00' git commit --amend --date='2025-03-10T13:21:08+09:00' --message='Updated bear (markdown)'

The script inserts exec commands after pick commands with "Revert" messages to amend them with standardized messages while preserving the original timestamp.
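
The rewrite rule above can be approximated in a few lines of shell and awk. This is a minimal sketch rather than the contents of `01-revert-to-updated.pl`: the function name `revert_to_updated` is hypothetical, and it assumes that author names and page titles contain no double or single quotes.

```shell
# Hypothetical helper (not part of the repository): reads log lines in the
# "pick <hash> # "<date>", "<author>", "<subject>"" format on stdin and
# writes a rebase todo on stdout.
revert_to_updated() {
  awk -F'"' -v q="'" '
    {
      print                              # keep every original pick line
      date = $2; subject = $6            # quoted fields: date, author, subject
      if (subject ~ /^Revert [0-9a-f]+\.\.\.[0-9a-f]+ on /) {
        title = subject
        sub(/^Revert [0-9a-f]+\.\.\.[0-9a-f]+ on /, "", title)
        stamp = q date q                 # single-quoted timestamp for the shell
        printf "exec GIT_COMMITTER_DATE=%s git commit --amend --date=%s --message=%sUpdated %s (markdown)%s\n", stamp, stamp, q, title, q
      }
    }'
}
```

Under these assumptions, `revert_to_updated < 01-git-log.txt` would print each pick line unchanged and append one exec line after every "Revert" commit.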

Sources: 01-revert-to-updated.pl:1-34, 01-git-rebase-todo.txt:40-44

Stage 2: Commit Squashing

The second stage consolidates commits based on three criteria: same file title, same author, same day (JST timezone).

Squashing criteria:

| Criterion | Implementation | Location |
| --- | --- | --- |
| Same file title | Extracted from commit message via regex | 02-squash.pl:45-48 |
| Same day | Date comparison in JST timezone | 00.sh:40, 02-squash.pl:52 |
| Same author | Author comparison (yuuki or bear only) | 02-squash.pl:51 |

Example transformation from `02-git-log.txt` to `02-git-rebase-todo.txt`:

Before (multiple commits to same file on same day):

pick 2c2acd4 # "2025-04-16T10:01:20+09:00", "yuuki", "Updated Mathematics of blocks (markdown)"
pick f4edbe1 # "2025-04-16T10:44:40+09:00", "yuuki", "Updated Mathematics of blocks (markdown)"
pick 1109fe5 # "2025-04-16T11:29:20+09:00", "yuuki", "Updated Mathematics of blocks (markdown)"

After (squashed with preserved final timestamp):

pick 2c2acd4 # "2025-04-16T10:01:20+09:00", "yuuki", "Updated Mathematics of blocks (markdown)"
fixup f4edbe1
fixup 1109fe5
exec GIT_COMMITTER_DATE='2025-04-16T11:29:20+09:00' git commit --amend --date='2025-04-16T11:29:20+09:00' --no-edit
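
The grouping shown above can be sketched as follows. This is an illustration under stated assumptions, not the logic of `02-squash.pl`: the function name `squash_same_day` is hypothetical, only consecutive "Updated Title (markdown)" commits are grouped, the yuuki/bear author restriction is omitted, and timestamps are assumed to be already rendered in JST.

```shell
# Hypothetical helper (not part of the repository): folds consecutive commits
# with the same subject, author, and calendar day into one pick plus fixups.
squash_same_day() {
  awk -F'"' -v q="'" '
    function close_group() {
      # Restore the last timestamp of the group if anything was folded in.
      if (count > 1)
        printf "exec GIT_COMMITTER_DATE=%s%s%s git commit --amend --date=%s%s%s --no-edit\n", q, last, q, q, last, q
    }
    {
      date = $2; author = $4; subject = $6
      split($1, head, " ")               # head[2] is the abbreviated hash
      key = ""
      if (subject ~ /^Updated .* \(markdown\)$/)
        key = subject SUBSEP author SUBSEP substr(date, 1, 10)
      if (key != "" && key == prev) {
        print "fixup " head[2]           # fold into the previous pick
        count++
      } else {
        close_group()
        print                            # start a new group with this pick
        count = 1
      }
      prev = key; last = date
    }
    END { close_group() }'
}
```

Comparing the first ten characters of the timestamp works as a day key only because the log format already prints dates in the local (JST) zone.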

Sources: 02-squash.pl:1-86, 02-git-rebase-todo.txt:72-91

Stage 3: History Cleanup

The final stage uses git-filter-repo to remove empty commits and standardize metadata.

Operations performed:

commit.committer_name  = commit.author_name
commit.committer_email = commit.author_email
commit.committer_date  = commit.author_date

This ensures that the committer metadata matches the author metadata for all commits, fixing any discrepancies introduced during rebasing.

The --prune-empty=always flag removes commits that become empty after squashing (e.g., commits that were pure reverts).
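
One way to combine the two operations in a single invocation is sketched below. The wrapper name `normalize_committers` is hypothetical, and the exact command in `00.sh` may use different or additional options; only the `--prune-empty=always` flag and the three assignments come from the description above.

```shell
# Hypothetical wrapper (not part of the repository); run it inside the
# working repository. The callback body is Python that git-filter-repo
# executes once per commit.
normalize_committers() {
  git filter-repo --prune-empty=always --commit-callback '
commit.committer_name = commit.author_name
commit.committer_email = commit.author_email
commit.committer_date = commit.author_date
'
}
```

git-filter-repo refuses by default to rewrite a repository that does not look like a fresh clone, so a real run may additionally need `--force`.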

Sources: 00.sh:101-105

Repository Structure

File inventory with variable names:

| File | Variable Name | Type | Purpose | Git Status |
| --- | --- | --- | --- | --- |
| `00.sh` | N/A | Bash script | Master orchestrator with `set -xeuo pipefail` | Tracked |
| `01-revert-to-updated.pl` | N/A | Perl script | Stage 1: Parses "Revert" messages, generates exec commands | Tracked |
| `02-squash.pl` | N/A | Perl script | Stage 2: Groups commits by file/author/day, generates fixup sequences | Tracked |
| `README.md` | N/A | Markdown | Project identifier with DeepWiki documentation badge | Tracked |
| `LICENSE` | N/A | Text | MIT License (2025) | Tracked |
| `.gitignore` | N/A | Config | Excludes `production-repo.git/` and `work-repo/` directories | Tracked |
| `production-repo.git/` | N/A | Directory | Bare mirror created by `git clone --mirror` | Ignored |
| `work-repo/` | N/A | Directory | Working repository where `git rebase -i` operations execute | Ignored |
| `../01-git-log.txt` | `$FIRST_LOG` | Log file | Initial history: `git log --reverse` before Stage 1 | Generated |
| `../01-git-rebase-todo.txt` | `$FIRST_REBASE_TODO` | Todo file | Stage 1 rebase instructions with pick/exec commands | Generated |
| `../02-git-log.txt` | `$SECOND_LOG` | Log file | History after Stage 1 message rewriting | Generated |
| `../02-git-rebase-todo.txt` | `$SECOND_REBASE_TODO` | Todo file | Stage 2 rebase instructions with pick/fixup/exec commands | Generated |
| `../03-git-log.txt` | `$THIRD_LOG` | Log file | History after Stage 2 squashing (before pruning) | Generated |
| `../04-git-log.txt` | `$FOURTH_LOG` | Log file | Final history after git-filter-repo pruning | Generated |

Configuration format:

All log files use the format configured by:

git config --local log.date iso-strict-local
git config --local format.pretty 'pick %h # "%ad", "%an", "%s"'

Sources: 00.sh:32-37, 00.sh:62-63, .gitignore:1-2, README.md:1-4

Git Configuration and Remotes

Diagram: Repository Configuration and Backup Strategy

Remote configuration variables:

# Lines 26-27 in 00.sh
PRODUCTION_REMOTE='https://github.com/wikinder/wikinder.wiki.git'
WORK_REMOTE="git@github.com:wikinderbear/wikinder-rebase-$TIMESTAMP.git"

# Lines 29-30 in 00.sh
BACKUP_BRANCH='backup-before-rebase'
BACKUP_TAG="$BACKUP_BRANCH-$TIMESTAMP"

The $PRODUCTION_REMOTE uses HTTPS for anonymous read access, while $WORK_REMOTE uses SSH for authenticated write access. The $TIMESTAMP variable is set via TIMESTAMP=$(date -u +'%Y-%m-%d') (`00.sh`, line 24), so each run gets a date-stamped staging repository name such as wikinder-rebase-2025-12-10. Because the stamp is a UTC calendar date, two runs on the same UTC day produce the same name.

Backup strategy execution order:

  1. Line 52: git branch "$BACKUP_BRANCH" master creates branch backup-before-rebase from master
  2. Lines 55-56: git tag --annotate "$BACKUP_TAG" "$BACKUP_BRANCH" --message="Backup before rebase ($TIMESTAMP)" creates annotated tag
  3. Line 59: git push work-remote "$BACKUP_BRANCH" "$BACKUP_TAG" pushes both to work remote before any history modification

This ensures complete rollback capability via git reset --hard backup-before-rebase if issues are discovered after rebasing.

Sources: 00.sh:24-59

Git Log Configuration

The system configures git log format to match the git rebase --interactive todo file format:

git config --local log.date iso-strict-local
git config --local format.pretty 'pick %h # "%ad", "%an", "%s"'

This produces output like:

pick fc7162f # "2025-01-01T21:09:52+09:00", "yuuki", "Initial Home page"
pick 350891d # "2025-01-16T17:54:18+09:00", "yuuki", "Created _Footer (markdown)"

The format includes:

  • %h: abbreviated commit hash
  • %ad: author date in ISO 8601 format with local timezone offset
  • %an: author name
  • %s: commit subject (first line of message)

This format is directly parseable by the Perl scripts and can be used as-is in git rebase -i todo files.
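
To illustrate how regular the format is, the sketch below splits one log line into its four fields with a single sed expression. The function name `parse_log_line` is hypothetical, and the pattern assumes that dates and author names contain no double quotes (the subject may).

```shell
# Hypothetical helper (not part of the repository): prints the hash, date,
# author, and subject of one log line, each on its own line.
parse_log_line() {
  printf '%s\n' "$1" |
    sed -n 's/^pick \([0-9a-f][0-9a-f]*\) # "\([^"]*\)", "\([^"]*\)", "\(.*\)"$/\1\
\2\
\3\
\4/p'
}
```

A line that does not match the format produces no output, which makes malformed entries easy to detect.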

Sources: 00.sh:62-66, 01-git-log.txt:1-10

Error Recovery

The system includes automated error recovery for the common case where squashing creates empty commits:

git -c sequence.editor="cp '$SECOND_REBASE_TODO'" \
  rebase -i --root --committer-date-is-author-date \
  || { until git commit --amend --allow-empty --no-edit && git rebase --continue; do :; done }

When git rebase encounters an empty commit and exits with an error, the shell loop automatically:

  1. Amends the commit with --allow-empty to accept the empty state
  2. Continues the rebase with git rebase --continue
  3. Repeats until the rebase completes successfully

This automation removes the need for manual intervention during Stage 2, which can involve hundreds of squash operations.
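
The `until` loop shown above retries without an upper bound. A bounded variant is sketched below as a design alternative; it is not taken from `00.sh`, and the helper name `retry_until_success` and the attempt limit are assumptions.

```shell
# Hypothetical helper (not part of the repository): runs "$@" repeatedly
# until it succeeds, giving up after the given number of attempts.
retry_until_success() {
  max_attempts=$1
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1                           # still failing after the last attempt
    fi
    attempt=$((attempt + 1))
  done
  return 0
}

# Possible use for the empty-commit case (the limit of 500 is arbitrary):
#   retry_until_success 500 sh -c \
#     'git commit --amend --allow-empty --no-edit && git rebase --continue'
```

The bound trades full automation for a guaranteed exit if the rebase stops for a reason other than an empty commit, such as a merge conflict.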

Sources: 00.sh:87-92

Chronological Verification

After all processing completes, the system verifies that all commits remain in chronological order:

git log --reverse --format='%at' | sort -nc

This command:

  1. Lists all commit timestamps (%at = author timestamp in Unix epoch format)
  2. Pipes through sort -nc which checks if the input is numerically sorted
  3. Exits with error if any timestamp is out of order

This verification ensures that despite extensive history rewriting, the final timeline remains logically consistent.
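
The behavior of `sort -nc` can be tried on plain numbers without a repository. In this sketch the function name `is_chronological` is hypothetical; it only wraps the same check and silences the diagnostic that sort prints on failure.

```shell
# Hypothetical helper (not part of the repository): succeeds only if the
# epoch timestamps on stdin never decrease. Equal timestamps are accepted.
is_chronological() {
  sort -nc 2>/dev/null
}

# Possible use inside the rewritten repository:
#   git log --reverse --format='%at' | is_chronological || echo 'out of order'
```

Because the check is non-strict, two commits sharing the same second still pass.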

Sources: 00.sh:111

Timezone Handling

The entire pipeline operates in JST (Japan Standard Time, UTC+9):

export TZ='Asia/Tokyo'

This ensures that day-based grouping in Stage 2 uses JST dates consistently. For example, commits at "2025-04-16T23:30:00+09:00" and "2025-04-17T00:30:00+09:00" are treated as different days despite being only one hour apart, because they occur on different calendar days in JST.

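The iso-strict-local date format renders every timestamp in the local time zone set by TZ, so all log entries carry an explicit +09:00 offset, which keeps the log files unambiguous for audit purposes.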
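
Since every logged timestamp is already expressed in JST, the calendar day can be read directly from the text before the "T". The sketch below shows that comparison; the function names `jst_day` and `same_jst_day` are hypothetical, and the approach assumes the input was produced with TZ set to Asia/Tokyo and the iso-strict-local format.

```shell
# Hypothetical helpers (not part of the repository).
# Prints the calendar-day part (YYYY-MM-DD) of an ISO 8601 timestamp.
jst_day() {
  printf '%s\n' "${1%%T*}"
}

# Succeeds when both timestamps fall on the same calendar day.
same_jst_day() {
  [ "$(jst_day "$1")" = "$(jst_day "$2")" ]
}
```

With the example above, `same_jst_day 2025-04-16T23:30:00+09:00 2025-04-17T00:30:00+09:00` fails, so the two commits would not be squashed together.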

Sources: 00.sh:40, 00.sh:62