Overview

The wikinder-rebase-2025-12-10 repository is a specialized Git history maintenance tool designed to clean up and optimize the commit history of the Wikinder wiki repository (https://github.com/wikinder/wikinder.wiki.git). The system executes a three-stage pipeline that transforms messy, verbose commit history into a consolidated, maintainable format by rewriting commit messages, squashing related commits, and removing empty commits while preserving chronological order.

This document describes the overall system architecture, processing workflow, and repository structure. For detailed information about the wiki content being processed, see The Wikinder Wiki. For technical details about individual processing scripts, see Processing Scripts. For usage instructions, see Usage Guide.

System Purpose

The system addresses the following Git history maintenance problems:

| Problem | Solution Stage | Mechanism |
|---|---|---|
| Verbose "Revert" commit messages with unstable hash references | Stage 1 | Message rewriting via `01-revert-to-updated.pl` |
| Multiple sequential edits to same file by same author on same day | Stage 2 | Commit squashing via `02-squash.pl` |
| Empty commits and mismatched committer metadata | Stage 3 | History cleanup via git-filter-repo in `00.sh` |

The final result is a clean, linear history where each file has one consolidated commit per author per day, with standardized commit messages and proper metadata.

Sources: 00.sh:1-16, README.md:1-4

System Architecture

Diagram: Core System Components

Sources: 00.sh:1-124, .gitignore:1-2

Processing Pipeline

The system executes the following stages in sequence, pausing for user confirmation between stages:

Diagram: Three-Stage Processing Flow

Sources: 00.sh:66-116

Stage 1: Message Rewriting

The first stage normalizes commit messages by converting verbose "Revert" messages into standardized "Updated" messages.

Input format (from `01-git-log.txt`):

pick e1372d3 # "2025-03-10T10:49:49+09:00", "yuuki", "Revert 3135fb48...fb0a8ca7 on yuuki"
pick a6ccc7a # "2025-03-10T13:20:54+09:00", "yuuki", "Updated yuuki (markdown)"
pick 9adb70e # "2025-03-10T13:21:08+09:00", "yuuki", "Revert 8a0c6523...8062ba60 on bear"

Output format (to `01-git-rebase-todo.txt`):

pick e1372d3 # "2025-03-10T10:49:49+09:00", "yuuki", "Revert 3135fb48...fb0a8ca7 on yuuki"
exec GIT_COMMITTER_DATE='2025-03-10T10:49:49+09:00' git commit --amend --date='2025-03-10T10:49:49+09:00' --message='Updated yuuki (markdown)'
pick a6ccc7a # "2025-03-10T13:20:54+09:00", "yuuki", "Updated yuuki (markdown)"
pick 9adb70e # "2025-03-10T13:21:08+09:00", "yuuki", "Revert 8a0c6523...8062ba60 on bear"
exec GIT_COMMITTER_DATE='2025-03-10T13:21:08+09:00' git commit --amend --date='2025-03-10T13:21:08+09:00' --message='Updated bear (markdown)'

The script inserts exec commands after pick commands with "Revert" messages to amend them with standardized messages while preserving the original timestamp.
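The transformation above can be approximated in a few lines of shell. This is a minimal sketch, not the Perl script itself: the function name `rewrite_reverts` and the regular expression are assumptions inferred from the sample input and output, and page titles containing a single quote would need extra escaping.

```shell
#!/bin/sh
# Sketch only: rewrite_reverts is a hypothetical stand-in for 01-revert-to-updated.pl.
rewrite_reverts() {
  # Matches: pick <hash> # "<date>", "<author>", "Revert <old>...<new> on <title>"
  # Capture 1 is the author date, capture 2 is the page title.
  pattern='^pick [0-9a-f]+ # "([^"]+)", "[^"]+", "Revert [0-9a-f]+\.\.\.[0-9a-f]+ on (.+)"$'
  # For a matching line: print it unchanged, then turn the copy into an exec line.
  sed -E "/$pattern/{
p
s/$pattern/exec GIT_COMMITTER_DATE='\\1' git commit --amend --date='\\1' --message='Updated \\2 (markdown)'/
}"
}

# Demo on two of the sample lines shown above.
rewrite_reverts <<'EOF'
pick a6ccc7a # "2025-03-10T13:20:54+09:00", "yuuki", "Updated yuuki (markdown)"
pick 9adb70e # "2025-03-10T13:21:08+09:00", "yuuki", "Revert 8a0c6523...8062ba60 on bear"
EOF
```

Lines that do not match pass through untouched, so the output stays a valid rebase todo list.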

Sources: 01-revert-to-updated.pl:1-34, 01-git-rebase-todo.txt:40-44

Stage 2: Commit Squashing

The second stage consolidates commits based on three criteria: same file title, same author, same day (JST timezone).

Squashing criteria:

| Criterion | Implementation | Location |
|---|---|---|
| Same file title | Extracted from commit message via regex | 02-squash.pl:45-48 |
| Same day | Date comparison in JST timezone | 00.sh:40, 02-squash.pl:52 |
| Same author | Author comparison (yuuki or bear only) | 02-squash.pl:51 |

Example transformation from `02-git-log.txt` to `02-git-rebase-todo.txt`:

Before (multiple commits to same file on same day):

pick 2c2acd4 # "2025-04-16T10:01:20+09:00", "yuuki", "Updated Mathematics of blocks (markdown)"
pick f4edbe1 # "2025-04-16T10:44:40+09:00", "yuuki", "Updated Mathematics of blocks (markdown)"
pick 1109fe5 # "2025-04-16T11:29:20+09:00", "yuuki", "Updated Mathematics of blocks (markdown)"

After (squashed with preserved final timestamp):

pick 2c2acd4 # "2025-04-16T10:01:20+09:00", "yuuki", "Updated Mathematics of blocks (markdown)"
fixup f4edbe1
fixup 1109fe5
exec GIT_COMMITTER_DATE='2025-04-16T11:29:20+09:00' git commit --amend --date='2025-04-16T11:29:20+09:00' --no-edit
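The grouping logic can be sketched with awk. This is an illustration under stated assumptions rather than a port of `02-squash.pl`: the function name `squash_todo` is hypothetical, and the sketch assumes that only consecutive commits are grouped, that the title is the subject without its leading "Created"/"Updated" verb and trailing "(markdown)", and that the day is the first ten characters of the JST-rendered date. It does not restrict authors to yuuki or bear.

```shell
#!/bin/sh
# Sketch only: squash_todo is a hypothetical stand-in for 02-squash.pl.
squash_todo() {
  awk -F'"' '
    # Emit the re-dating exec line when the finished group had more than one commit.
    function stamp_group() {
      if (count > 1)
        printf "exec GIT_COMMITTER_DATE=\047%s\047 git commit --amend --date=\047%s\047 --no-edit\n", last_date, last_date
    }
    {
      # Splitting on double quotes: $2 = date, $4 = author, $6 = subject.
      date = $2; author = $4; title = $6
      sub(/^(Created|Updated) /, "", title)
      sub(/ \(markdown\)$/, "", title)
      key = title SUBSEP author SUBSEP substr(date, 1, 10)
      if (NR > 1 && key == prev_key) {
        split($1, head, " ")          # head[2] is the abbreviated hash
        print "fixup " head[2]
        count++
      } else {
        stamp_group()                 # close the previous group first
        print
        count = 1
      }
      prev_key = key
      last_date = date
    }
    END { stamp_group() }
  '
}

# Demo on the sample lines shown above.
squash_todo <<'EOF'
pick 2c2acd4 # "2025-04-16T10:01:20+09:00", "yuuki", "Updated Mathematics of blocks (markdown)"
pick f4edbe1 # "2025-04-16T10:44:40+09:00", "yuuki", "Updated Mathematics of blocks (markdown)"
pick 1109fe5 # "2025-04-16T11:29:20+09:00", "yuuki", "Updated Mathematics of blocks (markdown)"
EOF
```

Using fixup rather than squash discards the later commit messages, which is why the first commit's subject survives and only the date needs to be reset afterwards.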

Sources: 02-squash.pl:1-86, 02-git-rebase-todo.txt:72-91

Stage 3: History Cleanup

The final stage uses git-filter-repo to remove empty commits and standardize metadata.

Operations performed:

commit.committer_name  = commit.author_name
commit.committer_email = commit.author_email
commit.committer_date  = commit.author_date

This ensures that the committer metadata matches the author metadata for all commits, fixing any discrepancies introduced during rebasing.

The --prune-empty=always flag removes commits that become empty after squashing (e.g., commits that were pure reverts).
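As a rough sketch, the three assignments and the pruning flag could be passed to git-filter-repo in one call. The wrapper name `cleanup_history` is hypothetical and the exact invocation in `00.sh` may differ; `--commit-callback` and `--prune-empty` are standard git-filter-repo options.

```shell
#!/bin/sh
# Sketch only: cleanup_history is a hypothetical wrapper, not taken from 00.sh.
# Run it inside the work repository; it requires git-filter-repo to be installed.
cleanup_history() {
  # The callback body is Python; git-filter-repo runs it once per commit.
  git filter-repo --prune-empty=always --commit-callback '
commit.committer_name  = commit.author_name
commit.committer_email = commit.author_email
commit.committer_date  = commit.author_date
'
}
```

Because git-filter-repo rewrites every commit, all hashes change after this step, so backups taken beforehand are the only route back to the old history.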

Sources: 00.sh:101-105

Repository Structure

File inventory:

| File | Type | Purpose | Git Status |
|---|---|---|---|
| `00.sh` | Bash script | Master orchestrator that executes entire pipeline | Tracked |
| `01-revert-to-updated.pl` | Perl script | Stage 1: Message rewriting | Tracked |
| `02-squash.pl` | Perl script | Stage 2: Commit squashing | Tracked |
| `README.md` | Markdown | Project identifier with DeepWiki badge | Tracked |
| `LICENSE` | Text | MIT License | Tracked |
| `.gitignore` | Config | Excludes production-repo.git/ and work-repo/ | Tracked |
| production-repo.git/ | Directory | Bare mirror of production repository | Ignored |
| work-repo/ | Directory | Working repository where rebases occur | Ignored |
| 01-git-log.txt | Log file | Initial commit history before Stage 1 | Ignored |
| 01-git-rebase-todo.txt | Instruction file | Rebase instructions for Stage 1 | Ignored |
| 02-git-log.txt | Log file | Commit history after Stage 1 | Ignored |
| 02-git-rebase-todo.txt | Instruction file | Rebase instructions for Stage 2 | Ignored |
| 03-git-log.txt | Log file | Commit history after Stage 2 | Ignored |
| 04-git-log.txt | Log file | Final commit history after Stage 3 | Ignored |

Sources: 00.sh:32-37, .gitignore:1-2, README.md:1-4

Git Configuration and Remotes

Diagram: Repository Relationships

Remote configuration:

PRODUCTION_REMOTE='https://github.com/wikinder/wikinder.wiki.git'
WORK_REMOTE="git@github.com:wikinderbear/wikinder-rebase-$TIMESTAMP.git"

The production remote uses HTTPS for anonymous read access, while the work remote uses SSH for authenticated write access. The work remote URL embeds a UTC date stamp (TIMESTAMP=$(date -u +'%Y-%m-%d')), so each day's run targets its own staging repository; two runs on the same UTC date share one.

Backup strategy:

Before any rebasing begins, the system creates:

  1. A backup branch backup-before-rebase from master (00.sh:52)
  2. An annotated tag backup-before-rebase-YYYY-MM-DD pointing to the backup branch (00.sh:55-56)
  3. Both are pushed to the work remote before any history modification (00.sh:59)

This ensures complete rollback capability if issues are discovered after the rebase completes.
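The three backup steps map onto ordinary Git commands. The following is a minimal sketch: the function name `create_backup`, its parameters, and the tag message are assumptions, and the exact commands in `00.sh` may differ.

```shell
#!/bin/sh
# Sketch only: create_backup is a hypothetical helper, not taken from 00.sh.
# Usage: create_backup <remote-name> <YYYY-MM-DD>
create_backup() {
  remote=$1
  stamp=$2
  # 1. Branch that freezes the current tip of master.
  git branch backup-before-rebase master &&
  # 2. Annotated tag pointing at the same commit as the backup branch.
  git tag -a "backup-before-rebase-$stamp" -m "Backup before rebase ($stamp)" backup-before-rebase &&
  # 3. Publish both refs before any history is rewritten.
  git push "$remote" backup-before-rebase "backup-before-rebase-$stamp"
}
```

Chaining the steps with `&&` stops at the first failure, so a rebase is never started on top of a backup that was not pushed.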

Sources: 00.sh:24-59

Git Log Configuration

The system configures git log format to match the git rebase --interactive todo file format:

git config --local log.date iso-strict-local
git config --local format.pretty 'pick %h # "%ad", "%an", "%s"'

This produces output like:

pick fc7162f # "2025-01-01T21:09:52+09:00", "yuuki", "Initial Home page"
pick 350891d # "2025-01-16T17:54:18+09:00", "yuuki", "Created _Footer (markdown)"

The format includes:

  • %h: abbreviated commit hash
  • %ad: author date in ISO 8601 format with local timezone offset
  • %an: author name
  • %s: commit subject (first line of message)

This format is directly parseable by the Perl scripts and can be used as-is in git rebase -i todo files.
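To illustrate how easily this format splits into fields, here is a small sketch using awk with the double quote as field separator. The function name `parse_log_line` is hypothetical, and the approach assumes the subject contains no embedded double quotes.

```shell
#!/bin/sh
# Sketch only: parse_log_line is a hypothetical helper, not part of the repository.
parse_log_line() {
  # With '"' as separator: $1 = 'pick <hash> # ', $2 = date, $4 = author, $6 = subject.
  awk -F'"' '{
    split($1, head, " ")            # head[2] is the abbreviated hash
    printf "%s\t%s\t%s\t%s\n", head[2], $2, $4, $6
  }'
}

# Prints the hash, date, author, and subject separated by tabs.
printf '%s\n' 'pick fc7162f # "2025-01-01T21:09:52+09:00", "yuuki", "Initial Home page"' | parse_log_line
```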

Sources: 00.sh:62-66, 01-git-log.txt:1-10

Error Recovery

The system includes automated error recovery for the common case where squashing creates empty commits:

git -c sequence.editor="cp '$SECOND_REBASE_TODO'" \
  rebase -i --root --committer-date-is-author-date \
  || { until git commit --amend --allow-empty --no-edit && git rebase --continue; do :; done }

When git rebase encounters an empty commit and exits with an error, the shell loop automatically:

  1. Amends the commit with --allow-empty to accept the empty state
  2. Continues the rebase with git rebase --continue
  3. Repeats until the rebase completes successfully

This automation removes the need for manual intervention during Stage 2, which can involve hundreds of squash operations.
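The control flow of that `until` loop can be tried without a repository. In this sketch the two functions are hypothetical stand-ins for the Git commands, simulating a rebase that stops twice more before finishing.

```shell
#!/bin/sh
# Sketch only: accept_empty and continue_rebase are stand-ins, not real Git calls.
remaining=3                      # pretend three stops are still ahead

accept_empty() { :; }            # stands in for: git commit --amend --allow-empty --no-edit

continue_rebase() {              # stands in for: git rebase --continue
  remaining=$((remaining - 1))
  [ "$remaining" -eq 0 ]         # succeeds only when nothing is left to replay
}

attempts=0
# Same shape as the loop in the pipeline: retry until both commands succeed.
until accept_empty && continue_rebase; do
  attempts=$((attempts + 1))
done
echo "failed attempts before success: $attempts"   # prints 2
```

One property worth noting: a loop of this shape has no upper bound, so a failure that `--allow-empty` cannot resolve (a real merge conflict, for instance) would make it spin until interrupted.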

Sources: 00.sh:87-92

Chronological Verification

After all processing completes, the system verifies that all commits remain in chronological order:

git log --reverse --format='%at' | sort -nc

This command:

  1. Lists all commit timestamps oldest first (--reverse; %at = author timestamp in Unix epoch format)
  2. Pipes through sort -nc which checks if the input is numerically sorted
  3. Exits with error if any timestamp is out of order

This verification ensures that despite extensive history rewriting, the final timeline remains logically consistent.
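The behaviour of `sort -nc` is easy to confirm on its own. In this sketch the helper name `is_chronological` and the epoch values are illustrative only.

```shell
#!/bin/sh
# Sketch only: is_chronological is a hypothetical wrapper around the same check.
is_chronological() {
  # -n compares numerically; -c only checks order and exits non-zero on the
  # first out-of-order line. The diagnostic on stderr is silenced here.
  sort -nc 2>/dev/null
}

printf '%s\n' 1741571389 1741580454 1741580468 | is_chronological && echo "in order"
printf '%s\n' 1741580454 1741571389 | is_chronological || echo "out of order"
```

Equal adjacent timestamps pass the check, so two commits sharing the same second are not reported as a violation.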

Sources: 00.sh:111

Timezone Handling

The entire pipeline operates in JST (Japan Standard Time, UTC+9):

export TZ='Asia/Tokyo'

This ensures that day-based grouping in Stage 2 uses JST dates consistently. For example, commits at "2025-04-16T23:30:00+09:00" and "2025-04-17T00:30:00+09:00" are treated as different days despite being only one hour apart, because they occur on different calendar days in JST.

Because TZ is set to Asia/Tokyo, the iso-strict-local date format renders every timestamp in JST with an explicit +09:00 offset, so the log files remain unambiguous for auditing.
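A short sketch of the day comparison, assuming the grouping key is simply the date portion of the JST-rendered timestamp (the helper name `day_key` is hypothetical):

```shell
#!/bin/sh
# Sketch only: day_key is a hypothetical helper, not part of the repository.
day_key() {
  # The first ten characters of an ISO 8601 timestamp are YYYY-MM-DD.
  printf '%s\n' "$1" | cut -c1-10
}

day_key '2025-04-16T23:30:00+09:00'   # prints 2025-04-16
day_key '2025-04-17T00:30:00+09:00'   # prints 2025-04-17
```

The choice of timezone changes the grouping: rendered in UTC, the same two commits fall at 14:30 and 15:30 on 2025-04-16 and would share a day.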

Sources: 00.sh:40, 00.sh:62