Overview

This document provides a high-level introduction to the github-wiki-to-html system, explaining its purpose, architecture, and core concepts. For detailed setup instructions, see Getting Started. For in-depth component documentation, see Architecture and Core Ruby Components.

Purpose and Scope

The github-wiki-to-html system is a static site generator that converts GitHub Wiki content into a deployable HTML website. It reads Markdown files from a Git-backed wiki repository, processes them through a multi-stage pipeline, and outputs semantically structured HTML pages suitable for hosting on GitHub Pages or any static web server.

This overview covers:

  • The problem domain and solution approach
  • High-level system architecture with code-level mappings
  • Core processing workflow
  • Key files and their responsibilities
  • Output structure and deployment model

What is github-wiki-to-html?

github-wiki-to-html is a Ruby-based conversion system that transforms GitHub Wiki repositories into static HTML sites. Unlike dynamic wiki engines that require server-side rendering, this system performs a one-time conversion that generates plain HTML files, making the output fast to serve, easy to cache, and simple to deploy.

The system solves several key problems:

  • Preservation: Converts ephemeral wiki content into archival HTML format
  • Performance: Eliminates server-side rendering overhead by pre-generating all pages
  • Portability: Produces self-contained HTML that can be hosted anywhere
  • Customization: Applies custom templates, styling, and metadata to wiki content
  • SEO: Generates structured data, sitemaps, and semantic HTML for search engines

Sources: README.md:1-6

Core Concepts

Static Site Generation

The system operates on a static site generation model:

  1. Input: A Git repository containing Markdown wiki files (managed as the source-wiki submodule)
  2. Processing: Ruby scripts read, parse, and transform the content
  3. Output: HTML files written to an output directory (managed as the target-site submodule)
  4. Deployment: The output directory is pushed to a GitHub Pages repository

This approach decouples content (wiki repository) from presentation (generated site) and tooling (converter scripts).

Git-Based Workflow

The system uses Git submodules to manage both input and output:

| Submodule | Path | Purpose | Configuration |
|---|---|---|---|
| source-wiki | Defined by WIKI_REPO constant | Input wiki content from GitHub | .gitmodules:1-3 |
| target-site | Defined by OUTPUT_DIRECTORY constant | Output static site for GitHub Pages | .gitmodules:4-6 |

The Git history in source-wiki provides metadata (publication dates, authors, modification history) that enriches the generated HTML pages.
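
As a minimal sketch of how Git history could supply this metadata, the snippet below parses `git log` output into a hash with date, author, and updated keys. The method name, the log format string, and the sample data are assumptions made for illustration; this overview does not show the project's actual extraction code.

```ruby
require 'time'

# Hypothetical input format: one commit per line, "ISO-8601 date|author name",
# as produced by a command such as:
#   git -C source-wiki log --follow --format='%aI|%an' -- Page.md
# git log lists the newest commit first.
def metadata_from_log(log_output)
  commits = log_output.lines.map(&:strip).reject(&:empty?).map do |line|
    date, author = line.split('|', 2)
    { date: Time.iso8601(date), author: author }
  end
  return {} if commits.empty?

  newest = commits.first
  oldest = commits.last
  {
    date: oldest[:date],      # oldest commit: treated as the publication date
    author: oldest[:author],  # assumption: the original author is credited
    updated: newest[:date]    # newest commit: treated as the last modification
  }
end

sample = <<~LOG
  2024-03-02T10:15:00+00:00|Bob
  2024-01-05T08:00:00+00:00|Alice
LOG

meta = metadata_from_log(sample)
puts "published by #{meta[:author]} on #{meta[:date].year}-#{meta[:date].month}"
```

Keeping the parser separate from the `git` call makes it possible to exercise the logic with sample text, without a checked-out wiki.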

Sources: .gitmodules:1-7, constants.rb:10, constants.rb:26

Multi-Stage Processing Pipeline

Content undergoes four distinct processing stages:

  1. Extraction: github-wiki-to-html.rb reads wiki pages via gollum-lib
  2. Conversion: Markdown is parsed to HTML using commonmarker with GitHub Flavored Markdown extensions
  3. Rendering: HTML content is injected into template.html.liquid via liquid template engine
  4. Post-Processing: github-wiki-to-html.sh runs js-beautify to format the output HTML
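
The first three stages can be sketched as a chain of small methods. This is an illustration only: the real pipeline relies on gollum-lib, commonmarker, and liquid, which are replaced here by plain-Ruby stand-ins so the sketch runs without gems. All method names and the in-memory wiki hash are hypothetical, and the shell-based post-processing stage is left out.

```ruby
# 1. Extraction: a hash of title => Markdown stands in for Gollum::Wiki.
def extract(wiki)
  wiki.map { |title, markdown| { title: title, markdown: markdown } }
end

# 2. Conversion: toy converter that only wraps paragraphs in <p> tags
#    (a stand-in for commonmarker, not a Markdown parser).
def convert(page)
  paragraphs = page[:markdown].split(/\n{2,}/).map { |p| "<p>#{p.strip}</p>" }
  page.merge(html: paragraphs.join("\n"))
end

# 3. Rendering: fills {{ name }} placeholders, mimicking Liquid output tags.
def render(page, template)
  vars = { 'page_title' => page[:title], 'page_content' => page[:html] }
  template.gsub(/\{\{\s*(\w+)\s*\}\}/) { vars.fetch(Regexp.last_match(1), '') }
end

# Runs every page through the three stages in order.
def run_pipeline(wiki, template)
  extract(wiki).map { |page| render(convert(page), template) }
end

template = "<h1>{{ page_title }}</h1>\n{{ page_content }}"
wiki = { 'Home' => "Welcome.\n\nSecond paragraph." }
puts run_pipeline(wiki, template)
```

Because each stage only consumes the previous stage's result, any single stand-in could be swapped for the real library call without touching the others.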

For detailed pipeline documentation, see Content Processing Pipeline.

System Architecture

[Diagram: the system's conceptual architecture mapped to code entities and file paths]

Key Architecture Principles:

  1. Separation of Concerns: Configuration (constants.rb), logic (github-wiki-to-html.rb), utilities (methods.rb), and presentation (template.html.liquid) are cleanly separated
  2. Declarative Configuration: All site-specific values are defined in constants.rb as Ruby constants
  3. Template-Based Rendering: The liquid template engine provides flexibility in page structure without modifying core logic
  4. Modular Dependencies: Each external gem has a specific responsibility (Git/wiki, Markdown, templating, HTML processing)
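
To illustrate the declarative configuration principle, here is a sketch of what a constants.rb-style file might look like. The constant names and the asset URLs, output directory, and site URL come from this overview; the remaining values are hypothetical placeholders, and the helper method is an assumption added for the example.

```ruby
# Sketch of a constants.rb-style configuration file.
# Values marked "placeholder" are hypothetical, not the project's settings.

WIKI_REPO = 'source-wiki'          # assumption: path of the input submodule
OUTPUT_DIRECTORY = 'target-site'   # named in this overview
SITE_URL = 'https://wikinder.org'  # named in this overview
SITE_NAME = 'Example Wiki'         # placeholder
HOME_HEADING = 'All Pages'         # placeholder

# Asset URLs are relative to the site root.
STYLESHEET_URL = '/assets/css/style.css'
MATHJAX_CONFIG_SCRIPT_URL = '/assets/js/mathjax-config.js'
PUBLISHER_LOGO_URL = '/assets/images/icon.jpg'

# Hypothetical helper: other code derives values from the constants
# instead of hard-coding site-specific strings.
def absolute_url(path)
  "#{SITE_URL}#{path}"
end

puts absolute_url(STYLESHEET_URL) # prints https://wikinder.org/assets/css/style.css
```

With this layout, moving the site to another domain or output directory means editing one file rather than searching the conversion logic.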

Sources: constants.rb:1-69, .gitmodules:1-7

Data Flow Through Code Entities

[Diagram: how data transforms as it flows through the system's code constructs]

Data Transformation Details:

| Stage | Input | Processing | Output |
|---|---|---|---|
| Wiki Loading | Markdown files in WIKI_REPO | Gollum::Wiki initialization | Gollum::Page objects |
| Markdown Parsing | page.formatted_data | commonmarker with GFM extensions | Raw HTML string |
| HTML Sanitization | Raw HTML | nokogiri + loofah sanitization | Clean HTML string |
| Metadata Extraction | Git commit history | Git log parsing | Hash with date, author, updated |
| Template Rendering | Clean HTML + metadata | liquid.render() with template.html.liquid | Final HTML page |
| Post-Processing | HTML files | js-beautify via shell script | Formatted HTML |

Sources: constants.rb:10-26

Key Components and Files

The system consists of the following primary files:

| File | Type | Purpose | Key Constructs |
|---|---|---|---|
| github-wiki-to-html.rb | Ruby script | Main orchestrator that iterates wiki pages and generates HTML | Main execution loop, Gollum::Wiki initialization |
| constants.rb | Ruby module | Centralized configuration constants | WIKI_REPO, OUTPUT_DIRECTORY, SITE_URL, SITE_NAME, etc. |
| methods.rb | Ruby module | Utility functions for HTML processing and file generation | postprocess_html(), generate_html_file(), generate_sitemap_file() |
| gollum-config.rb | Ruby config | Configures Gollum wiki behavior and Markdown rendering | Gollum::Wiki.default_options, filter_chain |
| template.html.liquid | Liquid template | HTML page structure with placeholders for dynamic content | Liquid variables like {{ page_title }}, {{ page_content }} |
| github-wiki-to-html.sh | Shell script | Wrapper that executes Ruby script and runs beautification | Calls ruby github-wiki-to-html.rb, then html-beautify |
| .jsbeautifyrc | JSON config | Formatting rules for HTML output | Indentation, line breaks, whitespace handling |
| Gemfile | Bundler config | Ruby dependency specification | gollum-lib, commonmarker, liquid, nokogiri, etc. |
| package.json | npm config | Node.js dependency specification | js-beautify |
| Dockerfile | Docker config | Container environment definition | Ruby 4.0.1 base image, system dependencies |

For detailed documentation of each component, see Architecture and Core Ruby Components.

Sources: constants.rb:1-69, README.md:1-6

Output Structure

The system generates a complete static website in the directory named by the OUTPUT_DIRECTORY constant (target-site):

target-site/
├── index.html                  # Home page with list of all wiki pages
├── <page-slug>.html            # Individual page (one per wiki page)
├── sitemap.xml                 # XML sitemap for search engines
├── assets/
│   ├── css/
│   │   └── style.css          # Stylesheet (not generated, must exist)
│   ├── js/
│   │   └── mathjax-config.js  # MathJax configuration (not generated)
│   └── images/
│       └── icon.jpg           # Site logo (not generated)

Generated Files:

  • index.html: Home page that lists all wiki pages with their titles and publication dates, under the heading defined by the HOME_HEADING constant
  • <slug>.html: Individual article pages, where <slug> is the URL-friendly version of the wiki page title
  • sitemap.xml: SEO sitemap containing URLs for all pages
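
The slug and sitemap steps can be sketched as follows. The slug rule shown here (lowercase ASCII, hyphens between words) is an assumption, as are the method names; the project's real rule may treat punctuation or non-ASCII titles differently. The sitemap layout follows the public sitemaps.org 0.9 schema.

```ruby
# Hypothetical slug rule: lowercase, with each run of characters other than
# a-z and 0-9 collapsed into a single hyphen, and edge hyphens trimmed.
def slugify(title)
  title.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
end

# Escapes the five XML special characters so URLs are safe inside <loc>.
def xml_escape(text)
  escapes = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
              '"' => '&quot;', "'" => '&apos;' }
  text.gsub(/[&<>"']/, escapes)
end

# Builds a minimal sitemap with one <url> entry per page slug.
def sitemap_xml(site_url, slugs)
  urls = slugs.map do |slug|
    "  <url><loc>#{xml_escape("#{site_url}/#{slug}.html")}</loc></url>"
  end
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    *urls,
    '</urlset>'
  ].join("\n") + "\n"
end

slugs = ['Getting Started!', 'Ruby & Rails: Tips'].map { |t| slugify(t) }
puts sitemap_xml('https://wikinder.org', slugs)
```

Deriving both the file name and the sitemap URL from the same slug keeps the two in agreement, which is the property a sitemap needs.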

Static Assets:

The system references but does not generate static assets:

  • STYLESHEET_URL constant points to /assets/css/style.css
  • MATHJAX_CONFIG_SCRIPT_URL constant points to /assets/js/mathjax-config.js
  • PUBLISHER_LOGO_URL constant points to /assets/images/icon.jpg

These files must be manually placed in the OUTPUT_DIRECTORY structure.
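
Since a missing asset would only show up as a broken page after deployment, a pre-flight check can be useful. The sketch below is a hypothetical helper, not part of the project: it takes the output directory and the three asset URLs named above and reports which files are absent. The temporary directory exists only to make the example runnable.

```ruby
require 'tmpdir'
require 'fileutils'

# Returns the asset URLs whose files are missing from output_dir.
# URLs are site-root-relative, so the leading "/" is dropped before joining.
def missing_assets(output_dir, asset_urls)
  asset_urls.reject do |url|
    File.file?(File.join(output_dir, url.delete_prefix('/')))
  end
end

required = [
  '/assets/css/style.css',
  '/assets/js/mathjax-config.js',
  '/assets/images/icon.jpg'
]

# Demonstration: only the stylesheet is present in a throwaway directory.
Dir.mktmpdir do |dir|
  css = File.join(dir, 'assets/css/style.css')
  FileUtils.mkdir_p(File.dirname(css))
  File.write(css, '/* placeholder */')
  puts missing_assets(dir, required)
end
```

In practice the first argument would be the value of OUTPUT_DIRECTORY, and a non-empty result would be a reason to stop before pushing.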

Sources: constants.rb:26-66

Deployment Model

The system uses a Git-submodule-based deployment strategy:

  1. Source Management: source-wiki submodule tracks the GitHub Wiki repository at WIKI_URL
  2. Conversion: Ruby scripts generate static HTML into OUTPUT_DIRECTORY
  3. Version Control: target-site submodule is itself a Git repository (wikinder.github.io)
  4. Deployment: Pushing target-site to GitHub automatically deploys to GitHub Pages
  5. Custom Domain: SITE_URL constant defines the custom domain (https://wikinder.org)

This architecture separates the conversion tooling (parent repository) from both the source content and the generated output, enabling independent versioning and deployment of each component.
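
The publishing step might be scripted roughly as shown below. This is a sketch under stated assumptions: target-site is a checked-out submodule on a branch with a configured remote, the commit message is a placeholder, and the method names are hypothetical. It defaults to a dry run that only prints the commands.

```ruby
# Builds the git commands that publish the generated site.
# "git -C <dir>" runs the command as if started inside <dir>.
def deploy_commands(output_dir, message)
  [
    ['git', '-C', output_dir, 'add', '--all'],
    ['git', '-C', output_dir, 'commit', '-m', message],
    ['git', '-C', output_dir, 'push']
  ]
end

# Prints each command; executes it only when dry_run is false.
def deploy(output_dir, message, dry_run: true)
  deploy_commands(output_dir, message).each do |cmd|
    puts cmd.join(' ')
    next if dry_run

    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end
end

deploy('target-site', 'Regenerate site')
```

Note that `git commit` exits with a non-zero status when nothing has changed, so a real run would stop at that step after a conversion that produced identical output. The parent repository would also need its own commit to record the new submodule revision.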

For detailed deployment procedures, see Deployment Strategy.

Sources: .gitmodules:1-7, constants.rb:13, constants.rb:37

Next Steps

To begin using the system:

  1. Review Prerequisites for system requirements
  2. Follow Installation to set up the environment
  3. Run your first conversion using Running a Conversion

For understanding the internals:

  1. Study Architecture for detailed component relationships
  2. Read Core Ruby Components for code-level documentation
  3. Explore Dependencies to understand the technology stack

Sources: README.md:1-6, constants.rb:1-69, .gitmodules:1-7