Commit 0054550

committed Apr 22, 2026
Update imagegen system skill for gpt-image-2
1 parent ef071cf commit 0054550

8 files changed, with 331 additions and 47 deletions


‎codex-rs/skills/src/assets/samples/imagegen/SKILL.md‎

Lines changed: 42 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -12,17 +12,19 @@ Generates or edits images for the current project (for example website assets, g
1212
This skill has exactly two top-level modes:
1313

1414
- **Default built-in tool mode (preferred):** built-in `image_gen` tool for normal image generation and editing. Does not require `OPENAI_API_KEY`.
15-
- **Fallback CLI mode (explicit-only):** `scripts/image_gen.py` CLI. Use only when the user explicitly asks for the CLI path. Requires `OPENAI_API_KEY`.
15+
- **Fallback CLI mode:** `scripts/image_gen.py` CLI. Use when the user explicitly asks for the CLI/API/model path, or when they explicitly require a transparent background that needs `gpt-image-1.5`. Requires `OPENAI_API_KEY`.
1616

17-
Within the explicit CLI fallback only, the CLI exposes three subcommands:
17+
Within CLI fallback, the CLI exposes three subcommands:
1818

1919
- `generate`
2020
- `edit`
2121
- `generate-batch`
2222

2323
Rules:
24-
- Use the built-in `image_gen` tool by default for all normal image generation and editing requests.
25-
- Never switch to CLI fallback automatically.
24+
- Use the built-in `image_gen` tool by default for normal image generation and editing requests.
25+
- Do not switch to CLI fallback for ordinary quality, size, or file-path control.
26+
- If the user explicitly asks for a transparent image/background, use CLI fallback with `gpt-image-1.5` and explain briefly that transparent backgrounds are not supported in `gpt-image-2`, the latest model.
27+
- The word `batch` by itself does not mean CLI fallback. If the user asks for many assets or says to batch-generate assets without explicitly asking for CLI/API/model controls, stay on the built-in path and issue one built-in call per requested asset or variant.
2628
- If the built-in tool fails or is unavailable, tell the user the CLI fallback exists and that it requires `OPENAI_API_KEY`. Proceed only if the user explicitly asks for that fallback.
2729
- If the user explicitly asks for CLI mode, use the bundled `scripts/image_gen.py` workflow. Do not create one-off SDK runners.
2830
- Never modify `scripts/image_gen.py`. If something is missing, ask the user before doing anything else.
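The routing rules above can be read as a small decision function. The following is a minimal sketch for illustration only and is not part of the skill: `Route`, `choose_mode`, and the two boolean parameters are hypothetical names, and only the mode split and the model identifiers come from the rules themselves.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    mode: str             # "builtin" or "cli"
    model: Optional[str]  # only meaningful on the CLI path


def choose_mode(explicit_cli: bool, wants_transparent: bool) -> Route:
    """Hypothetical helper mirroring the rules above.

    explicit_cli: the user explicitly asked for the CLI/API/model path.
    wants_transparent: the user explicitly asked for a transparent background.
    """
    if wants_transparent:
        # Transparent backgrounds are not supported in gpt-image-2,
        # so this request goes through the CLI with gpt-image-1.5.
        return Route("cli", "gpt-image-1.5")
    if explicit_cli:
        # The fallback CLI defaults to gpt-image-2.
        return Route("cli", "gpt-image-2")
    # Everything else, including requests that merely say "batch",
    # stays on the built-in image_gen tool.
    return Route("builtin", None)
```

Note that a failed or unavailable built-in tool is deliberately not an input here: per the rules, that case only leads to the CLI once the user explicitly asks for it.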
@@ -79,12 +81,13 @@ Built-in edit semantics:
7981

8082
Execution strategy:
8183
- In the built-in default path, produce many assets or variants by issuing one `image_gen` call per requested asset or variant.
82-
- In the explicit CLI fallback path, use the CLI `generate-batch` subcommand only when the user explicitly chose CLI mode and needs many prompts/assets.
84+
- In the CLI fallback path, use the CLI `generate-batch` subcommand only when the user explicitly chose CLI mode and needs many prompts/assets.
85+
- For many distinct assets, do not use `n` as a substitute for separate prompts. `n` is for variants of one prompt; distinct assets need distinct built-in calls or distinct CLI `generate-batch` jobs.
8386

8487
Assume the user wants a new image unless they clearly ask to change an existing one.
8588

8689
## Workflow
87-
1. Decide the top-level mode: built-in by default, fallback CLI only if explicitly requested.
90+
1. Decide the top-level mode: built-in by default; fallback CLI if explicitly requested or if the user explicitly needs transparent output.
8891
2. Decide the intent: `generate` or `edit`.
8992
3. Decide whether the output is preview-only or meant to be consumed by the current project.
9093
4. Decide the execution strategy: single asset vs repeated built-in calls vs CLI `generate-batch`.
@@ -99,13 +102,13 @@ Assume the user wants a new image unless they clearly ask to change an existing
99102
- If the user's prompt is already specific and detailed, normalize it into a clear spec without adding creative requirements.
100103
- If the user's prompt is generic, add tasteful augmentation only when it materially improves output quality.
101104
10. Use the built-in `image_gen` tool by default.
102-
11. If the user explicitly chooses the CLI fallback, then and only then use the fallback-only docs for quality, `input_fidelity`, masks, output format, output paths, and network setup.
105+
11. If the user explicitly chooses the CLI fallback, or explicitly asks for transparent output, then use the fallback-only docs for model, quality, size, `input_fidelity`, masks, output format, output paths, and network setup.
103106
12. Inspect outputs and validate: subject, style, composition, text accuracy, and invariants/avoid items.
104107
13. Iterate with a single targeted change, then re-check.
105108
14. For preview-only work, render the image inline; the underlying file may remain at the default `$CODEX_HOME/generated_images/...` path.
106109
15. For project-bound work, move or copy the selected artifact into the workspace and update any consuming code or references. Never leave a project-referenced asset only at the default `$CODEX_HOME/generated_images/...` path.
107-
16. For batches, persist only the selected finals in the workspace unless the user explicitly asked to keep discarded variants.
108-
17. Always report the final saved path for any workspace-bound asset, plus the final prompt and whether the built-in tool or fallback CLI mode was used.
110+
16. For batches or multi-asset requests, persist every requested deliverable final in the workspace unless the user explicitly asked to keep outputs preview-only. Discarded variants do not need to be kept unless requested.
111+
17. Always report the final saved path(s) for any workspace-bound asset(s), plus the final prompt or prompt set and whether the built-in tool or fallback CLI mode was used.
109112

110113
## Prompt augmentation
111114

@@ -140,6 +143,9 @@ Generate:
140143
- product-mockup — product/packaging shots, catalog imagery, merch concepts.
141144
- ui-mockup — app/web interface mockups and wireframes; specify the desired fidelity.
142145
- infographic-diagram — diagrams/infographics with structured layout and text.
146+
- scientific-educational — classroom explainers, scientific diagrams, and learning visuals with required labels and accuracy constraints.
147+
- ads-marketing — campaign concepts and ad creatives with audience, brand position, scene, and exact tagline/copy.
148+
- productivity-visual — slide, chart, workflow, and data-heavy business visuals.
143149
- logo-brand — logo/mark exploration, vector-friendly.
144150
- illustration-story — comics, children’s book art, narrative scenes.
145151
- stylized-concept — style-driven concept art, 3D/stylized renders.
@@ -179,7 +185,7 @@ Avoid: <negative constraints>
179185
Notes:
180186
- `Asset type` and `Input images` are prompt scaffolding, not dedicated CLI flags.
181187
- `Scene/backdrop` refers to the visual setting. It is not the same as the fallback CLI `background` parameter, which controls output transparency behavior.
182-
- Fallback-only execution notes such as `Quality:`, `Input fidelity:`, masks, output format, and output paths belong in the explicit CLI path only. Do not treat them as built-in `image_gen` tool arguments.
188+
- Fallback-only execution notes such as `Quality:`, `Input fidelity:`, masks, output format, and output paths belong in the CLI path only. Do not treat them as built-in `image_gen` tool arguments.
183189

184190
Augmentation rules:
185191
- Keep it short.
@@ -220,18 +226,41 @@ Constraints: change only the background; keep the product and its edges unchange
220226
- Iterate with single-change follow-ups.
221227
- If the prompt is generic, add only the extra detail that will materially help.
222228
- If the prompt is already detailed, normalize it instead of expanding it.
223-
- For explicit CLI fallback only, see `references/cli.md` and `references/image-api.md` for `quality`, `input_fidelity`, masks, output format, and output-path guidance.
229+
- For CLI fallback only, see `references/cli.md` and `references/image-api.md` for model, `quality`, `input_fidelity`, masks, output format, and output-path guidance.
224230

225231
More principles shared by both modes: `references/prompting.md`.
226232
Copy/paste specs shared by both modes: `references/sample-prompts.md`.
227233

228234
## Guidance by asset type
229235
Asset-type templates (website assets, game assets, wireframes, logo) are consolidated in `references/sample-prompts.md`.
230236

237+
## gpt-image-2 guidance for CLI fallback
238+
239+
The fallback CLI defaults to `gpt-image-2`.
240+
241+
- Use `gpt-image-2` for new CLI/API workflows unless the request needs transparent output.
242+
- If the user explicitly asks for transparent output, use `gpt-image-1.5` and explain that transparent backgrounds are not supported in `gpt-image-2`, the latest model.
243+
- `gpt-image-2` always uses high fidelity for image inputs; do not set `input_fidelity` with this model.
244+
- `gpt-image-2` supports `quality` values `low`, `medium`, `high`, and `auto`.
245+
- Use `quality low` for fast drafts, thumbnails, and quick iterations. Use `medium`, `high`, or `auto` for final assets, dense text, diagrams, identity-sensitive edits, or high-resolution outputs.
246+
- Square images are typically fastest to generate. Use `1024x1024` for fast square drafts.
247+
- If the user asks for 4K-style output, use `3824x2160` for landscape or `2160x3824` for portrait. Do not use `3840x2160`, because the maximum edge length must be less than `3840px`.
248+
- `gpt-image-2` size may be `auto` or `WIDTHxHEIGHT` if all constraints hold: max edge `< 3840px`, both edges multiples of `16px`, long-to-short ratio `<= 3:1`, total pixels between `655,360` and `8,294,400`.
249+
250+
Popular `gpt-image-2` sizes:
251+
- `1024x1024` square
252+
- `1536x1024` landscape
253+
- `1024x1536` portrait
254+
- `2048x2048` 2K square
255+
- `2048x1152` 2K landscape
256+
- `3824x2160` near-4K landscape
257+
- `2160x3824` near-4K portrait
258+
- `auto`
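The size constraints listed above are easy to check mechanically before a request is sent. The following is a minimal sketch under that reading of the constraints; `is_valid_gpt_image_2_size` is a hypothetical helper, not a function of `scripts/image_gen.py`.

```python
def is_valid_gpt_image_2_size(size: str) -> bool:
    """Check a size string against the gpt-image-2 constraints listed above."""
    if size == "auto":
        return True
    try:
        width_text, height_text = size.split("x")
        width, height = int(width_text), int(height_text)
    except ValueError:
        # Not of the form WIDTHxHEIGHT with integer edges.
        return False
    if width <= 0 or height <= 0:
        return False
    long_edge, short_edge = max(width, height), min(width, height)
    return (
        long_edge < 3840                      # max edge strictly below 3840px
        and width % 16 == 0                   # both edges multiples of 16px
        and height % 16 == 0
        and long_edge <= 3 * short_edge       # long-to-short ratio at most 3:1
        and 655_360 <= width * height <= 8_294_400
    )
```

Under these rules `3840x2160` fails only the edge-length check (its pixel count is exactly the allowed maximum), which is why `3824x2160` is the suggested 4K-style size.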
259+
231260
## Fallback CLI mode only
232261

233262
### Temp and output conventions
234-
These conventions apply only to the explicit CLI fallback. They do not describe built-in `image_gen` output behavior.
263+
These conventions apply only to the CLI fallback. They do not describe built-in `image_gen` output behavior.
235264
- Use `tmp/imagegen/` for intermediate files (for example JSONL batches); delete them when done.
236265
- Write final artifacts under `output/imagegen/`.
237266
- Use `--out` or `--out-dir` to control output paths; keep filenames stable and descriptive.
@@ -276,4 +305,4 @@ If installation is not possible in this environment, tell the user which depende
276305
- `references/cli.md`: fallback-only CLI usage via `scripts/image_gen.py`.
277306
- `references/image-api.md`: fallback-only API/CLI parameter reference.
278307
- `references/codex-network.md`: fallback-only network/sandbox troubleshooting for CLI mode.
279-
- `scripts/image_gen.py`: fallback-only CLI implementation. Do not load or use it unless the user explicitly chooses CLI mode.
308+
- `scripts/image_gen.py`: fallback-only CLI implementation. Do not load or use it unless the user explicitly chooses CLI mode or explicitly asks for transparent output.

‎codex-rs/skills/src/assets/samples/imagegen/agents/openai.yaml‎

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,4 +3,4 @@ interface:
33
short_description: "Generate or edit images for websites, games, and more"
44
icon_small: "./assets/imagegen-small.svg"
55
icon_large: "./assets/imagegen.png"
6-
default_prompt: "Generate or edit the visual assets for this task with the built-in `image_gen` tool by default. First confirm that the task actually calls for a raster image; if the project already has SVG/vector/code-native assets and the user wants to extend or match those, do not use this skill. If the task includes reference images, treat them as references unless the user clearly wants an existing image modified. For multi-asset requests, loop built-in calls rather than treating batch as a separate top-level mode. Only use the fallback CLI if the user explicitly asks for it, and keep CLI-only controls such as `generate-batch`, `quality`, `input_fidelity`, masks, and output paths on that fallback path."
6+
default_prompt: "Generate or edit the visual assets for this task with the built-in `image_gen` tool by default. First confirm that the task actually calls for a raster image; if the project already has SVG/vector/code-native assets and the user wants to extend or match those, do not use this skill. If the task includes reference images, treat them as references unless the user clearly wants an existing image modified. For multi-asset requests, loop built-in calls; the word `batch` alone is not CLI opt-in. Use the fallback CLI only if the user explicitly asks for CLI/API/model controls or explicitly needs transparent output; for transparent output use `gpt-image-1.5` and explain that transparent backgrounds are not supported in `gpt-image-2`, the latest model. Keep CLI-only controls such as `generate-batch`, `quality`, `input_fidelity`, masks, and output paths on that fallback path."

‎codex-rs/skills/src/assets/samples/imagegen/references/cli.md‎

Lines changed: 82 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,14 @@
11
# CLI reference (`scripts/image_gen.py`)
22

3-
This file is for the fallback CLI mode only. Read it only after the user explicitly asks to use `scripts/image_gen.py` instead of the built-in `image_gen` tool.
3+
This file is for the fallback CLI mode only. Read it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or when the user explicitly asks for transparent output that requires the `gpt-image-1.5` fallback path.
44

55
`generate-batch` is a CLI subcommand in this fallback path. It is not a top-level mode of the skill.
6+
The word `batch` in a user request is not CLI opt-in by itself.
67

78
## What this CLI does
89
- `generate`: generate a new image from a prompt
910
- `edit`: edit one or more existing images
10-
- `generate-batch`: run many generation jobs from a JSONL file
11+
- `generate-batch`: run many generation jobs from a JSONL file after the user explicitly chooses CLI/API/model controls
1112

1213
Real API calls require **network access** + `OPENAI_API_KEY`. `--dry-run` does not.
1314

@@ -16,7 +17,7 @@ Set a stable path to the skill CLI (default `CODEX_HOME` is `~/.codex`):
1617

1718
```
1819
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
19-
export IMAGE_GEN="$CODEX_HOME/skills/imagegen/scripts/image_gen.py"
20+
export IMAGE_GEN="$CODEX_HOME/skills/.system/imagegen/scripts/image_gen.py"
2021
```
2122

2223
Install dependencies into that environment with its package manager. In uv-managed environments, `uv pip install ...` remains the preferred path.
@@ -60,25 +61,96 @@ python "$IMAGE_GEN" edit \
6061
- **Never modify** `scripts/image_gen.py`. If something is missing, ask the user before doing anything else.
6162

6263
## Defaults
63-
- Model: `gpt-image-1.5`
64+
- Model: `gpt-image-2`
6465
- Supported model family for this CLI: GPT Image models (`gpt-image-*`)
65-
- Size: `1024x1024`
66-
- Quality: `auto`
66+
- Size: `auto`
67+
- Quality: `medium`
6768
- Output format: `png`
6869
- Default one-off output path: `output/imagegen/output.png`
6970
- Background: unspecified unless `--background` is set
7071

72+
## gpt-image-2 size and model guidance
73+
74+
`gpt-image-2` is the default model for new CLI fallback work.
75+
76+
- Use `--quality low` for fast drafts, thumbnails, and quick iterations.
77+
- Use `--quality medium`, `--quality high`, or `--quality auto` for final assets, dense text, diagrams, identity-sensitive edits, and high-resolution outputs.
78+
- Square images are typically fastest. Use `--size 1024x1024` for quick square drafts.
79+
- If the user asks for 4K-style output, use `--size 3824x2160` for landscape or `--size 2160x3824` for portrait.
80+
- Do not pass `--input-fidelity` with `gpt-image-2`; this model always uses high fidelity for image inputs.
81+
- Do not use `--background transparent` with `gpt-image-2`; use `gpt-image-1.5` for transparent output.
82+
83+
Popular `gpt-image-2` sizes:
84+
- `1024x1024`
85+
- `1536x1024`
86+
- `1024x1536`
87+
- `2048x2048`
88+
- `2048x1152`
89+
- `3824x2160`
90+
- `2160x3824`
91+
- `auto`
92+
93+
`gpt-image-2` size constraints:
94+
- max edge `< 3840px`
95+
- both edges multiples of `16px`
96+
- long edge to short edge ratio `<= 3:1`
97+
- total pixels between `655,360` and `8,294,400`
98+
99+
Fast draft:
100+
101+
```bash
102+
python "$IMAGE_GEN" generate \
103+
--prompt "A product thumbnail of a matte ceramic mug on a stone surface" \
104+
--quality low \
105+
--size 1024x1024 \
106+
--out output/imagegen/mug-draft.png
107+
```
108+
109+
Final 2K landscape:
110+
111+
```bash
112+
python "$IMAGE_GEN" generate \
113+
--prompt "A polished landing-page hero image of a matte ceramic mug on a stone surface" \
114+
--quality high \
115+
--size 2048x1152 \
116+
--out output/imagegen/mug-hero.png
117+
```
118+
119+
Near-4K landscape:
120+
121+
```bash
122+
python "$IMAGE_GEN" generate \
123+
--prompt "A detailed architectural visualization at golden hour" \
124+
--size 3824x2160 \
125+
--quality high \
126+
--out output/imagegen/architecture-near-4k.png
127+
```
128+
129+
Transparent background request:
130+
131+
```bash
132+
python "$IMAGE_GEN" generate \
133+
--model gpt-image-1.5 \
134+
--prompt "A clean product cutout on a transparent background" \
135+
--background transparent \
136+
--output-format png \
137+
--out output/imagegen/product-cutout.png
138+
```
139+
140+
When using this path, explain briefly that transparent backgrounds are not supported in `gpt-image-2`, the latest model, so `gpt-image-1.5` is required.
141+
71142
## Quality, input fidelity, and masks (CLI fallback only)
72143
These are explicit CLI controls. They are not built-in `image_gen` tool arguments.
73144

74145
- `--quality` works for `generate`, `edit`, and `generate-batch`: `low|medium|high|auto`
75-
- `--input-fidelity` is **edit-only** and validated as `low|high`
146+
- `--input-fidelity` is **edit-only** and validated as `low|high`; it is not supported for `gpt-image-2`
76147
- `--mask` is **edit-only**
77148

78149
Example:
79150

80151
```bash
81152
python "$IMAGE_GEN" edit \
153+
--model gpt-image-1.5 \
82154
--image input.png \
83155
--prompt "Change only the background" \
84156
--quality high \
@@ -147,10 +219,11 @@ Notes:
147219
- Per-job overrides are supported in JSONL (for example `size`, `quality`, `background`, `output_format`, `output_compression`, `moderation`, `n`, `model`, `out`, and prompt-augmentation fields).
148220
- `--n` generates multiple variants for a single prompt; `generate-batch` is for many different prompts.
149221
- In batch mode, per-job `out` is treated as a filename under `--out-dir`.
222+
- For many requested deliverable assets, provide one prompt/job per distinct asset and use semantic filenames when possible.
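The notes above imply a JSONL file with one job per distinct asset. The following is a minimal sketch of building such a file, with loud assumptions: the `prompt` key name, the example prompts, and the `assets.jsonl` filename are illustrative guesses rather than documented values, and `jobs_to_jsonl` is a hypothetical helper. The `out`, `size`, and `quality` keys are the per-job overrides named in the notes, and `tmp/imagegen/` is the intermediate-file location from the skill's conventions.

```python
import json
from pathlib import Path


def jobs_to_jsonl(jobs: list[dict]) -> str:
    """Serialize one job per line, rejecting duplicate output filenames."""
    seen = set()
    lines = []
    for job in jobs:
        out_name = job["out"]
        if out_name in seen:
            # Distinct assets need distinct filenames under --out-dir.
            raise ValueError(f"duplicate output filename: {out_name}")
        seen.add(out_name)
        lines.append(json.dumps(job))
    return "\n".join(lines) + "\n"


# One job per distinct asset, each with a semantic filename.
# The "prompt" key is an assumption about the job schema.
jobs = [
    {"prompt": "Flat settings gear icon, single color", "out": "icon-settings.png"},
    {"prompt": "Flat search magnifier icon, single color", "out": "icon-search.png"},
    {
        "prompt": "Wide hero banner of a mountain trail at dawn",
        "out": "hero-trail.png",
        "size": "1536x1024",
        "quality": "high",
    },
]

if __name__ == "__main__":
    # Intermediate batch files live under tmp/imagegen/ and are deleted when done.
    batch_path = Path("tmp/imagegen/assets.jsonl")
    batch_path.parent.mkdir(parents=True, exist_ok=True)
    batch_path.write_text(jobs_to_jsonl(jobs), encoding="utf-8")
```

Variants of a single prompt would instead use `n` on one job rather than repeated jobs with the same prompt.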
150223

151224
## CLI notes
152-
- Supported sizes: `1024x1024`, `1536x1024`, `1024x1536`, or `auto`.
153-
- Transparent backgrounds require `output_format` to be `png` or `webp`.
225+
- Supported sizes depend on the model. `gpt-image-2` supports flexible constrained sizes; older GPT Image models support `1024x1024`, `1536x1024`, `1024x1536`, or `auto`.
226+
- Transparent backgrounds require `output_format` to be `png` or `webp` and are not supported by `gpt-image-2`.
154227
- `--prompt-file`, `--output-compression`, `--moderation`, `--max-attempts`, `--fail-fast`, `--force`, and `--no-augment` are supported.
155228
- This CLI is intended for GPT Image models. Do not assume older non-GPT image-model behavior applies here.
156229

‎codex-rs/skills/src/assets/samples/imagegen/references/codex-network.md‎

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# Codex network approvals / sandbox notes
22

3-
This file is for the fallback CLI mode only. Read it only after the user explicitly asks to use `scripts/image_gen.py`.
3+
This file is for the fallback CLI mode only. Read it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or when the user explicitly asks for transparent output that requires the `gpt-image-1.5` fallback path.
44

55
This guidance is intentionally isolated from `SKILL.md` because it can vary by environment and may become stale. Prefer the defaults in your environment when in doubt.
66

‎codex-rs/skills/src/assets/samples/imagegen/references/image-api.md‎

Lines changed: 45 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,46 @@
11
# Image API quick reference
22

3-
This file is for the fallback CLI mode only. Use it only after the user explicitly asks to use `scripts/image_gen.py` instead of the built-in `image_gen` tool.
3+
This file is for the fallback CLI mode only. Use it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or when the user explicitly asks for transparent output that requires the `gpt-image-1.5` fallback path.
44

55
These parameters describe the Image API and bundled CLI fallback surface. Do not assume they are normal arguments on the built-in `image_gen` tool.
66

77
## Scope
8-
- This fallback CLI is intended for GPT Image models (`gpt-image-1.5`, `gpt-image-1`, and `gpt-image-1-mini`).
8+
- This fallback CLI is intended for GPT Image models (`gpt-image-2`, `gpt-image-1.5`, `gpt-image-1`, and `gpt-image-1-mini`).
99
- The built-in `image_gen` tool and the fallback CLI do not expose the same controls.
1010

11+
## Model summary
12+
13+
| Model | Quality | Input fidelity | Resolutions | Recommended use |
14+
| --- | --- | --- | --- | --- |
15+
| `gpt-image-2` | `low`, `medium`, `high`, `auto` | Always high fidelity for image inputs; do not set `input_fidelity` | `auto` or flexible sizes that satisfy the constraints below | Default for new CLI/API workflows: high-quality generation and editing, text-heavy images, photorealism, compositing, identity-sensitive edits, and workflows where fewer retries matter |
16+
| `gpt-image-1.5` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Transparent backgrounds and backward-compatible workflows |
17+
| `gpt-image-1` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Legacy compatibility |
18+
| `gpt-image-1-mini` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Cost-sensitive draft batches and lower-stakes previews |
19+
20+
## gpt-image-2 sizes
21+
22+
`gpt-image-2` accepts `auto` or any `WIDTHxHEIGHT` size that satisfies all constraints:
23+
24+
- Maximum edge length must be less than `3840px`.
25+
- Both edges must be multiples of `16px`.
26+
- Long edge to short edge ratio must not exceed `3:1`.
27+
- Total pixels must be at least `655,360` and no more than `8,294,400`.
28+
29+
Popular sizes:
30+
31+
| Label | Size | Notes |
32+
| --- | --- | --- |
33+
| Square | `1024x1024` | Typical fast default |
34+
| Landscape | `1536x1024` | Standard landscape |
35+
| Portrait | `1024x1536` | Standard portrait |
36+
| 2K square | `2048x2048` | Larger square output |
37+
| 2K landscape | `2048x1152` | Widescreen output |
38+
| Near-4K landscape | `3824x2160` | Use instead of `3840x2160` |
39+
| Near-4K portrait | `2160x3824` | Use instead of `2160x3840` |
40+
| Auto | `auto` | Default size |
41+
42+
Square images are typically fastest to generate. For 4K-style output, use `3824x2160` or `2160x3824`, not `3840x2160`, because the maximum edge length must be less than `3840px`.
43+
1144
## Endpoints
1245
- Generate: `POST /v1/images/generations` (`client.images.generate(...)`)
1346
- Edit: `POST /v1/images/edits` (`client.images.edit(...)`)
@@ -16,7 +49,7 @@ These parameters describe the Image API and bundled CLI fallback surface. Do not
1649
- `prompt`: text prompt
1750
- `model`: image model
1851
- `n`: number of images (1-10)
19-
- `size`: `1024x1024`, `1536x1024`, `1024x1536`, or `auto`
52+
- `size`: `auto` by default for `gpt-image-2`; flexible `WIDTHxHEIGHT` sizes are allowed only for `gpt-image-2`; older GPT Image models use `1024x1024`, `1536x1024`, `1024x1536`, or `auto`
2053
- `quality`: `low`, `medium`, `high`, or `auto`
2154
- `background`: output transparency behavior (`transparent`, `opaque`, or `auto`) for generated output; this is not the same thing as the prompt's visual scene/backdrop
2255
- `output_format`: `png` (default), `jpeg`, `webp`
@@ -26,12 +59,17 @@ These parameters describe the Image API and bundled CLI fallback surface. Do not
2659
## Edit-specific parameters
2760
- `image`: one or more input images. For GPT Image models, you can provide up to 16 images.
2861
- `mask`: optional mask image
29-
- `input_fidelity`: `low` (default) or `high`
62+
- `input_fidelity`: `low` or `high` only for models that support it; do not set this for `gpt-image-2`
3063

3164
Model-specific note for `input_fidelity`:
65+
- `gpt-image-2` always uses high fidelity for image inputs and does not support setting `input_fidelity`.
3266
- `gpt-image-1` and `gpt-image-1-mini` preserve all input images, but the first image gets richer textures and finer details.
3367
- `gpt-image-1.5` preserves the first 5 input images with higher fidelity.
3468

69+
## Transparent backgrounds
70+
71+
`gpt-image-2` does not currently support transparent backgrounds. If the user explicitly asks for a transparent image or transparent background, use `gpt-image-1.5` with `background=transparent` and a transparent-capable output format such as `png` or `webp`.
72+
3573
## Output
3674
- `data[]` list with `b64_json` per image
3775
- The bundled `scripts/image_gen.py` CLI decodes `b64_json` and writes output files for you.
@@ -41,8 +79,9 @@ Model-specific note for `input_fidelity`:
4179
- Use the edits endpoint when the user requests changes to an existing image.
4280
- Masking is prompt-guided; exact shapes are not guaranteed.
4381
- Large sizes and high quality increase latency and cost.
44-
- High `input_fidelity` can materially increase input token usage.
45-
- If a request fails because a specific option is unsupported by the selected GPT Image model, retry manually without that option.
82+
- Use `quality=low` for fast drafts, thumbnails, and quick iterations. Use `medium` or `high` for final assets, dense text, diagrams, identity-sensitive edits, or high-resolution outputs.
83+
- High `input_fidelity` can materially increase input token usage on models that support it.
84+
- If a request fails because a specific option is unsupported by the selected GPT Image model, retry manually without that option only when the option is not required by the user. If transparent output is required, switch to `gpt-image-1.5` instead of dropping `background=transparent`.
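The model-specific guidance above (switch models for transparency rather than dropping it, and never send `input_fidelity` to `gpt-image-2`) can be sketched as a pre-flight adjustment of request parameters. This is an illustrative sketch only: `resolve_request` is a hypothetical helper, it does not reflect how `scripts/image_gen.py` is implemented, and it does not adjust `size`, which may also need changing when moving to an older model.

```python
def resolve_request(params: dict) -> dict:
    """Return a copy of params adjusted to the model rules described above."""
    resolved = dict(params)
    resolved.setdefault("model", "gpt-image-2")

    if resolved.get("background") == "transparent":
        # Required transparency: keep the option and switch the model
        # instead of dropping background=transparent.
        resolved["model"] = "gpt-image-1.5"
        if resolved.get("output_format", "png") not in ("png", "webp"):
            raise ValueError("transparent output requires png or webp")

    if resolved["model"] == "gpt-image-2":
        # gpt-image-2 always uses high fidelity for image inputs,
        # so the parameter must not be sent at all.
        resolved.pop("input_fidelity", None)

    return resolved
```

For example, a request with `background="transparent"` and no explicit model resolves to `gpt-image-1.5`, while an edit request that sets `input_fidelity` on the default model has that key removed.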
4685

4786
## Important boundary
4887
- `quality`, `input_fidelity`, explicit masks, `background`, `output_format`, and related parameters are fallback-only execution controls.

‎codex-rs/skills/src/assets/samples/imagegen/references/prompting.md‎

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@ This file is about prompt structure, specificity, and iteration. Fallback-only e
2828
- If the user prompt is already specific and detailed, normalize it into a clean spec without adding creative requirements.
2929
- If the prompt is generic, you may add tasteful detail when it materially improves the output.
3030
- Treat examples in `sample-prompts.md` as fully-authored recipes, not as the default amount of augmentation to add to every request.
31+
- For photorealism, include `photorealistic` directly when that is the goal, plus concrete real-world texture such as pores, wrinkles, fabric wear, material grain, or imperfect everyday detail.
3132

3233
## Allowed and disallowed augmentation
3334

@@ -46,6 +47,7 @@ Do not add:
4647
- Specify framing and viewpoint (close-up, wide, top-down) and placement only when it materially helps.
4748
- Call out negative space if the asset clearly needs room for UI or copy.
4849
- Avoid making left/right layout decisions unless the user or surrounding layout supports them.
50+
- For people, describe body framing, scale, gaze, and object interactions when they matter (`full body visible`, `looking down at the book`, `hands naturally gripping the handlebars`).
4951

5052
## Constraints and invariants
5153
- State what must not change (`keep background unchanged`).
@@ -55,6 +57,7 @@ Do not add:
5557
- Put literal text in quotes or ALL CAPS and specify typography (font style, size, color, placement).
5658
- Spell uncommon words letter-by-letter if accuracy matters.
5759
- For in-image copy, require verbatim rendering and no extra characters.
60+
- In CLI fallback mode, use `medium` or `high` quality for small text, dense infographics, data-heavy slides, multi-font layouts, legends, axes, and footnotes.
5861

5962
## Input images and references
6063
- Do not assume that every provided image is an edit target.
@@ -71,15 +74,22 @@ Do not add:
7174
## Fallback-only execution controls
7275
- `quality`, `input_fidelity`, explicit masks, output format, and output paths are fallback-only execution controls.
7376
- Do not assume they are built-in `image_gen` tool arguments.
74-
- If the user explicitly chooses CLI fallback, see `references/cli.md` and `references/image-api.md` for those controls.
77+
- If the user explicitly chooses CLI fallback or explicitly asks for transparent output, see `references/cli.md` and `references/image-api.md` for those controls.
+- In CLI fallback mode, `gpt-image-2` is the default. It supports `quality=low|medium|high|auto`; use `low` for fast drafts and thumbnails, and move to `medium`, `high`, or `auto` for final assets.
+- `gpt-image-2` always uses high fidelity for image inputs, so do not set `input_fidelity` with that model.
+- If the user explicitly asks for transparent output, use `gpt-image-1.5` and explain that transparent backgrounds are not supported in `gpt-image-2`, the latest model.
+- If the user asks for 4K-style output with `gpt-image-2`, use `3824x2160` for landscape or `2160x3824` for portrait.

 ## Use-case tips
 Generate:
 - photorealistic-natural: Prompt as if a real photo is captured in the moment; use photography language (lens, lighting, framing); call for real texture; avoid over-stylized polish unless requested.
 - product-mockup: Describe the product/packaging and materials; ensure clean silhouette and label clarity; if in-image text is needed, require verbatim rendering and specify typography.
 - ui-mockup: Describe the target fidelity first (shippable mockup or low-fi wireframe), then focus on layout, hierarchy, and practical UI elements; avoid concept-art language.
-- infographic-diagram: Define the audience and layout flow; label parts explicitly; require verbatim text.
+- infographic-diagram: Define the audience and layout flow; label parts explicitly; require verbatim text; prefer higher quality in CLI mode for dense labels.
 - logo-brand: Keep it simple and scalable; ask for a strong silhouette and balanced negative space; avoid decorative flourishes unless requested.
+- ads-marketing: Write like a creative brief; include brand positioning, audience, desired vibe, scene, and exact tagline if text must appear.
+- productivity-visual: Name the exact artifact (slide, chart, workflow diagram), define the canvas and hierarchy, provide real labels/data, and ask for readable typography and polished spacing.
+- scientific-educational: Define audience, lesson objective, required labels, scientific constraints, arrows, and scan-friendly whitespace.
 - illustration-story: Define panels or scene beats; keep each action concrete.
 - stylized-concept: Specify style cues, material finish, and rendering approach (3D, painterly, clay) without inventing new story elements.
 - historical-scene: State the location/date and required period accuracy; constrain clothing, props, and environment to match the era.
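
The mode and model rules in this file can be read as a small decision procedure. The sketch below is only an illustration of that reading: `Plan` and `plan_request` are hypothetical names that do not appear in the commit, and the boolean flags stand in for whatever the user explicitly asked for.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plan:
    mode: str             # "builtin" or "cli"
    model: Optional[str]  # only set for CLI fallback
    size: Optional[str]   # only set when a 4K-style size is requested


def plan_request(
    wants_cli: bool = False,
    wants_transparent: bool = False,
    wants_4k: bool = False,
    portrait: bool = False,
) -> Plan:
    """Hypothetical helper mirroring the SKILL.md rules; not part of the commit."""
    # An explicit transparent-background request forces CLI fallback with gpt-image-1.5.
    if wants_transparent:
        return Plan(mode="cli", model="gpt-image-1.5", size=None)
    # Everything else stays on the built-in tool unless the CLI path is explicitly requested.
    if not wants_cli:
        return Plan(mode="builtin", model=None, size=None)
    size = None
    if wants_4k:
        # Near-4K sizes from the rules above.
        size = "2160x3824" if portrait else "3824x2160"
    return Plan(mode="cli", model="gpt-image-2", size=size)


print(plan_request())
print(plan_request(wants_transparent=True))
print(plan_request(wants_cli=True, wants_4k=True))
```

Note that a request mentioning `batch` alone maps to the default arguments here, so it stays on the built-in path.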

‎codex-rs/skills/src/assets/samples/imagegen/references/sample-prompts.md‎

Lines changed: 58 additions & 2 deletions
@@ -2,7 +2,7 @@

 These prompt recipes are shared across both top-level modes of the skill:
 - built-in `image_gen` tool (default)
-- explicit `scripts/image_gen.py` CLI fallback
+- `scripts/image_gen.py` CLI fallback for explicit CLI/API/model requests or explicit transparent-output requests

 Use these as starting points. They are intentionally complete prompt recipes, not the default amount of augmentation to add to every user request.

@@ -13,7 +13,14 @@

 The labeled lines are prompt scaffolding, not a closed schema. `Asset type` and `Input images` are prompt-only scaffolding; the CLI does not expose them as dedicated flags.

-Execution details such as explicit CLI flags, `quality`, `input_fidelity`, masks, output formats, and local output paths depend on mode. Use the built-in tool by default; only apply CLI-specific controls after the user explicitly opts into fallback mode.
+Execution details such as explicit CLI flags, `quality`, `input_fidelity`, masks, output formats, and local output paths depend on mode. Use the built-in tool by default; only apply CLI-specific controls when the user explicitly opts into fallback mode or explicitly asks for transparent output.
+
+CLI model notes:
+- `gpt-image-2` is the fallback CLI default for new workflows.
+- `gpt-image-2` supports `quality` values `low`, `medium`, `high`, and `auto`.
+- For 4K-style `gpt-image-2` output, use `3824x2160` or `2160x3824` instead of `3840x2160`.
+- If transparent output is explicitly required, use `gpt-image-1.5` and explain that transparent backgrounds are not supported in `gpt-image-2`, the latest model.
+- Do not set `input_fidelity` with `gpt-image-2`; image inputs already use high fidelity.

 For prompting principles (structure, specificity, invariants, iteration), see `references/prompting.md`.

@@ -68,6 +75,18 @@
 Constraints: clear labels, strong contrast, no logos or trademarks, no watermark
 ```

+### scientific-educational
+```
+Use case: scientific-educational
+Primary request: biology diagram titled "Cellular Respiration at a Glance" for high school students
+Scene/backdrop: clean white classroom handout background
+Subject: glucose turns into energy inside a cell; include glycolysis, Krebs cycle, and electron transport chain
+Style/medium: flat scientific diagram with consistent icons, arrows, and readable labels
+Composition/framing: landscape slide-style layout with clear hierarchy and generous whitespace
+Text (verbatim): "Cellular Respiration at a Glance", "Glucose", "Pyruvate", "ATP", "NADH", "FADH2", "CO2", "O2", "H2O"
+Constraints: scientifically plausible; avoid tiny text; no extra decoration; no watermark
+```
+
 ### logo-brand
 ```
 Use case: logo-brand
@@ -100,6 +119,30 @@
 Constraints: no logos or trademarks; no watermark
 ```

+### ads-marketing
+```
+Use case: ads-marketing
+Primary request: campaign image for a streetwear brand called Thread
+Subject: group of friends hanging out together in a stylish urban setting
+Style/medium: polished youth streetwear campaign photography
+Composition/framing: vertical ad layout with natural poses and integrated headline space
+Lighting/mood: contemporary, energetic, tasteful
+Text (verbatim): "Yours to Create."
+Constraints: render the tagline exactly once; clean legible typography; no extra text; no watermarks; no unrelated logos
+```
+
+### productivity-visual
+```
+Use case: productivity-visual
+Primary request: one pitch-deck slide titled "Market Opportunity"
+Asset type: fundraising slide image
+Style/medium: clean modern deck slide, white background, crisp sans-serif typography
+Subject: TAM/SAM/SOM concentric-circle diagram plus a small growth bar chart from 2021 to 2026
+Composition/framing: 16:9 landscape slide, clear data hierarchy, polished spacing
+Text (verbatim): "Market Opportunity", "TAM: $42B", "SAM: $8.7B", "SOM: $340M", "AGI Research, 2024", "Internal analysis"
+Constraints: readable labels, no clip art, no stock photography, no decorative clutter, no watermark
+```
+
 ### historical-scene
 ```
 Use case: historical-scene
@@ -351,6 +394,8 @@
 Constraints: crisp silhouette; no halos or fringing; preserve label text exactly; no restyling
 ```

+CLI note: if transparent output is explicitly required, use `gpt-image-1.5` because `gpt-image-2` does not currently support transparent backgrounds.
+
 ### style-transfer
 ```
 Use case: style-transfer
@@ -367,6 +412,17 @@
 Constraints: match lighting, perspective, and scale; keep the base framing unchanged; no extra elements
 ```

+### character consistency workflow
+```
+Use case: identity-preserve
+Input images: Image 1: previous character anchor illustration
+Primary request: continue the story with the same character in a new scene and action
+Scene/backdrop: snowy forest after a winter storm
+Subject: same young forest hero gently helping a frightened squirrel out of a fallen tree
+Style/medium: same children's book watercolor illustration style as Image 1
+Constraints: do not redesign the character; preserve facial features, proportions, outfit, color palette, and personality; no text; no watermark
+```
+
 ### sketch-to-render
 ```
 Use case: sketch-to-render
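
Since the recipes are described as labeled scaffolding rather than a closed schema, one way to picture them is as ordered label/value pairs joined into a single prompt string. The sketch below assumes exactly that; `render_prompt` is a hypothetical helper for illustration and is not something the skill or the CLI provides.

```python
from typing import Dict


def render_prompt(fields: Dict[str, str]) -> str:
    """Join labeled lines into one prompt, skipping labels with empty values."""
    return "\n".join(f"{label}: {value}" for label, value in fields.items() if value)


prompt = render_prompt({
    "Use case": "scientific-educational",
    "Primary request": 'biology diagram titled "Cellular Respiration at a Glance" for high school students',
    "Scene/backdrop": "",  # left empty on purpose; dropped from the output
    "Constraints": "scientifically plausible; avoid tiny text; no watermark",
})
print(prompt)
```

Dropping empty labels keeps the prompt short when a recipe line does not apply, which matches the note that the recipes are not the default amount of augmentation for every request.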

‎codex-rs/skills/src/assets/samples/imagegen/scripts/image_gen.py‎

Lines changed: 90 additions & 13 deletions
@@ -1,9 +1,10 @@
 #!/usr/bin/env python3
 """Fallback CLI for explicit image generation or editing with GPT Image models.

-Used only when the user explicitly opts into CLI fallback mode.
+Used only when the user explicitly opts into CLI fallback mode, or when explicit
+transparent output requires the `gpt-image-1.5` fallback path.

-Defaults to gpt-image-1.5 and a structured prompt augmentation workflow.
+Defaults to gpt-image-2 and a structured prompt augmentation workflow.
 """

 from __future__ import annotations
@@ -21,20 +22,28 @@

 from io import BytesIO

-DEFAULT_MODEL = "gpt-image-1.5"
-DEFAULT_SIZE = "1024x1024"
-DEFAULT_QUALITY = "auto"
+DEFAULT_MODEL = "gpt-image-2"
+DEFAULT_SIZE = "auto"
+DEFAULT_QUALITY = "medium"
 DEFAULT_OUTPUT_FORMAT = "png"
 DEFAULT_CONCURRENCY = 5
 DEFAULT_DOWNSCALE_SUFFIX = "-web"
 DEFAULT_OUTPUT_PATH = "output/imagegen/output.png"
 GPT_IMAGE_MODEL_PREFIX = "gpt-image-"

-ALLOWED_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
+ALLOWED_LEGACY_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
 ALLOWED_QUALITIES = {"low", "medium", "high", "auto"}
 ALLOWED_BACKGROUNDS = {"transparent", "opaque", "auto", None}
 ALLOWED_INPUT_FIDELITIES = {"low", "high", None}

+GPT_IMAGE_2_MODEL = "gpt-image-2"
+GPT_IMAGE_2_MIN_PIXELS = 655_360
+GPT_IMAGE_2_MAX_PIXELS = 8_294_400
+GPT_IMAGE_2_MAX_EDGE_EXCLUSIVE = 3840
+GPT_IMAGE_2_MAX_RATIO = 3.0
+GPT_IMAGE_2_NEAR_4K_LANDSCAPE = "3824x2160"
+GPT_IMAGE_2_NEAR_4K_PORTRAIT = "2160x3824"
+
 MAX_IMAGE_BYTES = 50 * 1024 * 1024
 MAX_BATCH_JOBS = 500

@@ -104,10 +113,52 @@ def _normalize_output_format(fmt: Optional[str]) -> str:
     return "jpeg" if fmt == "jpg" else fmt


-def _validate_size(size: str) -> None:
-    if size not in ALLOWED_SIZES:
+def _parse_size(size: str) -> Optional[Tuple[int, int]]:
+    match = re.fullmatch(r"([1-9][0-9]*)x([1-9][0-9]*)", size)
+    if not match:
+        return None
+    return int(match.group(1)), int(match.group(2))
+
+
+def _validate_gpt_image_2_size(size: str) -> None:
+    if size == "auto":
+        return
+
+    parsed = _parse_size(size)
+    if parsed is None:
+        _die("size must be auto or WIDTHxHEIGHT, for example 1024x1024.")
+
+    width, height = parsed
+    max_edge = max(width, height)
+    min_edge = min(width, height)
+    total_pixels = width * height
+
+    if max_edge >= GPT_IMAGE_2_MAX_EDGE_EXCLUSIVE:
+        hint = GPT_IMAGE_2_NEAR_4K_LANDSCAPE
+        if height > width:
+            hint = GPT_IMAGE_2_NEAR_4K_PORTRAIT
+        _die(
+            "gpt-image-2 size maximum edge length must be less than 3840px. "
+            f"For 4K-style output, use {hint} instead of {size}."
+        )
+    if width % 16 != 0 or height % 16 != 0:
+        _die("gpt-image-2 size width and height must be multiples of 16px.")
+    if max_edge / min_edge > GPT_IMAGE_2_MAX_RATIO:
+        _die("gpt-image-2 size long edge to short edge ratio must not exceed 3:1.")
+    if total_pixels < GPT_IMAGE_2_MIN_PIXELS or total_pixels > GPT_IMAGE_2_MAX_PIXELS:
+        _die(
+            "gpt-image-2 size total pixels must be at least 655,360 and no more than 8,294,400."
+        )
+
+
+def _validate_size(size: str, model: str) -> None:
+    if model == GPT_IMAGE_2_MODEL:
+        _validate_gpt_image_2_size(size)
+        return
+
+    if size not in ALLOWED_LEGACY_SIZES:
         _die(
-            "size must be one of 1024x1024, 1536x1024, 1024x1536, or auto for GPT image models."
+            "size must be one of 1024x1024, 1536x1024, 1024x1536, or auto for this GPT Image model."
         )


@@ -138,17 +189,38 @@ def _validate_transparency(background: Optional[str], output_format: str) -> Non
         _die("transparent background requires output-format png or webp.")


+def _validate_model_specific_options(
+    *,
+    model: str,
+    background: Optional[str],
+    input_fidelity: Optional[str] = None,
+) -> None:
+    if model != GPT_IMAGE_2_MODEL:
+        return
+    if background == "transparent":
+        _die(
+            "transparent backgrounds are not supported in gpt-image-2, the latest model. "
+            "Use --model gpt-image-1.5 --background transparent --output-format png instead."
+        )
+    if input_fidelity is not None:
+        _die(
+            "input_fidelity is not supported in gpt-image-2 because image inputs always use high fidelity for this model."
+        )
+
+
 def _validate_generate_payload(payload: Dict[str, Any]) -> None:
-    _validate_model(str(payload.get("model", DEFAULT_MODEL)))
+    model = str(payload.get("model", DEFAULT_MODEL))
+    _validate_model(model)
     n = int(payload.get("n", 1))
     if n < 1 or n > 10:
         _die("n must be between 1 and 10")
     size = str(payload.get("size", DEFAULT_SIZE))
     quality = str(payload.get("quality", DEFAULT_QUALITY))
     background = payload.get("background")
-    _validate_size(size)
+    _validate_size(size, model)
     _validate_quality(quality)
     _validate_background(background)
+    _validate_model_specific_options(model=model, background=background)
     oc = payload.get("output_compression")
     if oc is not None and not (0 <= int(oc) <= 100):
         _die("output_compression must be between 0 and 100")
@@ -912,10 +984,15 @@ def main() -> int:
     if getattr(args, "downscale_max_dim", None) is not None and args.downscale_max_dim < 1:
         _die("--downscale-max-dim must be >= 1")

-    _validate_size(args.size)
+    _validate_model(args.model)
+    _validate_size(args.size, args.model)
     _validate_quality(args.quality)
     _validate_background(args.background)
-    _validate_model(args.model)
+    _validate_model_specific_options(
+        model=args.model,
+        background=args.background,
+        input_fidelity=getattr(args, "input_fidelity", None),
+    )
     _ensure_api_key(args.dry_run)

     args.func(args)
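
To see why the docs recommend `3824x2160` over `3840x2160`, it may help to run the four `gpt-image-2` size checks on a few candidates. The sketch below restates the checks from `_validate_gpt_image_2_size` as a standalone function that returns a reason string instead of calling `_die`; the name `gpt_image_2_size_error` and the short reason strings are hypothetical, and the numeric limits are copied from the constants in this diff.

```python
import re
from typing import Optional


def gpt_image_2_size_error(size: str) -> Optional[str]:
    """Return a reason the size is rejected, or None if it passes every check."""
    if size == "auto":
        return None
    match = re.fullmatch(r"([1-9][0-9]*)x([1-9][0-9]*)", size)
    if not match:
        return "not WIDTHxHEIGHT"
    width, height = int(match.group(1)), int(match.group(2))
    # The longest edge must be strictly below 3840px.
    if max(width, height) >= 3840:
        return "edge too long"
    # Both edges must be multiples of 16px.
    if width % 16 or height % 16:
        return "not a multiple of 16"
    # The long-to-short edge ratio may be at most 3:1.
    if max(width, height) / min(width, height) > 3.0:
        return "ratio above 3:1"
    # The total pixel count must fall within the allowed range.
    if not 655_360 <= width * height <= 8_294_400:
        return "pixel count out of range"
    return None


for candidate in ["3840x2160", "3824x2160", "1000x1000", "512x512", "3072x1024"]:
    print(candidate, "->", gpt_image_2_size_error(candidate) or "ok")
```

True 4K fails only the exclusive edge limit, while `3824x2160` clears all four checks: both edges are multiples of 16 and the total is 8,259,840 pixels, under the 8,294,400 ceiling.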
