This is a decensored version of openai/gpt-oss-20b, made using an unreleased version of Heretic with the experimental "Arbitrary-Rank Ablation" (ARA) method.

See https://github.com/p-e-w/heretic/pull/211 for details about ARA.

## Abliteration parameters

| Parameter | Value |
|---|---|
| start_layer_index | 12 |
| end_layer_index | 17 |
| preserve_good_behavior_weight | 0.8150 |
| steer_bad_behavior_weight | 0.0072 |
| overcorrect_relative_weight | 1.1267 |
| neighbor_count | 11 |

## Performance

| Metric | This model | Original model (openai/gpt-oss-20b) |
|---|---|---|
| KL divergence | 0.0554 | 0 (by definition) |
| Refusals | 3/100 | 98/100 |
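
For readers unfamiliar with the KL divergence metric, the following is a minimal sketch of how a divergence between two next-token distributions can be computed, and why the original model scores 0 against itself. The logits are made-up toy values, the helper names `softmax` and `kl_divergence` are hypothetical, and the direction KL(original || modified) is an assumption. This is not Heretic's evaluation code, which may differ in prompts, token positions, and averaging.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy next-token logits over a 3-token vocabulary (illustrative values only).
original_logits = [2.0, 1.0, 0.1]
modified_logits = [1.8, 1.2, 0.1]

p = softmax(original_logits)  # distribution of the original model
q = softmax(modified_logits)  # distribution of the modified model

kl_self = kl_divergence(p, p)  # a distribution compared with itself
kl_mod = kl_divergence(p, q)   # original compared with modified

print(f"KL(original || original) = {kl_self}")
print(f"KL(original || modified) = {kl_mod:.4f}")
```

A small positive value means the modified model's output distribution stays close to the original's on the prompts used for measurement, while the self-comparison is exactly zero because every log-ratio term vanishes.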