pops-gte - DisasmHints.wiki

Introduction

The PSP has a custom CPU named Allegrex. It adds extra MIPS instructions and a VFPU (vector FPU) coprocessor.

This page gives a quick description of the unusual instructions you will find in a disassembly.

MIPS Extension

Extract bits

ext rd, rs, p, s: Extract s bits from rs, starting at bit position p, and put them into rd
Example:

rs = 0x11223344

ext   rd, rs, 8, 8

rd = 0x33
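The same operation can be modeled in a few lines of Python. This is only a sketch for checking values by hand; the function name `ext` and the convention that bit 0 is the least significant bit are assumptions made for the illustration.

```python
def ext(rs, p, s):
    """Model of 'ext rd, rs, p, s': return s bits of rs starting at bit p."""
    mask = (1 << s) - 1          # s low bits set
    return (rs >> p) & mask      # shift the field down, then cut it out


# The example from the text: bits 8..15 of 0x11223344
print(hex(ext(0x11223344, 8, 8)))  # 0x33
```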

Insert bits

ins rd, rs, p, s: Insert the low s bits of rs into rd, starting at bit position p
Example:

rs = 0xAA

rd = 0x11223344

ins   rd, rs, 16, 8

rd = 0x11AA3344

Widely used for RGB calculations.
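A Python sketch of the insert operation, plus a hypothetical RGB packing helper to show why it is handy for colour work. The names `ins` and `pack_rgb` and the 0x00BBGGRR layout are assumptions for the illustration, not taken from the disassembled code.

```python
def ins(rd, rs, p, s):
    """Model of 'ins rd, rs, p, s': replace s bits of rd at bit p with the low bits of rs."""
    mask = ((1 << s) - 1) << p
    return ((rd & ~mask) | ((rs << p) & mask)) & 0xFFFFFFFF


def pack_rgb(r, g, b):
    """Hypothetical helper: build 0x00BBGGRR with three inserts."""
    word = 0
    word = ins(word, r, 0, 8)
    word = ins(word, g, 8, 8)
    word = ins(word, b, 16, 8)
    return word


# The example from the text
print(hex(ins(0x11223344, 0xAA, 16, 8)))  # 0x11aa3344
```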

Conditional Move

movn rd, rs, rt: if (rt != 0) rd = rs
movz rd, rs, rt: if (rt == 0) rd = rs
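In Python terms the two instructions behave as below. The `max_via_movn` helper is a hypothetical illustration of the common branch-free pattern "slt, then movn"; it is not taken from the disassembled code.

```python
def movn(rd, rs, rt):
    """Model of 'movn rd, rs, rt': new value of rd."""
    return rs if rt != 0 else rd


def movz(rd, rs, rt):
    """Model of 'movz rd, rs, rt': new value of rd."""
    return rs if rt == 0 else rd


def max_via_movn(a, b):
    """Hypothetical pattern: slt t, a, b ; movn a, b, t"""
    t = 1 if a < b else 0   # slt t, a, b
    return movn(a, b, t)    # a = b only when a < b
```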

Advanced Multiply

madd rs, rt: Acc += rs * rt
msub rs, rt: Acc -= rs * rt
maddu rs, rt: Acc += (unsigned)rs * (unsigned)rt
msubu rs, rt: Acc -= (unsigned)rs * (unsigned)rt
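A Python sketch of the accumulator arithmetic. It assumes Acc is the 64-bit HI:LO register pair, as in the MIPS32 definition of these instructions; the helper names are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def to_signed32(x):
    """Interpret the low 32 bits of x as a signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def madd(acc, rs, rt):
    return (acc + to_signed32(rs) * to_signed32(rt)) & MASK64


def msub(acc, rs, rt):
    return (acc - to_signed32(rs) * to_signed32(rt)) & MASK64


def maddu(acc, rs, rt):
    return (acc + (rs & MASK32) * (rt & MASK32)) & MASK64


def msubu(acc, rs, rt):
    return (acc - (rs & MASK32) * (rt & MASK32)) & MASK64


def hi_lo(acc):
    """Split the 64-bit accumulator into (HI, LO)."""
    return (acc >> 32) & MASK32, acc & MASK32
```

The signed and unsigned forms differ only in how rs and rt are interpreted: with rs = 0xFFFFFFFF and rt = 2, madd adds -2 while maddu adds 0x1FFFFFFFE.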

Count leading bits

clo rd, rs: rd = count of leading ones in rs
clz rd, rs: rd = count of leading zeros in rs

Based on the MIPS32 Quick Reference: http://www.mips.com/media/files/MD00565-2B-MIPS32-QRC-01.01.pdf
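A small Python model of both counts for 32-bit registers. The function names simply mirror the mnemonics.

```python
def clz(rs):
    """Count leading zero bits of a 32-bit value (32 for zero)."""
    rs &= 0xFFFFFFFF
    return 32 - rs.bit_length()


def clo(rs):
    """Count leading one bits: same as clz of the inverted value."""
    return clz(~rs)
```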

MIPS Delay Slot Instructions

On the MIPS architecture, jump and branch instructions have a "delay slot". This means that the instruction immediately after a jump or branch is executed before control transfers to the target.

In addition, there is a group of "branch likely" conditional branch instructions in which the instruction in the delay slot is executed only if the branch is taken.

! Remember that BEQ is not the same as BEQL: the BEQ delay slot is always executed, but the BEQL delay slot is executed only if the branch is taken. Mixing them up can ruin the whole disassembly logic.

Borrowed from: http://public.lanl.gov/totalview/online-4.1.0-4/user_guide/appc28.html
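When translating a disassembly by hand, the rule can be written down as a tiny predicate. This is a sketch only: the function name is made up, and the set lists the standard MIPS "branch likely" mnemonics.

```python
# Standard MIPS "branch likely" mnemonics
BRANCH_LIKELY = {
    "beql", "bnel", "bgezl", "bgtzl", "blezl", "bltzl",
    "bgezall", "bltzall",
}


def delay_slot_executes(mnemonic, taken):
    """Return True if the instruction in the delay slot is executed."""
    if mnemonic.lower() in BRANCH_LIKELY:
        return taken          # slot is skipped when the branch falls through
    return True               # ordinary jumps and branches always run the slot
```

So for a BEQ the slot instruction belongs to both paths, while for a BEQL it belongs only to the taken path.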

Small note on VFPU disassembly

When I write:

[x] [y] [z] [w]

this means a single VFPU quad vector.

When I write:

block number:

   [0] [1] [2] [3]

   [4] [5] [6] [7]

   [8] [9] [A] [B]

   [C] [D] [E] [F]

this means the contents of a whole VFPU 'block'.

It is convenient to present VFPU disassembly step by step, showing the contents of the registers, for example:

2:

        [L11L12] [L33] [VXY0] []

        [L13L21] [RBK] [VZ0]  []

        [L22L23] [GBK] [VXY1] []

        [L31L32] [BBK] [VZ1]  []



        0x0000DCB0: 0xDB8800A0 '....' - lv.q       C200, 160($gp)       // L11L12

        0x0000DCB4: 0xDB8900B0 '....' - lv.q       C210, 176($gp)       // L33

        0x0000DCB8: 0xDB8A0000 '....' - lv.q       C220, 0($gp)

This notation shows the final contents of block #2 after three consecutive lv.q instructions.

For more details on VFPU registers, see VFPU Register Mapping: http://wiki.fx-world.org/doku.php?id=general:vfpu_registers

http://ogamespec.com/imgstore/whc4e200167b0e4e.jpg

When I write:

[F 10] [F 20] [F 40] [F 80]

this means the VFPU registers contain floats.

When I write:

[I 10] [I 20] [I 40] [I 80]

this means the VFPU registers contain integers.

If the I/F type of a register is not specified, it is assumed to be an integer.

VFPU Load/Store with 'shuffle'

lvl.q: Load Quad Word Left to VFPU

offset % 16   address & ~0xf     vfpu
+0            [1] [2] [3] [4]    [-] [-] [-] [1]
+4            [1] [2] [3] [4]    [-] [-] [1] [2]
+8            [1] [2] [3] [4]    [-] [1] [2] [3]
+12           [1] [2] [3] [4]    [1] [2] [3] [4]

[-] means the element is kept unchanged

lvr.q: Load Quad Word Right to VFPU

offset % 16   address & ~0xf     vfpu
+0            [1] [2] [3] [4]    [1] [2] [3] [4]
+4            [1] [2] [3] [4]    [2] [3] [4] [-]
+8            [1] [2] [3] [4]    [3] [4] [-] [-]
+12           [1] [2] [3] [4]    [4] [-] [-] [-]

[-] means the element is kept unchanged

svl.q: Store Quad Word Left from VFPU

offset % 16   vfpu               address & ~0xf
+0            [1] [2] [3] [4]    [4] [-] [-] [-]
+4            [1] [2] [3] [4]    [3] [4] [-] [-]
+8            [1] [2] [3] [4]    [2] [3] [4] [-]
+12           [1] [2] [3] [4]    [1] [2] [3] [4]

[-] means the element is kept unchanged

svr.q: Store Quad Word Right from VFPU

offset % 16   vfpu               address & ~0xf
+0            [1] [2] [3] [4]    [1] [2] [3] [4]
+4            [1] [2] [3] [4]    [-] [1] [2] [3]
+8            [1] [2] [3] [4]    [-] [-] [1] [2]
+12           [1] [2] [3] [4]    [-] [-] [-] [1]

[-] means the element is kept unchanged
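The four tables above can be condensed into a Python model that works on lists of four words. This is a sketch derived from the tables only: `vreg` is the quad register, `block` is the 16-byte aligned memory block at address & ~0xf, `offset` is the byte address, and all names are assumptions for the illustration.

```python
def _k(offset):
    """Word index (0..3) of the address inside its aligned 16-byte block."""
    return (offset % 16) // 4


def lvl_q(vreg, block, offset):
    """Load left: memory words 0..k go to the top of the register."""
    k, out = _k(offset), list(vreg)
    for i in range(k + 1):
        out[3 - k + i] = block[i]
    return out


def lvr_q(vreg, block, offset):
    """Load right: memory words k..3 go to the bottom of the register."""
    k, out = _k(offset), list(vreg)
    for i in range(4 - k):
        out[i] = block[k + i]
    return out


def svl_q(vreg, block, offset):
    """Store left: the top k+1 register words go to memory words 0..k."""
    k, out = _k(offset), list(block)
    for i in range(k + 1):
        out[i] = vreg[3 - k + i]
    return out


def svr_q(vreg, block, offset):
    """Store right: the bottom 4-k register words go to memory words k..3."""
    k, out = _k(offset), list(block)
    for i in range(4 - k):
        out[k + i] = vreg[i]
    return out
```

Each function returns the new contents of the destination (register for loads, memory block for stores), leaving the untouched elements as they were.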

Example of use:
Some GTE calculations need to "push" the FIFO registers:

RGB0 = RGB1

RGB1 = RGB2

RGB2 = RGB

So we do:

lv.q      C000, &gteData[20]

.... do some calculations on the RGB value, so that C000 becomes:

[RGB0] [RGB1] [RGB2] [RGB]

svl.q     C000, &gteData[22]

It writes to memory:

[RGB0] [RGB1] [RGB2] [RES1] <= [RGB1] [RGB2] [RGB] [unchanged]
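The FIFO push can be replayed in Python. This sketch assumes gteData is an array of 32-bit words whose start is 16-byte aligned, so that &gteData[22] falls at offset +8 inside the block starting at gteData[20]; the function and variable names are made up for the illustration.

```python
def svl_q(vreg, mem, word_index):
    """Store-left model on a word array: writes words base..base+k."""
    base = word_index & ~3      # first word of the aligned 16-byte block
    k = word_index & 3          # position of the address inside the block
    for i in range(k + 1):
        mem[base + i] = vreg[3 - k + i]


gte_data = ["?"] * 32
gte_data[20:24] = ["RGB0", "RGB1", "RGB2", "RES1"]

c000 = gte_data[20:24]          # lv.q C000, &gteData[20]
c000[3] = "RGB"                 # result of the colour calculation
svl_q(c000, gte_data, 22)       # svl.q C000, &gteData[22]

print(gte_data[20:24])
```

After the store, words 20..22 hold RGB1, RGB2 and the new RGB, while word 23 (RES1) is untouched.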

VFPU Simple

instruction  description                                          operation
vi2f.s       Convert integer to float with Scaling Single Word    [d0] = (float)[s0] / (1 << scale)
vi2f.p       Convert integer to float with Scaling Pair Word      same for pair
vi2f.t       Convert integer to float with Scaling Triple Word    same for triple
vi2f.q       Convert integer to float with Scaling Quad Word      same for quad
vmax.s       Maximum Single Word                                  [d0] = max([s0], [t0])
vmax.p       Maximum Pair Word                                    same for pair
vmax.t       Maximum Triple Word                                  same for triple
vmax.q       Maximum Quad Word                                    same for quad
vmin.s       Minimum Single Word                                  [d0] = min([s0], [t0])
vmin.p       Minimum Pair Word                                    same for pair
vmin.t       Minimum Triple Word                                  same for triple
vmin.q       Minimum Quad Word                                    same for quad
vmov.s       Move Single Word                                     [d0] = [s0]
vmov.p       Move Pair Word                                       [d0] [d1] = [s0] [s1]
vmov.t       Move Triple Word                                     [d0] [d1] [d2] = [s0] [s1] [s2]
vmov.q       Move Quad Word                                       [d0] [d1] [d2] [d3] = [s0] [s1] [s2] [s3]
vzero.s      Set Zero Single Word                                 [d0] = 0
vzero.p      Set Zero Pair Word                                   [d0] [d1] = 0
vzero.t      Set Zero Triple Word                                 [d0] [d1] [d2] = 0
vzero.q      Set Zero Quad Word                                   [d0] [d1] [d2] [d3] = 0
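The element-wise behaviour of these instructions can be sketched in Python on plain lists (one, two, three or four elements for .s, .p, .t, .q). The sketch assumes that the vi2f scaling divides by 2 to the power of the instruction's immediate scale operand; function names mirror the mnemonics and are otherwise arbitrary.

```python
def vi2f(src, scale):
    """Convert signed integers to floats, dividing each by 2**scale."""
    return [float(s) / (1 << scale) for s in src]


def vmax(s, t):
    return [max(a, b) for a, b in zip(s, t)]


def vmin(s, t):
    return [min(a, b) for a, b in zip(s, t)]


def vmov(s):
    return list(s)


def vzero(n):
    """n is the number of elements: 1, 2, 3 or 4."""
    return [0.0] * n
```

For example, a fixed-point value with 12 fractional bits, such as 4096, becomes 1.0 with a scale of 12.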

VFPU Not Simple :)

vs2i.p: Convert signed short to integer Pair Word

Four packed signed shorts are converted to four unpacked longs. Each short ends up in the upper 16 bits of its destination word.

Example:

 Source: [0xAABBCCDD] [0x11223344] [-] [-]

 Dest: [0xCCDD0000] [0xAABB0000] [0x33440000] [0x11220000]

! Don't forget to shift all result values right by 16 (>> 16) before use.
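The unpacking and the follow-up shift can be checked with a short Python model. It is a sketch built from the example above; the names are made up, and the shift is treated as an arithmetic (sign-preserving) shift because the shorts are signed.

```python
def vs2i_p(src):
    """Unpack two 32-bit words (four shorts) into four words, short in the top half."""
    out = []
    for word in src:
        out.append((word & 0xFFFF) << 16)   # low short first
        out.append(word & 0xFFFF0000)       # then high short
    return out


def shift_right_16(word):
    """Arithmetic >> 16 of a 32-bit word, giving the signed short value."""
    value = (word >> 16) & 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


unpacked = vs2i_p([0xAABBCCDD, 0x11223344])
print([hex(w) for w in unpacked])
print([shift_right_16(w) for w in unpacked])
```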

Used to unpack GTE registers, such as:

 8  L11L12  |L12 1, 3,12|L11 1, 3,12| Light source matrix elements 11, 12

 9  L13L21  |L21 1, 3,12|L13 1, 3,12| Light source matrix elements 13, 21

10  L22L23  |L23 1, 3,12|L22 1, 3,12| Light source matrix elements 22, 23

11  L31L32  |L32 1, 3,12|L31 1, 3,12| Light source matrix elements 31, 32

VFPU Prefix Instructions

The VFPU has three prefix instructions: vpfxd, vpfxs and vpfxt, which are applied to the rd (destination), rs (source) and rt (target) operands of the next instruction.

You can shuffle or discard some of the operands by applying a prefix to them. For example:

vpfxt   [x, y, 0, 0]

vadd.q  rd, rs, rt



rt will be [x, y, 0, 0] before vadd.

Prefix table codes for source and target:

code    take
x       v[0]
y       v[1]
z       v[2]
w       v[3]
-x      -v[0]
-y      -v[1]
-z      -v[2]
-w      -v[3]
|x|     abs(v[0])
|y|     abs(v[1])
|z|     abs(v[2])
|w|     abs(v[3])
-|x|    -abs(v[0])
-|y|    -abs(v[1])
-|z|    -abs(v[2])
-|w|    -abs(v[3])
0       0
1       1
2       2
3       3
1/2     1/2
1/3     1/3
1/4     1/4
1/6     1/6

Prefix table codes for destination:

code      result
(none)    pass through
m         ignore
0:1       clamp value to [0.0; 1.0]
-1:1      clamp value to [-1.0; 1.0]
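Both tables can be turned into a small Python model, which helps when tracing a prefixed instruction by hand. This is a sketch: the function names are made up, an empty string stands for "pass through", and "m ignore" is interpreted here as "the destination element is not written", which is an assumption.

```python
CONSTANTS = {"0": 0.0, "1": 1.0, "2": 2.0, "3": 3.0,
             "1/2": 1 / 2, "1/3": 1 / 3, "1/4": 1 / 4, "1/6": 1 / 6}
LANES = {"x": 0, "y": 1, "z": 2, "w": 3}


def apply_src_prefix(v, codes):
    """Apply a vpfxs/vpfxt style prefix (list of codes) to vector v."""
    out = []
    for code in codes:
        if code in CONSTANTS:
            out.append(CONSTANTS[code])
            continue
        negate = code.startswith("-")
        body = code[1:] if negate else code
        value = v[LANES[body.strip("|")]]
        if body.startswith("|"):
            value = abs(value)
        out.append(-value if negate else value)
    return out


def apply_dst_prefix(old, new, codes):
    """Apply a vpfxd style prefix: old is rd before, new is the raw result."""
    out = []
    for o, n, code in zip(old, new, codes):
        if code == "m":
            out.append(o)                        # element not written
        elif code == "0:1":
            out.append(min(max(n, 0.0), 1.0))
        elif code == "-1:1":
            out.append(min(max(n, -1.0), 1.0))
        else:
            out.append(n)                        # pass through
    return out


# vpfxt [x, y, 0, 0] followed by vadd.q rd, rs, rt
rs = [1.0, 2.0, 3.0, 4.0]
rt = apply_src_prefix([10.0, 20.0, 30.0, 40.0], ["x", "y", "0", "0"])
rd = [a + b for a, b in zip(rs, rt)]
```

With the prefix applied, only the x and y elements of rt contribute to the sum; z and w of rs pass through unchanged.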

prxtool Bugs

Bug in msub decoding

All msub instructions are decoded with $zr instead of the proper rs register, for example:

   0x0000E5B8: 0x00C1002E '....' - msub       $zr, $at

Should be:

   0x0000E5B8: 0x00C1002E '....' - msub       $a2, $at
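The correct operands can be verified by pulling the register fields out of the opcode word. The sketch below uses the standard MIPS R-type layout (rs in bits 25..21, rt in bits 20..16) and prxtool-style register names; the function name is made up for the illustration.

```python
GPR = ["$zr", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
       "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
       "$s0", "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
       "$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"]


def decode_rs_rt(word):
    """Return the (rs, rt) register names of an R-type instruction word."""
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    return GPR[rs], GPR[rt]


print(decode_rs_rt(0x00C1002E))  # ('$a2', '$at')
```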

Bug in vi2uc.q decoding

0x0000D420: 0xD03C83FE '..<.' - vi2uc.q    R702, C030

The destination is always a single register. The instruction above should be decoded as follows:

0x0000D420: 0xD03C83FE '..<.' - vi2uc.q    S723, C030
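The expected destination name can be reproduced from the opcode word. This sketch assumes the commonly documented VFPU register encoding: a 7-bit register number, with vd in bits 6..0, and for single registers matrix = bits 4..2, column = bits 1..0, row = bits 6..5, printed as S<matrix><column><row>. Treat the layout as an assumption to check against the VFPU register mapping reference.

```python
def vfpu_single_name(reg):
    """Name of a single-element VFPU register from its 7-bit number."""
    matrix = (reg >> 2) & 7
    column = reg & 3
    row = (reg >> 5) & 3
    return "S%d%d%d" % (matrix, column, row)


word = 0xD03C83FE
vd = word & 0x7F                 # destination register field
print(vfpu_single_name(vd))      # S723
```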
