Introduction
The PSP has a custom CPU named Allegrex. It adds extra MIPS instructions and a VFPU (vector FPU) coprocessor.
On this page you can find a quick description of these unusual instructions, which show up in disassembly.
MIPS Extension
Extract bits
ext rd, rs, p, s: Extract s bits from rs, starting at bit position p, and put them (zero-extended) into rd
Example:
rs = 0x11223344
ext rd, rs, 8, 8
rd = 0x33
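The extraction above can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not emulator code; the function name ext and its argument names are chosen here for illustration.

```python
def ext(rs, pos, size):
    """Model of `ext rd, rs, pos, size`: return the value written to rd."""
    mask = (1 << size) - 1        # `size` low bits set
    return (rs >> pos) & mask     # the extracted field is zero-extended

# Same numbers as the example above.
print(hex(ext(0x11223344, 8, 8)))
```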
Insert bits
ins rd, rs, p, s: Insert the lowest s bits of rs into rd, starting at bit position p; the other bits of rd are kept
Example:
rs = 0xAA
rd = 0x11223344
ins rd, rs, 16, 8
rd = 0x11AA3344
Widely used for RGB calculations.
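The insert operation can be sketched the same way. The helper name ins and the 32-bit masking are assumptions made for this illustration; it only mirrors the description above.

```python
def ins(rd, rs, pos, size):
    """Model of `ins rd, rs, pos, size`: return the new value of rd."""
    field = ((1 << size) - 1) << pos          # bits of rd that get replaced
    kept = rd & ~field & 0xFFFFFFFF           # everything outside the field
    return kept | ((rs << pos) & field)       # low `size` bits of rs go in

# Same numbers as the example above.
print(hex(ins(0x11223344, 0xAA, 16, 8)))
```

For RGB work this means one channel can be replaced in a packed colour word without touching the others.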
Conditional Move
movn rd, rs, rt: if (rt != 0) rd = rs
movz rd, rs, rt: if (rt == 0) rd = rs
Advanced Multiply
madd rs, rt: Acc += rs * rt
msub rs, rt: Acc -= rs * rt
maddu rs, rt: Acc += (unsigned)rs * (unsigned)rt
msubu rs, rt: Acc -= (unsigned)rs * (unsigned)rt
Acc is the 64-bit HI:LO register pair.
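A rough Python model of the four accumulate instructions, assuming Acc is the 64-bit HI:LO pair as in standard MIPS32. All function names here are illustrative only.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def to_signed32(x):
    """Interpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def madd(acc, rs, rt):
    return (acc + to_signed32(rs) * to_signed32(rt)) & MASK64

def msub(acc, rs, rt):
    return (acc - to_signed32(rs) * to_signed32(rt)) & MASK64

def maddu(acc, rs, rt):
    return (acc + (rs & MASK32) * (rt & MASK32)) & MASK64

def msubu(acc, rs, rt):
    return (acc - (rs & MASK32) * (rt & MASK32)) & MASK64

def hi_lo(acc):
    """Split the 64-bit accumulator into (HI, LO)."""
    return (acc >> 32) & MASK32, acc & MASK32

# 0xFFFFFFFF is -1 when signed, but 4294967295 when unsigned.
print(hi_lo(madd(0, 0xFFFFFFFF, 2)))
print(hi_lo(maddu(0, 0xFFFFFFFF, 2)))
```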
Count leading bits
clo rd, rs: rd = Count leading ones of rs
clz rd, rs: rd = Count leading zeros of rs
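Both counts are easy to check by hand with a small Python sketch. The function names are picked for this example, and registers are assumed to be 32 bits wide.

```python
def clz(rs):
    """Number of zero bits above the highest set bit of a 32-bit value."""
    rs &= 0xFFFFFFFF
    return 32 - rs.bit_length()

def clo(rs):
    """Leading ones of rs are the leading zeros of its complement."""
    return clz(~rs)

print(clz(0x00010000), clo(0xFFFF0000))
```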
Based on the MIPS32 Quick Reference PDF: http://www.mips.com/media/files/MD00565-2B-MIPS32-QRC-01.01.pdf
MIPS Delay Slot Instructions
On the MIPS architecture, jump and branch instructions have a "delay slot". This means that the instruction placed right after the jump or branch is executed before the jump or branch takes effect.
In addition, there is a group of "branch likely" conditional branch instructions in which the instruction in the delay slot is executed only if the branch is taken.
! So please remember that BEQ is not the same as BEQL: the BEQ delay slot is always executed, but the BEQL delay slot is executed only if the branch is taken. Mixing them up may ruin the whole logic of a disassembly.
Borrowed from: http://public.lanl.gov/totalview/online-4.1.0-4/user_guide/appc28.html
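The difference can be shown with a toy trace model. This is only a sketch: the function name executed and the labels in the trace are made up for illustration, and no real instruction decoding is done.

```python
def executed(likely, taken):
    """Return what runs, in order, for: branch ; delay slot ; next code.

    likely -- True for the "branch likely" forms (BEQL, BNEL, ...)
    taken  -- True if the branch condition holds
    """
    trace = ["branch"]
    # A normal branch always runs its delay slot.
    # A "likely" branch runs it only when the branch is taken.
    if taken or not likely:
        trace.append("delay_slot")
    trace.append("target" if taken else "fallthrough")
    return trace

for likely in (False, True):
    for taken in (False, True):
        print("likely=%s taken=%s ->" % (likely, taken), executed(likely, taken))
```

The only case that differs is a "likely" branch that is not taken: the delay slot is skipped.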
Small note on VFPU disassembly
When I write:
[x] [y] [z] [w]
this means a single VFPU quad vector
When I write:
block number:
[0] [1] [2] [3]
[4] [5] [6] [7]
[8] [9] [A] [B]
[C] [D] [E] [F]
this means the contents of a whole VFPU 'block'
It is very convenient to represent VFPU disassembly step by step, as the contents of its registers, for example:
2:
[L11L12] [L33] [VXY0] []
[L13L21] [RBK] [VZ0] []
[L22L23] [GBK] [VXY1] []
[L31L32] [BBK] [VZ1] []
0x0000DCB0: 0xDB8800A0 '....' - lv.q C200, 160($gp) // L11L12
0x0000DCB4: 0xDB8900B0 '....' - lv.q C210, 176($gp) // L33
0x0000DCB8: 0xDB8A0000 '....' - lv.q C220, 0($gp)
This notation shows the final state of block #2 after three consecutive lv.q instructions.
For more details on VFPU registers, see VFPU Register Mapping: http://wiki.fx-world.org/doku.php?id=general:vfpu_registers
Image: http://ogamespec.com/imgstore/whc4e200167b0e4e.jpg
When I write:
[F 10] [F 20] [F 40] [F 80]
this means the VFPU registers contain floats
When I write:
[I 10] [I 20] [I 40] [I 80]
this means the VFPU registers contain integers
If I/F is not specified for a register, it is assumed to hold an integer
VFPU Load/Store with 'shuffle'
lvl.q: Load Quad Word Left to VFPU
offset % 16    address & ~0xf       vfpu
+0             [1] [2] [3] [4]      [-] [-] [-] [1]
+4             [1] [2] [3] [4]      [-] [-] [1] [2]
+8             [1] [2] [3] [4]      [-] [1] [2] [3]
+12            [1] [2] [3] [4]      [1] [2] [3] [4]
[-] means keep unchanged
lvr.q: Load Quad Word Right to VFPU
offset % 16    address & ~0xf       vfpu
+0             [1] [2] [3] [4]      [1] [2] [3] [4]
+4             [1] [2] [3] [4]      [2] [3] [4] [-]
+8             [1] [2] [3] [4]      [3] [4] [-] [-]
+12            [1] [2] [3] [4]      [4] [-] [-] [-]
[-] means keep unchanged
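The two load tables (lvl.q and lvr.q) can be reproduced with a short Python model. This is a sketch derived from the tables only: memory is modelled as a list of 32-bit words, addresses are assumed to be word-aligned byte addresses, and the function names are illustrative.

```python
def lvl_q(vreg, mem, addr):
    """lvl.q: fill the vector from its last element backwards."""
    base = (addr & ~0xF) // 4        # word index of the aligned 16-byte line
    offset = (addr >> 2) & 3         # word position inside that line
    out = list(vreg)
    for j in range(offset + 1):
        out[3 - j] = mem[base + offset - j]
    return out

def lvr_q(vreg, mem, addr):
    """lvr.q: fill the vector from its first element forwards."""
    base = (addr & ~0xF) // 4
    offset = (addr >> 2) & 3
    out = list(vreg)
    for j in range(4 - offset):
        out[j] = mem[base + offset + j]
    return out

memory = [1, 2, 3, 4]                # one aligned quad word
keep = ["-"] * 4                     # "-" marks an unchanged element
for byte_offset in (0, 4, 8, 12):
    print(byte_offset, lvl_q(keep, memory, byte_offset), lvr_q(keep, memory, byte_offset))
```

Used together on the same unaligned address range, one instruction fills the part of the vector that the other leaves untouched.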
svl.q: Store Quad Word Left from VFPU
offset % 16    vfpu                 address & ~0xf
+0             [1] [2] [3] [4]      [4] [-] [-] [-]
+4             [1] [2] [3] [4]      [3] [4] [-] [-]
+8             [1] [2] [3] [4]      [2] [3] [4] [-]
+12            [1] [2] [3] [4]      [1] [2] [3] [4]
[-] means keep unchanged
svr.q: Store Quad Word Right from VFPU
offset % 16    vfpu                 address & ~0xf
+0             [1] [2] [3] [4]      [1] [2] [3] [4]
+4             [1] [2] [3] [4]      [-] [1] [2] [3]
+8             [1] [2] [3] [4]      [-] [-] [1] [2]
+12            [1] [2] [3] [4]      [-] [-] [-] [1]
[-] means keep unchanged
Example of use:
Some GTE calculations need to "push" the FIFO registers:
RGB0 = RGB1
RGB1 = RGB2
RGB2 = RGB
So we do:
lv.q C000, &gteData[20]
.... calculate the new RGB value, so that C000 becomes:
[RGB0] [RGB1] [RGB2] [RGB]
svl.q C000, &gteData[22]
It writes to memory:
[RGB0] [RGB1] [RGB2] [RES1] <= [RGB1] [RGB2] [RGB] [unchanged]
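The two store tables and the FIFO push above can be checked with a small Python model. It is a sketch only: memory is a list of words, the GTE data array is assumed to start on a 16-byte boundary, and the function and variable names are illustrative.

```python
def svl_q(vreg, mem, addr):
    """svl.q: store the tail of the vector up to and including addr."""
    base = (addr & ~0xF) // 4        # word index of the aligned 16-byte line
    offset = (addr >> 2) & 3         # word position inside that line
    out = list(mem)
    for j in range(offset + 1):
        out[base + offset - j] = vreg[3 - j]
    return out

def svr_q(vreg, mem, addr):
    """svr.q: store the head of the vector from addr to the end of the line."""
    base = (addr & ~0xF) // 4
    offset = (addr >> 2) & 3
    out = list(mem)
    for j in range(4 - offset):
        out[base + offset + j] = vreg[j]
    return out

# GTE data registers 20..23 hold RGB0, RGB1, RGB2, RES1 (4 bytes each).
gte = ["?"] * 20 + ["RGB0", "RGB1", "RGB2", "RES1"] + ["?"] * 8
c000 = ["RGB0", "RGB1", "RGB2", "RGB"]      # vector after the calculation
gte = svl_q(c000, gte, 22 * 4)              # store at the address of word 22
print(gte[20:24])
```

Because word 22 sits at offset +8 inside its aligned line, the store writes the last three vector elements to words 20 to 22 and leaves word 23 alone.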
VFPU Simple
instruction    description                                           operation
vi2f.s         Convert integer to float with Scaling Single Word     [d0] <= (float)([s0]) / (1 << scale)
vi2f.p         Convert integer to float with Scaling Pair Word       same for pair
vi2f.t         Convert integer to float with Scaling Triple Word     same for triple
vi2f.q         Convert integer to float with Scaling Quad Word       same for quad
vmax.s         Maximum Single Word                                   [d0] <= max([s0], [t0])
vmax.p         Maximum Pair Word                                     same for pair
vmax.t         Maximum Triple Word                                   same for triple
vmax.q         Maximum Quad Word                                     same for quad
vmin.s         Minimum Single Word                                   [d0] <= min([s0], [t0])
vmin.p         Minimum Pair Word                                     same for pair
vmin.t         Minimum Triple Word                                   same for triple
vmin.q         Minimum Quad Word                                     same for quad
vmov.s         Move Single Word                                      [d0] <= [s0]
vmov.p         Move Pair Word                                        [d0] [d1] <= [s0] [s1]
vmov.t         Move Triple Word                                      [d0] [d1] [d2] <= [s0] [s1] [s2]
vmov.q         Move Quad Word                                        [d0] [d1] [d2] [d3] <= [s0] [s1] [s2] [s3]
vzero.s        Set Zero Single Word                                  [d0] <= 0
vzero.p        Set Zero Pair Word                                    [d0] [d1] <= 0
vzero.t        Set Zero Triple Word                                  [d0] [d1] [d2] <= 0
vzero.q        Set Zero Quad Word                                    [d0] [d1] [d2] [d3] <= 0
scale is the immediate scale operand of vi2f.
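The simple operations listed above act element by element, which a few Python helpers can illustrate. This is a sketch under assumptions: vectors are plain lists, integers are already signed, the scale argument stands for the immediate operand of vi2f, and special float cases (NaN, infinities) are ignored.

```python
def vi2f(s, scale):
    """Integer to float with scaling: divide each element by 2**scale."""
    return [float(x) / (1 << scale) for x in s]

def vmax(s, t):
    return [max(a, b) for a, b in zip(s, t)]

def vmin(s, t):
    return [min(a, b) for a, b in zip(s, t)]

def vmov(s):
    return list(s)

def vzero(n):
    """n is the vector size: 1 (.s), 2 (.p), 3 (.t) or 4 (.q)."""
    return [0.0] * n

# 4096 with a scale of 12 is 1.0 in 1.3.12 fixed point.
print(vi2f([4096, -2048, 0, 8192], 12))
```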
VFPU Not Simple :)
vs2i.p: Convert signed short to integer Pair Word
Four packed signed shorts (two source words) are converted to four unpacked 32-bit words. Each short ends up in the upper 16 bits of its word.
Example:
Source: [0xAABBCCDD] [0x11223344] [-] [-]
Dest: [0xCCDD0000] [0xAABB0000] [0x33440000] [0x11220000]
! Don't forget to shift every result value right by 16 (>> 16) before use
Used to unpack GTE registers, such as:
8 L11L12 |L12 1, 3,12|L11 1, 3,12| Light source matrix elements 11, 12
9 L13L21 |L21 1, 3,12|L13 1, 3,12| Light source matrix elements 13, 21
10 L22L23 |L23 1, 3,12|L22 1, 3,12| Light source matrix elements 22, 23
11 L31L32 |L32 1, 3,12|L31 1, 3,12| Light source matrix elements 31, 32
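The unpacking shown in the example, followed by the >> 16 step, can be sketched in Python. The helper names are made up for this illustration, and the shift is assumed to be an arithmetic (sign-preserving) one, since the shorts are signed.

```python
def vs2i_p(s):
    """Unpack two 32-bit words into four words, shorts in the upper halves."""
    d = []
    for word in s:
        d.append((word & 0xFFFF) << 16)   # low short first
        d.append(word & 0xFFFF0000)       # then the high short
    return d

def sar16(x):
    """Arithmetic shift right by 16 of a 32-bit value."""
    x &= 0xFFFFFFFF
    if x & 0x80000000:
        x -= 1 << 32
    return x >> 16

unpacked = vs2i_p([0xAABBCCDD, 0x11223344])
print([hex(w) for w in unpacked])
print([sar16(w) for w in unpacked])
```

For a register such as L11L12 this yields L11 (low half) and L12 (high half) as separate signed values.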
VFPU Prefix Instructions
The VFPU has three prefix instructions: vpfxd, vpfxs and vpfxt, which are applied to the rd (destination), rs (source) and rt (target) operands of the next instruction.
You can shuffle or discard some of the operand elements by applying a prefix to the operand. For example:
vpfxt [x, y, 0, 0]
vadd.q rd, rs, rt
rt will be seen as [x, y, 0, 0] by vadd.
Prefix table codes for source and target:
x       v[0]
y       v[1]
z       v[2]
w       v[3]
-x      -v[0]
-y      -v[1]
-z      -v[2]
-w      -v[3]
|x|     abs(v[0])
|y|     abs(v[1])
|z|     abs(v[2])
|w|     abs(v[3])
-|x|    -abs(v[0])
-|y|    -abs(v[1])
-|z|    -abs(v[2])
-|w|    -abs(v[3])
0       0
1       1
2       2
3       3
1/2     1/2
1/3     1/3
1/4     1/4
1/6     1/6
Prefix table codes for destination:
pass through
m ignore
0:1 clamp value to [0.0; 1.0]
-1:1 clamp value to [-1.0; 1.0]
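How the prefix codes change an operand can be illustrated with a small Python model of the vpfxt + vadd.q example. It is only a sketch: prefixes are given as lists of the code strings from the tables, the empty string is used here as a stand-in for "pass through" on the destination, and all function names are hypothetical.

```python
CONSTANTS = {"0": 0.0, "1": 1.0, "2": 2.0, "3": 3.0,
             "1/2": 1 / 2, "1/3": 1 / 3, "1/4": 1 / 4, "1/6": 1 / 6}
LANES = {"x": 0, "y": 1, "z": 2, "w": 3}

def apply_source_prefix(v, codes):
    """Build the operand seen by the instruction from vector v."""
    out = []
    for code in codes:
        if code in CONSTANTS:
            out.append(CONSTANTS[code])
            continue
        negate = code.startswith("-")
        body = code.lstrip("-")
        value = v[LANES[body.strip("|")]]
        if body.startswith("|"):
            value = abs(value)
        out.append(-value if negate else value)
    return out

def apply_dest_prefix(old, result, codes):
    """Filter the result on its way into the destination register."""
    out = []
    for o, r, code in zip(old, result, codes):
        if code == "m":                      # ignore: keep the old value
            out.append(o)
        elif code == "0:1":
            out.append(min(max(r, 0.0), 1.0))
        elif code == "-1:1":
            out.append(min(max(r, -1.0), 1.0))
        else:                                # pass through
            out.append(r)
    return out

rs = [1.0, 1.0, 1.0, 1.0]
rt = [0.5, -3.0, 9.0, 9.0]
t = apply_source_prefix(rt, ["x", "y", "0", "0"])   # vpfxt [x, y, 0, 0]
rd = [a + b for a, b in zip(rs, t)]                  # vadd.q rd, rs, rt
print(rd)
```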
prxtool Bugs
Bug in msub decoding
All msub instructions are shown with $zr instead of the proper RS register, for example:
0x0000E5B8: 0x00C1002E '....' - msub $zr, $at
Should be:
0x0000E5B8: 0x00C1002E '....' - msub $a2, $at
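The correct operands can be confirmed by decoding the register fields by hand. The sketch below assumes the standard MIPS R-type layout (rs in bits 25..21, rt in bits 20..16) and the usual register names as printed by prxtool; the function name is illustrative.

```python
GPR = ["zr", "at", "v0", "v1", "a0", "a1", "a2", "a3",
       "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
       "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
       "t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra"]

def decode_rs_rt(word):
    """Return the names of the rs and rt registers of an R-type word."""
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    return "$" + GPR[rs], "$" + GPR[rt]

print("msub %s, %s" % decode_rs_rt(0x00C1002E))
```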
Bug in vi2uc.q decoding
0x0000D420: 0xD03C83FE '..<.' - vi2uc.q R702, C030
The destination is always a single register. The instruction above should be decoded as follows:
0x0000D420: 0xD03C83FE '..<.' - vi2uc.q S723, C030
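The corrected name can be derived from the destination field of the opcode. The sketch below assumes that the destination occupies the low 7 bits of the word and that a single register is named S followed by matrix, column and row; these field positions are an assumption that happens to reproduce S723 for this opcode, and the function name is illustrative.

```python
def vfpu_single_name(reg):
    """Name of a single VFPU register from its 7-bit number."""
    matrix = (reg >> 2) & 7
    column = reg & 3
    row = (reg >> 5) & 3
    return "S%d%d%d" % (matrix, column, row)

word = 0xD03C83FE
vd = word & 0x7F          # assumed position of the destination field
print(vd, vfpu_single_name(vd))
```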