Cell Simulator

NOT DONE YET

Mambo provides a simple way to start and stop profiling. This tutorial walks through the basics of producing profile data for a program. Before we do that, we need something to profile.

Mambo's samples include a matrix multiply program, but it is 1577 lines of code. The version titled "Simple Matrix Multiple" in the code section is only 167 lines. To keep this page short, download that file and we will begin editing it for simple profiling. (Or just get it here.)

One quick note about the program before we start profiling: the matrix multiply program first passes a "control block" from the PPU to the SPU. The control block contains the addresses of the matrices used in the calculation. Passing a control block at the beginning is a "standard" way of initializing the SPUs.

For profiling, there are three main functions:
prof_clear() clears the performance data
prof_start() starts profiling
prof_stop() stops profiling

For our example, first include <profile.h> in the SPU program. After the variable declarations, call prof_clear() and then prof_start(). At the end (just before the return 0;), call prof_stop().

Before you boot the Linux kernel in Mambo, you must make the SPUs profilable: click "SPU modes" and set every SPU to "Pipe". Next, boot the kernel and run the program. (You can let the simulator run in fast mode while the kernel boots, but make sure fast mode is off once you run the program.)

After the program completes, stop the simulator (click "Stop" or press Ctrl-C) and then type the following at the systemsim prompt:

systemsim % mysim spu 0 display statistics

This should print something like:
SPU DD3.0
***
Total Cycle count               550148
Total Instruction count         643
Total CPI                       855.60
***
Performance Cycle count         515157
Performance Instruction count   369836 (330894)
Performance CPI                 1.39 (1.56)

Branch instructions             37384
Branch taken                    36862
Branch not taken                522

Hint instructions               1026
Hint hit                        36855

Contention at LS between Load/Store and Prefetch 37384

Single cycle                                            195700 ( 38.0%)
Dual cycle                                               67597 ( 13.1%)
Nop cycle                                                 5121 (  1.0%)
Stall due to branch miss                                  8991 (  1.7%)
Stall due to prefetch miss                                   0 (  0.0%)
Stall due to dependency                                 234022 ( 45.4%)
Stall due to fp resource conflict                            0 (  0.0%)
Stall due to waiting for hint target                      1025 (  0.2%)
Stall due to dp pipeline                                     0 (  0.0%)
Channel stall cycle                                       2701 (  0.5%)
SPU Initialization cycle                                     0 (  0.0%)
-----------------------------------------------------------------------
Total cycle                                             515157 (100.0%)

Stall cycles due to dependency on each pipelines
 FX2         4115 (  1.8% of all dependency stalls)
 SHUF       79360 ( 33.9% of all dependency stalls)
 FX3            1 (  0.0% of all dependency stalls)
 LS        150546 ( 64.3% of all dependency stalls)
 BR             0 (  0.0% of all dependency stalls)
 SPR            0 (  0.0% of all dependency stalls)
 LNOP           0 (  0.0% of all dependency stalls)
 NOP            0 (  0.0% of all dependency stalls)
 FXB            0 (  0.0% of all dependency stalls)
 FP6            0 (  0.0% of all dependency stalls)
 FP7            0 (  0.0% of all dependency stalls)
 FPD            0 (  0.0% of all dependency stalls)

The number of used registers are 20, the used ratio is 15.62
dumped pipeline stats
systemsim % 
To see the statistics for a different SPU, replace the 0 in mysim spu 0 display statistics with that SPU's number. Note that prof_stop() only suspends profiling: calling prof_start() again resumes it, and prof_clear() resets the collected data.