Software optimization resources
Contents
- Optimization manuals
- Test programs for measuring clock cycles in C++ and assembly code
- Object file converter and disassembler
- Subroutine library
- Assembly macros for new instructions not supported by old assemblers
- Links
Optimization manuals
This series of five manuals describes everything you need to know about optimizing code for Intel and AMD microprocessors, including optimization advice for C++ and assembly language, details about the microarchitecture and instruction timings of Intel and AMD processors, and details about different compilers and calling conventions.
Intel microprocessors covered: Intel Pentium 1 through Pentium 4, Pentium D, Pentium M, Core Duo, Core 2, etc., but not Itanium. AMD microprocessors covered: Athlon 64, Opteron. Operating systems covered: DOS, Windows, Linux, BSD, Intel-based Mac OS X. Includes coverage of 64-bit systems.
Note that these manuals are not for beginners.
- 1. Optimizing software in C++: An optimization guide for Windows, Linux and Mac platforms
- This is an optimization manual for advanced C++ programmers.
Topics include: The choice of platform and operating system. Choice of
compiler and framework. Finding performance bottlenecks.
The efficiency of different C++ constructs. Multi-core systems.
Parallelization with vector operations. CPU dispatching. Efficient
container class templates. Etc.
File name: optimizing_cpp.pdf, size: 694426, last modified: 2008-Jan-14.
Download.
- 2. Optimizing subroutines in assembly language: An optimization guide for x86 platforms
- This is an optimization manual for advanced assembly language programmers
and compiler makers.
Topics include: C++ intrinsic functions, inline assembly and stand-alone assembly.
Linking optimized assembly subroutines into high level language programs.
Making subroutine libraries compatible with multiple compilers and operating systems.
Optimizing for speed or size. Memory access. Loops. Vector programming (XMM, SIMD).
CPU-specific optimization and CPU dispatching.
File name: optimizing_assembly.pdf, size: 685716, last modified: 2008-Jan-14.
Download.
- 3. The microarchitecture of Intel and AMD CPUs: An optimization guide for assembly programmers and compiler makers
- This manual contains details about the internal working of various microprocessors
from Intel and AMD. Topics include: Out-of-order execution, register renaming,
pipeline structure, execution unit organization and branch prediction algorithms
for each type of microprocessor. Describes many details that cannot be found
in manuals from microprocessor vendors or anywhere else. The information is
based on my own research and measurements rather than on official sources.
This information will be useful to programmers who want to make CPU-specific
optimizations as well as to compiler makers and students of microarchitecture.
File name: microarchitecture.pdf, size: 1073859, last modified: 2008-Jan-14.
Download.
- 4. Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel and AMD CPUs
- Contains detailed lists of instruction latencies, execution unit throughputs,
micro-operation breakdown and other details for all application instructions
of most microprocessors from Intel and AMD. Intended as an appendix to the
preceding manuals.
File name: instruction_tables.pdf, size: 708277, last modified: 2008-Jan-14.
Download.
- 5. Calling conventions for different C++ compilers and operating systems
- This document contains details about data representation,
function calling conventions, register usage conventions, name mangling schemes,
etc. for many different C++ compilers and operating systems. Discusses compatibilities
and incompatibilities between different C++ compilers. Includes information that
is not covered by the official Application Binary Interface standards (ABIs).
The information provided here is based on my own research and is therefore
descriptive rather than normative.
Intended as a source of reference for programmers who want to make function
libraries compatible with multiple compilers or operating systems and for
makers of compilers and other development tools who want their tools to be
compatible with existing tools.
File name: calling_conventions.pdf, size: 315622, last modified: 2008-Jan-14.
Download.
- All five manuals
- Download all the above manuals together in one zip file.
File name: optimization_manuals.zip, size: 2595969, last modified: 2008-Jan-14.
Download.
Test programs for measuring clock cycles and performance monitoring
These are some of the test programs I have used for my research. You can use them to test how many clock cycles a piece of assembly or C++ code takes. They can also count cache misses, branch mispredictions, resource stalls, etc. They support Intel processors from Pentium to Core 2 as well as AMD Athlon and Opteron. The package includes different versions for 16-, 32- and 64-bit mode, for Windows and Linux.
File name: testp.zip, size: 270536, last modified: 2007-Sep-23.
Download.
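As a rough illustration of the kind of measurement these test programs perform, the following C++ sketch reads a cycle counter before and after a piece of code and keeps the smallest difference over several runs. It is not taken from testp.zip; the names read_counter and min_ticks are made up for this example. It assumes an x86 compiler that provides the __rdtsc intrinsic, falls back to std::chrono::steady_clock elsewhere, and leaves out the serialization and performance-counter setup that careful measurements need.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
  #if defined(_MSC_VER)
    #include <intrin.h>      // __rdtsc on Microsoft compilers
  #else
    #include <x86intrin.h>   // __rdtsc on GCC and Clang
  #endif
  #define SKETCH_HAVE_TSC 1
#else
  #define SKETCH_HAVE_TSC 0
#endif

// Hypothetical helper: returns the time stamp counter on x86,
// or steady_clock ticks on other platforms (assumption of this sketch).
inline std::uint64_t read_counter() {
#if SKETCH_HAVE_TSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: runs `code` several times and returns the smallest
// counter difference observed. The result includes the overhead of reading
// the counter itself, which this sketch does not subtract.
template <typename F>
std::uint64_t min_ticks(F&& code, int repetitions = 10) {
    assert(repetitions > 0);
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < repetitions; ++i) {
        const std::uint64_t start = read_counter();
        code();
        const std::uint64_t stop = read_counter();
        best = std::min(best, stop - start);
    }
    return best;
}
```

Calling min_ticks with a lambda that wraps the code under test returns the smallest count seen; taking the minimum rather than the average reduces the influence of interrupts and cold caches.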
Object file converter
This utility converts object files between COFF/PE, OMF, ELF and Mach-O formats for all 32-bit and 64-bit x86 platforms. It can also be used as a cross-platform library manager, a dump utility, and a powerful disassembler supporting the SSE4 and SSE5 instruction sets. Source code is included (GPL).
File name: objconv.zip, size: 551217, last modified: 2008-Mar-24.
Download.
Subroutine library
This is a library of subroutines coded in assembly language. The functions in this library can be called from C++ and other compiled high-level languages. Different object file formats are supplied to support different compilers under the Windows, Linux, BSD and Mac OS X operating systems, in 32-bit and 64-bit versions. The library contains the following functions:
- int Round (double x); int Round (float x);
- Fast conversion of a floating point number to an integer.
Rounds to the nearest integer; a value exactly halfway between two integers is rounded to the even one.
- int ReadTSC (void);
- Reads internal microprocessor clock counter. Use this for measuring how
many clock cycles a piece of code takes.
- int InstructionSet (void);
- Gets information about which instruction sets are supported by the
microprocessor and the operating system, e.g. SSE2, SSE3, SSE4.
- void ProcessorName (char * text);
- Makes a zero-terminated text string with a short description of the
microprocessor name and type.
- void Serialize (void);
- Serializes execution.
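The rounding rule of Round can be tried out without the library. The sketch below reads "nearest or even" as round to nearest with ties going to the even integer, and uses std::lrint from the C++ standard library as a stand-in, on the assumption that the floating point rounding mode is left at its default. The name round_nearest_even is hypothetical, and the code says nothing about how the assembly version is implemented or how fast it is.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical portable stand-in for the rounding behaviour described above.
// std::lrint rounds according to the current rounding mode; the default mode
// is round-to-nearest with ties to even, which the assertion checks.
inline int round_nearest_even(double x) {
    assert(std::fegetround() == FE_TONEAREST);
    return static_cast<int>(std::lrint(x));
}

// Overload for float, mirroring the two declarations in the list above.
inline int round_nearest_even(float x) {
    assert(std::fegetround() == FE_TONEAREST);
    return static_cast<int>(std::lrint(x));
}
```

With this rule 2.5 rounds to 2 and 3.5 rounds to 4, which differs from the truncation that a plain C++ cast to int performs.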
The package contains library files in six different file formats, a C++ header file and assembly language source code. The GNU General Public License applies.
File name: asmlib.zip, size: 89078, last modified: 2007-Sep-23.
Download.
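A typical use of a function like InstructionSet is CPU dispatching: detect the supported instruction set once, then route calls to the best available version of a routine. The sketch below shows the pattern in C++. Everything in it is illustrative. detect_level is a hypothetical stand-in for the library call, and its return values are assumed for this example rather than taken from InstructionSet. It relies on the GCC/Clang builtin __builtin_cpu_supports on x86 and reports the generic level elsewhere. sum_unrolled merely stands in for a version tuned for a newer instruction set.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical detection routine. Assumed levels for this sketch only:
// 0 = generic code path, 1 = SSE2 or better is available.
inline int detect_level() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;
#endif
}

using SumFn = long long (*)(const int*, std::size_t);

// Baseline version that runs on any CPU.
long long sum_generic(const int* p, std::size_t n) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

// Placeholder for a CPU-specific version: four independent accumulators,
// followed by a loop over the remaining elements.
long long sum_unrolled(const int* p, std::size_t n) {
    long long a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += p[i];
        a1 += p[i + 1];
        a2 += p[i + 2];
        a3 += p[i + 3];
    }
    long long total = a0 + a1 + a2 + a3;
    for (; i < n; ++i) total += p[i];
    return total;
}

// Chooses an implementation based on the detected level.
SumFn select_sum() {
    return detect_level() >= 1 ? sum_unrolled : sum_generic;
}

// Public entry point: the selection runs once, on the first call,
// and later calls go straight through the stored function pointer.
long long sum(const int* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    static const SumFn impl = select_sum();
    return impl(p, n);
}
```

Storing the choice in a function pointer keeps the detection cost out of the hot path; callers only ever see sum and need not know which version runs.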
Macros for new instructions not supported by old assemblers
Use these macros to code new instructions on assemblers that do not support them. Covers the instruction sets from MMX through Supplemental SSE3 (SSSE3). Works with MASM, ML and TASM.
File name: macros.zip, size: 14667, last modified: 2006-Dec-27.
Download.
Useful assembly links
Masm Forum www.masmforum.com
ASM Community Messageboard www.asmcommunity.net/board/
Linux Assembly www.linuxassembly.org
Hutch's Assembly pages www.movsd.com
Iczelion's Win32 Assembly Homepage win32asm.cjb.net/
CPU-id tools and information www.cpuid.com
Programmer's heaven assembler zone Programmers' Heaven
X-bit Labs articles on microprocessors www.xbitlabs.com/articles/cpu/
Virtual sandpile x86 Processor information www.sandpile.org
Dr. Dobb's journal microprocessor resources www.x86.org
Online computer books www.computer-books.us/assembler.php
FASM assembler and messageboard flatassembler.net
NASM assembler sourceforge.net/projects/nasm
YASM assembler www.tortall.net/projects/yasm
Intel resources
Reference manuals and other documents can be found at Intel's web site. The site is reorganized so often that any link I could provide here to a specific document would be broken after a few months. I therefore recommend that you use the search facilities at developer.intel.com and search for "Software Developer's Manual" and "Optimization Reference Manual".
AMD resources
www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739,00.html
Microsoft resources
MASM manuals msdn.microsoft.com/library/default.asp?url=/library/en-us/vcmasm/html/vcoriMicrosoftAssemblerMacroLanguage.asp