python‎ > ‎

pyASM

pyasm User's Guide V. 0.3

by Grant Olson <kgo at grant-olson dot net>

Introduction

Pyasm is a full-featured dynamic assembler written entirely in Python. By dynamic, I mean that it can be used to generate and execute machine code in python at runtime without requiring the generation of object files and linkage. It essentially allow 'inline' assembly in python modules on x86 platforms.

Pyasm can also generate object files (for windows) like a traditional standalone assembler, although you're probably better off using one of the many freely available assemblers if this is you primary goal.

Installation

Pyasm currently requires python 2.6.

Linux Install:

Windows Install:

Hello World!

A simple Windows version of a hello_world.py program is as follows:

#
# Hello World in assembly: pyasm/examples/hello_World.py
#
#

from pyasm import pyasm

pyasm(globals(),r"""
!PROC hello_world PYTHON
!ARG self
!ARG args
!CALL PySys_WriteStdout "Hello, World!\n\0"
ADD ESP, 0x4
MOV EAX,PyNone
ADD [EAX],1
!ENDPROC
""")

hello_world()

A brief description of what is happening durring the pyasm call:

  1. the globals() statement tells pyasm where to bind newly created python functions

  2. The !CHARS directive creates a string constant.

  3. The !PROC and !ARG directives create a procedure that matches the standard CPythonFunction signature [PyObject* hello_world(PyObject* self, PyObject* args) and create procedure initialization code.

  4. The procedure calls python's PySys_WriteStdout function. Since python functions use CDECL calling conventions, we:

    1. PUSH the paramters onto the stack from right to left
    2. CALL the function
    3. Cleanup the stack ourselves
  5. PyCFunctions must return some sort of python object, so we:

    1. Load PyNone into the EAX register, which will become the return value.
    2. Add one to the reference count

6. The !ENDPROC directive ends the procedure and creates function cleanup code. This creates a procedure called hello_world that would have the C signature of PyObject* hello_world(PyObject* self, PyObject* args). The procedure loads hello_str onto the stack, calls the python interpreters PySys_WriteStdout function,

  1. Calling hello_world() executes the newly created function.

Warning

The rest of this document assumes that you know x86 assembly language. A tutorial is beyond the scope of this document. If you don't know assembly language, you'll want to read an introductory text (such as The Art of Assembly Language) as well as downloading Volumes 2 and 3 of the IA-32 Intel Architecture Software Developer's Manual for reference.

Everyday usage

Assembler Syntax

Like most assemblers, the command-line assembler contains a very simple parser. There two basic statements that can be used. An instruction statement and an assembler directive. Assembler directives contain information that makes your assembly a little easier to read than raw assembly code, such as the begining and ending of function; declaration of parameters, variables, constants and data; and other stuff. Instruction Statements consist of real assembly instructions such as MOV [EAX+4],value

Additional notes specific to this assembler are as follows:

  • Numbers use python's formatting scheme, so hex is represented as 0xFF and not FFh.
  • Instructions and Registers must be in all caps. mov eax,0x0 is invalid.

Instruction Statements

Instruction statements are reasonably straightforward if you know x86 assembly language.

Assembler Directives

Assembler directives begin with an exclamation mark, followed by the directive itself, and followed by any applicable parameters. Keep in mind that these directives are provided for the programmer's convienence. Anything that is done via a directive could be translated into raw assembly, it's just not as readable.

Text Directive API Call Brief Description
!CALL proc [arg arg] n/a Procedure call framework
!CHARS name value .AStr(n,v) Create a character array (aka a string)
!COMMENT text n/a Comment line.
!CONST name value .AC(n,v) Create a constant value.
!LABEL name .AIL(name) Provide a symbolic label for later ref.
!PROC name [type] .AP(name,type) Begin a procedure.
!ARG argname [size] .AA(name,size) Add an argument to a procedure def.
!LOCAL varname [size] .ALoc(name,size) Add a local var to a procedure def.
!ENDPROC .EP() End a procedure
!CALL proc [arg arg arg]
A convienence function for procedure calling. PUSHes arguments onto the stack from right to left and calls the appropriate procedure. Stack cleanup (if any) is still the programmer's responsibility.
!CHARS name value
Create a character array (aka a string)
!COMMENT text
Ignore this line.
!CONST name value
Just declares a constant that is replaced in subsequent occurances. Keep in mind that this is resolved at compile time, so the values should really only be numbers. !CONST hello_world "hello world\n\0" is invalid.
!LABEL name
Provide a symbolic label to the current memory address. Primarily used for loops, if-then logic, etc. You can use a label and hand-roll a procedure, but you probably want to use the !PROC directive instead.
!PROC name[type]
Begin a procedure. This will emit the boilerplate code to start a procedure. Arguments and Local variables can be declared with !ARG and !LOCAL directives listed below. These declarations must occur before any instruction statements or an error will occur. This will generate the boilerplate function startup code, which consists of PUSHing the EBP register, copying the current location of ESP, and translating arguments and local variables into references via the offset of the EBP pointer. If the previous sentence didn't make any sense to you, just remember that the EBP register shouldn't be manipulatedin your code here or things will get screwed up.
!ARG argname [size]
An argument passed to a procedure via the stack. By default, we assume the size is 4 bytes although you can specify if you need to.
!LOCAL varname [size]
A local variable maintained on the procedure's stack frame.
!ENDPROC
End a procedure. Emit the cleanup code as the caller's responsibility.]

Typical Usage

Typically, usage is as simple as the hello world example listed above. Import the pyasm function from the pyasm package and call it. globals blah blah blah.

Assembly via the API

calling pyasm is fine if you're just trying to inline some assembly functions, but if you're trying to dynamically generate assembly (such as writing a python compiler) you're better off accessing the api directly. This involves a few steps:

  1. import the assembler class from x86 asm and instantiate.
  2. Add instructions either as strings that need to be preprocessed or via the api.
  3. generate an intermediate 'codepackage' by calling the .Compile() method
  4. transform the codePackage to runtime memory via CpToMemory.

Command-line assembler

NOT IMPLEMENTED YET

If you really want to, a command-line asembler is available for usage. Usage is straightforward:

python pyassemble.py asmfile.asm

This will generate an object file asmfile.o that can be used by your linker.

Debugging Tips

If you write assembly, chances are that you are going to crash your app at one point or another.

Linux Debuggers

On Linux, you obviously have gdb.

Windows debuggers

Contrary to popular belief, there is a buildin command-line debugger on Windows NT/2000/XP called ntsd.exe that can be used in a bind. If you're doing any serious work though, do yourself a favor and download the 18MB "Debugging Tools for Windows." It includes an updated version of ntsd.exe and a version with a simple Windows interface called WinDBG. You'll really want to download this if you're getting serious about assembly debugging. Actual usage is beyond the scope of this document, but read up on setting up a symbol server.

Tip

After installing, you may want to register WinDBG as the default debugger by cd'ing to the program directory and issuing 'windbg -I' This will cause WinDBG to spawn automatically when any program crashes or executes an INT 3 instruction. It also has the added benefit of making friends and co-workers think that you're a much more hardcore programmer than you really are. The jury is still out as to whether this impresses the ladies or not.

And yes, there is the Visual Studio .NET debugger. This is a great debugger when you're debugging C or VB code in an existing project. But it is designed to work as part of an IDE. It gets a little wierd when debugging raw assembly or compiled code without the source floating around. As ugly as WinDBG's gui looks like by todays standards, it is a lot more convienent in these cases.

Source output - *not implemented yet

I plan to provide a hook via the logging module so you can obtain disassembly of the source at runtime.

Feel like contributing?

Any and all patches will be considered. If you're planning on implenting anything serious you may want to run it by me so you don't end up wasting your time. There is some low-hanging fruit out there though.

Adding x86 instructions

I haven't added all of the x86 instructions yet. Most of it involves cutting and pasting from the IA32 Intel Software Architecture Manual Volumes 2 and 3. For standard instructions, you should just be able to add the appropriate text to x86inst.py and creating a test in test_instructions.py. SIMD and FPU operations will probably require some additional hacking.

ELF serialization/deserialization

There is currently code that converts windows COFF objects to a python-based object model and vice versa. This allows you to create standard object files for traditional linking. An equivilent for ELF files would allow you to do the same thing in Linux. Refer to the coff*.py files to see how this format was implemented.

New in version 0.3

  • You can now run the test cases via mingw as well as msvc. Set the command in test/linkCmd.py appropraitely. Thanks to Markus Lall for figuring out how to do this.
  • Updated to python 2.6.
  • Updated MSVC project files to VC 2008.
  • Python structure values are loaded automatically if desired. For Example, assuming EAX is a pointer to a string MOV [EAX+PyString_ob_sval],0x42424242 will change the first four letters of the string to B's.
  • Preliminary debugging console to view generation of assembly at various stages in the compilation pipeline.
  • Implicit string variable creation is now possible. e.g. "PUSH 'foon0'" now works instead of requiring "!CHARS foo 'foon0'" and "PUSH foo"
  • New !CALL assembler directive handles throwing arguements onto the stack. e.g. "!CALL foo bar baz bot" instead of "PUSH bot" "PUSH baz" "PUSH bar" "CALL foo"
  • Fixed tokenizer for instruction definitions with numbers in them such as INT 3
  • Now includes an 'examples' directory that should be easier for users to read than the test directory.
  • Show symbol name in disassembly if it exists.

Attachments (6)

  • pyasm-0.3.tar.gz - on Mar 28, 2010 1:01 PM by Grant Olson (version 1)
    41k Download
  • pyasm-0.3.tar.gz.sig - on Mar 28, 2010 1:01 PM by Grant Olson (version 1)
    1k Download
  • pyasm-0.3.win32-py2.6.exe.zip - on Mar 28, 2010 1:03 PM by Grant Olson (version 1)
    160k Download
  • pyasm-0.3.win32-py2.6.exe.zip.sig - on Mar 28, 2010 1:04 PM by Grant Olson (version 1)
    1k Download
  • pyasm-0.3.zip - on Mar 28, 2010 1:02 PM by Grant Olson (version 1)
    54k Download
  • pyasm-0.3.zip.sig - on Mar 28, 2010 1:02 PM by Grant Olson (version 1)
    1k Download