pyasm User's Guide V. 0.3
by
Grant Olson <kgo at grant-olson dot net>
Introduction
Pyasm is a full-featured dynamic assembler written entirely in Python. By
dynamic, I mean that it can be used to generate and execute machine code in
Python at runtime without requiring the generation of object files and
linkage. It essentially allows 'inline' assembly in Python modules on x86
platforms.
Pyasm can also generate object files (for Windows) like a traditional
standalone assembler, although you're probably better off using one of the
many freely available assemblers if this is your primary goal.
Installation
Pyasm currently requires python 2.6.
Linux Install:
Windows Install:
Hello World!
A simple Windows version of a hello_world.py program is as follows:
#
# Hello World in assembly: pyasm/examples/hello_World.py
#
from pyasm import pyasm
pyasm(globals(),r"""
    !PROC hello_world PYTHON
    !ARG self
    !ARG args
        !CALL PySys_WriteStdout "Hello, World!\n\0"
        ADD ESP, 0x4
        MOV EAX,PyNone
        ADD [EAX],1
    !ENDPROC
    """)
hello_world()
A brief description of what is happening during the pyasm call:
- The globals() argument tells pyasm where to bind newly created Python
  functions.
- The string literal passed to !CALL is turned into a string constant, just
  as if it had been declared with a !CHARS directive.
- The !PROC and !ARG directives create a procedure that matches the
  standard PyCFunction signature, PyObject* hello_world(PyObject* self,
  PyObject* args), and create the procedure initialization code.
- The procedure calls Python's PySys_WriteStdout function. Since Python
  functions use the CDECL calling convention, we:
  - PUSH the parameters onto the stack from right to left (handled here by
    !CALL)
  - CALL the function
  - Clean up the stack ourselves
- PyCFunctions must return some sort of Python object, so we:
  - Load PyNone into the EAX register, which will become the return value
  - Add one to its reference count
- The !ENDPROC directive ends the procedure and creates the function
  cleanup code. The result is a procedure called hello_world that pushes
  the string onto the stack, calls the Python interpreter's
  PySys_WriteStdout function, cleans up the stack, and returns None.
- Calling hello_world() executes the newly created function.
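For readers who want to see the same C API call without any assembly, here is a rough Python-level equivalent using ctypes. It is only an illustration of what the generated procedure does, not what pyasm produces: it assumes CPython with the standard ctypes module, and it lets ctypes supply the terminating NUL byte that the assembly version spells out as \0.

```python
import ctypes


def hello_world():
    # ctypes.pythonapi exposes the C API of the running CPython interpreter.
    write_stdout = ctypes.pythonapi.PySys_WriteStdout
    write_stdout.argtypes = [ctypes.c_char_p]  # the format string
    write_stdout.restype = None                # the C function returns void
    write_stdout(b"Hello, World!\n")
    # The assembly version loads PyNone into EAX; here we simply return None.
    return None


hello_world()
```

The reference count bump has no counterpart here because the Python runtime manages the reference to None for us.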
Warning
The rest of this document assumes that you know x86 assembly language. A
tutorial is beyond the scope of this document. If you don't know assembly
language, you'll want to read an introductory text (such as The Art of
Assembly Language) and download Volumes 2 and 3 of the IA-32 Intel
Architecture Software Developer's Manual for reference.
Everyday usage
Assembler Syntax
Like most assemblers, the command-line assembler contains a very simple
parser. There are two basic kinds of statement: instruction statements and
assembler directives. Assembler directives contain information that makes
your assembly a little easier to read than raw assembly code, such as the
beginning and end of a function; declarations of parameters, variables,
constants and data; and so on. Instruction statements consist of real
assembly instructions such as MOV [EAX+4],value
Additional notes specific to this assembler are as follows:
- Numbers use python's formatting scheme, so hex is represented as
0xFF and not FFh.
- Instructions and Registers must be in all caps. mov eax,0x0 is
invalid.
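The two notes above can be sketched in a few lines of Python. The helper names and the register list below are hypothetical and are not part of pyasm; the point is only that Python's own int() accepts 0xFF and rejects FFh, and that a case check flags lowercase mnemonics and registers.

```python
import re

# Hypothetical register list for illustration; pyasm's own table may differ.
REGISTERS = {"EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "EBP", "ESP"}


def parse_number(token):
    """Parse a numeric literal the way Python does (prefix selects the base)."""
    return int(token, 0)


def uses_valid_case(statement):
    """Return True if the mnemonic and any register names are upper case."""
    mnemonic, _, operands = statement.strip().partition(" ")
    if not mnemonic.isupper():
        return False
    for word in re.findall(r"[A-Za-z_]\w*", operands):
        if word.upper() in REGISTERS and word != word.upper():
            return False
    return True
```

With these helpers, parse_number("0xFF") succeeds while parse_number("FFh") raises ValueError, and uses_valid_case("mov eax,0x0") is False.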
Instruction Statements
Instruction statements are reasonably straightforward if you know x86
assembly
language.
Assembler Directives
Assembler directives begin with an exclamation mark, followed by the
directive itself and then any applicable parameters. Keep in mind that
these directives are provided for the programmer's convenience. Anything
that is done via a directive could be translated into raw assembly; it's
just not as readable.
Text Directive | API Call | Brief Description
!CALL proc [arg arg] | n/a | Procedure call framework
!CHARS name value | .AStr(n,v) | Create a character array (aka a string)
!COMMENT text | n/a | Comment line
!CONST name value | .AC(n,v) | Create a constant value
!LABEL name | .AIL(name) | Provide a symbolic label for later reference
!PROC name [type] | .AP(name,type) | Begin a procedure
!ARG argname [size] | .AA(name,size) | Add an argument to a procedure definition
!LOCAL varname [size] | .ALoc(name,size) | Add a local variable to a procedure definition
!ENDPROC | .EP() | End a procedure
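To make the !CALL directive concrete, here is a minimal sketch of the expansion it stands for. The functions are hypothetical helpers, not pyasm's implementation; they handle only simple space-separated tokens (no quoted strings) and assume every argument takes the default 4 bytes.

```python
ARG_SIZE = 4  # bytes per stack argument (assumed default size)


def expand_call(directive):
    """Expand '!CALL proc arg ...' into the PUSH/CALL statements it replaces."""
    tokens = directive.split()
    if len(tokens) < 2 or tokens[0] != "!CALL":
        raise ValueError("expected '!CALL proc [arg ...]'")
    proc, args = tokens[1], tokens[2:]
    # CDECL pushes arguments from right to left.
    pushes = ["PUSH " + arg for arg in reversed(args)]
    return pushes + ["CALL " + proc]


def cdecl_cleanup(arg_count):
    """Statement the caller issues afterwards to pop its own arguments."""
    return "ADD ESP, 0x{:X}".format(arg_count * ARG_SIZE)
```

For example, expand_call("!CALL foo bar baz bot") yields PUSH bot, PUSH baz, PUSH bar, CALL foo, and cdecl_cleanup(1) gives the ADD ESP, 0x4 seen in the hello world example.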
- !CALL proc [arg arg arg]
  A convenience directive for procedure calling. PUSHes the arguments onto
  the stack from right to left and calls the appropriate procedure. Stack
  cleanup (if any) is still the programmer's responsibility.
- !CHARS name value
  Create a character array (aka a string).
- !COMMENT text
  Ignore this line.
- !CONST name value
  Declare a constant that is substituted into subsequent occurrences. Keep
  in mind that this is resolved at compile time, so the values should
  really only be numbers. !CONST hello_world "hello world\n\0" is invalid.
- !LABEL name
  Provide a symbolic label for the current memory address. Primarily used
  for loops, if-then logic, etc. You can use a label and hand-roll a
  procedure, but you probably want to use the !PROC directive instead.
- !PROC name [type]
  Begin a procedure. Arguments and local variables can be declared with the
  !ARG and !LOCAL directives listed below. These declarations must occur
  before any instruction statements or an error will occur. The directive
  emits the boilerplate function startup code, which consists of PUSHing
  the EBP register, copying the current location of ESP, and translating
  arguments and local variables into references via offsets from the EBP
  pointer. If the previous sentence didn't make any sense to you, just
  remember that the EBP register shouldn't be manipulated in your code here
  or things will get screwed up.
- !ARG argname [size]
  An argument passed to a procedure via the stack. By default, we assume
  the size is 4 bytes, although you can specify a size if you need to.
- !LOCAL varname [size]
  A local variable maintained on the procedure's stack frame.
- !ENDPROC
  End a procedure and emit the cleanup code. (Cleaning the arguments off
  the stack is the caller's responsibility.)
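The startup code described under !PROC follows the conventional x86 stack frame. The sketch below shows that conventional layout in Python; it is an assumption about the general technique, not a description of pyasm's internals, and the function name is hypothetical. After PUSH EBP and MOV EBP,ESP, the saved EBP sits at [EBP] and the return address at [EBP+4], so arguments start at [EBP+8] and locals grow downward from [EBP-4].

```python
# Conventional function prologue: save the old frame pointer, start a new one.
PROLOGUE = ["PUSH EBP", "MOV EBP,ESP"]


def frame_offsets(args, local_vars, size=4):
    """Map argument and local names to EBP-relative operands."""
    offsets = {}
    position = 8  # skip the saved EBP and the return address
    for name in args:
        offsets[name] = "[EBP+{}]".format(position)
        position += size
    position = 0
    for name in local_vars:
        position -= size
        offsets[name] = "[EBP{}]".format(position)
    return offsets
```

For a procedure declared with !ARG self, !ARG args and one 4-byte !LOCAL, this layout would place self at [EBP+8], args at [EBP+12], and the local at [EBP-4]. This is also why clobbering EBP inside the procedure breaks every argument and local reference at once.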
Typical Usage
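Typically, usage is as simple as the hello world example listed above.
Import the pyasm function from the pyasm package and call it, passing
globals() as the first argument so that pyasm knows where to bind the newly
created functions, and the assembly source as the second.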
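The role of the globals() argument can be illustrated without pyasm at all. In this sketch, bind_function is a hypothetical stand-in for the binding step: it stores a callable in whatever namespace dictionary it is handed, which is what makes hello_world() callable by name after the pyasm call.

```python
def bind_function(scope, name, func):
    """Store func under name in the given namespace dictionary."""
    scope[name] = func


# An ordinary dict stands in for the module namespace here; passing
# globals() instead would make the name available at module level.
scope = {}
bind_function(scope, "hello_world", lambda: "Hello, World!")
```

After this runs, scope["hello_world"]() returns the greeting, just as a module-level hello_world() would if globals() had been passed.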
Assembly via the API
Calling pyasm is fine if you're just trying to inline some assembly
functions, but if you're trying to dynamically generate assembly (such as
writing a Python compiler) you're better off accessing the API directly.
This involves a few steps:
- Import the assembler class from x86asm and instantiate it.
- Add instructions, either as strings that need to be preprocessed or via
  the API.
- Generate an intermediate 'codePackage' by calling the .Compile() method.
- Transform the codePackage to runtime memory via CpToMemory.
Command-line assembler
NOT IMPLEMENTED YET
If you really want to, a command-line assembler is available for use. Usage
is straightforward:
python pyassemble.py asmfile.asm
This will generate an object file asmfile.o that can be used by your
linker.
Debugging Tips
If you write assembly, chances are that you are going to crash your
app at one
point or another.
Linux Debuggers
On Linux, you obviously have gdb.
Windows Debuggers
Contrary to popular belief, there is a built-in command-line debugger on
Windows NT/2000/XP called ntsd.exe that can be used in a bind. If you're
doing any serious assembly debugging though, do yourself a favor and
download the 18MB "Debugging Tools for Windows." It includes an updated
version of ntsd.exe and a version with a simple Windows interface called
WinDBG. Actual usage is beyond the scope of this document, but read up on
setting up a symbol server.
Tip
After installing, you may want to register WinDBG as the default debugger
by cd'ing to the program directory and issuing 'windbg -I'. This will cause
WinDBG to spawn automatically when any program crashes or executes an INT 3
instruction. It also has the added benefit of making friends and co-workers
think that you're a much more hardcore programmer than you really are. The
jury is still out as to whether this impresses the ladies or not.
And yes, there is the Visual Studio .NET debugger. This is a great debugger
when you're debugging C or VB code in an existing project, but it is
designed to work as part of an IDE. It gets a little weird when debugging
raw assembly or compiled code without the source floating around. As ugly
as WinDBG's GUI looks by today's standards, it is a lot more convenient in
these cases.
Source output (not implemented yet)
I plan to provide a hook via the logging module so you can obtain
disassembly
of the source at runtime.
Feel like contributing?
Any and all patches will be considered. If you're planning on implementing
anything serious you may want to run it by me so you don't end up wasting
your time. There is some low-hanging fruit out there though.
Adding x86 instructions
I haven't added all of the x86 instructions yet. Most of it involves
cutting and pasting from Volumes 2 and 3 of the IA-32 Intel Architecture
Software Developer's Manual. For standard instructions, you should just be
able to add the appropriate text to x86inst.py and create a test in
test_instructions.py. SIMD and FPU operations will probably require some
additional hacking.
ELF serialization/deserialization
There is currently code that converts Windows COFF objects to a
Python-based object model and vice versa. This allows you to create
standard object files for traditional linking. An equivalent for ELF files
would allow you to do the same thing on Linux. Refer to the coff*.py files
to see how this format was implemented.
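As a possible starting point for anyone picking this up, the sketch below reads the identification bytes at the start of an ELF file using only the standard struct module. It is not modelled on the coff*.py code; the function name and the returned dictionary are assumptions made for illustration, and a real implementation would go on to parse the rest of the header and the section tables.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def parse_elf_ident(data):
    """Decode the class, byte order and version from the 16-byte e_ident."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data, ei_version = struct.unpack_from("BBB", data, 4)
    return {
        "bits": {1: 32, 2: 64}.get(ei_class),                # EI_CLASS
        "byte_order": {1: "little", 2: "big"}.get(ei_data),  # EI_DATA
        "version": ei_version,                               # EI_VERSION
    }
```

A 32-bit x86 object file would be expected to report 32 bits and little-endian byte order.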
New in version 0.3
- You can now run the test cases via mingw as well as msvc. Set the command
  in test/linkCmd.py appropriately. Thanks to Markus Lall for figuring out
  how to do this.
- Updated to Python 2.6.
- Updated MSVC project files to VC 2008.
- Python structure values are loaded automatically if desired. For example,
  assuming EAX is a pointer to a string,
  MOV [EAX+PyString_ob_sval],0x42424242
  will change the first four letters of the string to B's.
- Preliminary debugging console to view generation of assembly at various
  stages in the compilation pipeline.
- Implicit string variable creation is now possible, e.g. "PUSH 'foo\n\0'"
  now works instead of requiring "!CHARS foo 'foo\n\0'" and "PUSH foo".
- New !CALL assembler directive handles throwing arguments onto the stack,
  e.g. "!CALL foo bar baz bot" instead of "PUSH bot" "PUSH baz" "PUSH bar"
  "CALL foo".
- Fixed tokenizer for instruction definitions with numbers in them, such as
  INT 3.
- Now includes an 'examples' directory that should be easier for users to
  read than the test directory.
- Show symbol name in disassembly if it exists.
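The PyString_ob_sval item above relies on 0x42 being the ASCII code for 'B'. A quick check with the standard struct module shows how a 32-bit immediate turns into four bytes once it is stored in memory; the helper name is hypothetical.

```python
import struct


def dword_bytes(value):
    """Return the four bytes an x86 MOV stores for a 32-bit immediate."""
    return struct.pack("<I", value)  # "<I" = little-endian unsigned 32-bit
```

dword_bytes(0x42424242) is b"BBBB". Because x86 is little-endian, a value with distinct bytes is stored lowest byte first, so dword_bytes(0x44434241) is b"ABCD".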