Using ABC Assembler

Most of the code in ABC is written in the ABC language. However, there are several cases in which it is advantageous or necessary to code in ABC assembler.

This manual assumes that you have a good understanding of programming, assemblers, compilers, interpreters and stack-based execution. Any message you implement incorrectly can corrupt an environment or crash the ABC system, so use the assembler with caution.

The following sections describe the processes and elements to compile methods and run them using the interpreter.

The Assembly Process

The statements for a method are compiled into bytecodes which are executed by a stack-based interpreter. The function of the compiler is to concatenate the bytecodes for each message into one bytecode stream for the whole method. For non-ASM methods, the message is converted into bytecodes which will call the appropriate method at runtime. However, for ASM methods, the ASM code you supply is used to replace the message call and no call is made at runtime.
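The difference between the two cases can be sketched in C. This is only an illustration: the opcode byte values, the Stream type and the emit helpers are hypothetical, and the assumption that a call is encoded as PUM n followed by SND is inferred from the opcode descriptions rather than stated by the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcode byte values; the real encodings are not given here. */
enum { OP_PUM = 0x01, OP_SND = 0x02, OP_IST = 0x03 };

/* One bytecode stream for a whole method. */
typedef struct {
    unsigned char code[64];
    size_t        len;
} Stream;

/* Append the bytecodes for one message to the method's stream. */
static void emit(Stream *s, const unsigned char *bytes, size_t n)
{
    assert(s->len + n <= sizeof s->code);
    memcpy(s->code + s->len, bytes, n);
    s->len += n;
}

/* Non-ASM message (assumed encoding): method lookup, then a send at runtime. */
static void emit_call(Stream *s, unsigned char typeref)
{
    unsigned char call[3] = { OP_PUM, typeref, OP_SND };
    emit(s, call, sizeof call);
}

/* ASM message: the supplied bytecodes replace the call; no send is emitted. */
static void emit_asm(Stream *s, const unsigned char *body, size_t n)
{
    emit(s, body, n);
}
```

Calling emit_call and then emit_asm on the same Stream leaves the call sequence followed directly by the inlined ASM bytes, which is the concatenation described above.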

The Interpreter

The interpreter uses a stack of pointers for its stack space. When an ABC session begins, the execution stack is initialized with the contents shown below. Note that the stack pushes and pops pointers to objects, not the objects themselves. Any variable accessible to a method is located somewhere on this stack.

	Position	Object 
	>10		Computation Space
	10		myTask
	9		unused
	8		Command Line (argv)
	7		Event Task
	6		Array of Types
	5		Tail of task queue
	4		system
	3		exception
	2		widgets
	1		master
	0		register

When a task is initiated and its methods are invoked, the stack space above myTask is allocated and released as needed. If this task terminates or must relinquish control of the interpreter, the stack contents for the current task are saved and the contents for the new task are restored.

The interpreter primarily executes opcodes which push and pop object pointers on the stack. In some cases, it will also call external C routines to conduct window operations, complex computations or other operations. It will also call on the Taskmaster to manage the task queue and determine the next ready task to execute.
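A fetch-and-execute loop of this kind can be sketched in C as follows. The sketch is not the ABC interpreter: the opcode byte values are hypothetical, Booleans are held as plain ints rather than as object pointers, and only a handful of opcodes are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte values for a few of the opcodes described later. */
enum { OP_PUT = 1, OP_PUF = 2, OP_BLN = 3, OP_BLA = 4, OP_RET = 5 };
enum { STACK_MAX = 16 };

/* Executes code until RET and returns the value left on top of the stack. */
static int run(const unsigned char *code)
{
    int    stack[STACK_MAX];
    int    sp = 0;              /* number of items on the stack */
    size_t pc = 0;              /* index of the next opcode */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUT:            /* push TRUE */
            assert(sp < STACK_MAX);
            stack[sp++] = 1;
            break;
        case OP_PUF:            /* push FALSE */
            assert(sp < STACK_MAX);
            stack[sp++] = 0;
            break;
        case OP_BLN:            /* unary: replace top with its negation */
            assert(sp >= 1);
            stack[sp - 1] = !stack[sp - 1];
            break;
        case OP_BLA:            /* binary: pop two, push their AND */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = stack[sp - 1] && stack[sp];
            break;
        case OP_RET:            /* top of stack is the result */
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

Running the sequence PUT, PUF, BLN, BLA, RET through this loop computes TRUE AND (NOT FALSE).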

Assembler Format

A section of assembly code consists of opcodes and associated parameters, label names beginning with dollar signs, assembler directives like %load and %store, or nested code surrounded by curly brackets. See below for an example.

    whileDo aBlock:Block
       ASM 
       { { START.
           BRF END.
           %load aBlock.
           %load me.
           BRU START.
           END.
         }.
         PUF.
       }

The layout of the lines is not significant, but each opcode must be terminated by a period. Branches to labels are normally resolved within a nested code section, although branches from inner to outer code sections are permitted.

The compiler concatenates the ASM code you define to code which pushes the receiver on the stack. Assembler code representing the parameters may be included in the assembler code by use of the %load and %store directives. Each specifies the name of a parameter or the receiver me. The %load directive can also be used to load constants of the basic types as shown below.

    asString
      ASM {
        %load "%d".
        IST.
      }

Integers and reals can be specified in a similar manner. If you want a bitstring, precede the double-quoted string with a percent sign. Make sure that your ASM code leaves a result on the stack representing the REPLY object of the message.

Object Format

The interpreter opcodes manipulate pointers to objects. Pointers are defined in C as follows:

 
    typedef unsigned int utype;

    typedef union {
        int32    Value; 
    #if LSBORDER
        struct
        { utype  IsaPtr    : 1; 
          utype  Type      : 3;     
          utype  Marked    : 1; 
          utype  Loc       :27;     
        } Ptr; 
    #else 
        struct 
        { utype  Type      : 3;	
          utype  Marked    : 1;     
          utype  Loc       :27;     
          utype  IsaPtr    : 1; 
        } Ptr; 
    #endif 
    } DPointer; 

The pointer encodes the Type (Boolean, String, Real, Bit or Pointer), Marked (used by the garbage collector), Loc (a pointer to where the object resides) and IsaPtr, which is 1 if the pointer is actually a pointer and 0 if it is an integer. The constant LSBORDER compensates for compilers that lay out the fields of a structure in a different order.

All integers are right-shifted by 1 when stored and IsaPtr is set to 0. Thus integers do not take up additional memory, although their magnitude is reduced by half from what is possible in the hardware. Booleans are also encoded in the pointer with IsaPtr as 1, Type as Boolean and Loc as 1 if TRUE and 0 if FALSE. NIL is encoded in the pointer with IsaPtr as 1, Type as Pointer and Loc as 0. This encoding speeds computation and reduces storage requirements since integers and booleans can be accessed directly and require no additional storage. All other types of objects have actual storage allocated for their data.
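The integer encoding can be illustrated with explicit masks instead of bit-fields. This sketch assumes that the IsaPtr flag occupies the least significant bit of the 32-bit word, so the value is moved up by one bit when stored and back down when read; the actual shift direction depends on where the compiler places IsaPtr. The names int_to_word and word_to_int are hypothetical and are not the system's InttoVal and ValtoInt macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the IsaPtr flag is the least significant bit of the word. */
#define ISAPTR_BIT 0x1u

/* Store an integer: the value takes the other 31 bits and the flag stays 0. */
static uint32_t int_to_word(int32_t i)
{
    return (uint32_t)i << 1;
}

/* Recover the integer from a word whose IsaPtr flag is 0. */
static int32_t word_to_int(uint32_t w)
{
    assert((w & ISAPTR_BIT) == 0);
    /* Divide rather than shift so that negative values round-trip. */
    return (int32_t)w / 2;
}
```

Because one bit is given to the flag, only values that fit in 31 bits survive the round trip, which is the halving of magnitude mentioned above.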

An object is stored in a contiguous area consisting of a four-byte header and the data for the object. The header consists of Size and Type fields and some flags used by the system for bookkeeping, recovering unused objects and copying objects. The data will be a null-terminated string for strings, a non-null-terminated string for Bits (eight bits to a byte), four bytes for a real, or a vector of pointers for all other object types. The first pointer of a pointer object is a pointer to its type.

Accessing Objects in C

Macros and functions have been defined to read and write objects to the stack. While this process is simple for the basic types, it becomes a little more complex for other types of objects. Recall that objects are actually implemented as a vector of pointers to the parts. Pointer 0 points to the type of the object and the remaining pointers point to the parts of the object. For example, the parts of Dictionary are defined as

    PARTS 
    { entries:Array
      tally:Integer
    }

Suppose that you CALL an external procedure with the first parameter being a dictionary. You could retrieve the tally with the following C code:

    DPointer *dict,*ptr;
    int tally;

    dict = ptrparm(1); /* get dictionary */
    ptr = GetPtr(dict,2); /* tally is the second part */
    tally = ValtoInt(ptr);  /* convert to integer */

Note that ptrparm and GetPtr are macros which return the address of a DPointer, an actual location in memory. Don't assume that these pointers will stay valid for long, since the page they are on may be paged out during subsequent pointer accesses. Use them immediately and refresh them later if necessary by executing GetPtr again. Also note that ptr must be converted to an integer with the ValtoInt macro. Other macros include GetStr(p) and GetReal(p). You can test if a Boolean pointer is true with IsTrue(p) and determine if a pointer is NIL with IsNIL(p). Finally, you can convert an integer to a pointer with InttoVal(i). Don't nest the macros.

Assembler Opcodes

Each opcode is a two- or three-letter mnemonic which may or may not have a parameter, depending on the opcode. The opcodes are divided into categories depending on their use.

Opcodes which are designated as binary opcodes (binary) will pop the top two pointers, perform the operation and push the result. Normally, the operands are pushed in the order they are encountered (e.g. 3 + 4 would have 4 as the top and 3 below it on the stack). Unary opcodes replace the top with the result of the operation.
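The operand order matters for operations that are not commutative. The C sketch below models the stack with plain ints instead of object pointers and assumes that a binary opcode such as ISB treats top-1 as the left operand and top as the right operand, which follows from the push order described above. The Stack type and the helper names are hypothetical.

```c
#include <assert.h>

enum { STACK_MAX = 16 };

typedef struct {
    int slot[STACK_MAX];
    int top;                    /* number of items on the stack */
} Stack;

static void push(Stack *s, int v)
{
    assert(s->top < STACK_MAX);
    s->slot[s->top++] = v;
}

static int pop(Stack *s)
{
    assert(s->top > 0);
    return s->slot[--s->top];
}

/* Binary opcode: pop the right operand (top), then the left, push the result. */
static void op_isb(Stack *s)
{
    int right = pop(s);
    int left  = pop(s);
    push(s, left - right);
}

/* Unary opcode: the top is replaced with the result. */
static void op_ing(Stack *s)
{
    push(s, -pop(s));
}
```

Under this assumption, 7 - 2 is evaluated by pushing 7, pushing 2 and executing ISB, which leaves 5 as the only item on the stack.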

Message Opcodes

HLT

Terminates an ABC session. This opcode copies all pages from the temporary environment to the permanent environment and closes the environment.

RET

Returns from executing a method. The top of the stack will contain the object to return from the method.

SND

Executes (sends) a method. See PUM for stack contents.

EXT

Calls an external method. The stack contents are the same as for SND.

SIG

Signals an error. The stack top is a string which is the reason for the error.

RSG

Resignals a previous error.

TSK

Creates a suspended task from the parameters and method on the stack and pushes the new task object on the stack.

EXE

Transfers execution to top (a task).

NXT

Pushes next ready task.

TRM

Terminates top (a task).

EXP

Exports an object (top) to a file (top-1) in object text format.

IMP

Read an object definition from a file (top) and push the newly created object.

TWT

Suspends task (top - 1) to sleep for n (top) milliseconds.

SYS

Executes the C system(s) call where s is a string on the top of the stack.

TIM b

Provides current date if b is true or current time if b is false.

DBG b

Recompiles a method to insert debug code if b is true or remove debug code if b is false.

NOP

Debugger NOP opcode.

BRK

Suspends execution of task and notifies debugger task.

DSN

Performs a debugger send.

Push Opcodes

PUG n

Pushes the nth global variable. "n" is taken from the next byte in the bytecode and is relative to the stack bottom.

PUP n

Pushes the nth typeref of a method. If the typeref already refers to the proper type, the appropriate part is returned. If not, a lookup is performed to find the part.

PUM n

Pushes the nth typeref of a method. If the typeref already refers to the proper type, the appropriate method is returned. If not, a lookup is performed to find the method. A stack frame is ordered as follows:

    top    locals and temps
           old me offset
           old pc
           method
           n
           parameter n
           ... 
           parameter 1
           me
           previous method temporaries

The execution of PUM will push the previous method's pc and the offset of the old receiver. The pc will then be set to the beginning of the new method and NILs will be pushed on the stack for any local variables in the method.
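The frame construction can be sketched in C. This is a simplified model, not the interpreter's code: stack slots are plain ints, NIL is represented by 0, and the Machine type, the enter_method helper and the point at which the receiver offset is updated are all assumptions made for illustration.

```c
#include <assert.h>

enum { STACK_MAX = 32, NIL = 0 };

typedef struct {
    int slot[STACK_MAX];
    int sp;                     /* index of the next free slot */
    int pc;                     /* current program counter */
    int me;                     /* stack offset of the current receiver */
} Machine;

static void push(Machine *m, int v)
{
    assert(m->sp < STACK_MAX);
    m->slot[m->sp++] = v;
}

/* The caller has already pushed me, the parameters, n and the method.
   Save the old pc and receiver offset, then reserve NILs for the locals. */
static void enter_method(Machine *m, int nparams, int entry_pc, int nlocals)
{
    int i;
    int new_me = m->sp - (nparams + 3);  /* me lies below params, n, method */

    push(m, m->pc);             /* old pc */
    push(m, m->me);             /* old me offset */
    m->pc = entry_pc;
    m->me = new_me;
    for (i = 0; i < nlocals; i++)
        push(m, NIL);           /* locals start out as NIL */
}
```

Reading the resulting slots from the receiver upward gives the same order as the frame diagram read from bottom to top.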

PAR n

Like PUM, but starts the method search in the parent of the receiver.

PPS n

Pushes the nth part of the receiver on the stack.

PUN n

Pushes n NILs on the stack.

PUV n

Pushes the nth variable on the stack. "n" will be negative if a parameter and positive if a local variable.

PUX

Pushes the nth (top) element of object (top-1). "object" may be a string or list of pointers.

PUI

Pushes an integer encoded in the next four bytes of the bytecode.

PU

Pushes a copy of top.

PUB

Pushes the nth (top) bit of object (top-1). "object" is of type Bits.

PUT

Push TRUE.

PUF

Push FALSE.

PUA

Pushes the register A (global variable 0).

PUE

Pushes the current exception reason.

PUL

Pushes a list of objects onto the stack.

PDL

Pushes a dynamic list of objects onto the stack.

PUS string

Pushes the literal "string" on the stack.

PUR real

Pushes an immediate real on the stack.

PBS bitstring

Pushes an immediate bitstring on the stack.

PTY type

Pushes an immediate type on the stack.

Pop Opcodes

PO

Pop and discard the top.

Store Opcodes

The store opcodes copy one or more items from the stack into an object. Note that most store opcodes do not pop the top element of the stack when it is stored.

STG n

Store the top in the nth global variable.

STP n

Stores top in part specified by the nth typeref of method.

SPS n

Stores top in part n of receiver.

STV n

Stores top in variable (parameter or local) n.

STX

Stores value (top) in the nth (top-1) element of object (top-2). "object" may be a string or list of pointers. After STX completes, "value" and "n" will be popped and "object" will be left.

STA

Stores top in register A.

STB

Stores value (top) in the nth (top-1) element of object (top-2). Value should be a Boolean and object should be a Bits. After STB completes, "value" and "n" will be popped and "object" will be left.

STL n

Forms the top n objects into a list.

Object Opcodes

NEW

Creates a new object based on the type which is on the top of the stack. The top is replaced with the new object.

GTP

Replaces the name on top with the type which has that name.

TP

Replaces top with the type of top.

SZ

Replaces top with its size. Integer, Boolean and Real have size 1. String will have a size which is the number of characters, Bits will have a size which is the number of bytes needed to represent the bitstring, and all other objects will have a size representing the number of pointers, excluding the type pointer.
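The size rules can be summarised as a small C function. The Kind enumeration and the sz helper are hypothetical; the count argument stands for the number of characters of a String, the number of bits of a Bits, or the total number of pointers (including the type pointer) of any other object.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { K_INTEGER, K_BOOLEAN, K_REAL, K_STRING, K_BITS, K_OTHER } Kind;

/* Returns the size that SZ would report under the rules described above. */
static size_t sz(Kind kind, size_t count)
{
    switch (kind) {
    case K_INTEGER:
    case K_BOOLEAN:
    case K_REAL:
        return 1;                   /* basic scalar types have size 1 */
    case K_STRING:
        return count;               /* number of characters */
    case K_BITS:
        return (count + 7) / 8;     /* bytes needed to hold the bits */
    case K_OTHER:
        assert(count >= 1);
        return count - 1;           /* pointers, excluding the type pointer */
    }
    return 0;
}
```

For example, a Dictionary holds a type pointer plus the entries and tally parts, so its size under these rules is 2.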

GR

Shallow copies object (top-1) by n (top) units. The meaning of the units is based on the type of the object. See SZ for details. Both object and n are popped and the new object is pushed.

CL

Makes a deep copy of the top, replacing it on the stack.

DEF

Returns TRUE if top is defined (not NIL), FALSE otherwise.

Bit Opcodes

BTA

Bit AND. (binary)

BTO

Bit OR. (binary)

BTN

Bit NOT. (unary)

BTR

Bit right shift. Shift the bitstring (top-1) to the right n (top) bits.

BTL

Bit left shift. Shift the bitstring (top-1) to the left n (top) bits.

BTE

Bit EXOR. (binary)

BTQ

Bit EQUAL. (binary)

BTC

The number of ON bits in bit string replaces top.

BTS

Converts top to string. It is actually a byte-encoded bitstring.

BTI

Converts top to integer. This is only valid for bitstrings which have fewer than 32 bits.

Boolean Opcodes

BLA

Boolean AND. (binary)

BLO

Boolean OR. (binary)

BLN

Boolean NOT. (unary)

Branch Opcodes

Branches use a relative offset which is encoded in the next two bytes of the bytecode.
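Decoding such an offset might look like the C sketch below. Several details are assumptions, since they are not specified here: the offset is taken to be a signed 16-bit value, stored high byte first, and measured from the byte that follows the two offset bytes. The branch_target helper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* pc is the index of the branch opcode; the offset follows in two bytes. */
static size_t branch_target(const unsigned char *code, size_t pc)
{
    /* Assumed byte order: high byte first. */
    uint16_t raw = (uint16_t)((code[pc + 1] << 8) | code[pc + 2]);

    /* Assumed signed: values of 0x8000 and above are backward branches. */
    int32_t offset = raw >= 0x8000u ? (int32_t)raw - 0x10000 : (int32_t)raw;

    /* Assumed base: the byte after the opcode and its two offset bytes. */
    int64_t target = (int64_t)pc + 3 + offset;

    assert(target >= 0);
    return (size_t)target;
}
```

A backward branch, such as the BRU START at the end of a loop, would then carry a negative offset.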

BRT

Branch if top is TRUE. Pop the top.

BRF

Branch if top is FALSE. Pop the top.

BRU

Branch unconditionally.

File Opcodes

A total of 10 files may be open at one time.

FOP

Opens a file based on mode (top; "r", "w" or "a"), name (top-1) and file object (top-2). Pops these three elements and pushes the file index.

FRE

Read file based on maximum characters to copy (top) and file id (top-1).

FWR

Writes string (top) to file id (top-1).

FCL

Close fileid (top).

FDL

Deletes a file based on name (top).

FEX

Returns TRUE if file (top) exists.

Integer Opcodes

Integer Comparisons

The integer comparisons are binary operations which compare top and top-1 and push a Boolean result: IEQ, INE, ILT, ILE, IGT, IGE.

IAD

Integer add. (binary)

ISB

Integer subtract. (binary)

IML

Integer multiply. (binary)

IDV

Integer divide. (binary)

IAB

Integer absolute value. (unary)

ING

Integer negation. (unary)

IMD

Integer modulus. (binary)

IRE

Converts integer to real. (unary)

IST

Converts top to a string based on format in top-1 (C formatting).

IIN

Increment top.

IDC

Decrement top.

ITB

Convert integer to bitstring. (unary)

Real Opcodes

Real Comparisons

The real comparisons are binary operations which compare top and top-1 and push a Boolean result: REQ, RNE, RLT, RLE, RGT, RGE.

RAD

Real addition. (binary)

RSB

Real subtraction. (binary)

RML

Real multiplication. (binary)

RDV

Real division. (binary)

RAB

Real absolute. (unary)

RNG

Real negation. (unary)

RXP

Real exponentiation. (binary)

RIN

Converts real to integer (truncates). (unary)

RST

Converts top to string based on format top-1.

String Opcodes

String Comparisons

The string comparisons are binary operations which compare top and top-1 and push a Boolean result: SEQ, SNE, SLT, SLE, SGT, SGE.

SCT n

Concatenates n strings on stack and pushes string. "n" is found in the next byte in the bytecode. The strings to concatenate are ordered so that lowest on stack is leftmost and highest is rightmost.
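The ordering rule can be shown with a short C sketch. The sct function is hypothetical and works on ordinary C strings rather than on string objects; element 0 of the array stands for the string lowest on the stack and element n-1 for the string on top.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Joins n strings, lowest stack entry first. The caller frees the result. */
static char *sct(const char *const *strings, size_t n)
{
    size_t total = 0;
    size_t i;
    char  *out;

    for (i = 0; i < n; i++)
        total += strlen(strings[i]);

    out = malloc(total + 1);
    assert(out != NULL);
    out[0] = '\0';

    for (i = 0; i < n; i++)
        strcat(out, strings[i]);    /* lowest is leftmost, top is rightmost */

    return out;
}
```

With "ab" lowest, "cd" in the middle and "ef" on top, the result is "abcdef".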

SSB

Make a substring of obj (top-2) starting at i(top-1) for n (top) characters and push on the stack.

SFN

Finds the position of source (top) in pattern (top-1). Returns 0 if not found.

SSP

Finds the first occurrence of an element of a set (top) of characters in given string (top-1).

SUP

Converts a string to upper case.

SLO

Converts a string to lower case.

SCN

Centers a string (top-1) in a field of width (top) characters.

SIN

Converts a string to an integer.

SRE

Converts a string to a real.

SPR

Prints a string to the console. The string is not popped.

SIP

Pushes a string from the console.

SAS

Assembles a string into bytecodes.

SCP

Capitalizes a string.

SPX

Finds prefix(top-2) using string prefix(top-1) starting at position(top).

SST

Formats string(top) according to format(top-1).

Trigonometric and Math Functions

RAC (arccos), RAS (arcsin), RAT (arctan), RCS (cos), RSN (sine), RTN (tan), RCH (cosh), RSH (sinh), RTH (tanh), RLG (log2), RLX (log10), RSQ (sqrt), REX (exp)

Miscellaneous Opcodes

AL1,AL2,AL3

NOP byte alignment for 1,2 or 3 bytes.

BST

Break/step opcode.

BCP

Copy a block for later execution.

BRN

Run a block.

REP

Reply to a message.

CKP

Checkpoint the environment.

BFK

Fork a block.

IVC

Invalidate the message and parts cache.

SYM

Get an environment symbol.

LLC

Get start of line locations for debugger.

SSC

String scan for character in pattern.

ABT

Abort a session.

CMP

Compress a checkpoint file during execution.

QUO

Change the message quota of a running task.

ASM File Format

A special text format is used to represent objects. The table below shows the symbols used to represent the various objects. Spacing and formatting are not normally significant.

Type	Example		Format
Integer	234		optional sign,digits
Real	3.141		optional sign, optional digits, decimal, digits
String	"test"		characters surrounded by double quotes
Bits	%"\3"		string preceded by percent
Boolean	T or F		F(false) or T(true)
Others	(...)		Objects surrounded by parentheses
NIL	N		NIL pointer
Type	@Real		@ followed by name
Opcodes	{...}		Opcode code format surrounded by curly brackets
Dupl.	*		asterisk
Dup ref	$33		dollar sign followed by number

Most objects, except the basic types, contain other objects and will be surrounded by parentheses. The first pointer of such an object will be a type constant. An example of a method is shown below.

  ( @Method 5 2 
    { BRU EXC.###PPS 2.REP.#
      PUV 0.RET.EXC.RSG.} 
    "size;" "Integer" 
    "size\n
     REPLY Integer\n
     { REPLY tally.\n
     }\n"
    4 
    (@Array)
    (@Array)
    @Dictionary F 
  )

It has parts consisting of the Method type pointer, starting PC (5), scope (2), bytecodes (the opcodes between the brackets), the method name (size;), the reply type (Integer), the source code, count of local variables (4), array of variables (empty), array of references (empty), type in which the method is defined (Dictionary) and a Boolean indicating whether the method is a type method (F). The format for the bytecodes is described in the section on the Assembler format.

In some cases, two objects may need to refer to the same object. If so, the first object to refer to the object will define it preceded by an asterisk. Subsequent objects will refer to it with the reference format. The number refers to the nth asterisk which was encountered from the beginning of the object. Duplicates are normally generated by the export function.