

Scheduling

There is a little complication: Pipelined and superscalar processors, i.e., RISCs and some modern CISCs, can execute independent instructions while waiting for the results of an earlier instruction. The compiler usually reorders (schedules) the instructions so that these delay slots are put to good use. However, on our first tries the compiler did not do well at scheduling the primitives. E.g., for + implemented as

n=sp[0]+sp[1];
sp++;
sp[0]=n;
NEXT;

the NEXT comes strictly after the other code, i.e., there is hardly any opportunity for scheduling. After a little thought the problem becomes clear: The compiler cannot know that sp and ip point to different addresses (and the version of gcc we used would not know it even if it were possible), so it could not move the load of the cfa (the code field address of the next word) above the store to the TOS. Indeed the pointers could be the same, if code on or very near the top of the stack were executed. In the interest of speed we chose to forbid this probably unused "feature" and helped the compiler with scheduling: NEXT is divided into a loading part (NEXT_P1) and a goto part (NEXT_P2). + now looks like:

n=sp[0]+sp[1];
sp++;
NEXT_P1;
sp[0]=n;
NEXT_P2;

This can be scheduled optimally by the compiler: the load of the cfa in NEXT_P1 no longer has to come after the store to the TOS.
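To make the mechanism concrete, here is a minimal, self-contained sketch of a threaded-code inner interpreter in GNU C (using computed gotos). The names engine and prims, and the direct-threaded scheme, are assumptions for illustration and are not taken from the actual engine source; the sketch only shows how splitting NEXT lets the fetch of the next code address be scheduled above the store to the stack.

#include <stdio.h>

typedef long Cell;
typedef void *Label;            /* target of a GNU C computed goto */

#define NEXT_P1 (cfa = *ip++)   /* loading part: fetch the next code address */
#define NEXT_P2 goto *cfa       /* goto part: dispatch to it */
#define NEXT    do { NEXT_P1; NEXT_P2; } while (0)

/* Called with ip == NULL, it just returns the primitives' code addresses
   (GNU C "labels as values"); otherwise it executes the threaded code at ip. */
Label *engine(Label *ip, Cell *sp)
{
  static Label prims[] = { &&plus, &&bye };
  Label cfa;
  Cell n;

  if (ip == NULL)
    return prims;
  NEXT;

 plus:                          /* the + primitive from the text */
  n = sp[0] + sp[1];
  sp++;
  NEXT_P1;                      /* the load can now move above the store */
  sp[0] = n;
  NEXT_P2;

 bye:
  printf("result: %ld\n", sp[0]);
  return NULL;
}

int main(void)
{
  Cell stack[8] = { 3, 4 };              /* sp[0]=3 (TOS), sp[1]=4 */
  Label *prims = engine(NULL, NULL);
  Label code[] = { prims[0], prims[1] }; /* + then bye */
  engine(code, stack);                   /* prints "result: 7" */
  return 0;
}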

This division can be turned off with the switch -DCISC_NEXT. This switch is on by default on machines that do not profit from scheduling (e.g., the 80386), in order to preserve registers: the split tends to keep the cfa live longer, which increases register pressure.
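A plausible way to implement such a switch, in the style of the sketch above (the macro bodies are assumptions, not quoted from the source):

#ifdef CISC_NEXT
#define NEXT_P1                            /* empty: nothing is loaded early */
#define NEXT_P2 do { cfa = *ip++; goto *cfa; } while (0)
#else
#define NEXT_P1 (cfa = *ip++)              /* load early so the compiler can schedule it */
#define NEXT_P2 goto *cfa
#endif
#define NEXT do { NEXT_P1; NEXT_P2; } while (0)

With CISC_NEXT defined, the cfa is loaded and used immediately in NEXT_P2, so no register has to hold it across the primitive's code.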

