kf90 - compiler driver used with kapf to optimize application performance
kf90 [ switches ] filename
The kf90 command invokes
a series of language translators for the compilation of Fortran programs.
For each input file, the translators are executed in the following order
by default:
* KAP high-level optimizer
* compiler
* linker
Execution of one or more of the translators may be inhibited.
Switches which apply to the preprocessor and high-level optimizer are
recognized and passed accordingly. Unrecognized switches are passed to
the compiler only. Linker switches are passed to the linker. Files with
unrecognized extensions will be passed to the compiler untouched.
·- To compile and run your program on a symmetric multiprocessor system,
use the -concurrentize switch, abbreviated -conc, as follows:
    kf90 -fkapargs='-conc' myprog.f90
When you use the -conc switch with kf90, the
driver sets the compiler's -automatic switch by default to ensure correct
execution of a multithreaded program. Do not override the -automatic switch.
The switches that kf90 -fkapargs='-conc' sets are explained below:
-automatic
tells the compiler to allocate local variables on the run-time stack.
-fast sets the compiler optimization level to -O4, along with the other
individual compiler switches that -fast encompasses.
-lkmp_osf or -lkmp_osfp10
chooses the Digital UNIX version-specific KAP parallel processing library.
-lkmp_osfp10 tells the linker to use the KAP parallel processing library
for Digital UNIX Version 4.0. -lkmp_osf tells the linker to use the KAP
parallel processing library for Digital UNIX Version 3.2. The parallel
processing library provides an interface to DECthreads.
-threads or -pthread
chooses the Digital UNIX version-specific DECthreads library. -pthread tells
the linker to use the Digital UNIX Version 4.0 libpthread DECthreads library
when linking the program. -threads tells the linker to include the Digital
UNIX Version 3.2 libpthreads DECthreads library when linking the program.
-call_shared tells the linker to link against the DECthreads shared
library. Digital recommends that you use the -call_shared default; however,
in the case where you want to link the image to the DECthreads archive
library libpthread.a, use the -non_shared switch and one of the Digital
UNIX version-specific procedures explained in the KAP Fortran 90 for Digital
UNIX User Guide.
-tune host tells the compiler to optimize for the architecture
of the host processor.
·- To execute your program on a symmetric multiprocessor
system, set the following three environment variables at run time:
    setenv PARALLEL <integer>
    setenv KMP_STACKSIZE <integer>
    setenv KMP_SPINLOCKS <on/off>
·- In setenv PARALLEL <integer>, replace <integer> with the number of processors
on the system, for example, setenv PARALLEL 4. If you do not set the PARALLEL
environment variable, KAP defaults to executing with 2 threads.
To avoid performance degradation, do not specify more parallel threads than
there are processors.
In setenv KMP_STACKSIZE <integer>, replace <integer> with the stack size in bytes,
for example, setenv KMP_STACKSIZE 100000. KMP_STACKSIZE should be at least as large
as the largest stack size given in the annotated listing. The default is
1 megabyte.
KMP_SPINLOCKS sets the synchronization mechanism. The default
setting, on, causes KAP to use spinlock synchronization. The off setting
causes KAP to use mutex synchronization. An example is setenv KMP_SPINLOCKS
off.
·- KAP also implements the X3H5 standard produced initially by the
Parallel Computing Forum (PCF) with a set of C*KAP* directives. For information
about these directives refer to the KAP Fortran 90 for Digital UNIX User
Guide.
·- To compile a program with the default KAP settings, use the following
command:
    kf90 myprog.f90
kf90 uses the KAP preprocessor on myprog.f90, compiles the result with the
DEC Fortran 90 compiler and linker switches -fast, -non_shared, and -tune host,
links with the linker, and produces the following files:
1. myprog.cmp.f90 - the optimized source file
2. myprog.out - the annotated source file showing default KAP settings
3. a.out - the executable file
·- To pass one or more KAP command switches
to the KAP preprocessor, use the -fkapargs switch. For example, the following
command optimizes and compiles a program using KAP switches for general
optimization:
    kf90 -fkapargs='-optimize=5 -roundoff=3 -scalaropt=3 \
    -list=myprog_annotated.lis' myprog.f90
The following files result:
1. myprog.cmp.f90 - the optimized source file
2. myprog_annotated.lis - the annotated source file renamed with the -list switch
3. a.out - the executable file
·- You can specify Fortran compiler switches and KAP switches on the same
line. For example, to optimize a program with KAP switches and specify
the name of the executable file with the DEC Fortran compiler switch
-o, use the command:
    kf90 -fkapargs='-optimize=5 -roundoff=3 -scalaropt=3' \
    -o myprog.exe myprog.f90
The following files result:
1. myprog.cmp.f90 - the optimized source file
2. myprog.out - the annotated source file
3. myprog.exe - the executable file renamed with the -o switch
·- The kf90 command specifies the Digital Fortran compiler switch -fast by default.
You can override any of the individual compiler switches encompassed by
-fast by specifying them on the kf90 command line. For example, the following
command sets the compiler switch -math_library accurate and overrides
the default -math_library noaccurate set by -fast:
    kf90 -math_library accurate myprog.f90
·- The kf90 command specifies the DEC Fortran compiler
switch -tune=host by default. The -tune=host switch causes the compiler
to optimize for the host architecture. If you want to optimize for the ev5
architecture but are compiling on an ev4 system, override the default
setting of the -tune switch as follows:
    kf90 -tune=ev5 myprog.f90
·- The kf90 command uses the linker switch -non_shared by default. The -non_shared
switch causes the image to be linked with archive libraries instead of
with shared libraries. To override the -non_shared default, specify -call_shared
on the command line, for example:
    kf90 -call_shared myprog.f90
·- The kf90 command accepts either Fortran 90 or Fortran 77 source input.
kf90 assumes by default that source files with a file extension of .f90
are free format and source files with a file extension of .f, .for or .FOR
are fixed format. You can override these defaults by setting the -format
switch on either the KAP preprocessor or the DEC Fortran 90 compiler.
See Table 2-1 in the KAP for DEC Fortran 90 User Guide for combinations
of KAP and compiler format switch settings and the resulting assumption
made about the format of the source file.
- -[no]fkap[='Fortran_kap_path']
- Default: -fkap='/usr/bin/kapf'
This switch inhibits or causes the execution of the KAP Fortran high-level
optimizer, and lets you specify an alternative path.
- -fkapargs='kap_option_string'
- This switch passes switches to the KAP Fortran high-level optimizer.
- -[no]f90[='Fortran_compiler_path']
- Default: -f90='/usr/bin/f90'
This switch inhibits or causes the execution of the Fortran compiler, and
lets you specify an alternative path.
- -fext=Fortran_file_extension_string
- Default: -fext=f
Treat files with the indicated extension as Fortran source files.
- -v
- Print the commands invoking passes as they execute.
This switch is also passed to the compiler.
- -tmpdir=temporary_directory_path_string
- Default: -tmpdir=/tmp/
This is the directory in which temporary files are placed.
This switch may also be set by the environment variable TMPDIR.
- -sif[={kap}]
- -S
- Default: off
Save intermediate files. Specifying -sif is equivalent
to -sif=kap. Specifying -S is equivalent to -sif=kap and passing -S to
the compiler, which saves the assembly language output. Intermediate file
naming conventions are as follows:
K<file>.f - KAP Fortran output file
file.f90 - input Fortran file
file.out - output KAP listing file
file.o - output object file
The path and switch strings shown above must be enclosed in single or double
quotes if they contain white space characters.
- -ag=<list>
- Long name: -aggressive=<list>
Default value: -nag
- -nag=<list>
- Long name: -noaggressive=<list>
-aggressive=a means that kapf90
will pad COMMON blocks in an attempt to avoid cache line collisions. This
assumes the following:
  · All COMMON blocks will be visible to kapf90 in the course of processing
    the entire file.
  · If the same COMMON block has two different layouts, these two layouts are
    fully independent and do not pass values between each other.
- -arclm=<integer>
- Long name: -arclimit=<integer>
Default value: -arclm=5000
The arclimit switch is used to increase
the size of the dependence arc data structure that kapf90 uses to perform
data dependence analysis. This data structure is dynamically allocated
on a loop-nest by loop-nest basis.
The formula used to estimate the number of dependence arcs for a given
loop nest is:
    dependence_array_size = max(#_of_statements * 4, arclimit value)
This is an estimate because kapf90 assumes that each statement has, in the
worst case, 4 dependence arcs.
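The two worked cases below apply this formula with hypothetical statement counts and the default arclimit of 5000. For a small loop nest, the arclimit value determines the size. For a large nest, the estimate of 4 arcs per statement dominates.

```latex
\begin{aligned}
\text{dependence\_array\_size} &= \max(4 \times \text{statements},\ \text{arclimit}) \\
\text{100 statements, arclimit} = 5000:\quad &\max(400,\ 5000) = 5000 \\
\text{2000 statements, arclimit} = 5000:\quad &\max(8000,\ 5000) = 8000
\end{aligned}
```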
- -a=<list>
- Long name: -assume=<list>
Default value: -a=cel
- -nas
- Long name: -noassume
<list> can contain the following characters:
- a
- Allow multiple aliasing
- b
- Allow array bounds violation
- c
- Constant arguments are assigned to temporaries in procedure and function calls
- e
- Equivalenced variables do not refer to the same memory location inside one
DO loop nest
- l
- Last value assignments are necessary
To disable all the above assumptions, give -noassume on the command line.
- -chl=<integer>[,<integer>]
- Long name: -cacheline=<integer>[,<integer>]
Default value: -chl=32,32
The
cacheline switch informs kapf90 of the width of the memory channel
in bytes between cache and main memory. Cacheline can take a second argument.
When two arguments are specified, the first argument gives the width
of the memory channel between the primary cache and the secondary cache,
and the second argument gives the width of the memory channel between
the secondary cache and main memory. Omitting the second argument, or
specifying it as 32 (the default), instructs KAP not to optimize secondary
cache usage.
- -cplc=<integer>
- Long name: -cache_prefetch_line_count=<integer>
Default value: -cplc=0
The cache_prefetch_line_count switch gives the
number of additional lines prefetched into the cache during a cache miss.
- -chs=<integer>[,<integer>]
- Long name: -cachesize=<integer>[,<integer>]
Default
value: -chs=8,0
The cachesize switch informs kapf90 of the size in
kilobytes of the cache memory.
When two arguments are specified, the
first argument gives the size of the primary cache, and the second argument
gives the size of the secondary cache. Omitting the second argument, or
specifying it as 0 (the default), tells KAP not to optimize secondary
cache usage.
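The default values -chs=8 and -chl=32 can be interpreted with a worked example. This example assumes that 1 kilobyte is 1024 bytes and that DOUBLE PRECISION elements occupy 8 bytes; neither assumption is stated above.

```latex
\begin{aligned}
\text{cache lines} &= \frac{8 \times 1024\ \text{bytes}}{32\ \text{bytes per line}} = 256 \\
\text{DOUBLE PRECISION values per line} &= \frac{32\ \text{bytes}}{8\ \text{bytes}} = 4 \\
\text{DOUBLE PRECISION values in cache} &= 256 \times 4 = 1024
\end{aligned}
```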
- -cmp[=<file>]
- Short name: -cmp[=<file>]
- -nocmp
- Short name: -ncmp
Default value: <file>.cmp.f90, <file>.cmp.f
The -cmp switch causes KAP
to save the optimized source program under the file name of your choice.
The kf90 default names the optimized source <file>.cmp.f90 when the source
file extension is .f90. If the source file has a file extension of
.f, .for, or .FOR, the default is to name the optimized source <file>.cmp.f.
The kapf90 default is to name the optimized source program <file>.cmp,
regardless of the input file extension. Because the Fortran 90 compiler
will not process a file with the default .cmp extension, you should override
the default. For example, use the -cmp switch in the kapf90 command
line to rename the optimized source <file>.cmp.f90.
Both kf90 and kapf90
place the optimized source file in the current directory. To disable
generation of the optimized Fortran output file, enter -nocmp on the command
line.
- -conc
- Long name: -concurrentize
Default value: -noconc
- -noconc
- Long name: -noconcurrentize
The concurrentize switch directs KAP to
restructure the source code for parallel processing.
Setting -noconcurrentize
disables parallel execution and allows all serial optimizations to take
place. You can enable and disable parallel execution on a module by module
basis using KAP directives or on a loop by loop basis using KAP assertions.
Programs containing many loops which require synchronization or programs
that have loops with small iteration counts may run more slowly when parallelized.
In these cases you should disable parallel execution.
- -cp=<list>
- Long name: -cmpoptions=<list>
Default value: -cp=n
- -ncp
- Long name: -nocmpoptions
The cmpoptions switch specifies optional additional information or formatting
for inclusion in the transformed code file, <file>.cmp. <list> can contain
the following characters:
- i
- Insert special line numbers that reference the original code
- n
- Create transformed code from internal data structures
Specifying -cmpoptions=n instructs kapf90
to create the transformed code from its internal data structures. Specifying
-nocmpoptions instructs kapf90 to use lines from the source file,
where feasible. Generating the code from the internal data structures provides
consistent indentation and formatting. However, it also introduces new labels
and other changes relative to the source code, which may make it more difficult
to relate the source code to the transformed code.
Special line numbers are # line comments which may
appear in the transformed program file in order to reference line numbers
of the original code. The line in the transformed code that immediately
follows a # line comment is either the transformed version of the line
in the original code that is referenced, or a line which kapf90 inserted
before the referenced line. The name of the source file from the command
line is included in the form it had on the kapf90 command line.
- -[n]ds
- Long name: -[no]datasave
Default value: -ds
The datasave switch
instructs kapf90 to treat local variables in a subroutine or function
which appear in DATA statements as if they were also in SAVE statements.
That is, their values will be retained between invocations of the subroutine
or function. This is the practice of many commercial Fortran compilers.
This choice affects certain optimizations performed by kapf90. The nodatasave
switch complies with the Fortran-77 standard.
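The following Fortran sketch illustrates the -datasave assumption. The module and function names are hypothetical. The local variable NCALLS is initialized in a DATA statement, and the code relies on it keeping its value between calls, as if it also appeared in a SAVE statement. In Fortran 90 and later, initializing a variable in a DATA statement also gives it the SAVE attribute. FORTRAN 77 did not guarantee this behavior.

```fortran
! Hypothetical illustration of the -datasave assumption.
module datasave_demo
  implicit none
contains
  integer function count_calls()
    integer :: ncalls
    data ncalls /0/
    ! Relies on NCALLS retaining its value from the previous invocation,
    ! which is what -datasave tells kapf90 to assume.
    ncalls = ncalls + 1
    count_calls = ncalls
  end function count_calls
end module datasave_demo
```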
- -dr=<list>
- Long name: -directives=<list>
Default value: -dr=ak
- -ndr
- Long name: -nodirectives
The directives
switch controls which directives are accepted by kapf90. <list> can contain
the following characters:
- a
- kapf90 assertions are accepted
- k
- kapf90
!*$* or *$* directives are accepted
- v
- VAST CVD$ directives are accepted
Setting
-nodirectives disables the acceptance of all directives.
- -[n]dl
- Long name: -[no]dlines
Default value: -ndl
The dlines switch causes a D in column 1 to be treated as a blank. The rest
of that line is then parsed as a normal Fortran statement. By default, kapf90
treats these lines as comments. This switch is useful for the inclusion or
exclusion of debugging lines.
- -dpr=<integer>
- Long name: -dpregisters=<integer>
Default
value: -dpr=32
The dpregisters switch specifies the number of DOUBLE
PRECISION registers each processor has.
- -eiifg=<integer>
- Long name: -each_invariant_if_growth=<integer>
Default value: -eiifg=20
When a loop contains an IF statement whose
condition does not change from one iteration to another (that is, the
condition is loop invariant), the same test must be repeated for every
iteration. The code can often be made more efficient by floating the IF
outside the loop and putting the THEN and ELSE sections into their own loops.
This gets more complicated when there is other code in the loop, since a copy
of it must be included in both the THEN and ELSE loops. The total amount of
additional code generated in a program unit through invariant IF floating can
be limited with the max_invariant_if_growth switch.
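The following Fortran sketch shows the transformation described above, which is often called loop unswitching. It is a hand-written illustration, not actual kapf90 output, and the module and subroutine names are hypothetical. The statement A(I) = A(I) + 1.0 plays the role of the other code in the loop. It must be copied into both the THEN loop and the ELSE loop, and that copying is the code growth that the switches above limit.

```fortran
! Hand-written sketch of invariant-IF floating; names are hypothetical.
module invariant_if_demo
  implicit none
contains
  ! Before: FLAG is tested on every iteration although it never changes.
  subroutine if_inside_loop(a, b, n, flag)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: b(n)
    logical, intent(in) :: flag
    integer :: i
    do i = 1, n
      a(i) = a(i) + 1.0            ! other code in the loop
      if (flag) then
        a(i) = a(i) * b(i)
      else
        a(i) = a(i) - b(i)
      end if
    end do
  end subroutine if_inside_loop

  ! After: the IF is floated outside the loop, and the other code is
  ! copied into both the THEN loop and the ELSE loop.
  subroutine if_floated(a, b, n, flag)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: b(n)
    logical, intent(in) :: flag
    integer :: i
    if (flag) then
      do i = 1, n
        a(i) = a(i) + 1.0
        a(i) = a(i) * b(i)
      end do
    else
      do i = 1, n
        a(i) = a(i) + 1.0
        a(i) = a(i) - b(i)
      end do
    end if
  end subroutine if_floated
end module invariant_if_demo
```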
- -escape
- Long name: -[no]escape
Default
value: -escape
The -escape switch causes KAP to scan escape characters
in input lines.
- -fpr=<integer>
- Long name: -fpregisters=<integer>
Default
value: -fpr=32
The fpregisters switch specifies the number of single-precision
(ordinary floating-point) registers each processor has.
- -ff
- Long name: -freeformat
Default value: -nff
The freeformat
switch removes the standard column restrictions for Fortran source code.
For example, source lines can be up to 132 columns long and can use an ampersand
(&) at the end of the line to indicate continuation. See the Fortran Language
Reference manual for more information.
Setting -freeformat=f90 allows
KAP to accept Fortran 90 conventions and extensions. Continuation lines
are indicated with an ampersand (&) as the first character of the continuation
line.
The -freeformat switch is off by default, and the usual fixed-form
conventions apply. For example, source lines are truncated after column 72
unless you specify the DEC Fortran 90 flag -extend_source. A character
(except a zero or a blank) in column 6 indicates a continuation line.
- -fuse
- Long name: -fuse
Default value: -nofuse
The fuse switch tells KAP
to perform loop fusion. Loop fusion is a conventional compiler optimization
that transforms two adjacent loops into a single loop. Data dependence
tests allow fusion of more loops than standard techniques allow. Before
KAP can perform loop fusion, you must specify the switch -scalaropt=2
or -optimize=5.
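The following hand-written sketch shows loop fusion. The names are hypothetical, and the code is not actual KAP output. Fusing the two loops is legal because iteration I of the second loop reads only B(I), which iteration I of the first loop has already computed. Establishing that kind of fact is the job of the data dependence tests mentioned above.

```fortran
! Hand-written sketch of loop fusion; names are hypothetical.
module fusion_demo
  implicit none
contains
  ! Before: two adjacent loops over the same index range.
  subroutine two_loops(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in) :: a(n)
    real, intent(out) :: b(n), c(n)
    integer :: i
    do i = 1, n
      b(i) = 2.0 * a(i)
    end do
    do i = 1, n
      c(i) = b(i) + a(i)
    end do
  end subroutine two_loops

  ! After: a single loop. The second statement uses only B(I),
  ! which the first statement has computed in the same iteration.
  subroutine fused_loop(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in) :: a(n)
    real, intent(out) :: b(n), c(n)
    integer :: i
    do i = 1, n
      b(i) = 2.0 * a(i)
      c(i) = b(i) + a(i)
    end do
  end subroutine fused_loop
end module fusion_demo
```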
- -fuselevel
- Long name: -fuselevel
Default value: -fuselevel=0
The fuselevel=1 switch causes KAP to attempt loop fusion after making
additional passes through the source program to gather information about
data dependencies. To activate fuselevel=1, you must also use the fuse
switch.
The default is fuselevel=0. The effect of fuselevel=0 is equivalent
to setting the fuse switch.
- -generateh
- Default value: off
KAP automatically
sets the -generateh switch for you. Digital recommends that you do not
set the -generateh switch.
KAP needs two passes to resolve Fortran 90
forward declarations. The first pass, the generateh pass, builds the information
needed to analyze the program for forward references.
- -hdir=directory_name
- Default value: -hdir=current_directory
The -hdir=directory_name switch
specifies the name of the directory where the KAP -generateh pass stores
the temporary files containing information about forward references. The
-useh switch picks up the information from that directory. The default
is the current directory.
KAP automatically sets the -hdir switch for
you. Digital recommends that you do not set the -hdir switch.
- -heap=<integer>
- Long name: -heaplimit=<integer>
Default value: -heaplimit=116
KAP may require large amounts of memory in order to process your source code.
The -heaplimit option specifies the maximum size, in megabytes, to which the
KAP heap can grow. If this limit is breached, KAP will stop processing
your source code and try to exit with an "out of memory" error message.
If you choose a -heaplimit setting that is greater than the amount of
memory that your machine has available, KAP may run out of memory before
it reaches the -heaplimit.
KAP relies upon the operating system to tell it when the process is about to
run out of memory. Some operating systems kill KAP without first telling KAP
that there is insufficient memory. In that case, KAP may stop processing your
code and exit in an undefined manner. Using -heaplimit makes a graceful exit
more likely.
- -hli=<integer>
- Long name: -hoist_loop_invariants=<integer>
Default value: -hli=1
The
hoist_loop_invariants switch controls code hoisting of loop-invariant
expressions from loops. Note that this switch is independent of the switches,
each_invariant_if_growth and max_invariant_if_growth, that control the
floating of invariant-IFs out of loops. The possible settings for hoist_loop_invariants
are the following:
0 -- Turns off the hoisting of invariant code from loops.
1 -- Floats all loop invariant expressions not under the control of an
IF-structure within the given loop nest.
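The following hand-written sketch shows what setting 1 does to a loop-invariant expression that is not under the control of an IF. The names are hypothetical, and the code is not actual KAP output. The expression X*Y + 1.0 does not change inside the loop, so it is evaluated once before the loop.

```fortran
! Hand-written sketch of hoisting a loop-invariant expression;
! names are hypothetical.
module hoist_demo
  implicit none
contains
  ! Before: X*Y + 1.0 is recomputed on every iteration.
  subroutine not_hoisted(a, n, x, y)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: x, y
    integer :: i
    do i = 1, n
      a(i) = a(i) * (x * y + 1.0)
    end do
  end subroutine not_hoisted

  ! After: the invariant expression is evaluated once, outside the loop.
  subroutine hoisted(a, n, x, y)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: x, y
    real :: t
    integer :: i
    t = x * y + 1.0
    do i = 1, n
      a(i) = a(i) * t
    end do
  end subroutine hoisted
end module hoist_demo
```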
- -[n]ig
- Long name: -[no]ignoreoptions
Default value: -nig
The ignoreoptions switch allows the user to direct
kapf90 to ignore a !*$*OPTIONS or *$*OPTIONS line at the beginning of
a file, so that the command line switches override the options
card. The default is to accept the switches specified on the
!*$*OPTIONS line.
- -inc=<path name>
- Long name: -include=<path name>
Default value: off
The include switch allows the user to specify an alternate
directory for locating the files specified in INCLUDE statements.
An include
file whose name does not begin with a slash (/) is sought first in the
directory containing the file containing the INCLUDE statement or directive,
then in the directory named in the include switch.
- -inl[=<names>]
- Long name: -inline[=<names>]
Default value: off
- -ninl=<names>
- Long name: -noinline=<names>
Default value: off
- -interchange
- Long name: -interchange
Default
value: -interchange
- -ninterchange
- Long name: -nointerchange
Use the interchange
switch to enable or disable loop interchanging. KAP enables loop interchange
when -interchange is specified and the -optimize level is at least 1 or
the -scalaropt level is 3. If you specify -nointerchange , KAP disables
loop interchange regardless of the -optimize or -scalaropt levels. Loop
interchanging is enabled by default.
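Fortran stores arrays in column-major order. A loop nest whose innermost loop varies the first subscript therefore walks through memory with stride 1. The following sketch shows a loop nest before and after interchange. The names are hypothetical, and the code is not actual KAP output. The iterations are independent, so swapping the loops does not change the result.

```fortran
! Hand-written sketch of loop interchange; names are hypothetical.
module interchange_demo
  implicit none
contains
  ! Before: the inner loop varies the second subscript, so consecutive
  ! iterations touch elements that are N apart in memory.
  subroutine i_outer(a, n, m)
    integer, intent(in) :: n, m
    real, intent(inout) :: a(n, m)
    integer :: i, j
    do i = 1, n
      do j = 1, m
        a(i, j) = a(i, j) + real(i + 10 * j)
      end do
    end do
  end subroutine i_outer

  ! After: the inner loop varies the first subscript, giving
  ! stride-1 access in Fortran's column-major layout.
  subroutine j_outer(a, n, m)
    integer, intent(in) :: n, m
    real, intent(inout) :: a(n, m)
    integer :: i, j
    do j = 1, m
      do i = 1, n
        a(i, j) = a(i, j) + real(i + 10 * j)
      end do
    end do
  end subroutine j_outer
end module interchange_demo
```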
- -intl
- Long name: -interleave
Default
value: -interleave
- -nintl
- Long name: -nointerleave
The -interleave switch
controls loop unrolling and rescheduling. Interleaved unrolling can help
the compiler recognize quad-word loads and stores, which are more efficient
than ordinary loads and stores. It does this by first unrolling the loop
as in ordinary loop unrolling. Second, the statements in the loop are interchanged
where possible to make references to the same array adjacent to each other.
Interleaved unrolling can be demonstrated by the example below:
      real A(100), B(100)
      do I = 1, 100
         A(I) = 99.
         B(I) = 100.
      end do
      print *, A, B
      end
The output from KAP with interleaved unrolling turned on, -interleave, is:
      real A(100), B(100)
      do I = 1, 97, 4
         A(I)   = 99.
         A(I+1) = 99.
         A(I+2) = 99.
         A(I+3) = 99.
         B(I)   = 100.
         B(I+1) = 100.
         B(I+2) = 100.
         B(I+3) = 100.
      end do
      print *, A, B
      end
The code produced with interleaved unrolling turned off, -nointerleave, is:
      real A(100), B(100)
      do I = 1, 97, 4
         A(I)   = 99.
         B(I)   = 100.
         A(I+1) = 99.
         B(I+1) = 100.
         A(I+2) = 99.
         B(I+2) = 100.
         A(I+3) = 99.
         B(I+3) = 100.
      end do
      print *, A, B
      end
The default value is -interleave.
- -ipa[=<names>]
- Long name: -ipa[=<names>]
Default value: off
- -nipa=<names>
- Long name: -noipa=<names>
Default value: off
The inline and ipa switches provide kapf90 with a list of routines
to inline or analyze. If the switch is given without an argument list, kapf90
will try to inline/analyze all the called functions in the inlining universe
specified by the inline_from_.../ipa_from_... switches. If a list of names is
included, for example, -inline=mkcoef,yval, then only the named routines
will be inlined/analyzed. Additionally, -ipa causes KAP to give information
in the annotated listing about appropriate settings for the -ind, -inll,
and -ipall switches on a loop-by-loop basis.
The no forms instruct kapf90 to inline/analyze all routines except those in
the list. The list is required.
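The following hand-written sketch shows the effect of inlining. The names are hypothetical, and the code is not actual kapf90 output. The call to the small function SQ inside the loop is replaced by the function's body, which removes the call overhead and exposes the loop body to further optimization.

```fortran
! Hand-written sketch of inlining; names are hypothetical.
module inline_demo
  implicit none
contains
  ! A small function that is a good inlining candidate.
  real function sq(x)
    real, intent(in) :: x
    sq = x * x
  end function sq

  ! Before: one call to SQ per iteration.
  subroutine with_call(a, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    integer :: i
    do i = 1, n
      a(i) = sq(a(i))
    end do
  end subroutine with_call

  ! After inlining: the body of SQ replaces the call.
  subroutine inlined(a, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    integer :: i
    do i = 1, n
      a(i) = a(i) * a(i)
    end do
  end subroutine inlined
end module inline_demo
```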
- -inlc=<names>
- Long name: -inline_and_copy=<names>
Default value:
off
The inline_and_copy command line switch functions like the inline
switch, except that if all CALLs or references to a subprogram are inlined,
the text of the routine is not optimized but is copied unchanged to the
transformed code file. This is intended for use when inlining routines
from the same file as the call. Inline_and_copy has no special effect
when the routines being inlined are taken from a library or another source
file.
After a subprogram has been inlined everywhere it is used, leaving
it unoptimized saves compilation time. When a program involves multiple
source files, the unoptimized routine will still be available in case
one of the other source files contains a reference to it, so no errors
will result.
Note: The inline_and_copy algorithm assumes that all CALLs
and references to the routine precede it in the source file. If the routine
is referenced after the text of the routine, and that particular call
site cannot be inlined, the unoptimized version of the routine will be
invoked.
- -incr[=<file>]
- Long name: -inline_create[=<file>]
Default value: off
- -ipacr[=<file>]
- Long name: -ipa_create[=<file>]
Default value: off
The inline_create and ipa_create switches instruct kapf90 to build a
library file containing partially analyzed routines for later inlining.
The library created is used with the inline_from_libraries or ipa_from_libraries
switch. Libraries created with inline_create can be used with either
inlining or interprocedural analysis, since they contain essentially complete
descriptions of the functions included. Libraries created with ipa_create
can be used only with interprocedural analysis, since they do not have
the complete text of the functions, only the data relationship information.
Any file name can be used for the library name. An extension of .klib is
preferred for maximum compatibility with the .._from_libraries switches.
If either of these switches is given without a file name, the created
library is named <file>.klib, where <file> is the source file name with any
trailing .f, .ftn, or .for removed.
- -ind=<integer>
- Long name: -inline_depth=<integer>
Default value: -ind=2
The inline_depth switch sets the maximum level
of subprogram nesting which kapf90 will attempt to inline. Higher values
instruct kapf90 to trace CALLs and function references further. The values
and their meanings are:
- 1-10
- Inline routines to this depth.
- 0
- Use
the default value.
- -1
- Inline only routines which do not contain subroutine
CALLs or function references.
The !*$*[no]inline directive, when enabled,
is not affected by the inline_depth restrictions.
- -ipad=<integer>
- Long
name: -ipa_depth=<integer>
Default value: -ipad=2
The ipa_depth switch
sets the maximum level of subprogram nesting which kapf90 will attempt
to analyze. Higher values instruct kapf90 to trace CALLs and function
references further. The values and their meanings are:
- 1-10
- Analyze
routines to this depth.
- 0
- Use the default value.
- -1
- Analyze only
routines which do not contain subroutine CALLs or function references.
The !*$*[no]ipa directive, when enabled, is not affected by the ipa_depth
restrictions.
- -inff=<file>,<file>
- Long name: -inline_from_files=<file>,<file>
Default value: current source file
- -ipaff=<file>,<file>
- Long name: -ipa_from_files=<file>,<file>
Default value: current source file
- -infl=<library>,<library>
- Long name: -inline_from_libraries=<library>,<library>
Default value: off
- -ipafl=<library>,<library>
- Long name: -ipa_from_libraries=<library>,<library>
Default value: off
The .._from_.. switches provide kapf90 with the locations
of functions available for inlining/interprocedural analysis. The total
set of available functions is called the inlining or IPA universe.
The .._from_files switches take the names of source files and directories
containing source files. Including a directory, for example,
-ipaff=/usr/ipalib, is equivalent to the UNIX notation /usr/ipalib/*.c. Do not
use shell wild card characters in the list of files and directories.
The .._from_libraries switches take the names of libraries created with the
.._create switches and directories containing such libraries. In directories,
the kapf90 libraries are identified by the extension .klib.
Multiple files/libraries
or directories may be given in one .._from_.. switch, separated by commas.
Multiple .._from_.. switches may be specified on the command line.
- -inll=<integer>
- Long name: -inline_looplevel=<integer>
Default value: -inll=2
- -ipall=<integer>
- Long name: -ipa_looplevel=<integer>
Default value: -ipall=2
The .._looplevel
switches enable the user to limit inlining to just functions which are
referenced in nested loops where the effects of reduced function call
overhead or enhanced optimizations will be multiplied.
The parameter is
defined from the most deeply nested function reference. For example, -inll=1
restricts inlining to functions referenced in the deepest loop nest.
-inll=3 restricts inlining to those routines referenced at the three deepest
levels. The DO loop nest level of each function reference is included
in the optional calling tree section of the listing files.
The !*$*[NO]INLINE
and !*$*[NO]IPA directives, when enabled, are not affected by the looplevel
restrictions.
- -inm
- Long name: -inline_manual
Default value: off
- -ipam
- Long name: -ipa_manual
Default value: off
The inline_manual and ipa_manual
switches instruct kapf90 to recognize the !*$*[NO]INLINE and !*$*[NO]IPA directives.
This allows manual control over which functions are inlined/analyzed
at which call sites.
The default is to ignore these directives. They are
enabled when any inlining (IPA) switch is given on the command line. When
-inline_manual or -ipa_manual is included on the command line, the !*$*INLINE
or !*$*IPA directives are enabled without enabling the automatic inlining
algorithms. Since !*$*[NO]INLINE and !*$*[NO]IPA override the -inline=/-ipa=,
-inline_depth, and -.._looplevel command line switches, they can be used
along with command line control to select routines or call sites which
the regular selection algorithm would reject or to prevent specific routines
or CALL sites from being inlined/analyzed.
- -inline_optimize=<integer>
- Long
name: -inline_optimize=<integer>
Default value: -inline_optimize=0
- -ipa_optimize=<integer>
- Long name: -ipa_optimize=<integer>
Default value: -ipa_optimize=0
The inline_optimize=<integer> and ipa_optimize=<integer> switches aid in
optimizing large codes. These switches cause other KAP switches to be set,
depending on the value you substitute for <integer>, as follows:
- 0
- -noipa , -noinline
- 1
- -ipa , -inline
- 2
- -ipa , -inline , -[ipa,inline]_loop_level=3 ,
-[ipa,inline]_depth=10 , -heaplimit=500 , -noarclimit
- 3
- -ipa , -inline
, -[ipa,inline]_loop_level=10 , -[ipa,inline]_depth=10 , -heaplimit=500 ,
-noarclimit
- -i=<file>
- Long name: -input=<file>
Default value: the file name given on the command line.
For some mainframe operating systems, -i=<file> may be required to specify
the input file name. For other operating systems, simply list the file name
on the command line. When -input is specified without a file name, kapf90
reads the source code from standard input.
- -int=<integer>
- Long name: -integer=<integer>
Default value: -int=4
The integer switch specifies a size in bytes, N, for the default size of
INTEGER variables. When N=2 or 4, kapf90 takes INTEGER*N as the default
INTEGER type. When N=0, it uses the ordinary default length for INTEGER
variables.
- -intlog
- Default value: -intlog
The intlog switch enables the mixing of integer and logical operands in
expressions. When integer operands are used with logical operators, the
operations are performed in a bitwise manner. When logical operands are used
with arithmetic operators, the operands are treated as integers.
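With -intlog, an expression such as I .AND. J with INTEGER operands is evaluated bitwise. For example, 12 .AND. 10 yields 8 (binary 1100 and 1010 give 1000), and 12 .OR. 10 yields 14. The following sketch, which uses hypothetical names, computes the same results with the standard Fortran 90 intrinsics IAND and IOR. These intrinsics are a portable way to write such code without relying on -intlog.

```fortran
! Portable equivalents of bitwise I .AND. J and I .OR. J as evaluated
! under -intlog; module and subroutine names are hypothetical.
module intlog_demo
  implicit none
contains
  subroutine bitwise_and_or(i, j, r_and, r_or)
    integer, intent(in) :: i, j
    integer, intent(out) :: r_and, r_or
    r_and = iand(i, j)   ! same value as I .AND. J under -intlog
    r_or = ior(i, j)     ! same value as I .OR. J under -intlog
  end subroutine bitwise_and_or
end module intlog_demo
```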
- -lc=<name>
- Long name: -library_calls=<name>
Default value: off
The library_calls switch directs kapf90
to replace sections of code with calls to standard numerical library
routines that have the same functionality. This can simplify the source
code, and if a version of the library that has been highly tuned for the
target machine is available, using it can improve the performance of the
application program. For example, if you specify this switch and link the
application with the Digital Extended Math Library (DXML), calls to the
DXML Basic Linear Algebra Subroutines (BLAS) replace sections of code.
Use the following command:
kf90 -fkapargs='-lc' -ldxml myprog.f90
The argument for library_calls identifies which library
to generate CALLs for. The DXML BLAS libraries are: blas1, which performs
vector-vector operations such as dot product; blas2, which performs
matrix-vector operations such as matrix-vector multiplication; and blas3,
which performs matrix-matrix operations such as matrix multiplication.
To specify both blas1 and blas2, specify blas12. To specify both blas2
and blas3, specify blas23; this is the recommended setting. Specifying
blas is equivalent to specifying blas23. This switch can be disabled
within a section of code with the C*$* optimize=0 directive. This switch
is disabled if -roundoff=0.
CAUTION: This switch
introduces calls to BLAS routines that are linked from system libraries.
Use of this switch can cause a collision between KAP-generated BLAS routine
names and user-provided routines in the source code. Even if the user-provided
routines are identical in function to the library routines, rename or
remove the user routines, because the linker will not use the optimized
library routines if the calls can be satisfied by the user-provided routines.
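The argument rules above amount to a small table. The Python sketch below maps each documented -library_calls argument to the BLAS levels KAP may call. The helper is hypothetical, not a KAP or DXML API, and it models only the rules stated here, including the -roundoff=0 exception.

```python
# Hypothetical helper (not a KAP or DXML API): maps a -library_calls
# argument to the BLAS levels whose routines KAP may generate calls to.
LC_LEVELS = {
    "blas1": {1},      # vector-vector, e.g. dot product
    "blas2": {2},      # matrix-vector, e.g. matrix-vector multiply
    "blas3": {3},      # matrix-matrix, e.g. matrix multiply
    "blas12": {1, 2},
    "blas23": {2, 3},  # the recommended setting
}
LC_LEVELS["blas"] = LC_LEVELS["blas23"]  # "blas" is documented as blas23

def blas_levels(arg: str, roundoff: int = 3) -> set[int]:
    if roundoff == 0:
        return set()  # the switch is disabled when -roundoff=0
    return set(LC_LEVELS[arg])

print(blas_levels("blas"))
```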
- -lm=<integer>
- Long name: -limit=<integer>
Default value: -lm=20000
To reduce compile time, kapf90 estimates how long it spends analyzing
each loop nest construct. If a loop is nested too deeply, kapf90 ignores
the outer loop and recursively visits the inner loops. The loop nest limit
is a rough dial that controls what kapf90 considers too deeply nested.
For further information, refer to the KAP Fortran 90 for Digital UNIX
User's Guide.
- -ln=<integer>
- Long name: -lines=<integer>
Default value: -ln=55
The listing generated by kapf90 is paginated for
printing on a line printer. The number of lines per page on the listing
may be changed using the -lines= switch. The -lines=0 switch directs kapf90
to paginate at subroutine boundaries.
- -l=<file>
- Long name: -list=<file>
- -nl
- Long name: -nolist
Default value: -nl
The list switch allows kapf90
to generate an annotated listing of the user's program. On most systems,
the default name of the listing file is derived from the input file name;
however, on some systems the listing file name must be explicit. If -list=file
is specified, the listing is written to that file. To disable generation
of the listing file, enter -nolist on the command line.
- -lw=<integer>
- Long name: -listingwidth=<integer>
Default value: -lw=132
The listingwidth switch sets the maximum line length for the listing file
produced by kapf90. This setting affects the format of the loop summary
table (-listoptions=l) and the kapf90 switches table (-listoptions=k).
The fixed setting, 132, is optimal for most line printers. At present,
no other values are allowed.
- -kind=<integer>
- Long name: -kind=<integer>
Default: -kind=4
The kind switch
establishes the value of the Fortran 90 KIND type parameter used when
KIND is not specified or when KIND=0 is specified. kind applies to all
data types: logical, integer, real, and complex. The values for -kind are
4 or 8, with 4 being the default. The kind switch allows you to change
the underlying precision of computations without violating the Fortran
90 standard constraints that default logical, default integer, and default
real occupy the same amount of storage, and that default double precision
and default complex occupy twice the storage of default real.
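The storage constraints above can be written out directly. The Python sketch below assumes, as in DEC Fortran, that a KIND value of 4 or 8 equals the storage size in bytes. The function name is hypothetical and describes only the standard's relationships, not KAP internals.

```python
# Sketch of the storage relationships that -kind preserves. Assumes a KIND
# value equals the size in bytes (true of DEC Fortran for 4 and 8).
def default_storage(kind: int) -> dict[str, int]:
    if kind not in (4, 8):
        raise ValueError("-kind accepts 4 or 8")
    base = kind  # default logical, integer, and real share one size
    return {
        "logical": base,
        "integer": base,
        "real": base,
        "double precision": 2 * base,  # twice default real
        "complex": 2 * base,           # twice default real
    }

print(default_storage(8))
```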
- -lo=<list>
- Long name: -listoptions=<list>
Default: -lo=o
The listoptions switch
tells kapf90 what information to include in the listing and error files.
<list> can contain the following characters:
- c
- Calling tree at end of
program listing
- k
- kapf90 switches used are printed at the end of each
program unit
- l
- A loop-by-loop optimization table
- n
- Program unit names, as they are processed, written to the error file
- An annotated listing of the original
program
- p
- Compilation performance statistics
- s
- A summary of the
optimizations performed
- t
- Annotated listing of transformed program
- -log=<integer>
- Long name: -logical=<integer>
Default value: -log=4
The logical switch specifies a size in bytes, N, for the default size of
LOGICAL variables. When N is 1, 2, or 4, kapf90 takes LOGICAL*N as the
default LOGICAL type. When N is 0, it uses the ordinary default length
for LOGICAL variables.
- -mc=<integer>
- Long name: -minconcurrent=<integer>
Default value: -mc=1700
The minconcurrent switch sets the level of work in a loop above which
KAP executes the loop in parallel. Any integer greater than or equal to
0 is accepted. The higher the minconcurrent value, the more iterations
and/or statements the loop body must have to run concurrently.
Executing a loop in parallel incurs overhead that varies from system to
system. If a loop has little work, the overhead required to set up parallel
execution may make the loop run more slowly than it would serially. At
compilation time, KAP estimates the amount of work inside a loop from the
loop computations and loop iterations: it multiplies the loop iteration
count by the sum of the non-index operands/results and the non-assignment
operators. KAP compares this estimate with the minconcurrent value. If
the estimated amount of work is greater than the minconcurrent value, KAP
generates parallel code for the loop; otherwise, the loop executes serially.
If the DO loop bounds are known at compilation time, KAP computes the
exact iteration count. If the DO loop bounds are unknown, KAP generates
a block IF around the parallel code, which allows a run-time decision on
whether to execute the loop in parallel. This is called a two-version loop.
To disable the generation of two-version loops throughout the program,
use the command-line switch -minconcurrent=0. To disable this action in
specific DO loops, use the minconcurrent directive.
Specifying the minconcurrent switch automatically sets the concurrentize
switch.
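The threshold test above can be sketched in a few lines of Python. This is a simplified model of the documented rule, not KAP's internal estimator. The operand and operator counts in the example are one reading of "non-index operands/results" and "non-assignment operators".

```python
# Simplified sketch of the minconcurrent test (KAP's real work estimate is
# internal to the tool; this only mirrors the rule described above).
def estimated_work(iterations: int, operands_results: int, operators: int) -> int:
    # iteration count * (non-index operands/results + non-assignment operators)
    return iterations * (operands_results + operators)

def run_in_parallel(iterations: int, operands_results: int, operators: int,
                    minconcurrent: int = 1700) -> bool:
    return estimated_work(iterations, operands_results, operators) > minconcurrent

# A(I) = B(I) + C(I): A, B, C give 3 operands/results, "+" gives 1 operator.
# With unknown N, a two-version loop makes this same comparison at run time.
for n in (100, 500):
    print(n, run_in_parallel(n, 3, 1))
```

With the default threshold of 1700, this four-unit loop body needs more than 425 iterations before the parallel version is chosen.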
- -ma=<list>
- Long name: -machine=<list>
Default value: -ma=s
<list> is one of the following:
- n
- Prefer
optimization of non-stride-1 loops.
- o
- Do not parallelize innermost loops
when optimizing. Parallelize only outermost loops.
- s
- Prefer optimization
of stride-1 inner loops.
- -miifg=<integer>
- Long name: -max_invariant_if_growth=<integer>
Default value: -miifg=500
When a loop contains an IF statement whose
condition does not change from one iteration to the next (a loop-invariant
IF), the same test is repeated on every iteration. The code can often
be made more efficient by floating the IF outside the loop and putting
the THEN and ELSE sections into their own loops.
This gets more complicated when there is other code in the loop, since
a copy of it must be included in both the THEN and ELSE loops. The
max_invariant_if_growth switch allows the user to limit the total number
of additional lines of code generated in each program unit through
invariant IF restructuring.
This can be controlled on a loop-by-loop basis with the
!*$*MAX_INVARIANT_IF_GROWTH(<integer>) directive. The maximum amount of
additional code generated in a single loop through invariant IF floating
can be limited with the each_invariant_if_growth switch.
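The transformation above is the one often called loop unswitching. The Python sketch below shows it on a hypothetical loop. The duplicated statement in each branch is the code growth that -max_invariant_if_growth and -each_invariant_if_growth limit.

```python
# Sketch of invariant IF floating (loop unswitching) on a hypothetical loop.
def original(x: list[float], scale: bool) -> list[float]:
    y = []
    for v in x:
        w = v + 1.0              # other code in the loop body
        if scale:                # loop-invariant test, repeated every iteration
            y.append(2.0 * w)
        else:
            y.append(w)
    return y

def unswitched(x: list[float], scale: bool) -> list[float]:
    y = []
    if scale:                    # the test now runs once, outside the loops
        for v in x:
            w = v + 1.0          # copy of the other code: THEN loop
            y.append(2.0 * w)
    else:
        for v in x:
            w = v + 1.0          # copy of the other code: ELSE loop
            y.append(w)
    return y

print(original([1.0, 2.0], True), unswitched([1.0, 2.0], True))
```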
- -namepart=<integer>,<integer>
- Long name: -namepartitioning=<integer>,<integer>
Default value: -nonamepart
This switch tells KAP to look at distinct
array names and limit the number of arrays that appear in a loop to avoid
cache thrashing. For example, this switch breaks a loop containing references
to arrays A and B into two loops: one loop references array A and the
other references array B.
The two arguments, i and j, of the -namepartitioning=i,j switch control
name partitioning as follows:
i --- specifies the minimum number of partitions. This is the preferred
smallest number of distinct arrays in each distributed loop.
j --- specifies the maximum number of partitions. This is the preferred
largest number of distinct arrays in each distributed loop.
If no arguments appear with the -namepartitioning switch, KAP uses its
default values of 2 for the minimum and 8 for the maximum number of partitions.
Before KAP can perform name partitioning, you must specify the switch
-scalaropt=n, where n is greater than or equal to 3.
The -nonamepartitioning switch explicitly prevents name partitioning.
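The A/B example above is a form of loop distribution. The Python sketch below shows it on a hypothetical loop. It only illustrates the idea: Python lists do not model cache behavior, and the split is legal here only because the two statements do not depend on each other.

```python
# Sketch of loop distribution by array name on a hypothetical loop.
def fused(n: int) -> tuple[list[float], list[float]]:
    a = [0.0] * n
    b = [0.0] * n
    for i in range(n):
        a[i] = i * 2.0           # references A
        b[i] = i + 1.0           # references B
    return a, b

def partitioned(n: int) -> tuple[list[float], list[float]]:
    a = [0.0] * n
    b = [0.0] * n
    for i in range(n):           # partition 1: references only A
        a[i] = i * 2.0
    for i in range(n):           # partition 2: references only B
        b[i] = i + 1.0
    return a, b

print(fused(3) == partitioned(3))
```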
- -nat[=<list> ]
- Long name: -natural[=<list> ]
Default value: -nat
- -nnat
- Long name: -nonatural
The natural switch selects between natural alignment and non-alignment
of data elements in COMMON blocks. With natural alignment, for example,
REAL*8 entities always start on double-word boundaries.
Natural alignment specifies
that variables and arrays in COMMON blocks will start on boundaries which
correspond to their size. Items which take up two words, such as COMPLEX
arrays, will start on double-word boundaries; single-word items, such as
REAL variables, will start on word boundaries; half-word items, such as
INTEGER*2 variables, will start on half-word boundaries. The natural alignment
can improve program speed by making memory access simpler.
This optimization is safe when:
- All COMMON blocks will be visible to kapf90 in the course of processing
the source file.
- If the same COMMON block has two different layouts, the layouts do not
pass data to each other; that is, they are fully independent.
The default, nonatural , causes variables
and arrays to be packed tightly into COMMON blocks. This can reduce memory
usage but slow the program.
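The alignment rule above, where each item starts on a boundary equal to its size, can be sketched as an offset calculation. The helper below is hypothetical and handles scalar items only. It compares a packed layout with a naturally aligned one for an INTEGER*2, a REAL*8, and a REAL*4 in one COMMON block.

```python
# Hypothetical helper: byte offsets of scalar COMMON block members, either
# packed (non-alignment) or naturally aligned (each item starts on a
# boundary equal to its own size).
def layout(sizes: list[int], natural: bool) -> tuple[list[int], int]:
    offsets = []
    pos = 0
    for size in sizes:
        if natural and pos % size:
            pos += size - pos % size   # pad up to the next multiple of size
        offsets.append(pos)
        pos += size
    return offsets, pos                # member offsets and total bytes

# INTEGER*2, REAL*8, REAL*4
print(layout([2, 8, 4], natural=False))  # ([0, 2, 10], 14)
print(layout([2, 8, 4], natural=True))   # ([0, 8, 16], 20)
```

The aligned layout uses six more bytes of padding, which is the memory-versus-speed trade-off described above.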
- -[n]1
- Long name: -[no]onetrip
Default value: -n1
The onetrip switch allows the user to specify one-trip DO
loops. Many pre-Fortran-77 compilers implemented DO loops that were always
executed at least once, even if the initial value of the loop index was
greater than the final value. This switch informs kapf90 that the DO loops
in the file being processed assume this behavior.
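The difference above comes down to the trip count. Fortran 77 runs a DO loop MAX(INT((m2-m1+m3)/m3), 0) times, so a loop whose start is past its end runs zero times. Under the one-trip convention it runs once. The Python sketch below compares the two. The function name is hypothetical.

```python
# Sketch of DO-loop trip counts: Fortran 77 semantics versus the one-trip
# convention that -onetrip tells kapf90 to assume.
def trip_count(m1: int, m2: int, m3: int = 1, onetrip: bool = False) -> int:
    if m3 == 0:
        raise ValueError("DO loop increment must not be zero")
    num = m2 - m1 + m3
    q = abs(num) // abs(m3)            # INT() truncates toward zero
    if (num < 0) != (m3 < 0):
        q = -q
    count = max(q, 0)                  # Fortran 77: zero-trip loops allowed
    return max(count, 1) if onetrip else count

print(trip_count(10, 1), trip_count(10, 1, onetrip=True))  # 0 1
```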
- -o=<integer>
- Long name: -optimize=<integer>
Default value: -o=5
The optimize switch sets the optimization level,
ranging from 0 to 5. The meanings of levels are as follows:
- 0
- No optimization
performed
- 1
- Only simple analysis and optimization performed
Induction
variables recognized
DO loop interchanging techniques applied
- 2
- Lifetime
analysis performed
More powerful data dependence tests performed
- 3
- More loop interchanging performed
Special case data dependence tests
performed
Wraparound variables recognized
- 4
- Loop interchanging
around reductions
More exact data dependence tests performed
- 5
- Array expansion enabled
The enter gate, exit gate, and independent directives will be generated.
- -pio
- Long name: -parallelio
Default value: -nopio
The parallelio switch allows parallel execution of loops that contain
I/O. Use this switch only when you know the I/O statements will not
actually execute. An example is a test for an error condition that causes
a message to be printed.
- -rl=<integer>
- Long name: -real=<integer>
Default value: -rl=4
The real switch tells KAP the default size, N, in bytes, that the Fortran
90 compiler uses for REAL variables, where REAL*N can be 4 or 8. To change
the default size of REAL variables, for example, from 4 to 8, first set
the Fortran 90 compiler switch -r=8. Next, tell KAP the new size with
the -real=8 switch.
- -r=<integer>
- Long name: -roundoff=<integer>
Default value: -r=3
The roundoff switch allows the user to specify how much change from
serial roundoff error is tolerable. If an arithmetic reduction is accumulated
in a different order than in the scalar program, the roundoff error
accumulates differently and the final result may differ from the original
program's output. While the difference is usually insignificant, certain
restructuring transformations performed by kapf90 must be disabled in
order to obtain exactly the same results as the scalar program. These
transformations, referenced below, are discussed in Chapter 7.
kapf90 classifies its transformations by the amount of difference in
roundoff error that can accumulate, so the user can decide what level
of roundoff error difference is allowable. The roundoff command-line switch
takes the values 0 to 3.
Each level is cumulative, performing what is listed below for that level
in addition to what is listed for the previous levels. The meanings of
these levels are as follows:
- 0
- No
roundoff-changing transformations
- 1
- Expression simplification and code
floating enabled
Arithmetic reductions recognized
Loop interchanging
around arithmetic reductions allowed if optimize >= 4
Loop rerolling
if scalaropt >= 1
- 2
- Reciprocal substitution performed to move an expensive
division outside of loop
- 3
- Recognize real induction variables if scalaropt
>= 2 or optimize >= 1
Memory management enabled if -scalaropt = 3
Expressions
such as A / B / C can be rotated to A / (B * C)
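The two effects above can be seen with plain IEEE double arithmetic. The Python sketch below illustrates floating-point behavior only. It is not KAP output. It splits a sum the way a parallel reduction might and rotates A / B / C into A / (B * C) with values chosen so that B * C overflows.

```python
# Illustration of roundoff-changing transformations with IEEE doubles.
values = [0.1, 0.2, 0.3]

serial = 0.0
for v in values:                     # scalar program: (0.1 + 0.2) + 0.3
    serial += v

partial_1 = values[0]                # a reduction split into two partial
partial_2 = values[1] + values[2]    # sums, as a parallel loop might do
reordered = partial_1 + partial_2    # 0.1 + (0.2 + 0.3)

print(serial, reordered)             # 0.6000000000000001 0.6

# Level 3 rotation A / B / C -> A / (B * C): here B * C overflows to inf.
a, b, c = 1e200, 1e200, 1e200
print(a / b / c, a / (b * c))        # a tiny nonzero value, then 0.0
```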
- -rt=<routine_name>[,<routine_name>...]
- Long name: -routine=<routine_name>[,<routine_name>...]
Default value: -noroutine
The routine switch allows you to specify switches that apply only to
specific routines within the source file KAP processes. The only switches
that routine can specify are:
-each_invariant_if_growth
-max_invariant_if_growth
-optimize
-roundoff
-scalaropt
-skip
-unroll
-unroll2
-unroll3
Place the routine switch after the name of the DEC Fortran source file.
<routine_name> must be a routine in the source file.
- -sv=<list>
- Long name: -save=<list>
Default value: -sv=manual_adjust
The save switch instructs kapf90
whether or not to perform live variable analysis to determine if the
value of a local scalar variable in a subroutine or function needs to
be saved between invocations of the routine being processed. SAVE statements
will be generated for any variables requiring them. kapf90 will not delete
or ignore a SAVE statement coded by the user.
Saving local variables may
be required for correct execution of the program but can restrict kapf90
optimizations.
With -save=manual , kapf90 assumes that the user has inserted
the necessary SAVE statements into the code and performs no corresponding
analysis of its own. The user-written SAVE statements are assumed to be
correct and sufficient.
Specifying -save=all tells kapf90 that all routine-local
variables and COMMON blocks are retained between invocations. This is as
if all variables and COMMON blocks were in SAVE statements.
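The live variable analysis described above looks for a local value that is read on entry before the routine assigns it. The Python sketch below is deliberately simplified and hypothetical. It handles only a straight-line body, with no branches or loops, and each statement is given as (text, variables read, variables written) over local scalars only.

```python
# Simplified sketch of the idea behind -save: in a straight-line routine
# body, a local scalar that is read before it is assigned, and is also
# assigned somewhere, carries a value across calls and needs SAVE.
# Real analysis must also handle branches and loops; this does not.
def needs_save(statements: list[tuple[str, set[str], set[str]]]) -> set[str]:
    assigned: set[str] = set()
    upward_exposed: set[str] = set()
    for _text, reads, writes in statements:
        upward_exposed |= reads - assigned   # read before any assignment
        assigned |= writes
    return upward_exposed & assigned

body = [
    ("COUNT = COUNT + 1", {"COUNT"}, {"COUNT"}),
    ("X = 0.0", set(), {"X"}),
    ("Y = X + 1.0", {"X"}, {"Y"}),
]
print(needs_save(body))  # {'COUNT'}
```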
- -so=<integer>
- Long name: -scalaropt=<integer>
Default value: -so=3
The scalaropt switch sets the level of serial transformations performed.
Unlike the scalaropt switch, the !*$*SCALAR OPTIMIZE directive sets the
level of loop-based optimizations only, such as loop fusion, and not
straight-line code optimizations or dead code elimination.
The levels and
their optimizations are:
- 0
- No scalar optimizations performed
- 1
- IF loops changed into DO loops
Simple code floating out of loops performed
Inaccessible or unused code removed
Forward substitution of variables
performed
Dusty deck IF transformations enabled
- 2
- Full range of scalar
optimizations enabled
Invariant IFs floated out of loops
Induction variable
recognition
Loop rerolling if roundoff >= 1
Loop unrolling, loop peeling,
loop fusion
- 3
- Memory management performed if -roundoff = 3
Additional
dead code elimination performed during output conversion
- -scan=<integer>
- Long name: -scan=<integer>
Default value: -scan=72
The scan switch sets the length of Fortran input lines. kapf90 ignores
characters in columns beyond the value of the scan switch, treating them
as comments. The value must be 72, 120, or 132.
- -sasc=<integer>
- Long name: -setassociativity=<integer>
Default value: -sasc=1
The setassociativity switch provides information on how physical addresses
in main memory map to cache pages. The default, 1, says that a datum in
main memory can be placed in only one place in cache. If that cache page
is in use, it must be rewritten or flushed before the newly accessed page
can be copied into cache.
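An associativity of 1 describes a direct-mapped cache. The Python sketch below shows why two addresses can evict each other there. The cache geometry (8 KB, 32-byte lines) is a hypothetical example, not a description of any particular Alpha processor.

```python
# Sketch of set-associative cache mapping with a hypothetical geometry.
CACHE_BYTES = 8 * 1024
LINE_BYTES = 32

def cache_set(address: int, associativity: int = 1) -> int:
    # Each set holds `associativity` lines; an address can live only in its set.
    num_sets = CACHE_BYTES // (LINE_BYTES * associativity)
    return (address // LINE_BYTES) % num_sets

# Addresses 8 KB apart land in the same set. With associativity 1 that set
# has one slot, so the two lines keep evicting each other; with
# associativity 2 the same set has room for both.
print(cache_set(0), cache_set(8192))
print(cache_set(0, 2), cache_set(8192, 2))
```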
- -skip
-
Default value: -noskip
The skip switch tells kapf90 to ignore the specified routines. For example,
the command
kapf90 program.f90 -skip=temp_sub_1 -skip=temp_sub_2
tells KAP to process all the program units in the DEC Fortran source file
program.f90 except temp_sub_1 and temp_sub_2.
- -srlcd
-
Default value: -nosrlcd
The srlcd switch tells kapf90 to remove loop-carried dependencies. KAP
holds array values that are read or written across multiple loop iterations
in temporary scalars, so faster temporary/register accesses replace slower
memory accesses in the loop body.
Srlcd stands for Scalar Replacement of Loop Carried Dependencies.
Before KAP can remove loop-carried dependencies, you must specify the
switch -scalaropt=n, where n is greater than or equal to 2.
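The Python sketch below shows scalar replacement on a hypothetical recurrence, A(I) = A(I-1) + B(I). The value written on one iteration is read on the next. After the transformation it stays in a scalar (a register, in compiled code) rather than being reloaded from the array.

```python
# Sketch of scalar replacement of a loop-carried dependence.
def original(a: list[float], b: list[float]) -> list[float]:
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]       # re-reads a[i-1] from memory each time
    return a

def scalar_replaced(a: list[float], b: list[float]) -> list[float]:
    a = list(a)
    prev = a[0] if a else 0.0        # temporary scalar carries the value
    for i in range(1, len(a)):
        prev = prev + b[i]           # the carried value stays in the scalar
        a[i] = prev                  # the array is only written
    return a

print(scalar_replaced([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 2.0, 3.0]))  # [1.0, 2.0, 4.0, 7.0]
```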
- -su=<list>
- Long name: -suppress=<list>
Default value: no suppression
kapf90 produces several types of messages, ranging from syntax warning
and error messages to messages about the optimizations performed. Use
the following characters in <list> to disable the corresponding classes
of messages:
- d
- Data dependence messages
- e
- Syntax error messages
- i
- Informational
messages
- n
- Not optimized messages
- q
- Questions
- s
- Standardized
messages
- w
- Syntax warning messages
- -sy=<list>
- Long name: -syntax=<list>
Default value: accepts all dialects listed below
The syntax switch directs kapf90 to check for compliance with certain
syntactic rules. The default is to accept the superset of the ANSI Fortran
77 standard defined by DEC Fortran, which includes many common Fortran 77
extensions. The syntax settings are as follows:
- a
- Checks for strict compliance with the ANSI standard. Warning and error
messages are issued for syntax that does not conform to the standard.
- v
- Accepts the extensions and interpretations of DEC Fortran
- f90
- Checks for strict compliance with the ANSI Fortran 90 standard. With
-syntax=f90, failures occur when you mix logical and integer variables.
See the description of -intlog.
- -tune=<architecture>
- Long name: -tune=<architecture>
Default value: -tune=host
The KAP preprocessor determines whether the host architecture is ev4 or
ev5 and, by default, optimizes your program for that architecture. If
you compile a program on one architecture but plan to run it on another,
override the default by setting -tune to the architecture where the program
will run. For example, if you compile a program on ev4 architecture but
plan to run it on ev5, use -tune=ev5.
- -[n]ty
- Long name: -[no]type
Default value: -nty
The type switch instructs kapf90 to issue warning
messages for variables not explicitly typed. This is as if there were an
IMPLICIT NONE at the top of each program unit. The notype default suppresses
this checking.
- -ur=<integer>
- Long name: -unroll=<integer>
Default value: -ur=4
The unroll, unroll2, and unroll3 switches control inner-loop unrolling.
-scalaropt=2 must be in effect to engage the unroll switch.
The syntax for unroll is as follows:
Long form: -unroll=<#it>
Short form: -ur=<#it>
where <#it> is the maximum number of iterations to unroll:
=0 use default values to unroll
=1 no unrolling
The default, 4, means that at most 4 iterations will be unrolled.
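The Python sketch below shows unrolling by 4, the -ur default, on a hypothetical dot-product loop. A cleanup loop handles the leftover iterations. The additions happen in the same order in both versions, so the results are identical and roundoff is unaffected.

```python
# Sketch of inner-loop unrolling by 4 with a cleanup loop.
def dot(x: list[float], y: list[float]) -> float:
    s = 0.0
    for i in range(len(x)):
        s += x[i] * y[i]
    return s

def dot_unrolled4(x: list[float], y: list[float]) -> float:
    n = len(x)
    s = 0.0
    i = 0
    while i + 4 <= n:                # main loop: 4 original iterations per pass
        s += x[i] * y[i]
        s += x[i + 1] * y[i + 1]
        s += x[i + 2] * y[i + 2]
        s += x[i + 3] * y[i + 3]
        i += 4
    while i < n:                     # cleanup loop: the remaining n % 4 iterations
        s += x[i] * y[i]
        i += 1
    return s

x = [0.5 * k for k in range(7)]
print(dot(x, x) == dot_unrolled4(x, x))
```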
- -ur2=<integer>
- Long name: -unroll2=<integer>
Default value: -ur2=160
-scalaropt=2 must be in effect to engage the unroll2 switch.
The syntax for unroll2 is as follows:
Long form: -unroll2=<weight>
Short form: -ur2=<weight>
where <weight> is the maximum weight (estimated work) of an unrolled loop.
Work is estimated by counting operands and operators in a loop.
The default, 160, means a maximum work of 160 in the body of an unrolled
loop.
- -ur3=<integer>
- Long name: -unroll3=<integer>
Default value: -ur3=1
-scalaropt=2 must be in effect to engage the unroll3 switch. Unroll3=n
sets the lower limit for unrolling: if there are fewer than n units of
work in the loop, the loop is not unrolled. The amount of work in each
loop iteration is shown in the loop table in the annotated listing. The
switch should be left at 1, the default.
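The three unroll switches each set one limit, but this page does not say how KAP combines them. The Python sketch below is therefore only one plausible, hypothetical reading. It picks the largest factor up to -ur whose unrolled body stays within the -ur2 weight, and it skips loops with less work than -ur3.

```python
# Hypothetical combination of -ur, -ur2, and -ur3 (an illustrative reading,
# not KAP's documented algorithm).
def unroll_factor(work_per_iteration: int, ur: int = 4, ur2: int = 160,
                  ur3: int = 1) -> int:
    if ur == 0:
        ur = 4                          # =0 means "use default values"
    if ur == 1 or work_per_iteration < ur3:
        return 1                        # no unrolling
    factor = ur
    # Shrink until the unrolled body's estimated work fits under ur2.
    while factor > 1 and factor * work_per_iteration > ur2:
        factor -= 1
    return factor

print([unroll_factor(w) for w in (10, 50, 200)])  # [4, 3, 1]
```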
- -useh
- Default value: off
KAP automatically
sets the -useh switch correctly for you. Digital recommends that you do
not set the -useh switch.
KAP needs two passes to resolve Fortran 90 forward declarations. The second
pass, the -useh pass, resolves any forward references.
KAP supports the following directives:
!*$* arclimit (0-500)
!*$* [no]concurrentize
!*$* each_invariant_if_growth (0-100)
!*$* [no]inline [here | routine | global][(name[,name...])]
!*$* [no]ipa [here | routine | global][(name[,name...])]
!*$* limit (>0)
!*$* max_invariant_if_growth (0-1000)
!*$* minconcurrent
!*$* optimize (0-5)
!*$* roundoff (0-3)
!*$* scalar optimize (0-3)
!*$* unroll(<#it>[,<weight>])
See the KAP Fortran 90 for Digital UNIX User's Guide for more details.
KAP supports the following assertions:
!*$* assert [no]argument aliasing
!*$* assert [no]bounds violations
!*$* assert concurrent call
!*$* assert do (concurrent)
!*$* assert do (concurrent call)
!*$* assert do (serial)
!*$* assert do prefer (concurrent)
!*$* assert do prefer (serial)
!*$* assert [no]equivalence hazard
!*$* assert [no]last value needed
!*$* assert permutation
!*$* assert no recurrence
!*$* assert relation (<name> .XX. <variable/constant>)
!*$* assert no sync
!*$* assert [no] temporaries for constant arguments
f90(1)
cpp(1)
cc(1)
KAP Fortran 90 User's Guide
kapf90 man page