Native Intermediate Representation

NIR is a high-level, object-oriented, SSA-based representation. The core of the representation is a subset of LLVM instructions, types and values, augmented with a number of high-level primitives that are necessary to efficiently compile modern languages like Scala.

Introduction

Let’s have a look at the textual form of NIR generated for a simple Scala module:

object Test {
  def main(args: Array[String]): Unit =
    println("Hello, world!")
}

This would map to:

pin(@Test$::init) module @Test$ : @java.lang.Object

def @Test$::main_class.ssnr.ObjectArray_unit : (module @Test$, class @scala.scalanative.runtime.ObjectArray) => unit {
  %src.2(%src.0 : module @Test$, %src.1 : class @scala.scalanative.runtime.ObjectArray):
    %src.3 = module @scala.Predef$
    %src.4 = method %src.3 : module @scala.Predef$, @scala.Predef$::println_class.java.lang.Object_unit
    %src.5 = call[(module @scala.Predef$, class @java.lang.Object) => unit] %src.4 : ptr(%src.3 : module @scala.Predef$, "Hello, world!")
    ret %src.5 : unit
}

def @Test$::init : (module @Test$) => unit {
  %src.1(%src.0 : module @Test$):
    %src.2 = call[(class @java.lang.Object) => unit] @java.lang.Object::init : ptr(%src.0 : module @Test$)
    ret unit
}

Here we can see a few distinctive features of the representation:

  1. At its core NIR is very much a classical SSA-based representation. The code consists of basic blocks of instructions. Instructions take value and type parameters. Control flow instructions can only appear as the last instruction of the basic block.
  2. Basic blocks have parameters. Parameters directly correspond to phi instructions in the classical SSA.
  3. The representation is strongly typed. All parameters have explicit type annotations. Instructions may be overloaded for different types via type parameters.
  4. Unlike LLVM, it has support for high-level object-oriented features such as garbage-collected classes, traits and modules. They may contain methods and fields. There is no overloading and there are no access control modifiers, so names must be mangled appropriately.
  5. All definitions live in a single top-level scope indexed by globally unique names. During compilation they are lazily loaded until all reachable definitions have been discovered. pin and pin-if attributes are used to express additional dependencies.
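
To make item 2 concrete, here is a minimal Scala sketch of how block parameters stand in for phi instructions. It is a toy interpreter, not Scala Native's implementation: the names Block, Jump, Ret and run are hypothetical, and values are restricted to Int for brevity. Each jump supplies values that are bound to the parameters of the target block, which is the job a phi instruction does in classical SSA.

```scala
object BlockParams {
  // Hypothetical mini-IR; these types are illustrative only.
  sealed trait Term
  final case class Jump(next: String, values: List[Int]) extends Term
  final case class Ret(value: Int) extends Term

  // A block has a name, named parameters, and a body that ends in a terminator.
  final case class Block(name: String, params: List[String], body: Map[String, Int] => Term)

  def run(blocks: List[Block], entry: String, args: List[Int]): Int = {
    val byName = blocks.map(b => b.name -> b).toMap

    @annotation.tailrec
    def loop(name: String, values: List[Int]): Int = {
      val block = byName(name)
      // Bind incoming values to the block's parameters (the "phi" step).
      val env = block.params.zip(values).toMap
      block.body(env) match {
        case Jump(next, nextValues) => loop(next, nextValues)
        case Ret(value)             => value
      }
    }

    loop(entry, args)
  }

  // Sums n + (n - 1) + ... + 1 using a loop block with two parameters.
  val sumTo: List[Block] = List(
    Block("entry", List("n"), env => Jump("loop", List(env("n"), 0))),
    Block("loop", List("i", "acc"), env =>
      if (env("i") == 0) Ret(env("acc"))
      else Jump("loop", List(env("i") - 1, env("acc") + env("i"))))
  )
}
```

Calling BlockParams.run(BlockParams.sumTo, "entry", List(4)) walks the loop block once per iteration, and the values of i and acc change only through the arguments of each jump.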

Definitions

Var

..$attrs var @$name: $type = $value

Corresponds to LLVM’s global variables when used in the top-level scope, and to fields when used as a member of classes and modules.

Const

..$attrs const @$name: $type = $value

Corresponds to LLVM’s global constant. Constants may only reside at the top level and cannot be members of classes and modules.

Declare

..$attrs def @$name: $type

Corresponds to LLVM’s declare when used on the top-level of the compilation unit and to abstract methods when used inside classes and traits.

Define

..$attrs def @$name: $type { ..$blocks }

Corresponds to LLVM’s define when used on the top-level of the compilation unit and to normal methods when used inside classes, traits and modules.

Struct

..$attrs struct @$name { ..$types }

Corresponds to LLVM’s named struct.

Trait

..$attrs trait @$name : ..$traits

Scala-like traits. May contain abstract and concrete methods as members.

Class

..$attrs class @$name : $parent, ..$traits

Scala-like classes. May contain vars, abstract and concrete methods as members.

Module

..$attrs module @$name : $parent, ..$traits

Scala-like modules (i.e. object $name). May only contain vars and concrete methods as members.
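
As a rough illustration of the three object-oriented definition kinds, the Scala source below contains one trait, one class and one module. The names Shape, Square and Shapes are made up for this example, and the comments describe the expected correspondence under the rules above, not verified compiler output.

```scala
// A trait: may contain abstract and concrete methods.
trait Shape {
  def area: Double                      // abstract method (a declaration)
  def describe: String = s"area=$area"  // concrete method (a definition)
}

// A class: a parent plus traits; may contain vars and methods.
class Square(var side: Double) extends Shape {
  def area: Double = side * side
}

// A module: a singleton object holding vars and concrete methods only.
object Shapes {
  var created: Int = 0
  def square(side: Double): Square = { created += 1; new Square(side) }
}
```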

Types

Void

void

Corresponds to LLVM’s void.

Vararg

...

Corresponds to LLVM’s varargs. May only be nested inside function types.

Pointer

ptr

Corresponds to LLVM’s pointer type, with the major distinction that it does not preserve the type of the memory being pointed at. LLVM has since adopted untyped (opaque) pointers as well.

Boolean

bool

Corresponds to LLVM’s i1.

Integer

i8
i16
i32
i64

Correspond to LLVM’s integer types. Unlike LLVM, we do not support arbitrary-width integer types at the moment.

Float

f32
f64

Corresponds to LLVM’s floating point types.

Array

[$type x N]

Corresponds to LLVM’s aggregate array type.

Function

(..$args) => $ret

Corresponds to LLVM’s function type.

Struct

struct @$name
struct { ..$types }

Has two forms: named and anonymous. Corresponds to LLVM’s aggregate structure type.

Unit

unit

A reference type that corresponds to scala.Unit.

Nothing

nothing

Corresponds to scala.Nothing. May only be used as a function return type.

Class

class @$name

A reference to a class instance.

Trait

trait @$name

A reference to a trait instance.

Module

module @$name

A reference to a module.

Control-Flow

unreachable

unreachable

If execution reaches an unreachable instruction, the behaviour is undefined from that point on. Corresponds to LLVM’s unreachable.

ret

ret $value

Returns a value. Corresponds to LLVM’s ret.

jump

jump $next(..$values)

Jumps to the next basic block with provided values for the parameters. Corresponds to LLVM’s unconditional version of br.

if

if $cond then $next1(..$values1) else $next2(..$values2)

Conditionally jumps to one of the basic blocks. Corresponds to LLVM’s conditional form of br.

switch

switch $value {
   case $value1 => $next1(..$values1)
   ...
   default      => $nextN(..$valuesN)
}

Jumps to the basic block whose case value is equal to $value, or to the default block if no case matches. Corresponds to LLVM’s switch.

invoke

invoke[$type] $ptr(..$values) to $success unwind $failure

Invokes a function pointer: jumps to $success if a value is returned, and unwinds to $failure if an exception is thrown. Corresponds to LLVM’s invoke.

throw

throw $value

Throws the value and starts unwinding.

try

try $succ catch $failure

Operands

All non-control-flow instructions follow the general pattern %$name = $opname[..$types] ..$values. Purely side-effecting operands like store produce a unit value.

call

call[$type] $ptr(..$values)

Calls the given function with the given function type and argument values. Corresponds to LLVM’s call.

load

load[$type] $ptr

Load value of given type from memory. Corresponds to LLVM’s load.

store

store[$type] $ptr, $value

Store value of given type to memory. Corresponds to LLVM’s store.

elem

elem[$type] $ptr, ..$indexes

Compute derived pointer starting from given pointer. Corresponds to LLVM’s getelementptr.

extract

extract[$type] $aggrvalue, $index

Extract element from aggregate value. Corresponds to LLVM’s extractvalue.

insert

insert[$type] $aggrvalue, $value, $index

Create a new aggregate value based on existing one with element at index replaced with new value. Corresponds to LLVM’s insertvalue.
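
The value semantics of extract and insert can be mimicked in Scala with an immutable case class. This is only an analogy: Pair is a hypothetical stand-in for an aggregate of two i32 elements, and copy plays the role of insert by building a new value instead of mutating the old one.

```scala
object Aggregates {
  // Hypothetical stand-in for an aggregate value with two i32 elements.
  final case class Pair(first: Int, second: Int)

  val original: Pair = Pair(1, 2)

  // Like extract: read one element out of the aggregate value.
  val first: Int = original.first

  // Like insert: a new aggregate with one element replaced; original is untouched.
  val updated: Pair = original.copy(second = 5)
}
```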

stackalloc

stackalloc[$type]

Stack allocate a slot of memory big enough to store given type. Corresponds to LLVM’s alloca.

bin

$bin[$type] $value1, $value2

Where $bin is one of the following: iadd, fadd, isub, fsub, imul, fmul, sdiv, udiv, fdiv, srem, urem, frem, shl, lshr, ashr, and, or, xor. Depending on the type and signedness, maps to either integer or floating point binary operations in LLVM.
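
Why signedness matters for division and right shifts can be seen on plain 32-bit values. The Scala sketch below uses standard JVM operations whose results match the signed and unsigned variants; it shows the arithmetic only and makes no claim about which NIR operation the compiler emits for each expression.

```scala
object Signedness {
  val x: Int = -8

  // Signed division treats the bit pattern as a negative number.
  val signedDiv: Int = x / 2

  // Unsigned division treats the same bits as 4294967288.
  val unsignedDiv: Int = java.lang.Integer.divideUnsigned(x, 2)

  // Arithmetic shift right copies the sign bit into the vacated position.
  val arithmeticShift: Int = x >> 1

  // Logical shift right fills the vacated position with zero.
  val logicalShift: Int = x >>> 1
}
```

The signed results are both -4, while the unsigned division and the logical shift both give 2147483644, even though all four start from the same 32 bits.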

comp

$comp[$type] $value1, $value2

Where $comp is one of the following: eq, neq, lt, lte, gt, gte. Depending on the type, maps to either icmp or fcmp with corresponding comparison flags in LLVM.

conv

$conv[$type] $value

Where $conv is one of the following: trunc, zext, sext, fptrunc, fpext, fptoui, fptosi, uitofp, sitofp, ptrtoint, inttoptr, bitcast. Corresponds to LLVM conversion instructions with the same name.
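
The difference between truncation, sign extension and zero extension is easy to check on small values. The Scala expressions below produce the same results as trunc, sext and zext would on the corresponding bit widths; this illustrates the arithmetic and is not a statement about how these Scala expressions are lowered.

```scala
object Conversions {
  // Truncation keeps the low 8 bits: 300 is 0x12C, so 0x2C remains.
  val truncated: Byte = 300.toByte

  // Sign extension: the byte 0xFF widens to the 32-bit value -1.
  val signExtended: Int = (-1).toByte.toInt

  // Masking after widening gives the zero-extended result for the byte 0xFF.
  val zeroExtended: Int = (-1).toByte & 0xFF

  // Char is an unsigned 16-bit type, so widening it never produces a negative Int.
  val charWidened: Int = Char.MaxValue.toInt
}
```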

sizeof

sizeof[$type]

Returns the size of the given type.

classalloc

classalloc @$name

Roughly corresponds to new $name in Scala. Performs allocation without calling the constructor.

field

field[$type] $value, @$name

Returns a pointer to the given field of given object.

method

method[$type] $value, @$name

Returns a pointer to the given method of given object.

dynmethod

dynmethod $obj, $signature

Returns a pointer to the given method of given object and signature.

as

as[$type] $value

Corresponds to $value.asInstanceOf[$type] in Scala.

is

is[$type] $value

Corresponds to $value.isInstanceOf[$type] in Scala.
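
For reference, this is the Scala-level pairing that as and is mirror: a type test followed by a cast that is only reached when the test succeeds. The object and method names are made up for the example.

```scala
object Casts {
  def describe(x: Any): String =
    if (x.isInstanceOf[String])       // the `is` check
      x.asInstanceOf[String].toUpperCase // the `as` cast, safe after the check
    else
      "not a string"
}
```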

Values

Boolean

true
false

Corresponds to LLVM’s true and false.

Zero and null

null
zero $type

Corresponds to LLVM’s null and zeroinitializer.

Integer

Ni8
Ni16
Ni32
Ni64

Correspond to LLVM’s integer values.

Float

N.Nf32
N.Nf64

Corresponds to LLVM’s floating point values.

Struct

struct @$name {..$values}

Corresponds to LLVM’s struct values.

Array

array $type {..$values}

Corresponds to LLVM’s array value.

Local

%$name

Named reference to result of previously executed instructions or basic block parameters.

Global

@$name

Reference to the value of top-level definition.

Unit

unit

Corresponds to () in Scala.

Null

null

Corresponds to null literal in Scala.

String

"..."

Corresponds to string literal in Scala.

Attributes

Attributes allow one to attach additional metadata to definitions and instructions.

Inlining

mayinline

mayinline

Default state: optimiser is allowed to inline given method.

inlinehint

inlinehint

Optimiser is incentivized to inline given methods but it is allowed not to.

noinline

noinline

Optimiser must never inline given method.

alwaysinline

alwaysinline

Optimiser must always inline given method.

Linking

pin

pin(@$name)

Require $name to be reachable whenever the current definition is reachable. Used to introduce indirect linking dependencies. For example, a module definition depends on its constructor using this attribute.

pin-if

pin-if(@$name, @$cond)

Require $name to be reachable if the current and $cond definitions are both reachable. Used to introduce conditional indirect linking dependencies. For example, class constructors conditionally depend on methods overridden in the given class if the methods being overridden are reachable.
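
The effect of pin and pin-if on reachability can be sketched as a fixed-point computation. The model below is an assumption made for illustration, not the linker's actual data structures: Defn, Pin, PinIf and reachable are hypothetical names, and refs stands for the ordinary dependencies found in a definition's body.

```scala
object Reachability {
  // Hypothetical model of definitions and their linking attributes.
  sealed trait Attr
  final case class Pin(name: String) extends Attr
  final case class PinIf(name: String, cond: String) extends Attr
  final case class Defn(name: String, attrs: List[Attr], refs: List[String])

  def reachable(defns: List[Defn], entries: Set[String]): Set[String] = {
    val byName = defns.map(d => d.name -> d).toMap

    @annotation.tailrec
    def loop(seen: Set[String]): Set[String] = {
      val discovered = seen.flatMap { name =>
        byName.get(name).toList.flatMap { d =>
          d.refs ++ d.attrs.collect {
            case Pin(dep)                       => dep // always required
            case PinIf(dep, cond) if seen(cond) => dep // required once cond is reachable
          }
        }
      }
      val grown = seen ++ discovered
      // Stop when a full pass adds nothing new.
      if (grown == seen) seen else loop(grown)
    }

    loop(entries)
  }
}
```

Because a pin-if dependency is re-checked on every pass, it is picked up even when its condition becomes reachable later than the definition that carries the attribute.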

pin-weak

pin-weak(@$name)

Require $name to be reachable if there is a reachable dynmethod with matching signature.

Misc

dyn

dyn

Indication that a method can be called using a structural type dispatch.

pure

pure

Lets the optimiser assume that calls to the given method are effectively pure: if the same method is called twice with exactly the same argument values, the result of the first invocation can be re-used without calling the method a second time.
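
The rewrite that this attribute permits can be shown at the source level. In the sketch below, square is assumed to be a method that could carry the pure attribute, and before and after are hypothetical names for the code as written and the code the optimiser would be allowed to produce.

```scala
object PureCalls {
  // No side effects: the result depends only on the argument.
  def square(x: Int): Int = x * x

  // As written: two calls with the same argument value.
  def before(x: Int): Int = square(x) + square(x)

  // Permitted rewrite: one call, with its result re-used.
  def after(x: Int): Int = {
    val s = square(x)
    s + s
  }
}
```

The two versions are observably identical only because square has no side effects; marking an effectful method as pure would let the optimiser drop one of its effects.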

extern

extern

Use C-friendly calling convention and don’t name-mangle given method.

override

override(@$name)

Attributed method overrides @$name method if @$name is reachable. $name must be defined in one of the super classes or traits of the parent class.