Introduction

Build Status Issues Pull requests

Greetings, fellow developers and enthusiasts! Here, amidst the digital expanse, we present to you the culmination of an ambitious endeavor: a MiniJava compiler meticulously crafted in Java, tailored for the MIPS32 architecture.

Inspired by the timeless wisdom encapsulated within "Modern Compiler Implementation in Java" by Andrew W. Appel and fortified by the comprehensive resources provided by Cambridge University Press, this repository stands as a testament to our commitment to excellence in compiler construction.

Within these virtual walls, you'll discover a treasure trove of code, documentation, and insights meticulously assembled to guide you through the intricacies of compiler development. From lexical analysis to MIPS32 code generation, every line of code, every algorithmic decision is informed by the principles expounded in our esteemed references.

Whether you're a seasoned compiler architect seeking to hone your craft or an aspiring enthusiast eager to delve into the arcane arts of compiler construction, this repository offers a sanctuary for exploration, learning, and collaboration.

So, come forth, fellow travelers, and embark on a journey where MiniJava programs transcend mere abstraction to find tangible expression on the MIPS32 architecture. Welcome to the MiniJava Compiler for MIPS32 Repository—where theory meets practice, and innovation knows no bounds!

Parser

A parser is a critical component of a compiler, responsible for analyzing the grammatical structure of a program's source code and transforming it into a format understandable to the compiler. In the context of the mini Java to MIPS32 compiler, the parser plays the pivotal role of interpreting the syntactic constructs of mini Java language, such as statements, expressions, and control flow structures, and translating them into an intermediate form that can be further processed by the compiler. Key points of the parser include recognizing and interpreting tokens, applying defined grammar rules for mini Java, and detecting and reporting syntactic errors in the source code.

Parser
- Tokens
  - Punctuation
  - Keywords
  - Types
  - Literals
  - Operators
- Statements
- Expressions
- Usability
- Example

Tokens

Tokens are fundamental units of a programming language, representing various syntactic elements within the code. In the context of the mini Java to MIPS32 compiler, tokens are categorized into different types, each serving a distinct purpose in the parsing process.

Punctuation

Punctuation tokens consist of symbols used for structuring code and delimiting expressions.

Type	Examples	Type	Examples
`LSQUIRLY`	`{`	`RSQUIRLY`	`}`
`LPAREN`	`(`	`RPAREN`	`)`
`LBRACKET`	`[`	`RBRACKET`	`]`
`COMMA`	`,`	`SEMICOLON`	`;`
`DOT`	`.`

Keywords

Keywords are reserved identifiers in the language that convey specific meanings or functionalities.

Type	Examples	Type	Examples
`CLASS`	`class`	`PUBLIC`	`public`
`STATIC`	`static`	`VOID`	`void`
`MAIN`	`main`	`EXTENDS`	`extends`
`RETURN`	`return`	`STRING`	`string`
`IF`	`if`	`ELSE`	`else`
`WHILE`	`while`	`THIS`	`this`
`NEW`	`new`	`LENGTH`	`length`
`SOUT`	`System.out.println`

Types

Types represent data categories in the language, defining the nature of variables and expressions.

Type	Examples
`INT`	`int`
`INT_ARRAY`	`int[]`
`BOOLEAN`	`boolean`

Literals

Literals represent constant values in the code, providing specific data to be processed or manipulated by the program. In the context of the mini Java to MIPS32 compiler, literals are categorized into different types, each representing a distinct category of constant values.

Type	Examples
`TRUE_LITERAL`	`true`
`FALSE_LITERAL`	`false`
`IDENTIFIER`	`gabs`, `saidx30`
`INTEGER_LITERAL`	`762`, `21`

Operators

Operators are symbols used to perform operations on variables and values in a programming language. In the mini Java to MIPS32 compiler, operators are classified into different types, each representing a specific operation or manipulation.

Type	Examples
`EQ`	`=`
`AND`	`&&`
`LT`	`<`
`PLUS`	`+`
`MINUS`	`-`
`STAR`	`*`
`BANG`	`!`

⬆️

Statements

Statements are fundamental components of a program that perform specific actions or control the flow of execution. In the mini Java to MIPS32 compiler, various types of statements are supported, each serving a unique purpose:

Block Statement: Represents a block of statements enclosed within curly braces {}. This allows for the grouping of multiple statements together.
Conditional Statement (if-else): Executes one of two statements based on the evaluation of a specified expression within the if condition. If the expression evaluates to true, the first statement is executed; otherwise, the second statement is executed.
Loop Statement (while): Repeatedly executes a statement as long as a specified expression within the while condition evaluates to true.
Print Statement: Outputs the result of an evaluated expression to the standard output using System.out.println.
Assignment Statement: Assigns the value of an evaluated expression to a variable identified by an identifier (id = Exp).
Array Assignment Statement: Assigns the value of an evaluated expression to a specific index of an array identified by another expression (id[Exp] = Exp).

⬆️

Expressions

Expressions represent computations or values within a program, often involving operators, literals, variables, and method calls. In the mini Java to MIPS32 compiler, various types of expressions are supported, each serving a specific purpose in defining program behavior:

Binary Operation: Represents an operation performed between two expressions, denoted as Exp op Exp.
Array Access: Accesses an element within an array using square brackets, expressed as Exp [Exp].
Array Length: Retrieves the length of an array using the length property, indicated as Exp . length.
Method Invocation: Calls a method on an object, specified as Exp . id ( ExpList ), where id represents the method name and ExpList denotes the list of arguments passed to the method.
Literals:
- INTEGER_LITERAL: Represents integer constants.
- true and false: Boolean literals.
Identifiers: Represents variables or object references within the program.
Keyword Expressions:
- this: Refers to the current object instance.
- new int [Exp]: Creates a new integer array of the specified size.
- new id(): Instantiates a new object of the specified class.
Unary Operation: Represents a unary operation applied to an expression, such as negation (!Exp).
Parenthesized Expression: Encloses an expression within parentheses to control evaluation order, as in (Exp).
Expression List: Represents a list of expressions used as arguments in a method call or array initialization. It is defined as Exp ExpRest*, where ExpRest allows for additional expressions separated by commas.

These expressions enable the creation of complex computations and interactions within mini Java programs, facilitating their translation into MIPS32 instructions.

⬆️

Usability

In the mini Java to MIPS32 compiler, the usability of the language constructs is defined by the grammar rules and structural conventions used in the code. Here are the key components that contribute to the usability of mini Java programs:

Start Symbol (Program):
- The starting point of a mini Java program is defined by a MainClass followed by zero or more ClassDecl.
MainClass:
- A main class is defined by the following syntax:
```
class id {
    public static void main(String[] id) {
        Statement
    }
}
```
  This structure specifies the class containing the main method, which serves as the entry point for program execution.
Class Declaration (ClassDecl):
- The syntax for defining a class is:
```
class id {
    VarDecl* MethodDecl*
}
```
  Optionally, a class can extend another class using the syntax:
```
class id extends id {
    VarDecl* MethodDecl*
}
```
  This structure defines the properties and methods associated with a class.
Variable Declaration (VarDecl):
- Variables are declared using the syntax Type id;, specifying the data type and identifier for the variable.
Method Declaration (MethodDecl):
- Methods are declared with the syntax:
```
public Type id(FormalList) {
    VarDecl* Statement* return Exp;
}
```
  This structure defines the signature, parameters, and body of a method.
Formal Parameters (FormalList):
- Formal parameters are specified as a list of data types and identifiers, with optional additional parameters defined by FormalRest.

These conventions ensure consistency and readability in mini Java code, enhancing its usability and maintainability.

⬆️

Example

class Factorial {
    public static void main(String[] a) {
        System.out.println(new Fac().ComputeFac(10));
    }
}
class Fac {
    public int ComputeFac(int num) {
        int num_aux;
        if (num < 1)
        num_aux = 1;
        else
            num_aux = num * (this.ComputeFac(num-1));
        return num_aux;
    }
}

graph LR
subgraph "id"
    second(((2)))
    fourth(((4)))
end
subgraph "if"
    third(((3)))
end
primary((1)) -->|i|second
primary -->|"`**a-h**, **j-z**`"|fourth
second -->|f| third
second -->|"`**a-e**, **g-z**, **0-9**`"| fourth
third -->|"`**0-9**, **a-z**`"| fourth
fourth -->|"`**0-9**, **a-z**`"| fourth

Return to top

Abstract Syntax Tree

Abstract Syntax Tree

Overview

The abstract syntax tree (AST) is a tree representation of the source code. It is used to represent the structure of the code in a way that is easy to manipulate and analyze. The AST is created by the parser, which reads the source code and generates a tree of nodes that represent the different parts of the code.

flowchart TD
    A[Program] --> B[ClassDecl]
    A --> C[MainClass]
    C --> Sout[System.out.println]
    Sout --> Hello["Hello, World!"]
    B --> E[VarDecl]
    B --> F[MethodDecl]
    F --> J[Statement]
    J --> K[Exp]

⬆️

Nodes

The nodes of the AST represent the different elements of the source code, such as statements, expressions, and declarations. Each node has a specific type and contains information about the code it represents. For example, a node representing an assignment statement might contain information about the variable being assigned to and the value being assigned.

// Abstract base class for all AST nodes
public abstract class Node {
    public abstract <T> T accept(Visitor<T> v);
}

// Example of a node representing an assignment statement
public class Assign extends Statement {
    private Identifier identifier;
    private Expression value;

    @Override
    public <T> T accept(Visitor<T> v) {
        return v.visit(this);
    }
}

Generic Type T is used to allow the visitor to return different types of values depending on the operation being

⬆️

Visitors

Each node in the AST has an accept method that takes a visitor as an argument. The visitor is an interface that defines methods for each type of node in the AST. When the accept method is called on a node, it calls the appropriate method on the visitor, passing itself as an argument. This allows the visitor to perform operations on the node and its children.

// Visitor interface
public interface Visitor<T> {
    T visit(Assign assign);
    T visit(Identifier identifier);
    T visit(Expression expression);
    { ... }
}

⬆️

Orthogonal directions of modularity.

The visitor pattern allows for orthogonal directions of modularity. This means that the code that defines the nodes of the AST is separate from the code that defines the operations that can be performed on the nodes. This separation makes it easy to add new operations to the AST without modifying the existing code.

Examples of operations that can be performed on the AST include type checking, table of symbols and mermaid printer, code generation, and optimization(like llvm).

Return to top

Semantic Analysis

se-man-tic: of or relating to meaning in language

Webster’s Dictionary

The semantic analysis phase of a compiler connects variable definitions to their uses, checks that each expression has a correct type, and translates the abstract syntax into a simpler representation suitable for generating machine code.

Contents
Table of Symbols
Type Checking

Table of Symbols

Table of Symbols
- Overview
- Details of Implementation

Overview

This table of symbols is a part of the semantic analysis phase of the compiler. It is responsible for building a symbol table that keeps track of the names and types of variables and functions in the program. The symbol table is used by other parts of the compiler, such as the type checker and code generator, to ensure that the program is well-formed and to generate correct code.


// Implementation of the SymbolTableVisitor
public class SymbolTableVisitor implements Visitor<Void> {
    private MainTable mainTable = new MainTable();
    private ClassTable currentClassTable = null;
    private MethodTable currentMethodTable = null;
    private final ArrayList<SymbolTableException> errors = new ArrayList<>();
    private final boolean ignoreExtends = false;

    public Void visit(And a) {
        return null;
    }

    public Void visit(BooleanType b) {
        return null;
    }

    { ...}
}

Imperative Symbol Table

The imperative symbol table is built using mutable data structures, such as maps and lists. It is updated in place as the compiler traverses the AST, adding new symbols and checking for errors.

Is the one we are using in our implementation.

Functional Symbol Table

The functional symbol table is built using immutable data structures, such as trees and lists. It is updated by creating new copies of the symbol table with the new symbols added. This approach is more functional and can make it easier to reason about the symbol table, but it can be less efficient than the imperative approach.

Symbols

A symbol is a name that is associated with a type. In the symbol table, symbols are used to represent variables, functions, classes, and other entities in the program. Each symbol has a name and a type, which is used to determine how the symbol can be used in the program.

⬆️

Details of Implementation

MainTable

The main table is the root of the symbol table. It contains a map of class names to class tables.

public class MainTable {
    HashMap<String, ClassTable> map = new HashMap<>();
}

⬆️

ClassTable

The class table contains a map of field names to types and a map of method names to method tables. It also contains a reference to the parent class table, if any.

In SymbolTableVisitor we have a currentClassTable to keep track of the current class we are in.

public class ClassTable {
    private String className;
    private ClassTable parent;
    HashMap<String, Type> fieldsContext = new HashMap<>();
    HashMap<String, MethodTable> methodsContext = new HashMap<>();
}

⬆️

MethodTable

The method table contains a map of parameter names to types and a map of local variable names to types. It also contains a reference to the parent class table.

In SymbolTableVisitor we have a currentMethodTable to keep track of the current method we are in.

public class MethodTable {
    private String methodName;
    private Type methodReturnType;
    private ClassTable classParent;
    LinkedHashMap<String, Type> paramsContext = new LinkedHashMap<>();
    HashMap<String, Type> localsContext = new HashMap<>();
}

⬆️

Exceptions

The SymbolTableException is a custom exception that is thrown when an error occurs during the building of the symbol table.

The SymbolTableVisitor don't throw exceptions, but it keeps track of them in a list of SymbolTableExceptions.

public class SymbolTableException extends RuntimeException {
    public SymbolTableException(String errorMessage) {
        super(errorMessage);
    }

    public SymbolTableException(String errorMessage, Throwable err) {
        super(errorMessage, err);
    }
}

Return to top

Type Checking

Type Checking

Overview

Type checking is a process of verifying and enforcing the constraints of types in a program. The type checking process can be performed at compile-time or run-time. Type checking is a fundamental part of the compiler or interpreter. It helps to catch bugs at the early stages of development and ensures that the program is well-typed.

Type Checking in AST

Type checking in AST is a process of verifying and enforcing the constraints of types in the AST nodes. The type checking process can be performed at compile-time or run-time. Type checking in AST can be implemented using the visitor pattern. The type checking visitor traverses the AST and performs type checking on the AST nodes.

Type Checking Visitor

In our compiler, we have implemented a type checking visitor that traverses the AST and performs type checking on the AST nodes. The type checking visitor is implemented as a subclass of the AST visitor. The type checking visitor implements the visit methods for each AST node type. The visit methods perform type checking on the AST nodes.

The class TypeCheckingVisitor implements Visitor<Type> and SemanticAnalysisStrategy

// T is the return type of the visitor, which is Type in this case
public interface Visitor<T> {
    T visit(And a);

    T visit(BooleanType b);

    { ...}
}

// SemanticAnalysisStrategy is an interface that defines a method
// to check the semantics of a program
public interface SemanticAnalysisStrategy {
    boolean isSemanticsOk(Program program);

    void isSemanticsOkOrThrow(Program program) throws SemanticAnalysisException;
}

Error Handling

When the type-checker detects a type error or an undeclared identifier, it should print an error message and continue – because the programmer would like to be told of all the errors in the program.

Return to top

Mija