Introduction
Greetings, fellow developers and enthusiasts! Here, amidst the digital expanse, we present to you the culmination of an ambitious endeavor: a MiniJava compiler meticulously crafted in Java, tailored for the MIPS32 architecture.
Inspired by the timeless wisdom encapsulated within "Modern Compiler Implementation in Java" by Andrew W. Appel and fortified by the comprehensive resources provided by Cambridge University Press, this repository stands as a testament to our commitment to excellence in compiler construction.
Within these virtual walls, you'll discover a treasure trove of code, documentation, and insights meticulously assembled to guide you through the intricacies of compiler development. From lexical analysis to MIPS32 code generation, every line of code, every algorithmic decision is informed by the principles expounded in our esteemed references.
Whether you're a seasoned compiler architect seeking to hone your craft or an aspiring enthusiast eager to delve into the arcane arts of compiler construction, this repository offers a sanctuary for exploration, learning, and collaboration.
So, come forth, fellow travelers, and embark on a journey where MiniJava programs transcend mere abstraction to find tangible expression on the MIPS32 architecture. Welcome to the MiniJava Compiler for MIPS32 Repository—where theory meets practice, and innovation knows no bounds!
Parser
A parser is a critical component of a compiler, responsible for analyzing the grammatical structure of a program's source code and transforming it into a format understandable to the compiler. In the context of the mini Java to MIPS32 compiler, the parser plays the pivotal role of interpreting the syntactic constructs of mini Java language, such as statements, expressions, and control flow structures, and translating them into an intermediate form that can be further processed by the compiler. Key points of the parser include recognizing and interpreting tokens, applying defined grammar rules for mini Java, and detecting and reporting syntactic errors in the source code.
Tokens
Tokens are fundamental units of a programming language, representing various syntactic elements within the code. In the context of the mini Java to MIPS32 compiler, tokens are categorized into different types, each serving a distinct purpose in the parsing process.
Punctuation
Punctuation tokens consist of symbols used for structuring code and delimiting expressions.
Type | Examples | Type | Examples | |
---|---|---|---|---|
LSQUIRLY | { | RSQUIRLY | } | |
LPAREN | ( | RPAREN | ) | |
LBRACKET | [ | RBRACKET | ] | |
COMMA | , | SEMICOLON | ; | |
DOT | . |
Keywords
Keywords are reserved identifiers in the language that convey specific meanings or functionalities.
Type | Examples | Type | Examples | |
---|---|---|---|---|
CLASS | class | PUBLIC | public | |
STATIC | static | VOID | void | |
MAIN | main | EXTENDS | extends | |
RETURN | return | STRING | string | |
IF | if | ELSE | else | |
WHILE | while | THIS | this | |
NEW | new | LENGTH | length | |
SOUT | System.out.println |
Types
Types represent data categories in the language, defining the nature of variables and expressions.
Type | Examples |
---|---|
INT | int |
INT_ARRAY | int[] |
BOOLEAN | boolean |
Literals
Literals represent constant values in the code, providing specific data to be processed or manipulated by the program. In the context of the mini Java to MIPS32 compiler, literals are categorized into different types, each representing a distinct category of constant values.
Type | Examples |
---|---|
TRUE_LITERAL | true |
FALSE_LITERAL | false |
IDENTIFIER | gabs , saidx30 |
INTEGER_LITERAL | 762 , 21 |
Operators
Operators are symbols used to perform operations on variables and values in a programming language. In the mini Java to MIPS32 compiler, operators are classified into different types, each representing a specific operation or manipulation.
Type | Examples |
---|---|
EQ | = |
AND | && |
LT | < |
PLUS | + |
MINUS | - |
STAR | * |
BANG | ! |
Statements
Statements are fundamental components of a program that perform specific actions or control the flow of execution. In the mini Java to MIPS32 compiler, various types of statements are supported, each serving a unique purpose:
-
Block Statement: Represents a block of statements enclosed within curly braces
{}
. This allows for the grouping of multiple statements together. -
Conditional Statement (if-else): Executes one of two statements based on the evaluation of a specified expression within the
if
condition. If the expression evaluates to true, the first statement is executed; otherwise, the second statement is executed. -
Loop Statement (while): Repeatedly executes a statement as long as a specified expression within the
while
condition evaluates to true. -
Print Statement: Outputs the result of an evaluated expression to the standard output using
System.out.println
. -
Assignment Statement: Assigns the value of an evaluated expression to a variable identified by an identifier (
id = Exp
). -
Array Assignment Statement: Assigns the value of an evaluated expression to a specific index of an array identified by another expression (
id[Exp] = Exp
).
Expressions
Expressions represent computations or values within a program, often involving operators, literals, variables, and method calls. In the mini Java to MIPS32 compiler, various types of expressions are supported, each serving a specific purpose in defining program behavior:
-
Binary Operation: Represents an operation performed between two expressions, denoted as
Exp op Exp
. -
Array Access: Accesses an element within an array using square brackets, expressed as
Exp [Exp]
. -
Array Length: Retrieves the length of an array using the
length
property, indicated asExp . length
. -
Method Invocation: Calls a method on an object, specified as
Exp . id ( ExpList )
, whereid
represents the method name andExpList
denotes the list of arguments passed to the method. -
Literals:
INTEGER_LITERAL
: Represents integer constants.true
andfalse
: Boolean literals.
-
Identifiers: Represents variables or object references within the program.
-
Keyword Expressions:
this
: Refers to the current object instance.new int [Exp]
: Creates a new integer array of the specified size.new id()
: Instantiates a new object of the specified class.
-
Unary Operation: Represents a unary operation applied to an expression, such as negation (
!Exp
). -
Parenthesized Expression: Encloses an expression within parentheses to control evaluation order, as in
(Exp)
. -
Expression List: Represents a list of expressions used as arguments in a method call or array initialization. It is defined as
Exp ExpRest*
, whereExpRest
allows for additional expressions separated by commas.
These expressions enable the creation of complex computations and interactions within mini Java programs, facilitating their translation into MIPS32 instructions.
Usability
In the mini Java to MIPS32 compiler, the usability of the language constructs is defined by the grammar rules and structural conventions used in the code. Here are the key components that contribute to the usability of mini Java programs:
-
Start Symbol (Program):
- The starting point of a mini Java program is defined by a
MainClass
followed by zero or moreClassDecl
.
- The starting point of a mini Java program is defined by a
-
MainClass:
- A main class is defined by the following syntax:
This structure specifies the class containing theclass id { public static void main(String[] id) { Statement } }
main
method, which serves as the entry point for program execution.
- A main class is defined by the following syntax:
-
Class Declaration (ClassDecl):
- The syntax for defining a class is:
Optionally, a class can extend another class using the syntax:class id { VarDecl* MethodDecl* }
This structure defines the properties and methods associated with a class.class id extends id { VarDecl* MethodDecl* }
- The syntax for defining a class is:
-
Variable Declaration (VarDecl):
- Variables are declared using the syntax
Type id;
, specifying the data type and identifier for the variable.
- Variables are declared using the syntax
-
Method Declaration (MethodDecl):
- Methods are declared with the syntax:
This structure defines the signature, parameters, and body of a method.public Type id(FormalList) { VarDecl* Statement* return Exp; }
- Methods are declared with the syntax:
-
Formal Parameters (FormalList):
- Formal parameters are specified as a list of data types and identifiers, with optional additional parameters
defined by
FormalRest
.
- Formal parameters are specified as a list of data types and identifiers, with optional additional parameters
defined by
These conventions ensure consistency and readability in mini Java code, enhancing its usability and maintainability.
Example
class Factorial {
public static void main(String[] a) {
System.out.println(new Fac().ComputeFac(10));
}
}
class Fac {
public int ComputeFac(int num) {
int num_aux;
if (num < 1)
num_aux = 1;
else
num_aux = num * (this.ComputeFac(num-1));
return num_aux;
}
}
graph LR subgraph "id" second(((2))) fourth(((4))) end subgraph "if" third(((3))) end primary((1)) -->|i|second primary -->|"`**a-h**, **j-z**`"|fourth second -->|f| third second -->|"`**a-e**, **g-z**, **0-9**`"| fourth third -->|"`**0-9**, **a-z**`"| fourth fourth -->|"`**0-9**, **a-z**`"| fourth
Abstract Syntax Tree
Overview
The abstract syntax tree (AST) is a tree representation of the source code. It is used to represent the structure of the code in a way that is easy to manipulate and analyze. The AST is created by the parser, which reads the source code and generates a tree of nodes that represent the different parts of the code.
flowchart TD A[Program] --> B[ClassDecl] A --> C[MainClass] C --> Sout[System.out.println] Sout --> Hello["Hello, World!"] B --> E[VarDecl] B --> F[MethodDecl] F --> J[Statement] J --> K[Exp]
Nodes
The nodes of the AST represent the different elements of the source code, such as statements, expressions, and declarations. Each node has a specific type and contains information about the code it represents. For example, a node representing an assignment statement might contain information about the variable being assigned to and the value being assigned.
// Abstract base class for all AST nodes
public abstract class Node {
public abstract <T> T accept(Visitor<T> v);
}
// Example of a node representing an assignment statement
public class Assign extends Statement {
private Identifier identifier;
private Expression value;
@Override
public <T> T accept(Visitor<T> v) {
return v.visit(this);
}
}
Generic Type T
is used to allow the visitor to return different types of values depending on the operation being
Visitors
Each node in the AST has an accept
method that takes a visitor as an argument. The visitor is an interface that
defines methods for each type of node in the AST. When the accept
method is called on a node, it calls the
appropriate method on the visitor, passing itself as an argument. This allows the visitor to perform operations on
the node and its children.
// Visitor interface
public interface Visitor<T> {
T visit(Assign assign);
T visit(Identifier identifier);
T visit(Expression expression);
{ ... }
}
Orthogonal directions of modularity.
The visitor pattern allows for orthogonal directions of modularity. This means that the code that defines the nodes of the AST is separate from the code that defines the operations that can be performed on the nodes. This separation makes it easy to add new operations to the AST without modifying the existing code.
Examples of operations that can be performed on the AST include type checking, table of symbols and mermaid printer, code generation, and optimization(like llvm).
Semantic Analysis
se-man-tic: of or relating to meaning in language
Webster’s Dictionary
The semantic analysis phase of a compiler connects variable definitions to their uses, checks that each expression has a correct type, and translates the abstract syntax into a simpler representation suitable for generating machine code.
Contents |
---|
Table of Symbols |
Type Checking |
Table of Symbols
Overview
This table of symbols is a part of the semantic analysis phase of the compiler. It is responsible for building a symbol table that keeps track of the names and types of variables and functions in the program. The symbol table is used by other parts of the compiler, such as the type checker and code generator, to ensure that the program is well-formed and to generate correct code.
// Implementation of the SymbolTableVisitor
public class SymbolTableVisitor implements Visitor<Void> {
private MainTable mainTable = new MainTable();
private ClassTable currentClassTable = null;
private MethodTable currentMethodTable = null;
private final ArrayList<SymbolTableException> errors = new ArrayList<>();
private final boolean ignoreExtends = false;
public Void visit(And a) {
return null;
}
public Void visit(BooleanType b) {
return null;
}
{ ...}
}
Imperative Symbol Table
The imperative symbol table is built using mutable data structures, such as maps and lists. It is updated in place as the compiler traverses the AST, adding new symbols and checking for errors.
Is the one we are using in our implementation.
Functional Symbol Table
The functional symbol table is built using immutable data structures, such as trees and lists. It is updated by creating new copies of the symbol table with the new symbols added. This approach is more functional and can make it easier to reason about the symbol table, but it can be less efficient than the imperative approach.
Symbols
A symbol is a name that is associated with a type. In the symbol table, symbols are used to represent variables, functions, classes, and other entities in the program. Each symbol has a name and a type, which is used to determine how the symbol can be used in the program.
Details of Implementation
MainTable
The main table is the root of the symbol table. It contains a map of class names to class tables.
public class MainTable {
HashMap<String, ClassTable> map = new HashMap<>();
}
ClassTable
The class table contains a map of field names to types and a map of method names to method tables. It also contains a reference to the parent class table, if any.
In SymbolTableVisitor we have a currentClassTable to keep track of the current class we are in.
public class ClassTable {
private String className;
private ClassTable parent;
HashMap<String, Type> fieldsContext = new HashMap<>();
HashMap<String, MethodTable> methodsContext = new HashMap<>();
}
MethodTable
The method table contains a map of parameter names to types and a map of local variable names to types. It also contains a reference to the parent class table.
In SymbolTableVisitor we have a currentMethodTable to keep track of the current method we are in.
public class MethodTable {
private String methodName;
private Type methodReturnType;
private ClassTable classParent;
LinkedHashMap<String, Type> paramsContext = new LinkedHashMap<>();
HashMap<String, Type> localsContext = new HashMap<>();
}
Exceptions
The SymbolTableException is a custom exception that is thrown when an error occurs during the building of the symbol table.
The SymbolTableVisitor don't throw exceptions, but it keeps track of them in a list of SymbolTableExceptions.
public class SymbolTableException extends RuntimeException {
public SymbolTableException(String errorMessage) {
super(errorMessage);
}
public SymbolTableException(String errorMessage, Throwable err) {
super(errorMessage, err);
}
}
Type Checking
Overview
Type checking is a process of verifying and enforcing the constraints of types in a program. The type checking process can be performed at compile-time or run-time. Type checking is a fundamental part of the compiler or interpreter. It helps to catch bugs at the early stages of development and ensures that the program is well-typed.
Type Checking in AST
Type checking in AST is a process of verifying and enforcing the constraints of types in the AST nodes. The type checking process can be performed at compile-time or run-time. Type checking in AST can be implemented using the visitor pattern. The type checking visitor traverses the AST and performs type checking on the AST nodes.
Type Checking Visitor
In our compiler, we have implemented a type checking visitor that traverses the AST and performs type checking on the AST nodes. The type checking visitor is implemented as a subclass of the AST visitor. The type checking visitor implements the visit methods for each AST node type. The visit methods perform type checking on the AST nodes.
The class TypeCheckingVisitor implements Visitor<Type>
and SemanticAnalysisStrategy
// T is the return type of the visitor, which is Type in this case
public interface Visitor<T> {
T visit(And a);
T visit(BooleanType b);
{ ...}
}
// SemanticAnalysisStrategy is an interface that defines a method
// to check the semantics of a program
public interface SemanticAnalysisStrategy {
boolean isSemanticsOk(Program program);
void isSemanticsOkOrThrow(Program program) throws SemanticAnalysisException;
}
Error Handling
When the type-checker detects a type error or an undeclared identifier, it should print an error message and continue – because the programmer would like to be told of all the errors in the program.