summaryrefslogtreecommitdiffstats
path: root/report/esoteric_project_report.tex
diff options
context:
space:
mode:
Diffstat (limited to 'report/esoteric_project_report.tex')
-rw-r--r--report/esoteric_project_report.tex141
1 files changed, 69 insertions, 72 deletions
diff --git a/report/esoteric_project_report.tex b/report/esoteric_project_report.tex
index 78a8528..82c4e2c 100644
--- a/report/esoteric_project_report.tex
+++ b/report/esoteric_project_report.tex
@@ -47,11 +47,6 @@
%\pagenumbering{roman}
%\newpage
-
-\begin{abstract}
-ABSTRACT INCOMPLETE
-\end{abstract}
-
\chapter{Introduction}
\label{chap:intro}
\section{Project Statement}
@@ -105,15 +100,18 @@ In the case of compiler, code is converted from high level program code to assem
In contrast to this, Interpreter directly runs the code given to it, or in other words, an interpreter is a separate program that reads and executes the instructions itself, rather than converting them into the assembly language of that particular architecture being run on. The principal advantage of this is that a computationally expensive compilation process does not have to be run every time, therefore leading to faster startup times. However, this approach carries with it a big disadvantage, namely that because every line has to be interpreted, execution is slower than the equivalent instructions run in a compiled language.
-\section{Interpreted Programming Languages}
-There are a number of prominent interpreted languages in use today, including Python, Matlab, Perl, Ruby and PHP. All of these languages have interpreters that are written in C or C++, which are compiled languages. The interpreters for each language are loaded into memory, along with an instruction that they then interpret.
-
\section{Esoteric Languages (Esolangs)}
-Esolangs are programming languages designed to be jokes or proof of concept languages, rather than those to be used in actual programming tasks. Typically, the reason they are created is for fun rather than any serious desire to solve a particular programming problem, and so most languages aim to be funny, to be as difficult to use as possible or to have few features while still being Turing complete.
+Esolangs are programming languages designed to be jokes or proof of concept languages, rather than those to be used in actual programming tasks. Typically, the reason they are created is for fun rather than any serious desire to solve a particular programming problem, and so most languages aim to be funny, to be as difficult to use as possible or to have few features while still being Turing complete. Some esoteric languages are also designed to adapt existing languages into simpler or more usable forms.
-\section{Example - Shakespearian Programming Language}
+\section{Shakespearian Programming Language}
The Shakespeare Programming Language is esoteric code parodied on extracts from Romeo and Juliet. Here, code is written so it reads like how a script would, such as \texttt{[Enter Juliet]} where \texttt{Juliet} may be a variable name. Other examples include using \texttt{Acts} or \texttt{Scenes} as GOTO statements, where the interpreter can jump to a certain point in the code and read from there. Otherwise, these are ignored similar to the title or character descriptions. Other examples include LOLCODE, Hodor or White Space. This serves the purpose of writing code in a different manner to usual, sometimes for humour. The aim is to replace typical language features so that it can still be read by a compiler or interpreter but also look and read very differently by the user.
+\section{Befunge}
+Befunge is one of the most popular and well known esoteric languages, intentionally designed to be hard to compile. Befunge code is written in a 2d plane, where ASCII characters represent instructions. An instruction pointer starts in the top left of this plane, and begins travelling right. Data is stored in a stack, similar to many other esoteric languages. One other feature of Befunge is that it is self-modifying, meaning instructions can modify the plane the code is on. Since Befunge-98, there is no fixed size of the plane, meaning Befunge is a Turing complete language.
+
+\section{Pyth}
+Pyth is a language designed to be both concise, but also clear to read. Pyth is designed to be similar to python, but with every instruction a single character. When ran, a pyth code is first compiled into python before being executed. Unlike many other esoteric languages, Pyth is procedural based, not stack based. Print statements are also implicit in Pyth, and all constructs are end of line terminated.
+
\chapter{Compiler Design and Methodology}
This chapter will discuss how compilers work, and the design behind them. The goal of a compiler is to convert the source code into an from which the CPU can execute. As mentioned in chapter 2, a compiler works by converting the entire source code in one go, then executing, unlike a interpreter which executes each line at a time. However, before the code can be executed a number of steps have to be taken.
\section{The Lexer (Lexical Analysis)}
@@ -129,70 +127,25 @@ At this stage, the code is converted into a form which can be executed by the CP
One other method is to compile into another, existing programming language. This method is known as a source-to-source compiler. Here, instead of writing to machine code or bytecode, the compiler creates a string of valid source code for the language being targeted, based on the initial source code for the main compiler. This can then be executed using other, pre-existing compilers. One common use of these compilers is for web browsers, where many languages compile to javascript.
-\chapter{Grammar}
-
+\chapter{Our Esoteric Language}
+\section{EsotericFortran}
+For our esoteric language we decided to adapt an existing language into a modern form. Specifically, we chose the language Fortran. We chose Fortran as contained all the the features we were interested in implementing, but was also different enough from most currently popular languages.
-\section{Introduction to Language Grammar}
-\begin{verbatim}
+Of course, we have decided to make some changes from Fortran. Most importantly, we decided to include keywords in our language unlike Fortran, as we thought that allowing variables to share names with keywords adds unnecessary confusion. We have also decided to make other, small changes from Fortran, either due to time constraints or to simplify to a more modern form.
+\section{Language Grammar}
+Before we can begin implementation, we need to design the grammar of our language. The grammar not only gives us and understanding of how our language will work and allow us to spot and issues, but also our chosen parser, a recursive descent parser, heavily relies on the language grammar for its implementation.
-Work in progress currently
+To define the language we are using formal grammars. Formal grammars are a way represent the language as a set of rules. In order to fully represent the language we are implementing we have a context-free grammar, a type 2 grammar on the Chomsky hierarchy. Context-free grammars are a type of formal grammar in the form
+T $\rightarrow$ t
+where A is a nonterminal, and t a series of terminals and nonterminals. Nonterminals are references to other rules in the language, and a terminal is an end point in the language. They are context-free as they do not look at the context of the nonterminal.
+In order to define the language in a readable way, we have used a slightly adjusted from of the Backus-Naur form (BNF).
+\begin{verbatim}<program> ::= <statement>*
\end{verbatim}
-Context free grammars – something = something else
-
-Context free because there is no rule or context about where this grammar belongs, only where it goes
-
-Rewrite rules:
-S $\rightarrow$ DET N
-Were first invented by Panini %Pāṇini
- who worked on Sanscrit grammar rules
-
-Port Royal Grammer by Antoine Arnauld and Claude Lancelot in 1660 who worked at Port-Royal-des-Champs
-
-then Ferdinand de Saussure (19th century)
-
-then Chomsky with Chomsky hierarchy.
-
-
-BNF form (Backus Naur form) is a meta language for grammars
-Named after the people who invented it simultaneously
-example:
-\begin{verbatim}<program> ::= <sequence>?
-\end{verbatim}
-This means a program consists of a sequence of allowable grammars followed by question mark
-Can be read as “program consists of a sequence of things followed by question mark”
-
-
-Grammar consists of a series of rewrite rules
-Grammar is a quadruple. It is a set of:
-\begin{itemize}
- \item Terminal symbols
- \item Non terminal symbols
- \item Set of rules
- \item Start symbol (one of the non-terminal symbols)
-\end{itemize}
-
-
-Any regular grammar can be recognised by regular expressions (refer to regular grammar in Chomksy hierarchy)
-
-After Chomsky came Backus and Naur – BNF
-
-Niklaus Wirth – Pascal language
-
-Dijkstra – programming without goto (gotoless programming)
-
-
-\section {Why we chose FORTRAN}
-
-asdf
-
-\clearpage
-\chapter{BNF form}
-
-\cite{davie1982recursive} talk about bnf form here
-
-
-\begin{table}[h!]
+This example means that a program is defined as some number of statements. Here, we have used the '*' symbol to mean one or more and surrounded nonterminals with '<' and '>'. For terminals we have decided to surround any terminals with '"', such as for keywords or defining identifiers or numbers. If a nontermial has multiple possible rules we have seperated them using '|', meaning "or".
+\newpage
+\subsection{Statements Grammar}
+\begin{table}[!htb]
\begin{center}
\begin{tabular}{|l|r|}
\hline
@@ -248,7 +201,9 @@ asdf
\caption{Grammar table for EsotericFortran Statements}
\end{center}
\end{table}
-\begin{table}[h!]
+\newpage
+\subsection{Expression Grammar}
+\begin{table}[!htb]
\begin{center}
\begin{tabular}{|l|r|}
\hline
@@ -349,8 +304,50 @@ Conditions could include ``have they written \texttt{int} before this keyword''
The above code represents our decision making when it comes to identifying statements. We have two primary algorithms in this class; \texttt{matchAndAdvance()} and \texttt{matchOrError()}.
\subsection{matchAndAdvance()}
-asdf
+We have a function which tries to match the grammar of each token, and executes code depending on whether all the conditions were met. \texttt{matchAndAdvance()} takes a TokenType variable and checks if it matches what was expected. If so, it advances and keeps iterating until the end of the file is met. If not, it returns an error since something unexpected occurred. An example of where this function is used is in our Statement methods, where we check whether the following code is a statement or not. This is below, in Figure 5.3.
+
+
\subsection{matchOrError()}
+Our \texttt{matchOrError()} function does very similar to our previously mentioned \texttt{matchAndAdvance()}, with the exception of if something unexpected happens during the program's analysis, it will throw an error and display on the UI of our program, giving us feedback on what is causing the problem in our code.
+
+\section{Expression.java \& Statement.java}
+Programs are generally made up of expressions and statements. This is why we made abstract classes Expression.java and Statement.java. In our Expression class, we create a binary expression, which is when left and right objects are created, separated by an operator. This for example could be \texttt{5 * 4 + 3 * 6}. Here, our left and right operands are 5 * 4 and 3 * 6, separated by a PLUS token type. This can further be broken down by the numbers separated by the multiplication sign. Our Statements are lists of blocks which have a certain type, whether they are Prints or conditionals, or even iterables such as Whiles. An expression is also a statement, so we also accounted for having an ExpressionStatement. Ultimately, we have many functions in these abstracts which return certain types for each statement or expression, and this can be used for identification when our Parser eventually reads input.
+
+\section{Translator.java \& ExecuteC.java}
+Our translator file takes our list of statements previously analysed by our chain process, going through out lexer and parser, and one by one is converted into C code which can then be executed as it would normally to provide an output. We store these staments in an ArrayList and simply iterate over them using functions such as compileToC(), where for each statement, we evaluate it and add it to a text file which is eventually printed to the UI.
+
+\section{Agile \& Project Management}
+We worked well as a team. There were little issues. We decided to use the Agile method since it would lead to not only increased team communication, but better organisation when it came to our code. It also meant our development was flexible; we could choose what we wanted to work on at a given time with the use of sprints. \newline
+
+During our allocated project time, we did four major sprints. Our initial priority was creating the main framework for the compiler of our esoteric language. Then we handled variables and print statements. Then implemented conditionals and loops. Finally, we tackled arrays and functions. Once we tied up these loose ends, we could actually make a full Sieve of Eratosthenes algorithm to show the capability of the program. \\
+
+\subsection{Sprint 1: Creating the framework for the compiler}
+\begin{itemize}
+\item Language.java
+\item Token.java
+\item TokenScanner.java
+\item TokenType.java
+\end{itemize}
+
+\subsection{Sprint 2: Variables and Print statements}
+\begin{itemize}
+\item Parser.java
+\item Expression.java
+\item Statement.java
+\end{itemize}
+\subsection{Sprint 3: Conditionals and loops}
+\begin{itemize}
+\item Parser.java
+\item Utils.java
+\end{itemize}
+\subsection{Sprint 4: Arrays \& functions}
+\begin{itemize}
+\item Parser.java
+\item Environment.java
+\item ExecuteC.java
+\item Translator.java
+
+\end{itemize}
\chapter{Implementation}\label{Impl}