From 5414054a57cfa49b96f1dc6129be7331f54b802d Mon Sep 17 00:00:00 2001 From: AidenRushbrooke <72034940+AidenRushbrooke@users.noreply.github.com> Date: Wed, 15 Dec 2021 10:38:40 +0000 Subject: Removed and cut various sections --- report/esoteric_project_report.pdf | Bin 293516 -> 281078 bytes report/esoteric_project_report.tex | 248 ++----------------------------------- report/report.pdf | Bin 67012 -> 0 bytes report/report.tex | 39 ------ 4 files changed, 10 insertions(+), 277 deletions(-) delete mode 100644 report/report.pdf delete mode 100644 report/report.tex diff --git a/report/esoteric_project_report.pdf b/report/esoteric_project_report.pdf index 2156a5d..21bf6c7 100644 Binary files a/report/esoteric_project_report.pdf and b/report/esoteric_project_report.pdf differ diff --git a/report/esoteric_project_report.tex b/report/esoteric_project_report.tex index 8ab54d3..4246ec8 100644 --- a/report/esoteric_project_report.tex +++ b/report/esoteric_project_report.tex @@ -89,16 +89,9 @@ Won't have: \end{itemize} \section{Report structure} -This report will first cover our research into compilers and esoteric languages in Chapter 2. Then, we will describe the overall design and structure of a compiler based on our research in Chapter 3. Chapter 4 will focus on the design of our own esoteric language, including grammar tables. Chapter 5 will describe the implementation and documentation of the compiler, explaining how it was implemented using agile methods. Chapter 6 will show evidence of testing the compiler, with both unit tests and overall test harnesses for each section. Finally, Chapter 7 will be the conclusion, where we will compare the final result to our initial requirements, and give any further work we would do if given time. +This report will first cover our research into esoteric languages in Chapter 2. Then, we will describe the overall design and structure of a compiler based on our research in Chapter 3. Chapter 4 will focus on the design of our own esoteric language, including grammar tables. Chapter 5 will describe the implementation and documentation of the compiler, explaining how it was implemented using agile methods. Chapter 6 will show evidence of testing the compiler, with both unit tests and overall test harnesses for each section. Finally, Chapter 7 will be the conclusion, where we will compare the final result to our initial requirements, and give any further work we would do if given time. -\chapter{Research of Similar Systems} - -\section{Compilers vs Interpreters} -Within the field of computer programming languages, there are two main types: compiled languages and interpreted languages. Both of these styles have distinct advantages and disadvantages. Compilers and interpreters both convert code written in human readable higher-level languages to instructions that a computer can execute. - -In the case of compiler, code is converted from high level program code to assembly language then to machine language. This process can be slow, especially when there is a large amount of code to compile, but execution is fast as only the instructions directly executed are required to be kept in memory. - -In contrast to this, Interpreter directly runs the code given to it, or in other words, an interpreter is a separate program that reads and executes the instructions itself, rather than converting them into the assembly language of that particular architecture being run on. 
The principal advantage of this is that a computationally expensive compilation process does not have to be run every time, therefore leading to faster startup times. However, this approach carries with it a big disadvantage, namely that because every line has to be interpreted, execution is slower than the equivalent instructions run in a compiled language. +\chapter{Research of Esoteric Languages} \section{Esoteric Languages (Esolangs)} Esolangs are programming languages designed to be jokes or proof of concept languages, rather than languages to be used for actual programming tasks. Typically, the reason they are created is for fun rather than any serious desire to solve a particular programming problem, and so most languages aim to be funny, to be as difficult to use as possible or to have few features while still being Turing complete. Some esoteric languages are also designed to adapt existing languages into simpler or more usable forms. @@ -120,8 +113,6 @@ The first step for a compiler is to convert the initial source code into groups The second stage of the compiler is to try and understand how the tokens link together to form a program. The parser uses the list of tokens created and compares it to the language grammar. This allows it to understand how the tokens are linked, and it can then use this information to create abstract syntax trees (ASTs), which represent the code in a logical fashion. The parser can also detect any syntax errors in the initial source code when comparing against the language grammar. The specific parser we have chosen is a recursive descent parser, a top-down parser where each rule in the grammar maps to a specific function in the parser implementation. A recursive descent parser works by starting at the first rule in the grammar, then working downward, finding each statement or expression. We chose this parser design as it is both simple to implement and gives a lot of control over how the parser functions and reports errors, compared to parsers created through a parser generator and grammar table. Many compiler implementations also use recursive descent parsers, such as GCC \cite{GCCParser} and Clang. -\section{Optimization} -This step involves making a variety of optimizations to the current program to speed up processing. This may involve simplifying loops, or pre-calculating complicated expressions which only contain constants. However, this step is not vital to the function of an effective compiler, with some deciding to ignore optimization entirely. \section{Code Generation} At this stage, the code is converted into a form which can be executed by the CPU, such as machine code. However, due to the complexity of converting directly to machine code, an alternative approach is commonly used. Here, the compiler writes to an alternate form designed to run on a created virtual machine. This form is known as bytecode. Java is a well-known example which uses this method, where Java programs are compiled to run in the Java virtual machine. @@ -261,7 +252,7 @@ This example means that a program is defined as some number of statements. Here, \end{center} \end{table} -\chapter{Documentation}\label{MethLab} +\chapter{Implementation}\label{MethLab} We decided it would be a good idea to document our application stage by stage while developing our Esolang so that we may refer back to previous parts for reference. It will also help track our progress throughout the project to ensure we stay on track.
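To make the rule-per-function structure of the recursive descent parser described above concrete, here is a minimal sketch in Java. It is illustrative only: the class, record and method names are invented for this sketch and are not taken from the project's actual Parser.java, which is documented later.

\begin{verbatim}
// Illustrative sketch only -- names are hypothetical, not the real Parser.java.
// Grammar fragment:  expression -> term (("+" | "-") term)*
//                    term       -> NUMBER (("*" | "/") NUMBER)*
import java.util.List;

enum SketchTokenType { NUMBER, PLUS, MINUS, STAR, SLASH, EOF }
record SketchToken(SketchTokenType type, String lexeme, Double literal) {}

interface Expr {}
record Literal(double value) implements Expr {}
record Binary(Expr left, SketchToken operator, Expr right) implements Expr {}

class SketchParser {
    private final List<SketchToken> tokens;
    private int current = 0;

    SketchParser(List<SketchToken> tokens) { this.tokens = tokens; }

    // Each grammar rule becomes one method; higher rules call lower ones.
    Expr expression() {
        Expr expr = term();
        while (match(SketchTokenType.PLUS) || match(SketchTokenType.MINUS)) {
            SketchToken op = tokens.get(current - 1);
            expr = new Binary(expr, op, term()); // fold left: 3-1+2 parses as (3-1)+2
        }
        return expr;
    }

    private Expr term() {
        Expr expr = primary();
        while (match(SketchTokenType.STAR) || match(SketchTokenType.SLASH)) {
            SketchToken op = tokens.get(current - 1);
            expr = new Binary(expr, op, primary());
        }
        return expr;
    }

    // Simplified: assumes the next token is a NUMBER.
    private Expr primary() {
        return new Literal(tokens.get(current++).literal());
    }

    // Consume the next token only if it has the expected type.
    private boolean match(SketchTokenType type) {
        if (tokens.get(current).type() == type) { current++; return true; }
        return false;
    }
}
\end{verbatim}

Because each loop folds the new operator into the left-hand side, an input such as 3-1+2 naturally parses left-associatively, which is the behaviour the tests described later in this report rely on.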
@@ -277,7 +268,7 @@ Of course the very first thing that is required when it comes to translating any \section{TokenScanner.java \& TokenType.java} -This begins the lexical analysis of our program, by examining each string by character, and identifying what type of symbol it is, whether it is text or something further such as operations. When the program reads a particular character, such as ``+'', it needs to know that it is firstly, an operator and second, a plus operand; and so we can start to identify left and right expressions to calculate Math, but this is mentioned in more detail later in the document alongside the sprints and implementation. There are other examples other than the addition operator. We also have prepared for ``-'' or subtract, the lookahead function since ``=='' is conveyed differently to ``=''. All of these are converted to tokens one by one before being sent to our parser algorithms. +This begins the lexical analysis of our program by examining the source string character by character and identifying what type of symbol each one is, whether it is text or something further such as an operator. When the program reads a particular character, such as ``+'', it needs to know that it is, firstly, an operator and, secondly, the plus operator specifically. There are other examples beyond the addition operator: we have also prepared for ``-'' (subtract), and we use a lookahead function where needed, since ``=='' must be read differently to ``=''. All of these are converted to tokens one by one before being sent to our parser algorithms. \begin{figure}[h!] \begin{center} @@ -286,11 +277,11 @@ This begins the lexical analysis of our program, by examining each string by cha \end{center} \end{figure} -In the image above we have identified a means for using switch statements to recognise strings and turning them into fixed tokens. This is also why we use \texttt{TokenType.java}, which uses a fixed number of enums so the program knows exactly what they have read. In t his case for example, case `*' is equal to \texttt{TokenType.STAR}, and so on. As mentioned, some tokens are multiple characters long and so we have further cases to compensate for these. They usually consist of double equals ==, or comparables such as $<$= and $>$=. +In the image above we have shown a means of using switch statements to recognise characters and turn them into fixed tokens. This is also why we use \texttt{TokenType.java}, which defines a fixed set of enums so the program knows exactly what it has read. In this case, for example, case `*' maps to \texttt{TokenType.STAR}, and so on. As mentioned, some tokens are multiple characters long and so we have further cases to handle these; they usually consist of double equals ==, or comparison operators such as $<$= and $>$=. \section{Parser.java} -This class is the primary code of our program. It contains everything we need to start recognising the code given and actually \emph{do} something with it. It takes a list of tokens from our analyser, and goes through our list of defined grammar, attempting to work out what each token translates to, essentially breaking down each line to find out what it is. Is it an if statement? Is it a print? Is it an expression or Math equation? We have lots of conditions is this class and each one is tested until one of them is met, denoting the result. Each possible grammar rules must match one of the conditions in the parser, otherwise it does not make sense and the code may fail to compile. We have supported this via exceptions and default case error messages.
\newline +This class is the primary code of our program. It contains everything we need to start recognising the code given and actually \emph{do} something with it. It takes a list of tokens from our analyser, and goes through our list of defined grammar, attempting to work out what each token translates to, essentially breaking down each line to find out what it is. We have lots of conditions in this class and each one is tested until one of them is met, denoting the result. Each possible grammar rule must match one of the conditions in the parser; otherwise it does not make sense and the code may fail to compile. We have supported this via exceptions and default case error messages. \newline Conditions could include ``have they written \texttt{int} before this keyword'' or ``Is this a single colon or a double colon?'' or ``Is there a string after said colons?'' If all of these conditions are met, we can identify that a variable can be declared, for example. @@ -301,23 +292,16 @@ Conditions could include ``have they written \texttt{int} before this keyword'' \end{center} \end{figure} -The above code represents our decision making when it comes to identifying statements. We have two primary algorithms in this class; \texttt{matchAndAdvance()} and \texttt{matchOrError()}. - -\subsection{matchAndAdvance()} -We have a function which tries to match the grammar of each token, and executes code depending on whether all the conditions were met. \texttt{matchAndAdvance()} takes a TokenType variable and checks if it matches what was expected. If so, it advances and keeps iterating until the end of the file is met. If not, it returns an error since something unexpected occurred. An example of where this function is used is in our Statement methods, where we check whether the following code is a statement or not. This is below, in Figure 5.3. - - -\subsection{matchOrError()} -Our \texttt{matchOrError()} function does very similar to our previously mentioned \texttt{matchAndAdvance()}, with the exception of if something unexpected happens during the program's analysis, it will throw an error and display on the UI of our program, giving us feedback on what is causing the problem in our code. +The above code represents our decision making when it comes to identifying statements. \section{Expression.java \& Statement.java} -Programs are generally made up of expressions and statements. This is why we made abstract classes Expression.java and Statement.java. In our Expression class, we create a binary expression, which is when left and right objects are created, separated by an operator. This for example could be \texttt{5 * 4 + 3 * 6}. Here, our left and right operands are 5 * 4 and 3 * 6, separated by a PLUS token type. This can further be broken down by the numbers separated by the multiplication sign. Our Statements are lists of blocks which have a certain type, whether they are Prints or conditionals, or even iterables such as Whiles. An expression is also a statement, so we also accounted for having an ExpressionStatement. Ultimately, we have many functions in these abstracts which return certain types for each statement or expression, and this can be used for identification when our Parser eventually reads input. +Programs are generally made up of expressions and statements. This is why we made abstract classes Expression.java and Statement.java. Different types of expressions and statements then inherit from these.
An example of an expression class is a binary expression, made up of a left and right section, and a middle operator. Our Statements are lists of blocks which have a certain type, whether they are Prints or conditionals, or even iterables such as Whiles. An expression is also a statement, so we also accounted for having an ExpressionStatement. These classes can be used for identification when our translator writes to C. \section{Translator.java \& ExecuteC.java} -Our translator file takes our list of statements previously analysed by our chain process, going through out lexer and parser, and one by one is converted into C code which can then be executed as it would normally to provide an output. We store these staments in an ArrayList and simply iterate over them using functions such as compileToC(), where for each statement, we evaluate it and add it to a text file which is eventually printed to the UI. +Our translator file takes our list of statements previously analysed by our chain process, going through our lexer and parser, and converts them one by one into C code, which can then be executed as normal to provide an output. We store these statements in an ArrayList and simply iterate over them using functions such as compileToC(), where for each statement, we evaluate it and add it to a text file which is eventually printed to the UI. \section{Agile \& Project Management} -We worked well as a team. There were little issues. We decided to use the Agile method since it would lead to not only increased team communication, but better organisation when it came to our code. It also meant our development was flexible; we could choose what we wanted to work on at a given time with the use of sprints. \newline +We decided to use the Agile method since it would lead to not only increased team communication, but better organisation when it came to our code. It also meant our development was flexible; we could choose what we wanted to work on at a given time with the use of sprints. \newline During our allocated project time, we did four major sprints. Our initial priority was creating the main framework for the compiler of our esoteric language. Then we handled variables and print statements. Then we implemented conditionals and loops. Finally, we tackled arrays and functions. Once we tied up these loose ends, we could actually make a full Sieve of Eratosthenes algorithm to show the capability of the program. \\ @@ -349,218 +333,6 @@ During our allocated project time, we did four major sprints. Our initial priori \end{itemize} -\chapter{Implementation}\label{Impl} - -In this chapter you cover the actual implementation of your project. Implementation will be done in sprints so you may wish to use different sub-sections for each sprint and then dedicate a section to your final deliverable. Section \ref{Figures} with figures should not remain in the final report. - -The long string following the sprint number is the git commit at the point in the git repository at which the tests were conducted. - -\section{Early sprints} -\subsection{Sprint 1 - \texttt{cb29252f1e0d29d555fb232f39d343930fc76105}} -The first sprint focused on developing the initial stages of the compiler. The main goal of the first sprint was to create and define the grammar for our custom esoteric language, then to create a working program to handle simple expressions through first applying lexical analysis, then using a parser.
The first section of code implemented was the framework for the compiler, handing the users input and linking each section together. We decided to give the user the choice of either running the compiler with a single line of code, or running from a file containing source code, with the path given as an argument. Once completed, work on the lexical scanner could begin. The bulk of the scanner class is a function with takes the source code string as an argument, and returns an arraylist of tokens. This was done by looping through each character and assigning the correct token using a switch statement.The parser then takes this list of tokens and forms an abstract syntax tree, where each type of expression in our grammar is represented as its own class, extending a base expression class. - -For example: - -\begin{verbatim} -Code: 3-1+2 -\end{verbatim} -Produces the output: -\begin{verbatim} -NUMBER 3 3.0 -MINUS - null -NUMBER 1 1.0 -PLUS + null -NUMBER 2 2.0 -\end{verbatim} -A more complex example: -\begin{verbatim} -Code: (36/2 + 45.2) * 3 -\end{verbatim} -Produces the result: -\begin{verbatim} -LEFT_PAREN ( null -NUMBER 36 36.0 -SLASH / null -NUMBER 2 2.0 -PLUS + null -NUMBER 45.2 45.2 -RIGHT_PAREN ) null -STAR * null -NUMBER 3 3.0 -\end{verbatim} - -\subsection{Sprint 2 - \texttt{69b0ad07bac30beca1397ff187468e7597203c44}} -The goal of sprint 2 was to introduce statements to the language, starting with the simple print statement. The secondary goal was to add variable definition and assignment. Firstly text scanning was added to the lexer by first matching a alphabet character, then looking ahead until the last character. The extracted text was compared to a list of keywords, and either assigned a matching keyword, or set as an identifier. The parser was also improved, changing the return result to a list of statements. In order to more accurately check for undefined variables, an environment class was implemented, which stores any defined variables. The environment is then checked when a variable is used,which allows for more informative error messages. - -For example, running \texttt{java Interpreter.Language example2.txt} where \texttt{example2.txt} is the following: - -\begin{verbatim} -var :: a -a=5 -a=a+1 -print a -\end{verbatim} -Produces the result: -\begin{verbatim} -6.0 -\end{verbatim} -Testing involved making sure the program worked with the rules of left-associativity; for example, the program: -\begin{verbatim} -var :: a -a=3-1+2 -print a -\end{verbatim} -Produces the result \texttt{4.0}. -We also tested BIDMAS rules by using complex expressions: -\begin{verbatim} -var :: a -a=(36/2 + 45.2) * 3 -print a -\end{verbatim} -Returns \texttt{189.60000000000002}, which is correct considering the inaccuracy of floating points in computers. Finally, a test of a more complex program: -\begin{verbatim} -var :: a -a=5 -a=a+1 -print a - -a=7 -a=a*2 -print a - -var :: b -b = 10 -print a+b -\end{verbatim} -Produces the result: -\begin{verbatim} -6.0 -14.0 -24.0 -\end{verbatim} -You can test the error handling in several ways. For example the program -\begin{verbatim} -a=3 -print a -\end{verbatim} -Uses an undefined variable \texttt{a}. 
The output of running this is: -\begin{verbatim} -An error was encountered -Variable undefined -\end{verbatim} -You can also try and run a program with a syntax error, for example: -\begin{verbatim} -var :: a -a=3+ -print a -\end{verbatim} -Produces: -\begin{verbatim} -An error was encountered -Expected Expression -\end{verbatim} -This is how it reacts to using an undefined token: -\begin{verbatim} -var :: a -a=3 -pprint a -\end{verbatim} -Produces: -\begin{verbatim} -An error was encountered -Undefined Variable -\end{verbatim} - -\subsection{Sprint 3 - \texttt{a12094123dcacee41a7472031db6fe6027b083a7}} - -This version added full compilation to a binary by translating to a C program and using gcc to compile it straight away. It also added strings as lists of characters, and simple if statements. For example running \texttt{java Compiler.Language example.txt} where \texttt{example.txt} is the program: -\begin{verbatim} -character (len=10)::hello -hello="hello" -if 5==5 then -hello="goodbye " -endif -print *,hello,6," test" endprint -\end{verbatim} -Produces, in the \texttt{build/} folder, \texttt{example.c}: -\begin{verbatim} -#include -#include -int main(){ -char hello[11]; -strcpy(hello,"hello"); -if(5==5){ -strcpy(hello,"goodbye "); -} -printf("%s%d%s",hello,6," test"); -} -\end{verbatim} -It also compiles this file automatically, checking for errors during the process, and produces the executable binary \texttt{example.exe} in windows, or \texttt{./example} if run in a distribution of GNU+Linux. It then runs the binary straight away, checking for runtime errors. The output of running this binary is: -\begin{verbatim} -goodbye 6 test -\end{verbatim} -Here's an example of a program which produces a runtime error: -\begin{verbatim} -character (len=3)::hello -hello="hello" -if 5==5 then -hello="goodbye " -endif -print *,hello,6," test" endprint -\end{verbatim} -It produces a runtime error since the character string is too short. When it's run, it says: -\begin{verbatim} -An error was encountered -Runtime Error -\end{verbatim} -This version also added better error messages. For example \texttt{hello="hello} produces: -\begin{verbatim} -An error was encountered -Strings must end with " -\end{verbatim} -Moreover, the program -\begin{verbatim} -character (len=10)::hello -hello="hello" -if 5==5 then -hello="goodbye " -print *,hello,6," test" endprint -\end{verbatim} -Produces the error message: -\begin{verbatim} -An error was encountered -endif missing -\end{verbatim} -Finally, lets test if statements with another program: -\begin{verbatim} -character (len=10)::hello -hello="hello" -if 4==5 then -hello="goodbye " -endif -print *,hello,6," world" endprint -\end{verbatim} -Produces the result: -\begin{verbatim} -include -#include -int main(){ -char hello[11]; -strcpy(hello,"hello"); -if(4==5){ -strcpy(hello,"goodbye "); -} -printf("%s%d%s",hello,6," world"); -} - -hello6 world -\end{verbatim} - -\subsection{Sprint n} - - -\section{Final implementation} - \chapter{Testing} The testing was done in two main stages, first unit tests after each sprint were done in order to check correctness of the additions made, then large scale tests in order to check the compiler works as expected, is Turing complete and can handle any inputs. 
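As an illustration of what one of these per-sprint unit tests might look like, the JUnit 5 sketch below checks the token stream produced for the 3-1+2 expression from Sprint 1. The accessor names scanTokens() and getType() are assumptions made for the example and may not match the real TokenScanner API or test setup.

\begin{verbatim}
// Hypothetical unit test sketch -- scanTokens() and getType() are assumed
// names for illustration and may differ from the project's actual classes.
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;
import org.junit.jupiter.api.Test;

class TokenScannerTest {

    @Test
    void scansSimpleArithmeticIntoExpectedTokenTypes() {
        TokenScanner scanner = new TokenScanner("3-1+2");
        List<Token> tokens = scanner.scanTokens();

        // Expected: NUMBER, MINUS, NUMBER, PLUS, NUMBER
        // (an EOF token may follow, depending on the implementation).
        assertEquals(TokenType.NUMBER, tokens.get(0).getType());
        assertEquals(TokenType.MINUS, tokens.get(1).getType());
        assertEquals(TokenType.NUMBER, tokens.get(2).getType());
        assertEquals(TokenType.PLUS, tokens.get(3).getType());
        assertEquals(TokenType.NUMBER, tokens.get(4).getType());
    }
}
\end{verbatim}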
diff --git a/report/report.pdf b/report/report.pdf deleted file mode 100644 index 33a6960..0000000 Binary files a/report/report.pdf and /dev/null differ diff --git a/report/report.tex b/report/report.tex deleted file mode 100644 index c35bd0e..0000000 --- a/report/report.tex +++ /dev/null @@ -1,39 +0,0 @@ -\documentclass[12pt]{scrartcl} -\title{Advanced Programming Esoteric Project} -\author{Alfie Eagleton 100274245\\Eden Attenborough 100301654\\Aiden Rushbrooke 100235483\\Chris Sutcliffe 100228212} - -\begin{document} - -\maketitle -\clearpage - -\section{Introduction \& Objectives} -For this project we have decided to create an Esolang interpreter in Java so that we can write code, craft Boolean expressions and calculate Math. We plan to research, design and implement an esoteric language while also understanding fundamental advanced programming concepts. - - -\section{Functional Requirements} -After some discussion, we decided that our goals for this project are to create grammar for a language and form a lexical analysis algorithm. After this, we may develop a parser or interpreter so that we can convert our high level code into bytecode which can be read more easily by the JVM. - -\section{Non-functional Requirements} -Our code should be well tested, and be capable of handling edge cases. It should be modular and scalable so that functionality can easily be added without undue difficulty. It would be good if our interpreter could calculate mathematical functions and have good performance with well optimized code. - -\section{Technology} -We are looking at coding Java in IntelliJ, and potentially using GNU Bison to parse. - -\section{MoSCoW Analysis} -We thought about what our interpreter may or may not have so that we have a clear understanding of our priorities when coding. We decided that our software \emph{must} read and interpret commands immediately from a command line. The language \emph{must} be Turing complete, and be capable of detecting syntax and runtime errors. It \emph{should} be able to read from file and execute a script. It also \emph{should} be able to optimize source code during runtime. The interpreter \emph{could} be able to implement functions and procedures with return values, and be able to handle memory allocation and disposal. Our program \emph{won't have} a graphical or web-based interface, since it would take up too much development time and is not entirely needed for the scope of this project. It also will not implement object-oriented programming features. - -\section{Similar systems and Research} -\subsection{Compilers vs Interpreters} -Within the field of computer programming languages, there are two main types: compiled languages and interpreted languages. Both of these styles have distinct advantages and disadvantages. Compilers and interpreters both convert code written in human readable higher-level languages to instructions that a computer can execute. In the case of compiler, code is converted from high level program code to assembly language then to machine language. This process can be slow, especially when there is a large amount of code to compile, but execution is fast as only the instructions directly executed are required to be kept in memory. In contrast to this, Interpreter directly runs the code given to it, or in other words, an interpreter is a separate program that reads and executes the instructions itself, rather than converting them into the assembly language of that particular architecture being run on. 
The principal advantage of this is that a computationally expensive compilation process does not have to be run every time, therefore leading to faster startup times. However, this approach carries with it a big disadvantage, namely that because every line has to be interpreted, execution is slower than the equivalent instructions run in a compiled language. - -\subsection{Programming Languages} -There are a number of prominent interpreted languages in use today, including Python, Matlab, Perl, Ruby and PHP. All of these languages have interpreters that are written in C or C++, which are compiled languages. The interpreters for each language are loaded into memory, along with an instruction that they then interpret. - -\subsection{Esolangs} -Esolangs are programming languages designed to be jokes or proof of concept languages, rather than those to be used in actual programming tasks. Typically, the reason they are created is for fun rather than any serious desire to solve a particular programming problem, and so most languages aim to be funny, to be as difficult to use as possible or to have few features while still being Turing complete. - -\subsection{Example - Shakespearian Programming Language} -The Shakespeare Programming Language is esoteric code parodied on extracts from Romeo and Juliet. Here, code is written so it reads like how a script would, such as \texttt{[Enter Juliet]} where \texttt{Juliet} may be a variable name. Other examples include using \texttt{Acts} or \texttt{Scenes} as GOTO statements, where the interpreter can jump to a certain point in the code and read from there. Otherwise, these are ignored similar to the title or character descriptions. Other examples include LOLCODE, Hodor or White Space. This serves the purpose of writing code in a different manner to usual, sometimes for humour. The aim is to replace typical language features so that it can still be read by a compiler or interpreter but also look and read very differently by the user. - -\end{document} \ No newline at end of file -- cgit v1.2.3