Written by Luka Kerr on May 4, 2019

## Regular Expressions and Finite Automata

### Thompson’s construction

• Syntax driven
• Inductive
• Use $S$ to denote the starting state and $A$ to denote the final accepting state

Inductive Base:

For $\epsilon$, construct the NFA

For $a \in \Sigma$, construct the NFA

Inductive Step: suppose $N(r)$ and $N(s)$ are NFAs for regular expressions $r$ and $s$. Then:

$r | s$:

$rs$:

$r^*$:

$(r)$: $N((r)) \equiv N(r)$

### Subset Construction

• Used to convert a NFA to DFA
• At most $2^n$ DFA states where $n$ is the total number of NFA states

Operations used:

Operation Description
$\epsilon$-closure($s$) Set of NFA states readable from NFA state $s$ on $\epsilon$-transitions
$\epsilon$-closure($T$) Set of NFA states readable from some state $s$ in $T$ on $\epsilon$-transitions
move($T$, $a$) Set of NFA states to which there is a transition on input $a$ from some state $s$ in $T$

where

• $s$ is a NFA state
• $T$ is a set of NFA states

#### Algorithm

let $s_0$ be the start state of the NFA;

DFAstates contains the only unmarked state $\epsilon$-closure($s_0$);

while there is an unmarked state $T$ in DFAstates do

$\quad$ mark $T$

$\quad$ for each input symbol $a$ do

$\qquad$ $U := \epsilon$-closure(move($T$, $a$));

$\qquad$ if $U$ is not in DFAstates then

$\qquad \quad$ add $U$ as an unmarked state in DFAstates;

$\qquad$ DFATrans[$T$, $a$] $:= U$;

$\quad$ done

done

### DFA Minimisation

let $\prod$ be the partition with the two groups:

1. one is the set of all final states

2. the other is the set of all non-final states

let $\prod_{new} = \prod$;

for each group $G$ in $\prod_{new}$ do

$\quad$ partition $G$ into subgroups such that two states $s$ and $t$

$\qquad$ are in the same subgroup iff for all input symbols

$\qquad$ $a$, states $s$ and $t$ have transitions on $a$ to

$\qquad$ states in the same group of $\prod_{new}$;

$\quad$ replace $G$ in $\prod_{new}$ by the set of subgroups formed;

done

### Chomsky’s Hierarchy

Grammar Known As Definition Machine
Type 0 Unrestricted grammar $\alpha \ne \epsilon$ Turing
Type 1 Context-sensitive grammar (CSG) $|\alpha| \le |\beta|$ Linear bounded
Type 2 Context-free grammar (CFG) $A \to \alpha$ Stack automation
Type 3 Regular grammar $A \to w \mid B w$ Finite state automation

## Context Free Grammars

### BNF

BNF is a notation for writing CFG’s. Each production $(A, \alpha)$ is written as $A \to \alpha$, where the arrow $\to$ means “is defined to be” or “derives”.

A vertical bar $|$ reads “or else”.

### EBNF

EBNF is an extended version of BNF, including regular expressions.

• $( \alpha )^*$ means that $\alpha$ can be repeated zero or more times
• $( \alpha )^+$ means that $\alpha$ can be repeated one or more times
• $( \alpha )?$ means that $\alpha$ is optional

### Derivations

A grammar derives sentences by:

1. Beginning with the start symbol
2. Repeatedly replacing a nonterminal by the right-hand side of a production with that nonterminal on the left-hand side, until there are no more nonterminals to replace

Such a sequence of replacements is called a derivation of the sentence being analysed.

At each step in a derivation, 2 choices are made:

1. Which nonterminal to replace?
2. Which alternative to use for that nonterminal?

Let $G :=$

$\begin{array}{l} S \to ABC \\ A \to aA \mid \epsilon \\ B \to b B \mid \epsilon \\ C \to cC \mid \epsilon \end{array}$

#### Leftmost Derivation

Always replace the leftmost nonterminal.

Example

Deriving $abbc$ in the grammar $G$ using a leftmost derivation gives:

$S \Rightarrow ABC \Rightarrow aABC \Rightarrow aBC \Rightarrow abBC \Rightarrow abbC \Rightarrow abbcC \Rightarrow abbc$

#### Rightmost Derivation

Always replace the rightmost nonterminal.

Example

Deriving $abbc$ in the grammar $G$ using a rightmost derivation gives:

$S \Rightarrow ABC \Rightarrow ABcC \Rightarrow ABc \Rightarrow AbBc \Rightarrow AbbBc \Rightarrow Abbc \Rightarrow aAbbc \Rightarrow abbc$

### Parse Trees

In a parse tree:

• The start symbol is always at the root of the tree
• Nonterminals are always interior nodes
• Terminals are always leaves in the tree

### Recursive Grammar

Left recursive productions: $A \to A \alpha$

Right recursive productions: $A \to \alpha A$

#### Eliminating Left Recursion

##### Grammar Rewriting

Left recursive grammar

$A \to A \alpha \mid \beta$

Non-left recursive grammar

$A \to \beta A^\prime \ A^\prime \to \alpha A^\prime \mid \epsilon$

##### Regular Operators

Left recursive grammar

$A \to \alpha \ A \to A \beta_1 \mid A \beta_2$

Non-left recursive grammar

$A \to \alpha(\beta_1 \mid \beta_2)^*$

## Top Down Parsing

Let $G :=$

$\begin{array}{l} E \to T Q \\ Q \to +T Q \mid -T Q \mid \epsilon \\ T \to F R \\ R \to * F R \mid / F R \mid \epsilon \\ F \to \text{Int} \mid (E) \end{array}$

### First Sets

First($A$) is is the set of terminals which can appear as the first element of any chain of rules matching nonterminal $A$.

Using the grammar $G$

• First($TQ$) = { Int, ( }

• First($*R$) = { * }

Follow($A$) is the set of terminals that appear immediately to the right of the non-terminal $A$. That is Follow($A$) = $\{ a : S \overset{*}{\implies} \dots A a \dots \}$

By convention, assume every input is terminated by a $sign. Follow sets: • Are only constructed for non-terminals • Do not contain$\epsilon$Rules: • If$A$is the start symbol of the grammar, then Follow($A$) includes$
• If $A \to pBq$ is a production:
• Follow($B$) includes First($q$) - $\epsilon$
• If First($q$) contains $\epsilon$, then Follow($B$) includes Follow($A$)
• If $A \to pB$ is a production, then Follow($B$) includes Follow($A$)

Using the grammar $G$

• Follow($E$) = {), $} • Follow($Q$) = {),$}

• Follow($T$) = {+, -} + Follow($Q$) = {+, -, ), $} • Follow($R$) = Follow($T$) = {+, -, ),$}

### $LL(1)$ Grammar

A grammar is $LL(1)$ if for every nonterminal of the form $A \to \alpha_1 \mid \dots \mid \alpha_n$, the select sets are pairwise disjoint, i.e. Select($A \to \alpha_i$) $\cap$ Select($A \to \alpha_j$) = $\emptyset$ for all $i$ and $j$ such that $i \ne j$.

• A grammar with left recursion is not $LL(1)$
• Examples of left recursion are:
• $A \to A \alpha$
• $A \to B \alpha \ B \to A \beta$
• A grammar with common prefixes is not $LL(1)$
• An example of a common prefix is:
• $A \to \textbf{If } \alpha \ \beta \textbf{ fi} \ A \to \textbf{If } “(“ \ \alpha \ “)” \ \beta \textbf{ fi}$
• $LL(1)$ grammar can be parsed deterministically using 1 token of lookahead

### $LL(1)$ Parsing Table

for every production $A \to \alpha$ in the grammar do

$\quad$ forall $a$ in Select($A \to \alpha$), set $Table[A, a] = \alpha$

For example, using the grammar $G$, the $LL(1)$ parsing table would look like:

Int $+$ $-$ $*$ $/$ $($ $)$ ETQTQQ+TQ-TQ\epsilon\epsilonTFRFRR\epsilon\epsilon*FR/FR\epsilon\epsilonF$Int$(E)$If every entry in the table contains no more than 1 production, the grammar is$LL(1)$. ###$LL(1)$Table Driven Parser push$ onto the stack;

push the start symbol onto the stack;

while stack not empty do

$\quad$ let $X$ be the top stack symbol and $a$ be the lookahead symbol in the input;

$\quad$ if $X$ is a terminal then

$\qquad$ if $X = a$ then

$\quad \qquad$ pop $X$ and get the next token;

$\qquad$ else

$\quad \qquad$ error;

$\qquad$ fi

$\quad$ else

$\qquad$ if $Table[X, a]$ non-empty then

$\quad \qquad$ pop $X$;

$\quad \qquad$ push $Table[X, a]$ onto stack in the reverse order;

$\qquad$ else

$\quad \qquad$ error;

$\qquad$ fi

$\quad$ fi

done

For example, using the grammar $G$, and the $LL(1)$ parsing table above, a table driven parse of $i + i$ would look like:

Stack Input Production Derivation
$$E i + i$$ $E \to TQ$ $E \implies_{lm} TQ$
$$QT i + i$$ $T \to FR$ $\implies_{lm} FRQ$
$$QRF i + i$$ $F \to i$ $\implies_{lm}iRQ$
$$QRi i + i$$ pop and goto next token
$$QR + i$$ $R \to \epsilon$ $\implies_{lm} iQ$
$$Q +i$$ $Q \to + TQ$ $\implies_{lm} i + TQ$
$$QT+ +i$$ pop and goto next token
$$QT i$$ $T \to FR$ $\implies_{lm} i + FRQ$
$$QRF i$$ $F \to i$ $\implies_{lm} i + iRQ$
$$QRi i$$ pop and goto next token
$$QR  R \to \epsilon \implies{lm} i + iRQ$$Q $Q \to \epsilon$ $\implies{lm} i + iRQ$
   $\implies_{lm} i + iQ$

## Attribute Grammars

An attribute grammar is a triple $A = (G, V, F)$ where

• $G$ is a CFG
• $V$ is a finite set of distinct attributes
• $F$ is a finite set of semantic rules about the attributes

Each attribute contains a name and a type, and can represent anything we choose.

Let $G :=$

$\begin{array}{l} S \to E \\ E \to E \mathbin{/} E \\ E \to \textbf{num} \\ E \to \textbf{num} \cdot \textbf{num} \end{array}$

An example attribute grammar for $G$ is as follows:

Production Semantic Rules
$S \to E$ $E$.type if $E$.isFloat then float else int
$E \to E_1 \ / \ E_2$ $E$.isFloat = $E_1$.isFloat or $E_2$.isFloat
$E_1$.type = $E$.type
$E_2$.type = $E$.type
$E \to \textbf{num}$ $E$.isFloat = false
$E \to \textbf{num} \cdot \textbf{num}$ $E$.isFloat = true

### L-attributed and S-attributed

#### L-attributed

An attribute grammar is L-attributed if each inherited attribute of $X_i, 1 \le i \le n$ on the right hand side of $X_0 \to X_1 X_2 \cdots X_m$ depends only on:

• The attributes of the symbols $X_1, X_2, \cdots , X_{i - 1}$ to the left of $X_i$ in the production
• The inherited attributes of $X_0$

#### S-attributed

An attribute grammar is S-attributed if it uses synthesised attributes only.

Every S-attributed grammar is L-attributed.

### Synthesised and Inherited Attributes

Let $X_0 \to X_1 X_2 \cdots X_n$ be a production, and $A(X)$ be the set of attributes associated with a grammar symbol $X$.

#### Synthesised

A synthesised attribute $syn$ of $X_0$ is computed by:

$X_0.syn = f(A(X_1), A(X_2), \dots , A(X_n))$

The attribute $syn$ on a tree node depends on those of its children.

#### Inherited

An inherited attribute $inh$ of $X_i$, where $1 \le i \le n$, is computed by:

$X_i.inh = g(A(X_0), A(X_1), \dots , A(X_n))$

The attribute $inh$ on a tree node depends on those on its parent and/or siblings. $inh$ can also depend on other attributed in $A(X_i)$.

## Static Semantics

A semantic analyser enforces a language’s semantic constraints, specifically:

• Scope rules
• Type rules

### Scope

The scope of a declaration is the portion of the program over which the declaration takes effect.

A declaration is in scope at a particular point in a program if and only if the declaration’s scope includes that point.

An example of scope:

int k;           // k begin scope

void foo() {
int i;         // i begin scope
int j;         // j begin scope
i = 1;
j = 7;
print(i);      // -> 1
print(j);      // -> 7
{              // new block
int i;       // new block scoped i begin scope
i = 2;
print(i);    // -> 2
print(j);    // -> 7
}
print(i);      // -> 1
print(j);      // -> 7
}


#### Scope Levels

• Declarations in the outermost block are in level 1
• Increment the scope level by 1 every time after moving from an enclosing to an enclosed block

### Type Checking

The process of applying a language’s type rules to infer the type of each construct and comparing that type with the expected type.

## Code Generation

### Jasmin

#### Example

Java program:

class Test {
static int add(int a, int b) {
return a + b;
}

public static void main(String argv[]) {
}
}


Jasmin output:

.source Test.java
.class Test
.super java/lang/Object

.limit stack 2
.limit locals 2
.var 0 is a I from Label0 to Label1
.var 1 is b I from Label0 to Label1

Label0:
Label1:
ireturn
.end method

.method public static main([Ljava/lang/String;)V
.limit stack 2
.limit locals 1
.var 0 is argv [Ljava/lang/String; from Label0 to Label1

Label0:
iconst_1
iconst_2
pop
Label1:
return
.end method


### Code Templates

#### Arithmetic Expression $E_1 i + E_2$

\begin{aligned} \left[\left[E_1 i + E_2\right]\right]: \\ & \left[\left[E_1\right]\right] \\ & \left[\left[E_2\right]\right] \\ & \text{emit("iadd")} \end{aligned}

#### Boolean Expression $E_1 \ \&\& \ E_2$

\begin{aligned} \left[\left[E_1 \ \&\& \ E_2\right]\right]: \\ & \left[\left[E_1\right]\right] \\ & \text{ifeq L1} \\ & \left[\left[E_2\right]\right] \\ & \text{ifeq L1} \\ & \text{iconst\_1} \\ & \text{goto L2} \\ \text{L1:} \\ & \text{iconst\_0} \\ \text{L2:} \end{aligned}

#### Relational Expression $E_1 i > E_2$

\begin{aligned} \left[\left[E_1 i > E_2\right]\right]: \\ & \left[\left[E_1\right]\right] \\ & \left[\left[E_2\right]\right] \\ & \text{if\_icmpgt L1} \\ & \text{iconst\_0} \\ & \text{goto L2} \\ \text{L1:} \\ & \text{iconst\_1} \\ \text{L2:} \end{aligned}

#### Relational Expression $E_1 f > E_2$

\begin{aligned} \left[\left[E_1 i > E_2\right]\right]: \\ & \left[\left[E_1\right]\right] \\ & \left[\left[E_2\right]\right] \\ & \text{fcmpg} \\ & \text{ifgt L1} \\ & \text{iconst\_0} \\ & \text{goto L2} \\ \text{L1:} \\ & \text{iconst\_1} \\ \text{L2:} \end{aligned}

#### Assignment Expression $a = E$

\begin{aligned} \left[\left[a = E\right]\right]: \\ & \left[\left[E\right]\right] \\ & \text{istore\_1} \\ \end{aligned}

#### Assignment Expression $LHS = RHS$

\begin{aligned} \left[\left[LHS = RHS\right]\right]: \\ & \left[\left[LHS\right]\right] \\ & \left[\left[RHS\right]\right] \\ & \textit{appropriate store instruction} \\ \end{aligned}

#### If Expression if $(E)$ then $S1$ else $S2$

\begin{aligned} \text{if} \left[\left[E\right]\right] \text{then} \left[\left[S1\right]\right] \text{else} \left[\left[S2\right]\right]: \\ & \text{ifeq L1} \\ & \left[\left[S1\right]\right] \\ & \text{goto L2} \\ \text{L1:} \\ & \left[\left[S2\right]\right] \\ \text{L2:} \end{aligned}

#### While Expression while $(E)$ $S$

\begin{aligned} \text{while} \left[\left[E\right]\right] \left[\left[S\right]\right]: \\ \text{L1:} \\ & \left[\left[E\right]\right]: \\ & \text{ifeq L2} \\ & \left[\left[S\right]\right] \\ & \text{goto L1} \\ \text{L2:} \\ & \textit{pop continue label L1 from continue stack} \\ & \textit{pop break label L2 from break stack} \end{aligned}

#### Return Expression return $E$

\begin{aligned} \text{return} \left[\left[E\right]\right]: \\ & \left[\left[E\right]\right] \\ & \textit{appropriate return instruction} \\ \end{aligned}

#### Expression Statement $E$

\begin{aligned} \left[\left[E\right]\right]: \\ & \left[\left[E\right]\right] \\ & \text{pop} \\ \end{aligned}