Written by Luka Kerr on May 4, 2019

Regular Expressions and Finite Automata

Thompson’s construction

Inductive Base:

For $\epsilon$, construct the NFA

nfa1

For $a \in \Sigma$, construct the NFA

nfa2

Inductive Step: suppose $N(r)$ and $N(s)$ are NFAs for regular expressions $r$ and $s$. Then:

$r | s$:

nfa3

$rs$:

nfa4

$r^*$:

nfa5

$(r)$: $N((r)) \equiv N(r)$

Subset Construction

Operations used:

Operation Description
$\epsilon$-closure($s$) Set of NFA states readable from NFA state $s$ on $\epsilon$-transitions
$\epsilon$-closure($T$) Set of NFA states readable from some state $s$ in $T$ on $\epsilon$-transitions
move($T$, $a$) Set of NFA states to which there is a transition on input $a$ from some state $s$ in $T$

where

Algorithm

let $s_0$ be the start state of the NFA;

DFAstates contains the only unmarked state $\epsilon$-closure($s_0$);

while there is an unmarked state $T$ in DFAstates do

$\quad$ mark $T$

$\quad$ for each input symbol $a$ do

$\qquad$ $U := \epsilon$-closure(move($T$, $a$));

$\qquad$ if $U$ is not in DFAstates then

$\qquad \quad$ add $U$ as an unmarked state in DFAstates;

$\qquad$ DFATrans[$T$, $a$] $:= U$;

$\quad$ done

done

DFA Minimisation

let $\prod$ be the partition with the two groups:

  1. one is the set of all final states

  2. the other is the set of all non-final states

let $\prod_{new} = \prod$;

for each group $G$ in $\prod_{new}$ do

$\quad$ partition $G$ into subgroups such that two states $s$ and $t$

$\qquad$ are in the same subgroup iff for all input symbols

$\qquad$ $a$, states $s$ and $t$ have transitions on $a$ to

$\qquad$ states in the same group of $\prod_{new}$;

$\quad$ replace $G$ in $\prod_{new}$ by the set of subgroups formed;

done

Chomsky’s Hierarchy

Grammar Known As Definition Machine
Type 0 Unrestricted grammar $\alpha \ne \epsilon$ Turing
Type 1 Context-sensitive grammar (CSG) $|\alpha| \le |\beta|$ Linear bounded
Type 2 Context-free grammar (CFG) $A \to \alpha$ Stack automation
Type 3 Regular grammar $A \to w \mid B w$ Finite state automation

Context Free Grammars

BNF

BNF is a notation for writing CFG’s. Each production $(A, \alpha)$ is written as $A \to \alpha$, where the arrow $\to$ means “is defined to be” or “derives”.

A vertical bar $|$ reads “or else”.

EBNF

EBNF is an extended version of BNF, including regular expressions.

Derivations

A grammar derives sentences by:

  1. Beginning with the start symbol
  2. Repeatedly replacing a nonterminal by the right-hand side of a production with that nonterminal on the left-hand side, until there are no more nonterminals to replace

Such a sequence of replacements is called a derivation of the sentence being analysed.

At each step in a derivation, 2 choices are made:

  1. Which nonterminal to replace?
  2. Which alternative to use for that nonterminal?

Let $G :=$

\[\begin{array}{l} S \to ABC \\ A \to aA \mid \epsilon \\ B \to b B \mid \epsilon \\ C \to cC \mid \epsilon \end{array}\]

Leftmost Derivation

Always replace the leftmost nonterminal.

Example

Deriving $abbc$ in the grammar $G$ using a leftmost derivation gives:

\[S \Rightarrow ABC \Rightarrow aABC \Rightarrow aBC \Rightarrow abBC \Rightarrow abbC \Rightarrow abbcC \Rightarrow abbc\]

Rightmost Derivation

Always replace the rightmost nonterminal.

Example

Deriving $abbc$ in the grammar $G$ using a rightmost derivation gives:

\[S \Rightarrow ABC \Rightarrow ABcC \Rightarrow ABc \Rightarrow AbBc \Rightarrow AbbBc \Rightarrow Abbc \Rightarrow aAbbc \Rightarrow abbc\]

Parse Trees

In a parse tree:

Recursive Grammar

Left recursive productions: $A \to A \alpha$

Right recursive productions: $A \to \alpha A$

Eliminating Left Recursion

Grammar Rewriting

Left recursive grammar

$A \to A \alpha \mid \beta$

Non-left recursive grammar

$A \to \beta A^\prime \ A^\prime \to \alpha A^\prime \mid \epsilon$

Regular Operators

Left recursive grammar

$A \to \alpha \ A \to A \beta_1 \mid A \beta_2$

Non-left recursive grammar

$A \to \alpha(\beta_1 \mid \beta_2)^*$

Top Down Parsing

Let $G :=$

\[\begin{array}{l} E \to T Q \\ Q \to +T Q \mid -T Q \mid \epsilon \\ T \to F R \\ R \to * F R \mid / F R \mid \epsilon \\ F \to \text{Int} \mid (E) \end{array}\]

First Sets

First($A$) is is the set of terminals which can appear as the first element of any chain of rules matching nonterminal $A$.

Using the grammar $G$

Follow Sets

Follow($A$) is the set of terminals that appear immediately to the right of the non-terminal $A$. That is Follow($A$) = $\{ a : S \overset{*}{\implies} \dots A a \dots \}$

By convention, assume every input is terminated by a $ sign.

Follow sets:

Rules:

Using the grammar $G$

Select Sets

Using the grammar $G$

$LL(1)$ Grammar

A grammar is $LL(1)$ if for every nonterminal of the form $A \to \alpha_1 \mid \dots \mid \alpha_n$, the select sets are pairwise disjoint, i.e. Select($A \to \alpha_i$) $\cap$ Select($A \to \alpha_j$) = $\emptyset$ for all $i$ and $j$ such that $i \ne j$.

$LL(1)$ Parsing Table

for every production $A \to \alpha$ in the grammar do

$\quad$ forall $a$ in Select($A \to \alpha$), set $Table[A, a] = \alpha$

For example, using the grammar $G$, the $LL(1)$ parsing table would look like:

  Int $+$ $-$ $*$ $/$ $($ $)$ $
$E$ $TQ$         $TQ$    
$Q$   $+TQ$ $-TQ$       $\epsilon$ $\epsilon$
$T$ $FR$         $FR$    
$R$   $\epsilon$ $\epsilon$ $*FR$ $/FR$   $\epsilon$ $\epsilon$
$F$ Int         $(E)$    

If every entry in the table contains no more than 1 production, the grammar is $LL(1)$.

$LL(1)$ Table Driven Parser

push $ onto the stack;

push the start symbol onto the stack;

while stack not empty do

$\quad$ let $X$ be the top stack symbol and $a$ be the lookahead symbol in the input;

$\quad$ if $X$ is a terminal then

$\qquad$ if $X = a$ then

$\quad \qquad$ pop $X$ and get the next token;

$\qquad$ else

$\quad \qquad$ error;

$\qquad$ fi

$\quad$ else

$\qquad$ if $Table[X, a]$ non-empty then

$\quad \qquad$ pop $X$;

$\quad \qquad$ push $Table[X, a]$ onto stack in the reverse order;

$\qquad$ else

$\quad \qquad$ error;

$\qquad$ fi

$\quad$ fi

done

For example, using the grammar $G$, and the $LL(1)$ parsing table above, a table driven parse of $i + i$ would look like:

Stack Input Production Derivation
$$E$ $i + i$$ $E \to TQ$ $E \implies_{lm} TQ$
$$QT$ $i + i$$ $T \to FR$ $\implies_{lm} FRQ$
$$QRF$ $i + i$$ $F \to i$ $\implies_{lm}iRQ$
$$QRi$ $i + i$$ pop and goto next token  
$$QR$ $+ i$$ $R \to \epsilon$ $\implies_{lm} iQ$
$$Q$ $+i$$ $Q \to + TQ$ $\implies_{lm} i + TQ$
$$QT+$ $+i$$ pop and goto next token  
$$QT$ $i$$ $T \to FR$ $\implies_{lm} i + FRQ$
$$QRF$ $i$$ $F \to i$ $\implies_{lm} i + iRQ$
$$QRi$ $i$$ pop and goto next token  
$$QR$ $ $R \to \epsilon$ $\implies{lm} i + iRQ$
$$Q$ $ $Q \to \epsilon$ $\implies{lm} i + iRQ$
$ $   $\implies_{lm} i + iQ$

Attribute Grammars

An attribute grammar is a triple $A = (G, V, F)$ where

Each attribute contains a name and a type, and can represent anything we choose.

Let $G :=$

\[\begin{array}{l} S \to E \\ E \to E \mathbin{/} E \\ E \to \textbf{num} \\ E \to \textbf{num} \cdot \textbf{num} \end{array}\]

An example attribute grammar for $G$ is as follows:

Production Semantic Rules
$S \to E$ $E$.type if $E$.isFloat then float else int
$E \to E_1 \ / \ E_2$ $E$.isFloat = $E_1$.isFloat or $E_2$.isFloat
  $E_1$.type = $E$.type
  $E_2$.type = $E$.type
$E \to \textbf{num}$ $E$.isFloat = false
$E \to \textbf{num} \cdot \textbf{num}$ $E$.isFloat = true

L-attributed and S-attributed

L-attributed

An attribute grammar is L-attributed if each inherited attribute of $X_i, 1 \le i \le n$ on the right hand side of $X_0 \to X_1 X_2 \cdots X_m$ depends only on:

S-attributed

An attribute grammar is S-attributed if it uses synthesised attributes only.

Every S-attributed grammar is L-attributed.

Synthesised and Inherited Attributes

Let $X_0 \to X_1 X_2 \cdots X_n$ be a production, and $A(X)$ be the set of attributes associated with a grammar symbol $X$.

Synthesised

A synthesised attribute $syn$ of $X_0$ is computed by:

\[X_0.syn = f(A(X_1), A(X_2), \dots , A(X_n))\]

The attribute $syn$ on a tree node depends on those of its children.

Inherited

An inherited attribute $inh$ of $X_i$, where $1 \le i \le n$, is computed by:

\[X_i.inh = g(A(X_0), A(X_1), \dots , A(X_n))\]

The attribute $inh$ on a tree node depends on those on its parent and/or siblings. $inh$ can also depend on other attributed in $A(X_i)$.

Static Semantics

A semantic analyser enforces a language’s semantic constraints, specifically:

Scope

The scope of a declaration is the portion of the program over which the declaration takes effect.

A declaration is in scope at a particular point in a program if and only if the declaration’s scope includes that point.

An example of scope:

int k;           // k begin scope

void foo() {
  int i;         // i begin scope
  int j;         // j begin scope
  i = 1;
  j = 7;
  print(i);      // -> 1
  print(j);      // -> 7
  {              // new block
    int i;       // new block scoped i begin scope
    i = 2;
    print(i);    // -> 2
    print(j);    // -> 7
  }
  print(i);      // -> 1
  print(j);      // -> 7
}

Scope Levels

Type Checking

The process of applying a language’s type rules to infer the type of each construct and comparing that type with the expected type.

Code Generation

Jasmin

Example

Java program:

class Test {
  static int add(int a, int b) {
    return a + b;
  }

  public static void main(String argv[]) {
    Test.add(1, 2);
  }
}

Jasmin output:

.source Test.java
.class Test
.super java/lang/Object

.method static add(II)I
.limit stack 2
.limit locals 2
.var 0 is a I from Label0 to Label1
.var 1 is b I from Label0 to Label1

Label0:
  iload_0
  iload_1
  iadd
Label1:
  ireturn
.end method

.method public static main([Ljava/lang/String;)V
.limit stack 2
.limit locals 1
.var 0 is argv [Ljava/lang/String; from Label0 to Label1

Label0:
  iconst_1
  iconst_2
  invokestatic Test/add(II)I
  pop
Label1:
  return
.end method

Code Templates

Arithmetic Expression $E_1 i + E_2$

\[\begin{aligned} \left[\left[E_1 i + E_2\right]\right]: \\ & \left[\left[E_1\right]\right] \\ & \left[\left[E_2\right]\right] \\ & \text{emit("iadd")} \end{aligned}\]

Boolean Expression $E_1 \ \&\& \ E_2$

\[\begin{aligned} \left[\left[E_1 \ \&\& \ E_2\right]\right]: \\ & \left[\left[E_1\right]\right] \\ & \text{ifeq L1} \\ & \left[\left[E_2\right]\right] \\ & \text{ifeq L1} \\ & \text{iconst\_1} \\ & \text{goto L2} \\ \text{L1:} \\ & \text{iconst\_0} \\ \text{L2:} \end{aligned}\]

Relational Expression $E_1 i > E_2$

\[\begin{aligned} \left[\left[E_1 i > E_2\right]\right]: \\ & \left[\left[E_1\right]\right] \\ & \left[\left[E_2\right]\right] \\ & \text{if\_icmpgt L1} \\ & \text{iconst\_0} \\ & \text{goto L2} \\ \text{L1:} \\ & \text{iconst\_1} \\ \text{L2:} \end{aligned}\]

Relational Expression $E_1 f > E_2$

\[\begin{aligned} \left[\left[E_1 i > E_2\right]\right]: \\ & \left[\left[E_1\right]\right] \\ & \left[\left[E_2\right]\right] \\ & \text{fcmpg} \\ & \text{ifgt L1} \\ & \text{iconst\_0} \\ & \text{goto L2} \\ \text{L1:} \\ & \text{iconst\_1} \\ \text{L2:} \end{aligned}\]

Assignment Expression $a = E$

\[\begin{aligned} \left[\left[a = E\right]\right]: \\ & \left[\left[E\right]\right] \\ & \text{istore\_1} \\ \end{aligned}\]

Assignment Expression $LHS = RHS$

\[\begin{aligned} \left[\left[LHS = RHS\right]\right]: \\ & \left[\left[LHS\right]\right] \\ & \left[\left[RHS\right]\right] \\ & \textit{appropriate store instruction} \\ \end{aligned}\]

If Expression if $(E)$ then $S1$ else $S2$

\[\begin{aligned} \text{if} \left[\left[E\right]\right] \text{then} \left[\left[S1\right]\right] \text{else} \left[\left[S2\right]\right]: \\ & \text{ifeq L1} \\ & \left[\left[S1\right]\right] \\ & \text{goto L2} \\ \text{L1:} \\ & \left[\left[S2\right]\right] \\ \text{L2:} \end{aligned}\]

While Expression while $(E)$ $S$

\[\begin{aligned} \text{while} \left[\left[E\right]\right] \left[\left[S\right]\right]: \\ \text{L1:} \\ & \left[\left[E\right]\right]: \\ & \text{ifeq L2} \\ & \left[\left[S\right]\right] \\ & \text{goto L1} \\ \text{L2:} \\ & \textit{pop continue label L1 from continue stack} \\ & \textit{pop break label L2 from break stack} \end{aligned}\]

Return Expression return $E$

\[\begin{aligned} \text{return} \left[\left[E\right]\right]: \\ & \left[\left[E\right]\right] \\ & \textit{appropriate return instruction} \\ \end{aligned}\]

Expression Statement $E$

\[\begin{aligned} \left[\left[E\right]\right]: \\ & \left[\left[E\right]\right] \\ & \text{pop} \\ \end{aligned}\]