Written by Luka Kerr on May 4, 2019
Regular Expressions and Finite Automata
Thompson’s Construction
- Syntax driven
- Inductive
- Use $S$ to denote the starting state and $A$ to denote the final accepting state
Inductive Base:
For $\epsilon$, construct the NFA consisting of a single $\epsilon$-transition from $S$ to $A$
For $a \in \Sigma$, construct the NFA consisting of a single transition from $S$ to $A$ labelled $a$
Inductive Step: suppose $N(r)$ and $N(s)$ are NFAs for regular expressions $r$ and $s$. Then:
$r | s$: add a new start state $S$ with $\epsilon$-transitions to the start states of $N(r)$ and $N(s)$, and $\epsilon$-transitions from their accepting states to a new accepting state $A$
$rs$: connect the accepting state of $N(r)$ to the start state of $N(s)$ with an $\epsilon$-transition; the start state of $N(r)$ and the accepting state of $N(s)$ become the start and accepting states of $N(rs)$
$r^*$: add a new start state $S$ and accepting state $A$, with $\epsilon$-transitions from $S$ to the start state of $N(r)$ and to $A$, and from the accepting state of $N(r)$ back to the start state of $N(r)$ and to $A$
$(r)$: $N((r)) \equiv N(r)$
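As a rough illustration, a minimal sketch of these constructions in Java is shown below; the State and Nfa classes and their method names are assumptions made for this sketch, not part of any course-supplied code.

import java.util.*;

// An NFA state: labelled transitions plus epsilon transitions
class State {
    Map<Character, State> edges = new HashMap<>();
    List<State> eps = new ArrayList<>();
}

// An NFA fragment with a single start state (S) and a single accepting state (A)
class Nfa {
    State start, accept;
    Nfa(State s, State a) { start = s; accept = a; }

    // Base case: a single symbol a in Sigma (for epsilon, use an epsilon edge instead)
    static Nfa symbol(char c) {
        State s = new State(), a = new State();
        s.edges.put(c, a);
        return new Nfa(s, a);
    }

    // r | s: a new start state with epsilon edges into N(r) and N(s),
    // and epsilon edges from both accepting states into a new accepting state
    static Nfa union(Nfa r, Nfa s) {
        State start = new State(), accept = new State();
        start.eps.add(r.start);
        start.eps.add(s.start);
        r.accept.eps.add(accept);
        s.accept.eps.add(accept);
        return new Nfa(start, accept);
    }

    // rs: link the accepting state of N(r) to the start state of N(s)
    static Nfa concat(Nfa r, Nfa s) {
        r.accept.eps.add(s.start);
        return new Nfa(r.start, s.accept);
    }

    // r*: allow skipping N(r) entirely or looping back through it
    static Nfa star(Nfa r) {
        State start = new State(), accept = new State();
        start.eps.add(r.start);
        start.eps.add(accept);
        r.accept.eps.add(r.start);
        r.accept.eps.add(accept);
        return new Nfa(start, accept);
    }
}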
Subset Construction
- Used to convert an NFA to a DFA
- At most $2^n$ DFA states where $n$ is the total number of NFA states
Operations used:
Operation | Description |
---|---|
$\epsilon$-closure($s$) | Set of NFA states reachable from NFA state $s$ on $\epsilon$-transitions alone |
$\epsilon$-closure($T$) | Set of NFA states reachable from some state $s$ in $T$ on $\epsilon$-transitions alone |
move($T$, $a$) | Set of NFA states to which there is a transition on input $a$ from some state $s$ in $T$ |
where
- $s$ is an NFA state
- $T$ is a set of NFA states
Algorithm
let $s_0$ be the start state of the NFA;
initially, DFAstates contains a single unmarked state, $\epsilon$-closure($s_0$);
while there is an unmarked state $T$ in DFAstates do
$\quad$ mark $T$
$\quad$ for each input symbol $a$ do
$\qquad$ $U := \epsilon$-closure(move($T$, $a$));
$\qquad$ if $U$ is not in DFAstates then
$\qquad \quad$ add $U$ as an unmarked state in DFAstates;
$\qquad$ DFATrans[$T$, $a$] $:= U$;
$\quad$ done
done
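A minimal sketch of this algorithm in Java is shown below, assuming the NFA is represented by transition maps keyed on integer state numbers; the class and field names are assumptions made for this sketch.

import java.util.*;

class SubsetConstruction {
    // nfaMove.get(s).get(a) = set of NFA states reachable from s on input a;
    // epsMove.get(s) = set of NFA states reachable from s on one epsilon edge
    Map<Integer, Map<Character, Set<Integer>>> nfaMove = new HashMap<>();
    Map<Integer, Set<Integer>> epsMove = new HashMap<>();

    // epsilon-closure(T): all NFA states reachable from states in T on epsilon edges only
    Set<Integer> closure(Set<Integer> t) {
        Set<Integer> result = new HashSet<>(t);
        Deque<Integer> work = new ArrayDeque<>(t);
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int u : epsMove.getOrDefault(s, Set.of()))
                if (result.add(u)) work.push(u);
        }
        return result;
    }

    // move(T, a): NFA states reachable from some state in T on input a
    Set<Integer> move(Set<Integer> t, char a) {
        Set<Integer> result = new HashSet<>();
        for (int s : t)
            result.addAll(nfaMove.getOrDefault(s, Map.of()).getOrDefault(a, Set.of()));
        return result;
    }

    // The algorithm above: each DFA state is a set of NFA states
    Map<Set<Integer>, Map<Character, Set<Integer>>> toDfa(int s0, Set<Character> alphabet) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfaTrans = new HashMap<>();
        Set<Integer> start = closure(Set.of(s0));
        Deque<Set<Integer>> unmarked = new ArrayDeque<>();
        unmarked.push(start);
        dfaTrans.put(start, new HashMap<>());
        while (!unmarked.isEmpty()) {
            Set<Integer> t = unmarked.pop();                   // mark T
            for (char a : alphabet) {
                Set<Integer> u = closure(move(t, a));          // U := eps-closure(move(T, a))
                if (u.isEmpty()) continue;
                if (!dfaTrans.containsKey(u)) {                // U is a new, unmarked DFA state
                    dfaTrans.put(u, new HashMap<>());
                    unmarked.push(u);
                }
                dfaTrans.get(t).put(a, u);                     // DFATrans[T, a] := U
            }
        }
        return dfaTrans;
    }
}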
DFA Minimisation
let $\Pi$ be the initial partition with the two groups:
- one is the set of all final states
- the other is the set of all non-final states
let $\Pi_{new} = \Pi$;
for each group $G$ in $\Pi_{new}$ do
$\quad$ partition $G$ into subgroups such that two states $s$ and $t$
$\qquad$ are in the same subgroup iff for all input symbols
$\qquad$ $a$, states $s$ and $t$ have transitions on $a$ to
$\qquad$ states in the same group of $\Pi_{new}$;
$\quad$ replace $G$ in $\Pi_{new}$ by the set of subgroups formed;
done
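A minimal sketch of this refinement in Java is shown below, assuming the DFA is given as a dense transition table; all names are assumptions made for this sketch. Unlike the pseudocode above, it simply repeats the splitting pass until no group changes, which reaches the same final partition.

import java.util.*;

class Minimise {
    // delta[state][symbol] = successor state; isFinal[state] = true for accepting states
    static List<Set<Integer>> minimise(int[][] delta, boolean[] isFinal, int nSymbols) {
        // Initial partition: the final states and the non-final states
        Set<Integer> fin = new HashSet<>(), nonFin = new HashSet<>();
        for (int s = 0; s < delta.length; s++)
            (isFinal[s] ? fin : nonFin).add(s);
        List<Set<Integer>> partition = new ArrayList<>();
        if (!fin.isEmpty()) partition.add(fin);
        if (!nonFin.isEmpty()) partition.add(nonFin);

        boolean changed = true;
        while (changed) {
            changed = false;
            List<Set<Integer>> next = new ArrayList<>();
            for (Set<Integer> group : partition) {
                // Two states stay together iff, for every symbol, their transitions
                // lead into the same group of the current partition
                Map<List<Integer>, Set<Integer>> subgroups = new HashMap<>();
                for (int s : group) {
                    List<Integer> key = new ArrayList<>();
                    for (int a = 0; a < nSymbols; a++)
                        key.add(groupIndex(partition, delta[s][a]));
                    subgroups.computeIfAbsent(key, k -> new HashSet<>()).add(s);
                }
                if (subgroups.size() > 1) changed = true;      // the group was split
                next.addAll(subgroups.values());
            }
            partition = next;
        }
        return partition;   // each group becomes one state of the minimal DFA
    }

    static int groupIndex(List<Set<Integer>> partition, int state) {
        for (int i = 0; i < partition.size(); i++)
            if (partition.get(i).contains(state)) return i;
        return -1;
    }
}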
Chomsky’s Hierarchy
Grammar | Known As | Definition | Machine |
---|---|---|---|
Type 0 | Unrestricted grammar | $\alpha \to \beta$, where $\alpha \ne \epsilon$ | Turing machine |
Type 1 | Context-sensitive grammar (CSG) | $\alpha \to \beta$, where $\vert \alpha \vert \le \vert \beta \vert$ | Linear bounded automaton |
Type 2 | Context-free grammar (CFG) | $A \to \alpha$ | Pushdown (stack) automaton |
Type 3 | Regular grammar | $A \to w \mid B w$ | Finite state automaton |
Context Free Grammars
BNF
BNF is a notation for writing CFGs. Each production $(A, \alpha)$ is written as $A \to \alpha$, where the arrow $\to$ means “is defined to be” or “derives”.
A vertical bar $|$ reads “or else”.
EBNF
EBNF is an extended version of BNF, including regular expressions.
- $( \alpha )^*$ means that $\alpha$ can be repeated zero or more times
- $( \alpha )^+$ means that $\alpha$ can be repeated one or more times
- $( \alpha )?$ means that $\alpha$ is optional
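For example, the EBNF rule $A \to a \ (b)^* \ (c)?$ corresponds to the plain BNF productions
\[\begin{array}{l} A \to a B C \\ B \to b B \mid \epsilon \\ C \to c \mid \epsilon \end{array}\]
where $B$ and $C$ are new nonterminals introduced here for the repetition and the optional part.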
Derivations
A grammar derives sentences by:
- Beginning with the start symbol
- Repeatedly replacing a nonterminal by the right-hand side of a production with that nonterminal on the left-hand side, until there are no more nonterminals to replace
Such a sequence of replacements is called a derivation of the sentence being analysed.
At each step in a derivation, 2 choices are made:
- Which nonterminal to replace?
- Which alternative to use for that nonterminal?
Let $G :=$
\[\begin{array}{l} S \to ABC \\ A \to aA \mid \epsilon \\ B \to b B \mid \epsilon \\ C \to cC \mid \epsilon \end{array}\]
Leftmost Derivation
Always replace the leftmost nonterminal.
Example
Deriving $abbc$ in the grammar $G$ using a leftmost derivation gives:
\[S \Rightarrow ABC \Rightarrow aABC \Rightarrow aBC \Rightarrow abBC \Rightarrow abbBC \Rightarrow abbC \Rightarrow abbcC \Rightarrow abbc\]
Rightmost Derivation
Always replace the rightmost nonterminal.
Example
Deriving $abbc$ in the grammar $G$ using a rightmost derivation gives:
\[S \Rightarrow ABC \Rightarrow ABcC \Rightarrow ABc \Rightarrow AbBc \Rightarrow AbbBc \Rightarrow Abbc \Rightarrow aAbbc \Rightarrow abbc\]
Parse Trees
In a parse tree:
- The start symbol is always at the root of the tree
- Nonterminals are always interior nodes
- Terminals are always leaves in the tree
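For example, a parse tree for $abbc$ in the grammar $G$ above has $S$ at the root, the nonterminals $A$, $B$ and $C$ as interior nodes, and the terminals (together with $\epsilon$) as leaves:

S
├─ A
│  ├─ a
│  └─ A
│     └─ ε
├─ B
│  ├─ b
│  └─ B
│     ├─ b
│     └─ B
│        └─ ε
└─ C
   ├─ c
   └─ C
      └─ ε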
Recursive Grammar
Left recursive productions: $A \to A \alpha$
Right recursive productions: $A \to \alpha A$
Eliminating Left Recursion
Grammar Rewriting
Left recursive grammar
$A \to A \alpha \mid \beta$
Non-left recursive grammar
$A \to \beta A^\prime$ and $A^\prime \to \alpha A^\prime \mid \epsilon$
Regular Operators
Left recursive grammar
$A \to \alpha$ and $A \to A \beta_1 \mid A \beta_2$
Non-left recursive grammar
$A \to \alpha(\beta_1 \mid \beta_2)^*$
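For example, the left recursive grammar $E \to E + T \mid T$ can be rewritten using either technique:
- Grammar rewriting: $E \to T E^\prime$ and $E^\prime \to + T E^\prime \mid \epsilon$
- Regular operators: $E \to T \ (+ T)^*$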
Top Down Parsing
Let $G :=$
\[\begin{array}{l} E \to T Q \\ Q \to +T Q \mid -T Q \mid \epsilon \\ T \to F R \\ R \to * F R \mid / F R \mid \epsilon \\ F \to \text{Int} \mid (E) \end{array}\]
First Sets
First($A$) is the set of terminals that can appear as the first symbol of any string derived from the nonterminal $A$.
Using the grammar $G$
- First($TQ$) = { Int, ( }
- First($*R$) = { * }
Follow Sets
Follow($A$) is the set of terminals that can appear immediately to the right of the nonterminal $A$ in some sentential form. That is, Follow($A$) = $\{ a : S \overset{*}{\implies} \dots A a \dots \}$
By convention, assume every input is terminated by a $ sign.
Follow sets:
- Are only constructed for non-terminals
- Do not contain $\epsilon$
Rules:
- If $A$ is the start symbol of the grammar, then Follow($A$) includes $
- If $A \to pBq$ is a production:
- Follow($B$) includes First($q$) - $\epsilon$
- If First($q$) contains $\epsilon$, then Follow($B$) includes Follow($A$)
- If $A \to pB$ is a production, then Follow($B$) includes Follow($A$)
Using the grammar $G$
- Follow($E$) = {), $}
- Follow($Q$) = {), $}
- Follow($T$) = {+, -} $\cup$ Follow($Q$) = {+, -, ), $}
- Follow($R$) = Follow($T$) = {+, -, ), $}
- Follow($F$) = {*, /} $\cup$ Follow($R$) = {*, /, +, -, ), $}
Select Sets
- One select set for every production in the grammar
- The select set for a production of the form $A \to \alpha$ is:
  - Select($A \to \alpha$) = First($\alpha$) - { $\epsilon$ }
  - If $\epsilon \in$ First($\alpha$), then Select($A \to \alpha$) also includes Follow($A$)
Using the grammar $G$
- Select($E \to TQ$) = First($TQ$) = { Int, ( }
- Select($Q \to +TQ$) = First($+TQ$) = { + }
- Select($R \to \epsilon$) = (First($\epsilon$) - { $\epsilon$ }) $\cup$ Follow($R$) = { +, -, ), $ }
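As a rough sketch, the select set rules above could be computed in Java as follows, assuming First($\alpha$) and Follow($A$) have already been computed as sets of token names; the class name and the "eps" marker are assumptions made for this sketch.

import java.util.*;

class SelectSets {
    static final String EPS = "eps";   // marker used for the empty string in First sets

    // Select(A -> alpha), given First(alpha) and Follow(A)
    static Set<String> select(Set<String> firstAlpha, Set<String> followA) {
        Set<String> sel = new HashSet<>(firstAlpha);
        boolean nullable = sel.remove(EPS);   // Select = First(alpha) - { eps }
        if (nullable)
            sel.addAll(followA);              // if eps is in First(alpha), add Follow(A)
        return sel;
    }
}

For instance, Select($R \to \epsilon$) would be obtained by calling select with First($\epsilon$) = { eps } and Follow($R$).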
$LL(1)$ Grammar
A grammar is $LL(1)$ if for every nonterminal of the form $A \to \alpha_1 \mid \dots \mid \alpha_n$, the select sets are pairwise disjoint, i.e. Select($A \to \alpha_i$) $\cap$ Select($A \to \alpha_j$) = $\emptyset$ for all $i$ and $j$ such that $i \ne j$.
- A grammar with left recursion is not $LL(1)$. Examples of left recursion are:
  - $A \to A \alpha$
  - $A \to B \alpha$ and $B \to A \beta$ (indirect left recursion)
- A grammar with common prefixes is not $LL(1)$. An example of a common prefix is:
  - $A \to \textbf{If } \alpha \ \beta \textbf{ fi}$ and $A \to \textbf{If } ( \alpha ) \ \beta \textbf{ fi}$
- An $LL(1)$ grammar can be parsed deterministically using 1 token of lookahead
$LL(1)$ Parsing Table
for every production $A \to \alpha$ in the grammar do
$\quad$ forall $a$ in Select($A \to \alpha$), set $Table[A, a] = \alpha$
For example, using the grammar $G$, the $LL(1)$ parsing table would look like:
Nonterminal | Int | $+$ | $-$ | $*$ | $/$ | $($ | $)$ | $ |
---|---|---|---|---|---|---|---|---|
$E$ | $TQ$ | | | | | $TQ$ | | |
$Q$ | | $+TQ$ | $-TQ$ | | | | $\epsilon$ | $\epsilon$ |
$T$ | $FR$ | | | | | $FR$ | | |
$R$ | | $\epsilon$ | $\epsilon$ | $*FR$ | $/FR$ | | $\epsilon$ | $\epsilon$ |
$F$ | Int | | | | | $(E)$ | | |
If every entry in the table contains no more than 1 production, the grammar is $LL(1)$.
$LL(1)$ Table Driven Parser
push $ onto the stack;
push the start symbol onto the stack;
while stack not empty do
$\quad$ let $X$ be the top stack symbol and $a$ be the lookahead symbol in the input;
$\quad$ if $X$ is a terminal then
$\qquad$ if $X = a$ then
$\quad \qquad$ pop $X$ and get the next token;
$\qquad$ else
$\quad \qquad$ error;
$\qquad$ fi
$\quad$ else
$\qquad$ if $Table[X, a]$ non-empty then
$\quad \qquad$ pop $X$;
$\quad \qquad$ push $Table[X, a]$ onto stack in the reverse order;
$\qquad$ else
$\quad \qquad$ error;
$\qquad$ fi
$\quad$ fi
done
For example, using the grammar $G$, and the $LL(1)$ parsing table above, a table driven parse of $i + i$ would look like:
Stack | Input | Production | Derivation |
---|---|---|---|
$\$E$ | $i + i\$$ | $E \to TQ$ | $E \implies_{lm} TQ$ |
$\$QT$ | $i + i\$$ | $T \to FR$ | $\implies_{lm} FRQ$ |
$\$QRF$ | $i + i\$$ | $F \to i$ | $\implies_{lm} iRQ$ |
$\$QRi$ | $i + i\$$ | pop and goto next token | |
$\$QR$ | $+ i\$$ | $R \to \epsilon$ | $\implies_{lm} iQ$ |
$\$Q$ | $+ i\$$ | $Q \to +TQ$ | $\implies_{lm} i + TQ$ |
$\$QT+$ | $+ i\$$ | pop and goto next token | |
$\$QT$ | $i\$$ | $T \to FR$ | $\implies_{lm} i + FRQ$ |
$\$QRF$ | $i\$$ | $F \to i$ | $\implies_{lm} i + iRQ$ |
$\$QRi$ | $i\$$ | pop and goto next token | |
$\$QR$ | $\$$ | $R \to \epsilon$ | $\implies_{lm} i + iQ$ |
$\$Q$ | $\$$ | $Q \to \epsilon$ | $\implies_{lm} i + i$ |
$\$$ | $\$$ | accept | |
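A minimal sketch of the table-driven loop in Java is shown below, assuming the parsing table and the set of terminals have already been built; every name here, including the hard-coded start symbol "E", is an assumption made for this sketch.

import java.util.*;

class LL1Parser {
    // table.get(X).get(a) = the right-hand side to push (an empty list for an epsilon production)
    Map<String, Map<String, List<String>>> table;
    Set<String> terminals;

    LL1Parser(Map<String, Map<String, List<String>>> table, Set<String> terminals) {
        this.table = table;
        this.terminals = terminals;
    }

    boolean parse(List<String> input) {
        List<String> tokens = new ArrayList<>(input);
        tokens.add("$");                                   // input terminated by $
        int pos = 0;
        Deque<String> stack = new ArrayDeque<>();
        stack.push("$");
        stack.push("E");                                   // start symbol of G
        while (!stack.isEmpty()) {
            String x = stack.peek();
            String a = tokens.get(pos);
            if (terminals.contains(x) || x.equals("$")) {
                if (!x.equals(a)) return false;            // error: terminal mismatch
                stack.pop();                               // pop X and get the next token
                pos++;
            } else {
                List<String> rhs = table.getOrDefault(x, Map.of()).get(a);
                if (rhs == null) return false;             // error: empty table entry
                stack.pop();                               // pop X
                for (int i = rhs.size() - 1; i >= 0; i--)  // push Table[X, a] in reverse order
                    stack.push(rhs.get(i));
            }
        }
        return pos == tokens.size();                       // stack and input both exhausted
    }
}

Parsing the token list [Int, +, Int] with the table for $G$ above would follow the same sequence of moves as the trace.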
Attribute Grammars
An attribute grammar is a triple $A = (G, V, F)$ where
- $G$ is a CFG
- $V$ is a finite set of distinct attributes
- $F$ is a finite set of semantic rules about the attributes
Each attribute contains a name and a type, and can represent anything we choose.
Let $G :=$
\[\begin{array}{l} S \to E \\ E \to E \mathbin{/} E \\ E \to \textbf{num} \\ E \to \textbf{num} \cdot \textbf{num} \end{array}\]
An example attribute grammar for $G$ is as follows:
Production | Semantic Rules |
---|---|
$S \to E$ | $E$.type = if $E$.isFloat then float else int |
$E \to E_1 \ / \ E_2$ | $E$.isFloat = $E_1$.isFloat or $E_2$.isFloat |
 | $E_1$.type = $E$.type |
 | $E_2$.type = $E$.type |
$E \to \textbf{num}$ | $E$.isFloat = false |
$E \to \textbf{num} \cdot \textbf{num}$ | $E$.isFloat = true |
L-attributed and S-attributed
L-attributed
An attribute grammar is L-attributed if each inherited attribute of $X_i$, $1 \le i \le n$, on the right hand side of $X_0 \to X_1 X_2 \cdots X_n$ depends only on:
- The attributes of the symbols $X_1, X_2, \cdots , X_{i - 1}$ to the left of $X_i$ in the production
- The inherited attributes of $X_0$
S-attributed
An attribute grammar is S-attributed if it uses synthesised attributes only.
Every S-attributed grammar is L-attributed.
Synthesised and Inherited Attributes
Let $X_0 \to X_1 X_2 \cdots X_n$ be a production, and $A(X)$ be the set of attributes associated with a grammar symbol $X$.
Synthesised
A synthesised attribute $syn$ of $X_0$ is computed by:
\[X_0.syn = f(A(X_1), A(X_2), \dots , A(X_n))\]
The attribute $syn$ on a tree node depends on those of its children.
Inherited
An inherited attribute $inh$ of $X_i$, where $1 \le i \le n$, is computed by:
\[X_i.inh = g(A(X_0), A(X_1), \dots , A(X_n))\]
The attribute $inh$ on a tree node depends on those of its parent and/or siblings. $inh$ can also depend on other attributes in $A(X_i)$.
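As an illustration, here is a minimal sketch in Java of evaluating the isFloat (synthesised) and type (inherited) attributes from the example attribute grammar above; the Expr class and its fields are assumptions made for this sketch.

// A node of the expression tree for E -> E1 / E2 | num | num . num
class Expr {
    Expr left, right;       // both non-null only for a division node
    boolean isIntLiteral;   // true for "num", false for "num . num"
    boolean isFloat;        // synthesised attribute: computed bottom-up
    String type;            // inherited attribute: passed down from the parent

    // Synthesised pass: isFloat depends only on the attributes of the children
    boolean synthesise() {
        if (left != null)
            // E.isFloat = E1.isFloat or E2.isFloat (| rather than || so both subtrees are visited)
            isFloat = left.synthesise() | right.synthesise();
        else
            isFloat = !isIntLiteral;   // num -> false, num . num -> true
        return isFloat;
    }

    // Inherited pass: type is handed down unchanged to the children (E1.type = E2.type = E.type)
    void inherit(String t) {
        type = t;
        if (left != null) {
            left.inherit(t);
            right.inherit(t);
        }
    }
}

For the root, the rule for $S \to E$ would set the inherited attribute from the synthesised one: root.inherit(root.synthesise() ? "float" : "int").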
Static Semantics
A semantic analyser enforces a language’s semantic constraints, specifically:
- Scope rules
- Type rules
Scope
The scope of a declaration is the portion of the program over which the declaration takes effect.
A declaration is in scope at a particular point in a program if and only if the declaration’s scope includes that point.
An example of scope:
int k; // scope of k begins
void foo() {
int i; // scope of i begins
int j; // scope of j begins
i = 1;
j = 7;
print(i); // -> 1
print(j); // -> 7
{ // new block
int i; // scope of the block-local i begins
i = 2;
print(i); // -> 2
print(j); // -> 7
}
print(i); // -> 1
print(j); // -> 7
}
Scope Levels
- Declarations in the outermost block are in level 1
- Increment the scope level by 1 each time we move from an enclosing block into an enclosed block, as in the sketch below
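For example, under these rules the declarations below would sit at the indicated levels (a sketch, assuming the function body counts as a block enclosed by the outermost one):

int g;             // level 1: declared in the outermost block
void bar() {
    int x;         // level 2: the function body is enclosed by the outermost block
    {
        int y;     // level 3: this block is enclosed by the function body
    }
}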
Type Checking
Type checking is the process of applying a language’s type rules to infer the type of each construct and to compare that type with the expected type.
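For example, in a Java-like language the checker infers a type for each expression and compares it with the type expected by the context:

int i = 1;
float f = 2.0f;
float g = i + f;   // inferred type of i + f is float, matching the expected type: accepted
int j = i + f;     // inferred type float does not match the expected type int: type error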
Code Generation
Jasmin
Example
Java program:
class Test {
static int add(int a, int b) {
return a + b;
}
public static void main(String argv[]) {
Test.add(1, 2);
}
}
Jasmin output:
.source Test.java
.class Test
.super java/lang/Object
.method static add(II)I
.limit stack 2
.limit locals 2
.var 0 is a I from Label0 to Label1
.var 1 is b I from Label0 to Label1
Label0:
iload_0
iload_1
iadd
Label1:
ireturn
.end method
.method public static main([Ljava/lang/String;)V
.limit stack 2
.limit locals 1
.var 0 is argv [Ljava/lang/String; from Label0 to Label1
Label0:
iconst_1
iconst_2
invokestatic Test/add(II)I
pop
Label1:
return
.end method