Written by Luka Kerr on May 3, 2019

Introduction & Overview

What Is An Operating System?

Kernel

Structure

Processes & Threads

Processes

A process can be in 1 of 3 states: running, ready, or blocked.

A process's information is stored in a process control block (PCB). Each entry in the PCB may contain items such as:

A process consists of at least 3 segments: text (code), data and stack.

Threads

The thread model is a model that separates execution from the environment. Items stored per thread include the program counter, registers, stack and state.

Each thread shares items with the other threads in the process it is running in, including the address space, global variables and open files.

Threads are cheaper to create and destroy than processes, and can take advantage of the parallelism available on machines with more than one CPU.

Context Switch

A context switch refers to either a switch between threads or a switch between processes.

A context switch can occur any time the operating system is invoked.

Concurrency & Synchronisation

Concurrency Example

The following example demonstrates a race condition. The global variable count is shared between threads, but the code accessing it is not executed mutually exclusively, so depending on the interleaving of the two threads, count can end up with a different value on different runs of the program.

Thread 1

void increment() {
  int i;
  i = count;
  i = i + 1;
  count = i;
}

Thread 2

void decrement() {
  int i;
  i = count;
  i = i - 1;
  count = i;
}

Critical Region

A critical region is a region of code where shared resources are accessed. In the example above, the critical region is from the line i = count to the line count = i.

We can control access to the shared resource by controlling access to the code that accesses the resource. As shown above, uncoordinated entry to the critical region results in a race condition.

To coordinate access to critical regions we can use a number of solutions:

Semaphores

A process that waits on a semaphore blocks if the resource is not available. While waiting, processes are put into a queue to avoid busy waiting.

When a resource becomes available, a process signals the semaphore, and a process waiting on that resource is resumed.

Condition Variables

A condition variable is basically a container of threads that are waiting for a certain condition. They allow a process to wait within a monitor until some condition is met, before regaining exclusive access and resuming their task.

Deadlock

What Is Deadlock?

Formal definition: A set of processes is deadlocked if each process in the set is waiting for an event that only another process in the set can cause.

There are 4 conditions for deadlock to occur:

  1. Mutual exclusion condition, where each resource is assigned to 1 process or is available
  2. Hold and wait condition, where processes holding resources can request additional resources
  3. No preemption condition, where previously granted resources cannot be forcibly taken away
  4. Circular wait condition, where there must be a chain of two or more processes, each waiting for a resource held by the next member of the chain

Example

Suppose a process holds a resource A, and requests a resource B. At the same time, a different process holds the resource B and requests A. Both processes cannot proceed and are deadlocked.

Mitigating Deadlock

Strategies for dealing with deadlock include:

Ignoring The Problem

This is a reasonable solution if deadlock is highly unlikely to occur. It is a trade-off between convenience and correctness.

Prevention

By preventing one of the conditions for deadlock to occur, deadlock can be avoided.

A solution is to prevent the circular wait condition. This is done by ordering resources.

Detection & Recovery

If we can determine a system is deadlocked, we can try to recover and restore progress to the system.

Let $A$ be the vector of currently available resources of each type. Using two data structures, $C$ (the current allocation matrix, where row $i$ holds the resources held by process $P_i$) and $R$ (the request matrix, where row $i$ holds the resources $P_i$ is still requesting), resources can be kept track of, and deadlock can be detected.

The detection algorithm is as follows:

  1. Look for an unmarked process $P_i$ for which the $i$-th row of $R$ is less than or equal to $A$
  2. If found, add the $i$-th row of $C$ to $A$ and mark $P_i$. Go to step 1
  3. If no such process exists, terminate

At this point, any remaining processes are deadlocked.

To recover from deadlock various actions can be taken:

Dynamic Avoidance

Introduce a concept of ‘safe’ and ‘unsafe’ states. If the system is in a safe state, then it is not deadlocked, and there exists a scheduling order that results in every process running to completion.

To avoid deadlock, we can only grant requests that result in safe states.

Livelock

Similar to deadlock, but processes aren’t blocked. They can change state regularly, but don’t make progress.

Starvation

A process never receives the resource it is waiting for: despite the resource (repeatedly) becoming free, it is always allocated to another waiting process.

System Calls

System calls can be viewed as special function calls that provide for a controlled entry into the kernel. While in kernel mode, some operation is performed and then execution returns to the caller with a result.

Some examples include:

MIPS

In MIPS system calls are invoked via a syscall instruction. The syscall instruction causes an exception and transfers control to the general exception handler.

OS/161

In OS/161, syscall arguments are passed and returned via the normal C function calling conventions.

Additionally:

Computer Hardware, Memory Hierarchy, Caching

Memory Hierarchy

Typical Access Time   Memory Type     Typical Capacity
1 nsec                Registers       < 1 KB
2 nsec                Cache           1 MB
10 nsec               Main Memory     64 - 512 MB
10 msec               Magnetic Disk   5 - 50 GB
100 sec               Magnetic Tape   20 - 100 GB

Going down the hierarchy:

Caching

Caching allows for faster access to slower storage by using intermediate-speed storage as a cache.

CPU Cache

The CPU cache is located between the CPU and main memory.

Properties:

Effective Access Time

\[T_{eff} = H \times T_1 + (1 - H) \times T_2\]

where

For example:

\[T_{eff} = 0.95 \times 1 + (1 - 0.95) \times (1 + 10) = 1.5\]

We add $1$ to $T_2$ since we need to account for a lookup and miss in memory 1.

Disk Access

Disk access is very slow relative to main memory, and is dominated by the time taken to move the head over data.

To partially fix this, the operating system keeps a subset of the disk’s data in main memory.

This principle is used for other slow access times, such as accessing media via the internet.

File Management

File Structure Abstractions

Three kinds of files:

File Directories

Access Rights

UNIX Access Permissions

Sample Output Of ls -la:

total 16
drwxr-xr-x@  6 Luka  staff   192 29 Apr 13:43 .
drwxr-xr-x@  7 Luka  staff   224 24 Nov 12:42 ..
-rw-r--r--@  1 Luka  staff  6148 29 Apr 13:43 .DS_Store
drwxr-xr-x@ 15 Luka  staff   480 29 Apr 13:43 COMP2111
drwxr-xr-x@  7 Luka  staff   224 27 Apr 17:38 COMP3131
drwxr-xr-x@ 10 Luka  staff   320  1 May 11:50 COMP3231

There are 3 groups in a file’s permissions: user, group, other.

UNIX Storage Stack

Item                   Description
Application            Syscall interface (open(), close()…)
File Descriptor Table  Keeps track of files opened by each process individually
Open File Table        Keeps track of files opened by user-level processes
VFS                    Unified interface to multiple file systems
FS                     Handles physical location of data on disk, exposes files/directories
Buffer Cache           Keeps recently accessed disk blocks in memory
Disk Scheduler         Schedules disk accesses from multiple processes
Device Driver          Handles device-specific protocols
Disk Controller        Handles disk geometry, bad sectors
HDD/SSD                Primary storage device

Contiguous Allocation

A file occupies a contiguous set of blocks on the disk.

Linked List Allocation

Each block contains a pointer to the next block in the chain. Free blocks are also linked in chain.

Pros:

Cons:

File Allocation Table

Keep a map of the entire file system in a separate table. The table is stored on the disk and is replicated in memory.

Pros:

Cons:

I-node Based File Structure

Have a separate table (index-node or i-node) for each file, only keeping tables for open files in memory.

Fragmentation

External Fragmentation

Internal Fragmentation

Example

Take a simplified version of an operating system with multiple processes running:

| P1 |P2|x| . P3 . |xx|  P4 .|x| P5 |

The spaces marked with an x are free memory locations in the operating system, and the spaces marked with a . are free memory inside individual processes.

The spaces marked with an x are examples of external fragmentation, since a new process won’t be big enough to fit in any of the spaces individually, but overall there is enough memory to fit the process.

The spaces marked with a . are examples of internal fragmentation, since memory is left unused inside individual processes, but this memory can’t be used by other processes.

Virtual File System

struct fs {
  int           (*fs_sync)(struct fs *);       /* flush content to disk */
  const char   *(*fs_getvolname)(struct fs *); /* get volume name */
  struct vnode *(*fs_getroot)(struct fs *);    /* get the root vnode */
  int           (*fs_unmount)(struct fs *);    /* unmount the file system */

  void *fs_data; /* private file system specific data */
};

Some VFS functions include:

int vfs_open(char *path, int openflags, mode_t mode, struct vnode **ret);
void vfs_close(struct vnode *vn);
int vfs_readlink(char *path, struct uio *data);
int vfs_symlink(const char *contents, char *path);
int vfs_mkdir(char *path);
int vfs_link(char *oldpath, char *newpath);
int vfs_remove(char *path);
int vfs_rmdir(char *path);
int vfs_rename(char *oldpath, char *newpath);
int vfs_chdir(char *path);
int vfs_getcwd(struct uio *buf);

Vnode

struct vnode {
  int                    vn_refcount;  /* reference count for this vnode */
  struct spinlock        vn_countlock; /* lock for mutex */
  struct fs              *vn_fs;       /* file system containing this vnode */
  void                   *vn_data;     /* file system specific vnode data */
  const struct vnode_ops *vn_ops;      /* pointers to functions operating on vnodes */
};

Some Vnode operations include:

struct vnode_ops {
  int (*vop_eachopen)(struct vnode *object, int flags_from_open);
  int (*vop_reclaim)(struct vnode *vnode);
  int (*vop_read)(struct vnode *file, struct uio *uio);
  int (*vop_readlink)(struct vnode *link, struct uio *uio);
  int (*vop_getdirentry)(struct vnode *dir, struct uio *uio);
  int (*vop_write)(struct vnode *file, struct uio *uio);
  int (*vop_ioctl)(struct vnode *object, int op, userptr_t data);
  int (*vop_stat)(struct vnode *object, struct stat *statbuf);
  int (*vop_gettype)(struct vnode *object, int *result);
  int (*vop_isseekable)(struct vnode *object, off_t pos);
  int (*vop_fsync)(struct vnode *object);
  int (*vop_truncate)(struct vnode *file, off_t len);
  int (*vop_namefile)(struct vnode *file, struct uio *uio);
  int (*vop_creat)(struct vnode *dir, const char *name, int excl, struct vnode **result);
  int (*vop_symlink)(struct vnode *dir, const char *contents, const char *name);
  int (*vop_mkdir)(struct vnode *parentdir, const char *name);
  int (*vop_link)(struct vnode *dir, const char *name, struct vnode *file);
  int (*vop_remove)(struct vnode *dir, const char *name);
  int (*vop_rmdir)(struct vnode *dir, const char *name);
  int (*vop_rename)(struct vnode *vn1, const char *name1, struct vnode *vn2, const char *name2);
  int (*vop_lookup)(struct vnode *dir, char *pathname, struct vnode **result);
  int (*vop_lookparent)(struct vnode *dir, char *pathname, struct vnode **result, char *buf, size_t len);
};

File Descriptor Table & Open File Table

A file (read Vnode) can be represented as a file struct:

struct file {
  int           flags;           /* what permissions this file is opened with */
  off_t         offset;          /* offset into this file */
  struct vnode *vn;              /* pointer to underlying vnode */
  struct lock  *flock;           /* lock for mutex to file */
  unsigned int  reference_count; /* how many processes are referencing this file */
};

The operating system contains a global open file table for all processes:

struct file *file_table[__OPEN_MAX];

Each process contains a file descriptor table containing pointers to entries in the open file table:

struct file **file_descriptor_table[__OPEN_MAX];

Buffer Cache

Used as temporary storage when transferring data between two entities.

Buffering Disk Blocks

Cache

Fast storage used to temporarily hold data to speed up repeated access to the data.

Caching Disk Blocks

File System Consistency

File data is expected to survive a crash. If a pure LRU policy were used to evict blocks from the buffer cache, frequently used critical blocks could stay in memory indefinitely and never be written back to the disk.

Generally, cached disk blocks are prioritised in terms of how critical they are to file system consistency. In UNIX, flushd flushes all modified blocks to the disk every 30 seconds.

An alternative is to use a write through cache, which writes modified blocks immediately to the disk. This has worse performance and generates much more disk traffic. It is still used for some devices, for example in USB drives.

Disk Scheduler

Access time for a disk is calculated as follows:

\[T_a = T_s + \dfrac{1}{2r} + \dfrac{b}{rN}\]

where $T_s$ is the seek time, $r$ is the rotation rate (revolutions per second), $b$ is the number of bytes to transfer, and $N$ is the number of bytes per track. The three terms are the seek time, the average rotational delay, and the transfer time.

Disk Arm Scheduling Algorithms

First Come First Serve (FCFS)

Shortest Seek Time First (SSTF)

Elevator Algorithm (SCAN)

Modified Elevator (C-SCAN)

Ext2

Ext2 Inodes

Indirection

-------------------
|                 |    --------    --------
| single indirect | -> |      | -> | data |
|                 |    --------    --------
|-----------------|
|                 |    --------    --------    --------
| double indirect | -> |      | -> |      | -> | data |
|                 |    --------    --------    --------
|-----------------|
|                 |    --------    --------    --------    --------
| triple indirect | -> |      | -> |      | -> |      | -> | data |
|                 |    --------    --------    --------    --------
-------------------

Assuming 4 byte block numbers and 1K blocks, the max file size is calculated as $12 + 256 + 65536 + 16777216 = 16843020$ blocks $\approx$ 16GB.

Calculated from:

Single Indirect Block

A block number of a block containing block numbers.

Requires two disk accesses to read: one for the indirect block and one for the target block.

Double Indirect Block

Block number of a block containing block numbers of blocks containing block numbers.

Triple Indirect Block

Block number of a block containing block numbers of blocks containing block numbers of blocks containing block numbers.

Access Patterns

File System Reliability

Ext3

Journaling

Journaling Block Device

Interface:

  1. Start a new transaction
  2. Update a disk block as part of a transaction
  3. Complete a transaction
  4. Commit a transaction by writing its data to the journal (persistent storage)
  5. Checkpoint a transaction by flushing the journal to the disk

Modes

Ext3 supports two journaling modes:

Memory Management

Fixed Partitioning

Problems

Dynamic Partitioning

Problems

Dynamic Partitioning Allocation Algorithms

Classic Approach

Dynamic Partitioning Placement Algorithm

First fit is generally the better performing algorithm, and easier to implement.

First Fit
Next Fit
Best Fit
Worst Fit

Compaction

Compaction is the shuffling of memory contents to place all free memory together in one large block. It generally requires hardware support.

External fragmentation can be reduced using compaction.

Swapping

A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution.

Virtual Memory

Two classic variants:

Paging

60K - 64K |  X  | <- Virtual Page
56K - 60K |  X  |
52K - 56K |  3  | \
...                -> |-----| 28K - 32K  <- Page frame
4K - 8K   |  X  |     |-----| ...
0K - 4K   |  5  | --> |-----| 0K - 4K

Segmentation

Page Faults

Referencing an invalid page triggers a page fault.

There are two main types:

Page Table

A page table is structured logically as an array of frame numbers. Each page table entry (PTE) also has other bits:

Two Level Page Table

Contains a root level page table, and a second level page table for each entry in the root level.

        Virtual Address             -------------------------v
-------------------------------     |     -------------------------      
| 10 bits | 10 bits | 12 bits | -----     | Frame Number | Offset |     
-------------------------------           -------------------------  
     |         |              Second Level      ^        |
     |         |             ----------------   |        |          Main Memory
     |         |             |              |   |        |         --------------
     |         ------------> | Frame Number | ---        |         |            |
     |     Root Level     |  |              |            |         |            |
     |     -----------    |  |              |             -------> | Page Frame |
     |     |         |    |  ----------------                      |            |
     |---> | Pointer | ----                                        |            |
           |         |                                             |            |
           -----------                                             --------------

Inverted Page Table

An array of page numbers stored by frame number.

Hashed Page Table

Similar to the inverted page table, but more efficient with its size based on physical memory size and not virtual.

Translation Lookaside Buffer

Properties

On a context switch, the TLB needs to be flushed so that all its entries are invalidated.

MIPS R3000

MIPS R3000 TLB

struct entry {
  /* entry hi */
  unsigned int vpn  : 20; /* virtual page number */
  unsigned int asid : 6;  /* asid */
  /* next 6 bits unused */
  
  /* entry lo */
  unsigned int pfn : 20; /* page frame number */
  unsigned int n   : 1;  /* not cacheable bit */
  unsigned int d   : 1;  /* dirty bit */
  unsigned int v   : 1;  /* valid bit */
  unsigned int g   : 1;  /* global bit */
  /* next 8 bits unused */
};

MIPS R3000 Address Space Layout

Principle Of Locality

An implication of locality is that we can reasonably predict what instructions and data a program will use in the near future based on its accesses in the recent past.

Temporal locality

Recently accessed items are likely to be accessed in the near future.

Spatial locality

Items whose addresses are near one another tend to be referenced close together in time.

Working Set

The pages/segments required by an application in a time window ($\Delta$) is called its memory working set.

Thrashing

Occurs when a computer’s virtual memory resources are overused, leading to a constant state of paging and page faults, inhibiting most application-level processing.

To recover from thrashing we can suspend a few processes to reduce the degree of multiprogramming.

Page Replacement

In terms of performance, the policies below rank as:

  1. Optimal
  2. LRU
  3. Clock
  4. FIFO

Optimal Replacement Policy

Remove the page that won’t be used for the longest time.

FIFO Replacement Policy

Remove the oldest page.

Least Recently Used Replacement Policy

Remove the least recently used page.

Clock Replacement Policy

When a page is used, set a reference bit to 1. When scanning for a victim, remove the first page with a reference bit of 0, and set all reference bits to 0.

I/O

Device Drivers

A device driver's job is to translate requests made through the device-independent standard interface (open, close, read, write) into the appropriate sequence of commands (register manipulations) for the particular hardware.

Programmed I/O

Interrupt Driven I/O

Direct Memory Access

Buffering

Firstly, let

No Buffering

Cost: $T + C$

User Level Buffering

Single Buffer

Cost: $\max(T, C) + M$

Double Buffer

Cost: $\max(T, C + M)$

Circular Buffer

Multiprocessors

Bus Based UMA

Cache Consistency

Multiprocessor Operating Systems

Individual Operating Systems

Master Slave Multiprocessors

Symmetric Multiprocessors

MCS Locks

Scheduling

Preemptive And Non-Preemptive Scheduling

Scheduling Algorithms

Interactive System Algorithms

Real Time System Algorithms

Interactive Scheduling

Round Robin Scheduling

Priorities

Single Shared Ready Queue

Real Time Scheduling

Given $m$ periodic tasks, where task $i$ has a worst-case compute time $C_i$ and a period $P_i$, the load can only be handled if $\sum_{i = 1}^{m} \dfrac{C_i}{P_i} \le 1$.

Soft Real Time Systems

Must mostly meet all deadlines. For example:

Hard Real Time Systems

Must always meet all deadlines. For example:

Rate Monotonic Scheduling

Earliest Deadline First Scheduling