Last updated on December 29, 2022

Instruction Set Principles

Appendix A in Computer Architecture A Quantitative Approach (6th) by Hennessy and Patterson (2017)

Introduction

Instruction set architecture:
- The portion of the computer visible to the programmer or compiler writer

Classifying Instruction Set Architectures

Alternatives inside a processors
- Stack vs Accumulator vs Set of registers
- One of operands
  - Stack architecture: Implicitly on top of stack
  - Accumulator architecture: Implicitly the accumulator
  - General-purpose register (GPR) architecture: Only explicit operands (reg or mem)
Register Computers
- Register-memory architecture
- Load-store architecture
- Memory-memory architecture
- More registers for special case usage
  - Extended accumulator
  - Special purpose register computer
Why GPR? Registers are …
- Efficient!
- Good for compliers to use than other types of storage
- Able to hold variables
Q. How many regs?
- Compliers would prefer that all regs be equivalent and unreserved
- So, the answer depends on the effective of the complier
Two major instruction set characteristics of GPR architecture
- ALU instructions has two or three operands?
- How many operands may be memory addresses in ALU instructions?
  - Zero: Load-store architecture
  - One: Register-memory architecture
  - Two/Three: Memory-memory architecture

Memory Addressing

How memory addresses are interpreted and how they are specified?

Interpreting memory addresses

Conventions for ordering the bytes within a larger object
- Little Endian byte order: Put the 0th byte at the least significant position (to the left)
- Big Endian byte order: Put the 0th byte at the most significant position (to the right)
- String case?
  - Big Endian: BACKWARD
  - Litter Endian: DRAWKCAB
Object address alignment
- Address A with size S: A%S=0 $\rightarrow$ Aligned
- If misaligned, may need two memory accesses since memories are typically aligned on a multiple of word or double-word boundary

Addressing modes

How architects specify the address of an object they will access?
Advantages
- Significantly reduces instruction counts
Disadvantages
- Add to the complexity of building a computer with increased average cycle-per-instruction (CPI)

Displacement Addressing Mode

eg. Add R4, 100(R1)
Major questions? : proper range of displacements?
- Possible displacement range vs Instruction length
- In practice, same with the immediate size

Immediate or Literal Addressing Mode

Used in arithmetic operations, comparison, and moves
~ 1/4 have an immediate operand
Immediate instruction set measurement? :
- Support all operations? or just subset?
- proper range of values for immediates?

Summary: Memory Addressing

Support at least the following addressing modes
- Displacement, immediate, register-indirect (77%-99%)
The size of the address for displacement mode: at least 12-16 bits (75%-99%)
The size of the immediate field: at least 8-16 bits

Type and Size of Operands

Character 8b (Unicode 16b), half word 16b, word 32b, single-precision float 32b, double-precision float 64b …
Should 64b access path be supported?
How to designate the type of an operand?
- Encoding in the opcode: Most often
- A tag annotated to the data and hardware-based interpretation: Out-of-date

Operations in the instruction set

Categories of instructions include in most ISAs
10 Most frequently used instructions
- So, make these operations faster!

Instructions for Control Flow

Change in control
- Unconditional: Jump
- Conditional: Branch
Four different types of control flow change:
- Conditional branches
- Jumps
- Procedure calls
- Procedure returns

Address Modes for Control Flow Instructions

Always specify the destination address
PC-relative control flow instructions:
- Program counter (PC) + Displacement
- Fewer bit displacement if the destination is not too far from current instruction
- Position independence
  - Efficient when the program is linked / linked dynamically during execution
Return/indirect jumps
- Need other than PC-relative addressing if the target is not know at compile time
When to use Register Indirect Jumps (= Target address is not know at compile time)
- Case or switch statements (selects among several alternatives)
- Virtual functions or methods in OOP C++/Java (eg. method override)
- High-order functions or function pointers in C/C++ (function to be passed as arguments)
- Dynamically shared libraries (a library to be loaded/linked at runtime only)
Next Question: How far branch targets are from branches?

Conditional Branch Options

Three primary techniques to evaluate branch conditions
- Condition code
- Condition register
- Compare and branch

Procedure Invocation Options

Return address must be saved somewhere
- Special link register or a GPR
Two basic conventions to save registers
- Caller saving
- Callee saving
- Caller vs Callee saved registers?
- Example: RISC-V Bytes: Caller and Callee Saved Registers

Summary: Instructions for Control Flow

Most frequently executed instructions
New ISA needs to have
- Branch/Jump to hundreds of instruction distances $\rightarrow$ PC-relative branch displacement of at least 8 bits
- Register indirect, PC-relative addressing to support returns and others
So New ISA should be like ….
- a load-store architecture
  - with displacement, immediate, and register indirect addressing modes
- data: 8-, 16-, 32-, and 64-bit integers and 32- and 64-bit floating-point data
- Instructions for
  - simple operations,
  - PC-relative conditional branches,
  - jump and link instructions
    - for procedure call
  - register indirect jumps
    - for procedure return (plus a few other uses).

Encoding an Instruction Set

How to decode the binary representation of instruction?
- Specified with opcode
The important decision: How to encode the addressing modes with the operations?
- 1 to 10 addressing modes
- 1 to 5 operands
- $\rightarrow$ Impact on the size of instructions
- Many more bits are consumed in encoding addressing modes and register fields than in specifying the opcode
Competing forces when encoding the instruction set
- Desire to have many registers and addressing modes
- Impact of the size of registers and addressing mode fields
- Desire to regular instruction length for easy handling in pipelined implementation (at least, multiples of bytes)
Three popular choices for encoding the instruction set
- Variable / Fixed / Hybrid

Example of variable encoding in 80x86 (1~17 bytes)
- add EAX, 1000(EBX)
- 1 byte for opcode of 32 bit integer add operation with two operands
- 1 bytes for source/destination reg, addressing mode (displacement), and base reg.
- 4 bytes for the size of the address field ( = 1000 = $2^8$ > 1byte and < 4 byte)
- Thus, 6 bytes-instruction

Reduced Code Size in RISCs

Standard = 32 bit instruction
Hybrid
- Support both 16 (narrow instructions with less operations/address/immediates) and 32 bit
- Effectively, its instruction caches looks 25% larger
Compression
- IBM simply hardware-base compress/decompress instructions with 32 bit (Run-length encoding)
- In memory: Compressed
- In instruction-cache: Decompressed
- Simple compiler and decoder
- To handle branches, a hash table in memory that maps between compressed and uncompressed addresses.

Summary: Encoding an Instruction Set

More interested in performance: Variable encoding
More interested in code size: Fixed encoding

Crosscutting Issues: The Role of Compilers

Understanding compiler technology today is critical to designing and efficiently implementing an instruction set.

The Structure of Recent Compilers

Goal?
- Correctness, speed of compiled code, fast compilation, debugging support, interoperability among language
Optimization passes (phases)
Phase ordering problem
- eg. global common subexpression elimination: store common results to temporary registers, and replace expressions with same results with the temporary registers. So, definitely need some registers to be reserved for the temporary results
Optimization classes
- High-level optimizations
- Local optimizations (basic block)
- Global optimizations (across branches, optimizing loops)
- Register allocation
- Processor-dependent optimizations

Register Allocation

Graph coloring
- Construct a graph representing the possible candidates for allocation to a register and then to use the graph to allocate registers
- Use a limited set of colors so that no two adjacent nodes in a dependency graph have the same color
- Achieve 100% register allocation
- NP-complete, heuristic algorithms
- Works best when there are at least 16 general-purpose registers available (more for floating-point)

Impact of Optimization on Performance

Major types of optimizations and examples in each class
Change in instruction count for the programs lucas and mcf from the SPEC2000 as compiler optimization levels vary
The Impact of Compiler Technology on the Architect’s Decisions
- How are variables allocated and addressed?
- How many registers are needed to allocate variables appropriately?
- Stack $\leftarrow$ Local variables
  - Primarily scalars, addressed relative to the stack pointer
- Global data area $\leftarrow$ Statically declared objects such as global variables and constants
  - Usually array or other aggregate data structures
- Heap $\leftarrow$ Dynamic objects
  - Accessed by pointers
- Aliasing (reference a local variable with a pointer):
  - Variables referred by pointers (a = *p) cannot be register allocated
  - Difficult or impossible to decide what a pointer may refer to

How the Architect Can Help the Compiler Writer

Provide regularity
- Make three primary components of instruction set to be orthogonal
  - operations, data types, addressing modes
- Counter example: special-purpose registers
Provide primitives, not solutions
Simplify trade-offs among alternatives
Provide instructions that bind the quantities known at compile time as constants

Compiler Support (or Lack Thereof) for Multimedia Instructions

SIMD instruction designers ignored the previous subsection!
- Solutions, not primitives
- Short of registers
- Data types not matching existing program languages
- Require low-level graphics library and its own compiler technology
- Intel’s MMX
  - with vectors of eight 8bit / four 16-bit / two 32-bit elements
  - 64-bit = the sum of the sizes of the elements
  - 128-bit = streaming SIMD extension (SSE)
Vector computers
- Hiding latency of memory access
- Loading many elements at once
- Overlapping execution with data transfer
- Collect data scattered about memory
- eg. Strided addressing and gather/scatter addressing
  - But, SIMD computers support only unit stride accesses

Putting It All Together: The RISC-V Architecture

RISC-V emphasizes
- A simple load-store instruction set
- Design for pipelining efficiency including a fixed instruction set encoding
- Efficiency as a compiler target
- So, easy architecture to understand

RISC-V Instruction Set Organization

Three base instruction sets and the instruction set extensions
In this section, RV64IMAFD (aka RV64G) is illustrated

Registers for RISC-V

32 64-bit general-purpose registers (GPR, aka integer registers, x0-x31)
‘F’ and ‘D’ extensions: 32 32-bit floating point registers (FPR, f0-f31)
x0 = always zero
GPR $\leftrightarrow$ Special registers : eg. floating-point status register holds information about the results of FP operations

Data Types for RISC-V

8, 16, 32, 64 bits for integer, 32, 64 bits for floating point

Addressing Modes for RISC-V Data Transfers

Immediate and Displacement with 12-bit fields
Register indirect: placing 0 in the 12-bit displacement field
Limited absolute addressing with a 12-bit field: using register 0 as the base register
RV64G memory: byte addressing with 64-bit address, Little Endian byte numbering
Memory access need not be aligned; but aligned access is better
RISC V addressing modes from CS61C doc
- (a) Base displacement addressing: Adds an immediate to a register value to create a memory address (used for lw, lb, sw, sb)
- (b) PC-relative addressing: Uses the PC and adds the immediate value of the instruction (multiplied by 2) to create an address (used by branch and jump instructions)
- (c) Register Addressing: Uses the value in a register as a memory address (jr)

RISC-V Instruction Format

32-bit instructions with a 7-bit primary opcode wiki
The use of instruction fields for each instruction type
Opcode specifies operations (ALU instruction, ALU immediate, load, store, branch, jump)
- “funct” fields $\rightarrow$ specific operations (add, subtract …)

RISC-V Operations (RV64G)

Loads and stores
ALU operations
- add, subtract, AND, OR, XOR, shifts, LUI (load upper immediate)
Branches and Jumps
Floating-point
RISC-V Control Flow Instructions
Typical control flow instructions in RISC-V
Offset: half word offset (so, need 1bit shift)
Branches: equal, greater than, less than, their inverses)
RISC-V Floating-Point Operations
fadd.d/s: FP with double/single precision
flw, fsw: FP load, FP store