Instruction Set Principles


Introduction

  • Instruction set architecture:
    • The portion of the computer visible to the programmer or compiler writer

Classifying Instruction Set Architectures

  • Alternatives inside a processors
    • Stack vs Accumulator vs Set of registers
    • One of operands
      • Stack architecture: Implicitly on top of stack
      • Accumulator architecture: Implicitly the accumulator
      • General-purpose register (GPR) architecture: Only explicit operands (reg or mem)
  • Register Computers
    • Register-memory architecture
    • Load-store architecture
    • Memory-memory architecture
    • More registers for special case usage
      • Extended accumulator
      • Special purpose register computer
  • Why GPR? Registers are …
    • Efficient!
    • Good for compliers to use than other types of storage
    • Able to hold variables
  • Q. How many regs?
    • Compliers would prefer that all regs be equivalent and unreserved
    • So, the answer depends on the effective of the complier
  • Two major instruction set characteristics of GPR architecture
    • ALU instructions has two or three operands?
    • How many operands may be memory addresses in ALU instructions?
      • Zero: Load-store architecture
      • One: Register-memory architecture
      • Two/Three: Memory-memory architecture


Memory Addressing

  • How memory addresses are interpreted and how they are specified?

Interpreting memory addresses

  • Conventions for ordering the bytes within a larger object
    • Little Endian byte order: Put the 0th byte at the least significant position (to the left)
    • Big Endian byte order: Put the 0th byte at the most significant position (to the right)
    • String case?
      • Big Endian: BACKWARD
      • Litter Endian: DRAWKCAB
  • Object address alignment
    • Address A with size S: A%S=0 $\rightarrow$ Aligned
    • If misaligned, may need two memory accesses since memories are typically aligned on a multiple of word or double-word boundary

Addressing modes

  • How architects specify the address of an object they will access?
  • Advantages
    • Significantly reduces instruction counts
  • Disadvantages
    • Add to the complexity of building a computer with increased average cycle-per-instruction (CPI)

Displacement Addressing Mode

  • eg. Add R4, 100(R1)
  • Major questions? : proper range of displacements?
    • Possible displacement range vs Instruction length
    • In practice, same with the immediate size

Immediate or Literal Addressing Mode

  • Used in arithmetic operations, comparison, and moves
  • ~ 1/4 have an immediate operand
  • Immediate instruction set measurement? :
    • Support all operations? or just subset?
    • proper range of values for immediates?

Summary: Memory Addressing

  • Support at least the following addressing modes
    • Displacement, immediate, register-indirect (77%-99%)
  • The size of the address for displacement mode: at least 12-16 bits (75%-99%)
  • The size of the immediate field: at least 8-16 bits

Type and Size of Operands

  • Character 8b (Unicode 16b), half word 16b, word 32b, single-precision float 32b, double-precision float 64b …
  • Should 64b access path be supported?
  • How to designate the type of an operand?
    • Encoding in the opcode: Most often
    • A tag annotated to the data and hardware-based interpretation: Out-of-date

Operations in the instruction set

  • Categories of instructions include in most ISAs
  • 10 Most frequently used instructions
    • So, make these operations faster!


Instructions for Control Flow

  • Change in control
    • Unconditional: Jump
    • Conditional: Branch
  • Four different types of control flow change:
    • Conditional branches
    • Jumps
    • Procedure calls
    • Procedure returns

Address Modes for Control Flow Instructions

  • Always specify the destination address
  • PC-relative control flow instructions:
    • Program counter (PC) + Displacement
    • Fewer bit displacement if the destination is not too far from current instruction
    • Position independence
      • Efficient when the program is linked / linked dynamically during execution
  • Return/indirect jumps
    • Need other than PC-relative addressing if the target is not know at compile time
  • When to use Register Indirect Jumps (= Target address is not know at compile time)
    • Case or switch statements (selects among several alternatives)
    • Virtual functions or methods in OOP C++/Java (eg. method override)
    • High-order functions or function pointers in C/C++ (function to be passed as arguments)
    • Dynamically shared libraries (a library to be loaded/linked at runtime only)
  • Next Question: How far branch targets are from branches?

Conditional Branch Options

  • Three primary techniques to evaluate branch conditions
    • Condition code
    • Condition register
    • Compare and branch

Procedure Invocation Options

Summary: Instructions for Control Flow

  • Most frequently executed instructions
  • New ISA needs to have
    • Branch/Jump to hundreds of instruction distances $\rightarrow$ PC-relative branch displacement of at least 8 bits
    • Register indirect, PC-relative addressing to support returns and others
  • So New ISA should be like ….
    • a load-store architecture
      • with displacement, immediate, and register indirect addressing modes
    • data: 8-, 16-, 32-, and 64-bit integers and 32- and 64-bit floating-point data
    • Instructions for
      • simple operations,
      • PC-relative conditional branches,
      • jump and link instructions
        • for procedure call
      • register indirect jumps
        • for procedure return (plus a few other uses).

Encoding an Instruction Set

  • How to decode the binary representation of instruction?
    • Specified with opcode
  • The important decision: How to encode the addressing modes with the operations?
    • 1 to 10 addressing modes
    • 1 to 5 operands
    • $\rightarrow$ Impact on the size of instructions
    • Many more bits are consumed in encoding addressing modes and register fields than in specifying the opcode
  • Competing forces when encoding the instruction set
    • Desire to have many registers and addressing modes
    • Impact of the size of registers and addressing mode fields
    • Desire to regular instruction length for easy handling in pipelined implementation (at least, multiples of bytes)
  • Three popular choices for encoding the instruction set
    • Variable / Fixed / Hybrid
  • Example of variable encoding in 80x86 (1~17 bytes)
    • add EAX, 1000(EBX)
    • 1 byte for opcode of 32 bit integer add operation with two operands
    • 1 bytes for source/destination reg, addressing mode (displacement), and base reg.
    • 4 bytes for the size of the address field ( = 1000 = $2^8$ > 1byte and < 4 byte)
    • Thus, 6 bytes-instruction

Reduced Code Size in RISCs

  • Standard = 32 bit instruction
  • Hybrid
    • Support both 16 (narrow instructions with less operations/address/immediates) and 32 bit
    • Effectively, its instruction caches looks 25% larger
  • Compression
    • IBM simply hardware-base compress/decompress instructions with 32 bit (Run-length encoding)
    • In memory: Compressed
    • In instruction-cache: Decompressed
    • Simple compiler and decoder
    • To handle branches, a hash table in memory that maps between compressed and uncompressed addresses.

Summary: Encoding an Instruction Set

  • More interested in performance: Variable encoding
  • More interested in code size: Fixed encoding

Crosscutting Issues: The Role of Compilers

  • Understanding compiler technology today is critical to designing and efficiently implementing an instruction set.

The Structure of Recent Compilers

  • Goal?
    • Correctness, speed of compiled code, fast compilation, debugging support, interoperability among language
  • Optimization passes (phases)
  • Phase ordering problem
    • eg. global common subexpression elimination: store common results to temporary registers, and replace expressions with same results with the temporary registers. So, definitely need some registers to be reserved for the temporary results
  • Optimization classes
    • High-level optimizations
    • Local optimizations (basic block)
    • Global optimizations (across branches, optimizing loops)
    • Register allocation
    • Processor-dependent optimizations

Register Allocation

  • Graph coloring
    • Construct a graph representing the possible candidates for allocation to a register and then to use the graph to allocate registers
    • Use a limited set of colors so that no two adjacent nodes in a dependency graph have the same color
    • Achieve 100% register allocation
    • NP-complete, heuristic algorithms
    • Works best when there are at least 16 general-purpose registers available (more for floating-point)

Impact of Optimization on Performance

  • Major types of optimizations and examples in each class
  • Change in instruction count for the programs lucas and mcf from the SPEC2000 as compiler optimization levels vary

    The Impact of Compiler Technology on the Architect’s Decisions

    • How are variables allocated and addressed?
    • How many registers are needed to allocate variables appropriately?
    • Stack $\leftarrow$ Local variables
      • Primarily scalars, addressed relative to the stack pointer
    • Global data area $\leftarrow$ Statically declared objects such as global variables and constants
      • Usually array or other aggregate data structures
    • Heap $\leftarrow$ Dynamic objects
      • Accessed by pointers
    • Aliasing (reference a local variable with a pointer):
      • Variables referred by pointers (a = *p) cannot be register allocated
      • Difficult or impossible to decide what a pointer may refer to

How the Architect Can Help the Compiler Writer

  • Provide regularity
    • Make three primary components of instruction set to be orthogonal
      • operations, data types, addressing modes
    • Counter example: special-purpose registers
  • Provide primitives, not solutions
  • Simplify trade-offs among alternatives
  • Provide instructions that bind the quantities known at compile time as constants

Compiler Support (or Lack Thereof) for Multimedia Instructions

  • SIMD instruction designers ignored the previous subsection!
    • Solutions, not primitives
    • Short of registers
    • Data types not matching existing program languages
    • Require low-level graphics library and its own compiler technology
    • Intel’s MMX
      • with vectors of eight 8bit / four 16-bit / two 32-bit elements
      • 64-bit = the sum of the sizes of the elements
      • 128-bit = streaming SIMD extension (SSE)
  • Vector computers
    • Hiding latency of memory access
    • Loading many elements at once
    • Overlapping execution with data transfer
    • Collect data scattered about memory
    • eg. Strided addressing and gather/scatter addressing
      • But, SIMD computers support only unit stride accesses

Putting It All Together: The RISC-V Architecture

  • RISC-V emphasizes
    • A simple load-store instruction set
    • Design for pipelining efficiency including a fixed instruction set encoding
    • Efficiency as a compiler target
    • So, easy architecture to understand

RISC-V Instruction Set Organization

  • Three base instruction sets and the instruction set extensions

  • In this section, RV64IMAFD (aka RV64G) is illustrated

Registers for RISC-V

  • 32 64-bit general-purpose registers (GPR, aka integer registers, x0-x31)
  • ‘F’ and ‘D’ extensions: 32 32-bit floating point registers (FPR, f0-f31)
  • x0 = always zero
  • GPR $\leftrightarrow$ Special registers : eg. floating-point status register holds information about the results of FP operations

Data Types for RISC-V

  • 8, 16, 32, 64 bits for integer, 32, 64 bits for floating point

Addressing Modes for RISC-V Data Transfers

  • Immediate and Displacement with 12-bit fields
  • Register indirect: placing 0 in the 12-bit displacement field
  • Limited absolute addressing with a 12-bit field: using register 0 as the base register
  • RV64G memory: byte addressing with 64-bit address, Little Endian byte numbering
  • Memory access need not be aligned; but aligned access is better
  • RISC V addressing modes from CS61C doc
    • (a) Base displacement addressing: Adds an immediate to a register value to create a memory address (used for lw, lb, sw, sb)
    • (b) PC-relative addressing: Uses the PC and adds the immediate value of the instruction (multiplied by 2) to create an address (used by branch and jump instructions)
    • (c) Register Addressing: Uses the value in a register as a memory address (jr)

RISC-V Instruction Format

  • 32-bit instructions with a 7-bit primary opcode wiki
  • The use of instruction fields for each instruction type
  • Opcode specifies operations (ALU instruction, ALU immediate, load, store, branch, jump)
    • “funct” fields $\rightarrow$ specific operations (add, subtract …)

RISC-V Operations (RV64G)

  • Loads and stores
  • ALU operations
    • add, subtract, AND, OR, XOR, shifts, LUI (load upper immediate)
  • Branches and Jumps
  • Floating-point
  • RISC-V Control Flow Instructions
  • Typical control flow instructions in RISC-V
  • Offset: half word offset (so, need 1bit shift)
  • Branches: equal, greater than, less than, their inverses)
  • RISC-V Floating-Point Operations
  • fadd.d/s: FP with double/single precision
  • flw, fsw: FP load, FP store

RISC-V Instruction Set Cheatsheat by Erik Engheim

  • Arithmetic operations
  • Logical operations
  • Load/Store operations
  • Branching operations

Reference

Notes Mentioning This Note

Table of Contents


Share on: