~/projects/assembler-linker

$ ls tech-stack/

C    Systems Programming    Assembly    Compiler Design

$ ls links/

no links available

$ cat overview.md

Project Overview

This project implements a complete assembly system for the LC-2K instruction set architecture, consisting of an assembler and linker. The system enables multi-file program development by supporting separate compilation and linking of LC-2K assembly files. It's designed to teach fundamental concepts of assembly, linking, and low-level programming.

Key Components

Assembler Implementation

The assembler is a two-pass system that processes LC-2K assembly files and produces object files containing:

  • Machine code instructions
  • Symbol tables for global labels
  • Relocation information for linking
  • Data section contents

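Each symbol table entry records the symbol's name, the section it belongs to, and its offset:

```c
/* Example symbol table entry */
struct SymbolEntry {
    char *label;  /* symbol name */
    char type;    /* 'T'ext, 'D'ata, or 'U'ndefined */
    int offset;   /* offset within its section (0 for undefined symbols) */
};
```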

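The assembler distinguishes local symbols, whose names start with a lowercase letter, from global symbols, whose names start with an uppercase letter. Local symbols must be defined in the file that uses them, while global symbols are recorded in the symbol table so the linker can match definitions and references across files. This distinction is what gives each label its scope during linking.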

Object File Format

The object file structure follows a specific format with sections for:

  1. Header information (size of each section)
  2. Text segment (machine code)
  3. Data segment
  4. Symbol table
  5. Relocation table
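Reading the header might look like the following minimal sketch. It assumes the header line lists the four section sizes in the same order as the sections (text, data, symbol table, relocation table), which fits the example header `6 1 1 2` below: six text words, one data word, one symbol, and two relocations. `ObjectHeader` and `parseHeader` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical header record: one count per section, in file order. */
typedef struct {
    int textSize;
    int dataSize;
    int symbolCount;
    int relocationCount;
} ObjectHeader;

/* Parse a header line such as "6 1 1 2".
 * Returns 0 on success, or -1 if the line does not hold four integers. */
int parseHeader(const char *line, ObjectHeader *hdr) {
    int n = sscanf(line, "%d %d %d %d", &hdr->textSize, &hdr->dataSize,
                   &hdr->symbolCount, &hdr->relocationCount);
    return n == 4 ? 0 : -1;
}
```

The counts tell the reader how many of the following lines belong to each section, so the rest of the file can be consumed section by section.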

Example object file layout:

6 1 1 2           // Header
0x00810006        // Text section
0x00840000
...
0x00000005        // Data section
SubAdr U 0        // Symbol table
0 lw five         // Relocation table
1 lw SubAdr
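The text words above can be checked by decoding the LC-2K I-type fields (used by lw, sw, and beq): opcode in bits 24-22, regA in bits 21-19, regB in bits 18-16, and a 16-bit two's-complement offset in bits 15-0. The sketch below is illustrative; `IType` and `decodeIType` are hypothetical names.

```c
#include <stdint.h>

/* Decoded fields of an LC-2K I-type instruction word. */
typedef struct {
    int opcode, regA, regB, offset;
} IType;

IType decodeIType(uint32_t word) {
    IType d;
    d.opcode = (word >> 22) & 0x7;
    d.regA   = (word >> 19) & 0x7;
    d.regB   = (word >> 16) & 0x7;
    int off  = (int)(word & 0xFFFF);
    if (off & 0x8000)   /* sign-extend the 16-bit offset field */
        off -= 0x10000;
    d.offset = off;
    return d;
}
```

Decoding `0x00810006` gives opcode 2 (lw), regA 0, regB 1, offset 6. Offset 6 is the address of the data word `five`, since the six text words occupy addresses 0-5, which is why that word needs relocation entry `0 lw five`. Decoding `0x00840000` gives lw with regA 0, regB 4, and offset 0: `SubAdr` is undefined in this file, so its offset is left at 0 for the linker to fill in via entry `1 lw SubAdr`.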

Linker Implementation

The linker combines multiple object files into a single executable by:

  1. Concatenating text and data segments
  2. Resolving global symbols across files
  3. Applying relocations
  4. Managing the stack section
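Step 1 amounts to deciding where each file's sections land in the combined image. The sketch below assumes a layout in which all text segments come first, in file order, followed by all data segments; `computeBases` and its array parameters are illustrative, not taken from the project.

```c
/* Compute the starting address of each file's text and data in the linked
 * image, assuming all text segments precede all data segments. */
void computeBases(int nFiles, const int textSize[], const int dataSize[],
                  int textBase[], int dataBase[]) {
    int totalText = 0;
    for (int i = 0; i < nFiles; i++)
        totalText += textSize[i];

    int nextText = 0, nextData = totalText;  /* data starts after all text */
    for (int i = 0; i < nFiles; i++) {
        textBase[i] = nextText;
        dataBase[i] = nextData;
        nextText += textSize[i];
        nextData += dataSize[i];
    }
}
```

For two files with text sizes 6 and 3 and data sizes 1 and 2, the text bases are 0 and 6 and the data bases are 9 and 10. These bases are what later symbol resolution and relocation add to file-local offsets.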

A key feature is handling of the special Stack label, which the linker resolves automatically to the first address after the combined text and data segments, so the stack begins immediately past the program's code and data.
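Because Stack points just past the last text or data word, its address equals the total number of words in the linked image. A minimal sketch, with `stackAddress` as a hypothetical helper:

```c
/* Address of the Stack label: total size of all text and data segments. */
int stackAddress(int nFiles, const int textSize[], const int dataSize[]) {
    int total = 0;
    for (int i = 0; i < nFiles; i++)
        total += textSize[i] + dataSize[i];
    return total;
}
```

With text sizes 6 and 3 and data sizes 1 and 2, Stack resolves to address 12.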

Technical Challenges

Symbol Resolution

One of the most complex aspects was implementing correct symbol resolution, particularly:

  • Handling undefined global references
  • Maintaining correct offsets after combining multiple files
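Both points come together when the linker looks up a global symbol: it must find the single file that defines it and add that file's section base to the symbol's offset. The sketch below assumes each symbol's offset is measured from the start of its own section and that per-file text and data bases are already known; `Sym`, `FileSyms`, and `resolveGlobal` are hypothetical names, with `Sym` mirroring the label/type/offset fields of `SymbolEntry`.

```c
#include <string.h>

typedef struct {
    const char *label;
    char type;    /* 'T', 'D', or 'U' */
    int offset;   /* assumed: offset from the start of its own section */
} Sym;

typedef struct {
    const Sym *syms;
    int count;
} FileSyms;

/* Return the final address of a global symbol, -1 if no file defines it,
 * or -2 if more than one file defines it. */
int resolveGlobal(const char *label, const FileSyms files[], int nFiles,
                  const int textBase[], const int dataBase[]) {
    int addr = -1, defs = 0;
    for (int f = 0; f < nFiles; f++) {
        for (int i = 0; i < files[f].count; i++) {
            const Sym *s = &files[f].syms[i];
            if (s->type == 'U' || strcmp(s->label, label) != 0)
                continue;   /* skip references and unrelated symbols */
            defs++;
            addr = (s->type == 'T' ? textBase[f] : dataBase[f]) + s->offset;
        }
    }
    if (defs == 0)
        return -1;
    return defs > 1 ? -2 : addr;
}
```

If file 1 starts its text at 6 and defines `SubAdr` at text offset 2, every file that references `SubAdr` resolves it to address 8.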

Relocation Processing

The relocation system required careful attention to:

  • Calculating correct absolute addresses
  • Handling PC-relative addressing for branches
  • Managing symbol references across different sections
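For branches, LC-2K's beq transfers control to PC + 1 + offsetField, so the value to encode is `target - (pc + 1)`, and it must fit in the signed 16-bit offset field. A minimal sketch, with `branchOffset` as a hypothetical name:

```c
/* Compute the beq offset field for a branch at `pc` to `target`.
 * Returns 0 and stores the offset in *field, or -1 if it does not fit
 * in a signed 16-bit field. */
int branchOffset(int pc, int target, int *field) {
    int off = target - (pc + 1);
    if (off < -32768 || off > 32767)
        return -1;
    *field = off;
    return 0;
}
```

A branch at address 3 to address 10 encodes offset 6; a branch at 5 back to 2 encodes -4. Because a branch and a target in the same file's text segment are shifted by the same base during linking, their difference, and therefore the encoded offset, stays the same.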

Implementation Details

Error Handling

The assembler and linker check for errors including:

  • Undefined local symbol detection
  • Invalid instruction format validation
  • Duplicate symbol definition checks
  • Offset range verification
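As one example, duplicate definitions can be caught with a pairwise scan over a file's labels. This is a minimal sketch; `findDuplicateLabel` is a hypothetical name, and the quadratic scan is a simplicity choice that is adequate for short files (a hash table would scale better).

```c
#include <string.h>

/* Return the index of the first label that repeats an earlier one,
 * or -1 if all labels are distinct. */
int findDuplicateLabel(const char *labels[], int n) {
    for (int i = 1; i < n; i++)
        for (int j = 0; j < i; j++)
            if (strcmp(labels[i], labels[j]) == 0)
                return i;
    return -1;
}
```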

Memory Management

Careful consideration was given to memory layout, particularly:

  • Separate text and data sections
  • Stack growth management
  • Symbol table organization

Learning Outcomes

This project provided deep insights into:

  1. Assembly language processing
  2. Object file formats and linking
  3. Symbol resolution and relocation
  4. Memory layout in executable files

Future Improvements

Potential enhancements could include:

  • Support for more complex relocations
  • Additional optimization passes
  • Enhanced error reporting
  • Support for external libraries

Code Examples

Relocation Entry Processing

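```c
/* delta: how far the referenced symbol's section moved (new base - old base).
 * The word already holds the file-local address, so only delta is added. */
void processRelocation(const RelocationEntry *entry, int delta, int isInstruction) {
    int word = memory[entry->location];
    if (isInstruction)  /* lw/sw: patch only the 16-bit offset field */
        word = (word & ~0xFFFF) | ((word + delta) & 0xFFFF);
    else                /* .fill: the whole word is the address */
        word += delta;
    memory[entry->location] = word;
}
```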

Symbol Resolution

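```c
#include <string.h>

/* Linear search of the symbol table. Returns the symbol's address, or -1
 * if it is not defined (valid LC-2K addresses are never negative). */
int resolveSymbol(const char *symbol, const SymbolTable *table) {
    for (int i = 0; i < table->size; i++) {
        if (strcmp(table->entries[i].name, symbol) == 0) {
            return table->entries[i].address;
        }
    }
    return -1; /* symbol not found */
}
```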

The complete implementation demonstrates fundamental concepts in systems programming while providing practical experience with assembly language processing and program linking.

$ cat features.txt

  • Two-pass assembler supporting local and global symbols
  • Multi-file linker with symbol resolution
  • Object file generation with relocation information
  • Stack-based recursive function support
  • Error detection for undefined symbols and invalid instructions