TinyGPU is a tiny educational GPU simulator: a minimal SIMT-style simulator.

Built for learning and visualization: see how threads, registers, and memory interact across cycles!
- `SHLD` and `SHST` for robust shared memory operations.
- `SYNC` semantics for better thread coordination.
- Linting and formatting tools (`ruff`, `black`) and testing workflows.

Clone and install (editable):

    git clone https://github.com/deaneeth/tinygpu.git
    cd tinygpu
    pip install -e .
    pip install -r requirements-dev.txt

Run a demo (odd-even sort):

    python -m examples.run_odd_even_sort

Produces `outputs/run_odd_even_sort/run_odd_even_sort_*.gif`, a visual GPU-style sorting process.
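
For readers new to the algorithm behind this demo, here is a minimal plain-Python sketch of odd-even transposition sort. It is not TinyGPU's kernel or API: the helper names `cswap` and `odd_even_sort` are hypothetical, and the pairs that a GPU would compare in parallel are simply visited one after another here.

```python
def cswap(mem, a, b):
    """Swap mem[a] and mem[b] if they are out of order (mem[a] > mem[b])."""
    if mem[a] > mem[b]:
        mem[a], mem[b] = mem[b], mem[a]


def odd_even_sort(mem):
    """Sort the list `mem` in place using odd-even transposition sort."""
    n = len(mem)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...
        # Odd phases compare pairs (1,2), (3,4), ...
        start = phase % 2
        # On a GPU, each pair would be handled by its own thread in the
        # same step; this sketch walks the pairs sequentially instead.
        for i in range(start, n - 1, 2):
            cswap(mem, i, i + 1)
        # On parallel hardware, a barrier between phases would go here.
    return mem


data = [5, 2, 9, 1, 7, 3]
print(odd_even_sort(data))  # [1, 2, 3, 5, 7, 9]
```

Running `n` phases is enough to sort `n` elements, which is why the outer loop does not need an early-exit check.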

Example scripts:

- `examples/run_vector_add.py`: simple parallel vector add
- `examples/run_vector_add_kernel.py`: vector add with kernel arguments
- `examples/run_test_loop.py`: branch/loop test (sum 1..4)
- `examples/run_test_cmp.py`: comparison and branching test
- `examples/run_test_kernel_args.py`: kernel arguments test
- `examples/run_odd_even_sort.py`: odd-even transposition sort (GIF)
- `examples/run_reduce_sum.py`: parallel reduction (GIF)
- `examples/run_block_shared_sum.py`: per-block shared memory example
- `examples/run_sync_test.py`: synchronization test
- `examples/debug_repl.py`: interactive REPL debugger

| Instruction | Operands | Description |
|---|---|---|
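| `SET Rd, imm` | Rd = destination register, imm = immediate value | Set register Rd to an immediate constant. |
| `ADD Rd, Ra, Rb` | Rd = destination, Ra and Rb = source registers | Add two registers and store the result in Rd (`Rd = Ra + Rb`). |
| `ADD Rd, Ra, imm` | Rd = destination, Ra = source register, imm = immediate value | Add a register and an immediate value (`Rd = Ra + imm`). |
| `MUL Rd, Ra, Rb` | Rd = destination, Ra and Rb = source registers | Multiply two registers (`Rd = Ra * Rb`). |
| `MUL Rd, Ra, imm` | Rd = destination, Ra = source register, imm = immediate value | Multiply a register by an immediate (`Rd = Ra * imm`). |
| `LD Rd, addr` | Rd = destination, addr = memory address | Load from a memory address into a register (`Rd = mem[addr]`). |
| `LD Rd, Rk` | Rd = destination, Rk = register holding the address | Load from the address in register Rk (`Rd = mem[Rk]`). |
| `ST addr, Rs` | addr = memory address, Rs = source register | Store a register into a memory address (`mem[addr] = Rs`). |
| `ST Rk, Rs` | Rk = register holding the address, Rs = source register | Store the value from Rs into memory at the address in register Rk (`mem[Rk] = Rs`). |
| `SHLD Rd, saddr` | Rd = destination, saddr = shared memory address | Load from shared memory into a register (`Rd = shared_mem[saddr]`). |
| `SHST saddr, Rs` | saddr = shared memory address, Rs = source register | Store a register into shared memory (`shared_mem[saddr] = Rs`). |
| `CSWAP addrA, addrB` | addrA, addrB = memory addresses | Compare-and-swap memory values: if `mem[addrA] > mem[addrB]`, swap them. Used for sorting. |
| `CMP Ra, Rb` | Ra, Rb = registers to compare | Compare and set the Z/N/G flags based on `Ra - Rb`. |
| `BRGT target` | target = jump target | Branch if greater: jump to target if the G flag is set. |
| `BRLT target` | target = jump target | Branch if less: jump to target if the N flag is set. |
| `BRZ target` | target = jump target | Branch if zero: jump to target if the Z flag is set. |
| `JMP target` | target = label or immediate | Unconditional jump: sets PC to target. |
| `SYNC` | (no operands) | Global synchronization barrier: all threads must reach this point. |
| `SYNCB` | (no operands) | Block-level synchronization barrier. |
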
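
To make the semantics above concrete, here is a rough single-thread interpreter sketch for a subset of the instructions (`SET`, `ADD`, `MUL`, `LD`, `ST`, `CMP`, `BRGT`, `BRLT`, `BRZ`, `JMP`). It is an illustration only, not TinyGPU's implementation or API. The `run` function, the tuple-based program encoding, and the use of instruction indices as jump targets are all assumptions made for this example. Shared memory, `CSWAP`, and the barriers are left out.

```python
def run(program, mem, max_steps=10_000):
    """Execute a list of (opcode, *operands) tuples for one thread.

    Hypothetical encoding: string operands name registers, integer
    operands are immediates, and jump targets are instruction indices.
    """
    regs = {}  # register name -> integer value (unset registers read as 0)
    flags = {"Z": False, "N": False, "G": False}
    pc = 0

    def val(x):
        # A string operand names a register; anything else is an immediate.
        return regs.get(x, 0) if isinstance(x, str) else x

    # max_steps guards against programs that never terminate.
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, *args = program[pc]
        pc += 1
        if op == "SET":
            regs[args[0]] = args[1]
        elif op == "ADD":
            regs[args[0]] = val(args[1]) + val(args[2])
        elif op == "MUL":
            regs[args[0]] = val(args[1]) * val(args[2])
        elif op == "LD":
            regs[args[0]] = mem[val(args[1])]
        elif op == "ST":
            mem[val(args[0])] = val(args[1])
        elif op == "CMP":
            diff = val(args[0]) - val(args[1])
            flags = {"Z": diff == 0, "N": diff < 0, "G": diff > 0}
        elif op == "JMP":
            pc = args[0]
        elif op in ("BRGT", "BRLT", "BRZ"):
            flag = {"BRGT": "G", "BRLT": "N", "BRZ": "Z"}[op]
            if flags[flag]:
                pc = args[0]
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs, mem


# Sum 1..4 with a compare-and-branch loop, then store the result in mem[0].
program = [
    ("SET", "R0", 0),           # 0: sum = 0
    ("SET", "R1", 1),           # 1: i = 1
    ("SET", "R2", 5),           # 2: limit = 5
    ("ADD", "R0", "R0", "R1"),  # 3: sum += i
    ("ADD", "R1", "R1", 1),     # 4: i += 1
    ("CMP", "R1", "R2"),        # 5: set flags from i - limit
    ("BRLT", 3),                # 6: loop while i < limit
    ("ST", 0, "R0"),            # 7: mem[0] = sum
]
regs, mem = run(program, [0] * 4)
print(mem[0])  # 10
```

Treating string operands as registers and integers as immediates keeps the two forms of `ADD`, `MUL`, `LD`, and `ST` in a single code path.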

`.github/workflows/ci.yml` for CI and packaging.

License: MIT. See `LICENSE`.