Groq Emulator

DNN Runner

Forward-pass inference — iterative vs. unrolled static dataflow

Configuration

Layers

Configure a network and click "Run forward pass".

Weights: He initialization — drawn uniformly from [−√(2/inSize), +√(2/inSize)]. Biases: uniform [−0.05, 0.05]. Forward pass: matmul + bias + element-wise activation. RNG: xorshift seeded by the seed field.
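The initialization described above can be sketched in plain JavaScript. The exact xorshift variant and constants the emulator uses are assumptions here; only the distributions and ranges follow the description:

```javascript
// Sketch of the described init, assuming a 32-bit xorshift RNG
// (shift constants 13/17/5 are an assumption, not the emulator's spec).
function makeXorshift(seed) {
  let s = seed >>> 0 || 1; // avoid the all-zero state
  return function next() {
    s ^= s << 13; s >>>= 0;
    s ^= s >>> 17;
    s ^= s << 5;  s >>>= 0;
    return s / 4294967296; // uniform in [0, 1)
  };
}

// He-style uniform init as described: weights in [-limit, +limit]
// with limit = sqrt(2 / inSize); biases in [-0.05, +0.05].
function initLayer(inSize, outSize, rng) {
  const limit = Math.sqrt(2 / inSize);
  const W = Array.from({ length: outSize }, () =>
    Array.from({ length: inSize }, () => (rng() * 2 - 1) * limit));
  const b = Array.from({ length: outSize }, () => (rng() * 2 - 1) * 0.05);
  return { W, b };
}
```

Seeding `makeXorshift` with the seed field's value makes every forward pass reproducible.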

Cellular Automata

2D stencil computation — iterative vs. unrolled pipeline

Parameters

Step
Alive
Iter cycles
Unrolled cycles

Run a simulation to see the grid.

Boundaries: Conway's Game of Life, Seeds, and Brian's Brain use full toroidal wrapping (both axes). Rule 30 and Rule 110 are 1D rules — the top row wraps horizontally; each step the rows shift down to show time as a scrolling history.
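The toroidal wrapping can be seen in a minimal sketch of one Conway step; this mirrors the boundary behavior described above, not the emulator's internal pipeline:

```javascript
// One Game of Life step with full toroidal wrapping on both axes.
// grid is a 2D array of 0/1; returns a new grid.
function conwayStep(grid) {
  const H = grid.length, W = grid[0].length;
  const next = grid.map(row => row.slice());
  for (let r = 0; r < H; r++) {
    for (let c = 0; c < W; c++) {
      let n = 0;
      for (let dr = -1; dr <= 1; dr++)
        for (let dc = -1; dc <= 1; dc++) {
          if (dr === 0 && dc === 0) continue;
          n += grid[(r + dr + H) % H][(c + dc + W) % W]; // wrap both axes
        }
      next[r][c] = (n === 3 || (n === 2 && grid[r][c] === 1)) ? 1 : 0;
    }
  }
  return next;
}
```

For the 1D rules, the same modular indexing is applied to a single row's left and right neighbors, and each finished row is pushed down into the scrolling history.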

Kernel

Write a kernel — compose ops from the emulator API and see cycle output

Editor

buffer(values) — load a 1D array into a buffer
randn(shape, seed?) — random tensor, shape=[rows,cols] or [n]
matmul(W, x) — matrix × vector → vector
relu(x) / sigmoid(x) / tanh(x) — element-wise activations
add(a, b) — element-wise sum
scale(a, c) — multiply each element by scalar c
stencil(grid, rule, steps?) — run CA (rule: 'conway','rule30','rule110','seeds','brian_brain')
log(...args) — print to output log
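To illustrate how these ops compose, here is a plain-JavaScript sketch with stub implementations of a few of the documented ops. The real emulator schedules these onto a cycle-counted graph; the stubs below only reproduce the numeric behavior:

```javascript
// Stub versions of a few emulator ops, for illustration only.
const buffer = values => values.slice();
const matmul = (W, x) => W.map(row => row.reduce((s, w, j) => s + w * x[j], 0));
const relu = x => x.map(v => Math.max(v, 0));
const add = (a, b) => a.map((v, i) => v + b[i]);
const scale = (a, c) => a.map(v => v * c);
const log = (...args) => console.log(...args);

// A tiny kernel: one dense layer with bias and ReLU, then a scalar scale.
const x = buffer([1, -2, 3]);
const W = [[1, 0, 0], [0, 1, 0]];
const b = buffer([0.5, 0.5]);
const h = relu(add(matmul(W, x), b)); // [1.5, 0]
log('h =', h);
log('scaled:', scale(h, 2)); // [3, 0]
```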

Write a kernel and click "Run kernel".

Groq Emulator

This emulator explores Groq-like static dataflow computation — deterministic, pre-scheduled execution where computation graphs are compiled ahead of time and run without dynamic branching or runtime decisions.

Core abstractions

Buffer — SRAM-like memory bank holding tensors between compute stages.

Node — Computational unit (matmul, activation, stencil) with fixed input/output wiring.

Graph — DAG of nodes connected by buffers; statically scheduled before execution begins.

Cycle — Discrete time unit; each node fires at a deterministic cycle offset.
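The four abstractions fit together as static scheduling over a DAG: given each node's dependencies and a fixed latency, every firing cycle can be computed before any data flows. A minimal sketch (per-node latencies here are illustrative, not the emulator's actual costs):

```javascript
// Compute a deterministic cycle offset for each node in a DAG,
// assuming each node has a fixed latency and fires once its inputs land.
function schedule(nodes) {
  // nodes: { name: { deps: [...names], latency: cycles } }
  const start = {};
  const fire = name => {
    if (name in start) return start[name];
    const { deps } = nodes[name];
    const ready = deps.length
      ? Math.max(...deps.map(d => fire(d) + nodes[d].latency))
      : 0;
    return (start[name] = ready);
  };
  Object.keys(nodes).forEach(fire);
  return start; // cycle offset at which each node fires
}

const offsets = schedule({
  load:   { deps: [],         latency: 1 },
  matmul: { deps: ['load'],   latency: 4 },
  bias:   { deps: ['matmul'], latency: 1 },
  relu:   { deps: ['bias'],   latency: 1 },
});
// offsets.relu === 6: load(0) → matmul(1) → bias(5) → relu(6)
```

Because the schedule is fixed ahead of time, execution needs no runtime arbitration: every buffer is read and written at a known cycle.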

Iterative vs. unrolled

Iterative — Each layer runs sequentially. Total cost = Σ(per-layer cost). Simple, serial.

Unrolled — The computation graph is unrolled in time, exposing parallelism across each layer's spatial dimensions. Cost is reduced by roughly the width factor.
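The two cost models can be compared with a toy example. The per-layer cost function and the width factor below are assumptions for illustration, not the emulator's exact cycle accounting:

```javascript
// Toy cost comparison: iterative sums per-layer costs serially;
// unrolled divides each layer's cost across `width` parallel lanes.
const layers = [[64, 32], [32, 32], [32, 10]]; // [inSize, outSize] per layer
const layerCost = ([inSize, outSize]) => inSize * outSize; // assumed model

const iterative = layers.reduce((sum, l) => sum + layerCost(l), 0);

const width = 8; // illustrative width factor
const unrolled = layers.reduce(
  (sum, l) => sum + Math.ceil(layerCost(l) / width), 0);
// iterative = 3392 cycles; unrolled = 424 cycles (8x fewer)
```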

This was built with LLM assistance.