Systems | Systems & ML | 2026

Distributed Neural Network

Multi-layer perceptron built from scratch in C++17, accelerated with SIMD and distributed across workers over TCP.

01 Problem

Training a neural network on a single core is slow and opaque. Distributing gradient computation across workers requires solving synchronization, IPC overhead, and numerical precision at once.

02 Build

A multi-layer perceptron implemented from scratch in C++17, with custom linear algebra, activation functions, and SGD. The math kernels are accelerated with 256-bit SIMD intrinsics, and training is parallelized across a parameter server–worker architecture whose processes communicate over TCP sockets.
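As a rough illustration of what "from scratch" involves, the sketch below shows a single dense layer's forward pass and a plain SGD update in standard C++17. The names `DenseLayer`, `forward`, and `sgd_step`, the row-major weight layout, and the choice of sigmoid as the activation are assumptions made for this example; the project's actual types and activations are not described here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense layer: row-major weights of shape (out x in), one bias per output.
struct DenseLayer {
    std::size_t in;
    std::size_t out;
    std::vector<float> w;  // out * in entries
    std::vector<float> b;  // out entries
};

// Sigmoid is an assumed activation, chosen only for illustration.
inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Forward pass: y = sigmoid(W x + b).
std::vector<float> forward(const DenseLayer& layer, const std::vector<float>& x) {
    std::vector<float> y(layer.out);
    for (std::size_t o = 0; o < layer.out; ++o) {
        float acc = layer.b[o];
        for (std::size_t i = 0; i < layer.in; ++i) {
            acc += layer.w[o * layer.in + i] * x[i];
        }
        y[o] = sigmoid(acc);
    }
    return y;
}

// One SGD step over a flat parameter vector: p <- p - lr * g.
void sgd_step(std::vector<float>& params, const std::vector<float>& grads, float lr) {
    for (std::size_t i = 0; i < params.size(); ++i) {
        params[i] -= lr * grads[i];
    }
}
```

The inner multiply-accumulate loop in `forward` is the kind of kernel that SIMD vectorization typically targets.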

03 Result

SIMD vectorization cuts computation time by 4x; the distributed design scales training across multiple workers with well-defined IPC boundaries.
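To make the vectorization concrete, here is a minimal sketch of a dot product that processes eight single-precision floats per 256-bit register. It assumes the 256-bit intrinsics are x86 AVX (the page does not name the instruction set) and falls back to a scalar loop when AVX is not enabled at compile time. The function name `dot` is hypothetical, and this is not the project's kernel.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Dot product of two float arrays of length n.
// With AVX enabled (e.g. -mavx), the main loop handles 8 floats per iteration;
// the scalar loop below it handles the remaining tail, or everything without AVX.
float dot(const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    float sum = 0.0f;
#if defined(__AVX__)
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  // unaligned loads: no alignment assumed
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    // Horizontal reduction: spill the 8 lanes and add them up.
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (float v : lanes) {
        sum += v;
    }
#endif
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because the vector path adds values in a different order than a scalar loop, results can differ in the last bits for general inputs, which is one place the numerical-precision concern shows up.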

Technical Specification

Stack
  • C++17
  • CMake
  • SIMD
  • TCP
Role: Systems & ML
Year: 2026

Highlights

  • Custom linear algebra, activations, and SGD — no ML libraries
  • 4x speedup via 256-bit SIMD vector intrinsics
  • Parameter server–worker architecture with TCP-based IPC
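The parameter server–worker exchange can be sketched as two pieces: a framing step that turns a gradient vector into bytes suitable for a TCP stream, and a server-side step that combines the workers' gradients. Everything below is an assumption for illustration: the wire format (a 4-byte little-endian count followed by raw floats), the synchronous averaging policy, and the names `encode_gradients`, `decode_gradients`, and `average_gradients`. Socket setup and send/receive loops are omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical wire format: 4-byte little-endian element count, then raw floats.
// The float payload is copied in host byte order, so this sketch assumes
// both ends share endianness and float representation.
std::vector<std::uint8_t> encode_gradients(const std::vector<float>& g) {
    const std::uint32_t n = static_cast<std::uint32_t>(g.size());
    std::vector<std::uint8_t> buf(4 + static_cast<std::size_t>(n) * sizeof(float));
    for (int k = 0; k < 4; ++k) {
        buf[k] = static_cast<std::uint8_t>((n >> (8 * k)) & 0xFFu);
    }
    if (n > 0) {
        std::memcpy(buf.data() + 4, g.data(), n * sizeof(float));
    }
    return buf;
}

// Inverse of encode_gradients; rejects truncated or oversized messages.
std::vector<float> decode_gradients(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < 4) {
        throw std::runtime_error("short header");
    }
    std::uint32_t n = 0;
    for (int k = 0; k < 4; ++k) {
        n |= static_cast<std::uint32_t>(buf[k]) << (8 * k);
    }
    if (buf.size() != 4 + static_cast<std::size_t>(n) * sizeof(float)) {
        throw std::runtime_error("length mismatch");
    }
    std::vector<float> g(n);
    if (n > 0) {
        std::memcpy(g.data(), buf.data() + 4, n * sizeof(float));
    }
    return g;
}

// Server side, assuming synchronous SGD: element-wise mean of all workers' gradients.
std::vector<float> average_gradients(const std::vector<std::vector<float>>& per_worker) {
    if (per_worker.empty()) {
        return {};
    }
    std::vector<float> avg(per_worker.front().size(), 0.0f);
    for (const auto& g : per_worker) {
        if (g.size() != avg.size()) {
            throw std::invalid_argument("gradient size mismatch");
        }
        for (std::size_t i = 0; i < avg.size(); ++i) {
            avg[i] += g[i];
        }
    }
    for (float& v : avg) {
        v /= static_cast<float>(per_worker.size());
    }
    return avg;
}
```

A length prefix matters because TCP delivers a byte stream rather than discrete messages: the receiver reads the 4-byte header first, then keeps reading until it has the full payload.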
