March 2026

The Journey from Software Developer to AI Researcher

Personal narrative · Not a resume · Captured in real time

This document is a personal narrative. It captures the actual trajectory of learning, building, and thinking that defines the path from software developer to AI researcher. Written from the inside, while the journey is happening.

The Starting Point

The beginning was a question that wouldn't go away: what actually happens when you type a sentence into an AI? Not the surface-level answer—not “it predicts the next token.” The real answer. The mathematics. The architecture. The reason it works.

Before this question, the background was pure software engineering. Years of building real systems—Go backends, React frontends, TypeScript components, PostgreSQL databases. Production-level work on a national-scale health platform serving over 100 clinics across Somalia. The kind of engineering that teaches you how to ship things that work under pressure.

But using AI tools every day created a growing discomfort: the gap between operating these systems and understanding them. That gap became the fuel for everything that followed.

I don't want to just use the tools. I want to understand what's underneath and push toward the frontier.

The First Deep Nights — Vectors

It started with the most basic question: what is a vector? Not the textbook definition—the real intuition. A list of numbers that encodes meaning. An arrow in high-dimensional space where direction captures what something means and magnitude captures how strongly.

Night after night, the foundation was built concept by concept. No rushing. No skipping. Each idea was wrestled with until it could be explained from memory, not recited from a page.

The key insight: every concept connected back to transformers. The dot product isn't just a math operation—it's how one word asks “how relevant are you to me?” The linear combination isn't just a formula—it's how the model blends information from every word it's paying attention to.

Everything in modern AI runs on vectors. Every word, every position, every attention score, every hidden state in a transformer IS a vector.
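Those two intuitions can be sketched in a few lines of NumPy. The vectors and weights below are made up for illustration (real models use hundreds of dimensions learned from data):

```python
import numpy as np

# Hypothetical 4-dimensional "meaning" vectors (toy values, not learned).
cat = np.array([0.9, 0.1, 0.3, 0.0])
dog = np.array([0.8, 0.2, 0.4, 0.1])
car = np.array([0.0, 0.9, 0.0, 0.8])

def cosine(a, b):
    # Direction similarity: close to 1 means "similar meaning".
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(cat, dog))   # high: similar directions
print(cosine(cat, car))   # low: different directions

# A linear combination blends information from several vectors,
# weighted by how much attention each one receives.
weights = np.array([0.7, 0.2, 0.1])   # sums to 1, like a softmax output
blended = weights[0] * cat + weights[1] * dog + weights[2] * car
```

The dot product inside `cosine` is the "how relevant are you to me?" question; the weighted sum at the end is the blending step attention performs.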

Word2Vec From Scratch

The first real implementation. Not importing a library and calling a function. Writing Word2Vec from scratch in pure Python and NumPy—every line, every gradient, every update step.

  • Complete Skip-gram implementation
  • Two-matrix architecture (W_in and W_out)
  • Forward pass with dot product and sigmoid
  • Negative sampling for efficiency
  • Manual gradient computation and backprop
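The core update from that list can be sketched as follows. This is a minimal toy version, not the actual implementation: vocabulary size, learning rate, and word indices are made up, and a real run would sample negatives from the corpus distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 10, 8                          # toy sizes for illustration
W_in  = rng.normal(0, 0.1, (vocab, dim))    # input (center-word) embeddings
W_out = rng.normal(0, 0.1, (vocab, dim))    # output (context-word) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(center, context, negatives, lr=0.1):
    """One skip-gram update: one positive pair plus sampled negatives."""
    v = W_in[center].copy()
    grad_v = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(v @ W_out[word]) - label   # d(log-loss)/d(score)
        grad_v += g * W_out[word]
        W_out[word] -= lr * g * v              # manual backprop into W_out
    W_in[center] -= lr * grad_v                # manual backprop into W_in

# Toy run: word 1 co-occurs with word 2; words 5 and 7 are negatives.
for _ in range(500):
    train_pair(1, 2, negatives=[5, 7])
```

After the loop, word 1's embedding scores high against word 2's output vector and low against the negatives: the "push similar closer, push different apart" rule in action.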

After training on a small corpus, the model discovered that “king” and “queen” are similar (0.968 cosine similarity) even though nobody told it they were related. It figured that out from context alone. The analogy king - man + woman ≈ queen actually worked.
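The analogy arithmetic is easy to see with hand-picked 2-D vectors. These numbers are invented for illustration; the real model learns such directions in high-dimensional space from context alone:

```python
import numpy as np

# Hypothetical axes: [royalty, masculinity]
king  = np.array([0.9, 0.9])
queen = np.array([0.9, 0.1])
man   = np.array([0.1, 0.9])
woman = np.array([0.1, 0.1])

result = king - man + woman   # subtract "male", add "female"
# result lands on queen: the gender direction cancels, royalty survives
```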

More importantly, it proved that meaning emerges from structure. Random numbers, nudged billions of times by a simple rule—push similar things closer, push different things apart—become meaningful representations. The same principle that powers GPT-4 and Claude. Just at a smaller scale.

The Transformer — Piece by Piece

With the vector foundation solid, the next phase was the transformer itself. Not as a black box. Not as a diagram on a slide. As a step-by-step data flow that could be traced by hand.

The full data flow, understood from memory:

Tokenization turns words into integer IDs. Each ID grabs its row from the embedding table—768 random numbers that, after training, encode meaning. Positional encoding adds order information. Then the critical step: three weight matrices create Query, Key, and Value vectors for each word.

The Query asks “what am I looking for?” The Key says “what do I contain?” The Value carries “what information do I have?” At initialization, all three matrices are random. Random times random equals random. Everything is garbage at this point.

Attention scores come from the dot product Q · K—how much should each word attend to every other word? Scaled by 1/√d_k to prevent softmax saturation. Softmax turns raw scores into probabilities. Then blending: attention weights times V vectors produce a new representation for each word, enriched with context. The final output matrix produces 50,000+ scores—one per vocabulary word.
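The attention portion of that flow can be traced in NumPy. A toy sketch with made-up dimensions (real models use d_model of 768 or more, plus multiple heads and learned embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_model, d_k = 4, 16, 8                 # 4 tokens, toy dimensions

X   = rng.normal(size=(T, d_model))        # embeddings (+ positions) per token
W_q = rng.normal(size=(d_model, d_k))      # at init these three matrices
W_k = rng.normal(size=(d_model, d_k))      # are random, so Q, K, V start
W_v = rng.normal(size=(d_model, d_k))      # out as garbage

Q, K, V = X @ W_q, X @ W_k, X @ W_v        # "looking for / contain / carry"

scores = Q @ K.T / np.sqrt(d_k)            # relevance of every word to every word
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1

out = weights @ V                          # blend Values by attention weight
```

Each row of `out` is one token's new representation, enriched with context from every other token—exactly the step-by-step flow described above.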

This entire flow was described from memory. Not memorized—understood. Each step connects back to the vector math foundations.

The Mathematics — Seven Domains

Understanding the transformer at an intuitive level is one thing. Understanding it with the precision of someone who could have invented it requires serious mathematics. A comprehensive 7-domain roadmap was designed:

  • Linear Algebra — the language transformers speak
  • Probability & Statistics — attention weights
  • Measure Theory — rigorous probability
  • Matrix Calculus — backpropagation
  • Optimization — training convergence
  • Information Theory — cross-entropy loss
  • Numerical Analysis — FlashAttention

Each domain was mapped to exactly where it lives in the transformer. The roadmap includes academic-grade resources: Gilbert Strang, Axler, Boyd, Cover & Thomas, MIT OCW, 3Blue1Brown, Karpathy. A 12-week execution schedule ending with a capstone: reproduce every equation in “Attention Is All You Need” from first principles, without looking anything up.

The Philosophy — AI Season

A core insight that emerged during this journey: the old learning model is broken. Historically, learning required both understanding AND execution—you had to hold the mental model and simultaneously grind through the mechanics. The bottleneck wasn't the knowledge itself. It was the execution tax.

Now execution can be delegated to AI. The bottleneck shifts entirely to the quality of thinking. An “AI season”—30 to 90 days of locked-in, focused work—can produce what used to take years, if the output is defined precisely and the mind stays locked in.

Clear output definition is the new intelligence. Not IQ. Not credentials. Not even prior knowledge. If you can define precisely what done looks like—you can navigate there.

The learning speed formula:

// The Learning Speed Formula

output = depth_of_lockin
       × quality_of_questions
       × speed_of_iteration

// execution is no longer in the formula

What's Being Built

NEURON—a complete architecture specification for a local, browser-based AI research workbench. Not a tutorial app. A research operating system with free-floating, resizable panels across five workspace types: Explorer, Code Lab, Reader, Canvas, and Builder. Full tech stack designed, database schema complete, API routes mapped. Ready for implementation.

The Direction

The direction is clear: AI and research. Not a job title. A trajectory.

  • Transformer mastery — first principles
  • Low-level systems — C/C++, CUDA, GPUs
  • Distributed systems — training at scale
  • Original research — contributor, not student

This document will be updated as the journey continues. It is a living record—captured in real time, from the inside.


Built in the pre-AGI era