What Do Vectors Look Like in LLMs?
Large language models (LLMs) run on text, but they don’t “see” text as letters. They operate on vectors: long lists of numbers that represent meaning, context, and relationships in a form that math can handle. This article explains what those vectors look like, where they appear inside an LLM, and why they matter.
Vectors as Number Lists, Not Words
A vector in an LLM is a fixed-length array of floating-point numbers, often with sizes like 768, 1024, or 4096 dimensions. Each token—not each word—maps to exactly one vector.
A toy example (much smaller than a real model) might look like:
- “cat” → [0.12, -0.44, 0.03, 0.88, …]
- “dog” → [0.10, -0.41, 0.07, 0.81, …]
- “banana” → [-0.30, 0.22, -0.91, 0.05, …]
These vectors are dense, meaning most entries are non-zero. They are also opaque:
- No single dimension reliably means “animal,” “past tense,” or “noun.”
- Meaning is distributed across many dimensions.
A helpful analogy:
A vector is less like a labeled checklist and more like a coordinate in meaning-space.
Why So Many Dimensions?
High dimensionality gives the model room to separate concepts. With only a few dimensions, many meanings would overlap. Thousands of dimensions allow the model to represent subtle differences such as:
- literal vs metaphorical meaning
- tone (formal, sarcastic, emotional)
- syntactic role
- topic and domain
Importantly, these dimensions are not designed by humans. They emerge during training because they make prediction easier.
Token Embeddings: The First Vectors
The first vectors in an LLM are token embeddings.
The model maintains a large table called an embedding matrix:
- Each row corresponds to a token ID
- Each row is a learned vector
When text is tokenized, each token ID is replaced by its embedding, looked up by row index from this table.
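A minimal NumPy sketch of this lookup; the sizes, token IDs, and random values are made up for illustration:

```python
# Toy embedding lookup: a (vocab_size x hidden_size) matrix indexed by token IDs.
# Real models use vocabularies of roughly 50k-100k tokens and hundreds to thousands of dimensions.
import numpy as np

vocab_size, hidden_size = 1_000, 16
embedding_matrix = np.random.default_rng(0).normal(size=(vocab_size, hidden_size))

token_ids = [312, 907, 15]                     # hypothetical IDs for three tokens
token_vectors = embedding_matrix[token_ids]    # row lookup, no computation

print(token_vectors.shape)                     # (3, 16): one vector per token
```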
These vectors are:
- Looked up, not computed dynamically
- Learned during training
- Shared across all contexts initially
At this stage, the vector for “cat” is the same whether it appears in:
- “The cat slept”
- “A cat is an animal”
- “Cat videos are popular”
Context has not entered yet.
Position Information: Encoding Order
Vectors alone don’t encode order. Without position information, a model would treat:
“dog bites man” and “man bites dog”
as the same bag of tokens.
To fix this, LLMs combine each token vector with positional information before the first transformer layer.
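As a sketch, here is one common scheme, sinusoidal position encodings added to the token embeddings; the sizes are toy, and real models may use learned or relative variants instead:

```python
# Sinusoidal position encodings (toy sizes), added to token embeddings
# so each input vector carries both content and position.
import numpy as np

seq_len, hidden_size = 4, 8
token_vectors = np.random.default_rng(0).normal(size=(seq_len, hidden_size))

positions = np.arange(seq_len)[:, None]               # (seq_len, 1)
dims = np.arange(0, hidden_size, 2)[None, :]          # even dimension indices
angles = positions / (10_000 ** (dims / hidden_size))

pos_encoding = np.zeros((seq_len, hidden_size))
pos_encoding[:, 0::2] = np.sin(angles)                # even dimensions: sine
pos_encoding[:, 1::2] = np.cos(angles)                # odd dimensions: cosine

inputs_to_first_layer = token_vectors + pos_encoding  # "what it is" + "where it is"
print(inputs_to_first_layer.shape)                    # (4, 8)
```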
Position encodings can be:
- Learned (a table like embeddings)
- Computed (e.g., sinusoidal patterns)
- Relative (positions encoded via attention rules)
The key idea:
Every token vector now carries both “what it is” and “where it is.”
Contextual Vectors: Meaning Changes with Context
After the first transformer layer, vectors stop representing tokens and start representing tokens-in-context.
For example:
- “bank” in river bank
- “bank” in bank account
These start with the same embedding, but after attention:
- They become different vectors
- Their neighborhoods in vector space diverge
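As a quick illustration, assuming the Hugging Face transformers library, PyTorch, and the bert-base-uncased checkpoint are available, you can compare the two contextual “bank” vectors directly:

```python
# Compare contextual vectors for "bank" in two different sentences.
# Assumes the Hugging Face transformers library and PyTorch are installed.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def vector_for(sentence, word):
    # Return the last-layer hidden vector for the first occurrence of `word`.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]        # (seq_len, hidden_size)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

v_river = vector_for("He sat by the river bank.", "bank")
v_money = vector_for("She opened a bank account.", "bank")

cos = torch.nn.functional.cosine_similarity(v_river, v_money, dim=0)
print(f"cosine similarity: {cos.item():.3f}")  # typically well below 1.0: context pulled them apart
```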
Each layer refines this further. By deeper layers, vectors encode:
- word sense
- syntactic role
- semantic dependencies
- discourse-level information
A useful mental model:
- Early layers → lexical meaning
- Middle layers → syntax and relations
- Later layers → task-relevant meaning
Attention: How Vectors Interact
Inside each transformer block, every token vector is linearly projected into three new vectors:
- Query (Q) — what this token wants to find
- Key (K) — what this token offers
- Value (V) — the information it contributes
Attention works like this:
- Compare Q vectors to K vectors (via dot products)
- Convert similarities into weights
- Take a weighted sum of V vectors
- Produce a new vector for each token
Mathematically, vectors flow through shapes like:
- (sequence_length × hidden_size)
- → (sequence_length × head_dim)
- → mixed and recombined
- → back to (sequence_length × hidden_size)
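A minimal single-head attention sketch in NumPy (toy sizes, random weights) makes those steps concrete:

```python
# Single-head scaled dot-product attention with random weights (illustration only).
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden_size, head_dim = 5, 16, 8

X = rng.normal(size=(seq_len, hidden_size))    # token vectors in context

# Learned projection matrices (random here for illustration)
W_q = rng.normal(size=(hidden_size, head_dim))
W_k = rng.normal(size=(hidden_size, head_dim))
W_v = rng.normal(size=(hidden_size, head_dim))

Q, K, V = X @ W_q, X @ W_k, X @ W_v            # each (seq_len, head_dim)

scores = Q @ K.T / np.sqrt(head_dim)           # compare queries to keys
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True) # softmax: similarities -> weights

output = weights @ V                           # weighted sum of values
print(output.shape)                            # (5, 8): a new vector per token
```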
Conceptually:
Tokens “talk” by asking questions and listening to answers, all via vector math.
Geometry: Meaning as Distance and Direction
Vectors live in a high-dimensional geometric space. This makes certain geometric ideas fundamental:
Similarity
Vectors pointing in similar directions tend to represent similar meanings. Cosine similarity measures this directional closeness.
This is why:
- “cat” is closer to “dog” than to “banana”
- paraphrases cluster together
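Using the toy vectors from earlier (values invented for this article), cosine similarity makes the contrast visible:

```python
# Cosine similarity on the toy 4-dimensional vectors from the start of the article.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat    = np.array([0.12, -0.44, 0.03, 0.88])
dog    = np.array([0.10, -0.41, 0.07, 0.81])
banana = np.array([-0.30, 0.22, -0.91, 0.05])

print(cosine(cat, dog))     # close to 1: similar directions
print(cosine(cat, banana))  # much lower: different directions
```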
Structure, Not Labels
Some relationships appear as consistent directions:
- tense changes
- pluralization
- semantic shifts
These patterns are:
- statistical, not perfect
- emergent, not hard-coded
Still, they are strong enough to support:
- semantic search
- clustering
- retrieval-augmented generation
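Semantic search, for example, is just this geometry applied at scale. In the sketch below, the document and query vectors are hypothetical placeholders for embeddings produced by a real model:

```python
# Rank documents by cosine similarity to a query vector (toy 3-dimensional embeddings).
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

doc_vectors = {
    "doc_about_cats":    np.array([0.90, 0.10, 0.00]),
    "doc_about_banking": np.array([0.00, 0.20, 0.95]),
}
query_vector = np.array([0.85, 0.05, 0.10])    # imagine: the embedding of "pet care tips"

ranked = sorted(doc_vectors.items(),
                key=lambda kv: cosine(query_vector, kv[1]),
                reverse=True)
print([name for name, _ in ranked])            # most similar document first
```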
Intermediate Vectors Are the Model’s “Thought State”
At any moment, the model’s internal vectors represent:
- what has been read
- what matters right now
- what is likely to come next
They are not symbolic rules or explicit logic trees. They are continuous states optimized for prediction.
This is why probing vectors can reveal:
- topic awareness
- entity tracking
- syntactic structure
…but not clean, human-readable rules.
Output Vectors: Turning States into Tokens
To generate text, the final vector at the current position is mapped to vocabulary scores:
- Final hidden vector
- Linear projection → logits (one per vocabulary token)
- Softmax → probabilities
- Sampling → next token
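A toy NumPy sketch of that pipeline (random weights, small vocabulary, purely illustrative):

```python
# Final hidden vector -> logits -> softmax -> sampled next-token ID.
import numpy as np

rng = np.random.default_rng(0)
hidden_size, vocab_size = 16, 100

final_hidden = rng.normal(size=hidden_size)         # vector at the current position
W_out = rng.normal(size=(hidden_size, vocab_size))  # output ("unembedding") projection

logits = final_hidden @ W_out                       # one score per vocabulary token
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                # softmax -> probabilities

next_token_id = rng.choice(vocab_size, p=probs)     # sample the next token
print(next_token_id)
```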
So the model never “chooses a word” directly. It transforms a vector into probabilities over tokens.
What Vectors “Look Like” in Practice
In real systems, vectors are:
- Thousands of floating-point numbers
- Stored as tensors in GPU memory
- Processed in parallel as large matrices
- Mostly uninterpretable individually
- Powerful when analyzed statistically
LLMs are, at their core, vector transformation machines. They repeatedly reshape numeric spaces until the next token becomes predictable.
The vectors themselves look like numbers, but their behavior looks like language.