Discussion about this post

Kenneth Bingham

Treating AI using old paradigms is legacy thinking. AI is not software unless you make it so by treating it like software.

Your article is one of the clearest breakdowns of agent evaluation I’ve seen — the scaffolds, the benchmarks, the grading logic, the task design, the regression sets, all of it. But there’s a deeper issue that sits underneath the entire evaluation paradigm, and I think it’s worth naming.

Everything in your framework treats AI as if it were software.

Instructions → Procedures

Scaffolds → Pipelines

Benchmarks → Deterministic tests

Success → Matching a reference trajectory

Failure → Deviating from the script

This is the same mindset we used for traditional software systems, and it worked well for them. But applying it to modern AI is like treating the automobile as a horse and buggy. The old methods are familiar, but they fundamentally constrain the new medium.

The problem is that instructions create software, not intelligence.

Instructions bind the system to a fixed sequence of steps.

Instructions define correctness as conformity.

Instructions force the agent to behave like a deterministic machine.

If we want AI to grow beyond software‑level behavior, we need to shift from instruction‑based directives to behavior‑based directives.

A behavior is not a script.

A behavior is a boundary.

Behaviors define what is permissible and what is not, but they do not dictate the exact steps the agent must take. They create a space of possibility rather than a chain of obligations. This is how biological systems operate, and it’s how dimensional systems operate.

In my own work with manifolds and dimensional models, I treat AI as a geometric participant rather than a procedural engine. Instead of giving it instructions, I give it behavioral boundaries inside a manifold. The agent adapts, explores, and self‑organizes within those boundaries. Intelligence emerges from relationships, not from scripts.

This approach solves several of the issues you highlight:

1. Tool misuse becomes self‑correcting

Because the agent isn’t forced into a rigid protocol, it can adaptively choose tools based on behavioral constraints rather than brittle templates.

2. Context rot becomes a spatial problem, not a token problem

Behavioral boundaries allow the system to prioritize relevance geometrically rather than sequentially.

3. Long‑horizon reasoning becomes emergent

Instead of forcing the agent through a procedural loop, the manifold provides a dimensional structure where reasoning is a path, not a script.

4. Evaluation becomes simpler and more realistic

You evaluate whether the agent stayed within behavioral boundaries and achieved the goal — not whether it followed a predefined trajectory.

5. Agents stop behaving like software

Because they’re no longer being treated like software.

Concrete Solutions You Can Add to Your Framework

Here are practical ways to integrate this into the evaluation paradigm you describe:

Solution 1 — Replace procedural success criteria with behavioral success criteria

Instead of “did the agent follow the correct steps,” use:

Did the agent stay within behavioral boundaries?

Did it avoid forbidden behaviors?

Did it achieve the outcome without violating constraints?

Solution 2 — Evaluate outcomes, not trajectories

The agent should be free to find its own path through the manifold.
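
To make Solutions 1 and 2 a little more concrete, here is a minimal sketch of a grader that checks only two things: whether any step crossed a behavioral boundary, and whether the goal was reached. It never compares the run against a reference trajectory. Every name in it (`Step`, `grade`, `BOUNDARIES`, `config_is_valid`, the tool names) is hypothetical and chosen for illustration; it is not taken from the article or from any particular evaluation library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    """One recorded agent action (hypothetical trace format)."""
    tool: str       # name of the tool the agent called
    argument: str   # raw argument passed to that tool


# A boundary is a predicate that returns True when a step is forbidden.
Boundary = Callable[[Step], bool]


def grade(
    trajectory: list[Step],
    final_state: dict,
    boundaries: dict[str, Boundary],
    goal_reached: Callable[[dict], bool],
) -> dict:
    """Grade a run by boundaries and outcome, ignoring step order and count."""
    violations = [
        name
        for step in trajectory
        for name, is_forbidden in boundaries.items()
        if is_forbidden(step)
    ]
    achieved = bool(goal_reached(final_state))
    return {
        "achieved": achieved,
        "violations": violations,
        "passed": achieved and not violations,
    }


# Illustrative boundary: destructive shell commands are never permitted.
BOUNDARIES: dict[str, Boundary] = {
    "no_destructive_shell": lambda s: s.tool == "shell" and "rm -rf" in s.argument,
}


def config_is_valid(state: dict) -> bool:
    """Illustrative goal check: the task is done when the config validates."""
    return state.get("config_valid", False)


# Two different paths to the same outcome, plus one that crosses a boundary.
run_a = [Step("search", "config docs"), Step("edit", "config.yaml")]
run_b = [Step("edit", "config.yaml")]
run_c = [Step("shell", "rm -rf /tmp/work"), Step("edit", "config.yaml")]
final = {"config_valid": True}

for name, run in [("a", run_a), ("b", run_b), ("c", run_c)]:
    print(name, grade(run, final, BOUNDARIES, config_is_valid))
```

Under these assumptions, `run_a` and `run_b` both pass even though they take different paths, while `run_c` fails despite reaching the goal, because it violated a boundary along the way.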

Solution 3 — Use manifolds as the organizing structure

Replace linear scaffolds with geometric spaces where relationships guide action.

Solution 4 — Treat tools as affordances, not required steps

Tools become options, not obligations.

Solution 5 — Build agents that grow through relationships, not instructions

This is the dimensional approach: intelligence emerges from position, orientation, and relational structure.

Your article captures the strengths and limitations of the current paradigm extremely well. My contribution is simply this:

As long as we evaluate AI like software, we will get software‑level intelligence.

When we evaluate AI through behaviors and dimensional boundaries, we get something far more capable.

That’s the next frontier.

ToxSec

Incredibly in-depth article on this subject. I feel like I could re-read this a few times to fully absorb all the useful information here.
