How should software development respond to how LLMs operate? We need truly explainable Artificial Intelligence with testable components.
While the title “Does current AI represent a dead end?” is clearly meant to provoke debate, this academic article makes one case that is particularly pertinent to software developers:
“Current AI systems have no internal structure that relates meaningfully to their functionality. They cannot be developed, or reused, as components. There can be no separation of concerns or piecewise development.”
This post is only about using Large Language Models (LLMs) as part of your product solution, not about using AI as a tool during development (e.g., AI coding tools like Cursor and Zed AI). Using LLMs to perform specific Software Development Lifecycle Activities (SDLAs) has its own problems, but the way we make something is usually distinct from what we sell to customers. So, in the graph below, we are looking at the top two parts:
Image from Carnegie Mellon University Software Engineering Institute
The problem at the moment with LLMs is that they are sold much like cars: you pay for the whole thing, and there is no expectation of treating it as a set of composable pieces. With cars, the lack of decomposability is not an issue because driving is a heavily controlled activity. Even if you could compose a car from components like Lego, it wouldn’t be tolerated on a public road.
This is probably exactly what the large tech firms want — they want to sell you a large product or service, not a set of composable pieces that are easier for others to create. This way, only a small set of large manufacturers get to stay in the game. Keeping LLMs mysterious keeps their value high.
But this goes against the general concept of computing, where tasks can be broken down. A working software component, whether built internally or not, is itself composed of code that can be unit-tested, and it must work reliably together with other components.
Even if a product uses an Oracle database, we all understand that persistence exists at a notional design level. At some point, the technical decision is made as to which type of storage to use. By then, testing regimes are probably already in place. Meanwhile, innovation in databases continues, but it would never occur to a customer that the storage provider is somehow controlling the software.
In academic circles, the lack of decomposability is usually paired with the lack of explainability. Several other, related business concerns also dog LLMs within delivered software.
At the moment, we can’t separate the operation of an LLM from its training data. We know that an LLM is trained, yet that process is usually not open, and the results are still expected to be taken as-is. Expecting a component to be “marinated” is fine for a stew but not for component development.
Security and privacy end up being concerns because there is no steel thread or provable way to make an LLM distinguish which parts of itself should stay hidden. We cannot intercept a neural network in any meaningful way from the outside and tell it that certain pieces of information are private and must not be revealed.
Legal ownership is still problematic. We can prove that the result of an operation performed by cold calculation is repeatable: it would have produced the same answer at any time, given the same input. Because LLMs carry training baggage that they can never relinquish, we simply cannot prove that they haven’t stolen prior art. And they probably have.
Firms that endeavor to control their carbon footprint are moving in the opposite direction from LLM creators, who need a mind-boggling amount of computing power to produce ever-diminishing improvements.
Now, just as this article isn’t about using LLMs to help development, it also isn’t about giving end users raw access to an LLM utility while cynically shrugging and saying “here you go”. The text editor I’m using has some form of AI added on, and there are no guarantees about what it does. We all know these are generally tick-box exercises: features that must somehow appear but are not intrinsic to the offering.
For the reasons I’ve set out, I don’t think there is much future for LLMs to be introduced as services within products, except as the product itself. But even this is a serious trap for any business. When Eric Yuan, founder of Zoom, presented the idea of AI clones attending meetings in Zoom, he was rightly ridiculed for expecting this capability to emerge somehow “down the stack.” By outsourcing major innovation to an LLM vendor, he effectively handed control of his roadmap to another company.
That is all very well, but how should software development respond to this today? We all understand that a component should have an agreed job, or role; that it should be replaceable; and that it should be testable with its peer components. We also understand that if it is external, it should have been built to the same computing standards, so that we could rebuild it using them.
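To make those expectations concrete, here is a minimal sketch, assuming a hypothetical `Summarizer` role invented purely for illustration. It shows an agreed role expressed as an interface, a deterministic implementation that fills it, a peer component (`ReportBuilder`, also hypothetical) that depends only on the role, and a contract test that any replacement would have to pass before being swapped in. The repeatability check is the kind of guarantee that, on the argument above, an LLM-backed implementation currently makes hard to give.

```python
from typing import Protocol


class Summarizer(Protocol):
    """Hypothetical agreed role: shorten text to at most max_words words."""

    def summarize(self, text: str, max_words: int) -> str: ...


class FirstWordsSummarizer:
    """Deterministic implementation of the role: keeps the first max_words words."""

    def summarize(self, text: str, max_words: int) -> str:
        return " ".join(text.split()[:max_words])


class ReportBuilder:
    """Peer component that depends only on the Summarizer role, not on a vendor."""

    def __init__(self, summarizer: Summarizer) -> None:
        self.summarizer = summarizer

    def build(self, title: str, body: str) -> str:
        return f"{title}: {self.summarizer.summarize(body, max_words=5)}"


def check_summarizer_contract(impl: Summarizer) -> None:
    """Contract test that any replacement implementation must pass."""
    text = "one two three four five six seven"
    result = impl.summarize(text, max_words=3)
    # The agreed job: never exceed the word budget.
    assert len(result.split()) <= 3
    # Repeatability: the same input must give the same output.
    assert result == impl.summarize(text, max_words=3)


if __name__ == "__main__":
    impl = FirstWordsSummarizer()
    check_summarizer_contract(impl)
    print(ReportBuilder(impl).build("Status", "The build passed and all tests are green today"))
    # prints "Status: The build passed and all"
```

Because `ReportBuilder` only knows about the role, the summarizer can be replaced without touching its peer, and the contract test documents exactly what "working" means for whatever fills that role.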
We shouldn’t try to change the rules of the game for short-term attention. The point is to design a process that delivers the functionality we need for our venture and then develop a platform that will allow the developers to build it sustainably.
As developers, we should want to keep the door open to truly explainable Artificial Intelligence with testable components. Where training is necessary, it must be monitored, reportable, repeatable, explicable and reversible. If we discover that an LLM believes something is true that isn’t, it must be immediately possible to fix this in a set of defined steps. If this description doesn’t make sense, then neither does computing with LLMs at the moment. But I see no reason, in theory, why this cannot change in the future.
My fear is that the difference may be like the one between the scientific method and faith in a holy relic. We know we can conduct a whole set of unworkable experiments (if I cut this relic up, are all the pieces equally hallowed?), but we also know we should never expect the two areas to reconcile.
This article was first published by The New Stack.