Dov Katz.
The CTO of Formic Robotics has spent a career working on the parts of autonomy that nobody likes to talk about. We asked him about the gap between what gets called autonomous and what actually is.
When most people picture a robot, they imagine something that decides things on its own. However, that is usually not what they are actually looking at. Much of what gets labeled as robotics in 2026, from the warehouse floor to the battlefield, is either pre-scripted automation running through a sequence a human defined in advance, or teleoperation with a person somewhere in the loop making real-time decisions. The machine is impressive, but the autonomy is borrowed.
Closing that gap is what Dov Katz has spent much of his career on. As CTO of Formic Robotics, where his team is building robotic systems for defense applications, and previously as an R&D leader at Meta, a postdoctoral fellow at Carnegie Mellon, and a PhD-trained computer vision researcher, he has watched the field develop an uneasy relationship with the word "autonomous." We sat down with him to talk about what real autonomy would actually require - and why he thinks the industry's current vocabulary is doing more harm than good.
Let's start with the framing. You have argued that most of what gets called autonomous is not really autonomous. What is being missed?
The word gets applied generously to systems that would fail the moment the script breaks or the operator steps away. That is the thing to notice. A lot of impressive demos work because someone has done a great deal of upfront work to make the environment cooperate with the robot. The cleverness is real, but it lives in the setup, not in the autonomy. True autonomy - the kind that can cope with a world nobody has pre-labeled - remains mostly unsolved. And closing that gap is harder than the industry often admits.
What does true autonomy actually require? Break it down.
Three capabilities have to work together. The robot has to perceive what is actually in front of it. It has to decide what to do about it. And it has to physically execute that decision in a way that respects the laws of physics and the constraints of its body. Each of these is difficult on its own. Unifying them is harder.
Perception is where my own technical roots run deepest. My doctoral work was in computer vision, and I have spent years on the question of how machines build useful internal representations of the world from sensor data. The field has made real progress. Object detection and scene understanding now work in ways that would have seemed remarkable a decade ago. But the gap between identifying that there is a door in an image and understanding that this particular door is different from the one in the training data, opens outward rather than inward, and is currently blocked by an obstacle - that is where most autonomy efforts break down.
Decision-making is where the assumptions hide. A robot operating in a factory can rely on the environment being engineered to cooperate with it. A robot operating in the real world has to handle the fact that the world was engineered for humans, by humans, with no consideration for what a camera or a lidar can see. Decisions have to be made under genuine uncertainty, with incomplete information, and often in time windows where a timely answer matters more than the optimal one.
And then there is physical action, which is the part people forget. A decision that cannot be executed safely on a real robot in a real environment is not really a decision. The gap between what a planning algorithm proposes and what a physical system can actually do - given the weight of its parts, the friction in its joints, the latency of its controllers, the compliance of its grippers - is where much of the clever autonomy work dies.
You have oriented Formic around unstructured environments specifically. Why are those the real test?
Because that is where the autonomy claim is actually tested. Defense robotics almost never operates in a structured setting. A warehouse floor is a laboratory. A combat zone, a disaster site, a contested maritime environment - those are the opposite of a laboratory. Nothing has been measured, labeled, cleaned, or arranged for the convenience of a machine. Lighting changes. Weather changes. Adversaries deliberately introduce conditions the system has never seen.
I am skeptical of demonstrations that work only because the environment is quietly cooperating. The interesting question is not whether a robot can do the task once, under ideal conditions, with a trained operator standing just off-camera. The question is whether it can reliably perform the task in a place that was never designed for it, with no one there to intervene.
That distinction is uncomfortable for parts of the industry, though, isn't it?
It is, and I understand why. Much of commercial robotics revenue rests on the cooperative-environment case. There is nothing wrong with that work - factory automation, last-mile logistics, structured inspection all have real economic value. My argument is just that calling those systems autonomous creates confusion about where the field actually is. The hard part is not doing a known task in a known place. The hard part is everything else.
So what would it actually take to close the gap?
The honest answer runs in several directions at once, and that is itself telling. It is not one breakthrough. It is a set of concurrent advances that have to land together.
Perception systems need to generalize better, handling conditions they were never explicitly trained on. That probably requires more work on how robots learn from their own experience, rather than relying entirely on pre-training. Decision-making needs to integrate uncertainty as a first-class element rather than as an afterthought, and to handle cases where the system has to act without fully understanding its environment. And the bridge between planning and physical execution needs to account for the messy reality of bodies operating in a world with friction, latency, and consequences.
Underneath it all is the question of how these systems learn. Simulation has become the default answer for training robotic policies. It is faster, cheaper, and safer than real hardware. But simulation is also where the cooperative-environment problem tends to reappear. A simulator that gets the physics wrong, or that cannot reproduce the chaotic sensory conditions of the real world, teaches a robot the wrong lesson. The policy looks brilliant in simulation and brittle in deployment. Closing that gap is something I have been actively working on, including through recent NVIDIA training in physics simulation. The fact that a CTO is still doing technical training at that level is part of the point - the field is moving too fast to lead from a distance.
You have framed this as almost as much an ethical issue as a technical one. Why?
Because the stakes of overclaiming are not theoretical, especially in defense. A system advertised as autonomous that is actually remote-controlled fails in ways that matter. None of what I am saying is an argument against the actual progress in robotics - the industry has earned most of its wins. But the gap between those wins and true autonomy remains wide, and pretending otherwise slows the field. Closing the gap, one capability at a time, is the only path to robots that can actually do what everyone keeps saying they can already do.