While concern has been expressed about AI-powered robots replacing human workers, a more likely scenario is humans and robots working in teams.
What is a cobot?
Cobots are collaborative robots that, rather than just sharing the same physical space as a human worker (for example on an assembly line), team up with their human co-worker to complete a task, such as handling tools or parts.
EU researchers (Doncieux et al.) argue that:
Robots therefore create opportunities for collaboration and empowerment that are more diverse than what a computer-only AI system can offer. A robot can speak or show pictures through an embedded screen, but it can also make gestures or physically interact with humans, opening many possible interactions for a wide variety of applications.
Implicitly acknowledging that robots are more ‘programmable’ than their human co-workers, the EU researchers identify the following key features which will be needed for robotic co-workers to successfully team with humans:
Zero-shot adaptability: the real world is complex and unpredictable, and current AI systems can have trouble reacting to situations not found in their training data: consider, for example, how a walking robot should navigate a banana skin in its path. Cobots will need the ability to act quickly, often reflexively, in novel situations and get it right the first time (zero shot) to avoid injuring their human co-workers.
Ability to learn ‘on the job’: even the huge data sets currently used to train LLMs could never capture the complexity of the real world. Alternative training techniques are required for cobots to continuously learn ‘in the wild’. The researchers give the following example:
An alternative is to let the robot experience such relations through interactions with the environment and the observation of their consequences. A chair can be characterised by the sitting ability, so if the system can experience what sitting means, it can guess whether an object is a chair or not without the need to have a dataset of labelled images containing similar chairs.
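The researchers' chair example can be sketched in a few lines of code. This is a toy illustration of the affordance idea, not an implementation from the paper: the robot classifies an object by "trying" the sitting interaction and observing the outcome, rather than by matching it against labelled chair images. The object properties and thresholds are invented for illustration.

```python
def can_sit_on(obj):
    """Simulate the robot testing the 'sitting' affordance on an object."""
    # Hypothetical outcome checks: the surface held the robot's weight,
    # stayed roughly level, and was at a sittable height.
    return (
        obj["supports_weight_kg"] >= 60
        and obj["surface_tilt_deg"] < 10
        and 0.3 <= obj["seat_height_m"] <= 0.7
    )

def classify(obj):
    # The label follows from the experienced interaction,
    # not from a dataset of labelled images.
    return "chair-like" if can_sit_on(obj) else "not chair-like"

stool = {"supports_weight_kg": 120, "surface_tilt_deg": 2, "seat_height_m": 0.45}
lamp = {"supports_weight_kg": 5, "surface_tilt_deg": 0, "seat_height_m": 1.5}
```

The point of the sketch is that the decision rule is grounded in what the robot experienced, so a never-before-seen stool can still be recognised as "chair-like".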
Connecting verbal instructions and spatial awareness: Natural Language Processing capabilities of AI enable a simple, effective conversational interface between non-expert humans and cobots. To execute on those verbal instructions from a human, such as “go through that door”, the cobot has to connect words to its own sensorimotor flow to locate the door in its vicinity.
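A minimal sketch of that grounding step, under invented assumptions about the robot's perception output: the robot matches a noun in the spoken command against labelled detections from its own sensor stream, then picks the nearest match as the navigation target. The detection format and labels are hypothetical.

```python
import math

def ground_instruction(command, detections, robot_pos=(0.0, 0.0)):
    """Map a verbal command to the detected object it most likely refers to."""
    # Crude noun extraction: any command word that matches a detection label.
    words = {w.strip(".,!?").lower() for w in command.split()}
    candidates = [d for d in detections if d["label"] in words]
    if not candidates:
        return None  # the command could not be grounded in the sensor data
    # Resolve "that door" to the nearest matching detection.
    return min(candidates, key=lambda d: math.dist(robot_pos, d["position"]))

detections = [
    {"label": "door", "position": (4.0, 1.0)},
    {"label": "door", "position": (9.0, 5.0)},
    {"label": "table", "position": (2.0, 2.0)},
]
target = ground_instruction("Go through that door", detections)
```

Real systems ground language with far richer context (gesture, gaze, dialogue history), but the core connection between words and the sensorimotor flow is the same.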
Managing physical interaction with humans: beyond the obvious need for cobots not to bump into human co-workers when moving around in a common space, collaboration between humans and robots may need to involve close physical contact. An example is exoskeletons used to support people requiring mobility assistance, which involves the robot constantly measuring and adjusting torque and other forces to maximise the support and avoid injuring the person wearing the device. Collaboration may also require humans and cobots to be able to exchange physical signals. For example, when a cobot and a human are moving a table together, force feedback enables the load to be distributed correctly between them.
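The table-carrying example can be sketched as a simple control loop, with gains and loads that are illustrative rather than drawn from the paper: the cobot measures the force it is currently bearing and nudges its lift effort toward an even split of the load with its human partner.

```python
def adjust_lift(total_load_n, robot_force_n, gain=0.5):
    """One control step: move the robot's share toward 50% of the load."""
    target = total_load_n / 2          # even split between human and robot
    error = target - robot_force_n     # positive => robot should lift more
    return robot_force_n + gain * error

# Simulate several control steps: the robot starts bearing too little,
# leaving the human with most of a 200 N table.
force = 10.0                           # newtons the robot currently provides
for _ in range(10):
    force = adjust_lift(total_load_n=200.0, robot_force_n=force)
```

After a few iterations the robot's share converges on half the load, which is the force-feedback behaviour the researchers describe.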
Understanding human behaviour and intentions: the cobot will need to anticipate what the human co-worker is about to do next and plan its part in the work sequence. The researchers explain:
Take the simple example of a human handing an object to the robot. The common goal is that, in the final state, the robot is holding the object, whereas in the initial state the human is holding it. The goal must be shared right from the beginning of the interaction, for example through an explicit order given by the human. Alternatively, the robot might be able to determine the common goal by observing the human’s behaviour, which requires the robot to have the ability to deduce human intentions from their actions, posture, gestures.
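One common way to frame "deducing intentions from actions, posture, gestures" is as a belief update over candidate goals; the sketch below applies that framing to the handover example. The cue likelihoods are invented for illustration, and the quoted paper does not prescribe this particular method.

```python
def update_belief(belief, likelihoods):
    """One Bayesian update: P(goal | cue) is proportional to P(cue | goal) * P(goal)."""
    posterior = {g: belief[g] * likelihoods.get(g, 1e-6) for g in belief}
    total = sum(posterior.values())
    return {g: p / total for g, p in posterior.items()}

# Two candidate intentions for the human holding an object.
belief = {"handover": 0.5, "keep_object": 0.5}

# Observed cues: arm extended toward the robot, then gaze at the robot's
# gripper. Each dict gives the (invented) likelihood of that cue per goal.
for cue_likelihood in (
    {"handover": 0.8, "keep_object": 0.2},   # arm extended
    {"handover": 0.7, "keep_object": 0.3},   # gaze at gripper
):
    belief = update_belief(belief, cue_likelihood)
```

As the cues accumulate, the belief concentrates on the handover goal, letting the robot commit to reaching out without an explicit verbal order.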
Ability for humans to understand a robot’s behaviour and intentions: collaboration is not a one-way street in which only cobots understand what human co-workers are doing; the humans also need to know the what, when, how and why of what the cobot is doing. AI is often a black box even to experts, let alone to the humans who have to work on the factory floor next to robots. The cobot could verbalise its next steps and reasoning to its human team members, but as the researchers note, a cobot continuously chatting away in a work environment is likely to distract human co-workers. The researchers suggest that humans and cobots could develop a common non-verbal language of gestures or, if cobots have screens or flexible ‘skin’, facial expressions.
Socially aware cobots
A paper by US researchers (Chakraborti et al.) argues that the challenge of robots and humans working together is primarily cognitive, rather than physical. They say that human-human teams are effective because:
…every team member maintains a cognitive model of the other teammates they interact with. These models not only capture their physical states, but also mental states such as the teammate intentions and preferences, which can significantly influence how an agent interacts with the other agents in the team.
Replicating that model in robot-human teams requires building a fundamentally different robot architecture. Early robots were behaviour-based, such as those used to weld cars on an assembly line. Because they did not maintain a model of the outside world, they could not reason about their environment and were therefore purely reactive. More advanced robots are goal-based, with a limited world model that gives them some adaptability in plotting different pathways to their fixed goal. However, this world model is fixed and finite, which means it can handle the pre-programmed behaviour of other robots working in a team but not the complex, unpredictable real world, including human co-workers.
The US researchers say that successful human-robot collaboration requires robots built to a deeper ‘social agent’ model:
A deeper level of modelling is not only about the other agents, but also the other agents’ modelling of the agent itself. This includes, for example, the others’ expectation and trust of the agent itself. Such modelling allows the robot, for example, to infer the human expectation of its own behaviour and in turn choose behaviours that are consistent with this expectation. Expectation and trust, in particular, represent the social aspects of agent interactions since they are particularly relevant when agents form groups or teams together.
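The "expectation-consistent behaviour" the researchers describe can be sketched as a scoring rule: the robot weighs each candidate action not only by its own cost but also by how far it deviates from what it believes the human expects of it. All the actions, costs and penalties below are invented for illustration; the paper proposes the modelling principle, not this code.

```python
def choose_action(actions, expected_action, deviation_penalty=5.0):
    """Pick the action minimising own cost plus an expectation-mismatch penalty."""
    def social_cost(a):
        # Surprising the human carries a cost, even if the action is cheap.
        mismatch = 0.0 if a["name"] == expected_action else deviation_penalty
        return a["cost"] + mismatch
    return min(actions, key=social_cost)

actions = [
    {"name": "shortcut_through_work_area", "cost": 2.0},  # cheap but surprising
    {"name": "take_marked_walkway", "cost": 4.0},         # what the human expects
]
# The robot's model of the human's model of itself: the human expects the
# robot to stay on the marked walkway.
chosen = choose_action(actions, expected_action="take_marked_walkway")
```

Here the robot passes up the cheaper shortcut because violating the human's expectation would erode the trust that makes the team work.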
Example of human-robot collaboration
A team of humans and robots could allow robots to undertake the more dangerous and difficult tasks in search and rescue missions. For example, a first responder team consisting of a robot (R1) and a human (H1):
Based on the floor plan of the building in its search area, R1 realises that the team needs to use an entrance to a hallway to start the exploration. R1 notices that a heavy object blocks the entrance to the hallway. Based on its capability model of H1 (that is, what H1 can and cannot lift) and H1’s goal, R1 decides to interrupt its current activity and move the block out of the way. H1 begins to head off to search an area of the building and points R1 to separately search another area. However, R1, which has access to the building structure information, proposes a different plan to split the search in a way that minimises human risk. In their separate searches, H1 discovers a victim and informs R1. R1 understands that H1 needs to get a medical kit to be able to conduct triage on this victim as soon as possible but knows that H1 does not know where a medical kit is located. Since R1 has a medical kit already, but cannot deliver it due to other commitments, it places its medical kit along the hallway that it expects H1 to go through and informs H1 of the presence of the kit.
The example shows how the mental modelling of the human co-worker by a cognitive robotic teammate is critical to the fluent operation of the team:
The robot must anticipate the human co-worker’s intentions and needs when confronted with tasks the human is performing independently of the robot: R1 infers that H1 will need to find a way to clear the object.
The robot needs to recognise the team context, including that what is optimal for its own task may be suboptimal for the team if the robot is too narrowly focused on its own activities: R1 breaks off from its own task to help move the object blocking the team’s way.
The robot needs to take the action which best helps the humans: R1, knowing the strength limits of H1, decides that only it can lift the heavy object.
How the robot should behave depends on how much and what type of help the human requires, which in turn depends on observations of the human teammates. This will require the robot to proactively make complex sensing plans that interact closely with its modelling and planning functions. Based on its video input of the direction in which H1 is stepping, R1 anticipates that she is planning to search an area which R1 knows is at risk of collapse, so R1 needs to propose a different plan to H1.
The human and robot co-workers must be on the same page in sharing team tasks, but if the human co-worker’s plan is too costly (or even unsafe or infeasible) in the robot’s model, the robot must be able to provide and explain an alternative plan to the human co-worker. R1 and H1 still proceed to search separate areas as per H1’s original intention, but in accordance with R1’s safer allocation. In explaining its alternative plan, the robot has to be cognisant of the limits of human cognitive load: sharing only necessary information is more practical between teammates working on different parts of the team task. “Danger, Will Robinson” will be enough in some, but not all, cases.
The robot’s planning needs to take into consideration possible human participation in task completion: in the example, R1 plans to meet H1 partway for delivery of the medical kit.
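The capability-model reasoning running through the points above can be sketched as a simple allocation check: R1 consults its model of what each teammate can lift before deciding who should clear the blocked entrance. The weights, limits and preference rule are invented for illustration.

```python
# Hypothetical capability models R1 maintains for the team.
CAPABILITIES = {
    "H1": {"max_lift_kg": 40},
    "R1": {"max_lift_kg": 150},
}

def allocate_lift(object_kg, team=("H1", "R1")):
    """Return teammates able to lift the object, robot-first so the robot
    takes on heavy or dangerous work wherever it can."""
    able = [m for m in team if CAPABILITIES[m]["max_lift_kg"] >= object_kg]
    # Sort humans last: True (human) sorts after False (robot).
    return sorted(able, key=lambda m: m.startswith("H"))

heavy_block = 90  # kg: beyond H1's modelled limit, within R1's
```

For the 90 kg block, only R1 qualifies, which is exactly the inference in the scenario: R1 interrupts its own task because its capability model tells it H1 cannot move the object alone.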
Is the future with robots here now?
The US researchers conclude:
Acquiring of [mental models of how other humans will think and behave], taken for granted among human teammates through centuries of evolution, is perhaps the hardest challenge to be overcome to realise truly cognitive teaming. The difficulty of this problem is exacerbated by the fact that much...of these models cannot be learned from observations directly but only from continued interactions with the human.
Meeting that challenge might not be as distant as it sounds. Meta recently released V-JEPA 2, billed as “the first world model trained on video that enables state-of-the-art understanding and prediction, as well as zero-shot planning and robot control in new environments”.

Peter Waters
Consultant