A job caused multiple agents to fail.
This error means that a job failed on three separate agents and caused each of those agents to become unresponsive. When this occurs, Electric Make fails the job and aborts the build. The build continues only if you have set -k, -i, or another option that allows the build to proceed after an error.
This ElectricAccelerator behavior is designed to prevent a single command from bringing down an entire cluster.
This message could be displayed if any of the following occurs:
- someone restarts agent machines without shutting down their agents first
- an agent/host loses network connectivity
- any other connectivity issue that does not originate in the agent itself
For hints about why you received this message, examine the Messages tab in the Cluster Manager UI for the affected agents, looking at the period when that build was running.
The following is an example of the annotation for a job that triggered EC1073. The "timing" elements identify the agents involved in the failure.
<job id="J02532fa8" thread="f8932b70" type="rule" name="all"
file="Makefile" line="2"> ...
<output>ERROR EC1073: Job caused multiple agents to fail.
<timing invoked="1.122699" completed="1.161900" node="someagenthost-1"/>
<timing invoked="1.164346" completed="1.241695" node="someagenthost-2"/>
<timing invoked="1.244078" completed="1.321730" node="someagenthost-3"/>
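If the annotation file is large, the lookup described above can be scripted. The following is a minimal sketch in Python, not an ElectricAccelerator tool. It assumes the annotation is well-formed XML and that the "output" and "timing" elements appear somewhere inside the "job" element. The function name agents_for_ec1073, the enclosing "build" element, and the closing tags in SAMPLE are assumptions added so that the excerpt parses.

```python
import xml.etree.ElementTree as ET


def agents_for_ec1073(annotation_xml):
    """Map each job id that reported EC1073 to the agent nodes it ran on.

    Hypothetical helper: assumes <output> and <timing> are descendants
    of the <job> element they belong to.
    """
    root = ET.fromstring(annotation_xml)
    result = {}
    for job in root.iter("job"):
        # Search the text of every <output> element under this job.
        reported = any(
            "EC1073" in "".join(out.itertext()) for out in job.iter("output")
        )
        if reported:
            # Each <timing> element records one run of the job on one agent.
            result[job.get("id")] = [t.get("node") for t in job.iter("timing")]
    return result


# Sample based on the excerpt above; the <build> wrapper and the closing
# tags are assumptions added to make the fragment well-formed.
SAMPLE = """<build>
<job id="J02532fa8" thread="f8932b70" type="rule" name="all" file="Makefile" line="2">
<output>ERROR EC1073: Job caused multiple agents to fail.</output>
<timing invoked="1.122699" completed="1.161900" node="someagenthost-1"/>
<timing invoked="1.164346" completed="1.241695" node="someagenthost-2"/>
<timing invoked="1.244078" completed="1.321730" node="someagenthost-3"/>
</job>
</build>"""

if __name__ == "__main__":
    for job_id, nodes in agents_for_ec1073(SAMPLE).items():
        print(job_id, nodes)
```

The agent names returned this way are the ones to look up on the Messages tab in the Cluster Manager UI.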
There may be clues in earlier XML elements of this form:
<message ...>Lost connection to agent someagenthost-...</message>
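These messages can also be collected with a short script. The following Python sketch is an illustration only. It assumes that the message text contains the phrase "Lost connection to agent" followed by the agent name as a single whitespace-delimited token. The real message may carry attributes or extra text not shown here, so the pattern may need adjusting. The function name lost_agents and the sample input are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed message shape: the agent name is the first token after the phrase.
LOST_CONNECTION = re.compile(r"Lost connection to agent\s+(\S+)")


def lost_agents(annotation_xml):
    """Return agent names from 'Lost connection to agent' messages, in order."""
    root = ET.fromstring(annotation_xml)
    names = []
    for message in root.iter("message"):
        match = LOST_CONNECTION.search("".join(message.itertext()))
        if match:
            names.append(match.group(1))
    return names


# Hypothetical input: the <build> wrapper and the second message are
# made up for illustration and are not taken from real annotation.
SAMPLE = """<build>
<message>Lost connection to agent someagenthost-1</message>
<message>Some unrelated message</message>
</build>"""

if __name__ == "__main__":
    print(lost_agents(SAMPLE))
```

Agents that appear both in these messages and in the "timing" elements of the failed job are the most likely to have been affected by that job.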
Another possible cause of EC1073 is a build step that reboots the machine it runs on, for example, a rule whose body contains the Linux "reboot" command.
Always shut down agents before restarting the agent machines. Resolve any connectivity issues.