KBEA-Error Code EC1073

EC1073

Summary

A job caused multiple agents to fail.

Description

This error means that a job failed on three separate agents and caused each of those agents to become non-responsive. When this occurs, Electric Make fails the job and aborts the build. The build continues only if you have set -k, -i, or another option that allows the build to proceed after an error.

This ElectricAccelerator behavior is designed to prevent a single command from bringing down an entire cluster.

Reasons

This error can occur if any of the following happens:

  • someone restarts agent machines without first shutting down their agents
  • an agent or its host loses network connectivity
  • any other connectivity problem not caused by the agent itself

For hints about why you received this message, check the Messages tab in the Cluster Manager UI for the affected agents during the time that build ran.

Example

This is an example of the annotation for a job that triggered EC1073. The "timing" elements identify the agents involved in the failure: the node attribute of each one names an agent host.

   <job id="J02532fa8" thread="f8932b70" type="rule" name="all"
        file="Makefile" line="2"> ...
     <output>ERROR EC1073: Job caused multiple agents to fail.
     </output>
     <timing invoked="1.122699" completed="1.161900" node="someagenthost-1"/>
     <timing invoked="1.164346" completed="1.241695" node="someagenthost-2"/>
     <timing invoked="1.244078" completed="1.321730" node="someagenthost-3"/>
     <failed code="1"/>
   </job>
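If the annotation file is large, pulling the agent names out by hand is tedious. The script below is a minimal sketch, not an ElectricAccelerator tool: it assumes the annotation file is well-formed XML whose job elements look like the example above, and it wraps that example in an assumed <build> root so it parses on its own. The function name ec1073_agents and the file name emake.xml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the job element from the example above, wrapped in an
# assumed <build> root so that it parses as a standalone document.
SAMPLE = """<build>
  <job id="J02532fa8" thread="f8932b70" type="rule" name="all"
       file="Makefile" line="2">
    <output>ERROR EC1073: Job caused multiple agents to fail.
    </output>
    <timing invoked="1.122699" completed="1.161900" node="someagenthost-1"/>
    <timing invoked="1.164346" completed="1.241695" node="someagenthost-2"/>
    <timing invoked="1.244078" completed="1.321730" node="someagenthost-3"/>
    <failed code="1"/>
  </job>
</build>"""


def ec1073_agents(root: ET.Element) -> dict[str, list[str]]:
    """Map the id of each job whose output mentions EC1073 to its agent hosts."""
    result = {}
    for job in root.iter("job"):
        # Join the text of all <output> children of this job.
        output = "".join(o.text or "" for o in job.findall("output"))
        if "EC1073" in output:
            # The node attribute of each <timing> child names an agent host
            # involved in running this job.
            result[job.get("id")] = [t.get("node") for t in job.findall("timing")]
    return result


if __name__ == "__main__":
    # For a real file (hypothetical path): root = ET.parse("emake.xml").getroot()
    print(ec1073_agents(ET.fromstring(SAMPLE)))
    # {'J02532fa8': ['someagenthost-1', 'someagenthost-2', 'someagenthost-3']}
```

Matching on the "EC1073" string in the job output keeps the sketch independent of any other annotation attributes whose meaning is not documented here.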

There may be clues in earlier XML elements of this form:

  <message ...>Lost connection to agent someagenthost-...</message>
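To look for those clues, you can scan the annotation for message elements that report a lost connection and, optionally, keep only the ones that mention the hosts from the job's timing elements. This is a hedged sketch: the message attributes are elided in the example above, so the code ignores them and assumes only that the message text contains "Lost connection to agent" followed by the host name. The function name lost_connection_messages is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def lost_connection_messages(
    root: ET.Element, hosts: Optional[Iterable[str]] = None
) -> list[str]:
    """Return the text of <message> elements that report a lost agent connection.

    If hosts is given (for example, the node values from the job's <timing>
    elements), keep only messages that mention one of those host names.
    """
    patterns = None
    if hosts is not None:
        # Match whole host names so "host-1" does not also match "host-10".
        patterns = [
            re.compile(r"(?<![\w.-])" + re.escape(h) + r"(?![\w-])") for h in hosts
        ]
    found = []
    for msg in root.iter("message"):  # iter() walks the tree in document order
        text = "".join(msg.itertext()).strip()
        if "Lost connection to agent" not in text:
            continue
        if patterns is not None and not any(p.search(text) for p in patterns):
            continue
        found.append(text)
    return found


if __name__ == "__main__":
    root = ET.fromstring(
        "<build>"
        "<message>Lost connection to agent someagenthost-2</message>"
        "<message>Lost connection to agent someagenthost-20</message>"
        "</build>"
    )
    print(lost_connection_messages(root, hosts=["someagenthost-2"]))
    # ['Lost connection to agent someagenthost-2']
```

Passing an empty host list returns no messages, while omitting hosts returns every lost-connection message in the file.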

Example

Another possible cause of EC1073 is a build step that reboots the machine it runs on, for example, a rule whose body runs the Linux "reboot" command.

Fixes

Always shut down agents before restarting the agent machines. Resolve any connectivity issues, and remove any build step that reboots the machine it runs on.
