Buildbarn Learning

Posted by 皮皮潘 on 07-15,2022

Introduction

Buildbarn is a distributed remote caching and remote execution platform for Bazel.

Abbreviations

  1. CAS: Content Addressable Storage

    CAS is a method for storing fixed content, such as logs and mail, as objects and providing fast access to that content.

    Once data has been stored, CAS prevents it from being duplicated or modified, providing write-once, read-many (WORM) data access.

    When storing an object, CAS assigns it a content address calculated from the object's content. A duplicate object has the same address as the original, so when it is inserted into the CAS, the CAS just creates a pointer to the original instead of storing a second copy.

  2. AC: Action Cache

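    Like the CAS, the AC is a key-value store keyed by digests, but it is not strictly content-addressed: the key is the digest of an Action, not the digest of the stored value.

    The AC stores the mapping from Action digest to ActionResult.

    An ActionResult refers to its output blobs by their digests in the CAS.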

  3. ICAS: Indirect CAS. Besides blob contents, the ICAS can store references to external assets. When a worker fetches such a reference, it performs the fetch against the external asset instead.

  4. ISCC: Initial Size Class Cache
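The CAS and AC entries above can be modeled in Go as two in-memory maps. This is a toy sketch, not Buildbarn's storage code: it assumes SHA-256 as the hash function, and the hypothetical `ActionCache` type reduces an ActionResult to the list of output digests it points to.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Digest is a content address: the SHA-256 hash of a blob plus its size.
type Digest struct {
	Hash      string
	SizeBytes int64
}

func digestOf(data []byte) Digest {
	sum := sha256.Sum256(data)
	return Digest{Hash: hex.EncodeToString(sum[:]), SizeBytes: int64(len(data))}
}

// CAS derives each key from the stored bytes, so identical content is kept once.
type CAS struct {
	blobs map[Digest][]byte
}

func NewCAS() *CAS { return &CAS{blobs: make(map[Digest][]byte)} }

// Put stores data unless an identical blob already exists, and returns its address.
func (c *CAS) Put(data []byte) Digest {
	d := digestOf(data)
	if _, ok := c.blobs[d]; !ok {
		// Store a private copy so the caller cannot modify it later (write once).
		c.blobs[d] = append([]byte(nil), data...)
	}
	return d
}

func (c *CAS) Get(d Digest) ([]byte, bool) {
	b, ok := c.blobs[d]
	return b, ok
}

// ActionCache (hypothetical) maps an Action digest to the CAS digests of its outputs.
// Unlike in the CAS, the key is NOT the hash of the stored value.
type ActionCache map[Digest][]Digest

func main() {
	cas := NewCAS()
	first := cas.Put([]byte("hello"))
	second := cas.Put([]byte("hello"))           // duplicate: same address, no new copy
	fmt.Println(first == second, len(cas.blobs)) // true 1

	ac := ActionCache{}
	actionDigest := cas.Put([]byte("hypothetical serialized Action"))
	ac[actionDigest] = []Digest{first}
	fmt.Println(len(ac[actionDigest])) // 1
}
```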

Bazel Core Procedure

  • Loading phase. First, load and evaluate all extensions and all BUILD files that are needed for the build. Executing a BUILD file simply runs the code of macros and instantiates rules (each time a rule is called, it gets added to a graph); the real logic of a rule lives in its implementation function, which does not run yet. This is where macros are evaluated. -> Create a Rule Graph
  • Analysis phase. The code of the rules is executed (their implementation function), and actions are instantiated. An action describes how to generate a set of outputs from a set of inputs, such as "run gcc on hello.c and get hello.o". You must list explicitly which files will be generated before executing the actual commands. In other words, the analysis phase takes the graph generated by the loading phase and generates an action graph. -> Create an Action Graph
  • Execution phase. Actions are executed when at least one of their outputs is required. If a file is missing or if a command fails to generate one of its outputs, the build fails. Tests are also run during this phase. -> Execute the Action

With remote execution, the Bazel client is responsible for loading and analysis, while the remote server is responsible for execution.
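The demand-driven rule of the execution phase (an action runs only when one of its outputs is needed) can be illustrated with a small Go sketch. The `Action` and `Graph` types are hypothetical simplifications of Bazel's action graph; the sketch assumes the graph is acyclic and only records which actions would run instead of executing commands.

```go
package main

import "fmt"

// Action is a hypothetical action-graph node that turns input files into output files.
type Action struct {
	Mnemonic string
	Inputs   []string
	Outputs  []string
}

// Graph indexes actions by the files they produce.
type Graph struct {
	producer map[string]*Action
	done     map[*Action]bool
	Log      []string // actions in the order they were "executed"
}

func NewGraph(actions []*Action) *Graph {
	g := &Graph{producer: map[string]*Action{}, done: map[*Action]bool{}}
	for _, a := range actions {
		for _, out := range a.Outputs {
			g.producer[out] = a
		}
	}
	return g
}

// Build makes the requested file available, running only the actions it needs.
// Files without a producing action are treated as existing source files.
func (g *Graph) Build(file string) {
	a, ok := g.producer[file]
	if !ok || g.done[a] {
		return
	}
	for _, in := range a.Inputs {
		g.Build(in) // inputs must exist before this action can run
	}
	g.Log = append(g.Log, a.Mnemonic) // stand-in for running the command
	g.done[a] = true
}

func main() {
	g := NewGraph([]*Action{
		{Mnemonic: "CppCompile", Inputs: []string{"hello.c"}, Outputs: []string{"hello.o"}},
		{Mnemonic: "CppLink", Inputs: []string{"hello.o"}, Outputs: []string{"hello"}},
		{Mnemonic: "Unused", Inputs: []string{"other.c"}, Outputs: []string{"other.o"}},
	})
	g.Build("hello")
	fmt.Println(g.Log) // [CppCompile CppLink]
}
```

The "Unused" action never runs because nothing requests `other.o`.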

Deployment Architecture

[Figure: Buildbarn deployment overview (bb-overview.png)]

  • bb_scheduler: A service that receives requests from bb_storage to queue build actions that need to be run.

  • bb_worker: A service that requests build actions from bb_scheduler and orchestrates their execution. This includes downloading the build action's input files and uploading its output files.

    Workers use a pull model: each bb_worker polls bb_scheduler for execution tasks, rather than the scheduler pushing tasks to workers.

  • bb_runner: A service that executes the command associated with the build action. bb_worker and bb_runner always appear in pairs.

    In order to make the execution strategy of bb_worker pluggable and capable of supporting privilege separation, bb_worker calls into a runner service to invoke the desired command after setting up inputs accordingly.

  • bb_storage: Serves as the CAS and the Action Cache. Notably, bb_storage can be sharded.

  • bb_frontend: The component labeled "bb_storage: RPC demultiplexing" in the figure. It is a stateless gateway that dispatches requests to the bb_storage shards and to bb_scheduler. It is part of the bb_storage project.

  • bb_clientd: A client-side daemon that implements a FUSE file system. It can be used as a tool for exploring the Content Addressable Storage, as a playground, and to perform Remote Builds without the Bytes.
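Because bb_frontend is stateless, every replica must route a given digest to the same bb_storage shard. A minimal Go sketch of that routing idea follows. It uses FNV-1a modulo the shard count, which is an assumption for illustration and not Buildbarn's actual sharding algorithm; the shard addresses and the `route` helper are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickShard deterministically maps a blob hash to one of the storage shards,
// so any frontend replica sends the same digest to the same shard.
func pickShard(blobHash string, shards []string) string {
	h := fnv.New64a()
	h.Write([]byte(blobHash))
	return shards[h.Sum64()%uint64(len(shards))]
}

// route decides where a frontend forwards a request: execution requests go to
// the scheduler, while CAS and AC requests go to the shard owning the digest.
func route(service, blobHash string, shards []string, scheduler string) string {
	if service == "Execution" {
		return scheduler
	}
	return pickShard(blobHash, shards)
}

func main() {
	shards := []string{"storage-0:8981", "storage-1:8981", "storage-2:8981"} // hypothetical
	h := "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
	fmt.Println(route("ContentAddressableStorage", h, shards, "scheduler:8982"))
	fmt.Println(route("Execution", h, shards, "scheduler:8982")) // scheduler:8982
}
```

Plain modulo placement moves most keys when a shard is added, which is one reason production systems use more careful placement schemes.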

The whole architecture consists of a Cache Server (CAS + AC) and an Execution Engine (Scheduler + Worker + Runner).

The Execution Engine MAY execute an action multiple times, potentially in parallel.

Bazel Remote API & Data Structure

The Remote Execution API is the bridge between the Bazel client and any platform that implements remote execution.

Execution API

Before executing an action, the client must first upload all of the inputs, the Command, and the Action itself into the CAS. The client then calls Execute with an action_digest referring to the Action. The server immediately returns a stream and eventually delivers the result through it.

The Execute API returns a stream of Operation messages, as defined in google.longrunning.Operation. These messages describe the resulting execution; the eventual response is an ExecuteResponse, and the metadata on each operation is of type ExecuteOperationMetadata.

rpc Execute(ExecuteRequest) returns (stream Operation) // Start an execution on one action

message ExecuteRequest {
	string instance_name;
	bool skip_cache_lookup;
	Digest action_digest;
}

message ExecuteResponse {
	ActionResult result;
	bool cached_result;
	map<string, LogFile> server_logs;
	string message;
}

message ExecutionStage {
	enum Value {
		UNKNOWN;
		CACHE_CHECK;
		QUEUED;
		EXECUTING;
		COMPLETED;
	}
}

// This metadata is attached to each Operation in the stream
message ExecuteOperationMetadata {
	ExecutionStage.Value stage;
	Digest action_digest;
	// If set, the client can pass this resource name to `ByteStream.Read` to stream stdout
	string stdout_stream_name;
	string stderr_stream_name;
}
rpc WaitExecution(WaitExecutionRequest) returns (stream Operation) // Wait for an execution operation to complete
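A client consumes the Execute stream by reading Operation messages until one is marked done. The Go types below are hypothetical stand-ins for generated protobuf code (a real client would use the generated REAPI and google.longrunning types over gRPC), and a channel plays the role of the server stream.

```go
package main

import (
	"errors"
	"fmt"
)

// Stage mirrors ExecutionStage.Value.
type Stage int

const (
	Unknown Stage = iota
	CacheCheck
	Queued
	Executing
	Completed
)

func (s Stage) String() string {
	return [...]string{"UNKNOWN", "CACHE_CHECK", "QUEUED", "EXECUTING", "COMPLETED"}[s]
}

// Operation is a simplified Operation: Stage stands in for the metadata,
// and Response is only set once Done is true.
type Operation struct {
	Name     string
	Stage    Stage
	Done     bool
	Response *ExecuteResponse
}

type ExecuteResponse struct {
	ExitCode     int32
	CachedResult bool
}

// waitForResult drains the stream until the final Operation arrives. If the
// stream breaks early, a real client could reattach with WaitExecution.
func waitForResult(stream <-chan Operation) (*ExecuteResponse, error) {
	for op := range stream {
		fmt.Println(op.Name, op.Stage)
		if op.Done {
			return op.Response, nil
		}
	}
	return nil, errors.New("stream ended before the operation completed")
}

func main() {
	stream := make(chan Operation, 4)
	stream <- Operation{Name: "op-1", Stage: CacheCheck}
	stream <- Operation{Name: "op-1", Stage: Queued}
	stream <- Operation{Name: "op-1", Stage: Executing}
	stream <- Operation{Name: "op-1", Stage: Completed, Done: true, Response: &ExecuteResponse{ExitCode: 0}}
	close(stream)
	resp, err := waitForResult(stream)
	fmt.Println(resp.ExitCode, err) // 0 <nil>
}
```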

Action Cache API

The Action Cache API is used to query whether a given action has already been performed and, if so, retrieve its result. The Action Cache addresses each ActionResult by the digest of the encoded Action that produced it.

rpc GetActionResult(GetActionResultRequest) returns (ActionResult) // Retrieve a cached execution result
rpc UpdateActionResult(UpdateActionResultRequest) returns (ActionResult) // Upload a new execution result
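The usual check-the-cache-first flow can be sketched as follows. `ActionCache`, `runWithCache`, and the `execute` callback are hypothetical in-memory stand-ins for the two RPCs above; storing only results with exit code 0 is an assumption of this sketch, not a rule quoted from the API.

```go
package main

import "fmt"

type ActionResult struct {
	ExitCode int32
}

// ActionCache is a hypothetical in-memory stand-in for the Action Cache service.
type ActionCache map[string]ActionResult

func (ac ActionCache) GetActionResult(actionDigest string) (ActionResult, bool) {
	r, ok := ac[actionDigest]
	return r, ok
}

func (ac ActionCache) UpdateActionResult(actionDigest string, r ActionResult) {
	ac[actionDigest] = r
}

// runWithCache consults the Action Cache first and only executes on a miss.
// The boolean result reports whether the answer came from the cache.
func runWithCache(ac ActionCache, actionDigest string, doNotCache bool, execute func() ActionResult) (ActionResult, bool) {
	if !doNotCache {
		if r, ok := ac.GetActionResult(actionDigest); ok {
			return r, true // cache hit: nothing is executed
		}
	}
	r := execute()
	// Assumption of this sketch: only successful results are written back.
	if !doNotCache && r.ExitCode == 0 {
		ac.UpdateActionResult(actionDigest, r)
	}
	return r, false
}

func main() {
	ac := ActionCache{}
	runs := 0
	run := func() ActionResult { runs++; return ActionResult{ExitCode: 0} }
	_, hit1 := runWithCache(ac, "abc/42", false, run)
	_, hit2 := runWithCache(ac, "abc/42", false, run)
	fmt.Println(hit1, hit2, runs) // false true 1
}
```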

CAS API

The CAS is used to store the inputs to and outputs from the execution service.

The CAS API provides the upload and download APIs for the CAS.

rpc FindMissingBlobs(FindMissingBlobsRequest) returns (FindMissingBlobsResponse) // Determine which blobs are missing from the CAS; called before the client uploads blobs
rpc BatchUpdateBlobs(BatchUpdateBlobsRequest) returns (BatchUpdateBlobsResponse) // Upload many small blobs at once
rpc Write(stream WriteRequest) returns (WriteResponse) // Upload a large blob by streaming (google.bytestream.ByteStream)
rpc BatchReadBlobs(BatchReadBlobsRequest) returns (BatchReadBlobsResponse) // Download many small blobs at once
rpc Read(ReadRequest) returns (stream ReadResponse) // Download a large blob by streaming (google.bytestream.ByteStream)
rpc GetTree(GetTreeRequest) returns (stream GetTreeResponse) // Fetch the entire directory tree rooted at a node
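Uploading inputs typically starts with FindMissingBlobs, after which only the missing blobs are sent: small ones grouped into BatchUpdateBlobs requests and large ones streamed with ByteStream Write. In the sketch below, `findMissing` and `planUpload` are hypothetical helpers, and `maxBatchBytes` is an assumed limit (a server can advertise one through GetCapabilities as `max_batch_total_size_bytes`).

```go
package main

import "fmt"

type Digest struct {
	Hash      string
	SizeBytes int64
}

// findMissing stands in for FindMissingBlobs: it returns the digests the server lacks.
func findMissing(server map[Digest]bool, want []Digest) []Digest {
	var missing []Digest
	for _, d := range want {
		if !server[d] {
			missing = append(missing, d)
		}
	}
	return missing
}

// planUpload groups small blobs into batches whose total size stays within
// maxBatchBytes; blobs larger than the limit are uploaded via ByteStream Write.
func planUpload(missing []Digest, maxBatchBytes int64) (batches [][]Digest, streamed []Digest) {
	var cur []Digest
	var curSize int64
	for _, d := range missing {
		if d.SizeBytes > maxBatchBytes {
			streamed = append(streamed, d)
			continue
		}
		if curSize+d.SizeBytes > maxBatchBytes {
			batches = append(batches, cur) // current batch is full
			cur, curSize = nil, 0
		}
		cur = append(cur, d)
		curSize += d.SizeBytes
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches, streamed
}

func main() {
	a := Digest{"aaa", 10}
	b := Digest{"bbb", 30}
	c := Digest{"ccc", 30}
	d := Digest{"ddd", 200}
	server := map[Digest]bool{a: true} // the server already has blob a
	missing := findMissing(server, []Digest{a, b, c, d})
	batches, streamed := planUpload(missing, 64)
	fmt.Println(len(missing), len(batches), len(batches[0]), len(streamed)) // 3 1 2 1
}
```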

Capabilities API

The Capabilities service may be used by remote execution clients to query various server properties, in order to self-configure or return meaningful error messages.

rpc GetCapabilities(GetCapabilitiesRequest) returns (ServerCapabilities)

Digest

A content digest for a given blob consists of the size of the blob and its hash.

A digest uniquely identifies a blob in the CAS.

message Digest {
	string hash;
	int64 size_bytes;
}
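Computing a digest and the ByteStream resource name used to read the blob could look like this in Go, assuming SHA-256 as the digest function (the server announces which digest functions it supports through the Capabilities API). The `NewDigest` and `ReadResourceName` names are hypothetical; the `{instance_name}/blobs/{hash}/{size}` layout follows the REAPI ByteStream naming convention.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type Digest struct {
	Hash      string
	SizeBytes int64
}

// NewDigest hashes the blob with SHA-256 and records its length.
func NewDigest(blob []byte) Digest {
	sum := sha256.Sum256(blob)
	return Digest{Hash: hex.EncodeToString(sum[:]), SizeBytes: int64(len(blob))}
}

// ReadResourceName builds the name passed to ByteStream.Read for this blob.
// With an empty instance name, the prefix is omitted.
func (d Digest) ReadResourceName(instance string) string {
	name := fmt.Sprintf("blobs/%s/%d", d.Hash, d.SizeBytes)
	if instance != "" {
		name = instance + "/" + name
	}
	return name
}

func main() {
	d := NewDigest([]byte("hello"))
	fmt.Println(d.Hash, d.SizeBytes)
	fmt.Println(d.ReadResourceName("main"))
}
```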

Action

An Action captures all the information about an execution which is required to reproduce it

message Action {
	Digest command_digest;
	Digest input_root_digest; // The digest of the root Directory for the input
	Duration timeout;
	bool do_not_cache;
}

Command

A Command is the actual command executed by a worker running an Action and specifications of its environment

message Command {
	repeated string arguments; // The arguments to the command. The first argument specifies the command to run.
	repeated EnvironmentVariable environment_variables;
	repeated string output_paths; // A list of the output paths that the client expects to retrieve from the action
	string working_directory;
}

Directory

A Directory represents a directory node in a file tree, containing zero or more children (FileNodes, DirectoryNodes, SymlinkNodes).

To fetch a child Directory from the CAS, use the digest stored in its DirectoryNode. Similarly, fetch a file's content using the digest stored in its FileNode.

message Directory {
	repeated FileNode files;
	repeated DirectoryNode directories;
	repeated SymlinkNode symlinks;
	NodeProperties node_properties;
}

// A `FileNode` represents a single file
message FileNode {
	string name;
	Digest digest;
	bool is_executable;
}

// A `DirectoryNode` represents a child of a Directory which is itself a `Directory`
message DirectoryNode {
	string name;
	Digest digest;
}

// A `SymlinkNode` represents a symbolic link.
message SymlinkNode {
	string name;
	string target; // The target path of the symlink.
}
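The Directory messages form a Merkle tree: a parent refers to each child Directory only by digest, so a worker materializing the input root fetches the root Directory and then recursively fetches its children. In the sketch below, a map is a hypothetical stand-in for CAS lookups (GetTree could return the whole tree in one call), symlinks are omitted, and the hashes are placeholders.

```go
package main

import (
	"fmt"
	"path"
)

type Digest struct {
	Hash      string
	SizeBytes int64
}

type FileNode struct {
	Name   string
	Digest Digest
}

type DirectoryNode struct {
	Name   string
	Digest Digest
}

type Directory struct {
	Files       []FileNode
	Directories []DirectoryNode
}

// listFiles walks the tree starting at the Directory stored under root and
// records every file path with the digest of its content.
func listFiles(dirs map[Digest]Directory, root Digest, prefix string, out map[string]Digest) error {
	dir, ok := dirs[root]
	if !ok {
		return fmt.Errorf("directory %s missing from CAS", root.Hash)
	}
	for _, f := range dir.Files {
		out[path.Join(prefix, f.Name)] = f.Digest
	}
	for _, d := range dir.Directories {
		if err := listFiles(dirs, d.Digest, path.Join(prefix, d.Name), out); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	srcDir := Digest{"d-src", 60}
	rootDir := Digest{"d-root", 80}
	dirs := map[Digest]Directory{
		srcDir: {Files: []FileNode{{"hello.c", Digest{"f-hello", 42}}}},
		rootDir: {
			Files:       []FileNode{{"BUILD", Digest{"f-build", 10}}},
			Directories: []DirectoryNode{{"src", srcDir}},
		},
	}
	out := map[string]Digest{}
	if err := listFiles(dirs, rootDir, "", out); err != nil {
		panic(err)
	}
	fmt.Println(len(out), out["src/hello.c"].Hash) // 2 f-hello
}
```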

Action Result

An ActionResult represents the result of an Action being run.

message ActionResult {
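	repeated OutputFile output_files; // Like `FileNode`, but identified by a path relative to the working directory
	repeated OutputSymlink output_symlinks; // Like `SymlinkNode`, but identified by a path
	repeated OutputDirectory output_directories; // A path plus `tree_digest`, the digest of a `Tree` message holding the whole directory tree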
	int32 exit_code;
	ExecutedActionMetadata execution_metadata;
}
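Following the execution-phase rule that the build fails when a command does not succeed or does not produce a declared output, a client-side check of an ActionResult might look like this. The Go types are simplified, hypothetical stand-ins (symlink outputs are omitted), and `checkResult` is an illustrative helper, not part of the API.

```go
package main

import "fmt"

type OutputFile struct {
	Path string
}

type OutputDirectory struct {
	Path string
}

type ActionResult struct {
	OutputFiles       []OutputFile
	OutputDirectories []OutputDirectory
	ExitCode          int32
}

// checkResult fails if the command exited non-zero or if any declared
// output path is absent from the result.
func checkResult(outputPaths []string, r ActionResult) error {
	if r.ExitCode != 0 {
		return fmt.Errorf("action failed with exit code %d", r.ExitCode)
	}
	produced := map[string]bool{}
	for _, f := range r.OutputFiles {
		produced[f.Path] = true
	}
	for _, d := range r.OutputDirectories {
		produced[d.Path] = true
	}
	for _, p := range outputPaths {
		if !produced[p] {
			return fmt.Errorf("declared output %q was not produced", p)
		}
	}
	return nil
}

func main() {
	r := ActionResult{OutputFiles: []OutputFile{{Path: "hello.o"}}}
	fmt.Println(checkResult([]string{"hello.o"}, r))            // <nil>
	fmt.Println(checkResult([]string{"hello.o", "hello.d"}, r)) // declared output "hello.d" was not produced
}
```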