What is Bazel
Bazel is an artifact-based build system, as opposed to a task-based build system such as Make, Maven, or Gradle.
As a build system, Bazel's most important purposes are to:
- Manage dependencies
- Transform the source code into executable binaries
- Allow machines to create builds automatically
Build System Evolution
As codebases grow in scale, build systems have evolved through the following stages:
- Compiler
- Shell Script
- Task-Based Build System
- Artifact-Based Build System
The difference between a compiler and a shell script is easy to understand, and their advantages and disadvantages are obvious, so I will not explain those two further. Let's focus on task-based and artifact-based build systems.
In a task-based build system such as Ant, engineers can define any script as a task. The system only executes the tasks and lacks an overall picture of what the build does.
While engineers have full control over the build procedure, task-based build systems have a dark side:
- Difficulty parallelizing build steps: multiple tasks may share and overwrite the same resources even though they do not depend on each other
- Difficulty performing incremental builds: tasks can do anything (a task may write a timestamp or download a file), so there is no general way to check whether they have already been done or need to be done again
- Difficulty maintaining and debugging scripts
In an artifact-based build system, engineers still tell the system what to build, but the build system determines how to build it (declarative build files). The difference between task-based and artifact-based build systems is analogous to the difference between imperative and functional programming languages.
The artifact-based approach leaves the build system in charge of its own execution strategy (the how), so it knows more about the build process than a task-based system does. As a result, it can make stronger guarantees about parallelism, incremental builds, and correctness.
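To make the incremental-build guarantee concrete, here is a minimal Python sketch. It is a toy model, not how Bazel is implemented: the class `ToyBuildSystem` and the function `compile_step` are hypothetical names invented for illustration. Because each step declares its inputs, the system can derive a cache key from them and skip the step when nothing has changed.

```python
import hashlib


def digest(*parts: str) -> str:
    """Return a SHA-256 hex digest over the given strings."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


class ToyBuildSystem:
    """Hypothetical toy: rebuilds an artifact only when its declared inputs change."""

    def __init__(self):
        self.cache = {}     # cache key -> built artifact
        self.executed = []  # names of steps that actually ran

    def build(self, name, inputs, transform):
        # The key depends only on the step name and its declared inputs.
        key = digest(name, *inputs)
        if key not in self.cache:
            self.cache[key] = transform(inputs)
            self.executed.append(name)
        return self.cache[key]


def compile_step(inputs):
    """Stand-in for a real compiler invocation."""
    return "obj(" + "+".join(inputs) + ")"


system = ToyBuildSystem()
first = system.build("hello.o", ["int main(){}"], compile_step)
second = system.build("hello.o", ["int main(){}"], compile_step)            # unchanged: cache hit
third = system.build("hello.o", ["int main(){return 1;}"], compile_step)    # input changed: rebuilt
```

A task-based system cannot do this safely, because a task may read or write things it never declared, so no key derived from declared inputs would be trustworthy.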
Bazel as an Artifact-Based Build System
An action is the lowest-level composable unit in Bazel: an action can do whatever it wants so long as it uses only its declared inputs and outputs, and Bazel takes care of scheduling actions and caching their results as appropriate. Every action is isolated from every other action via a filesystem sandbox, which prevents the problem in task-based build systems where different tasks write to the same file. Effectively, each action can see only a restricted view of the filesystem that includes the inputs it has declared and any outputs it has produced.
Bazel ensures the integrity of third-party dependencies by checking each dependency's digest. It also decides when to re-download a dependency by comparing the cached dependency's digest with the desired digest in the manifest.
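The digest comparison can be sketched in a few lines of Python. This is an illustration of the idea only, assuming SHA-256 digests; the helper names `needs_download` and `fetch_verified` are hypothetical and do not correspond to Bazel internals.

```python
import hashlib
from typing import Callable, Optional


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def needs_download(cached: Optional[bytes], expected_sha256: str) -> bool:
    """A cached archive is reusable only if its digest matches the manifest."""
    if cached is None:
        return True
    return sha256_hex(cached) != expected_sha256


def fetch_verified(download: Callable[[], bytes],
                   cached: Optional[bytes],
                   expected_sha256: str) -> bytes:
    """Return the dependency bytes, downloading and verifying only when needed."""
    if not needs_download(cached, expected_sha256):
        return cached
    data = download()
    if sha256_hex(data) != expected_sha256:
        # Integrity check: never use an archive whose digest is unexpected.
        raise ValueError("digest mismatch: refusing to use downloaded archive")
    return data
```

The same comparison serves both purposes described above: it rejects a tampered download, and it detects that the manifest now asks for a different version than the one in the cache.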
The benefits of smaller build targets really begin to show at scale because they lead to faster distributed builds and a less frequent need to rebuild targets. The advantages become even more compelling after testing enters the picture, as finer-grained targets mean that the build system can be much smarter about running only a limited subset of tests that could be affected by any given change.
Dependency Management
Bazel does not automatically download transitive dependencies. It requires a global file (WORKSPACE) that lists every one of the repository's external dependencies, including transitive ones. That is, Bazel does not load the WORKSPACE files of external projects. So it is better to extract the dependencies from the WORKSPACE file and wrap them in a macro named xxx_deps, so that any project importing this one can quickly import its transitive dependencies by calling the xxx_deps macro.
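The xxx_deps pattern would be written in Starlark in a real project. The Python sketch below only models the idea: the WORKSPACE is a flat registry, and each library exports one function that declares everything it needs. All names (`liba_deps`, `maybe_register`, the example.com URLs) are hypothetical. The "first declaration wins" behavior is an assumption, modeled loosely on the `maybe` helper from Bazel's repository utilities.

```python
# Toy model of a WORKSPACE: a flat registry of external repositories.
registry = {}


def maybe_register(name, url):
    """Register a repository only if no earlier declaration exists."""
    registry.setdefault(name, url)


def liba_deps():
    """Hypothetical macro exported by project 'liba': declares all of liba's needs."""
    maybe_register("zlib", "https://example.com/zlib.tar.gz")
    maybe_register("protobuf", "https://example.com/protobuf.tar.gz")


def app_workspace():
    """Hypothetical WORKSPACE of a project that depends on liba."""
    # The app pins its own zlib first, so liba's declaration is skipped.
    maybe_register("zlib", "https://example.com/zlib-pinned.tar.gz")
    maybe_register("liba", "https://example.com/liba.tar.gz")
    # One call pulls in liba's transitive dependencies.
    liba_deps()


app_workspace()
```

After `app_workspace()` runs, the registry contains `zlib`, `liba`, and `protobuf`, even though the app itself never mentioned `protobuf`.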
Core Procedure
Bazel's build procedure can be summarized as three phases:
- Loading phase. First, load and evaluate all extensions and all BUILD files that are needed for the build. Executing the BUILD files simply runs the code of macros and instantiates rules (each time a rule is called, it gets added to a graph); the real logic lives in each rule's implementation function. This is where macros are evaluated. -> Create a Target (Rule Instance) Graph
- Analysis phase. The code of the rules is executed (their implementation function), and actions are instantiated. An action describes how to generate a set of outputs from a set of inputs, such as "run gcc on hello.c and get hello.o". You must list explicitly which files will be generated before executing the actual commands. In other words, the analysis phase takes the graph generated by the loading phase and generates an action graph. -> Create an Action Graph
- Execution phase. Actions are executed when at least one of their outputs is required. If a file is missing or if a command fails to generate one output, the build fails. Tests are also run during this phase. -> Execute the Actions
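The three phases can be mimicked with a toy model in Python. This is a sketch under heavy simplification, not Bazel's implementation: `toy_binary`, `analyze`, and `execute` are hypothetical names, and every "action" is just a dictionary with declared inputs and outputs.

```python
# Loading phase: evaluating the "BUILD file" only records targets in a graph.
targets = {}  # target name -> {"srcs": [...], "deps": [...]}


def toy_binary(name, srcs, deps=()):
    """Hypothetical rule: calling it adds a target to the target graph."""
    targets[name] = {"srcs": list(srcs), "deps": list(deps)}


toy_binary("lib", srcs=["lib.c"])
toy_binary("app", srcs=["main.c"], deps=["lib"])


def analyze(target_graph):
    """Analysis phase: turn targets into actions with declared inputs/outputs."""
    actions = []
    for name, target in target_graph.items():
        objs = []
        for src in target["srcs"]:
            obj = src.replace(".c", ".o")
            actions.append({"mnemonic": "Compile", "inputs": [src], "outputs": [obj]})
            objs.append(obj)
        dep_outputs = [dep + ".out" for dep in target["deps"]]
        actions.append({"mnemonic": "Link",
                        "inputs": objs + dep_outputs,
                        "outputs": [name + ".out"]})
    return actions


def execute(actions, wanted, sources):
    """Execution phase: run only the actions needed to produce `wanted`."""
    producer = {out: action for action in actions for out in action["outputs"]}
    ran, done = [], set()

    def need(path):
        if path in sources:
            return
        if path not in producer:
            raise RuntimeError("missing input: " + path)  # the build fails
        action = producer[path]
        if id(action) in done:
            return
        for dependency in action["inputs"]:
            need(dependency)
        done.add(id(action))
        ran.append((action["mnemonic"], action["outputs"][0]))

    need(wanted)
    return ran


action_graph = analyze(targets)
ran_for_lib = execute(action_graph, "lib.out", sources={"lib.c", "main.c"})
```

Requesting only `lib.out` runs the compile and link actions for `lib` and never touches `main.c`, which mirrors the rule that actions run only when one of their outputs is required.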
Corresponding to these phases, there are three query commands:
- query runs on the post-loading-phase Target Graph
- aquery is the action graph query; it operates on the post-analysis Configured Target Graph and exposes information about Actions, Artifacts, and their relationships
- cquery also runs on the post-analysis Configured Target Graph; compared to query, cquery properly handles configurations such as select statements, but it does not provide all of the functionality of the original query
Execution Phase
In the common case, Bazel performs the following operations against Buildbarn when executing a build action:
- ActionCache.GetActionResult() is called to check whether a build action has already been executed previously. This call extracts an ActionResult message from the AC. If such a message is found, Bazel skips ahead to the final step (downloading the outputs).
- Bazel constructs a Merkle tree of Action, Command and Directory messages and associated input files. It then calls ContentAddressableStorage.FindMissingBlobs() to determine which parts of the Merkle tree are not present in the CAS.
- Any missing nodes of the Merkle tree are uploaded into the CAS using ByteStream.Write().
- Execution of the build action is triggered through Execution.Execute(). Upon successful completion, this function returns an ActionResult message.
- Bazel downloads all of the output files referenced by the ActionResult message from the CAS to local disk using ByteStream.Read().
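The sequence above can be traced with an in-memory stand-in written in Python. This is a sketch only: `FakeRemote` and `run_action` are hypothetical, the methods merely borrow the names of the RPCs listed above, no gRPC is involved, and the action digest is simplified to a hash over the raw blobs rather than a real Merkle tree of Action, Command and Directory messages.

```python
import hashlib


def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


class FakeRemote:
    """Hypothetical in-memory stand-in for the AC and CAS services."""

    def __init__(self):
        self.ac = {}     # action digest -> list of output digests
        self.cas = {}    # blob digest -> blob
        self.calls = []  # names of the operations performed, in order

    def get_action_result(self, action_digest):
        self.calls.append("GetActionResult")
        return self.ac.get(action_digest)

    def find_missing_blobs(self, digests):
        self.calls.append("FindMissingBlobs")
        return [d for d in digests if d not in self.cas]

    def write(self, blob):
        self.calls.append("Write")
        self.cas[digest(blob)] = blob

    def execute(self, action_digest, input_digests):
        self.calls.append("Execute")
        # Pretend the action concatenates its inputs and upper-cases them.
        output = b"".join(self.cas[d] for d in input_digests).upper()
        self.cas[digest(output)] = output
        self.ac[action_digest] = [digest(output)]
        return self.ac[action_digest]

    def read(self, blob_digest):
        self.calls.append("Read")
        return self.cas[blob_digest]


def run_action(remote, command: bytes, inputs):
    blobs = [command] + list(inputs)
    action_digest = digest(b"\0".join(blobs))
    result = remote.get_action_result(action_digest)        # check the action cache
    if result is None:
        missing = remote.find_missing_blobs([digest(b) for b in blobs])
        for blob in blobs:
            if digest(blob) in missing:
                remote.write(blob)                           # upload missing blobs
        result = remote.execute(action_digest, [digest(b) for b in inputs])
    return [remote.read(d) for d in result]                  # download the outputs


remote = FakeRemote()
first_outputs = run_action(remote, b"cat", [b"hello ", b"world"])
first_calls = list(remote.calls)
remote.calls.clear()
second_outputs = run_action(remote, b"cat", [b"hello ", b"world"])
second_calls = list(remote.calls)
```

On the second run the action cache already holds a result, so only the cache lookup and the output download happen. The lookup, upload, and execute steps are skipped entirely.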
By enabling the Builds without the Bytes feature using the --remote_download_minimal command line flag, Bazel will no longer attempt to download output files to local disk. This feature causes a significant drop in build times and network bandwidth consumed. This is especially noticeable for workloads that yield large output files. Buildbarn should attempt to support those workloads.
Output Directory Layout
<workspace-name>/                         <== The workspace directory
  bazel-my-project => <...my-project>     <== Symlink to execRoot
  bazel-out => <...bin>                   <== Convenience symlink to outputPath
  bazel-bin => <...bin>                   <== Convenience symlink to most recent written bin dir $(BINDIR)
  bazel-testlogs => <...testlogs>         <== Convenience symlink to the test logs directory

/home/user/.cache/bazel/                  <== Root for all Bazel output on a machine: outputRoot
  _bazel_$USER/                           <== Top level directory for a given user depends on the user name:
                                              outputUserRoot
    install/
      fba9a2c87ee9589d72889caf082f1029/   <== Hash of the Bazel install manifest: installBase
        _embedded_binaries/               <== Contains binaries and scripts unpacked from the data section of
                                              the bazel executable on first run (such as helper scripts and the
                                              main Java file BazelServer_deploy.jar)
    7ffd56a6e4cb724ea575aba15733d113/     <== Hash of the client's workspace directory (such as
                                              /home/some-user/src/my-project): outputBase
      action_cache/                       <== Action cache directory hierarchy
                                              This contains the persistent record of the file
                                              metadata (timestamps, and perhaps eventually also MD5
                                              sums) used by the FilesystemValueChecker.
      action_outs/                        <== Action output directory. This contains a file with the
                                              stdout/stderr for every action from the most recent
                                              bazel run that produced output.
      command.log                         <== A copy of the stdout/stderr output from the most
                                              recent bazel command.
      external/                           <== The directory that remote repositories are
                                              downloaded/symlinked into.
      server/                             <== The Bazel server puts all server-related files (such
                                              as socket file, logs, etc) here.
        jvm.out                           <== The debugging output for the server.
      execroot/                           <== The working directory for all actions. For special
                                              cases such as sandboxing and remote execution, the
                                              actions run in a directory that mimics execroot.
                                              Implementation details, such as where the directories
                                              are created, are intentionally hidden from the action.
                                              All actions can access its inputs and outputs relative
                                              to the execroot directory.
        <workspace-name>/                 <== Working tree for the Bazel build & root of symlink forest: execRoot
          _bin/                           <== Helper tools are linked from or copied to here.
          bazel-out/                      <== All actual output of the build is under here: outputPath
            local_linux-fastbuild/        <== one subdirectory per unique target BuildConfiguration instance;
                                              this is currently encoded
              bin/                        <== Bazel outputs binaries for target configuration here: $(BINDIR)
                foo/bar/_objs/baz/        <== Object files for a cc_* rule named //foo/bar:baz
                  foo/bar/baz1.o          <== Object files from source //foo/bar:baz1.cc
                  other_package/other.o   <== Object files from source //other_package:other.cc
                foo/bar/baz               <== foo/bar/baz might be the artifact generated by a cc_binary named
                                              //foo/bar:baz
                foo/bar/baz.runfiles/     <== The runfiles symlink farm for the //foo/bar:baz executable.
                  MANIFEST
                  <workspace-name>/
                    ...
              genfiles/                   <== Bazel puts generated source for the target configuration here:
                                              $(GENDIR)
                foo/bar.h                     such as foo/bar.h might be a headerfile generated by //foo:bargen
              testlogs/                   <== Bazel internal test runner puts test log files here
                foo/bartest.log               such as foo/bar.log might be an output of the //foo:bartest test with
                foo/bartest.status            foo/bartest.status containing exit status of the test (such as
                                              PASSED or FAILED (Exit 1), etc)
              include/                    <== a tree with include symlinks, generated as needed. The
                                              bazel-include symlinks point to here. This is used for
                                              linkstamp stuff, etc.
            host/                         <== BuildConfiguration for build host (user's workstation), for
                                              building prerequisite tools, that will be used in later stages
                                              of the build (ex: Protocol Compiler)
        <packages>/                       <== Packages referenced in the build appear as if under a regular workspace
Bazel Clean
We can use the bazel clean command to clean the workspace. In detail, bazel clean does an rm -rf on the outputPath and the action_cache directory. It also removes the workspace symlinks in the project. In addition, the --expunge option cleans the entire outputBase, which is under ~/.cache/bazel/_bazel_${user}/${md5 workspace}, including external repositories, toolchains, and so on.
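To see which directory --expunge would remove, the outputBase path can be estimated with a short Python sketch. It assumes the hash component is the MD5 of the absolute workspace path string, as the path template above suggests; the function name `guess_output_base` is hypothetical, and running bazel info output_base inside the workspace reports the authoritative value.

```python
import hashlib


def guess_output_base(workspace_dir: str, user: str, output_root: str) -> str:
    """Estimate outputBase as <outputRoot>/_bazel_<user>/<md5 of workspace path>.

    Assumption: the hash is an MD5 over the workspace path string.
    """
    workspace_hash = hashlib.md5(workspace_dir.encode("utf-8")).hexdigest()
    return f"{output_root}/_bazel_{user}/{workspace_hash}"


path = guess_output_base(
    workspace_dir="/home/some-user/src/my-project",
    user="some-user",
    output_root="/home/some-user/.cache/bazel",
)
print(path)
```

Because the hash is derived from the workspace path, two checkouts of the same project in different directories get separate output bases, and expunging one leaves the other untouched.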