Monorepo? Think Twice
  • Tawhid Hannan

    I'll lead by saying that monorepos can be pretty awesome. They enable developers to make cross-cutting changes with ease. You may want this capability to keep services from straggling behind the pack, and to check that nothing is running old or insecure code. There's a lot of good there, and we shouldn't throw that away. With that said, there are some aspects to think about with a monorepo, and there's some interesting discussion to have here.

    How ready is your build system?

    A 'manyrepo' usually makes building and deploying applications pretty safe. A single repository will have build scripts tailored to its purpose. It doesn't take much to make it rebuild only when it must, enabling us to keep 'waste' down. In and of itself, it's simple. In a monorepo configuration, we still want to build only what we need to. Anything else slows the build down, and that hurts us when trying to keep feedback loops tight.
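
    As a rough illustration of "build only what we need to", here is a minimal sketch that works out which packages to rebuild from a list of changed files. The `packages/<name>/...` layout, the package names and the `DEPENDS_ON` graph are all assumptions made up for the example, not something any particular tool prescribes.

```python
from collections import defaultdict, deque

# Hypothetical dependency graph: package -> packages it depends on.
DEPENDS_ON = {
    "service-a": {"lib-c"},
    "service-b": {"lib-c"},
    "service-d": set(),
    "lib-c": set(),
}


def owning_package(path):
    """Map a changed file to its package, assuming a packages/<name>/... layout."""
    parts = path.split("/")
    if len(parts) >= 3 and parts[0] == "packages":
        return parts[1]
    return None


def affected_packages(changed_files, depends_on):
    """Return the packages that changed, plus everything that depends on them."""
    # Invert the graph so we can walk from a package to its dependants.
    dependants = defaultdict(set)
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependants[dep].add(pkg)

    changed = {owning_package(f) for f in changed_files} - {None}
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependants[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


print(sorted(affected_packages(["packages/lib-c/index.js"], DEPENDS_ON)))
# ['lib-c', 'service-a', 'service-b']
print(sorted(affected_packages(["packages/service-d/app.js"], DEPENDS_ON)))
# ['service-d']
```

    A change to the shared library rebuilds both of its dependants, while a change to an isolated service rebuilds only that service. A real build system also has to account for things like root-level config files, which this sketch ignores.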

    I've been distant from the Node.JS ecosystem for a while, so I'm not up to date with more recent goings-on. I can, though, relate my experiences and challenges at the time. The popular tools for managing monorepos in the JS ecosystem used a particular technique, usually called hoisting. As an example, if package A and package B both depend on package C, these tools will store package C in the root node_modules directory for the project. This has some pretty nice wins when it comes to pulling dependencies for your monorepo. Instead of 50 copies of a utility library, such as Lodash, you instead have one or two copies. This makes downloading dependencies for the monorepo quicker, as we're downloading fewer files. Each package may also have co-located node_modules folders for dependencies that aren't shared. When run on your machine, this works without a hitch.
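
    To make the idea concrete, here is a deliberately simplified model of the placement described above: a dependency version requested by two or more packages goes to the root, and everything else stays next to the package that asked for it. The package names and versions are hypothetical, and this is not the actual algorithm used by any particular tool.

```python
from collections import Counter

# Hypothetical packages and the exact dependency versions they ask for.
PACKAGES = {
    "package-a": {"lodash": "4.17.21", "left-pad": "1.3.0"},
    "package-b": {"lodash": "4.17.21"},
    "package-c": {"lodash": "3.10.1"},
}


def hoist(packages):
    """Split dependencies into a shared root set and per-package local sets.

    Returns (root, local): root maps name -> version, and local maps
    package -> {name: version} for everything that was not hoisted.
    """
    requests = Counter(
        (name, version)
        for deps in packages.values()
        for name, version in deps.items()
    )

    root = {}
    # most_common() yields the most requested (name, version) pairs first,
    # so setdefault keeps the most popular shared version of each name.
    for (name, version), count in requests.most_common():
        if count >= 2:
            root.setdefault(name, version)

    local = {}
    for pkg, deps in packages.items():
        unshared = {n: v for n, v in deps.items() if root.get(n) != v}
        if unshared:
            local[pkg] = unshared
    return root, local


root, local = hoist(PACKAGES)
print(root)   # {'lodash': '4.17.21'}
print(local)  # {'package-a': {'left-pad': '1.3.0'}, 'package-c': {'lodash': '3.10.1'}}
```

    The result is one shared copy of Lodash at the root, with the odd-one-out version and the unshared dependency kept alongside their packages.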

    The issue comes when you want to containerise a JS application built in a monorepo. Let's say you're using Docker to containerise your Node.JS applications and run them in production. The dependencies for the Node.JS application must be available in the container. Generally, that means copying the node_modules folders that your package relies on. With hoisting, these tools store dependencies at various levels of your monorepo. The end result is that your build script must copy files from various levels of the repository. You could use bundling tools such as Webpack, but on the server side that makes the build process complex in other ways.
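
    Here is a small sketch of what "various levels" means in practice. Node.js resolves a bare import by checking node_modules in the current directory and then in each parent directory, so a build script may need every one of those folders. The `/repo` paths are hypothetical, and stopping the walk at the repository root is an assumption of this example.

```python
from pathlib import PurePosixPath


def node_modules_candidates(repo_root, package_dir):
    """List node_modules directories from the package up to the repo root.

    This mirrors the order in which Node.js looks for a dependency:
    nearest node_modules first, then each parent directory in turn.
    """
    root = PurePosixPath(repo_root)
    current = PurePosixPath(package_dir)
    if current != root and root not in current.parents:
        raise ValueError(f"{package_dir} is not inside {repo_root}")

    candidates = []
    while True:
        candidates.append(str(current / "node_modules"))
        if current == root:
            break
        current = current.parent
    return candidates


for path in node_modules_candidates("/repo", "/repo/packages/service-a"):
    print(path)
# /repo/packages/service-a/node_modules
# /repo/packages/node_modules
# /repo/node_modules
```

    Not all of these folders will exist in a given repository, but any that do may hold something the service needs at runtime.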

    That's not all. To make the process of building images quicker, Docker leverages image layer caching. Image layer caching speeds up builds through a simple heuristic: if the inputs to a particular step in a Dockerfile are unchanged, there's no need to re-run it. To confirm that copied files are the same, Docker computes a checksum for the files and compares it with the checksum recorded for the cached layer. When the files haven't changed, the checksum is the same, and Docker can skip the copy step. If you're copying a large number of files, as you tend to with Node.JS services, this can save a lot of time.
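
    The heuristic can be sketched as a toy cache keyed by a checksum of the copied files. This is only a model of the idea, not Docker's implementation; the `LayerCache` class and the file contents are invented for the example.

```python
import hashlib


def files_checksum(files):
    """Digest a mapping of relative path -> file bytes, independent of dict order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()


class LayerCache:
    """Toy cache: one stored layer per checksum of the copied files."""

    def __init__(self):
        self._layers = {}
        self.copies_run = 0

    def copy_step(self, files):
        key = files_checksum(files)
        if key not in self._layers:
            # Cache miss: the copy actually has to run.
            self.copies_run += 1
            self._layers[key] = dict(files)
        return key


cache = LayerCache()
app = {"package.json": b'{"name": "service-a"}', "index.js": b"console.log('hi')"}

cache.copy_step(app)                                          # first build: miss
cache.copy_step(dict(app))                                    # same content: hit
cache.copy_step({**app, "index.js": b"console.log('bye')"})   # changed file: miss
print(cache.copies_run)  # 2
```

    Three builds result in only two real copies, because the second build's files hash to the same key as the first.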

    But the hoisting of dependencies makes it challenging to leverage the Docker cache. Changes in one project or service can cause shared node_modules directories to change. This means that when Docker checks whether the files for the application have changed, it picks up changes made for another project. Docker then re-runs the step, wasting time and CPU cycles. It looks like the Yarn 2 architecture enables the creation of plugins to make managing this easier, which is great.
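
    A small sketch of that failure mode, assuming service A's image copies its own files plus the shared root node_modules. All paths and file contents are hypothetical, and the checksum is a stand-in for whatever the real cache computes.

```python
import hashlib


def checksum(files):
    """Digest a mapping of path -> bytes in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()


# Hypothetical contents, keyed by path relative to the repo root.
root_modules = {"node_modules/lodash/index.js": b"lodash"}
service_a = {"packages/service-a/index.js": b"require('lodash')"}


def copy_inputs(shared_modules, service_files):
    """Everything the service's copy step sees: shared modules plus its own files."""
    return {**shared_modules, **service_files}


before = checksum(copy_inputs(root_modules, service_a))

# Service B adds a dependency, which is hoisted into the shared root node_modules.
root_modules_after = {**root_modules, "node_modules/left-pad/index.js": b"left-pad"}
after = checksum(copy_inputs(root_modules_after, service_a))

print(before == after)  # False: service A's layer is rebuilt
```

    Service A's own files never changed, yet its copy step no longer matches the cached layer because the shared directory did.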

    Now, the Node.JS monorepo ecosystem will, over time, improve this process even further. Containers, and especially Docker, are more prevalent than they were five years ago. Docker is too big for nobody to be motivated to solve the problem. With that said, if you were using Node.JS before these plugins were a thing, or perhaps you're not using Yarn 2 yet, the process needs a bit of investment. Bits of jank here and there add up and can slow down development workflows significantly.

    The point here is not to fixate on Docker or Node.JS. The Node.JS ecosystem builds applications in different ways to the Golang ecosystem. You may not use Docker! A rule of thumb here is to audit the things that make up your build/deploy system today and understand how you can maintain build isolation in a monorepo. Figure out the workflows your developers follow.

    How ready is your codebase?

    A monorepo makes cross-cutting changes easier.

    But ask yourself, is that what you need to optimise for? In an architecture spanning many services, managing change across them becomes tricky. Let's say for instance a developer identifies a critical issue or vulnerability in an internal package. They patch it and release the patch to the package, and now we want to verify that all dependants are using the new code. A monorepo makes that trivial. You may not be at that scale though, and a good exercise to carry out would be to characterise the kinds of changes you do make to your current codebase. How frequently are cross-cutting changes being made? To which parts? For what reason?
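
    One rough way to start answering those questions is to count how many changes touch more than one package. The sketch below assumes you can get a list of files per change (for instance from your version control history) and that packages live under `packages/<name>/`; the sample data is made up.

```python
from collections import Counter

# Hypothetical history: each entry is the list of files one change touched.
COMMITS = [
    ["packages/service-a/index.js"],
    ["packages/lib-c/util.js", "packages/service-a/index.js", "packages/service-b/app.js"],
    ["packages/service-b/app.js", "packages/service-b/test.js"],
    ["packages/lib-c/util.js", "packages/service-b/app.js"],
]


def packages_touched(files):
    """Return the set of package names a change touched."""
    touched = set()
    for path in files:
        parts = path.split("/")
        if len(parts) >= 3 and parts[0] == "packages":
            touched.add(parts[1])
    return touched


def summarise(commits):
    """Count cross-cutting changes and how often each package is involved in one."""
    cross_cutting = 0
    involved = Counter()
    for files in commits:
        touched = packages_touched(files)
        if len(touched) > 1:
            cross_cutting += 1
            involved.update(touched)
    return cross_cutting, involved


cross_cutting, involved = summarise(COMMITS)
print(f"{cross_cutting} of {len(COMMITS)} changes are cross-cutting")
# 2 of 4 changes are cross-cutting
print(dict(involved))
```

    The counts answer "how frequently" and "to which parts"; the "for what reason" still needs someone to read the changes.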

    You may find that you've got many cross-cutting changes, but that they're due more to a lack of modularity or to overuse of a particular package or dependency. A trick I like to use here is to think of the 'fast-changing' and the 'slow-changing' parts of a codebase. If you're moving to a monorepo because keeping one or more 'fast-changing' components in sync is painful, it may be that too many things are relying on them. By restructuring the codebase, you can lower the rate of change in the code that everything depends on, and through that, make the need for a monorepo less pressing.
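
    As a sketch of the 'fast-changing' versus 'slow-changing' idea, the snippet below ranks packages by how often they change multiplied by how many packages depend on them directly. The scoring formula, the change counts and the dependency graph are all assumptions for illustration, not an established metric.

```python
# Hypothetical inputs: changes per package over some period, and who depends on what.
CHANGE_COUNT = {"lib-c": 40, "lib-d": 3, "service-a": 25, "service-b": 10}
DEPENDS_ON = {
    "service-a": {"lib-c", "lib-d"},
    "service-b": {"lib-c"},
    "lib-c": {"lib-d"},
    "lib-d": set(),
}


def sync_pressure(change_count, depends_on):
    """Score each package as changes x direct dependants, highest first."""
    dependants = {pkg: 0 for pkg in change_count}
    for deps in depends_on.values():
        for dep in deps:
            dependants[dep] = dependants.get(dep, 0) + 1

    scores = {pkg: change_count[pkg] * dependants[pkg] for pkg in change_count}
    # Sort by score descending, then by name so ties are stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for pkg, score in sync_pressure(CHANGE_COUNT, DEPENDS_ON):
    print(pkg, score)
# lib-c 80
# lib-d 6
# service-a 0
# service-b 0
```

    A package at the top of this list is both fast-changing and widely relied upon, which makes it a candidate for restructuring before reaching for a monorepo.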

    Calling It

    A monorepo is used in many companies for good reason, enabling some powerful workflows, and when it clicks it's fantastic. With that said, a monorepo is a commitment. Many of the places that popularised the monorepo also throw a lot of developer time into making it smooth. A monorepo might be the best thing for your project or team, but you need to figure out your workflows and what you want out of it. A slip-up here can lead to a bloated, slow and all-around frustrating experience for builds, in service of a workflow edge case.