Understanding Dependency Injection

By Laza Upatising on 2021-03-24

This article continues a series in which I attempt to bridge the gap between computer science as taught in college and software engineering as practiced in industry. The intended audience is a junior engineer. If you haven't read it yet, you can also check out Good Commit Messages.

There are a lot of articles about dependency injection. What this article brings to the table is a straightforward explanation of dependency injection without relying on frameworks and grounded in code.

Key Concepts

Dependency injection can be best understood in conjunction with other key concepts. For completeness, we will briefly cover them.

Modules

Software systems are broken down into modules. A module generally refers to code that revolves around a single concern. For the purposes of understanding dependency injection, we assume a module to be a single class, which is often the case in object-oriented languages. In practice, modules can take many other forms. A module depends on other modules to achieve its desired functionality. We refer to these other modules as dependencies.

As an example, take a User class, which represents a user in a web application.

class User {
  public:
    // Loads User with id from the datastore.
    bool Load(uint64 id);

    // Verifies that new_username is acceptable, and persists the new username into storage.
    // Returns whether the operation succeeded.
    bool UpdateUsername(std::string new_username);

    // Verifies that new_password is acceptable, and persists the new password into storage.
    // Returns whether the operation succeeded.
    bool UpdatePassword(std::string new_password);
};

In this example, we can see that the User class depends on an underlying datastore module in order to persist and load user information.

Interfaces

Interfaces provide a well-defined way to interact with modules. An interface definition captures all possible ways to interact with a module. In many object-oriented languages, interfaces are defined using an abstract base class. By using an abstract base class, the interface definition exposes only method signatures and leaves out implementation details.

Let us continue with the User example. As mentioned previously, the User class relies on an underlying datastore in order to persist and load user information. The datastore dependency can be modeled with a DatastoreInterface.

class DatastoreInterface {
  public:
    // Stores data in association with id. Returns whether the operation succeeded.
    virtual bool Store(uint64 id, std::string data) = 0;

    // Loads data associated with id. Returns an empty string if not found.
    virtual std::string Load(uint64 id) = 0;
};

// Continued from the example in the Modules section above.
class User {
  public:
    // Initializes User with the given datastore interface.
    explicit User(DatastoreInterface* datastore);

    // Other public methods: Load, UpdateUsername, UpdatePassword.
};
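
To make the interface concrete, here is a minimal sketch of one possible implementation. InMemoryDatastore and ExampleRoundTrip are hypothetical names that do not appear elsewhere in this article. The sketch also assumes std::uint64_t in place of uint64 and adds a virtual destructor to the interface, which is a common precaution rather than something the example strictly requires.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// The interface from above, written with std::uint64_t and a virtual
// destructor so implementations can be destroyed through a base pointer.
class DatastoreInterface {
  public:
    virtual ~DatastoreInterface() = default;
    virtual bool Store(std::uint64_t id, std::string data) = 0;
    virtual std::string Load(std::uint64_t id) = 0;
};

// Hypothetical implementation (an assumption for illustration) that keeps
// all records in a hash map instead of a real database.
class InMemoryDatastore : public DatastoreInterface {
  public:
    bool Store(std::uint64_t id, std::string data) override {
      records_[id] = std::move(data);
      return true;
    }

    // Returns an empty string if id was never stored.
    std::string Load(std::uint64_t id) override {
      auto it = records_.find(id);
      return it == records_.end() ? std::string() : it->second;
    }

  private:
    std::unordered_map<std::uint64_t, std::string> records_;
};

// Callers hold the interface type and never name the concrete class.
void ExampleRoundTrip(DatastoreInterface* datastore) {
  assert(datastore->Store(7, "payload"));
  assert(datastore->Load(7) == "payload");
}
```

Any class that implements Store and Load in this way can be handed to code that expects a DatastoreInterface*.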

Unit Testing

Unit tests are the smallest tests in an application's suite of tests that provide value. They are focused on a small 'unit' of the application. Unit tests are commonly used to verify the public methods of a class, or the interface of a module. Unit tests are generally focused on ensuring individual units of code are well behaved, and provide rapid feedback during software development.

Unlike integration tests, which verify the behavior of connected modules, unit tests are focused on the functionality within a module. Often, a unit test exercises and verifies a single method of a class.

What is Dependency Injection?

Dependency injection is a design principle in which we structure a module so that its dependencies can be provided from outside and interchanged. When the module is initialized at run time, its dependencies are provided, or 'injected', resulting in a fully functional module.

The design principle is abstract, as one can draw any imaginary line to bound a module's responsibilities. We will spend the next couple of sections providing concrete examples in order to illustrate how to apply the design principle as well as highlight some major benefits of structuring modules this way.
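
Before the full example, the idea can be sketched in a few lines. Everything below is an assumption made for illustration: MapDatastore, UnavailableDatastore, and SimpleUser are hypothetical names, and SimpleUser is a cut-down stand-in whose stored record is just the username. The point is only that the same module code behaves differently depending on which dependency is injected.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

class DatastoreInterface {
  public:
    virtual ~DatastoreInterface() = default;
    virtual bool Store(std::uint64_t id, std::string data) = 0;
    virtual std::string Load(std::uint64_t id) = 0;
};

// Hypothetical dependency #1: keeps records in memory.
class MapDatastore : public DatastoreInterface {
  public:
    bool Store(std::uint64_t id, std::string data) override {
      records_[id] = std::move(data);
      return true;
    }
    std::string Load(std::uint64_t id) override {
      auto it = records_.find(id);
      return it == records_.end() ? std::string() : it->second;
    }

  private:
    std::map<std::uint64_t, std::string> records_;
};

// Hypothetical dependency #2: simulates a datastore that is down.
class UnavailableDatastore : public DatastoreInterface {
  public:
    bool Store(std::uint64_t, std::string) override { return false; }
    std::string Load(std::uint64_t) override { return ""; }
};

// Simplified stand-in for User. The dependency arrives through the
// constructor; SimpleUser never constructs a datastore itself.
class SimpleUser {
  public:
    SimpleUser(std::uint64_t id, DatastoreInterface* datastore)
        : id_(id), datastore_(datastore) {}

    bool UpdateUsername(std::string new_username) {
      if (new_username.size() < 3) return false;  // At least 3 characters.
      username_ = std::move(new_username);
      return datastore_->Store(id_, username_);
    }

  private:
    std::uint64_t id_;
    std::string username_;
    DatastoreInterface* datastore_;  // Injected, not owned.
};

// The caller decides which dependency each module instance receives.
void ExampleInjection() {
  MapDatastore working;
  UnavailableDatastore broken;
  SimpleUser a(1, &working);
  SimpleUser b(2, &broken);
  assert(a.UpdateUsername("alice"));  // Stored successfully.
  assert(!b.UpdateUsername("bob"));   // Valid name, but the store fails.
}
```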

User Module Implementations

Coming back to the User module example, below is a 'good' implementation of the User module. The module explicitly takes in the DatastoreInterface dependency, and delegates all datastore related tasks to that dependency.

class User {
  public:
    explicit User(DatastoreInterface* datastore) : datastore_(datastore) {}

    bool Load(uint64 id) {
      bool success = Deserialize(datastore_->Load(id));
      if (success) id_ = id;
      return success;
    }

    bool UpdateUsername(std::string new_username) {
      bool acceptable = UsernameAcceptable(new_username);
      if (!acceptable) return false;
      username_ = new_username;
      return datastore_->Store(id_, /*data=*/Serialize());
    }

    bool UpdatePassword(std::string new_password) {
      bool acceptable = PasswordAcceptable(new_password);
      if (!acceptable) return false;
      password_hash_ = Hash(new_password);
      return datastore_->Store(id_, /*data=*/Serialize());
    }

  private:
    // Returns whether the password is at least 6 characters long.
    bool PasswordAcceptable(std::string password);
    // Returns whether the username is at least 3 characters long.
    bool UsernameAcceptable(std::string username);

    // Serializes this User object into a string suitable for storage.
    std::string Serialize();

    // Deserializes the given string into this User object.
    // Returns whether the operation was successful.
    bool Deserialize(std::string in);

    // Hashes str.
    std::string Hash(std::string str);

    uint64 id_;
    std::string username_;
    std::string password_hash_;

    DatastoreInterface* datastore_;
};

We see that the implementations of the various public methods deal solely with business logic related to User, delegating data storage tasks to the DatastoreInterface. Moreover, User relies only on the public methods of DatastoreInterface. Indeed, the type system forces us to use only the DatastoreInterface's methods - there are no other methods to call!

When implementing modules that are designed around well defined interfaces and dependency injection, we are forced to write focused code that relies only on publicly available interfaces of dependencies. To further illustrate this point, here is a bad example of a User module that relies directly on an underlying datastore implementation.

class User {
  public:
    bool Load(uint64 id) {
      std::ostringstream sql;  // From <sstream>; std::ostream has no str().
      sql << "SELECT * FROM Users WHERE id = " << id;
      return Deserialize(SQLite::Execute(sql.str()));
    }

    bool UpdateUsername(std::string new_username) {
      bool acceptable = UsernameAcceptable(new_username);
      if (!acceptable) return false;
      std::ostringstream sql;
      sql << R"(UPDATE Users SET username = ")"
          << SanitizeSQL(new_username)
          << R"(" WHERE id = )" << id_;
      return SQLite::Execute(sql.str());
    }

    bool UpdatePassword(std::string new_password) {
      bool acceptable = PasswordAcceptable(new_password);
      if (!acceptable) return false;
      std::ostringstream sql;
      sql << R"(UPDATE Users SET password_hash = ")"
          << Hash(new_password)
          << R"(" WHERE id = )" << id_;
      return SQLite::Execute(sql.str());
    }

  private:
    // ...
};

We can already begin to see some issues arising from directly relying on the datastore, such as the added complexity of SQL sanitization. Additionally, we can easily imagine extending the module to rely on specific behaviors of SQLite. As the module's functionality grows, it will become increasingly brittle and complex as we mix user business logic with datastore-related logic.

Real World Scenarios

Let's run through some real world scenarios that may come up and compare how the good and bad implementations above handle them in order to further highlight the benefits of dependency injection.

Our web application has exploded in popularity and users are experiencing significant load times. Performance profiling indicates that the loading of User objects is a significant performance bottleneck. Introducing a caching layer for User data can significantly improve performance.
- In the good example, we write a new DatastoreInterface implementation that first consults a cache. We then pass the new interface implementation to User's constructor. The change introduces a new caching DatastoreInterface implementation with associated tests. The change concisely addresses the business need.
- In the bad example, we are forced to rework the User::Load method. We not only have to implement loading data from the cache, but we need to parse and understand the User related business logic in User::Load, potentially introducing additional changes to the business logic.

SQLite is running into scaling issues, and we would like to migrate to PostgreSQL.
- In the good example, we write a new DatastoreInterface wrapper around PostgreSQL and pass the object to the User's constructor.
- In the bad example, we need to replace all SQLite::Execute calls with the compatible PostgreSQL calls. Moreover, we need to verify that all SQL issued is compatible with PostgreSQL.

We would like to audit our code for SQL injection attacks.
- In the good example, we would need to investigate the various implementations of DatastoreInterface that deal directly with issuing SQL. The auditor only needs to understand the SQL statements issued by the various DatastoreInterface implementations.
- In the bad example, we need to look through all methods of the User class and verify that the SQL statements do not interpolate raw user input. Moreover, the auditor needs to read and verify User related business logic. For example, the auditor needs to understand that id_ has a uint64 type which can be safely interpolated into SQL, whereas new_username could be a user input string and should be sanitized.
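
The caching scenario for the good example might look roughly like the sketch below. CachingDatastore and CountingDatastore are hypothetical names introduced here, the write-through policy is one assumption among several reasonable ones, and cache eviction and invalidation are left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

class DatastoreInterface {
  public:
    virtual ~DatastoreInterface() = default;
    virtual bool Store(std::uint64_t id, std::string data) = 0;
    virtual std::string Load(std::uint64_t id) = 0;
};

// Hypothetical stand-in for a slow datastore; counts calls to Load.
class CountingDatastore : public DatastoreInterface {
  public:
    bool Store(std::uint64_t id, std::string data) override {
      records_[id] = std::move(data);
      return true;
    }
    std::string Load(std::uint64_t id) override {
      ++load_calls_;
      auto it = records_.find(id);
      return it == records_.end() ? std::string() : it->second;
    }
    int load_calls() const { return load_calls_; }

  private:
    std::unordered_map<std::uint64_t, std::string> records_;
    int load_calls_ = 0;
};

// Hypothetical caching layer: a DatastoreInterface that wraps another one.
class CachingDatastore : public DatastoreInterface {
  public:
    explicit CachingDatastore(DatastoreInterface* backing) : backing_(backing) {}

    bool Store(std::uint64_t id, std::string data) override {
      // Write through: update the cache only if the backing store accepted it.
      if (!backing_->Store(id, data)) return false;
      cache_[id] = std::move(data);
      return true;
    }

    std::string Load(std::uint64_t id) override {
      auto it = cache_.find(id);
      if (it != cache_.end()) return it->second;  // Cache hit.
      std::string data = backing_->Load(id);
      if (!data.empty()) cache_[id] = data;  // Empty means "not found".
      return data;
    }

  private:
    DatastoreInterface* backing_;  // Injected, not owned.
    std::unordered_map<std::uint64_t, std::string> cache_;
};

// The second Load is served from the cache, so the backing store sees one call.
void ExampleCacheHit() {
  CountingDatastore slow;
  slow.Store(1, "alice");
  CachingDatastore cached(&slow);
  cached.Load(1);
  cached.Load(1);
  assert(slow.load_calls() == 1);
}
```

Because CachingDatastore is itself a DatastoreInterface, it can be passed to User's constructor without touching User.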

We will now touch upon another major advantage of designing a module around dependency injection: testing.

Unit Testing

Let's start with the question "How do I write the best unit test for the User module?" while keeping in mind the following desirable properties of unit tests:

Hermetic - unit tests should not break when an underlying dependency or state changes.
Focused - individual unit tests verify the behavior of a single piece of functionality, often a single public method.
Fast - unit tests should provide rapid actionable feedback as they are typically part of an edit-build-test cycle.

In many systems, pulling in a database in a unit test is highly undesirable. Databases are sophisticated pieces of software that make tests non-hermetic and slow. Bringing in a database means that our unit tests are implicitly exercising and verifying the behavior of the underlying database. Database implementation issues can cause User module unit tests to fail, resulting in flaky tests.

The ideal unit tests for the User module would verify only the business logic around User:

The logic around verifying whether a new password is acceptable.
The logic around verifying whether a new username is acceptable.
The logic around deserializing a user object from a string returned from the data store.

Dependency injection allows us to inject a mock DatastoreInterface object into User's unit tests. By using a mock object, we allow the unit tests to focus only on verifying the behavior of User, rather than also depending on the behavior of the underlying datastore.

void TestUserDeserialization() {
  MockDatastoreInterface datastore;
  datastore.OnLoad(/*id=*/1).Returns("{serialized user object}");
  User u(&datastore);
  assert(u.Load(1));
  assert(u.username() == "test_username");
}

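void TestUserValidPassword() {
  MockDatastoreInterface datastore;
  // Load the user first so that id_ is set before the update is stored.
  datastore.OnLoad(/*id=*/1).Returns("{serialized user object}");
  datastore.ExpectCallToStore(
      /*id=*/1, /*data=*/"{serialized user object with hashed '111111' password}");

  User u(&datastore);
  assert(u.Load(1));
  assert(u.UpdatePassword("111111"));
}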

void TestUserInvalidPassword() {
  // Test fails if Store is called on datastore without a prior ExpectCallToStore statement.
  MockDatastoreInterface datastore;
  User u(&datastore);
  assert(!u.UpdatePassword("bad"));
}
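
The tests above assume a mocking facility. If none is available, a hand-written test double can serve the same purpose. The sketch below is one way to do that under stated assumptions: RecordingDatastore and SimpleUser are hypothetical names, SimpleUser is a cut-down stand-in for User whose serialized form is the bare username, and the check exercised is the username rule (at least 3 characters).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

class DatastoreInterface {
  public:
    virtual ~DatastoreInterface() = default;
    virtual bool Store(std::uint64_t id, std::string data) = 0;
    virtual std::string Load(std::uint64_t id) = 0;
};

// Hand-written test double (hypothetical): returns canned data from Load
// and records every Store call for the test to inspect.
class RecordingDatastore : public DatastoreInterface {
  public:
    void SetLoadResult(std::uint64_t id, std::string data) {
      canned_[id] = std::move(data);
    }

    bool Store(std::uint64_t id, std::string data) override {
      store_calls.emplace_back(id, std::move(data));
      return true;
    }

    std::string Load(std::uint64_t id) override {
      auto it = canned_.find(id);
      return it == canned_.end() ? std::string() : it->second;
    }

    std::vector<std::pair<std::uint64_t, std::string>> store_calls;

  private:
    std::map<std::uint64_t, std::string> canned_;
};

// Simplified stand-in for User (an assumption for illustration only).
class SimpleUser {
  public:
    explicit SimpleUser(DatastoreInterface* datastore) : datastore_(datastore) {}

    bool Load(std::uint64_t id) {
      std::string data = datastore_->Load(id);
      if (data.empty()) return false;
      id_ = id;
      username_ = std::move(data);
      return true;
    }

    bool UpdateUsername(std::string new_username) {
      if (new_username.size() < 3) return false;
      username_ = std::move(new_username);
      return datastore_->Store(id_, username_);
    }

    const std::string& username() const { return username_; }

  private:
    std::uint64_t id_ = 0;
    std::string username_;
    DatastoreInterface* datastore_;  // Injected, not owned.
};

// A rejected username must never reach the datastore.
void TestInvalidUsernameIsNeverStored() {
  RecordingDatastore datastore;
  datastore.SetLoadResult(1, "alice");
  SimpleUser u(&datastore);
  assert(u.Load(1));
  assert(!u.UpdateUsername("al"));
  assert(datastore.store_calls.empty());
  assert(u.username() == "alice");
}
```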

When attempting to write unit tests for the bad example above, we see that the tests are polluted with SQL code.

void TestUserValidPassword() {
  SQLite::Execute("INSERT INTO Users (id, username, password_hash) VALUES (1, 'username', 'hashed_pass')");

  User u;
  u.set_id(1);
  assert(u.UpdatePassword("111111"));

  assert(SQLite::Execute("SELECT password_hash FROM Users WHERE id = 1") == Hash("111111"));
}

void TestUserInvalidPassword() {
  SQLite::Execute("INSERT INTO Users (id, username, password_hash) VALUES (1, 'username', 'hashed_pass')");

  User u;
  u.set_id(1);
  assert(!u.UpdatePassword("bad"));

  // Assert the user's password_hash has not changed.
  assert(SQLite::Execute("SELECT password_hash FROM Users WHERE id = 1") == "hashed_pass");
}

If we run through the same set of Real World Scenarios listed above, we see that the tests for the implementation designed around dependency injection do not need to change at all! The public interface between the User and DatastoreInterface modules is unchanged, and therefore the tests are unaffected. However, the tests for the bad example will need major reworking when introducing a caching layer or swapping databases.

Module Boundaries

The benefits that dependency injection brings to module design rest on well-thought-out boundaries for each module's responsibilities.

As an example, suppose we introduce a new business requirement that users from the '@high-security.org' domain must have passwords of at least 8 characters, whereas normal users must have passwords of at least 6 characters. One could draw the boundary of responsibility for the User module so that it does not include the PasswordAcceptable check, delegating the acceptability check to a new PasswordAcceptable module.
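
One way such a split could look is sketched below. PasswordPolicyInterface, MinLengthPolicy, and SimpleUser are hypothetical names introduced for this illustration, the "at least N characters" comparison is an assumption, and hashing and persistence are left out so that only the delegated check remains.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical interface for the password acceptability check.
class PasswordPolicyInterface {
  public:
    virtual ~PasswordPolicyInterface() = default;
    virtual bool Acceptable(const std::string& password) const = 0;
};

// Accepts passwords that have at least min_length characters.
class MinLengthPolicy : public PasswordPolicyInterface {
  public:
    explicit MinLengthPolicy(std::size_t min_length) : min_length_(min_length) {}

    bool Acceptable(const std::string& password) const override {
      return password.size() >= min_length_;
    }

  private:
    std::size_t min_length_;
};

// Simplified stand-in for User that delegates the check to an injected policy.
class SimpleUser {
  public:
    explicit SimpleUser(const PasswordPolicyInterface* policy) : policy_(policy) {}

    // Hashing and storage are omitted; only the delegated check is shown.
    bool UpdatePassword(const std::string& new_password) {
      if (!policy_->Acceptable(new_password)) return false;
      password_set_ = true;
      return true;
    }

    bool password_set() const { return password_set_; }

  private:
    const PasswordPolicyInterface* policy_;  // Injected, not owned.
    bool password_set_ = false;
};

// The same 7-character password passes one policy and fails the other.
void ExamplePolicies() {
  MinLengthPolicy normal(6);
  MinLengthPolicy strict(8);
  SimpleUser regular_user(&normal);
  SimpleUser high_security_user(&strict);
  assert(regular_user.UpdatePassword("1234567"));
  assert(!high_security_user.UpdatePassword("1234567"));
}
```

With this split, choosing the policy for a given user happens wherever User objects are constructed, and User itself no longer needs to know about email domains.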

At the extreme, one could turn all the private methods of the 'good' example (PasswordAcceptable, UsernameAcceptable, Serialize, Deserialize, and Hash) into their own separate modules and have User depend on those. This split involves writing a significant amount of additional code, as each new module should be defined by a public interface. The decision to limit a module's responsibilities and introduce a new dependency is a balancing act between extensibility on the one hand, and increased development velocity on the other.

Conclusion

Dependency injection is an abstract concept revolving around the design of software systems. The design principle allows us to build systems out of modular components, injecting and interchanging dependent components as the system evolves. Used judiciously, dependency injection brings multiple benefits to a software project such as extensibility, modularity, and testability. However, like all things in software engineering, there is a delicate balancing act between well designed and over designed.

Experience and battle scars gained through debugging and refactoring complex codebases can help us appreciate the various benefits offered by dependency injection, as well as understand some of the drawbacks. I hope the User example in this article offered a real-world glimpse into how one would design around dependency injection.

This is my first attempt at explaining dependency injection in an accessible way - please let me know how I can improve this article! Send me a message at blog@ this domain.
