How to Break a Big Ball of Mud?

Many non-trivial systems end up as a big ball of mud, not because developers are lazy or reckless, but because that outcome is very hard to avoid without proper tooling. For example, if your architecture rules are spread by word of mouth or through a few articles in your company wiki, there is no way of knowing whether the code actually conforms to them. When rules are broken, most of the time developers are not even aware of it. That leads to the erosion of architectural boundaries (if they ever existed) and to more and more cyclic dependency groups. In the beginning the cyclic groups are small, but they grow like cancer in your codebase. I actually did some research on that by tracking some open source projects over time. That research confirmed my assumption: if you do not address the problem of ever-growing cyclic dependency groups, things will only get worse over time, in some cases much worse.

A dependency graph of a big ball of mud

A big class cycle in Apache Cassandra containing over 1500 classes

There is a reasonable chance that you are working on a big ball of mud right now and wonder how you can improve the situation. And at some point you have to do something, because this kind of structural erosion is a giant burden on developer productivity. Remember that developers spend most of their time reading code. If the code is hard to understand, if dependencies are hard to understand, the developer will need a lot more time to complete a task and the risk of introducing regression bugs is multiple times higher than normal.

If your biggest class cycle is smaller than 100 elements you might get away with visual analysis and some simulated refactorings. I have recorded a video that showcases a good example of how to do that. But if you have a more severe case with hundreds or even thousands of classes, you need a better strategy. In this article I will discuss a few ideas that will help you to improve the situation. Keep in mind that this process will take time; it is not something that you can do in a couple of days. It will require a coordinated effort that will span months. But in the end it will be worth it and will increase developer productivity significantly.

Here are the ideas, which I will explain in more detail below:

  • Which cycles do I need to address first
  • Identify classes that would benefit from interface substitution
  • Categorize classes using annotations
  • Avoid backsliding by introducing enforceable architectural boundaries

Which cycle groups to address first

When your system can be described as a big ball of mud, you will have different categories of cyclic dependencies. The most important categories are cycles between source files (components in Sonargraph) and cycles between namespaces/packages. Considering that many namespace/package cycles are mainly caused by underlying component cycles, it is best to start with the component cycles. An improvement there will automatically translate into improvements with namespace/package cycles.
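To make the term "cycle group" more concrete: one common way to compute such groups is to look for strongly connected components in the dependency graph, for example with Tarjan's algorithm. The following is only a minimal sketch under that assumption and says nothing about how Sonargraph computes its cycle groups internally. The class name CycleGroupFinder and the string-based graph representation are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: finds groups of components that all depend on each other. */
class CycleGroupFinder {
    private final Map<String, List<String>> graph;
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<String, Integer> lowLink = new HashMap<>();
    private final Deque<String> stack = new ArrayDeque<>();
    private final Set<String> onStack = new HashSet<>();
    private final List<Set<String>> groups = new ArrayList<>();
    private int counter = 0;

    private CycleGroupFinder(Map<String, List<String>> graph) {
        this.graph = graph;
    }

    /** Returns every strongly connected component with more than one member. */
    static List<Set<String>> find(Map<String, List<String>> graph) {
        CycleGroupFinder finder = new CycleGroupFinder(graph);
        for (String node : graph.keySet()) {
            if (!finder.index.containsKey(node)) {
                finder.visit(node);
            }
        }
        return finder.groups;
    }

    // Recursive Tarjan step; for very deep graphs an iterative variant would be safer.
    private void visit(String node) {
        index.put(node, counter);
        lowLink.put(node, counter);
        counter++;
        stack.push(node);
        onStack.add(node);
        for (String target : graph.getOrDefault(node, List.of())) {
            if (!index.containsKey(target)) {
                visit(target);
                lowLink.put(node, Math.min(lowLink.get(node), lowLink.get(target)));
            } else if (onStack.contains(target)) {
                lowLink.put(node, Math.min(lowLink.get(node), index.get(target)));
            }
        }
        if (lowLink.get(node).equals(index.get(node))) {
            // node is the root of a component: pop all of its members from the stack
            Set<String> component = new TreeSet<>();
            String member;
            do {
                member = stack.pop();
                onStack.remove(member);
                component.add(member);
            } while (!member.equals(node));
            if (component.size() > 1) {
                groups.add(component);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("A", List.of("B"));
        graph.put("B", List.of("C"));
        graph.put("C", List.of("A", "D"));
        graph.put("D", List.of());
        System.out.println(find(graph));
    }
}
```

In the small example graph A, B and C depend on each other in a circle, while D is only used by C and therefore is not part of any cycle group.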

Identify classes that would benefit from interface substitution

Here we try to find classes that contribute a lot of coupling, i.e. classes with lots of incoming and outgoing dependencies. We can actually measure their contribution to the cohesion of the cycle group by multiplying the number of incoming dependencies by the number of outgoing dependencies. We call the resulting number “Coupling Score”. The worst culprits in our big ball of mud are the classes with the highest coupling score.
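The calculation itself is simple enough to sketch in a few lines of Java. This is an illustration only: the class name CouplingScores is made up, the input is a plain map from each class to the classes it depends on, and it counts each depending class once, which may differ from how Sonargraph counts dependencies.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper: coupling score = incoming dependencies * outgoing dependencies. */
class CouplingScores {

    /** Takes the outgoing dependencies per class and returns the score per class. */
    static Map<String, Long> compute(Map<String, Set<String>> outgoing) {
        // Invert the edges to learn who depends on each class.
        Map<String, Set<String>> incoming = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : outgoing.entrySet()) {
            for (String target : entry.getValue()) {
                incoming.computeIfAbsent(target, k -> new HashSet<>()).add(entry.getKey());
            }
        }
        Set<String> allClasses = new HashSet<>(outgoing.keySet());
        allClasses.addAll(incoming.keySet());

        Map<String, Long> scores = new TreeMap<>();
        for (String name : allClasses) {
            long in = incoming.getOrDefault(name, Set.of()).size();
            long out = outgoing.getOrDefault(name, Set.of()).size();
            scores.put(name, in * out);
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> outgoing = Map.of(
                "A", Set.of("B", "C"),
                "B", Set.of("A", "C"),
                "C", Set.of("A"));
        // Print the classes with the highest coupling score first.
        compute(outgoing).entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}
```

Because the two numbers are multiplied, a class only gets a high score if it has many dependencies in both directions. A class that is used everywhere but depends on nothing itself gets a score of zero.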

With Sonargraph it is easy to identify the culprits. Open the cycle view on your biggest component cycle group and right-click to bring up the context menu. Select “Show in Cycle Element Metrics View” and the metrics of the cycle group elements will appear at the bottom of the screen:

The cycle element metrics view sorted by coupling score descending

Here we can see immediately that there are quite a few classes in Cassandra with very high coupling scores. So the culprits are easily identified. But what can we do about that? A good strategy is to add interfaces for the classes where it makes sense. We would use Robert C. Martin's “Dependency Inversion Principle”, which is known for reducing coupling. But which classes would be good candidates? Here we have to analyze incoming dependencies in more detail. When an incoming dependency only contains calls to non-static methods and type references, and the target itself is a class, we call that an interfaceable incoming dependency. It means we can replace the target of those dependencies with an interface to the class. That makes sense especially when the class itself has a lot of outgoing dependencies. So we created a second score called “Interface Score”, which we calculate by multiplying the number of interfaceable incoming dependencies (3rd column in screenshot above) by the number of outgoing dependencies. And here we can see that the class ColumnFamilyStore is not only the worst coupling culprit, but would also be the best candidate for interface substitution. We can also see that the class DatabaseDescriptor would be a poor candidate, although it has a very high coupling score.
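The rule for an interfaceable incoming dependency can be written down as a small predicate. The sketch below is an illustration under assumptions: the Usage enum and the IncomingDependency class are invented for this example and are not part of any Sonargraph API.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: interface score = interfaceable incoming * outgoing dependencies. */
class InterfaceScore {

    /** Illustrative kinds of usage a dependency can consist of. */
    enum Usage { INSTANCE_METHOD_CALL, TYPE_REFERENCE, STATIC_METHOD_CALL, FIELD_ACCESS, CONSTRUCTOR_CALL }

    /** One incoming dependency: who uses the target, and in which ways. */
    static final class IncomingDependency {
        final String from;
        final Set<Usage> usages;

        IncomingDependency(String from, Set<Usage> usages) {
            this.from = from;
            this.usages = usages;
        }
    }

    // Only these usages can be served by an interface instead of the concrete class.
    private static final Set<Usage> INTERFACEABLE =
            EnumSet.of(Usage.INSTANCE_METHOD_CALL, Usage.TYPE_REFERENCE);

    static boolean isInterfaceable(IncomingDependency dependency, boolean targetIsClass) {
        return targetIsClass
                && !dependency.usages.isEmpty()
                && INTERFACEABLE.containsAll(dependency.usages);
    }

    static long score(List<IncomingDependency> incoming, boolean targetIsClass, int outgoingCount) {
        long interfaceable = incoming.stream()
                .filter(d -> isInterfaceable(d, targetIsClass))
                .count();
        return interfaceable * outgoingCount;
    }

    public static void main(String[] args) {
        List<IncomingDependency> incoming = List.of(
                new IncomingDependency("ClientA", EnumSet.of(Usage.INSTANCE_METHOD_CALL, Usage.TYPE_REFERENCE)),
                new IncomingDependency("ClientB", EnumSet.of(Usage.STATIC_METHOD_CALL)),
                new IncomingDependency("ClientC", EnumSet.of(Usage.TYPE_REFERENCE)));
        System.out.println(score(incoming, true, 10));
    }
}
```

A single static call, field access or constructor call is enough to make an incoming dependency non-interfaceable. That is why a class that is mostly used through static methods can have a very high coupling score and still a low interface score.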

I recommend substituting interfaces for the top 3 to 5 classes by interface score. Before you do that, it is a good idea to create a baseline in Sonargraph so that you can measure the improvement. If everything goes well, we should at least see a reduction of the value of “Structural Debt Index” on the component level. This value is displayed at the bottom of the “Structure” section of the Sonargraph dashboard. This metric tells us how difficult it would be to untangle all cycles in your system. The more dependencies you have inside of cycle groups, the higher the value.

The structure section of the Sonargraph dashboard

Now create a feature branch in your version control system and add an interface for the first candidate. Modern IDEs can do that quite well and will automatically substitute the interface where it is appropriate. After you have done that, refresh the Sonargraph metrics and ensure that the value of the structural debt index (20,657 in the screenshot above) went down. If you created a baseline before, the difference will be displayed instead of “n/a”. If the value did not change or went up, you can undo the change. Also make sure you commit the changes to your feature branch, so that they are easy to undo if needed. Now repeat that for the other candidates you have identified.
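For readers who have not done this refactoring before, here is a minimal sketch of what the result looks like. All names (OrderStore, DatabaseOrderStore, OrderImporter) are hypothetical and have nothing to do with Cassandra. The point is only that the client now references the interface, while the heavily coupled class keeps its outgoing dependencies to itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Extracted interface: clients depend on this small abstraction only. */
interface OrderStore {
    void save(String orderId);

    int count();
}

/** The original, heavily coupled class. It now implements the extracted interface. */
class DatabaseOrderStore implements OrderStore {
    // Stands in for the real persistence code and its many outgoing dependencies.
    private final List<String> rows = new ArrayList<>();

    @Override
    public void save(String orderId) {
        rows.add(orderId);
    }

    @Override
    public int count() {
        return rows.size();
    }
}

/** A client that used to reference DatabaseOrderStore directly. */
class OrderImporter {
    private final OrderStore store;

    OrderImporter(OrderStore store) {
        this.store = store;
    }

    int importAll(List<String> orderIds) {
        for (String id : orderIds) {
            store.save(id);
        }
        return store.count();
    }
}

class InterfaceSubstitutionDemo {
    public static void main(String[] args) {
        // Only this wiring code still knows the concrete class.
        OrderImporter importer = new OrderImporter(new DatabaseOrderStore());
        System.out.println(importer.importAll(List.of("o-1", "o-2")));
    }
}
```

After the change, OrderImporter no longer depends on DatabaseOrderStore, so it also no longer takes part in any cycle that runs through that class.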

Usually this technique will soften up your big ball of mud a bit, so that it becomes easier to disentangle.

Categorize types involved in the cycle group

Big cycle groups with hundreds or thousands of members are a very good indicator of a breakdown of architecture. The minimum level of architecture any application should have is basic technical layering, with layers like “ui”, “model”, “controller”, “persistence” etc. Layering implies that dependencies can only point downwards. With strict layering they can only point to the next layer beneath. What usually happens in those big cycle groups is that they are full of layering violations. So removing those violations should help with breaking up a big cycle group into a few smaller cycle groups.
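The difference between relaxed and strict layering fits into a tiny check. In this sketch the vertical order of the layers (ui on top, persistence at the bottom) is an assumption made for the example, and the class name LayeringCheck is made up.

```java
import java.util.List;

/** Hypothetical helper: decides whether a dependency between two layers is allowed. */
class LayeringCheck {
    // Layers from top to bottom; this order is an assumption for the example.
    static final List<String> LAYERS = List.of("ui", "controller", "model", "persistence");

    static boolean isAllowed(String from, String to, boolean strict) {
        int source = LAYERS.indexOf(from);
        int target = LAYERS.indexOf(to);
        if (source < 0 || target < 0) {
            throw new IllegalArgumentException("Unknown layer: " + from + " or " + to);
        }
        if (target == source) {
            return true; // dependencies inside a layer are fine
        }
        if (strict) {
            return target == source + 1; // only the layer directly beneath
        }
        return target > source; // any layer beneath
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("ui", "persistence", false));
        System.out.println(isAllowed("ui", "persistence", true));
        System.out.println(isAllowed("persistence", "ui", false));
    }
}
```

Every dependency for which this check returns false is a layering violation, and each upward dependency can close a cycle that spans several layers.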

To address the problem you will have to categorize all members of the cycle group according to the layer they belong to. If you are lucky you will be able to at least partially rely on naming conventions or the package tree, but in most cases that will only be possible for a relatively small number of elements. If everybody had followed the rules, you would not have to deal with a big ball of mud in the first place. So a failsafe way to do the categorization is to annotate classes with their layer (in C# you would use attributes). That is a bit of work, but it will be very useful down the road.

Here for example is a Java annotation you could use:

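package com.company;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Annotation to document the architectural layer a class belongs to.
 * This helps identify the technical role and responsibility of classes in
 * the system architecture.
 */
@Target({ElementType.TYPE})
@Retention(RetentionPolicy.SOURCE)
public @interface Layer {

    LayerType value();

    // Java does not allow a nested type to have the same name as its
    // enclosing type, so the enum cannot also be called Layer.
    enum LayerType {
        ENTITY,
        DAO,
        SERVICE,
        CONTROLLER,
        DTO,
        UTILITY,
        UNKNOWN
    }
}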

Now you have to do the gruesome work of looking at every single class in the cycle and annotating it. You can speed this up a bit if some classes follow the naming conventions properly. Those classes do not have to be annotated explicitly. You might have noticed that we added UNKNOWN as a layer. This is reserved for cases where it is difficult to categorize a class because it does not follow the usual patterns. In the end, those classes will probably have to be rewritten or removed to fit the architecture.
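The following sketch shows how annotated classes and naming conventions can work together. It is a demo only and differs from the annotation above in two ways, both of which are assumptions made for the example: the annotation is nested in a demo class with an enum called Kind, and it uses RUNTIME retention so that the demo can read the value via reflection. All class names are made up.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Optional;

class LayerResolutionDemo {
    enum Kind { ENTITY, DAO, SERVICE, CONTROLLER, DTO, UTILITY, UNKNOWN }

    // RUNTIME retention only so that this demo can read the value via reflection.
    @Target(ElementType.TYPE)
    @Retention(RetentionPolicy.RUNTIME)
    @interface Layer {
        Kind value();
    }

    @Layer(Kind.SERVICE)
    static class InvoiceManager {} // the name gives no hint, so it is annotated

    static class CustomerDAO {} // follows the naming convention, no annotation needed

    static class LegacyHelper {} // neither annotated nor conventionally named

    /** An explicit annotation wins; otherwise fall back to the name suffix. */
    static Optional<Kind> resolve(Class<?> type) {
        Layer annotation = type.getAnnotation(Layer.class);
        if (annotation != null) {
            return Optional.of(annotation.value());
        }
        String name = type.getSimpleName();
        if (name.endsWith("Service")) return Optional.of(Kind.SERVICE);
        if (name.endsWith("Controller")) return Optional.of(Kind.CONTROLLER);
        if (name.endsWith("DAO")) return Optional.of(Kind.DAO);
        if (name.endsWith("DTO")) return Optional.of(Kind.DTO);
        if (name.endsWith("Entity")) return Optional.of(Kind.ENTITY);
        return Optional.empty(); // still needs a decision by a human
    }

    public static void main(String[] args) {
        System.out.println(resolve(InvoiceManager.class));
        System.out.println(resolve(CustomerDAO.class));
        System.out.println(resolve(LegacyHelper.class));
    }
}
```

A class for which neither an annotation nor a naming convention gives an answer is exactly the kind of class you still have to look at by hand.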

Now you can use Sonargraph’s architecture DSL (domain specific language) to lay out the architecture:

artifact Service
{
    include "JavaHasAnnotationValue: com.company.Layer: SERVICE"
    include "**Service" // assuming that is the naming convention
    connect to Controller, DTO
}

artifact Controller
{
    include "JavaHasAnnotationValue: com.company.Layer: CONTROLLER"
    include "**Controller" // assuming that is the naming convention
    connect to DTO, DAO
}

artifact DAO
{
    include "JavaHasAnnotationValue: com.company.Layer: DAO"
    include "**DAO" // assuming that is the naming convention
}

artifact DTO
{
    include "JavaHasAnnotationValue: com.company.Layer: DTO"
    include "**DTO" // assuming that is the naming convention
}

public artifact Entity
{
    include "JavaHasAnnotationValue: com.company.Layer: ENTITY"
    include "**Entity" // assuming that is the naming convention
}

public artifact Utility
{
    include "JavaHasAnnotationValue: com.company.Layer: UTILITY"
}

unrestricted artifact Unknown
{
    include "JavaHasAnnotationValue: com.company.Layer: UNKNOWN"
}

That basically describes the architecture. We made the entity and utility layers public, so every layer is allowed to depend on these two. Otherwise, allowed dependencies are controlled by the connect statements. In most layers we also use name patterns to catch classes that follow proper naming conventions. This is of course not needed if you decide to annotate every single class. We also marked the Unknown layer with unrestricted. That means it can access all the other layers, while dependencies to it are marked as errors.

If you activate that architecture, all the dependencies that break our layering will be displayed in red and will also create architecture violation issues. Now the real work begins: removing those layering violations and rewriting the classes assigned to Unknown. Once that is done, your big cycle group will have split into a few much smaller groups, a big improvement compared to the original situation.

If you want to continue the improvement you can add another categorization run, this time by business domain. But that would be a topic for another article.

How to avoid backsliding

If you are using Sonargraph-Build, I would first add these rules to break the build:

  • No package cycles
  • No component cycles with 5 or more elements
  • No new architecture violations

You should ignore all existing cycle groups in your Sonargraph model, so that the rule only triggers for new cycle groups. In this case the build would still break if you add new members to existing ignored cycle groups, which I think is a good thing.

Then extend the architectural model we have defined before to cover your whole application. If you want to be fancy, start by cutting by business domain first and then by layer. But even a simple layering is already quite helpful.

Thank you for reading this article to the end. If you have a comment please use the comment section below or contact us via email.
