Research [Strei2014] and other sources (e.g. [Pizz2013]) have shown that typical software code bases contain 5-10% “dead code”, i.e. code that can be removed without reducing the functionality.
Streamlining the code base by identifying and removing dead code has several benefits:
- Lower maintenance cost: Whilst dead code is less likely to be changed frequently, it still has to be understood and might be affected by refactorings.
- Smaller footprint: Less code makes the development environment faster and the build and deployment processes more efficient, and it results in smaller runtime artifacts.
- Better precision for calculated metrics: Dead code distorts software metrics. For example, tests for unused code may improve the “average test coverage” and thereby create false confidence.
Dead code grows in projects for the following reasons:
- Few developers check in their IDE whether an element is still in use after they remove a reference to it.
- Reliably identifying a public class or method as “dead code” is not a trivial task and requires deep knowledge of the code base.
- Removing seemingly dead code can easily introduce new bugs, so developers are usually reluctant to remove it.
Large, long-running projects with high developer turnover are likely to contain more dead code.
Dead code detection is a good use case for illustrating Sonargraph Explorer’s powerful scripting API. This post demonstrates how the API can be used to efficiently detect dead code within a Java project, including public classes, methods and fields.
Introduction of Concepts
Before I describe the development of the script in detail, I would like to introduce and differentiate the following concepts:
- Unused code: Code that contributes to functionality that is not used by the end user. Detection of unused code requires monitoring the application at runtime and collecting usage information about classes that are instantiated and the methods that are invoked. The classes/methods that don’t appear in the logs are only candidates for “unused code”, because they might be used in exceptional cases that did not happen during the time of monitoring.
- Useless (or unnecessary) code: Code that does not contribute anything to a functionality (no-ops, writing to a file that is never read, etc.). To my knowledge, it is not possible to detect useless code automatically and reliably at class and method level. Tools like Eclipse, FindBugs and PMD can detect useless (or “unnecessary”) code on the micro-level by analyzing the AST (abstract syntax tree) [PMD].
- Statically unreachable code: Code that is not statically referenced. This is a superset of “dead code”. It is easy to detect with Sonargraph Explorer: we just need to check whether a type, method or field has no incoming dependencies. Note that some statically unreachable code is nevertheless in use, because it is invoked dynamically at runtime.
- Dead code: Code that is not used anywhere within the application.
The following figure shows how these concepts relate. Note that the sizes of the ellipses have been chosen arbitrarily and do not reflect the actual percentage of the overall code base.
Since Sonargraph Explorer does static code analysis, this blog post first determines “statically unreachable code” and then adjusts the configuration iteratively to narrow it down to “dead code”.
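The following minimal Java sketch shows what “no incoming dependencies” means. It works on a hypothetical, hand-built dependency map that maps each element name to the names of the elements it references. It does not use Sonargraph Explorer’s model, and UnreachableFinder is an illustrative name, not part of any API. The sketch also shows why entry points such as an application class are reported at first: nothing references them.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a tiny stand-in for a static dependency model.
class UnreachableFinder {

    // Returns every element that no other element references.
    // 'references' maps each element to the elements it depends on.
    static Set<String> findUnreferenced(Map<String, Set<String>> references) {
        Set<String> referenced = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : references.entrySet()) {
            for (String target : entry.getValue()) {
                // A self-reference (e.g. recursion) does not count as an incoming dependency.
                if (!target.equals(entry.getKey())) {
                    referenced.add(target);
                }
            }
        }
        Set<String> unreferenced = new TreeSet<>(references.keySet());
        unreferenced.removeAll(referenced);
        return unreferenced;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> refs = Map.of(
                "App", Set.of("Service"),
                "Service", Set.of("Repository"),
                "Repository", Set.of(),
                "LegacyHelper", Set.of("Repository"));
        // Prints [App, LegacyHelper]: App is an entry point, i.e. a false positive.
        System.out.println(findUnreferenced(refs));
    }
}
```

Narrowing such a result down to dead code then means tolerating the entry points, which is what the configuration of the script is for.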
Structure of the Script
The full script is shown at the end of this post and I will refer to individual sections while I explain its functionality. It is provided as part of the Java Quality Model within Sonargraph Explorer.
The main “action” is executed by the IJavaVisitor and the onType(), onMethod() and onField() visitor methods.
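The following simplified Java sketch illustrates this kind of traversal. It does not use the Sonargraph API: TypeNode and ModelVisitor are hypothetical stand-ins for the model and for IJavaVisitor with its onType, onMethod and onField closures. It shows two properties the script appears to rely on, judging from its structure:

- A type is visited before its members.
- Members are only visited if the type callback explicitly asks for them.

This is why the script calls v.visitChildren(...) in every branch of onType except the one that skips external and excluded types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical, simplified model element: a type with its methods and fields.
class TypeNode {
    final String name;
    final List<String> methods;
    final List<String> fields;

    TypeNode(String name, List<String> methods, List<String> fields) {
        this.name = name;
        this.methods = methods;
        this.fields = fields;
    }
}

// Hypothetical visitor imitating the onType/onMethod/onField callback style.
class ModelVisitor {
    // By default a type callback simply descends into the type's members.
    private BiConsumer<ModelVisitor, TypeNode> typeCallback = (v, t) -> v.visitChildren(t);
    private Consumer<String> methodCallback = m -> { };
    private Consumer<String> fieldCallback = f -> { };

    void onType(BiConsumer<ModelVisitor, TypeNode> callback) { typeCallback = callback; }
    void onMethod(Consumer<String> callback) { methodCallback = callback; }
    void onField(Consumer<String> callback) { fieldCallback = callback; }

    // Members are only visited when a type callback asks for it.
    void visitChildren(TypeNode type) {
        type.methods.forEach(methodCallback);
        type.fields.forEach(fieldCallback);
    }

    void visitModel(List<TypeNode> types) {
        for (TypeNode type : types) {
            typeCallback.accept(this, type);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        ModelVisitor v = new ModelVisitor();
        v.onType((visitor, type) -> {
            log.add("type " + type.name);
            if (!type.name.startsWith("Generated")) {
                visitor.visitChildren(type); // members of "excluded" types are skipped
            }
        });
        v.onMethod(m -> log.add("method " + m));
        v.onField(f -> log.add("field " + f));
        v.visitModel(List.of(
                new TypeNode("Order", List.of("total()"), List.of("items")),
                new TypeNode("GeneratedDto", List.of("getId()"), List.of())));
        // Prints [type Order, method total(), field items, type GeneratedDto]
        System.out.println(log);
    }
}
```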
Statically unreachable code can easily be determined with the API method ElementAccess.getReferencingElementsRecursively(Aggregator, boolean, boolean, IDependencyKind…). See its call at the end of the “onType” closure in the script, and check the online JavaDoc [ScriptAPI] for details.
Without additional configuration, this leads to a high number of false positives, and I will demonstrate how those false positives can be eliminated. Unfortunately, this process cannot be automated, as it requires deep knowledge of the code base.
The following types need to be tolerated:
- Tolerate top-level types: Top-level types that are the entry points to the application are usually not referenced anywhere, but they are of course necessary and cannot be removed. The names of those types are configured in the list “toleratedClassNames”. I use regular expression matching in the “onType” closure so I don’t have to list each class individually. Identified tolerated types are added to a specific result node for easier verification of the findings (see the “toleratedClassNames” configuration and the name-pattern loop in the “onType” closure).
- Tolerate types used via reflection: Many classes are instantiated dynamically via reflection and are referenced in configuration files (web.xml, plugin.xml, beans.xml, properties files, etc.). Again, those types are added to “toleratedClassNames” as regular expressions (see the same configuration list and name-pattern loop).
- Tolerate types based on inheritance: If you know that all your top-level types extend certain base classes or implement specific interfaces, you can test for this inheritance, too. This probably results in fewer patterns than listing each type individually (see the “toleratedIfExtends” configuration and the EXTENDS check at the beginning of the “onType” closure).
- Tolerate generated code: Some code is generated and should be excluded from the analysis, e.g. classes generated by the Java Architecture for XML Binding (JAXB). The information about unused generated classes should be used to reduce the amount of generated code.
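The name-based and inheritance-based rules above can be sketched in a few lines of Java. The following code is hypothetical: TypeTolerance is not part of the script or of Sonargraph’s API. It uses reflection on loaded classes rather than a static dependency model. Class.isAssignableFrom() also covers interfaces and indirect supertypes, which may be broader than the script’s EXTENDS check. The sketch highlights one detail of the configuration. Like the String.matches() calls in the script, Matcher.matches() must match the whole name, and an unescaped “.” matches any character.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: a type is tolerated if its fully qualified name matches
// one of the configured regular expressions or if it is a subtype of a configured base type.
class TypeTolerance {
    private final List<Pattern> namePatterns;
    private final List<Class<?>> baseTypes;

    TypeTolerance(List<String> regexes, List<Class<?>> baseTypes) {
        this.namePatterns = regexes.stream().map(Pattern::compile).collect(Collectors.toList());
        this.baseTypes = baseTypes;
    }

    boolean isTolerated(Class<?> type) {
        for (Pattern pattern : namePatterns) {
            // matches() must cover the whole name, just like String.matches(...)
            if (pattern.matcher(type.getName()).matches()) {
                return true;
            }
        }
        for (Class<?> base : baseTypes) {
            // isAssignableFrom() covers direct and indirect superclasses and interfaces
            if (base != type && base.isAssignableFrom(type)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TypeTolerance tolerance = new TypeTolerance(
                List.of("java\\.util\\.\\S+"), List.of(Runnable.class));
        System.out.println(tolerance.isTolerated(java.util.ArrayList.class)); // true (name pattern)
        System.out.println(tolerance.isTolerated(Thread.class));              // true (implements Runnable)
        System.out.println(tolerance.isTolerated(String.class));              // false
        // An unescaped '.' matches any character, so this pattern is broader than it looks:
        System.out.println(Pattern.matches("com.acme.\\S+", "comXacme.Tool")); // true
    }
}
```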
On the next level, the usage of methods is checked. If a method is tolerated, its parent (the enclosing type) is removed from the list of “dead” types, provided it was listed there.
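The bookkeeping behind this step is small but easy to get wrong. The following hypothetical Java sketch uses the illustrative names DeadTypeRegistry and onToleratedMember. The boolean returned by List.remove(Object) tells whether the enclosing type was actually on the dead list, and only then is the type recorded as tolerated. In Groovy, comparing that boolean with null is always true, so the return value should be used directly.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical bookkeeping for types that are currently considered dead or tolerated.
class DeadTypeRegistry {
    final List<String> deadTypes = new ArrayList<>();
    final Set<String> toleratedTypes = new LinkedHashSet<>();

    // Called when a method or field of 'enclosingType' is tolerated.
    // Returns true if the enclosing type was rescued from the dead list.
    boolean onToleratedMember(String enclosingType) {
        // List.remove(Object) returns true only if the element was present.
        boolean wasDead = deadTypes.remove(enclosingType);
        if (wasDead) {
            toleratedTypes.add(enclosingType); // a Set ignores duplicates
        }
        return wasDead;
    }

    public static void main(String[] args) {
        DeadTypeRegistry registry = new DeadTypeRegistry();
        registry.deadTypes.add("com.acme.Tool");
        System.out.println(registry.onToleratedMember("com.acme.Tool")); // true: rescued
        System.out.println(registry.onToleratedMember("com.acme.Tool")); // false: no longer dead
        System.out.println(registry.deadTypes + " " + registry.toleratedTypes); // [] [com.acme.Tool]
    }
}
```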
- Tolerate API methods and methods used by frameworks: Public methods might not be referenced anywhere but still be used dynamically by a framework (e.g. public “bean” methods of an Ant task) or even by other projects (see the “toleratedMethodNames” configuration and the name-pattern loop in the “onMethod” closure). This is straightforward and is also based on naming patterns.
- Tolerate overriding methods: Because of inheritance and virtual method calls, it is impossible to know statically which overriding methods can be considered “dead”. Therefore all overriding methods are tolerated (see the isOverriding() check in the “onMethod” closure). Note that this is not based on the @Override annotation!
- Tolerate main methods: Main methods are entry points and need to be tolerated (see the “mainMethodPattern” check in the “onMethod” closure).
- Tolerate methods based on inheritance: It can be efficient to tolerate all methods of classes that extend certain base classes. This is possible, too (see the “toleratedMethodsOfSubClasses” configuration and the corresponding loop in the “onMethod” closure).
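The two method rules that are not pure naming patterns can be approximated with standard Java reflection. In the script itself, these rules rely on isOverriding() from Sonargraph’s model and on a regular expression over the method signature. In this hypothetical sketch (MethodRules is an illustrative name), a method counts as overriding if a superclass or an implemented interface declares a non-private, non-static method with the same name and parameter types. Package-private visibility across packages and generic bridge methods are ignored for brevity. The static check in isMainMethod() goes beyond the script’s pure pattern match.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical reflection-based approximations of two method rules.
class MethodRules {
    // Same regular expression as mainMethodPattern in the script.
    static final Pattern MAIN = Pattern.compile("\\S+\\.main\\(String\\[\\]\\)");

    // True if a superclass or interface declares a non-private, non-static
    // method with the same name and parameter types.
    static boolean isOverriding(Method method) {
        int mods = method.getModifiers();
        if (Modifier.isStatic(mods) || Modifier.isPrivate(mods)) {
            return false; // static and private methods never override
        }
        Deque<Class<?>> toVisit = new ArrayDeque<>();
        pushSupertypes(method.getDeclaringClass(), toVisit);
        while (!toVisit.isEmpty()) {
            Class<?> candidate = toVisit.pop();
            try {
                int inherited = candidate.getDeclaredMethod(method.getName(), method.getParameterTypes()).getModifiers();
                if (!Modifier.isStatic(inherited) && !Modifier.isPrivate(inherited)) {
                    return true;
                }
            } catch (NoSuchMethodException notDeclaredHere) {
                // keep searching further up the hierarchy
            }
            pushSupertypes(candidate, toVisit);
        }
        return false;
    }

    private static void pushSupertypes(Class<?> type, Deque<Class<?>> toVisit) {
        if (type.getSuperclass() != null) {
            toVisit.push(type.getSuperclass());
        }
        for (Class<?> itf : type.getInterfaces()) {
            toVisit.push(itf);
        }
    }

    // Builds "Type.name(ParamTypes)" and matches it against the main-method pattern.
    static boolean isMainMethod(Method method) {
        String params = Arrays.stream(method.getParameterTypes())
                .map(Class::getSimpleName)
                .collect(Collectors.joining(","));
        String signature = method.getDeclaringClass().getName() + "." + method.getName() + "(" + params + ")";
        return Modifier.isStatic(method.getModifiers()) && MAIN.matcher(signature).matches();
    }

    // Looks up a declared method without a checked exception (for brevity in examples).
    static Method method(Class<?> type, String name, Class<?>... params) {
        try {
            return type.getDeclaredMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isOverriding(method(Thread.class, "run")));                      // true: implements Runnable.run()
        System.out.println(isMainMethod(method(MethodRules.class, "main", String[].class))); // true
    }
}
```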
Processing of fields:
- Tolerate serialVersionUID fields: These constants are used by Java serialization (see the “serializableClassList” loop in the “onField” closure).
- Check if a field is being read: It is not enough to check whether a field has incoming dependencies. The field’s value must be read somewhere; otherwise it is a dead store (see the READ_FIELD and READ_FIELD_INLINE checks in the “onField” closure).
- Tolerate fields based on naming convention: As with types and methods, fields can be tolerated explicitly based on a naming pattern (see the “toleratedFieldNames” configuration and the name-pattern loop in the “onField” closure).
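The serialization rule can be expressed with plain Java reflection, as in this hypothetical sketch. FieldRules is an illustrative name; the script instead checks the enclosing type against “serializableClassList”. With the JDK on the class path, Serializable.class.isAssignableFrom(...) already knows that every Exception is Serializable through Throwable. The script presumably lists java.lang.Exception explicitly because Sonargraph Explorer does not see the structure of JDK classes, as noted under the known limitations.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical reflection-based version of the serialVersionUID rule.
class FieldRules {
    static final String SERIAL_VERSION_UID = "serialVersionUID";

    // Tolerate a static serialVersionUID field declared in a Serializable class:
    // it is read by Java serialization, not by application code.
    static boolean isToleratedSerialVersionUID(Field field) {
        return field.getName().equals(SERIAL_VERSION_UID)
                && Modifier.isStatic(field.getModifiers())
                && Serializable.class.isAssignableFrom(field.getDeclaringClass());
    }

    // Looks up a declared field without a checked exception (for brevity in examples).
    static Field field(Class<?> type, String name) {
        try {
            return type.getDeclaredField(name);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Exceptions are Serializable through java.lang.Throwable.
    static class ConfigException extends Exception {
        private static final long serialVersionUID = 1L;
        private int unusedCounter;
    }

    // Not Serializable, so its serialVersionUID plays no role in serialization.
    static class PlainHolder {
        static final long serialVersionUID = 1L;
    }

    public static void main(String[] args) {
        System.out.println(isToleratedSerialVersionUID(field(ConfigException.class, SERIAL_VERSION_UID))); // true
        System.out.println(isToleratedSerialVersionUID(field(ConfigException.class, "unusedCounter")));    // false
        System.out.println(isToleratedSerialVersionUID(field(PlainHolder.class, SERIAL_VERSION_UID)));     // false
    }
}
```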
On all levels it is possible to tolerate elements with certain annotations: some classes, methods and fields have annotations that mark them as used by a framework, e.g. @Resource. This is done via the hasAnnotation() function defined within the script. It is used for types, methods and fields in the “onType”, “onMethod” and “onField” closures, together with the configuration lists “toleratedTypeAnnotations”, “toleratedMethodAnnotations” and “toleratedFieldAnnotations”.
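For annotations that are retained at runtime, the same idea can be sketched with Java reflection. FrameworkManaged below is a hypothetical annotation standing in for something like @Resource, and AnnotationRules is an illustrative name. As in the script’s hasAnnotation() function, annotations are compared by their fully qualified names, so the configuration can remain a plain list of strings. Reflection, however, only sees annotations declared with RetentionPolicy.RUNTIME.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.AnnotatedElement;
import java.util.List;

// Hypothetical framework annotation; it must be retained at runtime to be visible via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.FIELD})
@interface FrameworkManaged { }

class AnnotationRules {
    // Same idea as hasAnnotation(): true if any annotation's qualified name is configured.
    static boolean hasToleratedAnnotation(AnnotatedElement element, List<String> toleratedNames) {
        for (Annotation annotation : element.getDeclaredAnnotations()) {
            if (toleratedNames.contains(annotation.annotationType().getName())) {
                return true;
            }
        }
        return false;
    }

    @FrameworkManaged
    static class InjectedService { }

    static class PlainService { }

    public static void main(String[] args) {
        List<String> tolerated = List.of(FrameworkManaged.class.getName());
        System.out.println(hasToleratedAnnotation(InjectedService.class, tolerated)); // true
        System.out.println(hasToleratedAnnotation(PlainService.class, tolerated));    // false
    }
}
```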
A consistent naming convention makes configuring the script easier. As a positive side effect of configuring the script, you might want to move all classes that are instantiated via reflection or by a framework into a separate package, so that developers can easily recognize their nature.
Recommended Approach
I recommend the following procedure to actually eliminate dead code:
- Never take the result of the script for granted. If a false positive is detected, adjust the script’s configuration.
- Get a second opinion and never delete code on your own.
- Check for comments in the code or in its tests.
- Check your SCM to identify the “owner of the code”. This might be the person who created the artifact or modified it most frequently. This person might be able to provide more information.
- Work incrementally and test frequently.
Results
The percentage of dead code is given at the end of the console output and also provided as a metric. The default upper threshold is set to 5%, but you can adjust that in the script configuration. An issue is created for each “potentially dead” element.
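The metric follows the arithmetic at the end of the script. It adds up three parts:

- the statements of dead types
- the statements of dead methods whose enclosing type is not itself dead
- one statement per dead field in a type that is not dead

The sum is then divided by the total number of statements. The following Java sketch reproduces only that calculation. DeadCodeMetric is an illustrative name and the numbers are made up. Treating only values strictly above the threshold as a violation is an assumption.

```java
// Hypothetical re-implementation of the percentage calculation for illustration.
class DeadCodeMetric {

    // Statements of dead types and of dead methods (outside dead types) are summed;
    // each dead field (outside dead types) counts as one statement, as in the script.
    static double percentage(long deadTypeStatements, long deadMethodStatements,
                             long deadFieldCount, long totalStatements) {
        if (totalStatements <= 0) {
            throw new IllegalArgumentException("total number of statements must be positive");
        }
        long deadStatements = deadTypeStatements + deadMethodStatements + deadFieldCount;
        return deadStatements * 100.0 / totalStatements;
    }

    // Assumption: only values strictly above the upper threshold are flagged.
    static boolean exceedsThreshold(double percentage, double upperThreshold) {
        return percentage > upperThreshold;
    }

    public static void main(String[] args) {
        double p = percentage(100, 20, 5, 2500);      // 125 of 2500 statements
        System.out.println(p);                        // 5.0
        System.out.println(exceedsThreshold(p, 5.0)); // false: equal to the default 5% threshold
    }
}
```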
The following screenshot shows the results found for my little sample test project.
The script helped us identify about 2% dead code within the code base of Sonargraph Explorer, and it takes less than 30 seconds to process a ~300 KLOC project on a standard laptop (Intel i7-4712HQ @ 2.3 GHz, 16 GB RAM).
Let us know how much dead code you find!
Known limitations:
- Since the script only tests for incoming dependencies locally on each element, isolated groups of circularly dependent elements won’t be detected. This will be implemented in the future.
- Sonargraph Explorer is not aware of the structure of external code, including the JDK. For example, Sonargraph Explorer does not know that java.lang.Integer indirectly implements the java.io.Serializable interface.
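One way to address the first limitation is to turn the question around. Instead of looking for elements without incoming dependencies, mark everything reachable from the tolerated entry points and report the rest. The following Java sketch does this with a breadth-first search over a hypothetical dependency map. ReachabilityAnalysis is an illustrative name, and this is a swapped-in technique, not what the script or Sonargraph Explorer does.

Simply repeating the incoming-dependency check while ignoring references from already detected dead code would catch chains of dead code. It would not catch a pure cycle, because each member of the cycle is still referenced by another member that has not been flagged.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: reachability ("mark") analysis over a dependency map.
class ReachabilityAnalysis {

    // Everything not reachable from an entry point is reported, including isolated cycles.
    static Set<String> unreachable(Map<String, Set<String>> references, Set<String> entryPoints) {
        Set<String> reached = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>(entryPoints);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (!reached.add(current)) {
                continue; // already visited
            }
            for (String target : references.getOrDefault(current, Set.of())) {
                queue.add(target);
            }
        }
        Set<String> result = new TreeSet<>(references.keySet());
        result.removeAll(reached);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> refs = Map.of(
                "Main", Set.of("Service"),
                "Service", Set.of(),
                "OldParser", Set.of("OldLexer"),  // OldParser and OldLexer only
                "OldLexer", Set.of("OldParser")); // reference each other
        System.out.println(unreachable(refs, Set.of("Main"))); // [OldLexer, OldParser]
    }
}
```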
Resources:
[Strei2014]: “Dead Code Detection On Class Level”, by Fabian Streitel, Daniela Steidl, Elmar Jürgens, 2014
[Pizz2013]: “Static analysis: Leveraging source code analysis to reign in application maintenance cost”, by Pete Pizzutillo, 2013
[PMD]: PMD rules category “unnecessary”
[ScriptAPI]: JavaDoc for Sonargraph Explorer Script API
Script source for DeadCode.scr (as contained within Sonargraph Explorer)
/**
* Script that detects potentially dead types, methods and fields.
* It does not detect isolated cycles of dead code. It could be improved by visiting the model several times and checking whether any elements are left that are only referenced by previously detected dead code.
*
* IMPORTANT!
* The code identified as "potentially dead" is only a hint - only YOU as the architect / developer of the system know if the code is referenced via dependency injection / reflection.
* Delete the identified code with great care.
*/
/////////////////// Start of configuration ////////////////////////////////////////
toleratedClassNames = new ArrayList<>();
toleratedClassNames.add("\\S+\\.package-info");
//Application classes (examples)
toleratedClassNames.add("com\\.hello2morrow\\.sonargraph\\.build\\.application\\.SonargraphBuildApplication");
toleratedClassNames.add("com\\.hello2morrow\\.sonargraph\\.build\\.client\\.ant\\.SonargraphReportTask");
//UI classes referenced by plugin.xml (examples)
toleratedClassNames.add("com\\.hello2morrow\\.sonargraph\\.ui\\.standalone\\.application\\.\\S+");
toleratedClassNames.add("com\\.hello2morrow\\.sonargraph\\.ui\\.\\S+\\.commandhandler\\.\\S+");
toleratedClassNames.add("com\\.hello2morrow\\.sonargraph\\.ui\\.standalone\\.wizard\\.\\S+");
toleratedClassNames.add("com\\.hello2morrow\\.sonargraph\\.ui\\.\\S+CommandHandler");
//Instantiated via reflection
//toleratedClassNames.add("x\\.y\\.z\\.ByReflectionClassNamePattern");
//Tolerate classes that extend certain base classes
toleratedIfExtends = new ArrayList<>();
//toleratedIfExtends.add("x\\.y\\.z\\.SuperClassNamePattern");
//Tolerate classes with specific annotations
toleratedTypeAnnotations = new ArrayList<>();
toleratedTypeAnnotations.add("javax.annotation.Resource");
//Method names
toleratedMethodNames = new ArrayList<>();
//toleratedMethodNames.add('\\S+\\$_\\S+_closure\\d+.\\S+()'); //groovy closures
mainMethodPattern = "\\S+\\.main\\(String\\[\\]\\)";
//Tolerate methods with specific annotations
toleratedMethodAnnotations = new ArrayList<>();
toleratedMethodAnnotations.add("org.junit.Test");
toleratedMethodAnnotations.add("javax.annotation.Resource");
List<String> toleratedMethodsOfSubClasses = new ArrayList<>();
toleratedMethodsOfSubClasses.add('com.hello2morrow.foundation.propertyreader.BeanPropertyReader\$BeanAdapter');
//Field names
toleratedFieldNames = new ArrayList<>();
//toleratedFieldNames.add("x\\.y\\.z\\.ClassNamePattern\\.FieldNamePattern");
//Tolerate fields with specific annotations
toleratedFieldAnnotations = new ArrayList<>();
toleratedFieldAnnotations.add("javax.annotation.Resource");
//Tolerate fields required for serialization
serializableClassList = new ArrayList<>();
serializableClassList.add("java.lang.Exception");
serializableClassList.add("java.io.Serializable");
serialVersionUIDFieldName = "serialVersionUID";
/////////////////// End of configuration ////////////////////////////////////////
deadTypes = new ArrayList<>();
deadMethods = new ArrayList<>();
deadFields = new ArrayList<>();
toleratedTypes = new ArrayList<>();
toleratedMethods = new ArrayList<>();
toleratedFields = new ArrayList<>();
//Create nodes for the "tree tab"
NodeAccess deadTypeNode = result.addNode("Potentially dead types");
NodeAccess deadMethodsNode = result.addNode("Potentially dead methods");
NodeAccess deadFieldsNode = result.addNode("Potentially dead fields");
NodeAccess toleratedTypesNode = result.addNode("Tolerated types");
NodeAccess toleratedMethodsNode = result.addNode("Tolerated methods");
NodeAccess toleratedFieldsNode = result.addNode("Tolerated fields");
//function to check if a dependency to an annotation exists
def hasAnnotation(ProgrammingElementAccess element, List toleratedAnnotations)
{
// println "Checking for annotations on element: " + element;
boolean annotationPresent = element.getOutgoingDependencies(Aggregator.TYPE, true, JavaDependencyKind.HAS_ANNOTATION).find
{
AggregatedDependencyAccess dep ->
// println " outgoing annotation dependency: " + dep;
for (String annotationClass : toleratedAnnotations)
{
if (dep.getTo().getName().equals(annotationClass))
{
return true;
}
}
return false;
}
return annotationPresent;
}
IJavaVisitor v = javaAccess.createVisitor();
//Check on source files and skip all external and Groovy source files since too much reflection is going on...
v.onSourceFile {
SourceFileAccess sourceFileAccess ->
if (sourceFileAccess.isExternal() || sourceFileAccess.getFile() == null || sourceFileAccess.getFile().getName().endsWith(".groovy"))
{
//println "Skipping source file: " + sourceFileAccess.getName();
return;
}
v.visitChildren(sourceFileAccess);
}
//Check for dead types
v.onType
{
JavaTypeAccess type ->
if(type.isExternal() || type.isExcluded())
{
return;
}
//Add elements so they show up in the elements tab, so we know what has been processed.
result.addElement(type)
boolean isToleratedExtends = type.getOutgoingDependencies(Aggregator.TYPE, true, JavaDependencyKind.EXTENDS).find
{
AggregatedDependencyAccess dep ->
//println " outgoing dependency: " + dep;
for (String pattern : toleratedIfExtends)
{
if (dep.getTo().getName().matches(pattern))
{
return true;
}
}
return false;
}
if (isToleratedExtends)
{
println "Tolerated type (extends): " + type
toleratedTypes.add(type);
v.visitChildren(type)
return;
}
if (hasAnnotation(type, toleratedTypeAnnotations))
{
println "Tolerated type (annotation): " + type
toleratedTypes.add(type);
v.visitChildren(type)
return;
}
for (String pattern : toleratedClassNames)
{
if (type.getName().matches(pattern))
{
println "Tolerated type (pattern): " + type
toleratedTypes.add(type);
v.visitChildren(type);
return;
}
}
List usingTypes = type.getReferencingElementsRecursively(Aggregator.TYPE, true, false)
int numberOfIncomingDependencies = usingTypes.size()
if(numberOfIncomingDependencies == 0)
{
//println "Dead type detected: " + type
deadTypes.add(type);
//we continue checking on methods and fields, because classes with main(String[]) methods will be removed from the dead types list and we want to find
//unused methods and fields in those classes as well.
}
v.visitChildren(type);
}
//Check for dead methods
v.onMethod
{
JavaMethodAccess method ->
if (method.isExternal() || method.isExcluded() || method.isInitializer() || !method.isDefinedInEnclosingElement())
{
return;
}
if (method.isOverriding())
{
//println " method " + method + " is overriding";
return;
}
JavaTypeAccess type = method.getParent();
if (hasAnnotation(method, toleratedMethodAnnotations))
{
println "Tolerated method (annotation): " + method
toleratedMethods.add(method);
if (deadTypes.remove(type)) //List.remove(Object) returns true only if the type was in the list
{
println "Type " + type + " contains tolerated method (annotation) " + method + " and is therefore not dead code."
if (!toleratedTypes.contains(type))
{
toleratedTypes.add(type);
}
}
return;
}
for (String toleratedSuperClass : toleratedMethodsOfSubClasses)
{
if (type.typeOf(toleratedSuperClass))
{
toleratedMethods.add(method);
if (deadTypes.remove(type)) //List.remove(Object) returns true only if the type was in the list
{
println "Type " + type + " is a tolerated subclass of " + toleratedSuperClass + " and is therefore not dead code";
if (!toleratedTypes.contains(type))
{
toleratedTypes.add(type);
}
}
return;
}
}
if (method.toString().matches(mainMethodPattern))
{
toleratedMethods.add(method);
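if (deadTypes.remove(type)) //List.remove(Object) returns true only if the type was in the list
{
println "Type " + type + " has a main method and is therefore not dead code";
if (!toleratedTypes.contains(type))
{
toleratedTypes.add(type);
}
}
//A main method is an entry point and must never be reported as a dead method
return;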
}
for (String pattern : toleratedMethodNames)
{
if (method.toString().matches(pattern))
{
toleratedMethods.add(method);
//println " method is tolerated as it matches pattern " + pattern + ", " + method.getName();
if (deadTypes.remove(type)) //List.remove(Object) returns true only if the type was in the list
{
println "Type " + type + " contains tolerated method (pattern) " + method + " and is therefore not dead code."
if (!toleratedTypes.contains(type))
{
toleratedTypes.add(type);
}
}
return;
}
}
List usingTypes = method.getReferencingElementsRecursively(Aggregator.TYPE, false, false);
int numberOfIncomingDependencies = usingTypes.size()
if(numberOfIncomingDependencies > 0)
{
//println " method " + method + " has " + numberOfIncomingDependencies + " incoming dependencies" ;
return;
}
deadMethods.add(method);
}
//check for dead fields
v.onField
{
JavaFieldAccess field ->
if (!field.isDefinedInEnclosingElement())
{
return;
}
JavaTypeAccess type = (JavaTypeAccess) field.getParent();
if (hasAnnotation(field, toleratedFieldAnnotations))
{
//println "Tolerated field (annotation): " + field
toleratedFields.add(field);
if (deadTypes.remove(type)) //List.remove(Object) returns true only if the type was in the list
{
println "Type " + type + " contains tolerated field (annotation) " + field + " and is therefore not dead code."
if (!toleratedTypes.contains(type))
{
toleratedTypes.add(type);
}
}
return;
}
for(String serializableClass : serializableClassList)
{
if (type.typeOf(serializableClass))
{
if (field.getShortName().equals(serialVersionUIDFieldName) && field.isStatic())
{
toleratedFields.add(field);
return;
}
}
}
for (String pattern : toleratedFieldNames)
{
if (field.toString().matches(pattern))
{
toleratedFields.add(field);
//println "Tolerated field (pattern): " + field
if (deadTypes.remove(type)) //List.remove(Object) returns true only if the type was in the list
{
println "Type " + type + " contains tolerated field (pattern) " + field + " and is therefore not dead code."
if (!toleratedTypes.contains(type))
{
toleratedTypes.add(type);
}
}
return;
}
}
if (type.isEnum() && field.isPublic())
{
List using = field.getReferencingElementsRecursively(Aggregator.ELEMENT, true, false);
using.remove(type); //if enum constant needs to override method, we have an incoming dependency
if (!using.isEmpty())
{
//println "Enum constant $field.name is used within enum class $type.name";
return;
}
}
else
{
List using = field.getReferencingElementsRecursively(Aggregator.ELEMENT, false, false, JavaDependencyKind.READ_FIELD, JavaDependencyKind.READ_FIELD_INLINE);
if (!using.isEmpty())
{
return;
}
for (JavaFieldAccess subclassField : field.getReferencingElementsRecursively(Aggregator.FIELD, false, false, JavaDependencyKind.VIA_SUBTYPE))
{
//println "detected field used by subclass: " + subclassField.getName()
using = subclassField.getReferencingElementsRecursively(Aggregator.ELEMENT, false, false, JavaDependencyKind.READ_FIELD, JavaDependencyKind.READ_FIELD_INLINE);
if (!using.isEmpty())
{
return;
}
}
}
//println " dead field: " + field;
deadFields.add(field);
}
//Traverse the model
coreAccess.visitModel(v)
//Sort
deadTypes.sort{it.getNameWithSignature()};
deadMethods.sort{it.getNameWithSignature()};
deadFields.sort{it.getNameWithSignature()};
toleratedTypes.sort{it.getNameWithSignature()};
toleratedMethods.sort{it.getNameWithSignature()};
toleratedFields.sort{it.getNameWithSignature()};
long numberOfStatementsInDeadCode = 0;
for(TypeAccess type : deadTypes)
{
//Add child node for the detected type
result.addNode(deadTypeNode, type);
numberOfStatementsInDeadCode += type.getNumberOfStatementsMetric().intValue();
println "Unused type: " + type.getName();
//Create warning type issue
result.addWarningIssue(type, "Potentially dead type", "Type has no incoming dependencies")
}
for(MethodAccess method : deadMethods)
{
if (deadTypes.contains(method.getParent()))
{
//We don't want to add methods for unused types -> this will screw up the "% of dead code" metric
continue;
}
//Add child node for the detected method
result.addNode(deadMethodsNode, method);
numberOfStatementsInDeadCode += method.getNumberOfStatementsMetric().intValue();
//println "Unused method: " + method;
//Create warning type issue
result.addWarningIssue(method, "Potentially dead method", "Method has no incoming dependencies")
}
for (JavaFieldAccess next : deadFields)
{
if(deadTypes.contains(next.getParent()))
{
continue;
}
result.addNode(deadFieldsNode, next);
result.addWarningIssue(next, "Potentially dead field", "Field has no incoming dependencies");
//We simply assume that a field declaration is one statement
numberOfStatementsInDeadCode++;
}
for (JavaTypeAccess type : toleratedTypes)
{
result.addNode(toleratedTypesNode, type);
}
for (JavaMethodAccess method : toleratedMethods)
{
result.addNode(toleratedMethodsNode, method);
}
for (JavaFieldAccess field : toleratedFields)
{
result.addNode(toleratedFieldsNode, field);
}
float percentageOfDeadCode = numberOfStatementsInDeadCode * 100.0 / coreAccess.getNumberOfStatementsMetric();
def metricId = coreAccess.getOrCreateMetricId("Percentage of dead code", "Potentially dead code (%)", "Percentage of potentially dead code", true, 0.0, new Float(parameterUpperThreshold));
result.addMetricValue(metricId, coreAccess, percentageOfDeadCode);
println "\nNumber of statements (total): " + coreAccess.getNumberOfStatementsMetric();
println "Number of statements (dead code): " + numberOfStatementsInDeadCode
println "Percentage of (potentially) dead code: " + percentageOfDeadCode + "%"
println "\nNOTE: Check the result carefully and edit the configuration inside the script to avoid false positives!"