A lot of people try to avoid refactoring, whether it's because they are scared of breaking their code, because they don't see immediate benefit, or they just pretend that there's no time for it. In reality though, refactoring can be extremely beneficial both short-term and long-term and there's a lot of reasons to spend some time on it. The most commonly mentioned ones would probably be finding bugs, making software easier to understand or improving its design.
The real reasons in my opinion, though, that should push you to mess with your (or maybe legacy) terrible code should be that refactoring helps us lose fear from our own code and the helps us avoid the feeling of shame when we have to show our code to somebody else.
So, in this article we will go over some easy ways to refactor and considerably improve your code without risking breaking everything at the same time, so that you don't have to be scared of your own code or be ashamed to show it to your colleagues.
The Easiest Improvements
To ease you into the refactoring of the feared codebase we will start with simple, yet very significant improvements that literary cannot break anything - first one being reformatting of code.
There are couple of ways how to approach fixing code formatting. You can either fix things like wrong indentation or missing {}
around code-blocks as you go or you can take little more systematic approach. The more systematic approach would constitute creating a style-guide and then applying it (and ideally enforcing it).
I said that we should create a style-guide, but for most programming languages there already is a set of standard conventions such as PEP8 for Python or Google Java Style Guide for Java, which you can adopt and use tools or IDE plugins to help you find and fix all formatting issues.
Regardless of which approach you choose, it's important to (carefully) choose one style and stick to it and also make sure everybody in the team follows same guide to keep things consistent.
Another safe area for refactoring is comments. It's good to have them as long as they're relevant, up-to-date and useful. Most of the time though, you will find a lot of comments that should be removed. Those can be redundant comments or docs that describe something self-explanatory, commented-out code or obsolete comments that should be updated (or removed if not relevant anymore). Of these three I really want to highlight the commented-out code, because a lot of people will say: "But I might need it later!" - no you won't, stop lying to yourself, just delete it now and if you will actually need it in the future, then you'll be able to find it in git
history.
When it comes to comments, just remember that comments do not make up for bad code.
Last thing for this section is dead code. It creates clutter and the longer you leave it in the codebase the harder it becomes to remove because no one knows what it does or whether it's needed or not. Nowadays, any decent IDE will tell you about unreachable dead code so you don't need to be scared about removing it, so just get rid of it, there's no reason to keep it in there.
Low Hanging Fruit
Oftentimes it's easy to find things to refactor, like with the simple improvements shown above. If you however need a few examples of crappy code to look for, then here's a list:
Spending a lot of time looking for good variable, class or function name is in my opinion a time very well spent. It's also easy thing to refactor that can greatly improve readability of code, without risking any damage to fragile code. So, how to choose good names? In general, use nouns for classes - preferably non-generic (no,
Data
is not a good name for class); verbs (or whole phrase) for a function and for variables?Well, there's not really a single rule for naming variables, but generally, anything descriptive and unambiguous should work. To be more specific and technical - you should always avoid using single character names like
x
ora
. For boolean variable names use a "question", e.g.isReady
orhasName
. Oftentimes people don't like to use long variable names, but it's better to use long name, rather then using meaningless name (but concise names are obviously better). One last general recommendation would be to stick to single letter case - whethercamelCase
,snake_case
or anything else, choose one and stick to it (and preferably use convention and nomenclature of specific programming language).- Similarly to variables, also constants can help your code to become more readable. Therefore, try to replace any magical numbers/values with sensible named constants and use
CAPITALIZED_WITH_UNDERSCORES
letter case. - Another thing ripe for refactoring are negated conditionals which are confusing, hard to read and error-prone. They're easy to remove though, as it's usually enough to reverse the bodies of
if
/else
block. As I mentioned a few times already, your IDE might be able to spot negated conditionals for you and suggest a fix. - We already spoke about proper code-block formatting, but I want to mention here one thing I really hate to see and that is when people omit braces/body of conditionals and loops, for example (note: code snippets throughout this article use various languages because all the tips are language agnostic):
If you omit braces like this you might save one line of code, but really you're just setting yourself up for some nasty, long debugging. On top of that, if you need to set breakpoint in such one-liners, you will find out that you can't really put the breakpoints in properly because you can only set them on whole line and not just the part in a conditional or loop.if (checkStuff(value)) doStuff(otherValue); // Terrible if (checkStuff(value)) doStuff(otherValue); // Better, but still bad if (checkStuff(value)) { // Can put breakpoint on this line doStuff(otherValue); // and also on this line }
- Last but not least is vertical separation - or in other words - big distance between definition of variables and their usage. Variables, but also functions and methods should be defined as close as possible to the place where they are used to make it easier to read the code. Big vertical distance in definition of local variables and its usage might be a sign of function that is too big and should broken up into smaller ones.
The most basic rule: Make only tiny changes and retest.
Digging Deeper
Now that you went through some of the code and have little more confidence modifying it, you might want to dig deeper and look for more specific flaws in the code. One of the flaws you will definitely find will be duplicate code. Simple rule of thumb with code duplication is that if you repeat same piece of code 3 times, it's time extract it to separate function or variable.
Another common flaw might be excessive use of boolean parameters. Passing boolean parameters to a function might indicate that this function is doing more than one thing and therefore should be split into (at least) two smaller ones.
Speaking of function parameters - you might sometime encounter a bad practice of modifying input parameters. Arguments are function input and that's it - use them as input and don't change their value. If you see this flaw in a code it's best to copy the value into new variable and avoid modifying the parameter.
Last issue related to parameters that you might want to look out for are functions with too many parameters. Up to 3 parameters are usually fine, but seeing any more than that should raise a suspicion. You should inspect such function and make decision whether it's OK to use so many arguments or whether it's better to break up and/or simplify the function.
Moving on to variables, there are 2 basic issues that you might encounter, first being redundant temporary variables. This would look something like this:
def greet(person):
first_name = person.name
last_name = person.surname
return f"Hello {first_name} {last_name}!"
Using this kind of variable is not useful in any way. It makes the code longer and doesn't explain or improve readability in any way so we should remove it and inline it like so:
def greet(person):
return f"Hello {person.name} {person.surname}!"
The other common issue would be opposite of the above - that is - code lacking variables, for example:
function isLeapYear(year) {
return ((year % 4 == 0) && (year % 100 != 0)) || (year % 400 == 0);
}
This piece of code - unlike the previous issue - would benefit from a few extra variables that would help to explain what the code does. It would also simplify debugging, as it would be easier to set breakpoints and watch for each variable. Refactored version can look for example like this:
function isLeapYear(year) {
var isFourthYear = ( year % 4) == 0
var isHundrethYear = (year % 100) == 0
var isFourHundredthYear = (year % 400) == 0
return ((isFourthYear && !isHundrethYear) || isFourHundradthYear);
}
Last candidate for refactoring in this section is very easy to spot - it is complex/composite conditional. We've all seen those huge multi-line conditionals with a few &&
and ||
grouped together. They're ugly, hard to parse, hard to debug and hard to modify. The simplest way to get rid of it - or rather abstract it - is to extract it into separate checking function. Example of that looks like so:
if (checkValidity(card) || checkCredit(user, card) && isActive(user)) { // Bad
doStuff(user, value);
}
else {
doOtherStuff(user);
}
if (isEligibility(user)) { // Better
doStuff(user, value);
}
else {
doOtherStuff(user);
}
Legacy Code
To be able to effectively refactor our code we need to have a decent test coverage, otherwise we're risking breaking the existing functionality. In previous sections I kinda assumed that we have tests and therefore can modify code and run tests suite to verify that everything is OK.
But how do we refactor code that doesn't have proper test coverage - or in other words how do we refactor Legacy Code? In cases where we can't rely on tests, I recommend only messing with what has to be changed, which would be code that has bugs that need fixing or part of codebase where we're adding new feature. In these areas start by writing tests for existing legacy code and only then proceed with any code modifications. I would personally start with first refactoring the old code to familiarize myself with it and to make it more "welcoming" to the necessary changes.
From this point you can proceed as with any other code that needs refactoring - take it one step at the time, start with small simple changes and retest frequently.
Leverage Tooling
Refactoring and in general taking good care of your code can be time-consuming, which can be problem on projects with tight schedule or when deadlines are approaching. These are the times when people throw all the good practices out of the window and try to "just ship it" as soon as possible.
To alleviate this pressure and make it easier to keep refactoring and improving the code even when there seems to be no time for it, try to use tools that can do some of the work for you. These can be linters like pylint
, code checkers like Checkstyle or code coverage tools like Sonarqube.
Speaking about code coverage and tests - also make sure your test suite is fast, otherwise you will end up dropping the slow tests or not running the suite at all.
In ideal world, all the above should be automated using some kind of CI/CD pipeline, which would run code checks with mandatory quality gates, report any code style inconsistencies, any possible bugs or issues before it even gets build and deployed.
Conclusion
This article is not an exhaustive list of "all things refactoring". For more extensive list and examples you should definitely check out books like Clean Code (especially chapter Smells and Heuristics) by uncle Bob or book called Refactoring by Martin Fowler.
Regardless of which guides or books on this topic you decide to follow, it eventually all boils down to a few simple rules. Always try to leave the code little cleaner then before - this way you can slowly but surely refactor and improve overall quality without too much effort and time spend. At the same time don't put off refactoring because doing it later usually mean never, which ends up producing terrible mess of a legacy code.