How to Learn a New Code Base

Learning a new code base is hard. When you move to a new job, switch to a new team, or otherwise take on a project already in progress, you'll need to come up to speed as quickly as possible on a code base that's entirely new to you. After doing this a couple of times, I've found some things that work and some things that don't. Maybe some of these tips can help you the next time you inherit a code base and are at a loss for how to proceed.

The Mindset

When picking up a new code base, it's helpful to have the right mindset, and the right mindset is one of acceptance. Accept the fact that all of the code is going to be new. You don't know anything about it, yet. You may have decades of experience with programming and have seen a million lines of code, but you haven't seen this code. The more experience you have, the easier it will be to make associations with your prior knowledge, but you shouldn't make any assumptions about how this code base works. Let the code speak for itself, and treat this exercise as a pure learning experience.

Accept that it will be overwhelming at times, and that's okay. There will be an avalanche of information to explore, process, and understand. You'll be struggling to gain a toe-hold on the cliff of understanding, and trying to pull yourself up with your own skills and effort. Be patient and don't get frustrated. Persistence will give you the familiarity you need to be comfortable and effective with this new terrain.

Get an Overview

The first thing you're going to want to do with a new code base is try to orient yourself. Try to find the important parts of the system and how they're connected together. Try to identify the major structures and hierarchies and separate them from the multitude of details and specific objects in the system. There will likely be a lot of specific, low-level code that implements the features of the system, much of it fairly similar. These code sections can be mentally grouped together and put aside for the moment. For example, an embedded system is going to have a number of interfaces to different peripherals, an operating system is going to have hundreds of different drivers, and a web app is going to have lots of routes and helper functions. The details of these things are not important in the beginning, and focusing on them will only serve to confuse you. How those details are organized in the larger system is what's important right now.

Your goal should be to build up a foundation of knowledge to hang all of the workings of the system on. Once you have a good mental model of the system's organization, then you can start working out the details. If the code base is millions of lines of code, hopefully you have a mentor to help guide you in the right direction while exploring the code's architecture; otherwise it may take you a long time to get your bearings. If the code base is fewer than a hundred thousand lines of code, hopefully you don't have a mentor, so you are forced to figure it out for yourself; otherwise it may take you a long time to wean yourself off of your dependence on your mentor. Smaller code bases are completely manageable for one person to learn without help, and you'll become self-sufficient faster if you start out that way.

As for the actual mechanics of gaining an overview of the code base, I haven't found documentation to be all that helpful. When documentation is even available—and in most of my experiences it is not—it tends to over-generalize and describe things too abstractly. It focuses on the run-time behavior of the code and all of the features of the code instead of what you really need to know—how the system is physically organized and connected together at the code level. Unit tests aren't much use for this purpose, either. They're so specific to the low-level functionality that they're testing that they fail to give you any kind of overview of the code base at large. Unit tests are great for seeing how to set up and do specific things with the code, but not for finding your way around when everything is completely new to you.

What I've found works best to get an overview of a code base is to start with the files and directory structure of the code. An organized directory structure and good file names can say a lot about how a software system is put together. Code files that do related things tend to be grouped together in directories, and file names should identify the basic functionality of the code they contain. Browsing through the directory structure can reveal how you should work with the code base, and more importantly, where to find things. Often, finding the right file for the functionality that you need to work on is half the battle. If the files and directory structure aren't well organized, you'll have a harder time of it, but this strategy is still useful. You may also come up with some good ways to organize things better, and that line of thinking will also help you organize the layout of the code base in your head.
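As one possible way to get that first look at the directory structure, here is a minimal Python sketch that counts file types beneath each top-level directory. The function name summarize_top_level and the choice to skip hidden directories such as .git are assumptions made for illustration, not part of any particular tool.

```python
import os
import sys
from collections import Counter


def summarize_top_level(root):
    """Map each top-level directory under root to a Counter of the
    file extensions found anywhere beneath it (hypothetical helper)."""
    summary = {}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        # Assumption: hidden directories (.git, .idea, ...) are noise here.
        if not os.path.isdir(path) or entry.startswith("."):
            continue
        counts = Counter()
        for _dirpath, dirnames, filenames in os.walk(path):
            # Prune hidden subdirectories in place so os.walk skips them.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            for name in filenames:
                ext = os.path.splitext(name)[1] or "(none)"
                counts[ext] += 1
        summary[entry] = counts
    return summary


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for directory, counts in summarize_top_level(root).items():
        total = sum(counts.values())
        common = ", ".join(f"{ext}: {n}" for ext, n in counts.most_common(3))
        print(f"{directory:<24} {total:>6} files  ({common})")
```

Run against the root of a checkout, this prints one line per top-level directory with its file count and most common extensions, which is often enough to tell the core code from the tests, docs, and build scripts.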

Once you know your way around the file system, more or less, you should look for the main entry point to the system. This entry point is where the system will be initialized at run time, and it's a treasure trove of information on how the system is organized. Reading the initialization code and stepping through it, maybe mentally while reading the code or literally by loading it into a debugger, will help you better understand how the system is built up at run time and how everything is connected together. Sometimes the main entry point will instantiate a bunch of objects, build up global tables or singletons, and hook them all up before setting a main loop in motion. Other times it will spin up a bunch of threads or tasks and a main message loop to coordinate communication between them. These threads will be the main functions of the system, and it's worth your time to become familiar with how they operate and interact.
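If the entry point isn't obvious from the file names, a rough search can narrow it down. The sketch below scans a source tree for a few common entry-point idioms. The patterns, the set of file extensions, and the name find_entry_points are all assumptions chosen for illustration; a real code base may start somewhere these heuristics don't reach, such as a framework hook or a build-system target.

```python
import os
import re

# Assumed heuristics: one regex per file extension for a typical entry point.
C_MAIN = re.compile(r"^\s*(?:int|void)\s+main\s*\(", re.MULTILINE)
ENTRY_PATTERNS = {
    ".py": re.compile(r"""^if\s+__name__\s*==\s*["']__main__["']\s*:""",
                      re.MULTILINE),
    ".c": C_MAIN,
    ".cpp": C_MAIN,
    ".java": re.compile(r"public\s+static\s+void\s+main\s*\("),
    ".go": re.compile(r"^func\s+main\s*\(\s*\)", re.MULTILINE),
}


def find_entry_points(root):
    """Return sorted paths (relative to root) of files that appear to
    contain a program entry point (hypothetical helper)."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git in place.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            pattern = ENTRY_PATTERNS.get(os.path.splitext(name)[1])
            if pattern is None:
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file; skip it
            if pattern.search(text):
                hits.append(os.path.relpath(path, root))
    return sorted(hits)


if __name__ == "__main__":
    for candidate in find_entry_points("."):
        print(candidate)
```

The output is only a list of candidates. Test harnesses and small utilities often have their own main functions, so you'll still need to read the results to find the one that actually initializes the system.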

This part of learning a new code base is much like learning a new level in a real-time strategy (RTS) game like StarCraft. You start out with a small view of the level where your newly-established base is set. You can see a limited area immediately around your base, and the rest of the map is black. One of the first things you'll want to do is send off a unit as a scout to see what's around. You're looking for obstacles, resources, and enemy bases. Knowledge is power, and to get it you need to explore. That's all that getting an overview of a new code base is: doing your best to explore it and appreciating the sights that you'll see.

Learn By Doing

Getting an overview of the code base is all about exploration. The rest of learning the code base is all about experimentation. You have to be willing to try things, change things, and break things. Be sure to take good notes and learn from your mistakes. Oh, and you are using version control, aren't you? It will definitely make your experiments easier to roll back, more productive for you, and less harmful to the code base.

One of the best things you can do to get more familiar with a system after you have a reasonably good idea of how it's organized is to go in and fix a bug. Ask your team (or if that's not possible, figure out for yourself) what's broken and pick a bug that looks manageable to start with. Then dive in and try to fix it. While getting an overview gave you breadth of the system, this exercise is going to give you depth. You'll probably have to figure out how at least a few parts of the system interact and understand the detailed implementation of one section of the code base to fix the bug. It's a great way to expand your knowledge of the system while making a real contribution to your team.

Beyond fixing bugs, you can also try adding new features to the code. I wouldn't start with a far-reaching feature that requires major changes to the architecture, but there are always plenty of small features that can be added and are good exercises for learning the code base in more depth. Try to fit new features into the existing code as neatly as possible. The goal here is not to build an entire new wing off of the existing structure that looks nothing like the code that's already there. You want to be as true to the current architecture as possible and see how well you can understand how things currently work when you're adding the new feature.

One more way to experiment with the code base is to pick a reasonably sized file, preferably somewhere in the middle of the architecture, and rewrite it. Note that I did not say rewrite the entire system, just one file. If you pick one that's fairly connected to the rest of the system, something that implements part of the core functionality, you'll learn a lot about how the system works. The reason I suggest rewriting it instead of reading it is that you tend to miss a lot of the subtleties of code when you're only reading it. If you rewrite it, you have to pay more attention to method calls, data structures, algorithms, and everything else that makes the code work. If you rewrite some code and keep the system working, you'll be well on your way to learning the system more fully. Unless the rewrite is also a significant improvement, you'll probably want to roll back your changes when you're done, lest you tick off all of your teammates with unnecessary changes.

Finally, when you're fixing bugs or adding features, you're likely to have questions about how to do things. When you have questions, try to do as much research as you can around the question so that instead of asking a generic "How do I do this?" question, you can ask a very specific question with plenty of context. A well-researched question would sound something like, "I'm trying to do x, and from what I understand of the system, I should be able to do it like this…<explanation>… I'm planning to implement it like so…<explanation>…, and I've found this and this location that seem to be good places to hook in my code. I'm stuck on where this code gets called or where this configuration is set up." That type of question gives the other developer something to chew on. You may be totally off-base and have to switch directions, but it shows that you were thinking and putting in a good-faith effort. You'll also have learned much more about the code base than if someone else always immediately tells you what to do without any effort on your part. You internalize things better if you work them out and discover them for yourself. Many times, doing this work on the question will lead you to the answer without even having to ask anyone for help.

Learning a new code base doesn't have to be terrifying or overwhelming. If you put yourself in the right mindset and accept that there's a lot you'll have to learn, you can knuckle down and prepare for the work ahead. With some patience, the exploration and experimentation can be rewarding and even fun. Make the most of it, and don't worry. You'll come up to speed before you know it.

Lucid Mesh
