Navigating large repositories: How I learned to read a codebase with nearly a million lines of code
1:55 PM - 2:20 PM EST
It’s often said that reading code is a more necessary skill than writing code. Why is that? Well, before you make even a small addition to a codebase, you’ve got to know the context: Which other files use this function? Why does it take parameters of a certain type? Once I improve its performance, how can I test it?
Answering each of those questions requires careful reading. When the codebase gets big, however, reading the code gets complicated. There are codebases out there that boast upwards of a million lines of code.
Apache Kafka has close to that many lines. I’ve just made my first contribution to the Streams project, and I’ve learned a few things about navigating large codebases. I’ll share them with you, including detailed tips for GitHub search and IntelliJ, as well as the general problem-solving principles behind understanding large, structured bodies of code. We’ll learn how to take a proactive approach to identifying good first issues, how to read tests, and how to ask for help efficiently.
By the end of my talk, you’ll feel more comfortable contributing not just to Apache Kafka but to any open-source project, large or small.