Working with Large Code Bases

An advanced software engineering course that teaches students how to navigate large code bases, sometimes with the aid of an AI assistant.

Public Course Materials

Course Description

Introduces students to the skills for working with large, existing code bases with and without GenAI assistance. The first half of the course focuses on program comprehension, debugging, and testing without IDE-based AI assistants such as GitHub Copilot. The second half focuses on using IDE-based AI assistants and agents to make correct, high-quality modifications to code bases. These skills prepare students for the reality of professional software development, where they will need to comprehend, modify, debug, and test code in a large code base with and without GenAI tools. The skills are assessed through homework assignments, oral exams, and proctored skill demonstrations. By the end of the course, students will have added or upgraded six features in an open-source code base written in Python and completed a team project in which they propose and implement a larger feature addition to the code base.

Prerequisites: A “traditional” software engineering course in which students learn about the software development life cycle and basic version control.

Course Language: Python, plus interaction with GenAI programming tools

Book for the course: Working with Large Code Bases with and without GenAI

What students will learn

Course Topics: Program Comprehension (code navigation, using a debugger, reverse diagramming), Unit and Integration Testing, Project Management, Code Quality, and AI for SWE

Accessing the Course

Link to Public Materials (lecture handouts and homeworks), Course Syllabus

Creators: Gerald Soosairaj, Anshul Shah, Jerry Yu, and Thanh Tong

License: CC BY-NC-SA 4.0

Research publications about the course:

Experience report describing the course motivation and design:

Anshul Shah, Jerry Yu, Thanh Tong, and Adalbert Gerald Soosai Raj. 2024. Working with Large Code Bases: A Cognitive Apprenticeship Approach to Teaching Software Engineering. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1 (SIGCSE 2024).

The course content is based on findings about students’ struggles to comprehend large code bases:

Anshul Shah, Thomas Rexin, Anya Chernova, Gonzalo Allen-Perez, William G. Griswold, and Adalbert Gerald Soosai Raj. 2025. Needles in a Haystack: Student Struggles with Working on Large Code Bases. In Proceedings of the 2025 ACM Conference on International Computing Education Research V.1 (ICER '25). (Best Paper Award)

Anshul Shah, Anya Chernova, Elena Tomson, Leo Porter, William G. Griswold, and Adalbert Gerald Soosai Raj. 2025. Students' Use of GitHub Copilot for Working with Large Code Bases. In Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1 (SIGCSETS 2025).

Anshul Shah, Thomas Rexin, Elena Tomson, William G. Griswold, Leo Porter, and Adalbert Gerald Soosai Raj. 2025. Evolution of Programmers' Trust in Generative AI Programming Assistants. In Proceedings of the 25th Koli Calling International Conference on Computing Education Research (Koli Calling '25).

Popular press articles about the course:

A UC San Diego feature about teaching students how to calibrate their trust in GenAI assistants while working with large code bases. In the course, we cover basics of natural language processing, including how LLMs probabilistically predict the next token to output and how non-determinism impacts GenAI usage.

The course creators have given several invited talks about implementing the course at universities such as California Polytechnic-San Luis Obispo and UC Davis. The creators have also publicly shared the course materials with colleagues at Stanford University and at universities in Germany and Saudi Arabia; the university in Saudi Arabia has since implemented a course based on this one.

Units

The full course is designed for a 10-week quarter, with two 80-minute lectures each week. Materials may be adapted to other schedules.

Unit 1: Program Comprehension

Unit 1 teaches students how to navigate and comprehend code in a large code base using features of high-powered IDEs (such as VS Code). Students learn how to conduct structurally guided code searches, rather than relying on opportunistic search strategies like scrolling and browsing filenames.

Unit 1 Module 1: Code Navigation in High-Powered IDEs
Unit 1 Module 2: Using a Debugger for Code Comprehension
Unit 1 Module 3: Reverse Diagramming
Unit 1 Module 4: Unit Tests for Comprehension
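
To make the contrast between structurally guided and opportunistic search concrete, here is a minimal sketch that answers "who calls this function?" by walking a syntax tree instead of scrolling through text. It is an illustration only, not part of the course materials: the SAMPLE source, the function names in it, and the find_callers helper are all hypothetical, and Python's standard ast module stands in for the find-references style features an IDE provides.

```python
import ast

# Hypothetical three-function program used only for this illustration.
SAMPLE = '''
def load(path):
    return open(path).read()

def parse(path):
    text = load(path)
    return text.split()

def main():
    words = parse("notes.txt")
    print(len(words))
'''


def find_callers(source, target):
    """Return the names of functions whose bodies call `target` by name."""
    tree = ast.parse(source)
    callers = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.FunctionDef):
            continue
        # Look for a direct call such as `target(...)` anywhere in this function.
        for inner in ast.walk(node):
            if (isinstance(inner, ast.Call)
                    and isinstance(inner.func, ast.Name)
                    and inner.func.id == target):
                callers.append(node.name)
                break
    return callers


if __name__ == "__main__":
    print(find_callers(SAMPLE, "load"))
    print(find_callers(SAMPLE, "parse"))
```

Following the call chain from load to parse to main is the kind of question a structural search answers directly; a filename scan or text scroll would have to stumble on each call site.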

Unit 2: Unit Testing

Unit 2 teaches students about the architecture of unit tests, introducing concepts such as mocking and test coverage to understand how unit tests are applied in a large code base.

Unit 2 Module 1: Life Cycle of a Unit Test
Unit 2 Module 2: Mocking
Unit 2 Module 3: Test Code Coverage
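
As a rough illustration of the unit test life cycle and mocking topics named above, the sketch below uses Python's standard unittest and unittest.mock modules. The function under test (fetch_display_name), the client object, and its get_user method are hypothetical names invented for this example; they are not taken from the course code base.

```python
import unittest
from unittest.mock import Mock


def fetch_display_name(client, user_id):
    """Hypothetical function under test: look up a user and format the name."""
    record = client.get_user(user_id)
    return record["name"].title()


class FetchDisplayNameTest(unittest.TestCase):
    def setUp(self):
        # Set-up phase: runs before each test method.
        # The Mock replaces a real client, so no network or database is needed.
        self.client = Mock()
        self.client.get_user.return_value = {"name": "ada lovelace"}

    def test_formats_name(self):
        # Exercise and verify phases.
        self.assertEqual(fetch_display_name(self.client, 7), "Ada Lovelace")
        # The mock also records how it was called.
        self.client.get_user.assert_called_once_with(7)

    def tearDown(self):
        # Tear-down phase: runs after each test method.
        self.client.reset_mock()


def run_suite():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FetchDisplayNameTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

In a large code base, the dependency being replaced is usually something slow or external; the mock lets the test isolate one function's behavior from everything it calls.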

Unit 3: Project Management

Unit 3 emphasizes the importance of project management practices, such as reviewing teammates’ pull requests, setting and maintaining code style standards, and decomposing tasks. Students directly apply these concepts in the group project they complete in the second half of the course.

Unit 3 Module 1: Branch Management
Unit 3 Module 2: Code Review and Pull Requests
Unit 3 Module 3: Code Quality and Design
Unit 3 Module 4: Task Management and Planning (Decomposition)

Unit 4: AI-Assisted Software Development

Unit 4 exposes students to the different ways to use IDE-based AI assistants and agents to work with existing code bases. Students are first introduced to basic concepts of natural language processing (the underlying technique behind the AI tools they will use) to help them calibrate their trust in and reliance on these tools. Then, students learn about the specific features within GitHub Copilot for working with large code bases, such as using the code base as context for a query, creating code explanations to aid in comprehension, and using AI agents for eliciting requirements, reviewing code, and creating documentation.

Unit 4 Module 1: AI Literacy for IDE-based Assistants
Unit 4 Module 2: Code Comprehension with GitHub Copilot
Unit 4 Module 3: AI Agents for SWE
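
The idea that an LLM probabilistically predicts its next token, and that this makes output non-deterministic, can be sketched with a toy example. Everything here is an assumption made for illustration: the three-token vocabulary, the scores, and the helper names are invented, and a real model works over a far larger vocabulary with scores computed from the prompt.

```python
import math
import random

# Hypothetical vocabulary and raw scores (logits) for the next token.
TOKENS = ["return", "print", "raise"]
LOGITS = [2.0, 1.0, 0.5]


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(tokens, logits):
    """Deterministic choice: always pick the highest-scoring token."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return tokens[best]


def sample_next_token(tokens, logits, temperature, rng):
    """Probabilistic choice: draw one token according to its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so repeated runs can differ
    print("greedy: ", greedy_next_token(TOKENS, LOGITS))
    print("sampled:", [sample_next_token(TOKENS, LOGITS, 1.0, rng) for _ in range(5)])
```

Running the script several times shows the greedy choice staying fixed while the sampled tokens vary, which is one reason the same prompt can yield different suggestions from an AI assistant.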