
Session 05: Codebase Onboarding (Understand a Project in 1 Hour) 🔍

Outcome: onboard onto one real open-source project and produce a skeleton index plus an architecture diagram


🎯 Objectives

  1. Read and understand any codebase within 1 hour
  2. Master cm-codeintell: Skeleton Index + Architecture Diagram + CodeGraph
  3. Know which questions to ask when joining a team
  4. Generate documentation automatically with cm-dockit
  5. Set up semantic search for large codebases (cm-deep-search)

📖 Part 1: The Problem, the "New Project" Fear

A realistic scenario

First day on the job:
Manager: "Clone this repo. You start on a new feature next week."
Repo: 847 files, 12 npm packages, no README, last commit 6 months ago
You: 😱 "Where do I even start?"

Traditional approach:
Day 1: Open random files, understand nothing
Day 2: Ask a senior, but the senior is busy
Day 3: Run the project, hit 15 errors, Google each one
Day 4-5: Start to get a rough picture
Week 2: Finally confident enough to change code
→ 2 WEEKS of onboarding ❌

VibeCoding approach:
Hour 1: cm-codeintell → skeleton + architecture → understand structure
Hour 2: cm-dockit → auto-generate knowledge base
Hour 3: Read AGENTS.md + key config → understand conventions
Hour 4: Try 1 small change → verify understanding
→ 1 DAY of onboarding ✅

📖 Part 2: Code Intelligence with cm-codeintell

Skeleton Index

Skeleton Index = a heavily compressed summary of the codebase that keeps only:

  • File structure
  • Export/function signatures
  • Class hierarchies
  • Key imports
bash
# Generate skeleton
cm index skeleton

# Output: .cm/skeleton.md
# 95% token compression — 847 files → 1 readable file
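
When cm-codeintell is not installed, the idea behind a skeleton can be approximated with standard tools. The sketch below is a rough stand-in, not how `cm index skeleton` works internally: `skeleton_lite` is a hypothetical helper name, and the regular expression that decides what counts as a "signature line" is an assumption of this example.

```bash
# skeleton_lite: hypothetical stand-in for a skeleton index (not the real tool).
# Prints every JS/TS file under ROOT, followed only by its import, export,
# function, class, and module.exports lines. Function bodies are dropped.
skeleton_lite() {
  root="${1:-src}"
  find "$root" -type f \( -name '*.ts' -o -name '*.js' \) \
    ! -path '*/node_modules/*' | sort |
  while IFS= read -r file; do
    printf '## %s\n' "$file"
    # Keep signature-like lines; "|| true" keeps the loop going for files
    # that contain none.
    grep -nE '^[[:space:]]*(import |export |(async )?function |class |module\.exports)' "$file" || true
  done
}

# Usage: skeleton_lite src > skeleton-lite.md
```

The output is far cruder than a real index (no class hierarchies, no multi-line signatures), but it gives a single file to skim before opening any source.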

Architecture Diagram

bash
# Auto-generate Mermaid architecture
cm index architecture

# Output: .cm/architecture.mmd

Example output: [Mermaid architecture diagram]

CodeGraph (Advanced)

bash
# Build AST knowledge graph
codegraph build

# Query: most connected modules
codegraph query "most-connected"
# → auth.service.ts (27 connections)
# → db.client.ts (23 connections)
# → error.handler.ts (19 connections)

# Query: dead code
codegraph query "unreachable"
# → src/utils/legacy-helper.ts (0 imports!)
# → src/middleware/old-auth.ts (0 imports!)
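
The two queries above boil down to counting how often each module is imported. As a rough illustration of that idea, and not of how codegraph actually builds its AST graph, the sketch below counts importers with grep. `import_counts` is a hypothetical helper name, and matching on file basenames is a simplifying assumption that will miscount modules sharing a name.

```bash
# import_counts: hypothetical helper, a rough approximation of a
# "most-connected" query. It is not the codegraph implementation.
# For each JS/TS file under ROOT, count the other files that mention its
# basename (extension stripped) on an import or require line.
import_counts() {
  root="${1:-src}"
  find "$root" -type f \( -name '*.ts' -o -name '*.js' \) ! -path '*/node_modules/*' |
  while IFS= read -r file; do
    name="$(basename "$file")"
    name="${name%.*}"                                 # auth.service.ts -> auth.service
    esc="$(printf '%s' "$name" | sed 's/\./\\./g')"   # escape dots for the regex
    count="$(grep -rlE --include='*.ts' --include='*.js' --exclude-dir=node_modules \
      "(import|require).*[/'\"]${esc}(\.(ts|js))?['\"]" "$root" |
      grep -vxF -- "$file" | wc -l | tr -d ' ')"
    printf '%s\t%s\n' "$count" "$file"
  done | sort -rn
}

# Usage: import_counts src | head -5    # most imported files first
```

Files at the bottom of the list with a count of 0 are dead-code candidates, though entry points such as the main server file also show 0 because nothing imports them.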

📖 Part 3: 5-Step Onboarding Protocol

Step 1: Bird's Eye View (10 min)

bash
# 1a. Directory structure
tree -L 2 -I 'node_modules|.git|dist|.next'

# 1b. Package.json insights
# `// {}` keeps the query from failing when a dependency section is missing
jq '{
  name, version, main,
  scripts: .scripts,
  deps: ((.dependencies // {}) | keys | length),
  devDeps: ((.devDependencies // {}) | keys | length)
}' package.json

# 1c. Key config files ([Dd] also matches Dockerfile; missing files are skipped silently)
ls -la *.config.* .env* tsconfig.json [Dd]ocker* 2>/dev/null
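
A quick census of file types rounds out the bird's-eye view by showing which languages and asset types dominate the repo. This is an optional addition to the step, sketched under the assumption that the same directories ignored by the `tree` command above should be skipped; `ext_census` is a hypothetical helper name.

```bash
# ext_census: hypothetical helper that counts files per extension,
# skipping dependency, VCS, and build directories.
ext_census() {
  root="${1:-.}"
  find "$root" -type f \
    ! -path '*/node_modules/*' ! -path '*/.git/*' \
    ! -path '*/dist/*' ! -path '*/.next/*' |
  awk -F/ '{
    n = split($NF, parts, ".")
    # Plain dotfiles such as ".env" and names without a dot have no extension.
    ext = (n > 2 || (n == 2 && parts[1] != "")) ? parts[n] : "(none)"
    count[ext]++
  }
  END { for (e in count) printf "%d\t%s\n", count[e], e }' |
  sort -k1,1nr -k2,2
}

# Usage: ext_census . | head -10
```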

Step 2: Skeleton Scan (15 min)

bash
# Generate & read skeleton
cm index skeleton
cat .cm/skeleton.md

# Generate & view architecture
cm index architecture
# Open .cm/architecture.mmd in Mermaid viewer

Step 3: Entry Points (10 min)

bash
# Find entry files
grep -rn "listen\|createServer\|createApp" src/ --include="*.ts" --include="*.js"

# Find route definitions
grep -rn "router\.\|app\.\(get\|post\|put\|delete\)" src/ --include="*.ts"

# Find database models
find src -name "*.model.ts" -o -name "*.entity.ts" -o -name "*.schema.ts"

Step 4: Critical Paths (15 min)

Read the 3 most important files:

  1. Entry point (main server file)
  2. Most connected module (from CodeGraph)
  3. Auth/middleware (security layer)
bash
# AI-assisted code reading
cat src/server.ts | gemini "Explain this entry point:
- What middleware is configured?
- What routes are registered?
- What's the error handling strategy?
- Any anti-patterns?"

Step 5: Test & Modify (10 min)

bash
# Run existing tests
npm test

# Make 1 small safe change
# e.g., Add a log message
# Verify it works → You understand the code flow!

📖 Part 4: Auto Documentation with cm-dockit

bash
# Generate full knowledge base
cm-dockit analyze .

# Output structure:
docs/
├── architecture.md    # High-level architecture
├── api-reference.md   # All API endpoints
├── data-models.md     # Database schema
├── deployment.md      # Deploy instructions
├── dev-guide.md       # Developer onboarding
└── personas/          # User persona docs

Questions Template for the Team

When you join a team, ask the RIGHT questions:

markdown
## Onboarding Questions

### Architecture
1. Is there an architecture diagram? (If not → generate one with cm-codeintell)
2. Which design patterns are in use? (MVC? Clean Architecture? DDD?)
3. Monorepo or poly-repo?

### Development
4. Branch strategy? (GitFlow? Trunk-based?)
5. Code review process? (Who reviews? How long does it take?)
6. Test coverage target? (Unit? E2E?)
7. CI/CD pipeline flow?

### Infrastructure
8. Deploy schedule? (Daily? Weekly? On-demand?)
9. Environments? (Local → Staging → Production?)
10. Monitoring? (Error tracking? Performance monitoring?)

Semantic search for large codebases (cm-deep-search):

IF source files > 200 OR doc files > 50:
  → Set up semantic search (qmd)
  → Index once, query many times
  → BM25 ranking for relevant results
bash
# Setup qmd indexing
cm-deep-search setup

# Query
qmd search "authentication middleware"
qmd search "database connection pool"
qmd search "error handling pattern"
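
The threshold rule above (more than 200 source files OR more than 50 doc files) can be checked before deciding whether indexing is worth setting up. The sketch below is one way to do that; the helper names are hypothetical, and which extensions count as "source" and "doc" is an assumption to adjust for the stack at hand.

```bash
# Hypothetical helpers for the threshold rule. The extension lists are
# assumptions: JS/TS files count as source, Markdown files count as docs.
count_source() {
  find "$1" -type f ! -path '*/node_modules/*' ! -path '*/.git/*' \
    \( -name '*.ts' -o -name '*.tsx' -o -name '*.js' -o -name '*.jsx' \) |
    wc -l | tr -d ' '
}

count_docs() {
  find "$1" -type f ! -path '*/node_modules/*' ! -path '*/.git/*' \
    \( -name '*.md' -o -name '*.mdx' \) | wc -l | tr -d ' '
}

# Succeeds (exit 0) when either threshold is crossed.
needs_semantic_search() {
  root="${1:-.}"
  src="$(count_source "$root")"
  docs="$(count_docs "$root")"
  echo "source files: $src, doc files: $docs"
  [ "$src" -gt 200 ] || [ "$docs" -gt 50 ]
}

# Usage: needs_semantic_search . && cm-deep-search setup
```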

🧪 Lab: Real Project Onboarding

Task: Onboard Express.js Open-Source Project (45 min)

bash
# 1. Clone a real project
git clone https://github.com/hagopj13/node-express-boilerplate.git
cd node-express-boilerplate

# 2. 5-Step Protocol
# Step 1: Bird's eye (5 min)
tree -L 2 -I 'node_modules|.git'
cat package.json | jq '.scripts'

# Step 2: Skeleton scan (10 min)
# (Manual or cm-codeintell)
find src -name "*.js" | head -30
grep -rn "module.exports" src/ | head -20

# Step 3: Entry points (5 min)
cat src/index.js
cat src/app.js

# Step 4: Critical paths (10 min)
cat src/middlewares/auth.js | gemini "Explain auth flow"
ls src/routes/v1/ | head -20

# Step 5: Test & modify (10 min)
npm install
npm test
# Add a test log → verify

Deliverable: Write your onboarding doc

markdown
# [Project Name] — Onboarding Summary

## Tech Stack: [list]
## Architecture: [pattern]
## Entry Point: [file]
## Database: [type + ORM]
## Auth Strategy: [JWT/Session/OAuth]
## Key Files:
1. [file1 — purpose]
2. [file2 — purpose]
3. [file3 — purpose]
## Test Coverage: [% or "no tests"]
## My Assessment: [observations]

🎓 Summary

| Step | Time | Tool |
| --- | --- | --- |
| Bird's Eye | 10 min | tree, package.json |
| Skeleton | 15 min | cm-codeintell |
| Entry Points | 10 min | grep, find |
| Critical Paths | 15 min | AI-assisted reading |
| Test & Modify | 10 min | npm test + small change |
| Total | 60 min | Full understanding |

⏭️ Next session

Session 06: Strategic Analysis (Four-Dimensional Analysis) 🔍
