Sourcery Integration for Automated Code Review

Objective: Integrate the Sourcery GitHub Action for automated code review and fixes.

Description: Set up Sourcery to automatically review Python code changes, suggest improvements, and optionally apply fixes to maintain high code quality standards across the Smart RAG project.

Dependencies: GitHub repository with Actions enabled, SOURCERY_TOKEN secret configured

Details:

  • Configure Sourcery GitHub Action workflow
  • Set up triggers for push and pull request events
  • Configure Python-specific rules and checks
  • Enable targeted directory scanning
  • Implement security and performance checks

Status: Done

Test Strategy:

# Lint the YAML syntax of the workflow and the Sourcery config
yamllint .github/workflows/sourcery.yml
yamllint .sourcery.yaml

# Create a test PR to trigger Sourcery
git checkout -b test/sourcery-integration
echo "print('test')" > test_sourcery.py
git add test_sourcery.py
git commit -m "Test Sourcery integration"
git push origin test/sourcery-integration

Sourcery Integration Architecture

flowchart TD
    subgraph "GitHub Events"
        PR[Pull Request]
        PUSH[Push to Branch]
    end

    subgraph "Sourcery Workflow"
        TRIGGER[Workflow Triggered]
        CHECKOUT[Checkout Code]
        PYTHON[Setup Python 3.11]
        REVIEW[Run Sourcery Review]
        FIX[Apply Fixes]
        COMMENT[Comment on PR]
        REPORT[Generate Report]
    end

    subgraph "Code Analysis"
        QUALITY[Code Quality Check]
        SECURITY[Security Scan]
        PERF[Performance Analysis]
        ML[ML-Specific Checks]
    end

    PR --> TRIGGER
    PUSH --> TRIGGER
    TRIGGER --> CHECKOUT
    CHECKOUT --> PYTHON
    PYTHON --> REVIEW
    REVIEW --> QUALITY
    REVIEW --> SECURITY
    REVIEW --> PERF
    REVIEW --> ML
    QUALITY --> FIX
    FIX --> COMMENT
    COMMENT --> REPORT

Configuration Files

1. GitHub Actions Workflow (.github/workflows/sourcery.yml)

The workflow is configured to:

  • Trigger on pushes to main, develop, and branches under the feature/ and hotfix/ prefixes
  • Trigger on pull request events (opened, synchronize, reopened)
  • Use Python 3.11 to match the project version
  • Enable automatic fixes with fix: 'true'
  • Request review from PR authors
  • Include inline configuration for Python-specific settings

Key features:

  • Automatic PR comments with review summary
  • Artifact upload for detailed reports
  • Custom rules for RAG system patterns
  • Security checks for common vulnerabilities
  • Performance suggestions for async code and caching

2. Project Configuration (.sourcery.yaml)

Detailed configuration includes:

Code Quality Settings

  • Minimum confidence threshold: 0.8
  • Maximum complexity: 10
  • Maximum method length: 50 lines
  • Minimum quality score: 7.5
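
To get a rough local feel for the method-length limit above, the sketch below uses Python's standard ast module to list functions that exceed a given line count. The helper name long_functions is hypothetical, and the way lines are counted (from the def line through the last body line, blank lines and comments included) is an assumption; Sourcery's own measurement may differ.

```python
import ast
from typing import List, Tuple


def long_functions(source: str, max_lines: int = 50) -> List[Tuple[str, int]]:
    """Return (name, length) for every function longer than max_lines.

    Assumption: length is counted from the `def` line to the last line of
    the body. This only approximates what a review tool might report.
    """
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return offenders


if __name__ == "__main__":
    sample = "def short():\n    return 1\n\ndef long():\n" + "    x = 1\n" * 60
    for name, length in long_functions(sample):
        print(f"{name}: {length} lines")
```

Running a check like this before pushing can flag obvious offenders early, but the CI review remains the source of truth.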

Path Configuration

include:
- "backend/**/*.py"
- "src/**/*.py"
- "ml/**/*.py"
- "scripts/**/*.py"

exclude:
- "**/migrations/**"
- "**/tests/**"
- "**/__pycache__/**"
- "**/venv/**"
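
To preview which files the include and exclude patterns above would select, a small script can approximate the matching locally. This is a sketch under stated assumptions: the helper names glob_to_regex and is_reviewed are hypothetical, and the glob semantics ("**/" spans zero or more directories, "*" stays within one path segment) are a common convention that may not match Sourcery's matcher exactly.

```python
import re

# Patterns copied from the Path Configuration section.
INCLUDE = ["backend/**/*.py", "src/**/*.py", "ml/**/*.py", "scripts/**/*.py"]
EXCLUDE = ["**/migrations/**", "**/tests/**", "**/__pycache__/**", "**/venv/**"]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into an anchored regex (assumed semantics, see lead-in)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")        # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def is_reviewed(path: str) -> bool:
    """True when path matches an include pattern and no exclude pattern."""
    included = any(glob_to_regex(p).match(path) for p in INCLUDE)
    excluded = any(glob_to_regex(p).match(path) for p in EXCLUDE)
    return included and not excluded


if __name__ == "__main__":
    for candidate in ["backend/app.py", "backend/tests/test_app.py", "docs/conf.py"]:
        print(candidate, is_reviewed(candidate))
```

Exclusion takes priority here, so a test file under backend/ is skipped even though it matches an include pattern.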

Custom Rules

  1. Structured Logging: Convert f-string log messages to structured logging calls
  2. Async Context Managers: Use async with for resource management
  3. Vector DB Optimizations: Batch operations for vector insertions
  4. Security Patterns: Detect hardcoded secrets and SQL injection risks

ML/RAG-Specific Checks

  • Data leakage detection
  • Model versioning validation
  • Embedding consistency checks
  • Context window limit validation
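
The embedding consistency and context window checks can be pictured as simple runtime guards. The sketch below is a minimal illustration with hypothetical helper names; in particular, it assumes tokens can be approximated by whitespace-separated words, which a real tokenizer would count differently.

```python
from typing import List, Sequence


def check_embedding_dims(vectors: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise ValueError if any vector does not have the expected dimension."""
    for index, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {index} has dimension {len(vector)}, expected {expected_dim}"
            )


def rough_token_count(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def fit_to_context(chunks: Sequence[str], max_tokens: int) -> List[str]:
    """Keep leading chunks, in order, while the running total fits the budget."""
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        cost = rough_token_count(chunk)
        if used + cost > max_tokens:
            break
        kept.append(chunk)
        used += cost
    return kept


if __name__ == "__main__":
    check_embedding_dims([[0.1, 0.2], [0.3, 0.4]], expected_dim=2)
    print(fit_to_context(["a b c", "d e", "f g h i"], max_tokens=5))
```

Stopping at the first chunk that does not fit preserves retrieval order; a different policy (for example, skipping oversized chunks) may suit some pipelines better.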

Usage Guidelines

For Developers

  1. Before Committing: Run Sourcery locally

    sourcery review --diff "git diff"
  2. In Pull Requests:

    • Wait for Sourcery to complete its analysis
    • Review suggested changes in the PR comments
    • Accept or reject automatic fixes
    • Address any security or quality issues
  3. Configuration Updates:

    • Modify .sourcery.yaml for project-wide settings
    • Update workflow file for CI/CD changes
    • Add custom rules for project-specific patterns

For Maintainers

  1. Monitor Code Quality:

    • Check Sourcery reports in PR artifacts
    • Track quality metrics over time
    • Adjust thresholds based on project needs
  2. Security Oversight:

    • Review security findings regularly
    • Update security patterns as needed
    • Ensure secrets are never committed
  3. Performance Optimization:

    • Act on async improvement suggestions
    • Implement caching where recommended
    • Monitor batch operation opportunities

Best Practices

  1. Code Quality

    • Maintain a quality score of at least 7.5
    • Keep methods to at most 50 lines
    • Use type hints for all public functions
  2. Security

    • Never hardcode API keys or secrets
    • Use parameterized queries for database operations
    • Follow OWASP guidelines for web security
  3. Performance

    • Prefer async/await for I/O operations
    • Use batch operations for database and vector store writes
    • Implement caching for expensive computations
  4. RAG-Specific

    • Validate embedding dimensions
    • Check retrieval context limits
    • Monitor vector store performance
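
Several of the practices above fit into a few lines of standard-library Python. The sketch below is illustrative only: the environment variable name SMART_RAG_API_KEY, the documents table, and the function names are all hypothetical, and sqlite3 stands in for whichever database the project actually uses.

```python
import os
import sqlite3
from functools import lru_cache
from typing import List, Tuple


def get_api_key(name: str = "SMART_RAG_API_KEY") -> str:
    """Read a secret from the environment instead of hardcoding it."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"environment variable {name} is not set")
    return key


def find_documents(conn: sqlite3.Connection, source: str) -> List[Tuple[int, str]]:
    """Parameterized query: the driver binds `source`, so it is never
    interpolated into the SQL text."""
    cursor = conn.execute(
        "SELECT id, title FROM documents WHERE source = ?", (source,)
    )
    return cursor.fetchall()


@lru_cache(maxsize=256)
def normalize_query(text: str) -> str:
    """Cache a (stand-in) expensive computation keyed by its argument."""
    return " ".join(text.lower().split())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER, title TEXT, source TEXT)")
    conn.execute("INSERT INTO documents VALUES (1, 'Intro', 'wiki')")
    print(find_documents(conn, "wiki"))
    print(find_documents(conn, "wiki' OR '1'='1"))
    print(normalize_query("  Hello   World "))
```

The second query shows the point of binding parameters: the injection-style input is treated as a literal value and matches nothing.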

Troubleshooting

Common Issues

  1. Workflow Not Triggering

    • Verify branch names match workflow triggers
    • Check if SOURCERY_TOKEN secret is set
    • Ensure GitHub Actions are enabled
  2. False Positives

    • Add specific patterns to ignore list
    • Adjust confidence thresholds
    • Use inline comments to suppress warnings
  3. Performance Impact

    • Reduce parallel jobs if needed
    • Exclude large generated files
    • Enable incremental analysis
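
For the inline-comment option under False Positives, Sourcery recognizes a "# sourcery skip" comment, optionally followed by a rule ID. The sketch below assumes the flagged rule is list-comprehension and uses a hypothetical function; substitute the rule ID that Sourcery actually reports in the review comment.

```python
from typing import Dict, List


def collect_ids(rows: List[Dict[str, int]]) -> List[int]:  # sourcery skip: list-comprehension
    # The explicit loop is kept on purpose (for example, to leave room for
    # per-row logging), so the suggestion is suppressed for this function only.
    ids = []
    for row in rows:
        ids.append(row["id"])
    return ids


if __name__ == "__main__":
    print(collect_ids([{"id": 1}, {"id": 2}]))
```

Scoping the suppression to one function and one rule keeps the rest of the file under review, which is usually preferable to lowering thresholds project-wide.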

Debug Commands

# List the jobs act detects; workflow parse errors surface here
act -l

# Test workflow locally
act -j sourcery-review

# Check Sourcery configuration
sourcery review --check-config

# Run Sourcery on specific files
sourcery review src/graphrag/rag_system/main.py

Integration Benefits

  1. Automated Code Review: Consistent code quality checks on every change
  2. Security Scanning: Early detection of vulnerabilities
  3. Performance Insights: Suggestions for optimization opportunities
  4. RAG-Specific Validation: Custom rules for ML/AI patterns
  5. Team Productivity: Reduced manual review time

Future Enhancements

  1. Custom Plugins: Develop RAG-specific Sourcery plugins
  2. Metrics Dashboard: Integrate with monitoring tools
  3. Auto-merge: Enable for high-confidence fixes
  4. Learning Mode: Train on project-specific patterns
  5. IDE Integration: Setup for VS Code and PyCharm