# Analysis Guide

CQaS provides multi-dimensional analysis of Python code, evaluating quality, security, maintainability, and technical debt. This guide helps you understand what each metric means and how to act on the results.

### Analysis Dimensions

CQaS evaluates code across five primary dimensions:

1. **Quality**: Overall code quality including complexity, style, and structure
2. **Security**: Vulnerability detection and security best practices
3. **Maintainability**: How easy the code is to modify and extend
4. **Complexity**: Cognitive and cyclomatic complexity measurement
5. **Technical Debt**: Accumulated shortcuts and suboptimal implementations

## Quality Metrics

### Overall Quality Score (0-100)

The overall quality score is a weighted composite of multiple factors:

- **Complexity Score (18%)**: Based on cyclomatic complexity
- **Maintainability Score (18%)**: _Enhanced_ IEEE maintainability index
- **Readability Score (16%)**: Code style and naming conventions
- **Style Score (14%)**: PEP8 compliance
- **Duplication Score (12%)**: Code duplication percentage
- **Debt Score (12%)**: Technical debt ratio
- **Security Score (10%)**: Security vulnerability assessment
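
The weighting above can be sketched as a plain weighted sum. This is a minimal illustration, not the CQaS implementation: the names `QUALITY_WEIGHTS` and `overall_quality` are hypothetical, and it assumes each component score is already on a 0-100 scale.

```python
# Weights copied from the list above; they sum to 1.0.
QUALITY_WEIGHTS = {
    "complexity": 0.18,
    "maintainability": 0.18,
    "readability": 0.16,
    "style": 0.14,
    "duplication": 0.12,
    "debt": 0.12,
    "security": 0.10,
}


def overall_quality(scores):
    """Combine per-component scores (each assumed 0-100) into one score."""
    missing = QUALITY_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total = sum(QUALITY_WEIGHTS[name] * scores[name] for name in QUALITY_WEIGHTS)
    # Clamp to guard against out-of-range inputs and rounding noise.
    return max(0.0, min(100.0, total))
```

Because security carries only 10% of the weight, a file can reach a high overall score while still containing serious vulnerabilities, so the security score is worth reading on its own.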

#### Score Ranges

| Range  | Category  | Description                         |
| ------ | --------- | ----------------------------------- |
| 90-100 | Excellent | High-quality, production-ready code |
| 75-89  | Good      | Well-structured with minor issues   |
| 60-74  | Fair      | Acceptable but needs improvement    |
| 40-59  | Poor      | Significant quality issues present  |
| 0-39   | Critical  | Major refactoring required          |

### Maintainability Index

Based on the IEEE maintainability index formula, considering:

- **Halstead Volume**: Program vocabulary and length
- **Cyclomatic Complexity**: Number of independent paths
- **Lines of Code**: Physical code size
- **Comment Ratio**: Documentation coverage

#### Calculation Details

```
MI = 171 - 5.2 × ln(HalsteadVolume) - 0.23 × CyclomaticComplexity
     - 16.2 × ln(LinesOfCode) + 50 × sin(√(2.4 × CommentRatio))
```

This formula is enhanced with additional factors to account for nesting depth and code duplication.
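
As a rough illustration, the base formula can be written directly in Python. This sketch covers only the four terms shown above; the nesting-depth and duplication adjustments are not specified here and are left out. The function name is hypothetical, and treating `comment_ratio` as a fraction between 0 and 1 is an assumption.

```python
import math


def maintainability_index(halstead_volume, cyclomatic_complexity,
                          lines_of_code, comment_ratio):
    """Base maintainability index, without the CQaS-specific adjustments.

    comment_ratio is assumed to be a fraction in [0, 1].
    """
    if halstead_volume <= 0 or lines_of_code <= 0:
        raise ValueError("volume and lines of code must be positive")
    if comment_ratio < 0:
        raise ValueError("comment ratio must not be negative")
    return (
        171
        - 5.2 * math.log(halstead_volume)
        - 0.23 * cyclomatic_complexity
        - 16.2 * math.log(lines_of_code)
        + 50 * math.sin(math.sqrt(2.4 * comment_ratio))
    )
```

The logarithms mean that size and volume penalise small files quickly and large files slowly: doubling the line count always costs the same number of points, roughly 11.2.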

#### Maintainability Categories

| Score  | Category  | Action Required                |
| ------ | --------- | ------------------------------ |
| 85-100 | Excellent | Maintain current practices     |
| 70-84  | Good      | Minor improvements             |
| 55-69  | Fair      | Moderate refactoring           |
| 25-54  | Poor      | Significant restructuring      |
| 0-24   | Legacy    | Complete rewrite consideration |

### Readability Metrics

#### Components

1. **Line Length Analysis**
   - Average line length
   - Maximum line length
   - Long line count (>120 characters)

2. **Naming Convention Scoring**
   - Variable naming (snake_case)
   - Function naming (snake_case)
   - Class naming (PascalCase)

3. **Documentation Quality**
   - Comment ratio
   - Docstring coverage
   - Documentation quality assessment

4. **Code Structure**
   - Nesting depth analysis
   - Type hint coverage
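
Two of the components above, line length analysis and naming convention checks, are simple enough to sketch. The helper names and regular expressions below are illustrative assumptions, not the rules CQaS applies internally.

```python
import re

# Hypothetical patterns; leading underscores are allowed for private names.
SNAKE_CASE = re.compile(r"^_*[a-z][a-z0-9_]*$")
PASCAL_CASE = re.compile(r"^_*[A-Z][A-Za-z0-9]*$")


def line_length_stats(source, limit=120):
    """Return average length, maximum length, and count of lines over limit."""
    lengths = [len(line) for line in source.splitlines()]
    if not lengths:
        return {"average": 0.0, "maximum": 0, "long_lines": 0}
    return {
        "average": sum(lengths) / len(lengths),
        "maximum": max(lengths),
        "long_lines": sum(1 for n in lengths if n > limit),
    }


def is_snake_case(name):
    """True for variable and function names such as parse_file."""
    return bool(SNAKE_CASE.match(name))


def is_pascal_case(name):
    """True for class names such as HttpClient."""
    return bool(PASCAL_CASE.match(name))
```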

#### Readability Score Calculation

The readability score combines:

- Line length penalties (40% weight)
- Naming convention compliance (30% weight)
- Documentation quality (20% weight)
- Code structure (10% weight)

### Halstead Metrics

Developed by Maurice Halstead, these metrics measure program complexity through operator and operand analysis:

#### Key Metrics

- **Vocabulary (η)**: η1 (unique operators) + η2 (unique operands)
- **Length (N)**: N1 (total operators) + N2 (total operands)
- **Volume (V)**: N × log₂(η)
- **Difficulty (D)**: (η1/2) × (N2/η2)
- **Effort (E)**: D × V
- **Time (T)**: E / 18 seconds
- **Bugs (B)**: V / 3000 (Halstead's formula)

Where:

- Volume is the information content of the program
- Difficulty is how hard the program is to understand
- Effort is the mental effort required to implement
- Time is the estimated implementation time
- Bugs are the predicted number of bugs
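
The formulas above translate directly into code once the four counts are known. The sketch below assumes the operator and operand counts have already been extracted; the function and parameter names are hypothetical.

```python
import math


def halstead_metrics(unique_operators, unique_operands,
                     total_operators, total_operands):
    """Derive the Halstead metrics from the four basic counts."""
    vocabulary = unique_operators + unique_operands
    length = total_operators + total_operands
    # Guard against empty input, where log2(0) is undefined.
    volume = length * math.log2(vocabulary) if vocabulary > 0 else 0.0
    if unique_operands > 0:
        difficulty = (unique_operators / 2) * (total_operands / unique_operands)
    else:
        difficulty = 0.0
    effort = difficulty * volume
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
        "time_seconds": effort / 18,
        "bugs": volume / 3000,
    }
```

For example, with 4 unique operators, 4 unique operands, and 8 total occurrences of each, the vocabulary is 8, the length is 16, the volume is 16 × log₂(8) = 48, the difficulty is 4, and the effort is 192.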

## Security Analysis

### Vulnerability Categories

CQaS detects 8 primary vulnerability types:

#### 1. SQL Injection (Base CVSS: 8.1)

Detects common SQL injection faults, such as insecurely constructed SQL statements and similar patterns.

#### 2. Command Injection (Base CVSS: 9.8)

- `os.system()` usage
- `subprocess` calls with `shell=True`
- Dynamic command execution patterns
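
The first two patterns in this list can be found by walking the syntax tree with the standard `ast` module. The following is a minimal sketch of that idea, not the CQaS detector: `find_command_injection` is a hypothetical name, and the check is deliberately narrow (it misses aliased imports such as `from os import system`).

```python
import ast


def find_command_injection(source):
    """Return sorted (line, message) pairs for risky command execution calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Direct os.system(...) calls.
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "system"
            and isinstance(func.value, ast.Name)
            and func.value.id == "os"
        ):
            findings.append((node.lineno, "os.system() call"))
        # Any call that passes the literal keyword argument shell=True.
        for keyword in node.keywords:
            if (
                keyword.arg == "shell"
                and isinstance(keyword.value, ast.Constant)
                and keyword.value.value is True
            ):
                findings.append((node.lineno, "call with shell=True"))
    return sorted(findings)
```

Working on the syntax tree rather than on raw text avoids false matches inside comments and string literals, which is one reason pattern checks of this kind are usually AST-based.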

#### 3. Code Injection (Base CVSS: 9.3)

- `eval()` and `exec()` usage
- Dynamic code compilation
- Unsafe attribute manipulation

#### 4. Hardcoded Secrets (Base CVSS: 7.5)

- Password patterns in code
- API key detection
- Token and secret identification
- Base64/hex encoded secrets

#### 5. Weak Cryptography (Base CVSS: 7.4)

- MD5/SHA1 usage, deprecated/broken encryption algorithms
- Insecure random number generation

#### 6. Dangerous Imports (Base CVSS: 4.3)

- Unsafe deserialisation modules (`pickle`, `dill`)
- Command execution modules
- Deprecated security-sensitive modules

#### 7. Unsafe Deserialisation (Base CVSS: 8.8)

- Pickle/cPickle usage
- Dynamic object loading
- Untrusted data deserialisation

#### 8. Template Injection (Base CVSS: 8.5)

- Dynamic template generation
- User input in template contexts
- Server-side template injection patterns

### CVSS Scoring

CQaS uses CVSS v3.1 scoring methodology:

| CVSS Score | Severity | Priority        |
| ---------- | -------- | --------------- |
| 9.0-10.0   | CRITICAL | Immediate fix   |
| 7.0-8.9    | HIGH     | High priority   |
| 4.0-6.9    | MEDIUM   | Medium priority |
| 0.1-3.9    | LOW      | Low priority    |
| 0.0        | NONE     | Informational   |
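
The table maps directly onto a small lookup function. The name `cvss_severity` is hypothetical; the thresholds are the ones listed above.

```python
def cvss_severity(score):
    """Map a CVSS v3.1 base score (0.0-10.0) to its severity label."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "NONE"
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    if score < 9.0:
        return "HIGH"
    return "CRITICAL"
```

Applied to the base scores listed earlier, command injection (9.8) and code injection (9.3) are CRITICAL, dangerous imports (4.3) are MEDIUM, and the remaining categories are HIGH.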

#### Security Score Calculation

The security score (0-100) is calculated by:

1. Weighting issues by severity (Critical: 2.0x, High: 1.5x)
2. Normalising against maximum possible impact
3. Converting to 0-100 scale (100 = no issues)
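
The three steps might look like the sketch below. Several details are assumptions made for illustration only: the Medium (1.0x) and Low (0.5x) multipliers, the use of the CVSS score as the per-issue impact, and the `max_impact` parameter standing in for the "maximum possible impact", none of which are stated in this guide.

```python
# Critical and High multipliers come from the text; the others are assumed.
SEVERITY_WEIGHTS = {"CRITICAL": 2.0, "HIGH": 1.5, "MEDIUM": 1.0, "LOW": 0.5}


def security_score(issues, max_impact=100.0):
    """Score a list of (severity, cvss_score) pairs on a 0-100 scale.

    max_impact is a hypothetical normalisation constant.
    """
    if max_impact <= 0:
        raise ValueError("max_impact must be positive")
    # Step 1: weight each issue by its severity.
    impact = sum(SEVERITY_WEIGHTS[severity] * cvss for severity, cvss in issues)
    # Step 2: normalise, capping at the maximum possible impact.
    normalised = min(impact / max_impact, 1.0)
    # Step 3: convert so that 100 means no issues.
    return 100.0 * (1.0 - normalised)
```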

### Confidence Levels

Each security issue includes a confidence rating:

- **HIGH**: Strong evidence of vulnerability
- **MEDIUM**: Likely vulnerability, may need verification
- **LOW**: Potential issue, requires investigation

## Code Complexity

### Cyclomatic Complexity

Cyclomatic complexity measures the number of linearly independent paths through the code.

#### Complexity Drivers

- Conditional statements (`if`, `elif`)
- Loops (`for`, `while`)
- Exception handlers (`try/except`)
- Boolean operators (`and`, `or`)
- Comprehensions (counted with lower weight)
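
A simplified counter for the drivers above can be built on the standard `ast` module. This sketch starts at 1 and adds one per decision point. It is an approximation, not the CQaS algorithm: it skips comprehensions because their reduced weight is not specified here, and the function name is hypothetical.

```python
import ast

# Node types that open a new path; an elif appears as a nested ast.If.
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity of a piece of Python source."""
    complexity = 1  # a single straight-line path
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" has three operands and adds two paths.
            complexity += len(node.values) - 1
    return complexity
```

For a function with one `if x and y`, one `elif`, one `for` loop, and one `except` clause, the count is 1 + 1 + 1 + 1 + 1 + 1 = 6, which falls in the Simple category.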

#### Complexity Categories

| Range | Category     | Risk Level    |
| ----- | ------------ | ------------- |
| 1-10  | Simple       | Low risk      |
| 11-20 | Moderate     | Medium risk   |
| 21-50 | Complex      | High risk     |
| 51+   | Very Complex | Critical risk |

### Cognitive Complexity

Measures how difficult code is to understand, focusing on:

- Nesting depth (higher weight for deeper nesting)
- Control flow breaks
- Recursive calls
- Complex boolean logic
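
The effect of nesting depth can be illustrated with a small sketch in which every control structure costs one point plus one point per enclosing level. This is a simplified model written for this guide: it ignores control flow breaks and recursion, treats `elif` as a nested `if`, and uses a hypothetical function name.

```python
import ast

_NESTING_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def cognitive_complexity(source):
    """Nesting-weighted complexity: deeper structures cost more."""

    def visit(node, depth):
        total = 0
        for child in ast.iter_child_nodes(node):
            if isinstance(child, _NESTING_NODES):
                # One point for the structure, plus one per enclosing level.
                total += 1 + depth
                total += visit(child, depth + 1)
            else:
                if isinstance(child, ast.BoolOp):
                    total += 1  # each and/or sequence adds one point
                total += visit(child, depth)
        return total

    return visit(ast.parse(source), 0)
```

Three sequential `if` statements score 3, while a `for` containing an `if` containing a `while` scores 1 + 2 + 3 = 6, even though both have the same number of branches.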

### Complexity Hotspots

CQaS identifies complexity hotspots at both the function and class level.

#### Function Hotspots

- Functions with cyclomatic complexity > 10
- High cognitive complexity functions
- Functions with excessive parameters

#### Class Hotspots

- Classes with > 20 methods
- Excessive inheritance depth
- Large class size (LOC)

## Technical Debt

### Debt Ratio Calculation

Technical debt ratio is calculated using weighted factors:

- **Complexity Debt (30%)**: Based on excess complexity
- **Duplication Debt (20%)**: Code duplication percentage
- **Security Debt (35%)**: Security issue impact
- **Style Debt (15%)**: PEP8 compliance issues

### Debt Time Estimation

CQaS estimates the time needed to fix technical debt from per-issue base times, adjusted for file size.

#### Base Time Estimates

- **Complexity Issues**: 15 minutes per excess complexity point
- **Duplication**: 5 minutes per duplicated block
- **Security Issues**: 30 minutes per issue (severity-weighted)
- **Style Issues**: 2 minutes per PEP8 violation

#### Size Factor

Time estimates are adjusted by a logarithmic size factor based on lines of code.
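
Put together, an estimate could be computed as in the sketch below. The base times come from the list above. The exact form of the size factor is not given in this guide, so the `1 + log10(LOC) / 10` expression is an assumption chosen only to show the shape of the calculation, as is representing severity weighting by one multiplier per security issue.

```python
import math


def estimate_debt_minutes(excess_complexity, duplicated_blocks,
                          security_weights, style_violations, lines_of_code):
    """Estimate remediation time in minutes.

    security_weights holds one severity multiplier per security issue
    (a hypothetical representation of "severity-weighted").
    """
    base = (
        15 * excess_complexity
        + 5 * duplicated_blocks
        + 30 * sum(security_weights)
        + 2 * style_violations
    )
    # Assumed logarithmic size factor: 1.0 for a one-line file, 1.3 at 1000 LOC.
    size_factor = 1.0 + math.log10(max(lines_of_code, 1)) / 10
    return base * size_factor
```

With 4 excess complexity points, 2 duplicated blocks, two security issues weighted 2.0 and 1.0, and 10 style violations, the base time is 60 + 10 + 90 + 20 = 180 minutes before the size adjustment.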

### Debt Categories

| Debt Ratio | Category | Action                |
| ---------- | -------- | --------------------- |
| 0-5%       | Low      | Acceptable debt level |
| 5-10%      | Medium   | Monitor and improve   |
| 10-20%     | High     | Active debt reduction |
| 20%+       | Critical | Immediate attention   |

## Code Health Indicators

### Dead Code Detection

CQaS identifies potentially unused code:

#### Detection Categories

1. **Unused Functions**
   - Functions not called anywhere
   - Excludes special methods (`__init__`, `__str__`, etc.)
   - Excludes test functions and entry points

2. **Unused Classes**
   - Classes never instantiated or referenced
   - No inheritance usage detected

3. **Unused Imports**
   - Imported modules not referenced in code
   - Star imports flagged as potentially problematic

4. **Unused Variables**
   - Variables assigned but never used
   - Excludes single-letter variables and constants
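
Unused import detection is the easiest of these categories to illustrate. The sketch below collects the names bound by import statements and compares them with the names referenced elsewhere in the file. It is a simplified, single-file approach under a hypothetical function name: it does not account for `__all__`, re-exports, or names used only inside strings.

```python
import ast


def unused_imports(source):
    """Return the sorted names that are imported but never referenced."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names
                    imported.add(alias.asname or alias.name)
    # Attribute chains such as os.path.join still start from a Name node.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)
```

Dynamic usage, such as `getattr` lookups or plugin registries, is invisible to this kind of static check, which is why findings carry a confidence level.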

#### Confidence Levels

- **HIGH**: Strong evidence of dead code
- **MEDIUM**: Likely unused, may have dynamic usage
- **LOW**: Potentially unused, needs investigation

### Code Duplication

#### Analysis Scope

1. **Within-file Duplication**
   - Similar code blocks in same file
   - Repeated patterns and structures

2. **Cross-file Duplication**
   - Identical or similar code across files
   - Copy-paste detection

#### Duplication Metrics

- **Duplicate Block Count**: Number of duplicated sections
- **Duplicated Lines**: Estimated lines affected
- **Duplication Percentage**: Percentage of total code
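
One common way to find identical blocks is to slide a fixed-size window over the normalised lines of a file and group windows with the same content. The sketch below shows that technique for exact, within-file duplicates only. The window size of 4 lines, the normalisation rules, and the function name are assumptions for illustration, and detecting merely similar code would need a more tolerant comparison.

```python
from collections import defaultdict


def find_duplicate_blocks(source, window=4):
    """Return lists of starting line numbers for repeated blocks of lines."""
    # Normalise: strip indentation, drop blank lines and comment-only lines.
    lines = [
        (number, text.strip())
        for number, text in enumerate(source.splitlines(), start=1)
    ]
    lines = [(n, t) for n, t in lines if t and not t.startswith("#")]

    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        block = "\n".join(text for _, text in lines[i:i + window])
        seen[block].append(lines[i][0])

    # Keep only blocks that occur more than once.
    return sorted(starts for starts in seen.values() if len(starts) > 1)
```

The same grouping works across files if the dictionary key stays the block text and the stored value becomes a (file, line) pair.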

#### Thresholds

| Percentage | Level      | Action               |
| ---------- | ---------- | -------------------- |
| 0-3%       | Acceptable | Normal level         |
| 3-7%       | Moderate   | Consider refactoring |
| 7-15%      | High       | Active deduplication |
| 15%+       | Critical   | Urgent refactoring   |

### Import Analysis

#### Import Categories

1. **Standard Library**: Python built-in modules
2. **Third-party**: External dependencies
3. **Local**: Project-specific modules
4. **Circular**: Problematic circular dependencies

#### Import Health Indicators

- **Import Diversity**: Ratio of unique modules to total imports
- **Dependency Depth**: Analysis of import chains
- **Circular Dependencies**: Detection and reporting
- **Unused Imports**: Imported but unreferenced modules
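
Circular dependencies can be detected with a depth-first search over the import graph. The sketch below assumes the graph has already been built as a dictionary mapping each module to the modules it imports; that representation and the name `find_cycle` are illustrative choices, not the CQaS data model.

```python
def find_cycle(graph):
    """Return one import cycle as a list of module names, or [] if none."""
    unvisited, in_progress, done = 0, 1, 2
    state = {}
    path = []

    def visit(module):
        state[module] = in_progress
        path.append(module)
        for dependency in graph.get(module, ()):
            status = state.get(dependency, unvisited)
            if status == in_progress:
                # The dependency is already on the current path: a cycle.
                return path[path.index(dependency):] + [dependency]
            if status == unvisited:
                cycle = visit(dependency)
                if cycle:
                    return cycle
        path.pop()
        state[module] = done
        return []

    for module in graph:
        if state.get(module, unvisited) == unvisited:
            cycle = visit(module)
            if cycle:
                return cycle
    return []
```

For a graph in which `a` imports `b`, `b` imports `c`, and `c` imports `a`, the function returns `["a", "b", "c", "a"]`, which shows the full chain that needs to be broken.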

## Interpreting Scores

### Score Correlation

Read the scores together rather than in isolation. Common combinations and what they suggest:

#### High Quality + Low Security

- Well-structured code with security vulnerabilities
- Focus on security remediation

#### Low Quality + High Security

- Secure but poorly structured code
- Prioritise refactoring and code organisation

#### Balanced Scores

- Indicates overall healthy codebase
- Continue current practices

### Trending Analysis

When you run CQaS regularly, watch how the results change over time.

#### Improving Trends

- Quality scores increasing over time
- Decreasing technical debt ratio
- Fewer high-severity security issues

#### Declining Trends

- Increasing complexity without refactoring
- Growing technical debt
- Accumulating security vulnerabilities

### Priority Matrix

Use this matrix to prioritise improvements:

| Quality | Security | Priority | Action                    |
| ------- | -------- | -------- | ------------------------- |
| Low     | Low      | Critical | Comprehensive refactoring |
| Low     | High     | High     | Focus on code structure   |
| High    | Low      | High     | Address security issues   |
| High    | High     | Low      | Maintain current quality  |
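
The matrix can be encoded as a small decision function. This guide does not define where "Low" ends and "High" begins, so the `threshold` of 75 below is an assumption, and the function name is hypothetical.

```python
def improvement_priority(quality, security, threshold=75.0):
    """Return (priority, action) for a pair of 0-100 scores.

    threshold is an assumed cut-off between "Low" and "High" scores.
    """
    high_quality = quality >= threshold
    high_security = security >= threshold
    if not high_quality and not high_security:
        return ("Critical", "Comprehensive refactoring")
    if not high_quality:
        return ("High", "Focus on code structure")
    if not high_security:
        return ("High", "Address security issues")
    return ("Low", "Maintain current quality")
```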