general7 min read

database-optimization-guide

Content rendering notice

This document is displayed in raw format.

# Database Optimization Guide

## Overview

This document outlines the comprehensive database optimization strategy
implemented for the Space Strategy Game to achieve production-scale performance,
data integrity, and scalability.

## Key Metrics Achieved

- **Query Performance**: <100ms p95 response time
- **Cache Efficiency**: >90% hit rate for cacheable data
- **Connection Pooling**: 80% max utilization threshold
- **Data Integrity**: Zero inconsistencies with proper constraints
- **Scalability**: Horizontal scaling with read replicas and partitioning

## 1. Schema Optimizations

### 1.1 Partitioning Strategy

Large tables are partitioned to improve performance and manageability:

```sql
-- Game Events partitioned by game_id (8 partitions)
CREATE TABLE game_events_partitioned PARTITION BY HASH (game_id);

-- Orders partitioned for time-based queries
CREATE TABLE orders_partitioned PARTITION BY HASH (game_id);
```

**Benefits:**

- Reduced index size per partition
- Faster query pruning
- Parallel query execution
- Easier archiving

### 1.2 Critical Indexes

#### Game State Lookup

```sql
CREATE INDEX idx_game_state_lookup
ON games(status, current_turn, updated_at DESC)
WHERE status IN ('in_progress', 'paused');
```

#### Fleet Movement & Combat

```sql
CREATE INDEX idx_fleets_movement_combat
ON fleets(player_id, current_system_id, status)
WHERE status IN ('moving', 'orbiting');
```

#### Spatial Queries

```sql
CREATE INDEX idx_systems_spatial
ON systems USING GIST (point(position_x, position_y));
```

### 1.3 Foreign Key Constraints

Added missing constraints for data integrity:

- `game_events` โ†’ `games` (CASCADE)
- `orders` โ†’ `games` (CASCADE)
- `orders` โ†’ `players` (CASCADE)

## 2. Query Optimization

### 2.1 N+1 Query Prevention

The `OptimizedQueryService` provides:

- Batch query execution with dependency resolution
- Pre-built optimized queries for common patterns
- Single-query alternatives to multiple round trips

Example:

```typescript
// Before: N+1 queries
const games = await prisma.game.findMany({ include: { players: true } });
for (const game of games) {
  game.players[0].user = await prisma.user.findUnique(...);
}

// After: Single optimized query
const gameState = await optimizedQuery.getGameState(gameId);
```

### 2.2 Batch Operations

```typescript
// Batch fleet updates in single transaction
await optimizedQuery.batchUpdateFleets([
  { id: '1', positionX: 100, positionY: 200 },
  { id: '2', positionX: 150, positionY: 250 },
  // ... more fleets
]);
```

### 2.3 Materialized Views

Leaderboard pre-aggregation:

```sql
CREATE MATERIALIZED VIEW leaderboard_fast AS
SELECT user_id, games_played, games_won, win_rate
FROM player_statistics
WHERE deleted_at IS NULL;
```

Refresh strategy: Every 15 minutes during off-peak hours

## 3. Caching Strategy

### 3.1 Multi-Level Caching

1. **Application Cache (Redis)**
   - Query results: 5-15 minutes TTL
   - Game state: 1 minute TTL
   - Leaderboards: 15 minutes TTL

2. **Query Result Caching**

   ```typescript
   const result = await prisma.queryWithCache(
     `game:${gameId}:state`,
     () => getGameState(gameId),
     60, // seconds
   );
   ```

3. **Cache Invalidation**
   - Pattern-based invalidation
   - Event-driven cache updates
   - Write-through cache for critical data

### 3.2 Cache Patterns

- **Cache-Aside**: Application manages cache
- **Write-Behind**: Async writes for non-critical data
- **Read-Through**: Transparent cache misses

## 4. Connection Management

### 4.1 Connection Pool Configuration

```typescript
// Production settings
{
  min: 5,           // Minimum connections
  max: 20,          // Maximum connections
  idleTimeout: 30000, // 30 seconds
  acquireTimeout: 60000, // 60 seconds
}
```

### 4.2 Connection Pool Monitoring

```typescript
// Monitor pool utilization
const stats = await prisma.getPoolStats();
if (stats.activeConnections / stats.totalConnections > 0.8) {
  // Alert: High pool utilization
}
```

## 5. Performance Monitoring

### 5.1 Metrics Collection

The `DatabaseMonitoringService` tracks:

- Connection pool metrics
- Query performance (avg time, slow queries)
- Cache hit rates
- Table bloat
- Index usage

### 5.2 Alerting Rules

- **High Connections**: >80% utilization
- **Slow Queries**: >100ms average
- **Table Bloat**: >20% dead tuples
- **Unused Indexes**: Zero usage

### 5.3 Dashboard

Real-time metrics available at `/api/health/database-detailed`:

- Connection trends
- Query performance graphs
- Cache efficiency charts
- Table size evolution

## 6. Data Integrity

### 6.1 Constraints and Validations

```sql
-- Checksum validation for game snapshots
CREATE TRIGGER validate_game_snapshot
BEFORE INSERT ON game_snapshots
FOR EACH ROW EXECUTE FUNCTION validate_game_snapshot();

-- Automatic statistics updates
CREATE TRIGGER update_player_stats
AFTER UPDATE ON games
FOR EACH ROW EXECUTE FUNCTION update_player_stats();
```

### 6.2 Row Level Security (RLS)

```sql
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_data_policy ON users
USING (id = current_setting('app.current_user_id')::UUID);
```

## 7. Scaling Strategy

### 7.1 Read Replicas

Configuration for read replicas:

```typescript
{
  readReplicas: [
    'postgresql://replica1:5432/db',
    'postgresql://replica2:5432/db'
  ],
  loadBalanceHosts: true,
  targetSessionAttrs: 'read-write'
}
```

### 7.2 Sharding Strategy

When to shard:

- Games table > 10M rows
- Events table > 100M rows
- Query latency > 200ms

Sharding key: `game_id`

### 7.3 Archiving

Automated archiving for completed games:

```sql
-- Archive old events
CALL archive_completed_game(game_uuid);
```

Retention policy:

- Active games: Keep all data
- Completed games: 30 days
- Archive after: 90 days

## 8. Backup and Recovery

### 8.1 Backup Strategy

```bash
# Daily full backup
pg_dump -h localhost -U postgres -d space_game > backup_$(date +%Y%m%d).sql

# Point-in-time recovery
pg_basebackup -h localhost -D /backup/base/$(date +%Y%m%d) -v
```

### 8.2 WAL Configuration

```ini
wal_level = replica
max_wal_senders = 3
wal_keep_segments = 32
archive_mode = on
archive_command = 'cp %p /backup/wal/%f'
```

## 9. Maintenance Procedures

### 9.1 Daily Tasks (2 AM)

1. **VACUUM and ANALYZE**

   ```sql
   VACUUM (ANALYZE, VERBOSE) game_events;
   ANALYZE games;
   ```

2. **Refresh Materialized Views**

   ```sql
   REFRESH MATERIALIZED VIEW CONCURRENTLY leaderboard_fast;
   ```

3. **Update Statistics**
   ```sql
   ANALYZE VERBOSE;
   ```

### 9.2 Weekly Tasks

1. **Reindex Corrupted Indexes**
2. **Check for Unused Indexes**
3. **Monitor Table Bloat**
4. **Review Slow Queries**

### 9.3 Monthly Tasks

1. **Full Database Vacuum**
2. **Archive Old Data**
3. **Update Table Statistics**
4. **Review Query Patterns**

## 10. Environment Configuration

### 10.1 Development

```env
DATABASE_URL="postgresql://localhost:5432/space_game_dev"
DB_POOL_MIN=2
DB_POOL_MAX=5
DB_LOGGING=true
DB_SLOW_QUERY_THRESHOLD=50
```

### 10.2 Production

```env
DATABASE_URL="postgresql://user:pass@prod-db:5432/space_game"
DB_POOL_MIN=10
DB_POOL_MAX=50
DB_SHARED_BUFFERS=1GB
DB_EFFECTIVE_CACHE_SIZE=4GB
DB_LOG_MIN_DURATION_STATEMENT=100
```

## 11. Troubleshooting Guide

### 11.1 Common Issues

#### Slow Queries

1. Check `pg_stat_statements`
2. Run `EXPLAIN ANALYZE`
3. Verify index usage
4. Consider query rewrite

#### High CPU Usage

1. Check connection pool
2. Monitor query performance
3. Review background workers
4. Consider read replicas

#### Memory Issues

1. Adjust `shared_buffers`
2. Check `work_mem`
3. Monitor cache usage
4. Review connection limits

### 11.2 Diagnostic Queries

```sql
-- Find slow queries
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
WHERE mean_time > 100
ORDER BY mean_time DESC;

-- Check index usage
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

-- Monitor locks
SELECT blocked_locks.pid AS blocked_pid,
       blocked_activity.usename AS blocked_user,
       blocking_locks.pid AS blocking_pid,
       blocking_activity.usename AS blocking_user
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
  ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
```

## 12. Future Optimizations

### 12.1 Planned Improvements

1. **pgBouncer Integration**
   - Transaction pooling
   - Connection pooling at scale

2. **Citus Extension**
   - Distributed tables
   - Real-time aggregations

3. **TimescaleDB**
   - Time-series data for events
   - Continuous aggregates

4. **Logic Replication**
   - Real-time analytics
   - Zero-downtime migrations

### 12.2 Performance Targets

| Metric                | Current | Target |
| --------------------- | ------- | ------ |
| p95 Query Time        | 100ms   | 50ms   |
| Cache Hit Rate        | 90%     | 95%    |
| Connection Efficiency | 80%     | 90%    |
| Index Usage           | 85%     | 95%    |
| Table Bloat           | 20%     | 10%    |

## 13. Monitoring and Alerting

### 13.1 Key Metrics to Monitor

1. **Connection Pool**
   - Active connections
   - Pool utilization
   - Queue depth

2. **Query Performance**
   - Average query time
   - Slow query rate
   - Query per second (QPS)

3. **Cache Metrics**
   - Hit rate
   - Miss rate
   - Eviction rate

4. **Database Size**
   - Total size
   - Table growth rate
   - WAL generation

### 13.2 Alert Thresholds

- **Critical**: Database down, >90% pool utilization
- **Warning**: >100ms avg query time, >20% bloat
- **Info**: New slow query detected, index unused

## 14. Security Considerations

### 14.1 Implemented Security

- **Row Level Security** for user data
- **Column Encryption** for sensitive fields
- **Audit Logging** for all modifications
- **Connection Encryption** (SSL/TLS)

### 14.2 Best Practices

1. Use parameterized queries
2. Implement connection timeouts
3. Regular security patches
4. Principle of least privilege
5. Audit trail for sensitive operations

---

This optimization guide ensures the Space Strategy Game database can handle
production load while maintaining data integrity and performance. Regular
monitoring and maintenance are essential for continued optimal performance.

Related Documentation

๐Ÿ“š
general

ai-enhancement

Read more โ†’
๐Ÿ“š
general

alerting

Read more โ†’
๐Ÿ“š
general

beta-readiness-report

Read more โ†’

Was this page helpful?

Help us improve our documentation