# Playwright Web Proxy

A production-grade, scalable web proxy powered by Playwright with real-time DOM streaming capabilities. Built with Node.js/TypeScript and designed to handle 200+ concurrent browser sessions.

## Features

- **Browser Automation**: Uses Playwright (Chromium) for accurate page rendering
- **Real-time Streaming**: WebSocket-based DOM change streaming to clients
- **Session Management**: Persistent sessions with cookie/localStorage support
- **Device Emulation**: Desktop and mobile browsing modes
- **Anti-Bot Mitigation**: UA randomization, header spoofing, timezone spoofing
- **Security**: JWT authentication, rate limiting, IP banning
- **Scalability**: Browser pooling with autoscaling support
- **Monitoring**: Prometheus metrics and Grafana dashboards
- **Production-Ready**: Docker and Kubernetes deployment configurations

## Architecture

```
┌─────────────────┐
│   Client UI     │ (Next.js)
│   (Port 3001)   │
└────────┬────────┘
         │
         │ HTTP/WebSocket
         │
┌────────▼────────────────────────────────┐
│         Express API Server              │
│         (Port 3000)                     │
│                                         │
│  ┌──────────────────────────────────┐  │
│  │  Authentication & Rate Limiting  │  │
│  └──────────────────────────────────┘  │
│                                         │
│  ┌──────────────────────────────────┐  │
│  │     BrowserPool Manager          │  │
│  │  - Session lifecycle             │  │
│  │  - Resource management           │  │
│  │  - Cleanup & idle timeout        │  │
│  └──────────────────────────────────┘  │
│                                         │
│  ┌──────────────────────────────────┐  │
│  │   WebSocket Streaming Server     │  │
│  │  - Real-time DOM updates         │  │
│  │  - Navigation events             │  │
│  │  - User interactions             │  │
│  └──────────────────────────────────┘  │
└─────────────────────────────────────────┘
         │
         │
┌────────▼────────────────────────────────┐
│      Playwright Browser Instances       │
│  - Chromium with anti-detection         │
│  - Session isolation                    │
│  - Link rewriting                       │
└─────────────────────────────────────────┘
```

## Tech Stack

### Backend
- **Node.js 18+** with TypeScript
- **Playwright** for browser automation
- **Express.js** for REST API
- **WebSocket (ws)** for real-time streaming
- **JWT** for authentication
- **Prometheus** for metrics
- **Winston** for logging

### Frontend
- **Next.js 14** with React 18
- **TypeScript**
- **Tailwind CSS**
- **Zustand** for state management
- **Axios** for API calls

### Infrastructure
- **Docker** for containerization
- **Kubernetes** with Helm charts
- **Redis** for session storage
- **Prometheus & Grafana** for monitoring

## Prerequisites

- Node.js 18 or higher
- npm or yarn
- Docker (for containerized deployment)
- Kubernetes cluster (for production deployment)

## Quick Start

### Local Development

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd booking-proxy-claude
   ```

2. **Install dependencies**
   ```bash
   # Backend
   npm install

   # Frontend
   cd client
   npm install
   cd ..
   ```

3. **Configure environment**
   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

4. **Run the backend**
   ```bash
   npm run dev
   ```

5. **Run the frontend (in a separate terminal)**
   ```bash
   cd client
   npm run dev
   ```

6. **Access the application**
   - Frontend UI: http://localhost:3001
   - Backend API: http://localhost:3000
   - Health Check: http://localhost:3000/health
   - Metrics: http://localhost:3000/metrics

7. **Login with demo credentials**
   - Email: `demo@example.com`
   - Password: `demo123`

## Docker Deployment

### Using Docker Compose (Recommended for Testing)

1. **Build and start all services**
   ```bash
   docker-compose up -d
   ```

2. **View logs**
   ```bash
   docker-compose logs -f proxy
   ```

3. **Stop services**
   ```bash
   docker-compose down
   ```

### Using Docker (Standalone)

1. **Build the image**
   ```bash
   docker build -t playwright-proxy:latest .
   ```

2. **Run the container**
   ```bash
   docker run -d \
     -p 3000:3000 \
     -p 3001:3001 \
     -e SESSION_SECRET=your-secret \
     -e JWT_SECRET=your-jwt-secret \
     --name playwright-proxy \
     playwright-proxy:latest
   ```

## Kubernetes Deployment

### Prerequisites
- Kubernetes cluster (1.24+)
- Helm 3
- kubectl configured

### Deploy using Helm

1. **Create namespace**
   ```bash
   kubectl create namespace playwright-proxy
   ```

2. **Install with Helm**
   ```bash
   helm install playwright-proxy ./helm/playwright-proxy \
     --namespace playwright-proxy \
     --set config.sessionSecret=your-session-secret \
     --set config.jwtSecret=your-jwt-secret \
     --set redis.auth.password=your-redis-password
   ```

3. **Check deployment status**
   ```bash
   kubectl get pods -n playwright-proxy
   kubectl get services -n playwright-proxy
   ```

4. **Access the service**
   ```bash
   # Get the LoadBalancer IP
   kubectl get service playwright-proxy -n playwright-proxy
   ```

### Scaling for 200 Concurrent Users

1. **Adjust Helm values**
   ```yaml
   # values-production.yaml
   replicaCount: 3

   resources:
     limits:
       cpu: 4000m
       memory: 4Gi
     requests:
       cpu: 2000m
       memory: 2Gi

   autoscaling:
     enabled: true
     minReplicas: 3
     maxReplicas: 10
     targetCPUUtilizationPercentage: 70
     targetMemoryUtilizationPercentage: 80

   config:
     maxConcurrentSessions: 200
   ```

2. **Deploy with custom values**
   ```bash
   helm upgrade --install playwright-proxy ./helm/playwright-proxy \
     --namespace playwright-proxy \
     -f values-production.yaml
   ```

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `PORT` | HTTP server port | 3000 |
| `HOST` | Server host | 0.0.0.0 |
| `NODE_ENV` | Environment | development |
| `MAX_CONCURRENT_SESSIONS` | Maximum browser sessions | 200 |
| `BROWSER_TIMEOUT_MS` | Navigation timeout in milliseconds | 300000 (5 min) |
| `IDLE_TIMEOUT_MS` | Session idle timeout in milliseconds | 600000 (10 min) |
| `SESSION_SECRET` | Session encryption key | *required* |
| `JWT_SECRET` | JWT signing key | *required* |
| `REDIS_URL` | Redis connection URL | redis://localhost:6379 |
| `LOG_LEVEL` | Logging level | info |
| `METRICS_ENABLED` | Enable Prometheus metrics | true |
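
As a rough illustration of how these variables might be consumed, the sketch below reads them into a typed object, applies the defaults from the table, and fails fast when a required secret is missing. The `ProxyConfig` shape and the helper names are hypothetical and are not taken from the project source.

```typescript
// Hypothetical config loader. Variable names and defaults mirror the table
// above; the ProxyConfig field names and helper functions are assumptions.
type Env = Record<string, string | undefined>;

interface ProxyConfig {
  port: number;
  host: string;
  nodeEnv: string;
  maxConcurrentSessions: number;
  browserTimeoutMs: number;
  idleTimeoutMs: number;
  sessionSecret: string;
  jwtSecret: string;
  redisUrl: string;
  logLevel: string;
  metricsEnabled: boolean;
}

// Return a required variable or throw if it is unset or blank.
function requiredVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Parse a non-negative integer, falling back to the documented default.
function intVar(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): ProxyConfig {
  return {
    port: intVar(env, "PORT", 3000),
    host: env["HOST"] ?? "0.0.0.0",
    nodeEnv: env["NODE_ENV"] ?? "development",
    maxConcurrentSessions: intVar(env, "MAX_CONCURRENT_SESSIONS", 200),
    browserTimeoutMs: intVar(env, "BROWSER_TIMEOUT_MS", 300000),
    idleTimeoutMs: intVar(env, "IDLE_TIMEOUT_MS", 600000),
    sessionSecret: requiredVar(env, "SESSION_SECRET"),
    jwtSecret: requiredVar(env, "JWT_SECRET"),
    redisUrl: env["REDIS_URL"] ?? "redis://localhost:6379",
    logLevel: env["LOG_LEVEL"] ?? "info",
    // Anything other than the literal "false" keeps metrics on.
    metricsEnabled: (env["METRICS_ENABLED"] ?? "true").toLowerCase() !== "false",
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`, so a missing `SESSION_SECRET` or `JWT_SECRET` stops the server before it accepts traffic.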

### Anti-Bot Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `ENABLE_HEADER_SPOOFING` | Spoof HTTP headers | true |
| `ENABLE_UA_RANDOMIZATION` | Randomize user agents | true |
| `ENABLE_TIMEZONE_SPOOFING` | Randomize timezone | true |
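
The sketch below shows one way these three flags could translate into browser context settings. It only builds a plain options object; `userAgent`, `timezoneId`, and `extraHTTPHeaders` are real option names accepted by Playwright's `browser.newContext()`, but the user-agent strings, the timezone list, the header values, and the function names are illustrative assumptions, not the project's actual implementation.

```typescript
// Hypothetical mapping from the anti-bot flags to Playwright context options.
interface AntiBotFlags {
  headerSpoofing: boolean;
  uaRandomization: boolean;
  timezoneSpoofing: boolean;
}

// Subset of Playwright's newContext() options used in this sketch.
interface ContextOptions {
  userAgent?: string;
  timezoneId?: string;
  extraHTTPHeaders?: Record<string, string>;
}

// Illustrative values only; a real deployment would maintain its own lists.
const USER_AGENTS: readonly string[] = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
];
const TIMEZONES: readonly string[] = ["America/New_York", "Europe/Berlin", "Asia/Tokyo"];

// Pick one element using an injectable random source (eases testing).
function pick<T>(items: readonly T[], random: () => number): T {
  if (items.length === 0) throw new Error("cannot pick from an empty list");
  const index = Math.min(items.length - 1, Math.floor(random() * items.length));
  return items[index] as T;
}

function buildContextOptions(
  flags: AntiBotFlags,
  random: () => number = Math.random,
): ContextOptions {
  const options: ContextOptions = {};
  if (flags.uaRandomization) options.userAgent = pick(USER_AGENTS, random);
  if (flags.timezoneSpoofing) options.timezoneId = pick(TIMEZONES, random);
  if (flags.headerSpoofing) {
    options.extraHTTPHeaders = { "Accept-Language": "en-US,en;q=0.9" };
  }
  return options;
}
```

The result can be passed directly to `browser.newContext(options)`. Disabling a flag simply omits the corresponding option, so Chromium's own defaults apply.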

## API Documentation

### Authentication

#### POST `/api/auth/login`
Log in and receive a JWT token

**Request:**
```json
{
  "email": "demo@example.com",
  "password": "demo123"
}
```

**Response:**
```json
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}
```

### Proxy Operations

All proxy endpoints require an `Authorization: Bearer <token>` header.

#### POST `/api/proxy/session/create`
Create a new browser session

**Request:**
```json
{
  "deviceMode": "desktop",
  "url": "https://www.booking.com"
}
```

**Response:**
```json
{
  "sessionId": "uuid-v4",
  "deviceMode": "desktop",
  "currentUrl": "https://www.booking.com"
}
```

#### POST `/api/proxy/session/:sessionId/navigate`
Navigate to a URL

**Request:**
```json
{
  "url": "https://www.booking.com"
}
```

#### GET `/api/proxy/session/:sessionId/html`
Get current page HTML

#### GET `/api/proxy/session/:sessionId/screenshot`
Get page screenshot (PNG)

#### DELETE `/api/proxy/session/:sessionId`
Close browser session
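
The endpoints above can be exercised from a small typed client. The following is a minimal sketch that only builds the request descriptions (URL, method, headers, body) so they can be handed to `fetch`; the helper names are hypothetical, and `"mobile"` as a `deviceMode` value is an assumption based on the feature list.

```typescript
// Hypothetical request builders for the REST API described above.
type DeviceMode = "desktop" | "mobile"; // "mobile" is assumed, not confirmed

interface HttpCall {
  url: string;
  init: { method: string; headers: Record<string, string>; body?: string };
}

// Build a JSON request, adding the bearer token when one is supplied.
function jsonCall(method: string, url: string, body?: unknown, token?: string): HttpCall {
  const headers: Record<string, string> = {};
  if (body !== undefined) headers["Content-Type"] = "application/json";
  if (token !== undefined) headers["Authorization"] = `Bearer ${token}`;
  const init: HttpCall["init"] = { method, headers };
  if (body !== undefined) init.body = JSON.stringify(body);
  return { url, init };
}

function loginCall(baseUrl: string, email: string, password: string): HttpCall {
  return jsonCall("POST", `${baseUrl}/api/auth/login`, { email, password });
}

function createSessionCall(
  baseUrl: string,
  token: string,
  deviceMode: DeviceMode,
  url: string,
): HttpCall {
  return jsonCall("POST", `${baseUrl}/api/proxy/session/create`, { deviceMode, url }, token);
}

function navigateCall(baseUrl: string, token: string, sessionId: string, url: string): HttpCall {
  const id = encodeURIComponent(sessionId);
  return jsonCall("POST", `${baseUrl}/api/proxy/session/${id}/navigate`, { url }, token);
}

function closeSessionCall(baseUrl: string, token: string, sessionId: string): HttpCall {
  const id = encodeURIComponent(sessionId);
  return jsonCall("DELETE", `${baseUrl}/api/proxy/session/${id}`, undefined, token);
}
```

With Node.js 18+ each call can be executed as `const res = await fetch(call.url, call.init)`. A typical flow is login, then session creation with the returned `token`, then navigation using the returned `sessionId`, and finally closing the session to free its browser resources.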

### WebSocket API

Connect to `ws://localhost:3000/ws`

**Authentication:**
```json
{
  "type": "auth",
  "token": "your-jwt-token"
}
```

**Subscribe to session:**
```json
{
  "type": "subscribe",
  "sessionId": "uuid-v4"
}
```

**Navigate:**
```json
{
  "type": "navigate",
  "url": "https://www.booking.com"
}
```

**Receive updates:**
```json
{
  "type": "update",
  "update": {
    "type": "navigation",
    "timestamp": 1234567890,
    "data": {...}
  }
}
```
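
The message shapes above lend themselves to a small set of types plus a defensive parser for incoming frames. This is a sketch under assumptions: the type and function names are hypothetical, and sending `auth` before `subscribe` follows the order in which the messages are documented rather than a confirmed protocol requirement.

```typescript
// Hypothetical typings for the WebSocket messages documented above.
type ClientMessage =
  | { type: "auth"; token: string }
  | { type: "subscribe"; sessionId: string }
  | { type: "navigate"; url: string };

interface UpdateMessage {
  type: "update";
  update: { type: string; timestamp: number; data: unknown };
}

// Serialize a client message into the text frame to send.
function encodeMessage(message: ClientMessage): string {
  return JSON.stringify(message);
}

// Frames to send once the socket opens: authenticate, then subscribe.
function openingFrames(token: string, sessionId: string): string[] {
  return [
    encodeMessage({ type: "auth", token }),
    encodeMessage({ type: "subscribe", sessionId }),
  ];
}

// Parse a server frame; returns null for anything that is not a valid update.
function parseUpdate(raw: string): UpdateMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const message = parsed as Record<string, unknown>;
  if (message["type"] !== "update") return null;
  const update = message["update"];
  if (typeof update !== "object" || update === null) return null;
  const fields = update as Record<string, unknown>;
  const kind = fields["type"];
  const timestamp = fields["timestamp"];
  if (typeof kind !== "string" || typeof timestamp !== "number") return null;
  return { type: "update", update: { type: kind, timestamp, data: fields["data"] } };
}
```

Both the browser `WebSocket` and the `ws` package expose `send()`, so the opening frames can be sent with `openingFrames(token, sessionId).forEach((frame) => socket.send(frame))` from the socket's open handler, and each received text frame can be passed through `parseUpdate`.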

## Monitoring & Metrics

### Prometheus Metrics

Access metrics at `http://localhost:3000/metrics`

**Available metrics:**
- `proxy_sessions_created_total` - Total sessions created
- `proxy_sessions_active` - Active sessions count
- `proxy_navigation_duration_seconds` - Navigation time histogram
- `proxy_websocket_connections_active` - Active WebSocket connections
- `proxy_auth_attempts_total` - Authentication attempts
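
For quick scripted checks outside Grafana, the `/metrics` response can be read directly, since Prometheus exposes metrics as plain text with one sample per line. The sketch below is a simplified reader, not a full parser: it sums every sample of a given metric name across label sets and does not handle `}` inside quoted label values. The function name is hypothetical.

```typescript
// Simplified reader for the Prometheus text exposition format.
// Sums all samples whose metric name matches exactly, across label sets.
function readMetric(text: string, name: string): number | undefined {
  // name, optional {labels}, whitespace, value (an optional timestamp is ignored)
  const sample = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)/;
  let total: number | undefined;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    // Skip blanks and "# HELP" / "# TYPE" comment lines.
    if (line === "" || line.startsWith("#")) continue;
    const match = sample.exec(line);
    if (match === null || match[1] !== name) continue;
    const value = Number(match[2]);
    if (!Number.isFinite(value)) continue; // ignore NaN and +/-Inf samples
    total = (total ?? 0) + value;
  }
  return total;
}
```

Because matching is exact, a histogram such as `proxy_navigation_duration_seconds` has to be read through its `_sum`, `_count`, or `_bucket` series, which is how Prometheus histograms are exposed.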

### Grafana Dashboards

Start Grafana with the `monitoring` profile:
```bash
docker-compose --profile monitoring up -d
```

Access Grafana at http://localhost:3002 (default password: admin)

## Security Considerations

### Production Checklist

- [ ] Set strong, unique values for `SESSION_SECRET` and `JWT_SECRET` (never reuse example or placeholder values)
- [ ] Enable HTTPS/TLS for all endpoints
- [ ] Configure proper CORS origins
- [ ] Set up firewall rules
- [ ] Enable Redis authentication
- [ ] Implement proper user management
- [ ] Set up monitoring and alerting
- [ ] Regular security updates
- [ ] Implement backup strategy
- [ ] Configure resource limits

### Rate Limiting

The application includes built-in rate limiting:
- **API endpoints**: 100 requests per 15 minutes
- **Authentication**: 5 attempts per 15 minutes
- **Session creation**: 10 per minute
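
To make the limits concrete, here is a minimal fixed-window limiter configured with the three rules above. It is a sketch only: whether the project uses fixed or sliding windows, and whether limits are keyed by IP address or by user, is not stated here, so the class, the rule names, and the keying are assumptions.

```typescript
// Hypothetical fixed-window rate limiter using the limits listed above.
interface RateLimitRule {
  limit: number;    // maximum requests per window
  windowMs: number; // window length in milliseconds
}

const RATE_LIMITS = {
  api: { limit: 100, windowMs: 15 * 60 * 1000 },
  auth: { limit: 5, windowMs: 15 * 60 * 1000 },
  sessionCreate: { limit: 10, windowMs: 60 * 1000 },
};

class FixedWindowLimiter {
  private readonly rule: RateLimitRule;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(rule: RateLimitRule) {
    this.rule = rule;
  }

  // Returns true if the request identified by `key` is allowed at time `now`.
  allow(key: string, now: number): boolean {
    const current = this.windows.get(key);
    // Start a fresh window on first use or once the previous one has expired.
    if (current === undefined || now - current.start >= this.rule.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.rule.limit) return false;
    current.count += 1;
    return true;
  }
}
```

For example, `new FixedWindowLimiter(RATE_LIMITS.auth).allow(clientIp, Date.now())` would admit the first five login attempts from an address within a 15-minute window and reject the sixth. With multiple pods, the counters would need to live in shared storage such as Redis rather than in process memory.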

### IP Banning

Failed authentication attempts trigger automatic IP banning:
- **Threshold**: 5 failed attempts
- **Ban duration**: 1 hour (configurable)
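
The ban rule can be pictured as a per-IP failure counter with an expiry. The sketch below assumes that a successful login resets the counter and that the ban lifts automatically after the configured duration; both behaviors, along with the class and method names, are assumptions for illustration.

```typescript
// Hypothetical in-memory ban tracker: 5 failures trigger a 1-hour ban.
class BanTracker {
  private readonly threshold: number;
  private readonly banMs: number;
  private readonly failures = new Map<string, number>();
  private readonly bannedUntil = new Map<string, number>();

  constructor(threshold: number = 5, banMs: number = 60 * 60 * 1000) {
    this.threshold = threshold;
    this.banMs = banMs;
  }

  // True while the IP is inside its ban period; expired bans are cleared.
  isBanned(ip: string, now: number): boolean {
    const until = this.bannedUntil.get(ip);
    if (until === undefined) return false;
    if (now >= until) {
      this.bannedUntil.delete(ip);
      return false;
    }
    return true;
  }

  // Record a failed login; returns true if the IP is banned afterwards.
  recordFailure(ip: string, now: number): boolean {
    if (this.isBanned(ip, now)) return true;
    const count = (this.failures.get(ip) ?? 0) + 1;
    if (count >= this.threshold) {
      this.bannedUntil.set(ip, now + this.banMs);
      this.failures.delete(ip);
      return true;
    }
    this.failures.set(ip, count);
    return false;
  }

  // A successful login clears the failure count (assumed behavior).
  recordSuccess(ip: string): void {
    this.failures.delete(ip);
  }
}
```

An authentication middleware would call `isBanned` before checking credentials and `recordFailure` or `recordSuccess` afterwards. As with rate limiting, a multi-pod deployment would need this state in a shared store.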

## Performance Optimization

### Resource Requirements per Session

- **CPU**: ~100-200 millicores (0.1-0.2 cores) per session
- **Memory**: ~100-200 MB per session
- **Disk**: Minimal (logs only)

### Scaling Guidelines

For 200 concurrent sessions:
- **3-5 pods** with the Horizontal Pod Autoscaler (HPA) enabled
- **8-16GB RAM** per pod
- **4-8 CPU cores** per pod
- **Redis** with persistence enabled
- **Load balancer** with session affinity
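
These figures follow from the per-session estimates: 200 sessions at 100-200 millicores and 100-200 MB each come to roughly 20-40 CPU cores and 20-40 GB of memory in total. The sketch below reproduces that arithmetic; the function and type names are hypothetical, and it ignores headroom for the operating system, Node.js itself, and traffic spikes.

```typescript
// Back-of-the-envelope pod count from the per-session resource estimates.
interface PerSessionCost {
  cpuMillicores: number;
  memoryMb: number;
}

interface PodSize {
  cpuCores: number;
  memoryGb: number; // treated as GiB (1024 MB)
}

// Pods required so that both CPU and memory demand fit.
function podsNeeded(sessions: number, perSession: PerSessionCost, pod: PodSize): number {
  const cpuPods = Math.ceil((sessions * perSession.cpuMillicores) / (pod.cpuCores * 1000));
  const memoryPods = Math.ceil((sessions * perSession.memoryMb) / (pod.memoryGb * 1024));
  return Math.max(cpuPods, memoryPods);
}

const largePod: PodSize = { cpuCores: 8, memoryGb: 16 };

// Light sessions (100m CPU, 100 MB each): CPU-bound at 3 pods.
const lowEstimate = podsNeeded(200, { cpuMillicores: 100, memoryMb: 100 }, largePod);

// Heavy sessions (200m CPU, 200 MB each): CPU-bound at 5 pods.
const highEstimate = podsNeeded(200, { cpuMillicores: 200, memoryMb: 200 }, largePod);
```

With 8-core, 16 GB pods the result is 3 pods for light sessions and 5 for heavy ones, which matches the 3-5 pod range above. Smaller pods raise the count accordingly: at 4 cores and 4 GiB per pod, the heavy case needs 10.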

## Troubleshooting

### Common Issues

**Issue: Browser timeout errors**
- Increase `BROWSER_TIMEOUT_MS`
- Check network connectivity
- Verify resource limits

**Issue: High memory usage**
- Reduce `MAX_CONCURRENT_SESSIONS`
- Lower `IDLE_TIMEOUT_MS` for faster cleanup
- Enable browser auto-cleanup
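
Lowering `IDLE_TIMEOUT_MS` helps because every idle session keeps its browser resources until it is closed. A periodic sweep along the following lines illustrates the idea; the `SessionInfo` shape and the function name are hypothetical and not the project's actual `BrowserPool` API.

```typescript
// Hypothetical idle sweep: find sessions with no activity for idleTimeoutMs.
interface SessionInfo {
  id: string;
  lastActivityAt: number; // epoch milliseconds of the last request or event
}

function findIdleSessions(
  sessions: readonly SessionInfo[],
  now: number,
  idleTimeoutMs: number,
): string[] {
  const idle: string[] = [];
  for (const session of sessions) {
    if (now - session.lastActivityAt >= idleTimeoutMs) idle.push(session.id);
  }
  return idle;
}
```

A pool manager could run this on a timer and close each returned session. With the default of 600000 ms, an abandoned session holds its memory for up to ten minutes before it is reclaimed.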

**Issue: WebSocket disconnections**
- Check load balancer timeout settings
- Verify WebSocket support in ingress
- Enable session affinity

### Logs

View application logs:
```bash
# Docker Compose
docker-compose logs -f proxy

# Kubernetes
kubectl logs -f deployment/playwright-proxy -n playwright-proxy

# Local
tail -f logs/app.log
```

## Development

### Project Structure

```
.
├── src/                    # Backend source code
│   ├── core/              # Core classes (BrowserPool, BrowserSession)
│   ├── routes/            # API routes
│   ├── middleware/        # Express middleware
│   ├── websocket/         # WebSocket server
│   ├── utils/             # Utilities (logger, metrics)
│   ├── config/            # Configuration
│   └── types/             # TypeScript types
├── client/                 # Frontend Next.js app
│   ├── app/               # Next.js app directory
│   ├── components/        # React components
│   └── store/             # State management
├── helm/                   # Kubernetes Helm charts
├── monitoring/            # Prometheus/Grafana configs
├── Dockerfile             # Docker configuration
└── docker-compose.yml     # Docker Compose setup
```

### Running Tests

```bash
npm test
```

### Building

```bash
# Backend
npm run build

# Frontend
cd client && npm run build
```

## Cost Optimization

### Cloud Provider Estimates (AWS)

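For 200 concurrent users (rough estimates; actual cost varies by region and pricing model):
- **EC2/EKS**: 3x c5.2xlarge instances (~$745/month on-demand at about $0.34/hour each in us-east-1; less with reserved or spot pricing)
- **Load Balancer**: ~$25/month
- **Storage**: ~$10/month
- **Data Transfer**: ~$50/month
- **Total**: ~$830/month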

### Optimization Tips

1. Use spot instances for non-critical workloads
2. Implement aggressive session cleanup
3. Enable browser resource limits
4. Use CDN for frontend assets
5. Implement request caching where possible

## License

MIT

## Support

For issues and questions:
- GitHub Issues: [Create an issue]
- Documentation: [Wiki]
- Email: support@example.com

## Contributing

Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

## Acknowledgments

- Built with [Playwright](https://playwright.dev/)
- Inspired by browser automation best practices
- Community contributions welcome
