Monitoring Overview

Effective monitoring is crucial for maintaining a reliable BOOP integration in production. This guide covers metrics, alerting, logging, and troubleshooting strategies.

📊 Key Metrics to Monitor

Authentication Metrics

Monitor these critical authentication metrics:
Target: > 99.5%
Description: Percentage of successful authentications
const authSuccessRate = (successfulAuths / totalAuths) * 100;

// Alert if below threshold
if (authSuccessRate < 99.5) {
  triggerAlert('AUTH_SUCCESS_RATE_LOW', {
    currentRate: authSuccessRate,
    threshold: 99.5,
    severity: 'high'
  });
}
Common Causes of Low Success Rate:
  • Network connectivity issues
  • User registration problems
  • Biometric recognition failures
  • Server overload
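The success-rate check above works on lifetime totals, which can mask a recent drop and can also fire on the very first failure. One possible refinement is a rolling window with a minimum sample count. This is a minimal sketch: `RollingSuccessRate`, the 5-minute window, and the 100-sample minimum are illustrative assumptions, not part of any BOOP SDK.

```javascript
// Hypothetical helper: authentication success rate over a rolling time window.
class RollingSuccessRate {
  constructor(windowMs = 5 * 60 * 1000, minSamples = 100) {
    this.windowMs = windowMs;
    this.minSamples = minSamples;
    this.events = []; // { time, success }, oldest first
  }

  // Record one authentication outcome.
  record(success, now = Date.now()) {
    this.events.push({ time: now, success });
    this.prune(now);
  }

  // Drop events that have fallen out of the window.
  prune(now) {
    const cutoff = now - this.windowMs;
    while (this.events.length > 0 && this.events[0].time < cutoff) {
      this.events.shift();
    }
  }

  // Returns a percentage, or null when there are too few samples to judge.
  rate(now = Date.now()) {
    this.prune(now);
    if (this.events.length < this.minSamples) return null;
    const ok = this.events.filter((e) => e.success).length;
    return (ok * 100) / this.events.length;
  }
}
```

A `null` result means "not enough data yet", so the caller can skip the threshold comparison instead of alerting on a handful of requests.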
Target: < 2 seconds (95th percentile)
Description: Time from authentication request to response
const measureResponseTime = async (authFunction) => {
  const startTime = Date.now();
  const result = await authFunction();
  const responseTime = Date.now() - startTime;

  responseTimeMetric.observe(responseTime / 1000);

  if (responseTime > 3000) {
    triggerAlert('AUTH_SLOW_RESPONSE', {
      responseTime,
      threshold: 3000,
      severity: 'medium'
    });
  }

  return result;
};
Optimization Strategies:
  • Connection pooling
  • Request batching
  • Caching frequently accessed data
  • Geographic load balancing
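The 2-second target above is a 95th-percentile figure, while the snippet checks each request individually. A minimal sketch of computing the percentile from collected samples is shown here, using the nearest-rank method. The function names and the in-memory sample array are assumptions for illustration; a metrics backend would normally compute this from a histogram.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

const P95_TARGET_MS = 2000; // the "< 2 seconds" target, in milliseconds

// True when the 95th-percentile response time exceeds the target.
function p95ExceedsTarget(samplesMs) {
  const p95 = percentile(samplesMs, 95);
  return p95 !== null && p95 > P95_TARGET_MS;
}
```

With 20 samples the 95th percentile is the 19th-smallest value, so a single slow outlier does not trip the check but two do.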
Target: Support your peak load + 50% buffer
Description: Authentication requests per minute
const throughputCounter = new Map();

const trackThroughput = () => {
  const currentMinute = Math.floor(Date.now() / 60000);
  const count = throughputCounter.get(currentMinute) || 0;
  throughputCounter.set(currentMinute, count + 1);

  // Clean up buckets older than five minutes
  for (const minute of throughputCounter.keys()) {
    if (minute < currentMinute - 5) {
      throughputCounter.delete(minute);
    }
  }

  return count + 1;
};
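The "peak load + 50% buffer" target can be turned into a simple capacity check. The sketch below assumes per-minute counts are held in a `Map` like the one above and that `provisionedPerMinute` (a hypothetical figure) is known from load testing. Because the counter above only retains about five minutes of data, a real peak would need a longer history.

```javascript
const CAPACITY_BUFFER = 0.5; // 50% headroom above observed peak

// Requests per minute the system should sustain for a given peak.
function requiredCapacity(peakPerMinute) {
  return Math.ceil(peakPerMinute * (1 + CAPACITY_BUFFER));
}

// Highest per-minute count in a Map of minute -> request count.
function peakFromCounts(countsByMinute) {
  let peak = 0;
  for (const count of countsByMinute.values()) {
    if (count > peak) peak = count;
  }
  return peak;
}

// True when provisioned capacity covers the observed peak plus the buffer.
function hasHeadroom(countsByMinute, provisionedPerMinute) {
  return provisionedPerMinute >= requiredCapacity(peakFromCounts(countsByMinute));
}
```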

Payment Metrics

Critical payment-related metrics:
Target: > 99.9%
const paymentMetrics = {
  successful: 0,
  failed: 0,
  total: 0
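};

// Failure counts keyed by reason (updated in trackPayment below)
const paymentFailureReasons = {};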

const trackPayment = (success, amount, reason) => {
  paymentMetrics.total++;

  if (success) {
    paymentMetrics.successful++;
  } else {
    paymentMetrics.failed++;

    // Track failure reasons
    paymentFailureReasons[reason] = (paymentFailureReasons[reason] || 0) + 1;
  }

  const successRate = (paymentMetrics.successful / paymentMetrics.total) * 100;

  if (successRate < 99.9) {
    triggerAlert('PAYMENT_SUCCESS_RATE_LOW', {
      successRate,
      totalPayments: paymentMetrics.total,
      failedPayments: paymentMetrics.failed
    });
  }
};
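The per-reason failure counts tracked above are easier to act on once ranked. A minimal sketch follows; `topFailureReasons` and the reason names in the usage example are hypothetical, and the function only assumes an object mapping reason strings to counts.

```javascript
// Rank failure reasons by count and report each one's share of all failures.
function topFailureReasons(reasonCounts, limit = 3) {
  const total = Object.values(reasonCounts).reduce((sum, n) => sum + n, 0);
  return Object.entries(reasonCounts)
    .sort(([, a], [, b]) => b - a)
    .slice(0, limit)
    .map(([reason, count]) => ({
      reason,
      count,
      sharePercent: total === 0 ? 0 : (count * 100) / total
    }));
}

// Usage with made-up reason names:
// topFailureReasons({ insufficient_funds: 5, timeout: 3, declined: 2 }, 2)
```

Including this summary in the alert payload would let the on-call engineer see the dominant failure mode without querying logs first.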
Monitor: Total transaction value, average transaction size
const revenueMetrics = {
  totalValue: 0,
  transactionCount: 0,
  averageValue: 0
};

// Revenue totals keyed by hour bucket (hours since the Unix epoch)
const hourlyRevenue = {};

const trackRevenue = (transactionValue) => {
  revenueMetrics.totalValue += transactionValue;
  revenueMetrics.transactionCount++;
  revenueMetrics.averageValue = revenueMetrics.totalValue / revenueMetrics.transactionCount;

  // Track hourly revenue
  const hour = Math.floor(Date.now() / 3600000);
  hourlyRevenue[hour] = (hourlyRevenue[hour] || 0) + transactionValue;
};

Infrastructure Metrics

Monitor underlying infrastructure health:
const websocketMetrics = {
  activeConnections: 0,
  totalConnections: 0,
  reconnections: 0,
  failures: 0
};

const monitorWebSocket = (ws) => {
  websocketMetrics.activeConnections++;
  websocketMetrics.totalConnections++;

  ws.on('close', (code, reason) => {
    websocketMetrics.activeConnections--;

    if (code !== 1000) { // Not normal closure
      websocketMetrics.failures++;

      triggerAlert('WEBSOCKET_ABNORMAL_CLOSURE', {
        code,
        reason,
        activeConnections: websocketMetrics.activeConnections
      });
    }
  });

  ws.on('error', (error) => {
    websocketMetrics.failures++;
    triggerAlert('WEBSOCKET_ERROR', { error: error.message });
  });
};
const apiMetrics = {
  requests: 0,
  errors: 0,
  timeouts: 0,
  rateLimited: 0
};

const monitorApiCall = async (apiCall) => {
  apiMetrics.requests++;

  try {
    const response = await apiCall();

    if (response.status === 429) {
      apiMetrics.rateLimited++;
    }

    return response;
  } catch (error) {
    if (error.name === 'TimeoutError') {
      apiMetrics.timeouts++;
    } else {
      apiMetrics.errors++;
    }

    throw error;
  }
};

🚨 Alerting Strategy

Alert Severity Levels

Configure different response procedures for different severity levels.
| Severity | Response Time | Escalation | Example Triggers |
| --- | --- | --- | --- |
| Critical | Immediate | Page on-call engineer | > 5% error rate, service down |
| High | Within 15 minutes | Slack + email | > 1% error rate, slow response |
| Medium | Within 1 hour | Email | Unusual patterns, warnings |
| Low | Next business day | Dashboard | Performance degradation |
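The error-rate thresholds in the table can be encoded so that every alert source assigns severity the same way. This is a sketch under the assumption that error rate is expressed as a percentage. `severityForErrorRate` and `ESCALATION_CHANNEL` are illustrative names, and only the two rows with numeric triggers are mapped from a rate.

```javascript
// Escalation channel per severity, copied from the table above.
const ESCALATION_CHANNEL = {
  critical: 'Page on-call engineer',
  high: 'Slack + email',
  medium: 'Email',
  low: 'Dashboard'
};

// Map an error rate (percent) to a severity using the table's thresholds.
// Returns null below 1%, where other rules decide between medium and low.
function severityForErrorRate(errorRatePercent, serviceDown = false) {
  if (serviceDown || errorRatePercent > 5) return 'critical';
  if (errorRatePercent > 1) return 'high';
  return null;
}
```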

Alert Configuration

const PagerDuty = require('pagerduty');

class AlertManager {
  constructor() {
    this.pagerduty = new PagerDuty({
      token: process.env.PAGERDUTY_TOKEN
    });
  }

  async triggerAlert(alertType, data, severity = 'medium') {
    const alert = {
      type: alertType,
      severity,
      timestamp: new Date().toISOString(),
      data: this.sanitizeData(data),
      environment: 'production',
      service: 'boop-integration'
    };

    // Log locally
    console.error(`🚨 ALERT [${severity.toUpperCase()}]: ${alertType}`, alert);

    // Send to monitoring systems
    await this.sendToPagerDuty(alert);
    await this.sendToSlack(alert);
    await this.logToDatabase(alert);

    return alert;
  }

  async sendToPagerDuty(alert) {
    if (alert.severity === 'critical' || alert.severity === 'high') {
      await this.pagerduty.incidents.create({
        incident: {
          type: 'incident',
          title: `BOOP ${alert.type}`,
          service: { id: process.env.PAGERDUTY_SERVICE_ID },
          urgency: alert.severity === 'critical' ? 'high' : 'low',
          body: {
            type: 'incident_body',
            details: JSON.stringify(alert, null, 2)
          }
        }
      });
    }
  }
}
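The class above calls `this.sanitizeData(data)` without showing it. One way such a method might work is to redact values whose keys look sensitive before the alert leaves the process. The key pattern and the `[REDACTED]` placeholder below are assumptions, and the sketch handles plain JSON-like data only (objects such as `Date` or `Error` would need special handling).

```javascript
// Keys that suggest a secret value (an illustrative, not exhaustive, pattern).
const SENSITIVE_KEY = /token|secret|password|api[-_]?key|authorization/i;

// Return a deep copy of `value` with sensitive-looking fields redacted.
function sanitizeData(value) {
  if (Array.isArray(value)) return value.map(sanitizeData);
  if (value !== null && typeof value === 'object') {
    const clean = {};
    for (const [key, inner] of Object.entries(value)) {
      clean[key] = SENSITIVE_KEY.test(key) ? '[REDACTED]' : sanitizeData(inner);
    }
    return clean;
  }
  return value; // primitives pass through unchanged
}
```

Because the function copies rather than mutates, the original payload stays intact for local debugging.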

Common Alert Scenarios

Service Completely Down
// Trigger: Error rate above 95% (effectively all requests failing) for > 1 minute
if (errorRate > 95 && timeWindow > 60000) {
  triggerAlert('SERVICE_DOWN', {
    errorRate,
    duration: timeWindow,
    lastSuccessfulRequest: lastSuccess
  }, 'critical');
}
Mass Payment Failures
// Trigger: >10 consecutive payment failures
if (consecutivePaymentFailures > 10) {
  triggerAlert('PAYMENT_SYSTEM_FAILURE', {
    consecutiveFailures: consecutivePaymentFailures,
    totalFailures: totalPaymentFailures,
    timeWindow: '5 minutes'
  }, 'critical');
}
High Error Rate
// Trigger: Error rate > 1% for > 5 minutes
if (errorRate > 1 && sustainedDuration > 300000) {
  triggerAlert('HIGH_ERROR_RATE', {
    currentErrorRate: errorRate,
    threshold: 1,
    duration: sustainedDuration,
    errorBreakdown: getErrorBreakdown()
  }, 'high');
}
Slow Response Times
// Trigger: 95th percentile > 5 seconds
if (responseTime95th > 5000) {
  triggerAlert('SLOW_RESPONSE_TIME', {
    p95ResponseTime: responseTime95th,
    p99ResponseTime: responseTime99th,
    avgResponseTime: avgResponseTime
  }, 'high');
}
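Several scenarios above depend on a condition holding for some duration (`timeWindow`, `sustainedDuration`), but the snippets do not show where those values come from. A minimal sketch of one way to track them follows. `SustainedCondition` is a hypothetical helper, and timestamps are passed in explicitly so the logic is easy to test.

```javascript
// Tracks how long a boolean condition has been continuously true.
class SustainedCondition {
  constructor(requiredMs) {
    this.requiredMs = requiredMs;
    this.since = null; // timestamp when the condition first became true
  }

  // Returns how long the condition has held (ms); resets when it clears.
  update(isBreached, now = Date.now()) {
    if (!isBreached) {
      this.since = null;
      return 0;
    }
    if (this.since === null) this.since = now;
    return now - this.since;
  }

  // True once the condition has held for longer than requiredMs.
  isSustained(isBreached, now = Date.now()) {
    return this.update(isBreached, now) > this.requiredMs;
  }
}

// Usage: const highErrors = new SustainedCondition(300000);
// if (highErrors.isSustained(errorRate > 1)) { /* trigger HIGH_ERROR_RATE */ }
```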

📈 Dashboard Configuration

Executive Dashboard

High-level business metrics for stakeholders:
// Sample Grafana dashboard configuration
const executiveDashboard = {
  title: "BOOP Integration - Executive View",
  panels: [
    {
      title: "Authentication Success Rate",
      type: "stat",
      target: "99.8%",
      query: "auth_success_rate_1h"
    },
    {
      title: "Revenue (Last 24h)",
      type: "stat",
      format: "currency",
      query: "sum(payment_value_24h)"
    },
    {
      title: "Active Users",
      type: "graph",
      query: "unique_users_per_hour"
    },
    {
      title: "Service Uptime",
      type: "stat",
      target: "99.9%",
      query: "uptime_percentage_7d"
    }
  ]
};

Operations Dashboard

Technical metrics for engineering teams:
const operationsDashboard = {
  title: "BOOP Integration - Operations",
  panels: [
    {
      title: "Response Time Distribution",
      type: "heatmap",
      query: "auth_response_time_histogram"
    },
    {
      title: "Error Rate by Type",
      type: "pie",
      query: "error_count_by_type_1h"
    },
    {
      title: "WebSocket Connections",
      type: "graph",
      query: "websocket_active_connections"
    },
    {
      title: "API Rate Limits",
      type: "gauge",
      query: "api_rate_limit_usage"
    }
  ]
};

πŸ” Log Analysis

Structured Logging

Implement consistent log structure for analysis:
const logger = require('winston');

class BoopLogger {
  static logAuthentication(context, result, timing) {
    const logEntry = {
      event: 'boop_authentication',
      timestamp: new Date().toISOString(),
      userId: context.userId,
      vendorId: context.vendorId,
      contextId: context.contextId,
      authType: context.type,
      success: result.success,
      errorCode: result.errorCode,
      responseTime: timing.responseTime,
      attributes: result.attributes?.length || 0,
      amount: context.amount,
      environment: process.env.NODE_ENV
    };

    if (result.success) {
      logger.info('Authentication successful', logEntry);
    } else {
      logger.warn('Authentication failed', logEntry);
    }

    // Send to log aggregation service
    this.sendToLogService(logEntry);
  }

  static logError(error, context) {
    const logEntry = {
      event: 'boop_error',
      timestamp: new Date().toISOString(),
      errorMessage: error.message,
      errorCode: error.code,
      stack: error.stack,
      context: this.sanitizeContext(context),
      severity: this.determineSeverity(error),
      environment: process.env.NODE_ENV
    };

    logger.error('BOOP error occurred', logEntry);
    this.sendToLogService(logEntry);
  }
}

Log Aggregation Queries

Common queries for troubleshooting:
-- Find authentication failures by error code (last 24h)
SELECT
    errorCode,
    COUNT(*) as failures,
    (COUNT(*) * 100.0 / SUM(COUNT(*)) OVER()) as percentage
FROM boop_logs
WHERE event = 'boop_authentication'
    AND success = false
    AND timestamp > NOW() - INTERVAL '24 HOURS'
GROUP BY errorCode
ORDER BY failures DESC;

πŸ› οΈ Troubleshooting Playbooks

Authentication Issues

Symptoms: Error rate > 1%, user complaints
Investigation Steps:
  1. Check error code distribution in logs
  2. Verify BOOP service status
  3. Test with known good user
  4. Check network connectivity
  5. Verify API credentials
Common Solutions:
// 1. Check service health
const health = await fetch('https://app.boop.it/health');
console.log('BOOP service health:', await health.json());

// 2. Verify credentials
if (!apiKey.startsWith('sk_live_')) {
  console.error('Invalid production API key format');
}

// 3. Test basic connectivity
const testAuth = await authenticateUser({
  type: 'entrance',
  attributes: ['pseudonym']
});

// 4. Check for rate limiting
if (response.status === 429) {
  console.warn('Rate limited - implement exponential backoff');
}
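The rate-limiting check above recommends exponential backoff without showing it. A minimal sketch follows, using the common "full jitter" variant in which each delay is a random value between zero and an exponentially growing ceiling. The base delay, cap, and attempt limit are assumptions, and `request` stands for any function returning a promise of an object with a `status` field.

```javascript
// Delay before retry number `attempt` (0-based), with full jitter.
// `random` is injectable so the calculation can be tested deterministically.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000, random = Math.random) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry `request` while it returns HTTP 429, up to maxAttempts calls in total.
async function withBackoff(request, maxAttempts = 5) {
  for (let attempt = 0; ; attempt++) {
    const response = await request();
    if (response.status !== 429 || attempt >= maxAttempts - 1) return response;
    await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
  }
}
```

The jitter spreads retries from many clients over time, which avoids synchronized bursts hitting the rate limiter again at the same moment.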
Symptoms: Authentication taking > 3 seconds
Investigation Steps:
  1. Check network latency to BOOP servers
  2. Review connection pooling configuration
  3. Analyze request/response sizes
  4. Monitor server resource usage
Optimization Solutions:
// 1. Implement connection pooling
const connectionPool = new WebSocketPool({
  maxConnections: 5,
  reuseConnections: true,
  keepAliveInterval: 30000
});

// 2. Add request timeout
const authWithTimeout = Promise.race([
  authenticateUser(context),
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Timeout')), 10000)
  )
]);

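// 3. Implement caching for frequent requests
// (assumes the lru-cache package, v10 or later, which exports LRUCache by name)
const { LRUCache } = require('lru-cache');
const userCache = new LRUCache({ max: 1000, ttl: 300000 });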

Payment Issues

Investigation Steps:
  1. Check user balance via API
  2. Verify transaction ID calculation
  3. Review payment context data
  4. Check for duplicate transactions
Diagnostic Code:
const diagnosePayment = async (userId, amount, transactionId) => {
  // 1. Check user balance
  const balance = await getUserBalance(userId);
  console.log(`User balance: $${balance / 100}, Required: $${amount / 100}`);

  // 2. Verify transaction status
  const txStatus = await getTransactionStatus(transactionId);
  console.log('Transaction status:', txStatus);

  // 3. Check recent transactions
  const recentTx = await getTransactionHistory(userId, Date.now() - 3600000);
  console.log(`Recent transactions: ${recentTx.length}`);

  return {
    sufficientFunds: balance >= amount,
    transactionExists: txStatus.found,
    recentTransactionCount: recentTx.length
  };
};
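The diagnostic code above covers balance, status, and recent history, but not the "check for duplicate transactions" step. One possible heuristic is to flag transactions from the same user with the same amount inside a short window. The transaction shape (`id`, `amount`, `timestamp` in milliseconds) and the 60-second window are assumptions, since the actual history format is not shown here.

```javascript
// Flag likely duplicates: same amount, within `windowMs` of the previous one.
// Expects transactions for a single user, each { id, amount, timestamp }.
function findDuplicateTransactions(transactions, windowMs = 60000) {
  const lastSeen = new Map(); // amount -> most recent transaction with that amount
  const duplicates = [];
  const ordered = [...transactions].sort((a, b) => a.timestamp - b.timestamp);

  for (const tx of ordered) {
    const previous = lastSeen.get(tx.amount);
    if (previous && tx.timestamp - previous.timestamp <= windowMs) {
      duplicates.push({ original: previous.id, duplicate: tx.id });
    }
    lastSeen.set(tx.amount, tx);
  }
  return duplicates;
}
```

A match is only a hint: two genuine purchases of the same amount can happen within a minute, so flagged pairs should be reviewed rather than reversed automatically.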

📱 Mobile & Real-Time Monitoring

Mobile Alerts

Configure mobile notifications for critical issues:
const PushNotification = require('pushover-notifications');

class MobileAlerter {
  constructor() {
    this.pushover = new PushNotification({
      user: process.env.PUSHOVER_USER_KEY,
      token: process.env.PUSHOVER_APP_TOKEN
    });
  }

  async sendCriticalAlert(alert) {
    if (alert.severity !== 'critical') return;

    const message = {
      message: `🚨 CRITICAL: ${alert.type}`,
      title: 'BOOP Production Alert',
      sound: 'siren',
      priority: 2, // Emergency priority
      retry: 300, // Retry every 5 minutes
      expire: 3600 // Expire after 1 hour
    };

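    // pushover-notifications exposes a callback API, so wrap send() in a Promise
    await new Promise((resolve, reject) => {
      this.pushover.send(message, (err, result) => (err ? reject(err) : resolve(result)));
    });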
  }
}

Real-Time Dashboard

Create a real-time monitoring dashboard:
<!DOCTYPE html>
<html>
<head>
    <title>BOOP Live Monitor</title>
    <script src="https://cdn.socket.io/4.0.0/socket.io.min.js"></script>
</head>
<body>
    <div id="metrics">
        <div class="metric">
            <h3>Success Rate</h3>
            <span id="successRate">--</span>%
        </div>
        <div class="metric">
            <h3>Response Time</h3>
            <span id="responseTime">--</span>ms
        </div>
        <div class="metric">
            <h3>Active Users</h3>
            <span id="activeUsers">--</span>
        </div>
    </div>

    <script>
        const socket = io();

        socket.on('metrics', (data) => {
            document.getElementById('successRate').textContent = data.successRate.toFixed(1);
            document.getElementById('responseTime').textContent = data.avgResponseTime;
            document.getElementById('activeUsers').textContent = data.activeUsers;
        });

        socket.on('alert', (alert) => {
            if (alert.severity === 'critical') {
                document.body.style.backgroundColor = '#ff0000';
                setTimeout(() => document.body.style.backgroundColor = '', 1000);
            }
        });
    </script>
</body>
</html>

🎯 Success Metrics

Track these KPIs to measure monitoring effectiveness:

MTTR

Mean Time to Recovery: Target < 15 minutes
Time from alert to issue resolution

MTTD

Mean Time to Detection: Target < 2 minutes
Time from issue occurrence to alert

SLA Compliance

Service Level Agreement: Target > 99.9%
Percentage of time service meets SLA requirements

False Alert Rate

Alert Accuracy: Target < 5%
Percentage of alerts that were false positives
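The KPIs above can be computed from incident and alert records using the definitions given: MTTD runs from issue occurrence to alert, and MTTR from alert to resolution. The sketch below assumes hypothetical record shapes (`startedAt`, `alertedAt`, `resolvedAt` as millisecond timestamps, and a `falsePositive` flag on each alert).

```javascript
// Arithmetic mean, or null for an empty list.
function mean(values) {
  if (values.length === 0) return null;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Compute MTTD, MTTR (both in minutes) and the false alert rate (percent).
function monitoringKpis(incidents, alerts) {
  const toMinutes = (ms) => ms / 60000;
  const falseCount = alerts.filter((a) => a.falsePositive).length;
  return {
    mttdMinutes: mean(incidents.map((i) => toMinutes(i.alertedAt - i.startedAt))),
    mttrMinutes: mean(incidents.map((i) => toMinutes(i.resolvedAt - i.alertedAt))),
    falseAlertRatePercent: alerts.length === 0 ? null : (falseCount * 100) / alerts.length
  };
}
```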

📞 Support & Escalation

Contact Information

Escalation Matrix

| Issue Type | First Contact | Escalation (30 min) | Final Escalation (1 hour) |
| --- | --- | --- | --- |
| Service Down | On-call engineer | Engineering manager | CTO |
| Payment Issues | Payment team | Finance director | CFO |
| Security Issues | Security team | Security officer | CISO |
| Data Issues | Data team | Data engineering lead | Chief Data Officer |
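The escalation matrix can be encoded so that tooling picks the right contact as an incident ages. This is a minimal sketch: the role names come from the matrix above, while `currentContact` and the way elapsed time is measured (minutes since the issue was opened) are assumptions.

```javascript
// Contacts per issue type: [first contact, after 30 min, after 1 hour].
const ESCALATION_MATRIX = {
  'Service Down': ['On-call engineer', 'Engineering manager', 'CTO'],
  'Payment Issues': ['Payment team', 'Finance director', 'CFO'],
  'Security Issues': ['Security team', 'Security officer', 'CISO'],
  'Data Issues': ['Data team', 'Data engineering lead', 'Chief Data Officer']
};

// Who should own the issue after it has been open for `minutesOpen` minutes.
function currentContact(issueType, minutesOpen) {
  const contacts = ESCALATION_MATRIX[issueType];
  if (!contacts) throw new Error(`Unknown issue type: ${issueType}`);
  if (minutesOpen >= 60) return contacts[2];
  if (minutesOpen >= 30) return contacts[1];
  return contacts[0];
}
```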

Production Support Ready!

Your monitoring and alerting setup is now complete. You’re ready to maintain a highly reliable BOOP integration in production.

Remember: Proactive monitoring prevents issues before they impact users.