Monitoring Overview
Effective monitoring is crucial for maintaining a reliable BOOP integration in production. This guide covers metrics, alerting, logging, and troubleshooting strategies.
📊 Key Metrics to Monitor
Authentication Metrics
Monitor these critical authentication metrics:
**Target**: > 99.5%
**Description**: Percentage of successful authentications

```javascript
const authSuccessRate = (successfulAuths / totalAuths) * 100;

// Alert if below threshold
if (authSuccessRate < 99.5) {
  triggerAlert('AUTH_SUCCESS_RATE_LOW', {
    currentRate: authSuccessRate,
    threshold: 99.5,
    severity: 'high'
  });
}
```
**Common Causes of Low Success Rate**:
Network connectivity issues
User registration problems
Biometric recognition failures
Server overload
**Target**: < 2 seconds (95th percentile)
**Description**: Time from authentication request to response

```javascript
const measureResponseTime = async (authFunction) => {
  const startTime = Date.now();
  const result = await authFunction();
  const responseTime = Date.now() - startTime;

  responseTimeMetric.observe(responseTime / 1000);

  if (responseTime > 3000) {
    triggerAlert('AUTH_SLOW_RESPONSE', {
      responseTime,
      threshold: 3000,
      severity: 'medium'
    });
  }

  return result;
};
```
**Optimization Strategies**:
Connection pooling
Request batching
Caching frequently accessed data
Geographic load balancing
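The p95 target above implies keeping a rolling window of timing samples. A minimal sketch of the math, assuming you keep recent samples in an array (the `percentile` and `recordSample` helpers are illustrative, not part of any BOOP SDK):

```javascript
// Nearest-rank percentile over a window of response-time samples.
const percentile = (samples, p) => {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
};

// Append a sample, keeping only the most recent maxSamples to bound memory.
const recordSample = (samples, value, maxSamples = 1000) => {
  samples.push(value);
  if (samples.length > maxSamples) samples.shift();
  return samples;
};
```

In production you would normally let a metrics library do this aggregation (a Prometheus-style histogram, as the `responseTimeMetric.observe` call suggests); the sketch just makes the percentile definition concrete.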
**Target**: Support your peak load + 50% buffer
**Description**: Authentication requests per minute

```javascript
const throughputCounter = new Map();

const trackThroughput = () => {
  const currentMinute = Math.floor(Date.now() / 60000);
  const count = throughputCounter.get(currentMinute) || 0;
  throughputCounter.set(currentMinute, count + 1);

  // Clean old data
  for (const [minute] of throughputCounter) {
    if (minute < currentMinute - 5) {
      throughputCounter.delete(minute);
    }
  }

  return count + 1;
};
```
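The "peak load + 50% buffer" target can be turned into a concrete check against the per-minute counter above. A sketch; `PEAK_LOAD_RPM` is an assumed number you would measure from your own traffic:

```javascript
// Assumed peak load in requests per minute, measured from your own traffic.
const PEAK_LOAD_RPM = 2000;
const CAPACITY_RPM = Math.round(PEAK_LOAD_RPM * 1.5); // peak + 50% buffer

// True when the current minute's request count is within provisioned capacity.
const withinCapacity = (currentRpm) => currentRpm <= CAPACITY_RPM;
```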
Payment Metrics
Critical payment-related metrics:
💰 Payment Success Rate
**Target**: > 99.9%

```javascript
const paymentMetrics = {
  successful: 0,
  failed: 0,
  total: 0
};

const trackPayment = (success, amount, reason) => {
  paymentMetrics.total++;

  if (success) {
    paymentMetrics.successful++;
  } else {
    paymentMetrics.failed++;
    // Track failure reasons
    paymentFailureReasons[reason] = (paymentFailureReasons[reason] || 0) + 1;
  }

  const successRate = (paymentMetrics.successful / paymentMetrics.total) * 100;

  if (successRate < 99.9) {
    triggerAlert('PAYMENT_SUCCESS_RATE_LOW', {
      successRate,
      totalPayments: paymentMetrics.total,
      failedPayments: paymentMetrics.failed
    });
  }
};
```
**Monitor**: Total transaction value, average transaction size

```javascript
const revenueMetrics = {
  totalValue: 0,
  transactionCount: 0,
  averageValue: 0
};

const trackRevenue = (transactionValue) => {
  revenueMetrics.totalValue += transactionValue;
  revenueMetrics.transactionCount++;
  revenueMetrics.averageValue = revenueMetrics.totalValue / revenueMetrics.transactionCount;

  // Track hourly revenue
  const hour = Math.floor(Date.now() / 3600000);
  hourlyRevenue[hour] = (hourlyRevenue[hour] || 0) + transactionValue;
};
```
Infrastructure Metrics
Monitor underlying infrastructure health:
```javascript
const websocketMetrics = {
  activeConnections: 0,
  totalConnections: 0,
  reconnections: 0,
  failures: 0
};

const monitorWebSocket = (ws) => {
  websocketMetrics.activeConnections++;
  websocketMetrics.totalConnections++;

  ws.on('close', (code, reason) => {
    websocketMetrics.activeConnections--;

    if (code !== 1000) { // Not normal closure
      websocketMetrics.failures++;
      triggerAlert('WEBSOCKET_ABNORMAL_CLOSURE', {
        code,
        reason,
        activeConnections: websocketMetrics.activeConnections
      });
    }
  });

  ws.on('error', (error) => {
    websocketMetrics.failures++;
    triggerAlert('WEBSOCKET_ERROR', { error: error.message });
  });
};
```
```javascript
const apiMetrics = {
  requests: 0,
  errors: 0,
  timeouts: 0,
  rateLimited: 0
};

const monitorApiCall = async (apiCall) => {
  apiMetrics.requests++;

  try {
    const response = await apiCall();

    if (response.status === 429) {
      apiMetrics.rateLimited++;
    }

    return response;
  } catch (error) {
    if (error.name === 'TimeoutError') {
      apiMetrics.timeouts++;
    } else {
      apiMetrics.errors++;
    }
    throw error;
  }
};
```
🚨 Alerting Strategy
Alert Severity Levels
Configure different response procedures for different severity levels.
| Severity | Response Time | Escalation | Example Triggers |
| --- | --- | --- | --- |
| Critical | Immediate | Page on-call engineer | > 5% error rate, service down |
| High | Within 15 minutes | Slack + email | > 1% error rate, slow response |
| Medium | Within 1 hour | Email | Unusual patterns, warnings |
| Low | Next business day | Dashboard | Performance degradation |
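The severity table maps naturally onto a routing lookup. A sketch with placeholder channel names (wire them to your actual notifiers):

```javascript
// Route each severity to notification channels and a response-time SLA in minutes.
const SEVERITY_ROUTES = {
  critical: { channels: ['pagerduty'], responseMinutes: 0 },
  high:     { channels: ['slack', 'email'], responseMinutes: 15 },
  medium:   { channels: ['email'], responseMinutes: 60 },
  low:      { channels: ['dashboard'], responseMinutes: 24 * 60 }
};

// Unknown severities fall back to medium so no alert is silently dropped.
const routeAlert = (severity) => SEVERITY_ROUTES[severity] || SEVERITY_ROUTES.medium;
```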
Alert Configuration
PagerDuty Integration
Slack Integration
```javascript
const PagerDuty = require('pagerduty');

class AlertManager {
  constructor() {
    this.pagerduty = new PagerDuty({
      token: process.env.PAGERDUTY_TOKEN
    });
  }

  async triggerAlert(alertType, data, severity = 'medium') {
    const alert = {
      type: alertType,
      severity,
      timestamp: new Date().toISOString(),
      data: this.sanitizeData(data),
      environment: 'production',
      service: 'boop-integration'
    };

    // Log locally
    console.error(`🚨 ALERT [${severity.toUpperCase()}]: ${alertType}`, alert);

    // Send to monitoring systems
    await this.sendToPagerDuty(alert);
    await this.sendToSlack(alert);
    await this.logToDatabase(alert);

    return alert;
  }

  async sendToPagerDuty(alert) {
    if (alert.severity === 'critical' || alert.severity === 'high') {
      await this.pagerduty.incidents.create({
        incident: {
          type: 'incident',
          title: `BOOP ${alert.type}`,
          service: { id: process.env.PAGERDUTY_SERVICE_ID },
          urgency: alert.severity === 'critical' ? 'high' : 'low',
          body: {
            type: 'incident_body',
            details: JSON.stringify(alert, null, 2)
          }
        }
      });
    }
  }
}
```
Common Alert Scenarios
**Service Completely Down**

```javascript
// Trigger: All requests failing for > 1 minute
if (errorRate > 95 && timeWindow > 60000) {
  triggerAlert('SERVICE_DOWN', {
    errorRate,
    duration: timeWindow,
    lastSuccessfulRequest: lastSuccess
  }, 'critical');
}
```
**Mass Payment Failures**

```javascript
// Trigger: > 10 consecutive payment failures
if (consecutivePaymentFailures > 10) {
  triggerAlert('PAYMENT_SYSTEM_FAILURE', {
    consecutiveFailures: consecutivePaymentFailures,
    totalFailures: totalPaymentFailures,
    timeWindow: '5 minutes'
  }, 'critical');
}
```
**High Error Rate**

```javascript
// Trigger: Error rate > 1% for > 5 minutes
if (errorRate > 1 && sustainedDuration > 300000) {
  triggerAlert('HIGH_ERROR_RATE', {
    currentErrorRate: errorRate,
    threshold: 1,
    duration: sustainedDuration,
    errorBreakdown: getErrorBreakdown()
  }, 'high');
}
```
**Slow Response Times**

```javascript
// Trigger: 95th percentile > 5 seconds
if (responseTime95th > 5000) {
  triggerAlert('SLOW_RESPONSE_TIME', {
    p95ResponseTime: responseTime95th,
    p99ResponseTime: responseTime99th,
    avgResponseTime: avgResponseTime
  }, 'high');
}
```
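The scenario snippets above assume an `errorRate` computed over a time window. One way to derive it, using a simple in-memory list of timestamped outcomes (a sketch; a real system would aggregate in a metrics store rather than an array):

```javascript
// Record request outcomes and compute the error rate over a sliding window.
const outcomes = [];

const recordOutcome = (success, now = Date.now()) => {
  outcomes.push({ success, time: now });
};

// Percentage of requests in the last windowMs that failed.
const errorRateInWindow = (windowMs, now = Date.now()) => {
  const recent = outcomes.filter((o) => now - o.time <= windowMs);
  if (recent.length === 0) return 0;
  const errors = recent.filter((o) => !o.success).length;
  return (errors / recent.length) * 100;
};
```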
📊 Dashboard Configuration
Executive Dashboard
High-level business metrics for stakeholders:
```javascript
// Sample Grafana dashboard configuration
const executiveDashboard = {
  title: "BOOP Integration - Executive View",
  panels: [
    {
      title: "Authentication Success Rate",
      type: "stat",
      target: "99.8%",
      query: "auth_success_rate_1h"
    },
    {
      title: "Revenue (Last 24h)",
      type: "stat",
      format: "currency",
      query: "sum(payment_value_24h)"
    },
    {
      title: "Active Users",
      type: "graph",
      query: "unique_users_per_hour"
    },
    {
      title: "Service Uptime",
      type: "stat",
      target: "99.9%",
      query: "uptime_percentage_7d"
    }
  ]
};
```
Operations Dashboard
Technical metrics for engineering teams:
```javascript
const operationsDashboard = {
  title: "BOOP Integration - Operations",
  panels: [
    {
      title: "Response Time Distribution",
      type: "heatmap",
      query: "auth_response_time_histogram"
    },
    {
      title: "Error Rate by Type",
      type: "pie",
      query: "error_count_by_type_1h"
    },
    {
      title: "WebSocket Connections",
      type: "graph",
      query: "websocket_active_connections"
    },
    {
      title: "API Rate Limits",
      type: "gauge",
      query: "api_rate_limit_usage"
    }
  ]
};
```
📋 Log Analysis
Structured Logging
Implement consistent log structure for analysis:
```javascript
const winston = require('winston');

// winston 3.x requires an explicit logger instance
const logger = winston.createLogger({
  transports: [new winston.transports.Console()]
});

class BoopLogger {
  static logAuthentication(context, result, timing) {
    const logEntry = {
      event: 'boop_authentication',
      timestamp: new Date().toISOString(),
      userId: context.userId,
      vendorId: context.vendorId,
      contextId: context.contextId,
      authType: context.type,
      success: result.success,
      errorCode: result.errorCode,
      responseTime: timing.responseTime,
      attributes: result.attributes?.length || 0,
      amount: context.amount,
      environment: process.env.NODE_ENV
    };

    if (result.success) {
      logger.info('Authentication successful', logEntry);
    } else {
      logger.warn('Authentication failed', logEntry);
    }

    // Send to log aggregation service
    this.sendToLogService(logEntry);
  }

  static logError(error, context) {
    const logEntry = {
      event: 'boop_error',
      timestamp: new Date().toISOString(),
      errorMessage: error.message,
      errorCode: error.code,
      stack: error.stack,
      context: this.sanitizeContext(context),
      severity: this.determineSeverity(error),
      environment: process.env.NODE_ENV
    };

    logger.error('BOOP error occurred', logEntry);
    this.sendToLogService(logEntry);
  }
}
```
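`BoopLogger` references `sanitizeContext` and `determineSeverity` without defining them. Minimal sketches of what they might look like (the sensitive-key list and error codes are assumptions; adjust them to your own data):

```javascript
// Keys that must never reach logs (assumed list, extend as needed).
const SENSITIVE_KEYS = ['apiKey', 'token', 'biometricData'];

// Shallow-copy the context, redacting sensitive fields.
const sanitizeContext = (context = {}) =>
  Object.fromEntries(
    Object.entries(context).map(([key, value]) =>
      SENSITIVE_KEYS.includes(key) ? [key, '[REDACTED]'] : [key, value]));

// Map error shapes to an alert severity (example codes only).
const determineSeverity = (error) => {
  if (error.code === 'PAYMENT_FAILED') return 'high';
  if (error.name === 'TimeoutError') return 'medium';
  return 'low';
};
```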
Log Aggregation Queries
Common queries for troubleshooting:
Authentication Failures
Slow Requests
User Journey Analysis
```sql
-- Find authentication failures by error code (last 24h)
SELECT
  errorCode,
  COUNT(*) AS failures,
  (COUNT(*) * 100.0 / SUM(COUNT(*)) OVER ()) AS percentage
FROM boop_logs
WHERE event = 'boop_authentication'
  AND success = false
  AND timestamp > NOW() - INTERVAL '24 HOURS'
GROUP BY errorCode
ORDER BY failures DESC;
```
🛠️ Troubleshooting Playbooks
Authentication Issues
🚫 High Authentication Failure Rate
**Symptoms**: Error rate > 1%, user complaints

**Investigation Steps**:
Check error code distribution in logs
Verify BOOP service status
Test with known good user
Check network connectivity
Verify API credentials
**Common Solutions**:

```javascript
// 1. Check service health
const health = await fetch('https://app.boop.it/health');
console.log('BOOP service health:', await health.json());

// 2. Verify credentials
if (!apiKey.startsWith('sk_live_')) {
  console.error('Invalid production API key format');
}

// 3. Test basic connectivity
const testAuth = await authenticateUser({
  type: 'entrance',
  attributes: ['pseudonym']
});

// 4. Check for rate limiting
if (response.status === 429) {
  console.warn('Rate limited - implement exponential backoff');
}
```
**Symptoms**: Authentication taking > 3 seconds

**Investigation Steps**:
Check network latency to BOOP servers
Review connection pooling configuration
Analyze request/response sizes
Monitor server resource usage
**Optimization Solutions**:

```javascript
// 1. Implement connection pooling
const connectionPool = new WebSocketPool({
  maxConnections: 5,
  reuseConnections: true,
  keepAliveInterval: 30000
});

// 2. Add request timeout
const authWithTimeout = Promise.race([
  authenticateUser(context),
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Timeout')), 10000)
  )
]);

// 3. Implement caching for frequent requests
const userCache = new LRUCache({ max: 1000, ttl: 300000 });
```
Payment Issues
**Investigation Steps**:
Check user balance via API
Verify transaction ID calculation
Review payment context data
Check for duplicate transactions
**Diagnostic Code**:

```javascript
const diagnosePayment = async (userId, amount, transactionId) => {
  // 1. Check user balance
  const balance = await getUserBalance(userId);
  console.log(`User balance: $${balance / 100}, Required: $${amount / 100}`);

  // 2. Verify transaction status
  const txStatus = await getTransactionStatus(transactionId);
  console.log('Transaction status:', txStatus);

  // 3. Check recent transactions
  const recentTx = await getTransactionHistory(userId, Date.now() - 3600000);
  console.log(`Recent transactions: ${recentTx.length}`);

  return {
    sufficientFunds: balance >= amount,
    transactionExists: txStatus.found,
    recentTransactionCount: recentTx.length
  };
};
```
📱 Mobile & Real-Time Monitoring
Mobile Alerts
Configure mobile notifications for critical issues:
```javascript
const PushNotification = require('pushover-notifications');

class MobileAlerter {
  constructor() {
    this.pushover = new PushNotification({
      user: process.env.PUSHOVER_USER_KEY,
      token: process.env.PUSHOVER_APP_TOKEN
    });
  }

  async sendCriticalAlert(alert) {
    if (alert.severity !== 'critical') return;

    const message = {
      message: `🚨 CRITICAL: ${alert.type}`,
      title: 'BOOP Production Alert',
      sound: 'siren',
      priority: 2,  // Emergency priority
      retry: 300,   // Retry every 5 minutes
      expire: 3600  // Expire after 1 hour
    };

    await this.pushover.send(message);
  }
}
```
Real-Time Dashboard
Create a real-time monitoring dashboard:
```html
<!DOCTYPE html>
<html>
<head>
  <title>BOOP Live Monitor</title>
  <script src="https://cdn.socket.io/4.0.0/socket.io.min.js"></script>
</head>
<body>
  <div id="metrics">
    <div class="metric">
      <h3>Success Rate</h3>
      <span id="successRate">--</span>%
    </div>
    <div class="metric">
      <h3>Response Time</h3>
      <span id="responseTime">--</span> ms
    </div>
    <div class="metric">
      <h3>Active Users</h3>
      <span id="activeUsers">--</span>
    </div>
  </div>

  <script>
    const socket = io();

    socket.on('metrics', (data) => {
      document.getElementById('successRate').textContent = data.successRate.toFixed(1);
      document.getElementById('responseTime').textContent = data.avgResponseTime;
      document.getElementById('activeUsers').textContent = data.activeUsers;
    });

    socket.on('alert', (alert) => {
      if (alert.severity === 'critical') {
        document.body.style.backgroundColor = '#ff0000';
        setTimeout(() => document.body.style.backgroundColor = '', 1000);
      }
    });
  </script>
</body>
</html>
```
🎯 Success Metrics
Track these KPIs to measure monitoring effectiveness:
**MTTR (Mean Time to Recovery)**: Target < 15 minutes. Time from alert to issue resolution.
**MTTD (Mean Time to Detection)**: Target < 2 minutes. Time from issue occurrence to alert.
**SLA Compliance (Service Level Agreement)**: Target > 99.9%. Percentage of time the service meets SLA requirements.
**False Alert Rate (Alert Accuracy)**: Target < 5%. Percentage of alerts that were false positives.
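MTTD and MTTR can both be computed from incident records that carry three timestamps. A sketch; the field names (`occurredAt`, `alertedAt`, `resolvedAt`) are illustrative:

```javascript
// Mean duration in minutes between two timestamp fields across incidents
// (all timestamps in epoch milliseconds).
const meanMinutes = (incidents, fromField, toField) => {
  if (incidents.length === 0) return 0;
  const total = incidents.reduce(
    (sum, i) => sum + (i[toField] - i[fromField]), 0);
  return total / incidents.length / 60000;
};

// MTTD: occurrence -> alert; MTTR: alert -> resolution.
const mttd = (incidents) => meanMinutes(incidents, 'occurredAt', 'alertedAt');
const mttr = (incidents) => meanMinutes(incidents, 'alertedAt', 'resolvedAt');
```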
📞 Support & Escalation
Escalation Matrix
| Issue Type | First Contact | Escalation (30 min) | Final Escalation (1 hour) |
| --- | --- | --- | --- |
| Service Down | On-call engineer | Engineering manager | CTO |
| Payment Issues | Payment team | Finance director | CFO |
| Security Issues | Security team | Security officer | CISO |
| Data Issues | Data team | Data engineering lead | Chief Data Officer |
Production Support Ready! Your monitoring and alerting setup is now complete, and you're ready to maintain a highly reliable BOOP integration in production. Remember: proactive monitoring prevents issues before they impact users.