deployment
devops
docker
monitoring
operations
Operations Guide - Calmia Nexus
Audience: DevOps, SRE, System Administrators
Last updated: February 2026
Version: 1.0.0
Contents
1. Infrastructure Architecture
2. Services and Components
3. Docker and Containers
4. Environment Variables
5. PostgreSQL Database
6. RabbitMQ Message Broker
7. Monitoring and Observability
8. Health Checks
9. Production Deployment
10. Operational Runbooks
11. Troubleshooting
12. Backup and Recovery
13. Operational Security
1. Infrastructure Architecture
1.1 Architecture Diagram
graph TB
subgraph "Internet"
USER[Users]
WH_SENTRY[Webhooks Sentry]
WH_WA[Webhooks WhatsApp]
WH_CU[Webhooks ClickUp]
end
subgraph "Cloudflare"
CF[Cloudflare DNS + WAF]
end
subgraph "Production - ns3032187"
subgraph "IIS Web Server"
ADMIN[Admin UI<br/>app.kalmiazen.com<br/>:443]
API[API REST<br/>api.kalmiazen.com<br/>:443]
MCP[MCP Server<br/>mcp.kalmiazen.com<br/>:443]
end
subgraph "Local Services"
WORKERS[Orchestrator.Workers<br/>Background Jobs]
WA_SVC[WhatsApp Service<br/>:8000]
end
subgraph "Infrastructure"
PG[(PostgreSQL 16<br/>:5432)]
RMQ[(RabbitMQ<br/>:5672/:15672)]
end
subgraph "Tunnels"
NGROK[ngrok<br/>Webhooks]
end
end
subgraph "External Services"
CLAUDE[Anthropic Claude API]
CLICKUP[ClickUp API]
SENTRY[Sentry.io]
ULTRAMSG[UltraMsg WhatsApp]
HOLDED[Holded API]
end
USER --> CF --> ADMIN & API & MCP
WH_SENTRY --> NGROK --> API
WH_WA --> NGROK --> WA_SVC
WH_CU --> API
ADMIN --> API
API --> PG & RMQ & WORKERS
WORKERS --> PG & RMQ
WA_SVC --> PG & RMQ
API --> CLAUDE & CLICKUP & SENTRY & HOLDED
WA_SVC --> ULTRAMSG
1.2 Technology Stack
| Layer | Technology | Version | Port |
|---|---|---|---|
| Web Server | IIS 10 | Windows Server 2019 | 80/443 |
| Runtime | .NET 9 | 9.0.x | - |
| Database | PostgreSQL | 16 Alpine | 5432 |
| Message Broker | RabbitMQ | 3-management | 5672/15672 |
| Reverse Proxy | Cloudflare | - | - |
| Containers | Docker Desktop | 4.x | - |
| Tunneling | ngrok | 3.x | - |
1.3 Production URLs
| Service | URL | Description |
|---|---|---|
| Admin UI | https://app.kalmiazen.com | Blazor administration panel |
| API REST | https://api.kalmiazen.com | API endpoints |
| API Swagger | https://api.kalmiazen.com/swagger | OpenAPI documentation |
| MCP Server | https://mcp.kalmiazen.com | Model Context Protocol |
| MCP Swagger | https://mcp.kalmiazen.com/swagger | MCP documentation |
| RabbitMQ UI | http://localhost:15672 | Management (local only) |
2. Services and Components
2.1 Service Map
graph LR
subgraph "Frontend"
A[Orchestrator.Admin<br/>Blazor SSR]
end
subgraph "Backend"
B[Orchestrator.Api<br/>REST + SignalR]
C[Orchestrator.Workers<br/>Background Jobs]
D[Orchestrator.Mcp.Remote<br/>MCP Server]
end
subgraph "Messaging"
E[WhatsappService<br/>UltraMsg Integration]
end
subgraph "Shared"
F[Shared.Admin<br/>Entities + Services]
G[Shared.Events<br/>DTOs + Events]
end
A --> B
B --> C
B --> F & G
C --> F & G
D --> B
E --> F & G
2.2 Responsibilities by Service
| Service | Responsibility | Dependencies |
|---|---|---|
| Orchestrator.Admin | Blazor UI, Workspace, Chat | API |
| Orchestrator.Api | REST endpoints, SignalR hubs, Auth | PostgreSQL, RabbitMQ |
| Orchestrator.Workers | Background jobs, synchronization | PostgreSQL, RabbitMQ |
| Orchestrator.Mcp.Remote | MCP OAuth2, Tools, Resources | API |
| WhatsappService | UltraMsg webhooks, messages | PostgreSQL, RabbitMQ |
2.3 Ports by Environment
| Service | Development | Docker | Production (IIS) |
|---|---|---|---|
| Admin | 5002 | - | 443 (app.kalmiazen.com) |
| API | 5001 | - | 443 (api.kalmiazen.com) |
| MCP | 5003 | 5003 | 443 (mcp.kalmiazen.com) |
| Workers | - | - | Background |
| WhatsApp | 5004 | 8000 | 8000 |
| PostgreSQL | 5432 | 5432 | 5432 |
| RabbitMQ | 5672/15672 | 5672/15672 | 5672/15672 |
3. Docker and Containers
3.1 Docker Compose - Development
# docker-compose.yml
version: "3.9"
services:
  rabbitmq:
    image: rabbitmq:3-management
    container_name: rabbitmq
    ports:
      - "5672:5672"    # AMQP
      - "15672:15672"  # Management UI
    environment:
      - RABBITMQ_DEFAULT_USER=${RABBITMQ_DEFAULT_USER:-admin}
      - RABBITMQ_DEFAULT_PASS=${RABBITMQ_DEFAULT_PASS:-admin}
    volumes:
      - rabbitmq_data:/var/lib/rabbitmq
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "check_running"]
      interval: 5s
      timeout: 5s
      retries: 10
      start_period: 30s
  postgres:
    image: postgres:16-alpine
    container_name: nexus-db
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_USER=${POSTGRES_USER:-nexus}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-nexus_dev_2026}
      - POSTGRES_DB=${POSTGRES_DB:-nexus}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U nexus -d nexus"]
      interval: 5s
      timeout: 5s
      retries: 5
  whatsapp-service:
    build:
      context: ./Messaging/WhatsappService
      dockerfile: Dockerfile
    container_name: whatsapp-service
    ports:
      - "8000:80"
    environment:
      - ASPNETCORE_ENVIRONMENT=Docker
      - ASPNETCORE_URLS=http://+:80
      - ConnectionStrings__NexusDb=${ConnectionStrings__NexusDb}
      - RabbitMq__ConnectionString=${RabbitMq__ConnectionString}
      - UltraMsg__InstanceId=${UltraMsg__InstanceId}
      - UltraMsg__Token=${UltraMsg__Token}
      - Sentry__Dsn=${Sentry__Dsn}
    depends_on:
      rabbitmq:
        condition: service_healthy
      postgres:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:80/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 10s
    volumes:
      - ./logs/whatsapp:/app/logs
  mcp-remote:
    build:
      context: ./Orchestrator/src/Orchestrator.Mcp.Remote
      dockerfile: Dockerfile
    container_name: mcp-remote
    profiles:
      - mcp
    ports:
      - "5003:5003"
    environment:
      - ASPNETCORE_ENVIRONMENT=Docker
      - ASPNETCORE_URLS=http://+:5003
      - NEXUS_MCP_REMOTE_Orchestrator__BaseUrl=http://host.docker.internal:5001
      - NEXUS_MCP_REMOTE_OAuth2__SigningKey=${OAuth2__SigningKey}
      - NEXUS_MCP_REMOTE_OAuth2__Issuer=https://mcp.kalmiazen.com
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5003/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 10s
volumes:
  rabbitmq_data:
  postgres_data:
3.2 Common Docker Commands
# Start the infrastructure (PostgreSQL + RabbitMQ)
docker-compose up -d postgres rabbitmq
# Start the WhatsApp Service
docker-compose up -d whatsapp-service
# Start MCP Remote (requires the profile)
docker-compose --profile mcp up -d mcp-remote
# Follow logs in real time
docker-compose logs -f whatsapp-service
# Rebuild a specific image
docker-compose build --no-cache whatsapp-service
# Remove volumes (CAUTION: deletes data)
docker-compose down -v
# Show service status
docker-compose ps
# Run a command inside a container
docker exec -it nexus-db psql -U nexus -d nexus
3.3 Dockerfile - WhatsApp Service
FROM mcr.microsoft.com/dotnet/aspnet:9.0 AS base
WORKDIR /app
EXPOSE 80
# curl is needed by the container health check
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
FROM mcr.microsoft.com/dotnet/sdk:9.0 AS build
WORKDIR /src
COPY ["WhatsappService.csproj", "."]
# COPY cannot read outside the build context, so this line only works when the
# context is a directory that also contains Shared/
COPY ["../Shared/", "../Shared/"]
RUN dotnet restore "WhatsappService.csproj"
COPY . .
RUN dotnet build "WhatsappService.csproj" -c Release -o /app/build
FROM build AS publish
RUN dotnet publish "WhatsappService.csproj" -c Release -o /app/publish
FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "WhatsappService.dll"]
3.4 Dockerfile - MCP Remote (hardened)
FROM mcr.microsoft.com/dotnet/sdk:9.0 AS build
WORKDIR /src
COPY ["Orchestrator.Mcp.Remote.csproj", "./"]
RUN dotnet restore "Orchestrator.Mcp.Remote.csproj"
COPY . .
RUN dotnet build "Orchestrator.Mcp.Remote.csproj" -c Release -o /app/build
RUN dotnet publish "Orchestrator.Mcp.Remote.csproj" -c Release -o /app/publish
FROM mcr.microsoft.com/dotnet/aspnet:9.0 AS final
WORKDIR /app
# Install curl for health checks
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# Create a non-root user
RUN adduser --disabled-password --gecos "" appuser
USER appuser
COPY --from=build /app/publish .
ENV ASPNETCORE_URLS=http://+:5003
ENV ASPNETCORE_ENVIRONMENT=Production
EXPOSE 5003
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:5003/health || exit 1
ENTRYPOINT ["dotnet", "Orchestrator.Mcp.Remote.dll"]
4. Environment Variables
4.1 .env.example File
# ============================================
# CALMIA NEXUS - ENVIRONMENT VARIABLES
# ============================================
# Copy this file to .env and fill in the values
# NEVER commit the .env file to the repository
# --------------------------------------------
# DATABASE - PostgreSQL
# --------------------------------------------
DB_HOST=localhost
DB_PORT=5432
DB_NAME=nexus
DB_USER=nexus
DB_PASSWORD=<REQUIRED>
# For Docker
POSTGRES_USER=nexus
POSTGRES_DB=nexus
POSTGRES_PASSWORD=<REQUIRED>
# Full connection string
ConnectionStrings__NexusDb=Host=localhost;Port=5432;Database=nexus;Username=nexus;Password=<PASSWORD>;Encoding=UTF8
# --------------------------------------------
# MESSAGE BROKER - RabbitMQ
# --------------------------------------------
RABBITMQ_HOST=localhost
RABBITMQ_PORT=5672
RABBITMQ_USER=admin
RABBITMQ_PASSWORD=<REQUIRED>
# For Docker
RABBITMQ_DEFAULT_USER=admin
RABBITMQ_DEFAULT_PASS=<REQUIRED>
# Connection string
RabbitMq__ConnectionString=amqp://admin:<PASSWORD>@localhost:5672/
# --------------------------------------------
# API KEYS - AI SERVICES
# --------------------------------------------
# Anthropic Claude
Claude__ApiKey=<REQUIRED>
Claude__Model=claude-sonnet-4-20250514
# --------------------------------------------
# EXTERNAL INTEGRATIONS
# --------------------------------------------
# ClickUp (Task Management)
ClickUp__ApiToken=<REQUIRED>
ClickUp__WebhookSecret=<REQUIRED>
ClickUp__DefaultTeamId=90152294725
ClickUp__DefaultSpaceId=90159677191
# UltraMsg (WhatsApp)
UltraMsg__InstanceId=<REQUIRED>
UltraMsg__Token=<REQUIRED>
# Holded (Projects/Invoicing)
Holded__ApiKey=<REQUIRED>
# --------------------------------------------
# MONITORING - Sentry
# --------------------------------------------
Sentry__Dsn=<REQUIRED>
SENTRY_AUTH_TOKEN=<REQUIRED>
Sentry__OrganizationSlug=<REQUIRED>
Sentry__ProjectSlug=<REQUIRED>
# --------------------------------------------
# EMAIL - SMTP
# --------------------------------------------
Smtp__Host=smtp.gmail.com
Smtp__Port=587
Smtp__Username=<REQUIRED>
Smtp__Password=<REQUIRED>
Smtp__UseSsl=true
Smtp__FromEmail=<REQUIRED>
Smtp__FromName=Nexus Platform
# --------------------------------------------
# SECURITY
# --------------------------------------------
# Encryption key (32+ characters)
Encryption__Key=<REQUIRED>
# JWT for the Tunnel Service
Tunnel__SecretKey=<REQUIRED>
Tunnel__Issuer=nexus-tunnel
Tunnel__Audience=tunnel-clients
# OAuth2 for MCP Remote (32+ characters)
OAuth2__SigningKey=<REQUIRED>
# OAuth2 clients (0 = Admin, 1 = Desktop, 3 = Swagger)
OAuth2__Clients__0__ClientSecret=<REQUIRED>
OAuth2__Clients__1__ClientSecret=<REQUIRED>
OAuth2__Clients__3__ClientSecret=<REQUIRED>
# OAuth2 users
OAuth2__Users__0__Username=<REQUIRED>
OAuth2__Users__0__Password=<REQUIRED>
OAuth2__Users__0__DisplayName=<REQUIRED>
# --------------------------------------------
# SERVICE URLS
# --------------------------------------------
Api__BaseUrl=https://api.kalmiazen.com
Admin__BaseUrl=https://app.kalmiazen.com
Mcp__BaseUrl=https://mcp.kalmiazen.com
# --------------------------------------------
# NOTIFICATIONS
# --------------------------------------------
Notifications__AdminWhatsapp=<ADMIN_NUMBER>
4.2 Variable Validation
# Script that validates the required variables
$required = @(
    "ConnectionStrings__NexusDb",
    "RabbitMq__ConnectionString",
    "Claude__ApiKey",
    "Sentry__Dsn",
    "Encryption__Key"
)
$missing = @()
foreach ($var in $required) {
    if (-not [Environment]::GetEnvironmentVariable($var)) {
        $missing += $var
    }
}
if ($missing.Count -gt 0) {
    Write-Host "Missing variables:" -ForegroundColor Red
    $missing | ForEach-Object { Write-Host "  - $_" }
    exit 1
}
Write-Host "All required variables are set" -ForegroundColor Green
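For hosts without PowerShell, the same validation can be sketched in Python. This is a minimal illustration, not part of the platform: `missing_variables` is a hypothetical helper name, and the list of required names is copied from the script above.

```python
import os

# Required names, copied from the PowerShell validation script.
REQUIRED = [
    "ConnectionStrings__NexusDb",
    "RabbitMq__ConnectionString",
    "Claude__ApiKey",
    "Sentry__Dsn",
    "Encryption__Key",
]


def missing_variables(env=None, required=REQUIRED):
    """Return the required names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_variables()` with no arguments inspects the current process environment; a non-empty result should abort the deployment, as `exit 1` does in the PowerShell version.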
4.3 Loading Variables from .env
# cargar-env.ps1
function Load-EnvFile {
    param([string]$Path = ".env")
    if (Test-Path $Path) {
        Get-Content $Path | ForEach-Object {
            # Skip comment lines; split on the first '=' so values may contain '='
            if ($_ -match '^([^#][^=]+)=(.*)$') {
                $name = $matches[1].Trim()
                $value = $matches[2].Trim()
                [Environment]::SetEnvironmentVariable($name, $value, "Process")
                Write-Host "  Loaded: $name" -ForegroundColor DarkGray
            }
        }
        Write-Host "Variables loaded from $Path" -ForegroundColor Green
    } else {
        Write-Host "File $Path not found" -ForegroundColor Yellow
    }
}
# Usage
Load-EnvFile -Path ".env"
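The parsing rule used by the loader (ignore comment lines, split each remaining line on the first `=`) matters because connection strings contain further `=` signs. A minimal Python sketch of that rule follows; `parse_env_lines` is a hypothetical name and, like the PowerShell loader, it does not strip trailing inline comments.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks and full-line comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only: connection strings contain more of them.
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip()
    return values
```

Because trailing comments are kept as part of the value, comments in the `.env` file are safest on their own lines.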
5. PostgreSQL Database
5.1 Connection Configuration
// appsettings.json (no credentials)
{
  "ConnectionStrings": {
    "NexusDb": "" // Read from the environment variable
  }
}
// Program.cs
builder.Services.AddDbContext<NexusDbContext>(options =>
{
    var connectionString = builder.Configuration.GetConnectionString("NexusDb");
    options.UseNpgsql(connectionString, npgsqlOptions =>
    {
        // Retry transient failures up to 5 times, waiting at most 30 s between attempts
        npgsqlOptions.EnableRetryOnFailure(
            maxRetryCount: 5,
            maxRetryDelay: TimeSpan.FromSeconds(30),
            errorCodesToAdd: null
        );
        npgsqlOptions.CommandTimeout(120);
    });
});
5.2 Entity Framework Migrations
# Create a new migration
dotnet ef migrations add NombreMigracion -p Orchestrator.Api -c NexusDbContext
# Apply migrations
dotnet ef database update -p Orchestrator.Api
# List migrations (pending ones are marked)
dotnet ef migrations list -p Orchestrator.Api
# Generate a SQL script
dotnet ef migrations script -p Orchestrator.Api -o migration.sql
# Revert the last migration (PreviousMigration is the name of the migration to return to)
dotnet ef database update PreviousMigration -p Orchestrator.Api
5.3 PostgreSQL Maintenance
-- Table sizes
SELECT
    relname AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_relation_size(relid)) AS data_size,
    pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
-- Unused indexes
SELECT
    schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Vacuum and analyze
VACUUM ANALYZE;
-- Active connections
SELECT
    pid, usename, application_name, client_addr,
    state, query_start, query
FROM pg_stat_activity
WHERE datname = 'nexus';
-- Terminate a specific connection (replace <pid> with a pid from the query above)
SELECT pg_terminate_backend(<pid>);
-- Blocked sessions and the sessions blocking them
-- (pg_blocking_pids avoids the false positives of joining pg_locks on locktype alone)
SELECT
    blocked.pid AS blocked_pid,
    blocking.pid AS blocking_pid,
    blocked.usename AS blocked_user,
    blocking.usename AS blocking_user
FROM pg_catalog.pg_stat_activity AS blocked
JOIN pg_catalog.pg_stat_activity AS blocking
    ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));
5.4 Backup and Restore
# Full backup (custom format)
pg_dump -U nexus -d nexus -F c -f backup_$(date +%Y%m%d_%H%M%S).dump
# Schema-only backup
pg_dump -U nexus -d nexus --schema-only -f schema.sql
# Data-only backup
pg_dump -U nexus -d nexus --data-only -f data.sql
# Restore (-c drops existing objects before recreating them)
pg_restore -U nexus -d nexus -c backup.dump
# Compressed backup
pg_dump -U nexus -d nexus | gzip > backup.sql.gz
# Restore from a compressed backup
gunzip -c backup.sql.gz | psql -U nexus -d nexus
6. RabbitMQ Message Broker
6.1 Configuration
// Client configuration
builder.Services.AddSingleton<IConnection>(sp =>
{
    var factory = new ConnectionFactory
    {
        Uri = new Uri(builder.Configuration["RabbitMq:ConnectionString"]!),
        // Reconnect automatically, retrying every 10 seconds
        AutomaticRecoveryEnabled = true,
        NetworkRecoveryInterval = TimeSpan.FromSeconds(10)
    };
    return factory.CreateConnection();
});
6.2 Defined Queues
| Queue | Purpose | Consumers |
|---|---|---|
| task_executions | Task executions | Workers |
| copilot_requests | Copilot requests | Workers |
| whatsapp_messages | WhatsApp messages | WhatsApp Service |
| notifications | General notifications | Workers |
| sync_events | Synchronization events | Workers |
6.3 Administration Commands
# List queues
rabbitmqctl list_queues name messages consumers
# List connections
rabbitmqctl list_connections user peer_host state
# Purge a queue (CAUTION: discards its messages)
rabbitmqctl purge_queue task_executions
# Show broker status
rabbitmqctl status
# Export definitions (backup)
rabbitmqctl export_definitions definitions.json
# Import definitions (restore)
rabbitmqctl import_definitions definitions.json
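The output of `rabbitmqctl list_queues name messages consumers` can be checked automatically for queues that hold messages but have no consumer. The sketch below assumes whitespace-separated rows of queue name, message count and consumer count, with any banner or header lines ignored; the exact layout varies by RabbitMQ version, and both function names are hypothetical.

```python
def parse_queue_listing(output):
    """Map queue name -> (messages, consumers) from list_queues text output."""
    queues = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly three fields, the last two numeric;
        # banner and header lines fail this test and are skipped.
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            queues[fields[0]] = (int(fields[1]), int(fields[2]))
    return queues


def stalled_queues(queues):
    """Return names of queues that hold messages but have no consumer."""
    return sorted(
        name
        for name, (messages, consumers) in queues.items()
        if messages > 0 and consumers == 0
    )
```

A non-empty result from `stalled_queues` usually means the consuming service (Workers or WhatsApp Service) is down, not that the queue needs purging.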
6.4 Management UI
URL: http://localhost:15672
User: admin
Password: (see .env)
Features:
- View queues and pending messages
- Monitor throughput
- Create/delete queues and exchanges
- View active connections
7. Monitoring and Observability
7.1 Monitoring Stack
graph LR
APP[Applications] --> SENTRY[Sentry<br/>Errors]
APP --> SERILOG[Serilog<br/>Logs]
SERILOG --> CONSOLE[Console]
SERILOG --> FILE[Files]
SERILOG --> PG_LOGS[(PostgreSQL<br/>system_logs)]
SENTRY --> DASHBOARD[Dashboard<br/>Sentry.io]
7.2 Sentry Configuration
// Program.cs
builder.WebHost.UseSentry(options =>
{
    options.Dsn = builder.Configuration["Sentry:Dsn"];
    options.Environment = builder.Environment.EnvironmentName;
    options.Release = "calmia-nexus@1.0.0";
    options.AutoSessionTracking = true;
    options.TracesSampleRate = 1.0;
    options.Debug = builder.Environment.IsDevelopment();
    // Tag every event with the originating service
    options.SetBeforeSend((sentryEvent, hint) =>
    {
        sentryEvent.SetTag("service", "orchestrator-api");
        return sentryEvent;
    });
});
// Middleware
app.UseSentryTracing();
7.3 Serilog Configuration
// Full configuration
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .MinimumLevel.Override("Microsoft.AspNetCore", LogEventLevel.Warning)
    .MinimumLevel.Override("Microsoft.EntityFrameworkCore", LogEventLevel.Warning)
    .Enrich.FromLogContext()
    .Enrich.WithMachineName()
    .Enrich.WithEnvironmentName()
    .WriteTo.Console(
        outputTemplate: "[{Timestamp:HH:mm:ss} {Level:u3}] {Message:lj}{NewLine}{Exception}"
    )
    .WriteTo.File(
        path: "logs/nexus-.log",
        rollingInterval: RollingInterval.Day,
        retainedFileCountLimit: 30
    )
    .WriteTo.PostgreSQL(
        connectionString: connectionString,
        tableName: "system_logs",
        needAutoCreateTable: true
    )
    .WriteTo.Sentry()
    .CreateLogger();
7.4 Key Metrics to Monitor
| Metric | Warning Threshold | Critical Threshold | Action |
|---|---|---|---|
| CPU Usage | >70% | >90% | Scale up |
| Memory Usage | >80% | >95% | Restart/Scale |
| Request Latency P95 | >500ms | >2000ms | Investigate |
| Error Rate | >1% | >5% | Alert |
| Queue Depth | >1000 | >5000 | Add workers |
| DB Connections | >80 | >95 | Connection pool |
| Disk Usage | >70% | >90% | Cleanup/Expand |
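The thresholds in this table translate directly into a classification rule. A minimal sketch, assuming strict "greater than" comparisons as written in the table; the metric keys and the `classify` function are hypothetical names chosen for illustration.

```python
# (warning, critical) thresholds copied from the metrics table.
THRESHOLDS = {
    "cpu_percent": (70, 90),
    "memory_percent": (80, 95),
    "latency_p95_ms": (500, 2000),
    "error_rate_percent": (1, 5),
    "queue_depth": (1000, 5000),
    "db_connections": (80, 95),
    "disk_percent": (70, 90),
}


def classify(metric, value):
    """Return 'ok', 'warning' or 'critical' for a metric reading."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```

For example, `classify("queue_depth", 1200)` returns `"warning"`, which maps to the "Add workers" action in the table.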
7.5 Structured Logs
// Structured logging example
_logger.LogInformation(
    "Procesando tarea {TaskId} para sesión {SessionId}, tipo: {TaskType}",
    taskId, sessionId, taskType
);
// With a scope: every entry inside the block carries UserId and OrganizationId
using (_logger.BeginScope(new Dictionary<string, object>
{
    ["UserId"] = userId,
    ["OrganizationId"] = orgId
}))
{
    _logger.LogInformation("Iniciando operación");
    // ... operation
    _logger.LogInformation("Operación completada");
}
8. Health Checks
8.1 Health Endpoints
| Service | Endpoint | Checks |
|---|---|---|
| API | /health | DB, RabbitMQ, Sentry |
| Admin | /health | API connectivity |
| MCP | /health | API, OAuth2 |
| WhatsApp | /health | DB, RabbitMQ, UltraMsg |
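A post-deployment smoke test can poll these endpoints and summarize the result. The sketch below is an illustration under two assumptions: each public host exposes `/health` at its root, and the service answers HTTP 200 when healthy and another status (503 by default in ASP.NET Core) when unhealthy. `fetch_status` and `summarize` are hypothetical names.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for `url`, or None when it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def summarize(urls, fetch=fetch_status):
    """Map each service name to 'healthy', 'unhealthy' or 'unreachable'."""
    summary = {}
    for name, url in urls.items():
        status = fetch(url)
        if status is None:
            summary[name] = "unreachable"
        elif status == 200:
            summary[name] = "healthy"
        else:
            summary[name] = "unhealthy"
    return summary
```

Passing `fetch` as a parameter keeps the summary logic testable without network access; in production the default `fetch_status` is used, for example `summarize({"api": "https://api.kalmiazen.com/health"})`.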
8.2 ASP.NET Core Implementation
// Program.cs
// The "ready" tag marks the checks served by /health/ready; without it the
// readiness predicate below matches nothing and always reports healthy
builder.Services.AddHealthChecks()
    .AddNpgSql(
        connectionString,
        name: "postgresql",
        tags: new[] { "db", "sql", "postgresql", "ready" }
    )
    .AddRabbitMQ(
        rabbitConnectionString,
        name: "rabbitmq",
        tags: new[] { "messaging", "rabbitmq", "ready" }
    )
    .AddUrlGroup(
        new Uri("https://api.anthropic.com/v1/health"),
        name: "claude-api",
        tags: new[] { "external", "ai" }
    )
    .AddCheck<SentryHealthCheck>("sentry");
// Endpoints
app.MapHealthChecks("/health", new HealthCheckOptions
{
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false // Runs no checks: only confirms the app responds
});
8.3 Custom Health Check
public class SentryHealthCheck : IHealthCheck
{
    private readonly IConfiguration _config;
    public SentryHealthCheck(IConfiguration config) => _config = config;
    // Reports Degraded (not Unhealthy) when the DSN is missing, since the app still works
    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        var dsn = _config["Sentry:Dsn"];
        if (string.IsNullOrEmpty(dsn))
            return Task.FromResult(HealthCheckResult.Degraded("Sentry DSN not configured"));
        return Task.FromResult(HealthCheckResult.Healthy("Sentry configured"));
    }
}
8.4 Docker Health Checks
# docker-compose.yml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
9. Production Deployment
9.1 Deployment Script
# Parameters
param(
    [switch]$SkipBuild,      # Skip compilation
    [switch]$Force,          # Force restart even if there are errors
    [switch]$WaitForReady    # Wait until everything is ready
)
# Configuration
$IIS_PATHS = @{
    Admin = "C:\inetpub\app.kalmiazen.com"
    Api   = "C:\inetpub\api.kalmiazen.com"
    Mcp   = "C:\inetpub\mcp.kalmiazen.com"
}
$APP_POOLS = @("NexusAdminPool", "NexusApiPool", "NexusMcpPool")
# 1. Load environment variables
. .\cargar-env.ps1
# 2. Stop services
Write-Host "Stopping services..." -ForegroundColor Yellow
foreach ($pool in $APP_POOLS) {
    & appcmd.exe stop apppool /apppool.name:$pool 2>$null
}
Stop-Process -Name "Orchestrator.Workers" -Force -ErrorAction SilentlyContinue
Stop-Process -Name "WhatsappService" -Force -ErrorAction SilentlyContinue
Stop-Process -Name "ngrok" -Force -ErrorAction SilentlyContinue
# 3. Verify infrastructure
Write-Host "Verifying infrastructure..." -ForegroundColor Yellow
# PostgreSQL
$pgService = Get-Service "postgresql-x64-16" -ErrorAction SilentlyContinue
if ($pgService.Status -ne "Running") {
    Start-Service "postgresql-x64-16"
    Start-Sleep -Seconds 5
}
# RabbitMQ
$rmqService = Get-Service "RabbitMQ" -ErrorAction SilentlyContinue
if ($rmqService.Status -ne "Running") {
    Start-Service "RabbitMQ"
    Start-Sleep -Seconds 10
}
# 4. Build (unless -SkipBuild)
if (-not $SkipBuild) {
    Write-Host "Building solution..." -ForegroundColor Yellow
    dotnet build --configuration Release
    if ($LASTEXITCODE -ne 0) { throw "Build failed" }
}
# 5. Publish to IIS
Write-Host "Publishing to IIS..." -ForegroundColor Yellow
dotnet publish Orchestrator/src/Orchestrator.Admin -c Release -o $IIS_PATHS.Admin
dotnet publish Orchestrator/src/Orchestrator.Api -c Release -o $IIS_PATHS.Api
dotnet publish Orchestrator/src/Orchestrator.Mcp.Remote -c Release -o $IIS_PATHS.Mcp
# Copy .env to each directory
foreach ($path in $IIS_PATHS.Values) {
    Copy-Item ".env" -Destination $path -Force
}
# 6. Apply migrations
Write-Host "Applying migrations..." -ForegroundColor Yellow
Push-Location Orchestrator/src/Orchestrator.Api
dotnet ef database update
Pop-Location
# 7. Start Workers
Write-Host "Starting Workers..." -ForegroundColor Yellow
Start-Process -FilePath "dotnet" -ArgumentList "run --configuration Release" `
    -WorkingDirectory "Orchestrator/src/Orchestrator.Workers" `
    -WindowStyle Hidden
# 8. Start WhatsApp Service
Write-Host "Starting WhatsApp Service..." -ForegroundColor Yellow
Start-Process -FilePath "dotnet" -ArgumentList "run --configuration Release" `
    -WorkingDirectory "Messaging/WhatsappService" `
    -WindowStyle Hidden
# 9. Start App Pools
Write-Host "Starting IIS App Pools..." -ForegroundColor Yellow
foreach ($pool in $APP_POOLS) {
    & appcmd.exe start apppool /apppool.name:$pool
}
# 10. Start ngrok
Write-Host "Starting ngrok..." -ForegroundColor Yellow
Start-Process -FilePath "ngrok" -ArgumentList "start --all --config=ngrok.yml" `
    -WindowStyle Hidden
# 11. Summary
Write-Host "`n=== DEPLOYMENT COMPLETE ===" -ForegroundColor Green
Write-Host "Admin: https://app.kalmiazen.com"
Write-Host "API: https://api.kalmiazen.com"
Write-Host "MCP: https://mcp.kalmiazen.com"
Write-Host "Swagger: https://api.kalmiazen.com/swagger"
9.2 Deployment Checklist
## Pre-Deployment
- [ ] Database backup taken
- [ ] Environment variables updated
- [ ] Tests passing in CI
- [ ] Changelog updated
- [ ] Version tag created
## Deployment
- [ ] Services stopped cleanly
- [ ] Build succeeded
- [ ] Migrations applied
- [ ] Files published
- [ ] .env copied to destinations
## Post-Deployment
- [ ] Health checks passing
- [ ] Logs free of critical errors
- [ ] Key features verified
- [ ] Monitoring shows normal metrics
- [ ] Team notified
9.3 Rollback
# rollback.ps1
param(
    [Parameter(Mandatory)]
    [string]$Version  # Git tag to restore
)
# 1. Stop services
# (requires restart-platform.ps1 to implement -StopOnly; the deployment
# script's parameter block above does not declare it)
.\restart-platform.ps1 -StopOnly
# 2. Check out the previous version
git checkout $Version
# 3. Restore the database backup
$backupFile = "backup_$Version.dump"
if (Test-Path $backupFile) {
    pg_restore -U nexus -d nexus -c $backupFile
}
# 4. Redeploy
.\restart-platform.ps1 -SkipBuild:$false
Write-Host "Rollback to $Version complete" -ForegroundColor Green
10. Operational Runbooks
10.1 RB-001: Service Not Responding
## Symptom
The service (API/Admin/MCP) does not respond to requests
## Diagnosis
1. Check the health endpoint: curl https://api.kalmiazen.com/health
2. Review the IIS logs: Get-EventLog -LogName Application -Source "IIS*" -Newest 20
3. Review the application logs: Get-Content C:\inetpub\api.kalmiazen.com\logs\*.log -Tail 100
## Resolution
1. Recycle the App Pool:
   appcmd.exe recycle apppool /apppool.name:NexusApiPool
2. If the problem persists, restart IIS:
   iisreset
3. If it still fails, check:
   - PostgreSQL connection
   - RabbitMQ connection
   - Environment variables
## Escalation
If not resolved within 15 minutes, escalate to development
10.2 RB-002: Slow Database
## Symptom
Slow queries, frequent timeouts
## Diagnosis
1. List active queries:
   SELECT * FROM pg_stat_activity WHERE state = 'active';
2. List the slowest queries (requires the pg_stat_statements extension; the column is total_exec_time since PostgreSQL 13):
   SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;
3. List waiting locks:
   SELECT * FROM pg_locks WHERE granted = false;
## Resolution
1. Terminate the offending queries (replace <pid> with the pid found during diagnosis):
   SELECT pg_terminate_backend(<pid>);
2. Run VACUUM:
   VACUUM ANALYZE;
3. If indexes are missing:
   CREATE INDEX CONCURRENTLY idx_name ON table(column);
## Prevention
- Review the explain plans of new queries
- Monitor pg_stat_statements regularly
10.3 RB-003: Saturated RabbitMQ Queue
## Symptom
Messages pile up in a queue and the workers do not process them
## Diagnosis
1. Check queue status:
   rabbitmqctl list_queues name messages consumers
2. Check consumers:
   rabbitmqctl list_consumers
## Resolution
1. Verify that Workers is running:
   Get-Process -Name "*Orchestrator.Workers*"
2. Restart Workers:
   Stop-Process -Name "Orchestrator.Workers" -Force
   Start-Process dotnet "run --configuration Release" -WorkingDirectory "Orchestrator/src/Orchestrator.Workers"
3. If there are corrupt messages, purge the queue (CAUTION: discards its messages):
   rabbitmqctl purge_queue nombre_cola
## Escalation
If messages keep accumulating, add workers or investigate the bottleneck
10.4 RB-004: Memory Error
## Symptom
OutOfMemoryException, application crashes
## Diagnosis
1. Check memory usage:
   Get-Process -Name "dotnet" | Select-Object Name, WorkingSet64
2. Review the Sentry logs for the stack trace
## Resolution
1. Recycle the App Pool:
   appcmd.exe recycle apppool /apppool.name:NexusApiPool
2. If the problem persists, raise the memory limits in IIS:
   - Application Pools > Advanced Settings
   - Private Memory Limit (KB)
3. Investigate memory leaks with dotnet-dump
## Prevention
- Configure periodic App Pool recycling
- Use IDisposable correctly
10.5 RB-005: Expired SSL Certificate
## Symptom
ERR_CERT_DATE_INVALID in the browser
## Diagnosis
1. Inspect the certificate:
   openssl s_client -connect api.kalmiazen.com:443 -servername api.kalmiazen.com
## Resolution (Cloudflare)
1. Cloudflare Dashboard > SSL/TLS > Edge Certificates
2. Verify that the certificate is active
3. If an Origin Certificate is in use, renew it on the server
## Resolution (IIS direct)
1. Generate a new certificate with certbot/win-acme
2. Import it in IIS > Server Certificates
3. Update the site's HTTPS binding
## Prevention
- Configure expiry alerts (30 days in advance)
- Use Cloudflare with automatic renewal
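The 30-day expiry alert can be sketched as a small Python check. It works on the certificate's `notAfter` string (the format returned by `ssl.SSLSocket.getpeercert()`), so the date arithmetic can be verified without a network connection; `days_until_expiry` and `needs_renewal` are hypothetical names.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Whole days between `now` and a certificate's notAfter timestamp."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def needs_renewal(not_after, now=None, warn_days=30):
    """True when the certificate expires within `warn_days` days."""
    return days_until_expiry(not_after, now) <= warn_days
```

In a scheduled job, the `notAfter` value would come from a TLS connection to each public host, and a `True` result would raise the alert.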
11. Troubleshooting
11.1 Common Problems
| Problem | Likely Cause | Solution |
|---|---|---|
| 502 Bad Gateway | App Pool stopped | appcmd start apppool /apppool.name:NexusApiPool |
| Connection refused DB | PostgreSQL stopped | Start-Service postgresql-x64-16 |
| Queue not found | RabbitMQ restarted | Restart the app to recreate the queues |
| Sentry not sending | Invalid DSN | Check the Sentry__Dsn variable |
| SignalR disconnects | IIS timeout | Increase the timeout in web.config |
| Memory leak | IDisposable misused | Recycle the App Pool, investigate the code |
11.2 Diagnostic Commands
# Show ports in use
netstat -ano | findstr "5001 5002 5003 5432 5672"
# Show .NET processes
Get-Process -Name "dotnet" | Select-Object Id, ProcessName, WorkingSet64, CPU
# Show Windows logs
Get-EventLog -LogName Application -Newest 50 | Where-Object { $_.Source -match "IIS|ASP.NET" }
# Test the PostgreSQL connection
psql -U nexus -d nexus -h localhost -c "SELECT 1"
# Test the RabbitMQ connection
rabbitmqctl status
# Show the latest unresolved errors in Sentry (API); replace ORG and PROJECT with the slugs from .env
curl.exe -H "Authorization: Bearer $env:SENTRY_AUTH_TOKEN" "https://sentry.io/api/0/projects/ORG/PROJECT/issues/?query=is:unresolved"
11.3 Logs by Service
| Service | Log Location |
|---|---|
| API | C:\inetpub\api.kalmiazen.com\logs\ |
| Admin | C:\inetpub\app.kalmiazen.com\logs\ |
| MCP | C:\inetpub\mcp.kalmiazen.com\logs\ |
| Workers | Orchestrator.Workers\logs\ |
| WhatsApp | Messaging\WhatsappService\logs\ |
| IIS | C:\inetpub\logs\LogFiles\ |
| PostgreSQL | C:\Program Files\PostgreSQL\16\data\log\ |
12. Backup and Recovery
12.1 Backup Strategy
| Component | Frequency | Retention | Method |
|---|---|---|---|
| PostgreSQL Full | Daily 03:00 | 30 days | pg_dump |
| PostgreSQL Incremental | Every 6h | 7 days | WAL archiving |
| RabbitMQ Definitions | Daily | 7 days | export_definitions |
| .env files | On every change | Indefinite | Git (private repo) |
| Logs | - | 30 days | Automatic rotation |
12.2 Automated Backup Script
# backup-nexus.ps1
$timestamp = Get-Date -Format "yyyyMMdd_HHmmss"
$backupDir = "D:\Backups\nexus"
# Create the directory if it does not exist
New-Item -ItemType Directory -Force -Path $backupDir | Out-Null
# PostgreSQL backup
$pgBackup = Join-Path $backupDir "nexus_$timestamp.dump"
pg_dump -U nexus -d nexus -F c -f $pgBackup
Write-Host "PostgreSQL backup: $pgBackup"
# RabbitMQ backup
$rmqBackup = Join-Path $backupDir "rabbitmq_$timestamp.json"
rabbitmqctl export_definitions $rmqBackup
Write-Host "RabbitMQ backup: $rmqBackup"
# Delete old backups (>30 days)
Get-ChildItem $backupDir -File | Where-Object {
    $_.LastWriteTime -lt (Get-Date).AddDays(-30)
} | Remove-Item -Force
Write-Host "Backup complete: $timestamp"
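The retention rule can also be expressed against the timestamp embedded in each file name instead of the file's modification time, which stays correct if backups are copied between disks. A minimal sketch, assuming the `name_yyyyMMdd_HHmmss.ext` naming used by the script above; `expired_backups` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def expired_backups(filenames, now, retention_days=30):
    """Return the names whose embedded timestamp is older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    expired = []
    for name in filenames:
        stem = name.rsplit(".", 1)[0]            # drop the extension
        stamp = "_".join(stem.split("_")[-2:])   # yyyyMMdd_HHmmss
        try:
            taken = datetime.strptime(stamp, "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # not produced by the backup script; leave it alone
        if taken < cutoff:
            expired.append(name)
    return expired
```

Files that do not match the naming pattern are never selected, so unrelated files in the backup directory are not at risk.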
12.3 Recovery Procedure
## Full Recovery
1. Restore PostgreSQL:
   pg_restore -U nexus -d nexus -c backup.dump
2. Restore RabbitMQ:
   rabbitmqctl import_definitions rabbitmq.json
3. Verify the environment variables (.env)
4. Apply pending migrations:
   dotnet ef database update
5. Restart services:
   .\restart-platform.ps1
6. Verify the health checks:
   curl https://api.kalmiazen.com/health
13. Operational Security
13.1 Security Checklist
## Access and Authentication
- [ ] Database passwords rotated every 90 days
- [ ] API keys limited to the minimum required scopes
- [ ] Strong OAuth2 secrets (32+ characters)
- [ ] MFA enabled for server access
## Network
- [ ] Firewall configured (required ports only)
- [ ] HTTPS enforced (HTTP redirects)
- [ ] Cloudflare WAF enabled
- [ ] Rate limiting configured
## Data
- [ ] Backups encrypted
- [ ] Logs free of sensitive data (PII)
- [ ] Environment variables protected
- [ ] .env kept out of the repository
## Monitoring
- [ ] Security alerts in Sentry
- [ ] Access logs reviewed
- [ ] Dependencies scanned (npm audit, dotnet list package --vulnerable)
13.2 Secrets and Rotation
# generar-secretos.ps1
# Get-Random is not a cryptographic generator, and -Count never returns more
# items than the input set (62 here), so secrets come from RandomNumberGenerator
$rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
function New-Secret([int]$Length) {
    $alphabet = [char[]]((65..90) + (97..122) + (48..57))
    $bytes = New-Object byte[] $Length
    $rng.GetBytes($bytes)
    # The modulo mapping carries a slight bias, tolerable at these lengths
    -join ($bytes | ForEach-Object { $alphabet[$_ % $alphabet.Length] })
}
# Generate the encryption key (256 bits, Base64)
$keyBytes = New-Object byte[] 32
$rng.GetBytes($keyBytes)
Write-Host "Encryption__Key=$([Convert]::ToBase64String($keyBytes))"
# Generate the OAuth2 signing key
Write-Host "OAuth2__SigningKey=$(New-Secret 64)"
# Generate the database password
Write-Host "DB_PASSWORD=$(New-Secret 24)"
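Where Python is available, the standard `secrets` module gives the same three values from a cryptographically secure source. This is a sketch under the lengths used in this section (32 random bytes, 64 and 24 characters); the function names are hypothetical.

```python
import base64
import secrets
import string

# Letters and digits, matching the character set used for the keys above.
ALPHABET = string.ascii_letters + string.digits


def encryption_key():
    """Return 32 random bytes (256 bits) encoded as Base64."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def random_password(length):
    """Return `length` characters drawn independently from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

`random_password(64)` would serve as the OAuth2 signing key and `random_password(24)` as the database password; each character is drawn independently, so repeated characters are possible and expected.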
13.3 Firewall Rules
# Configure the Windows firewall
# Allow PostgreSQL from localhost only
New-NetFirewallRule -DisplayName "PostgreSQL Local" -Direction Inbound `
    -LocalPort 5432 -Protocol TCP -Action Allow -RemoteAddress 127.0.0.1
# Allow RabbitMQ from localhost only
New-NetFirewallRule -DisplayName "RabbitMQ Local" -Direction Inbound `
    -LocalPort 5672,15672 -Protocol TCP -Action Allow -RemoteAddress 127.0.0.1
# Allow HTTP/HTTPS from anywhere (for IIS)
New-NetFirewallRule -DisplayName "Web Traffic" -Direction Inbound `
    -LocalPort 80,443 -Protocol TCP -Action Allow
Appendix A: Escalation Contacts
| Level | Time | Contact | Method |
|---|---|---|---|
| L1 | 0-15 min | DevOps on-call | Slack #ops-alerts |
| L2 | 15-60 min | Tech Lead | WhatsApp |
| L3 | >60 min | CTO | Phone |
Appendix B: SLAs
| Service | Availability | RTO | RPO |
|---|---|---|---|
| API | 99.5% | 1h | 6h |
| Admin | 99% | 2h | 24h |
| Workers | 95% | 4h | 24h |
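An availability target is easier to reason about as a downtime budget. The sketch below converts each percentage into allowed minutes of downtime, assuming a 30-day measurement window (the source does not state the window); `allowed_downtime_minutes` is a hypothetical helper.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Minutes of downtime permitted by an availability target over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100
```

Under that assumption the API's 99.5% target allows 216 minutes per 30 days, Admin's 99% allows 432 minutes, and Workers' 95% allows 2160 minutes (36 hours).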
Change History
| Date | Version | Changes | Author |
|---|---|---|---|
| 2026-02 | 1.0.0 | Initial document | DevOps Team |