  • contact@verticalserve.com
Enterprise self-hosted

InsightTestBench for the Enterprise

A QA bench deployed in your environment. The bench talks to your app under test using your own credentials, runs against your own MySQL, and ships reports to your own webhook endpoints. No SaaS, no telemetry, no shared infrastructure.

What InsightTestBench is

A self-hosted QA portal that bootstraps regression suites from a plain-English brief, runs them on a schedule, and tells you what broke and why. Built on the InsightWorker engine — every primitive is a §15 YAML bundle you can read and extend.

Suited for any team that ships frequently to a real UI — internal portals, SaaS dashboards, customer-facing apps. Designed to be installed by the same people who deploy your other internal tools: docker-compose up, fill in .env, log in.

Deployment options

Three install modes, one product. Pick whichever fits your IT footprint.

Kubernetes (Helm chart)

For EKS, AKS, GKE, OpenShift. Customer-managed Postgres (or RDS). Standard Helm values. Horizontal scaling. Ingress via your existing controller. Most enterprise customers land here.

  • Public + private container registries supported
  • Air-gapped install via offline image bundle
  • Sealed-secrets / external-secrets compatible

Single-VM (docker-compose)

For smaller teams or proof-of-value. One VM, sub-30-minute install. Postgres + bench API + UI + MySQL + worker in one stack. Upgrade path to Kubernetes when you're ready.

  • Bring your own VM (Ubuntu, RHEL, Amazon Linux)
  • Or run on a beefy laptop for a team-of-one pilot
  • Same product code as the Kubernetes path

Air-gapped / classified

For fully isolated networks: defense, intelligence, and regulated healthcare clinical environments. Offline-signed image bundle, no outbound calls, and bring-your-own model endpoints (for example, an on-prem GPU box exposed through a custom OpenAI-compatible endpoint).

  • Signed bundles with verifiable provenance
  • Update via airlock mechanism, not auto-pull
  • No SaaS dependencies in the data plane

Architecture

Three things you run, one thing they all talk to.

Control plane

FastAPI + React, your Postgres or MySQL. Hosts the marketplace, builder, run history, admin console, worker registry. Talks to your S3 (for the app bundle catalog), your IdP (for SSO), and your worker fleet (over outbound HTTPS only — no inbound connections to workers).

App bundle catalog (S3)

A bucket in your AWS account holds every version of every published app, addressed by apps/<slug>/v<version>/. The bench reads from disk on a 60s cache. The InsightWorker CLI writes via iw app publish. Versions are immutable; rollback is one click.
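
The addressing scheme and the 60s cache can be sketched in a few lines of Python. This is an illustration only, not the bench's actual code: bundle_prefix, TTLCache, and the fetch callable are hypothetical names, and the real cache may behave differently.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 60.0  # mirrors the 60s cache mentioned above


def bundle_prefix(slug: str, version: str) -> str:
    """Build the catalog prefix for one published app version."""
    return f"apps/{slug}/v{version}/"


class TTLCache:
    """Minimal time-based cache: re-fetch a key once its entry is older than the TTL."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],          # hypothetical loader (disk, S3, ...)
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> bytes:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                       # still fresh: serve cached copy
        value = self._fetch(key)                # missing or stale: reload
        self._entries[key] = (now, value)
        return value
```

Because published versions are immutable, a cached entry for a given apps/<slug>/v<version>/ prefix never becomes wrong; the TTL mainly bounds how long a newly published version or a rollback takes to become visible.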

Worker fleet (your boxes)

Any machine running insightworker --worker --studio <url> --token <bearer> becomes a worker. Workers poll the bench for queued runs, pull the bundle from S3, execute on local compute with your secrets, and stream events back. Outbound-only. Scale by adding more boxes.
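
One iteration of that poll, pull, execute, stream cycle might look like the Python sketch below. It is a rough model, not the insightworker implementation: Run, work_once, and the four injected callables are hypothetical, and no real bench endpoint or wire format is implied.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Run:
    run_id: str
    bundle_prefix: str  # e.g. "apps/<slug>/v<version>/"


def work_once(
    poll: Callable[[], Optional[Run]],           # asks the bench for a queued run
    pull_bundle: Callable[[str], bytes],         # reads the bundle from the catalog
    execute: Callable[[bytes], Iterable[str]],   # runs locally, yields events
    send_event: Callable[[str, str], None],      # streams one event to the bench
) -> bool:
    """Handle at most one queued run; return True if a run was processed."""
    run = poll()
    if run is None:
        return False                             # nothing queued right now
    bundle = pull_bundle(run.bundle_prefix)
    for event in execute(bundle):                # secrets never leave this process
        send_event(run.run_id, event)
    return True
```

Every call in the loop is initiated by the worker, which is what keeps the fleet outbound-only: the bench never needs to open a connection to a worker.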

Customer VPC
┌─────────────────────────────────────────────┐
│ ┌────────────────┐    ┌───────────────────┐ │
│ │  Browser users │───►│  The bench (FAS)  │ │
│ └────────────────┘    └─────────┬─────────┘ │
│                                 │           │
│                                 ▼           │
│ ┌────────────────┐    ┌───────────────────┐ │
│ │  S3 bundles    │    │  Postgres / MySQL │ │
│ │  apps/<slug>/  │    │  (runs, users,    │ │
│ │  v<ver>/...    │    │   audit log)      │ │
│ └────────┬───────┘    └───────────────────┘ │
│          │ pulls bundle                     │
│          ▼                                  │
│ ┌────────────────────────────────────┐      │
│ │  Worker fleet                      │      │
│ │  (your laptops / VMs / k8s pods)   │      │
│ │  insightworker --worker            │      │
│ └──────────┬──────────────┬──────────┘      │
│            │              │                 │
│            ▼              ▼                 │
│ ┌──────────────┐  ┌────────────────┐        │
│ │ Your models  │  │ SharePoint,    │        │
│ │ (Bedrock,    │  │ JIRA, DBs,     │        │
│ │  Azure, etc) │  │ Airflow…       │        │
│ └──────────────┘  └────────────────┘        │
└─────────────────────────────────────────────┘

Outbound to:
  Okta / Azure AD (SSO)
  Your model providers (Bedrock, Azure OpenAI, etc)
  Your SIEM (audit log forwarding)
                        

Capability areas

What the bench gives an enterprise out of the box.

Identity & SSO

Okta, Azure AD, Google Workspace via OIDC. Group sync. SCIM provisioning. JIT user creation. Per-org sub-tenancy with isolated app catalogs.

RBAC + per-app grants

"Claims team can run the broker-intake app; Underwriting can run policy-comparison; only Risk can publish new apps." Roles, groups, and per-app grants — granular without becoming unmanageable.

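The grant model in that example can be approximated with a small lookup table. The Python below is a sketch under stated assumptions: the GRANTS and PUBLISHERS structures and the can function are hypothetical, and treating publish as an org-wide permission is one reading of the example, not a documented rule.

```python
from typing import Dict, Set, Tuple

# (group, app) -> actions that group may perform on that app.
# Group and app names mirror the example in the text.
GRANTS: Dict[Tuple[str, str], Set[str]] = {
    ("claims", "broker-intake"): {"run"},
    ("underwriting", "policy-comparison"): {"run"},
}

# Assumption: publishing is granted per group, not per app.
PUBLISHERS: Set[str] = {"risk"}


def can(user_groups: Set[str], action: str, app: str) -> bool:
    """Return True if any of the user's groups allows the action on the app."""
    if action == "publish":
        return bool(user_groups & PUBLISHERS)
    return any(action in GRANTS.get((group, app), set()) for group in user_groups)
```

With this table, a user in the claims group can run broker-intake but not policy-comparison, and only members of risk can publish.
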
Model governance

Provider lockdown (e.g. only Bedrock in eu-central-1). Per-org token quotas. Cost dashboards. Capability matching prevents apps from landing on workers without the right credentials or skills.

Audit + compliance

Every run, every step, every tool call logged with: user, time, model, tokens, inputs (configurable retention + PII redaction), outputs, which worker ran it. SIEM webhook export (Splunk, Datadog, Elastic). Pre-built views for SOC 2, GDPR, HIPAA evidence.
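
As an illustration of what one such log entry could look like, here is a Python sketch that builds a JSON record with the fields listed above and redacts email addresses from the inputs. The field names, the audit_record helper, and the email-only redaction rule are assumptions for this example; the product's actual schema and redaction rules are not described in this document.

```python
import json
import re
from datetime import datetime, timezone
from typing import Optional

# Assumption: email addresses stand in for "PII" in this sketch.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def audit_record(
    user: str,
    model: str,
    tokens: int,
    inputs: str,
    outputs: str,
    worker: str,
    when: Optional[datetime] = None,
) -> str:
    """Serialize one audit entry as JSON, redacting inputs before they are stored."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "user": user,
            "time": when.isoformat(),
            "model": model,
            "tokens": tokens,
            "inputs": redact(inputs),
            "outputs": outputs,
            "worker": worker,
        },
        sort_keys=True,
    )
```

Redacting at write time, as sketched here, means the unredacted input never reaches the log or anything the log is forwarded to.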

App marketplace + builder

Browse, tag, search, version-pin every app. In-browser app builder for non-CLI authors. Approval workflow before apps surface to end users. Rollback to any prior version with one click.

Worker fleet management

Live console of every worker: hostname, role, installed skills, capabilities (creds + GPU), last heartbeat, current status. Lazy reaper marks stale workers offline. Capability-based job routing.
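
Capability-based routing with a lazy reaper can be modeled roughly as follows. This Python sketch makes several assumptions: the Worker fields, the 90-second staleness threshold, and the reading of "lazy" as "checked when the registry is read, not on a timer" are illustrative choices, not documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

STALE_AFTER_SECONDS = 90.0  # assumed threshold, not a documented default


@dataclass
class Worker:
    hostname: str
    capabilities: Set[str]      # e.g. {"sharepoint", "gpu"}
    last_heartbeat: float       # seconds, same clock as `now`
    status: str = "online"


def reap_stale(workers: List[Worker], now: float,
               stale_after: float = STALE_AFTER_SECONDS) -> None:
    """Mark workers offline if their last heartbeat is too old."""
    for worker in workers:
        if now - worker.last_heartbeat > stale_after:
            worker.status = "offline"


def pick_worker(workers: List[Worker], required: Set[str],
                now: float) -> Optional[Worker]:
    """Return the first online worker that has every required capability."""
    reap_stale(workers, now)    # lazy: staleness is evaluated at routing time
    for worker in workers:
        if worker.status == "online" and required <= worker.capabilities:
            return worker
    return None
```

A job that requires {"gpu"} is only ever offered to a worker that registered a gpu capability, so a run cannot land on a box that lacks the credentials or hardware it needs.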

Trust & data flow

What goes where, and what never leaves your network. The bench is built for tenants whose security team asks this question first.

  • Inside your VPC, always: the bench API, the database, the S3 app catalog, the worker fleet, your model endpoints, your data sources (SharePoint, JIRA, databases).
  • Outbound only: the bench → Okta/Azure AD for SSO; workers → model providers for inference; optional SIEM webhook for audit forwarding.
  • Never: no telemetry from the bench to VerticalServe, no data shipped offsite for analytics, no shared infrastructure with other customers.
  • Secrets stay on workers: a worker daemon registers what credentials it has (e.g. "this box has Salesforce and SharePoint creds") but the credentials themselves never leave that box. The bench routes jobs to qualified workers, not credentials to jobs.

Compliance posture

SOC 2 (Type II evidence)

Pre-built audit views for every common control. Friendly to customer auditors.

GDPR

Right-to-access + right-to-erasure tooling. EU region pinning for model calls.

HIPAA

PHI redaction at audit-log time. BAA available for self-hosted deployments.

FedRAMP-compatible deploy

Air-gapped install profile suitable for GovCloud (Moderate / High in pipeline).

Typical customer rollout

From security review to production marketplace in weeks, not quarters.

1. Pilot (week 1-2)

Single-VM install on a sandbox. Wire up Bedrock or your model. Author + publish your first app from the CLI. Run it in the browser.

2. Security review (week 3-4)

Architecture diagram + SOC 2 evidence + DPA. Test SSO against your tenant. Verify outbound traffic patterns match your network policy.

3. Production deploy (week 4-6)

Kubernetes via Helm. Migrate from sandbox bundle. Onboard the first 1-2 teams. Start logging runs to your SIEM.

4. Scale out (week 6+)

Add worker boxes per credential domain (Salesforce, SharePoint, GPU workloads). Expand grants by team. Author the next 10 apps.

Ready to host AI apps in your environment?

We'll walk you through the architecture, the security posture, and a 30-minute pilot install on a sandbox VM.

On-prem / VPC • Okta SSO • Full audit • Worker fleet • SOC 2-ready