Error Handling

1. Error Handling Principles

Error handling in TestoQA must satisfy three goals:

Safety: never leak tenant data or sensitive details
Clarity: make failures understandable and actionable
Observability: enable debugging and alerting without exposing secrets

Errors are handled according to where they occur:

Boundary layer (validation, auth, tenancy)
Domain services (business rules and workflow failures)
Repositories (persistence failures)
Integrations (external system failures)

This document defines semantics and patterns, not specific UI components or library APIs.

2. Error Categories

Validation Errors (400)

Input does not conform to the schema/contract.

Characteristics:

deterministic
safe to expose to the caller as field-level information, bounded in size and detail
must not trigger side effects

Examples:

missing required fields
invalid enum values
malformed IDs or filters
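
The characteristics above can be sketched as a pure validation function. This is a minimal illustration, not a prescribed implementation: the CreateRunInput shape, the ID pattern, and the cap of 10 issues are assumptions made for the example. The function is deterministic, performs no side effects, and returns field-level issues without echoing the submitted values.

```typescript
// Hypothetical input shape and rules, for illustration only.
type ValidationIssue = { field: string; message: string };

type ValidationResult<T> =
  | { ok: true; value: T }
  | { ok: false; status: 400; issues: ValidationIssue[] };

interface CreateRunInput {
  projectId: string;
  name: string;
  mode: "manual" | "automated";
}

const MAX_ISSUES = 10; // bound how much detail is returned to the caller
const ID_PATTERN = /^[a-z0-9_-]{1,64}$/i; // assumed ID format

function validateCreateRunInput(raw: unknown): ValidationResult<CreateRunInput> {
  const issues: ValidationIssue[] = [];
  const input = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const { projectId, name, mode } = input;

  // Messages name the field and the rule, never the submitted value.
  if (typeof projectId !== "string" || !ID_PATTERN.test(projectId)) {
    issues.push({ field: "projectId", message: "must be a well-formed ID" });
  }
  if (typeof name !== "string" || name.trim() === "") {
    issues.push({ field: "name", message: "is required" });
  }
  if (mode !== "manual" && mode !== "automated") {
    issues.push({ field: "mode", message: "must be one of: manual, automated" });
  }

  if (issues.length > 0) {
    return { ok: false, status: 400, issues: issues.slice(0, MAX_ISSUES) };
  }
  return {
    ok: true,
    value: {
      projectId: projectId as string,
      name: (name as string).trim(),
      mode: mode as "manual" | "automated",
    },
  };
}
```

Returning a result value rather than throwing keeps the failure path explicit for the boundary that calls it.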

Authentication Errors (401)

The request has no valid user session.

Characteristics:

no tenant-scoped work should proceed
should not reveal project/resource existence

Authorization Errors (403)

The user is authenticated but not permitted to perform the operation in the resolved tenant context.

Characteristics:

the authorization check must occur before any tenant data is accessed
response should avoid leaking existence of resources in other tenants

Tenant Resolution Errors (400 / 404 semantics)

Tenant cannot be resolved or verified (e.g., missing projectId, membership mismatch).

Characteristics:

fail fast before any data access
should not leak whether a tenant exists

Not Found Errors (404 semantics)

A resource is not found within the resolved tenant context.

Characteristics:

lookups must be tenant-scoped, keyed by (projectId, id) or an equivalent composite key
do not disclose cross-tenant existence
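
A minimal sketch of a tenant-scoped lookup follows. The TestCase type, the in-memory TestCaseRepository, and the function names are hypothetical; a real repository would presumably put both values into the query filter. The point illustrated is that an ID belonging to another tenant and an ID that does not exist produce the same result.

```typescript
interface TestCase {
  id: string;
  projectId: string;
  title: string;
}

type Lookup<T> =
  | { found: true; value: T }
  | { found: false; status: 404; message: string };

// Hypothetical in-memory repository standing in for real persistence.
class TestCaseRepository {
  private readonly rows: readonly TestCase[];

  constructor(rows: readonly TestCase[]) {
    this.rows = rows;
  }

  // Both projectId and id are part of the match; there is no lookup by id alone.
  findInProject(projectId: string, id: string): TestCase | undefined {
    return this.rows.find((row) => row.projectId === projectId && row.id === id);
  }
}

function getTestCase(repo: TestCaseRepository, projectId: string, id: string): Lookup<TestCase> {
  const row = repo.findInProject(projectId, id);
  if (row === undefined) {
    // Same response whether the ID is unknown or belongs to another tenant.
    return { found: false, status: 404, message: "Resource not found" };
  }
  return { found: true, value: row };
}
```

Because the repository exposes no lookup by id alone, a caller cannot accidentally skip the tenant filter.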

Domain Rule Errors (409 / 422 semantics)

Business rules prevent an operation.

Characteristics:

should be clear to users
safe to display (e.g., “cannot run tests while a run is active”)
must not include sensitive internal details

Persistence/Database Errors (500)

Unexpected failures in data access.

Characteristics:

not safe to expose raw details
must be logged with context (redacted)
may require alerting depending on severity

Integration Errors (502 / 503 / 504 semantics)

External service failures (realtime, email, storage, etc.).

Characteristics:

must be handled explicitly at the call site, not left to surface as generic failures
may allow graceful degradation
must use timeouts; retry only when safe
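
One way to make integration failures explicit is to classify them and decide between degrading and failing. The sketch below is illustrative only: the failure kinds, the optional flag, and the function name are assumptions, and the timeout itself is assumed to be enforced by the caller and reported as the "timeout" kind. The status mapping follows the standard HTTP meanings (502 for an invalid upstream response, 503 for an unavailable service, 504 for an upstream timeout).

```typescript
type IntegrationFailure = "timeout" | "unavailable" | "bad_response";

interface IntegrationCall {
  name: string; // e.g. "realtime", "email", "storage"
  optional: boolean; // assumption: the request can still succeed without it
}

type IntegrationDecision =
  | { action: "degrade"; skipped: string }
  | { action: "fail"; status: 502 | 503 | 504; message: string };

const STATUS_BY_FAILURE: Record<IntegrationFailure, 502 | 503 | 504> = {
  bad_response: 502,
  unavailable: 503,
  timeout: 504,
};

function decideOnIntegrationFailure(
  call: IntegrationCall,
  failure: IntegrationFailure,
): IntegrationDecision {
  if (call.optional) {
    // Graceful degradation: skip the integration and let the request continue.
    return { action: "degrade", skipped: call.name };
  }
  // The message is generic on purpose; provider details belong in redacted logs.
  return {
    action: "fail",
    status: STATUS_BY_FAILURE[failure],
    message: "A dependent service is temporarily unavailable.",
  };
}
```

For example, a failed realtime notification might be marked optional and skipped, while a failed storage write for an upload would fail the request.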

3. Error Propagation Strategy

Boundary layer: normalize and stop early

Boundaries (Server Actions / Route Handlers) are responsible for:

validating input (fail early)
resolving RequestContext (fail if missing/unverified)
enforcing authorization (fail before tenant data access)
normalizing errors into consistent responses

Rule: Do not allow low-level errors (Prisma, network, stack traces) to bubble to clients.
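
The ordering and normalization described above might look like the following sketch. Everything named here (AppError, BoundaryDeps, runBoundary, the error codes) is hypothetical, and the functions are synchronous only for brevity; real Server Actions and Route Handlers are asynchronous. Known application errors pass through with their safe message, while anything else becomes a generic 500 and only its type is logged.

```typescript
interface ClientError {
  status: number;
  code: string;
  message: string;
}

type BoundaryResult<T> = { ok: true; data: T } | { ok: false; error: ClientError };

interface RequestContext {
  userId: string;
  projectId: string;
}

// Hypothetical application error whose message is safe to show to clients.
class AppError extends Error {
  readonly isAppError = true; // brand checked instead of instanceof
  readonly status: number;
  readonly code: string;

  constructor(status: number, code: string, safeMessage: string) {
    super(safeMessage);
    this.name = "AppError";
    this.status = status;
    this.code = code;
  }
}

function isAppError(err: unknown): err is AppError {
  return typeof err === "object" && err !== null
    && (err as { isAppError?: unknown }).isAppError === true;
}

interface BoundaryDeps<I, T> {
  validate(raw: unknown): I; // throws AppError (400)
  resolveContext(input: I): RequestContext; // throws AppError if missing/unverified
  authorize(ctx: RequestContext, input: I): void; // throws AppError (403)
  execute(ctx: RequestContext, input: I): T; // first step allowed to touch tenant data
}

function runBoundary<I, T>(
  raw: unknown,
  deps: BoundaryDeps<I, T>,
  log: (entry: Record<string, unknown>) => void,
): BoundaryResult<T> {
  try {
    const input = deps.validate(raw);
    const ctx = deps.resolveContext(input);
    deps.authorize(ctx, input);
    return { ok: true, data: deps.execute(ctx, input) };
  } catch (err) {
    if (isAppError(err)) {
      return { ok: false, error: { status: err.status, code: err.code, message: err.message } };
    }
    // Low-level failure: log the type only and return a generic response.
    log({ errorType: err instanceof Error ? err.name : "UnknownError" });
    return {
      ok: false,
      error: { status: 500, code: "internal_error", message: "An unexpected error occurred." },
    };
  }
}
```

Because execute runs last, a validation, context, or authorization failure returns before any tenant data is read.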

Services: express domain failures explicitly

Services should model business-rule failures as explicit outcomes/errors rather than generic exceptions.

Prefer domain-specific error types
Include safe metadata for client messaging
Avoid embedding sensitive payloads
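
As an illustration of explicit domain outcomes, the sketch below models two business-rule failures as a discriminated union instead of generic exceptions. The error names (RunAlreadyActive, EmptyTestPlan), the Run shape, and the startRun signature are invented for this example; only the "run is active" rule echoes the example given earlier in this document.

```typescript
type DomainError =
  | { type: "RunAlreadyActive"; status: 409; message: string; meta: { activeRunId: string } }
  | { type: "EmptyTestPlan"; status: 422; message: string };

type Outcome<T> = { ok: true; value: T } | { ok: false; error: DomainError };

interface Run {
  id: string;
  status: "active" | "finished";
}

// Hypothetical service function: takes already-loaded, tenant-scoped data.
function startRun(
  existingRuns: readonly Run[],
  testCaseIds: readonly string[],
  newRunId: string,
): Outcome<Run> {
  const active = existingRuns.find((run) => run.status === "active");
  if (active) {
    return {
      ok: false,
      error: {
        type: "RunAlreadyActive",
        status: 409,
        message: "Cannot run tests while a run is active.",
        meta: { activeRunId: active.id }, // safe metadata from the same tenant
      },
    };
  }
  if (testCaseIds.length === 0) {
    return {
      ok: false,
      error: { type: "EmptyTestPlan", status: 422, message: "Select at least one test case to run." },
    };
  }
  return { ok: true, value: { id: newRunId, status: "active" } };
}
```

The union forces callers to handle each failure by type, and every message is written for end users rather than copied from a lower layer.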

Repositories: persistence only, no domain semantics

Repositories may throw/return persistence errors, but should not interpret them as business rules.

4. Client-Facing Error Rules

What clients may see

validation issues (bounded)
unauthenticated / forbidden responses
domain-rule failures with safe messages
generic “unexpected error” for server/internal issues

What clients must never see

stack traces
SQL queries
external provider credentials/tokens
raw integration payloads
cross-tenant identifiers or hints

Tenant-safe responses

If a user requests a resource outside their tenant, the system should avoid responses that confirm cross-tenant existence. Favor patterns that either:

treat it as forbidden, or
treat it as not found within the tenant scope

(Exact semantics may differ by endpoint, but safety is the invariant.)

5. Logging, Redaction, and Error Context

Logs are essential for debugging, but they can themselves leak sensitive data.

Logging rules

Log error type, location, and correlation identifiers
Include tenant-aware identifiers safely:
- projectId (ok)
- userId (ok)
Redact:
- tokens, cookies, secrets
- uploaded content
- sensitive artifacts and prompts (if applicable)
- PII fields beyond what is necessary
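
A small redaction helper can apply these rules before anything reaches the logger. The sketch below is an assumption-laden example: the key pattern, the placeholder string, and the depth limit are arbitrary choices, and matching on key names is only one possible strategy. Identifiers such as projectId and userId pass through unchanged.

```typescript
const REDACTED = "[REDACTED]";

// Assumed list of sensitive key fragments; a real list would be maintained per project.
const SENSITIVE_KEY = /token|cookie|secret|password|authorization|prompt|content|email/i;

const MAX_DEPTH = 5; // stop descending into very deep structures

function redact(value: unknown, depth: number = 0): unknown {
  if (depth > MAX_DEPTH) return REDACTED;
  if (Array.isArray(value)) {
    return value.map((item) => redact(item, depth + 1));
  }
  if (typeof value === "object" && value !== null) {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      // Redact by key name; otherwise recurse into the nested value.
      out[key] = SENSITIVE_KEY.test(key) ? REDACTED : redact(source[key], depth + 1);
    }
    return out;
  }
  return value; // primitives under non-sensitive keys are kept as-is
}
```

The helper returns a copy, so the original object can still be used by the request that produced it.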

Structured logging

Errors should be logged in structured form (key/value), enabling:

filtering by tenant
grouping by error type
tracing flows across boundaries
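
A structured error entry could be as simple as the following sketch. The field names and the two helper functions are assumptions, not an existing TestoQA API. The entry deliberately records the error type rather than the raw message, on the assumption that low-level messages may embed queries or payloads.

```typescript
interface ErrorLogContext {
  location: string; // e.g. "runs.startRun"
  correlationId: string; // ties together entries from one request
  projectId?: string;
  userId?: string;
}

interface ErrorLogEntry extends ErrorLogContext {
  level: "error";
  errorType: string;
}

function buildErrorLogEntry(err: unknown, ctx: ErrorLogContext): ErrorLogEntry {
  return {
    level: "error",
    // Type only: the raw message is not copied into the entry.
    errorType: err instanceof Error ? err.name : typeof err,
    location: ctx.location,
    correlationId: ctx.correlationId,
    projectId: ctx.projectId,
    userId: ctx.userId,
  };
}

// One JSON object per line keeps entries easy to filter and group.
function toLogLine(entry: ErrorLogEntry): string {
  return JSON.stringify(entry);
}
```

With entries in this shape, filtering by projectId, grouping by errorType, and following a correlationId across layers become plain key lookups.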

6. Retry and Idempotency Considerations

Retry safety

Only retry when:

the operation is idempotent, or
the external API guarantees safe deduplication

Examples:

safe: GET-like reads
unsafe: create/mutate operations without idempotency keys
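
The retry rule can be expressed as a small decision function. This sketch covers only the decision, not scheduling or backoff, and the RetryInput fields are hypothetical names chosen for the example.

```typescript
interface RetryInput {
  idempotent: boolean; // e.g. GET-like reads
  hasIdempotencyKey: boolean; // the external API deduplicates on this key
  attempt: number; // 1-based number of the attempt that just failed
  maxAttempts: number;
}

function shouldRetry(input: RetryInput): boolean {
  // Safe only if repeating the call cannot apply the effect twice.
  const safeToRepeat = input.idempotent || input.hasIdempotencyKey;
  return safeToRepeat && input.attempt < input.maxAttempts;
}
```

A create or mutate call without an idempotency key is never retried here, regardless of how many attempts remain.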

Idempotency posture

For workflows that may be triggered repeatedly (e.g., run initiation), introduce idempotency keys or server-side deduplication if needed.

(Implementation details are out of scope, but the architectural principle stands.)
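
Purely to illustrate the principle, server-side deduplication by idempotency key can be sketched as below. The RunInitiator class and its in-memory Map are assumptions made for the example; a production version would presumably need a durable store with a uniqueness guarantee. Keys are scoped per project so that two tenants using the same key never share a run.

```typescript
interface Run {
  id: string;
  projectId: string;
}

interface StartRunResult {
  run: Run;
  deduplicated: boolean; // true when an earlier call with the same key already created the run
}

// Hypothetical in-memory implementation, for illustration only.
class RunInitiator {
  private readonly runsByKey = new Map<string, Run>();
  private nextId = 1;

  startRun(projectId: string, idempotencyKey: string): StartRunResult {
    // JSON encoding keeps the (projectId, key) pair unambiguous.
    const scopedKey = JSON.stringify([projectId, idempotencyKey]);
    const existing = this.runsByKey.get(scopedKey);
    if (existing !== undefined) {
      return { run: existing, deduplicated: true };
    }
    const run: Run = { id: `run_${this.nextId}`, projectId };
    this.nextId += 1;
    this.runsByKey.set(scopedKey, run);
    return { run, deduplicated: false };
  }
}
```

A repeated trigger with the same key returns the run created the first time instead of starting a second one.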

7. Review Checklist (Error Handling)

Do boundaries validate and fail early?
Are 401, 403, and tenant-resolution failures returned before any tenant data is accessed?
Are client-facing errors normalized and safe?
Are logs structured and redacted?
Do integration failures use timeouts and safe retry policies?
Are tenant leaks avoided in error messages and status codes?