
Request Routing

ai.TokenHub routes requests automatically, selecting a model deployment for each request based on user location, response speed, upstream health, and load.

Routing Decisions

The system considers the following factors for routing:

  • User Location: Prefer the service node nearest to the user
  • Response Speed: Select the fastest deployment based on real-time measurements
  • Upstream Health: Monitor channel availability and avoid unavailable channels
  • Load Balancing: Avoid overloading any single deployment

                    ┌─────────────────┐
                    │     Request     │
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │  Routing Engine │
                    └────────┬────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
   ┌────▼────┐          ┌────▼────┐          ┌────▼────┐
   │ US-West │          │ AP-East │          │ EU-West │
   │  30ms   │          │   80ms  │          │  120ms  │
   └─────────┘          └─────────┘          └─────────┘
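
The routing factors above can be illustrated with a minimal sketch. It is not the ai.TokenHub implementation: the `Deployment` class, the `pick_deployment` function, and the `max_load` threshold are hypothetical names chosen for illustration, and the load values are invented. The latencies match the diagram.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    latency_ms: float  # recently measured latency (hypothetical field)
    healthy: bool      # result of an upstream health check (hypothetical field)
    load: float        # current utilization from 0.0 to 1.0 (hypothetical field)


def pick_deployment(deployments, max_load=0.9):
    """Return the healthy, non-overloaded deployment with the lowest latency."""
    candidates = [d for d in deployments if d.healthy and d.load < max_load]
    if not candidates:
        # Every healthy node is above the load threshold: accept any healthy node.
        candidates = [d for d in deployments if d.healthy]
    if not candidates:
        raise RuntimeError("no healthy deployment available")
    return min(candidates, key=lambda d: d.latency_ms)


nodes = [
    Deployment("US-West", 30, True, 0.5),
    Deployment("AP-East", 80, True, 0.4),
    Deployment("EU-West", 120, True, 0.2),
]
print(pick_deployment(nodes).name)
```

With these values the sketch picks US-West. If US-West were marked unhealthy or pushed above the load threshold, the next-fastest node (AP-East) would be chosen instead.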

Sticky Routing

Sticky sessions are supported to improve locality and cache hit rate.

Sticky Hit

Prefer the deployment that previously served the same user successfully.

json
{
  "sticky_routing": {
    "enabled": true,
    "strategy": "sticky_hit",
    "persistence": {
      "duration": "1h",
      "key": "deployment_id"
    }
  }
}
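
One way to picture a sticky hit is a small table that remembers the last successful deployment per user and expires entries after the persistence duration. The sketch below assumes that `"duration": "1h"` means a one-hour lifetime measured from the last success. The `StickyTable` class, its methods, and the user ID are hypothetical and not part of the ai.TokenHub API.

```python
import time


class StickyTable:
    """Remembers the last successful deployment per user for a limited time."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl = ttl_seconds  # 3600 s corresponds to "1h" (assumption)
        self.clock = clock      # injectable so expiry can be tested
        self._entries = {}      # user_id -> (deployment_id, expires_at)

    def lookup(self, user_id):
        """Return the sticky deployment_id, or None on a miss or expired entry."""
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        deployment_id, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[user_id]
            return None
        return deployment_id

    def record_success(self, user_id, deployment_id):
        """Bind the user to the deployment that just succeeded."""
        self._entries[user_id] = (deployment_id, self.clock() + self.ttl)


# Usage with a fake clock so the example does not need to wait an hour.
now = [0.0]
table = StickyTable(ttl_seconds=3600.0, clock=lambda: now[0])
table.record_success("user-1", "gpt-4o-primary")
hit = table.lookup("user-1")      # within the hour: sticky hit
now[0] = 3601.0
expired = table.lookup("user-1")  # after the hour: entry has expired
```

On a miss, a router would fall through to its normal selection logic and call `record_success` once a deployment answers.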

Sticky Update

Update the sticky route when the bound deployment changes.

json
{
  "sticky_routing": {
    "enabled": true,
    "strategy": "sticky_update",
    "update_policy": "graceful"
  }
}

Fallback Chain

An ordered fallback chain is supported: when a deployment fails, the request automatically falls back to the next deployment in the chain.

json
{
  "fallback_chain": {
    "ordered": true,
    "chain": [
      {
        "deployment": "gpt-4o-primary",
        "priority": 1,
        "timeout": 5000
      },
      {
        "deployment": "gpt-4o-backup",
        "priority": 2,
        "timeout": 8000
      },
      {
        "deployment": "claude-3-opus",
        "priority": 3,
        "timeout": 10000
      }
    ]
  }
}

Fallback Flow

Request ──→ Primary Deployment
              │
              ├─ Success ──→ Return Result
              │
              └─ Failure ──→ Fallback #1
                            │
                            ├─ Success ──→ Return Result
                            │
                            └─ Failure ──→ Fallback #2
                                          │
                                          └─ Final Response
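
The flow above can be sketched as a loop over the chain in priority order. This is an illustration, not the actual routing engine: `call_with_fallback` and `fake_send` are hypothetical, and the sketch assumes `timeout` is given in milliseconds, which the configuration does not state.

```python
CHAIN = [
    {"deployment": "gpt-4o-primary", "priority": 1, "timeout": 5000},
    {"deployment": "gpt-4o-backup", "priority": 2, "timeout": 8000},
    {"deployment": "claude-3-opus", "priority": 3, "timeout": 10000},
]


def call_with_fallback(chain, send):
    """Try each deployment in priority order; return the first successful result."""
    errors = []
    for entry in sorted(chain, key=lambda e: e["priority"]):
        try:
            # Assumption: timeout values are milliseconds, converted to seconds here.
            return send(entry["deployment"], entry["timeout"] / 1000.0)
        except Exception as exc:
            # Record the failure and move on to the next entry in the chain.
            errors.append((entry["deployment"], exc))
    raise RuntimeError(f"all deployments failed: {errors}")


def fake_send(deployment, timeout_s):
    """Stand-in for a real upstream call: the primary always times out."""
    if deployment == "gpt-4o-primary":
        raise TimeoutError("primary timed out")
    return f"ok from {deployment}"


result = call_with_fallback(CHAIN, fake_send)
print(result)
```

Here the primary fails, so the result comes from `gpt-4o-backup`. If every entry fails, the sketch raises an error that lists each failure. What the real system returns as its final response in that case is not specified here.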

Dedicated Line Strategy

Key customers can be strongly bound to dedicated deployments and channels.

Dedicated Deployment

json
{
  "dedicated_deployment": {
    "enabled": true,
    "deployment_id": "dedicated-gpt-4o-001",
    "customer_id": "enterprise_customer_001",
    "dedicated_channel": "cn2-gia-premium"
  }
}

Dedicated AK/SK Binding

json
{
  "dedicated_binding": {
    "ak_sk": "dedicated_ak_sk_001",
    "allowed_deployments": ["gpt-4o", "claude-3-opus"],
    "no_public_fallback": true,
    "execution_boundary": {
      "region": "ap-east-1",
      "channel_type": "dedicated"
    }
  }
}
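
A binding like the one above could be enforced with a check before each dispatch. The sketch below is a guess at the semantics, not documented behavior. It assumes that `allowed_deployments` is a whitelist, that `execution_boundary.region` must match exactly, and that `no_public_fallback` forbids any channel other than the one named in `execution_boundary.channel_type`. The `is_allowed` function is hypothetical.

```python
BINDING = {
    "ak_sk": "dedicated_ak_sk_001",
    "allowed_deployments": ["gpt-4o", "claude-3-opus"],
    "no_public_fallback": True,
    "execution_boundary": {"region": "ap-east-1", "channel_type": "dedicated"},
}


def is_allowed(binding, deployment, region, channel_type):
    """Return True if a request may run on the given deployment, region, and channel."""
    if deployment not in binding["allowed_deployments"]:
        return False
    boundary = binding["execution_boundary"]
    if region != boundary["region"]:
        return False
    # Assumption: with no_public_fallback set, only the bound channel type is usable.
    if binding["no_public_fallback"] and channel_type != boundary["channel_type"]:
        return False
    return True


print(is_allowed(BINDING, "gpt-4o", "ap-east-1", "dedicated"))  # allowed
print(is_allowed(BINDING, "gpt-4o", "ap-east-1", "public"))     # blocked: public channel
```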

Configuration Recommendations

Scenario             Configuration
High Availability    Fallback + Multi-channel
Low Latency          Sticky + Geographic
Cost Optimization    Price-sorted fallback
Enterprise           Dedicated channel + No public fallback