Request Routing
ai.TokenHub provides intelligent request routing: for each request, it automatically selects the optimal model deployment based on several factors.
Routing Decisions
The routing engine weighs the following factors:
- User Location: route to the nearest service node
- Response Speed: select the fastest-responding deployment in real time
- Upstream Health: monitor channel availability and avoid unavailable channels
- Load Balancing: spread traffic so that no single node is overloaded
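The factors above could be combined into a single cost per deployment, with the cheapest healthy deployment winning. The sketch below is illustrative only: the `Deployment` fields, the weights `LOAD_WEIGHT_MS` and `REMOTE_PENALTY_MS`, and the additive scoring rule are all assumptions, not ai.TokenHub's actual algorithm.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Deployment:
    # Hypothetical view of a deployment as a routing engine might see it.
    name: str
    region: str
    latency_ms: float  # recently measured response time
    healthy: bool      # outcome of upstream health checks
    load: float        # utilisation: 0.0 = idle, 1.0 = saturated

# Illustrative weights only; they are not ai.TokenHub settings.
LOAD_WEIGHT_MS = 100.0    # a fully loaded node "costs" an extra 100 ms
REMOTE_PENALTY_MS = 50.0  # extra cost for leaving the user's region

def score(d: Deployment, user_region: str) -> float:
    """Lower is better: latency plus penalties for load and distance."""
    cost = d.latency_ms + LOAD_WEIGHT_MS * d.load
    if d.region != user_region:
        cost += REMOTE_PENALTY_MS
    return cost

def choose(deployments: List[Deployment], user_region: str) -> Deployment:
    # Upstream health: unhealthy deployments are never candidates.
    candidates = [d for d in deployments if d.healthy]
    if not candidates:
        raise RuntimeError("no healthy deployment available")
    return min(candidates, key=lambda d: score(d, user_region))

deployments = [
    Deployment("us-west", "US-West", 30, True, 0.9),
    Deployment("ap-east", "AP-East", 80, True, 0.2),
    Deployment("eu-west", "EU-West", 120, False, 0.1),
]
print(choose(deployments, "AP-East").name)  # ap-east
print(choose(deployments, "US-West").name)  # us-west
```

Note how load matters: for a user in AP-East, the heavily loaded US-West node loses despite its lower raw latency.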
```text
                    ┌─────────────────┐
                    │     Request     │
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │ Routing Engine  │
                    └────────┬────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
   ┌────▼────┐          ┌────▼────┐          ┌────▼────┐
   │ US-West │          │ AP-East │          │ EU-West │
   │  30ms   │          │  80ms   │          │  120ms  │
   └─────────┘          └─────────┘          └─────────┘
```

Sticky Routing
Sticky routing uses a sticky-session mechanism that keeps a user's requests on the same deployment, improving locality and cache hit rate.
Sticky Hit
Routes requests from the same user to the deployment that previously served that user successfully.
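A sticky hit can be pictured as a per-user table that remembers the last successful deployment for a fixed time window, matching a `persistence.duration` such as `1h`. This is a minimal sketch under assumptions: the `StickyTable` class, the `route` helper, and the rule that only successful responses refresh a binding are hypothetical, not ai.TokenHub internals.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class StickyTable:
    """Hypothetical sticky-hit table: user -> last successful deployment, with a TTL."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds  # "duration": "1h" corresponds to 3600 s
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # user -> (deployment_id, expiry)

    def record_success(self, user_id: str, deployment_id: str) -> None:
        # Only successful responses create or refresh a binding.
        self._entries[user_id] = (deployment_id, self.clock() + self.ttl)

    def lookup(self, user_id: str) -> Optional[str]:
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        deployment_id, expiry = entry
        if self.clock() >= expiry:
            del self._entries[user_id]  # binding has expired
            return None
        return deployment_id

def route(table: StickyTable, user_id: str, default: Callable[[], str]) -> str:
    # Sticky hit: reuse the remembered deployment, else fall through to normal routing.
    return table.lookup(user_id) or default()

now = [0.0]  # fake clock so the example is deterministic
table = StickyTable(ttl_seconds=3600, clock=lambda: now[0])
table.record_success("user-42", "gpt-4o-primary")
print(route(table, "user-42", lambda: "gpt-4o-backup"))  # gpt-4o-primary
now[0] = 3601.0
print(route(table, "user-42", lambda: "gpt-4o-backup"))  # gpt-4o-backup
```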
```json
{
  "sticky_routing": {
    "enabled": true,
    "strategy": "sticky_hit",
    "persistence": {
      "duration": "1h",
      "key": "deployment_id"
    }
  }
}
```

Sticky Update
Updates sticky routes automatically when the deployment behind them changes.
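The document does not define `"update_policy": "graceful"`. One plausible reading, assumed here purely for illustration, is that a deployment change only migrates the users whose deployment went away, while every other binding (and its cache locality) is kept. The function and parameter names are hypothetical.

```python
from typing import Callable, Dict, Set

def apply_deployment_change(
    bindings: Dict[str, str],
    available: Set[str],
    pick_replacement: Callable[[str], str],
) -> Dict[str, str]:
    """Hypothetical 'graceful' update: rebind only users whose deployment vanished."""
    updated: Dict[str, str] = {}
    for user_id, deployment_id in bindings.items():
        if deployment_id in available:
            updated[user_id] = deployment_id              # unaffected: keep locality
        else:
            updated[user_id] = pick_replacement(user_id)  # affected: migrate
    return updated  # the input dict is left unchanged

bindings = {"alice": "gpt-4o-primary", "bob": "gpt-4o-backup"}
# Assume gpt-4o-backup was retired while gpt-4o-primary is still live.
new = apply_deployment_change(bindings, {"gpt-4o-primary", "claude-3-opus"},
                              lambda user: "claude-3-opus")
print(new)  # {'alice': 'gpt-4o-primary', 'bob': 'claude-3-opus'}
```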
```json
{
  "sticky_routing": {
    "enabled": true,
    "strategy": "sticky_update",
    "update_policy": "graceful"
  }
}
```

Fallback Chain
Defines an ordered fallback chain: when a deployment fails, the request automatically falls back to the next deployment in priority order.
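A client-side sketch of walking such a chain is shown below. It assumes that a lower `priority` number is tried first and that `timeout` is in milliseconds (the example configuration gives no unit). `send` stands in for a hypothetical upstream call that returns a reply or raises on error or timeout.

```python
from typing import Callable, List, Tuple

CHAIN = [
    {"deployment": "gpt-4o-primary", "priority": 1, "timeout": 5000},
    {"deployment": "gpt-4o-backup", "priority": 2, "timeout": 8000},
    {"deployment": "claude-3-opus", "priority": 3, "timeout": 10000},
]

# Hypothetical signature: send(deployment, request, timeout_seconds) -> reply
Sender = Callable[[str, str, float], str]

def call_with_fallback(chain: List[dict], send: Sender, request: str) -> Tuple[str, str]:
    failures = []
    for step in sorted(chain, key=lambda s: s["priority"]):  # lowest number first
        try:
            # Assumes "timeout" is in milliseconds.
            reply = send(step["deployment"], request, step["timeout"] / 1000)
            return step["deployment"], reply                  # success: return result
        except Exception as exc:                              # failure: try next step
            failures.append(f"{step['deployment']}: {exc}")
    raise RuntimeError("all deployments failed; " + "; ".join(failures))

def fake_send(deployment: str, request: str, timeout_s: float) -> str:
    # Simulates the primary timing out so the backup answers.
    if deployment == "gpt-4o-primary":
        raise TimeoutError(f"no reply within {timeout_s}s")
    return f"{deployment} answered: {request}"

print(call_with_fallback(CHAIN, fake_send, "hello"))
# ('gpt-4o-backup', 'gpt-4o-backup answered: hello')
```

If the last step also fails, the error lists every attempt, which corresponds to the final response at the end of the fallback flow.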
```json
{
  "fallback_chain": {
    "ordered": true,
    "chain": [
      {
        "deployment": "gpt-4o-primary",
        "priority": 1,
        "timeout": 5000
      },
      {
        "deployment": "gpt-4o-backup",
        "priority": 2,
        "timeout": 8000
      },
      {
        "deployment": "claude-3-opus",
        "priority": 3,
        "timeout": 10000
      }
    ]
  }
}
```

Fallback Flow
```text
Request ──→ Primary Deployment
              │
              ├─ Success ──→ Return Result
              │
              └─ Failure ──→ Fallback #1
                               │
                               ├─ Success ──→ Return Result
                               │
                               └─ Failure ──→ Fallback #2
                                                │
                                                └─ Final Response
```

Dedicated Line Strategy
Key customers can be assigned dedicated deployments that are strongly bound to them.
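Strong binding can be read as "this customer's traffic bypasses normal routing". The sketch below mirrors the `dedicated_deployment` fields from the example configuration; the in-memory `DEDICATED` table, the `resolve` function, and the `"shared"` channel label are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical in-memory copy of "dedicated_deployment" entries, keyed by customer_id.
DEDICATED: Dict[str, dict] = {
    "enterprise_customer_001": {
        "enabled": True,
        "deployment_id": "dedicated-gpt-4o-001",
        "dedicated_channel": "cn2-gia-premium",
    },
}

def resolve(customer_id: str, shared_deployment: str) -> Tuple[str, str]:
    """Return (deployment, channel) for a request from customer_id."""
    entry = DEDICATED.get(customer_id)
    if entry is not None and entry["enabled"]:
        # Strong binding: skip normal routing entirely.
        return entry["deployment_id"], entry["dedicated_channel"]
    # Everyone else uses the normally routed deployment on a shared channel
    # ("shared" is a placeholder label, not an ai.TokenHub value).
    return shared_deployment, "shared"

print(resolve("enterprise_customer_001", "gpt-4o-primary"))
# ('dedicated-gpt-4o-001', 'cn2-gia-premium')
print(resolve("other_customer", "gpt-4o-primary"))
# ('gpt-4o-primary', 'shared')
```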
Dedicated Deployment
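```json
{
  "dedicated_deployment": {
    "enabled": true,
    "deployment_id": "dedicated-gpt-4o-001",
    "customer_id": "enterprise_customer_001",
    "dedicated_channel": "cn2-gia-premium"
  }
}
```

Dedicated AK/SK Binding
Binds a dedicated AK/SK (access key/secret key) pair to a list of allowed deployments and an execution boundary (region and channel type). With `no_public_fallback` set to `true`, requests made with this key are never rerouted to public channels as a fallback.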
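One way such a binding could be enforced is to filter candidates to the execution boundary and, when nothing qualifies, fail closed instead of falling back to public capacity. This is a sketch under assumptions: candidate records with `model`, `region`, and `channel_type` keys are hypothetical, as is treating `allowed_deployments` entries as model names.

```python
from typing import Dict, List

BINDING = {
    "ak_sk": "dedicated_ak_sk_001",
    "allowed_deployments": ["gpt-4o", "claude-3-opus"],
    "no_public_fallback": True,
    "execution_boundary": {"region": "ap-east-1", "channel_type": "dedicated"},
}

def within_boundary(binding: dict, candidate: Dict[str, str]) -> bool:
    # Assumes allowed_deployments lists model names matched against candidate["model"].
    boundary = binding["execution_boundary"]
    return (candidate["model"] in binding["allowed_deployments"]
            and candidate["region"] == boundary["region"]
            and candidate["channel_type"] == boundary["channel_type"])

def select(binding: dict, candidates: List[Dict[str, str]],
           public_pool: List[Dict[str, str]]) -> List[Dict[str, str]]:
    eligible = [c for c in candidates if within_boundary(binding, c)]
    if eligible:
        return eligible
    if binding["no_public_fallback"]:
        # Fail closed rather than leave the dedicated boundary.
        raise PermissionError("no eligible dedicated deployment; public fallback disabled")
    return public_pool

# Hypothetical candidate deployments.
candidates = [
    {"id": "ded-gpt4o-ape1", "model": "gpt-4o", "region": "ap-east-1", "channel_type": "dedicated"},
    {"id": "pub-gpt4o-ape1", "model": "gpt-4o", "region": "ap-east-1", "channel_type": "public"},
    {"id": "ded-gpt4o-usw1", "model": "gpt-4o", "region": "us-west-1", "channel_type": "dedicated"},
]
print([c["id"] for c in select(BINDING, candidates, public_pool=[])])  # ['ded-gpt4o-ape1']
```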
```json
{
  "dedicated_binding": {
    "ak_sk": "dedicated_ak_sk_001",
    "allowed_deployments": ["gpt-4o", "claude-3-opus"],
    "no_public_fallback": true,
    "execution_boundary": {
      "region": "ap-east-1",
      "channel_type": "dedicated"
    }
  }
}
```

Configuration Recommendations
| Scenario | Recommended configuration |
|---|---|
| High availability | Fallback chain + multiple channels |
| Low latency | Sticky routing + geographic (nearest-node) routing |
| Cost optimization | Fallback chain ordered by price |
| Enterprise | Dedicated channel + no public fallback |
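For the cost-optimization row, a price-sorted chain can be generated in the same shape as the `fallback_chain` example. This is a hedged sketch: the relative costs in `COST` are placeholder numbers rather than real prices, and the `price_sorted_chain` helper is hypothetical.

```python
import json
from typing import Dict

# Placeholder relative costs (lower = cheaper); not real prices.
COST = {"claude-3-opus": 3.0, "gpt-4o-primary": 1.0, "gpt-4o-backup": 1.1}
TIMEOUT_MS = {"claude-3-opus": 10000, "gpt-4o-primary": 5000, "gpt-4o-backup": 8000}

def price_sorted_chain(cost: Dict[str, float], timeout_ms: Dict[str, int]) -> dict:
    # Cheapest first; ties broken by name so the output is deterministic.
    ordered = sorted(cost, key=lambda name: (cost[name], name))
    return {
        "fallback_chain": {
            "ordered": True,
            "chain": [
                {"deployment": name, "priority": i, "timeout": timeout_ms[name]}
                for i, name in enumerate(ordered, start=1)
            ],
        }
    }

print(json.dumps(price_sorted_chain(COST, TIMEOUT_MS), indent=2))
```

The trade-off is that the cheapest deployment absorbs most traffic, so it should also meet your latency and availability requirements.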