Running Inference Server
This guide shows you how to set up and deploy inference servers that can process requests on your private data in the LazAI ecosystem. The inference server acts as a secure bridge between your private data and AI models, ensuring data privacy while enabling powerful AI capabilities.
Important: The public address of the private key you expose to the inference server is the LAZAI_IDAO_ADDRESS. Once the inference server is running, its URL must be registered using the add_inference_node function in Alith. This can only be done by LazAI admins.
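If you are unsure which address this is, you can derive it from your private key locally. A minimal sketch, assuming the eth-account package (not installed by the steps below):

from eth_account import Account

# Derive the public address (your LAZAI_IDAO_ADDRESS) from the wallet private key.
account = Account.from_key("<your wallet private key>")
print(account.address)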
Prerequisites
Before setting up your inference server, ensure you have:
- Wallet Setup: A Web3 wallet with a private key for authentication
- API Keys: An API key for your chosen model provider (OpenAI, DeepSeek, etc.)
- Network Access: Ability to expose your server to the internet (for production)
Environment Setup
Best Practice: Use a Python Virtual Environment
To avoid dependency conflicts and keep your environment clean, create and activate a Python virtual environment before installing any packages:
python3 -m venv venv
source venv/bin/activate
Install Alith
python3 -m pip install alith -U
Install Dependencies
pip install openai llama-cpp-python pymilvus "pymilvus[model]"
Set Environment Variables
For OpenAI/ChatGPT API:
export PRIVATE_KEY=<your wallet private key>
export OPENAI_API_KEY=<your openai api key>
export RSA_PRIVATE_KEY_BASE64=<your rsa private key base64>
For other OpenAI-compatible APIs (DeepSeek, Gemini, etc.):
export PRIVATE_KEY=<your wallet private key>
export LLM_API_KEY=<your api key>
export LLM_BASE_URL=<your api base url>
export RSA_PRIVATE_KEY_BASE64=<your rsa private key base64>
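Before starting the server, it can help to confirm the variables are actually set. A small sanity-check sketch (the variable names match the exports above; the grouping logic assumes you chose one of the two setups):

import os

# PRIVATE_KEY and RSA_PRIVATE_KEY_BASE64 are required in both setups.
missing = [name for name in ("PRIVATE_KEY", "RSA_PRIVATE_KEY_BASE64") if not os.getenv(name)]
# Either an OpenAI key, or a generic LLM key plus base URL, must be present.
if not os.getenv("OPENAI_API_KEY") and not (os.getenv("LLM_API_KEY") and os.getenv("LLM_BASE_URL")):
    missing.append("OPENAI_API_KEY, or LLM_API_KEY + LLM_BASE_URL")
if missing:
    raise SystemExit("Missing environment variables: " + ", ".join(missing))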
Server Deployment Options
Local Development
Perfect for testing and development. Your inference server runs on your local machine.
Python Implementation
For OpenAI/ChatGPT API:
from alith.inference import run
server = run(model="gpt-3.5-turbo", settlement=True, engine_type="openai")
For other OpenAI-compatible APIs (DeepSeek, Gemini, etc.):
from alith.inference import run
# Example: Using DeepSeek model from OpenRouter
server = run(settlement=True, engine_type="openai", model="deepseek/deepseek-r1-0528")
Testing Your Local Server
Once your server is running, you can test it using curl:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-LazAI-User: 0xc3e98E8A9aACFc9ff7578C2F3BA48CA4477Ecf49" \
-H "X-LazAI-Nonce: 123456" \
-H "X-LazAI-Signature: HSDGYUSDOWP123" \
-H "X-LazAI-Token-ID: 1" \
-d '{
"model": "deepseek/deepseek-r1-0528",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "What is the capital of France?"}
],
"temperature": 0.7,
"max_tokens": 100
}'
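The same request can be sent from Python using the requests library (the headers and body mirror the curl example above, including its placeholder values):

import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={
        "X-LazAI-User": "0xc3e98E8A9aACFc9ff7578C2F3BA48CA4477Ecf49",
        "X-LazAI-Nonce": "123456",
        "X-LazAI-Signature": "HSDGYUSDOWP123",
        "X-LazAI-Token-ID": "1",
    },
    json={
        "model": "deepseek/deepseek-r1-0528",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "temperature": 0.7,
        "max_tokens": 100,
    },
)
print(response.json())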
Production Deployment on Phala TEE Cloud
For production-ready applications, deploy your inference server on Phala TEE Cloud for enhanced security and privacy. This provides:
- Trusted Execution Environment (TEE): Hardware-level security isolation
- Privacy-Preserving Computation: Your data and models remain encrypted during processing
- Scalability: Cloud infrastructure for handling production workloads
- Reliability: High availability and fault tolerance
Deployment Steps
Follow the deployment guide in our inference server repository for detailed instructions on deploying your server to Phala TEE Cloud.
Register Your Server URL:
Once deployed, you will receive an inference URL that must be registered by LazAI admins using the add_inference_node function.
Security Benefits
- Hardware Security: TEE provides hardware-level isolation
- Encrypted Processing: Data remains encrypted during computation
- Verifiable Execution: Proof of correct execution without revealing inputs
- Audit Trail: Complete transparency of computation steps
Server Configuration Options
Model Selection
You can configure your inference server to use various models:
- OpenAI Models: gpt-3.5-turbo, gpt-4, gpt-4-turbo
- OpenAI-Compatible Models: DeepSeek, Anthropic Claude, Google Gemini
- Local Models: Llama, Mistral, and other open-source models
Settlement Configuration
The settlement=True parameter enables:
- Cryptographic Settlement: Secure payment processing
- Access Control: Verification of user permissions
- Audit Trail: Complete transaction logging
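For quick local experiments you may want to skip these checks. A sketch, assuming settlement=False simply disables them (an assumption; verify against your Alith version):

from alith.inference import run

# Settlement disabled: no payment processing or permission checks (assumed behavior).
server = run(model="gpt-3.5-turbo", engine_type="openai", settlement=False)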
Engine Types
- openai: For OpenAI and OpenAI-compatible APIs
- local: For locally hosted models
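A sketch of starting a locally hosted model, assuming engine_type="local" accepts a path to a GGUF file served through llama-cpp-python (installed earlier); the exact parameter conventions may differ in your Alith version:

from alith.inference import run

# Hypothetical local model path; adjust to where your GGUF weights live.
server = run(model="/path/to/model.gguf", settlement=True, engine_type="local")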
Server Registration
After your inference server is running and accessible, it must be registered with the LazAI network:
- Contact LazAI Admins: Reach out to the LazAI team with your server URL
- Provide Node Address: Share your LAZAI_IDAO_ADDRESS (your wallet's public address)
- Verification: Admins will verify your server’s security and compliance
- Registration: Your server will be added to the network using add_inference_node
Security Best Practices
- Private Key Management: Store private keys securely, never in code
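For example, read the key from the environment at startup rather than embedding it in source:

import os

# Fails fast with a KeyError if PRIVATE_KEY is unset; the key never appears in code.
private_key = os.environ["PRIVATE_KEY"]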
Troubleshooting
Common Issues
- Server Won’t Start:
  - Check environment variables are set correctly
  - Verify API keys are valid
  - Ensure port 8000 is available
- Authentication Errors:
  - Verify private key format
  - Check wallet has sufficient funds
  - Ensure proper settlement headers
- Model Loading Issues:
  - Verify model name is correct
  - Check API quota and limits
  - Ensure network connectivity
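If the server won’t start, a quick way to check whether port 8000 is already taken (a generic Python check, not Alith-specific):

import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    # connect_ex returns 0 if something is already listening on the port.
    in_use = sock.connect_ex(("127.0.0.1", 8000)) == 0
print("Port 8000 is", "in use" if in_use else "free")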
See Also
- LazAI API — How to make inference queries to your server
- Data Provider — How to contribute private data
- Building on LazAI — Advanced integration guides