NVIDIA L40S GPU Cloud 48GB GDDR6
An inference-optimized GPU: Ada Lovelace architecture with 733 TFLOPS of FP8 compute, suited to production AI at lower cost than H100 or A100.
L40S Technical Specs
Ada Lovelace architecture optimized for inference
Compute
- CUDA Cores: 18,176
- Tensor Cores: 568
- FP32 Performance: 91.6 TFLOPS
- FP16 Tensor Performance: 362 TFLOPS
- FP8 Performance: 733 TFLOPS
Memory
- Memory Size: 48GB GDDR6
- Memory Bandwidth: 864 GB/s
- Memory Type: GDDR6 with ECC
- Bus Width: 384-bit
Features
- Video Encoders: 3x NVENC
- TDP: 350W
- Architecture: Ada Lovelace
- PCIe: Gen4 x16
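The memory figures above allow a rough back-of-envelope check on single-stream generation speed. The sketch below assumes a common rule of thumb that is not stated on this page: at batch size 1, each generated token reads every model weight once, so tokens per second cannot exceed memory bandwidth divided by weight size. The function names are illustrative, and real throughput will be lower once KV-cache traffic and kernel overhead are counted.

```python
# Rough upper bound on single-stream decode speed for a memory-bound LLM.
# Assumption: each generated token reads every weight once, so
#   tokens/s <= memory bandwidth / weight bytes.
# KV-cache traffic, kernel overhead, and batching are ignored.

L40S_BANDWIDTH_GB_S = 864.0  # memory bandwidth from the spec list


def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def max_decode_tokens_per_s(params_billion: float, bytes_per_param: float) -> float:
    """Bandwidth-bound ceiling on tokens/s for one sequence."""
    return L40S_BANDWIDTH_GB_S / weight_gb(params_billion, bytes_per_param)


if __name__ == "__main__":
    for name, params in [("7B", 7.0), ("13B", 13.0)]:
        for fmt, nbytes in [("FP16", 2.0), ("FP8", 1.0)]:
            bound = max_decode_tokens_per_s(params, nbytes)
            print(f"{name} {fmt}: {weight_gb(params, nbytes):.0f} GB weights, "
                  f"at most ~{bound:.0f} tokens/s per stream")
```

Batching several requests together amortizes the weight reads, which is why aggregate throughput on a serving stack can be far higher than this per-stream ceiling.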
L40S Pricing
Production-ready inference at competitive rates
On-Demand
Pay by the hour. Spin up and down as needed.
- No minimum commitment
- Billed per minute
- Start/stop anytime
Reserved (Monthly)
Committed monthly capacity with guaranteed availability.
- ~17% savings vs on-demand
- Guaranteed availability
- Priority support
3-Month Reserved
Longer commitment, bigger savings.
- ~31% savings vs on-demand
- Locked-in capacity
- Dedicated support
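To compare the three plans for a given workload, the sketch below turns the approximate discounts into effective hourly rates. It uses the ₹150/hr on-demand figure quoted in the FAQ on this page, and it assumes (this is not stated here) that a reserved plan is billed for every hour of an average 730-hour month. Plan keys and function names are illustrative.

```python
# Effective hourly rates and break-even usage for the three plans.
# Assumptions: Rs 150/hr on-demand (from the FAQ), discounts of ~17% and
# ~31%, and reserved capacity billed for all 730 hours of an average month.

ON_DEMAND_INR_PER_HOUR = 150.0
DISCOUNTS = {
    "on_demand": 0.0,
    "reserved_monthly": 0.17,
    "reserved_3_month": 0.31,
}
HOURS_PER_MONTH = 730.0


def hourly_rate(plan: str) -> float:
    """Effective hourly rate in INR after the plan's discount."""
    return ON_DEMAND_INR_PER_HOUR * (1.0 - DISCOUNTS[plan])


def on_demand_cost(minutes: int) -> float:
    """On-demand cost in INR with per-minute billing."""
    return ON_DEMAND_INR_PER_HOUR * minutes / 60.0


def breakeven_hours(plan: str) -> float:
    """Monthly usage above which the reserved plan beats on-demand."""
    return HOURS_PER_MONTH * (1.0 - DISCOUNTS[plan])


if __name__ == "__main__":
    for plan in DISCOUNTS:
        print(f"{plan}: ~Rs {hourly_rate(plan):.2f}/hr, "
              f"break-even at ~{breakeven_hours(plan):.0f} hrs/month")
    print(f"90 minutes on-demand: Rs {on_demand_cost(90):.2f}")
```

Under these assumptions, on-demand remains cheaper for workloads that run only part of the month, while reserved plans pay off for endpoints that stay up most of the time.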
What Can You Build with L40S?
Optimized for inference and graphics workloads
Production Inference
Cost-effective inference for 7B-13B models. Run Mistral 7B, LLaMA 13B, and similar models efficiently.
Video & Graphics
3x NVENC encoders for video transcoding. Ideal for video AI, rendering, and streaming.
Real-Time AI
Low-latency inference for interactive applications. Great for chatbots and real-time vision.
Development & Testing
Ada Lovelace performance for development and testing, at lower cost than H100 or A100.
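For the production inference case, a useful sizing question is how much of the 48GB remains for KV cache once the weights are loaded. The sketch below is an estimate under stated assumptions: the layer and head counts are the publicly documented configurations of Mistral 7B and Llama 2 13B, the parameter counts are rounded, weights and cache are held in FP16, and 10% of memory is reserved for activations and runtime overhead. All names are illustrative.

```python
# Estimate how many tokens of FP16 KV cache fit next to the weights in 48GB.
# Model shapes are assumptions taken from the public model configurations;
# the 10% reserve for activations and runtime overhead is a guess.
from dataclasses import dataclass

GPU_MEMORY_BYTES = 48e9  # 48GB treated as 48 * 10^9 bytes


@dataclass
class ModelConfig:
    name: str
    params: float   # approximate parameter count
    layers: int
    kv_heads: int   # key/value heads (fewer than query heads under GQA)
    head_dim: int


MISTRAL_7B = ModelConfig("Mistral 7B", 7.2e9, layers=32, kv_heads=8, head_dim=128)
LLAMA_13B = ModelConfig("LLaMA 13B", 13.0e9, layers=40, kv_heads=40, head_dim=128)


def kv_bytes_per_token(m: ModelConfig, bytes_per_value: int = 2) -> int:
    """Bytes of keys plus values stored per token across all layers."""
    return 2 * m.layers * m.kv_heads * m.head_dim * bytes_per_value


def kv_token_budget(m: ModelConfig, weight_bytes_per_param: float = 2.0,
                    reserve_fraction: float = 0.10) -> int:
    """Total cached tokens (across all concurrent requests) that fit."""
    weights = m.params * weight_bytes_per_param
    usable = GPU_MEMORY_BYTES * (1.0 - reserve_fraction) - weights
    return max(0, int(usable // kv_bytes_per_token(m)))


if __name__ == "__main__":
    for model in (MISTRAL_7B, LLAMA_13B):
        print(f"{model.name}: {kv_bytes_per_token(model) / 1024:.0f} KiB per token, "
              f"~{kv_token_budget(model):,} cached tokens")
```

The token budget is shared by all concurrent requests, so dividing it by the expected context length gives a rough ceiling on batch size.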
L40S vs Other GPUs
When to choose L40S
L40S vs A100
L40S is 30% cheaper with similar inference speed. A100 is better for training.
L40S vs H100
L40S costs about 40% less than H100. A good fit for inference where H100's training power isn't needed.
L40S vs L4
L40S has 2x the memory (48GB vs 24GB) and is faster for larger models.
L40S Questions
Is L40S good for AI inference?
Excellent. L40S is specifically designed for inference workloads with high FP8 performance (733 TFLOPS). It runs 7B-13B models efficiently at lower cost than H100/A100.
Can I train models on L40S?
Yes, for smaller models. L40S is optimized for inference but can train or fine-tune models up to about 7B parameters, typically with memory-saving methods such as LoRA. For larger training jobs, consider A100 or H100.
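Whether a given model trains on a single 48GB card depends heavily on the training method. The sketch below uses widely quoted rule-of-thumb byte counts, which are assumptions rather than figures from this page: about 16 bytes per parameter for full mixed-precision training with Adam, versus 2 bytes per frozen parameter plus optimizer state for a small trainable adapter (LoRA-style, here assumed to be 1% of the weights). Activation memory is not counted, so actual usage will be higher.

```python
# Rule-of-thumb training memory estimate, excluding activations.
# Assumed byte counts per trainable parameter with mixed-precision Adam:
#   FP16 weights (2) + FP16 gradients (2) + FP32 master weights (4)
#   + two FP32 Adam moments (4 + 4) = 16 bytes.

GPU_MEMORY_GB = 48.0
BYTES_PER_TRAINABLE_PARAM = 16.0
BYTES_PER_FROZEN_PARAM = 2.0  # FP16 weights only


def full_finetune_gb(params_billion: float) -> float:
    """Memory in GB when every parameter is trained."""
    return params_billion * BYTES_PER_TRAINABLE_PARAM


def adapter_finetune_gb(params_billion: float,
                        trainable_fraction: float = 0.01) -> float:
    """Memory in GB with frozen base weights and a small trainable adapter."""
    frozen = params_billion * BYTES_PER_FROZEN_PARAM
    adapter = params_billion * trainable_fraction * BYTES_PER_TRAINABLE_PARAM
    return frozen + adapter


if __name__ == "__main__":
    for size in (1.0, 3.0, 7.0):
        full = full_finetune_gb(size)
        adapter = adapter_finetune_gb(size)
        print(f"{size:.0f}B: full ~{full:.0f} GB "
              f"({'fits' if full < GPU_MEMORY_GB else 'does not fit'}), "
              f"adapter ~{adapter:.1f} GB "
              f"({'fits' if adapter < GPU_MEMORY_GB else 'does not fit'})")
```

Under these assumptions, a 7B model fits comfortably with adapter-based fine-tuning, while full-parameter training of 7B would need memory-reduction techniques or multiple GPUs.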
What makes L40S different from L4?
L40S has 48GB of memory (vs 24GB), roughly 2.4x as many CUDA cores (18,176 vs 7,424), and significantly higher performance. Choose L40S for larger models or higher throughput needs.
Is L40S good for video processing?
Yes. L40S includes 3x NVENC encoders, making it excellent for video transcoding, streaming, and video AI applications.
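As an illustration of a hardware-accelerated transcode, the sketch below builds an FFmpeg command that decodes on the GPU and encodes H.264 through NVENC. It assumes an FFmpeg build with NVIDIA hardware acceleration enabled. The file names, bitrate, and helper function names are placeholders, not part of any service API.

```python
# Build (and optionally run) an FFmpeg command that uses NVENC.
# Assumes ffmpeg is installed and was built with NVIDIA hardware support.
import subprocess
from typing import List


def build_nvenc_command(src: str, dst: str, bitrate: str = "5M") -> List[str]:
    """Return an ffmpeg argument list for a GPU-side H.264 transcode."""
    return [
        "ffmpeg", "-y",
        "-hwaccel", "cuda",                # decode on the GPU
        "-hwaccel_output_format", "cuda",  # keep decoded frames in GPU memory
        "-i", src,
        "-c:v", "h264_nvenc",              # NVENC H.264 encoder
        "-preset", "p5",                   # quality/speed preset (p1 fastest, p7 slowest)
        "-b:v", bitrate,
        "-c:a", "copy",                    # pass audio through unchanged
        dst,
    ]


def transcode(src: str, dst: str) -> None:
    """Run the transcode; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_nvenc_command(src, dst), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, since input.mp4 is a placeholder.
    print(" ".join(build_nvenc_command("input.mp4", "output.mp4")))
```

Several such processes can run concurrently, which is one way to spread jobs across the card's encoder engines.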
How does L40S pricing compare?
L40S offers the best price-performance for inference. At ₹150/hr, it's 40% cheaper than H100 while delivering comparable inference throughput for many models.
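Price-performance is easiest to compare as cost per million generated tokens. The sketch below uses the ₹150/hr figure above and the H100 rate it implies (₹250/hr if L40S is 40% cheaper). The throughput numbers are hypothetical placeholders to be replaced with measurements from your own model and serving stack.

```python
# Cost per million tokens from an hourly rate and a measured throughput.
# The H100 rate is derived from "40% cheaper"; throughputs are hypothetical.

L40S_INR_PER_HOUR = 150.0
H100_INR_PER_HOUR = L40S_INR_PER_HOUR / (1.0 - 0.40)  # implied, about Rs 250/hr


def cost_per_million_tokens(inr_per_hour: float, tokens_per_s: float) -> float:
    """INR spent to generate one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_s * 3600.0
    return inr_per_hour / tokens_per_hour * 1_000_000


def breakeven_speedup() -> float:
    """How much faster H100 must be to match L40S cost per token."""
    return H100_INR_PER_HOUR / L40S_INR_PER_HOUR


if __name__ == "__main__":
    hypothetical_l40s_tps = 1000.0  # placeholder aggregate throughput
    print(f"L40S: Rs {cost_per_million_tokens(L40S_INR_PER_HOUR, hypothetical_l40s_tps):.2f} "
          f"per million tokens at {hypothetical_l40s_tps:.0f} tokens/s")
    print(f"H100 needs ~{breakeven_speedup():.2f}x that throughput to cost the same per token")
```

If your measured H100 throughput for a model is less than about 1.67x the L40S figure, L40S is the cheaper option per token at these rates.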
Deploy Production Inference on L40S
Cost-effective AI inference with Ada Lovelace architecture.