NVIDIA H200 · Hopper

Rent NVIDIA H200 in India — ₹2,50,000/month ($2,799)

141GB HBM3e — for models that outgrow 80GB. Llama 3.1 70B at FP16 fits on a single card.

Per node, per month · Mumbai location · ₹2,50,000 ($2,799)/mo · 1-month minimum · ≈ ₹342 ($3.83)/hr effective · managed ops +₹15,000 ($179)/mo

Get an H200 quote or talk to a GPU engineer

H200 pricing

An NVIDIA H200 141GB node rents for ₹2,50,000 ($2,799) per month in Mumbai as of July 2026, about ₹342 per hour effective, on a 1-month minimum. Managed ops adds ₹15,000 ($179) per node per month. Multi-GPU NVLink configurations are quoted on scoping.

Config	Total VRAM	Per Node / Month	≈ Effective / hr
1× H200 141GB	141GB	₹2,50,000 ($2,799)	≈ ₹342/hr ($3.83/hr)
2× H200 NVLink	282GB	On request	—
4× H200 NVLink	564GB	On request	—
8× H200 NVLink	1,128GB	On request	—

* Prices checked July 2026. Monthly commitment, 1-month minimum; no hourly product. ≈ /hr = monthly ÷ 730, for comparison only. INR prices attract 18% GST, claimable as input tax credit. Managed ops add-on: ₹15,000 ($179) per node/month. Node CPU, RAM, and NVMe sized at scoping; multi-GPU and NVLink pricing confirmed at scoping.
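The hourly figure and GST in the note above are simple arithmetic. The Python sketch below reproduces them for one node. It assumes managed ops is invoiced in INR alongside the node and attracts the same 18% GST, which the note does not state explicitly. `effective_hourly` and `monthly_invoice` are illustrative helper names, not part of any billing API.

```python
HOURS_PER_MONTH = 730   # the page's convention: ≈ /hr = monthly ÷ 730
GST_RATE = 0.18         # 18% GST on INR invoices

def effective_hourly(monthly: float) -> float:
    """Comparison-only hourly rate; there is no hourly product."""
    return monthly / HOURS_PER_MONTH

def monthly_invoice(node_inr: int, managed_ops_inr: int = 0) -> dict:
    """Pre-tax subtotal, GST and gross total for one month (INR).
    Assumption: managed ops attracts the same 18% GST as the node."""
    subtotal = node_inr + managed_ops_inr
    gst = round(subtotal * GST_RATE)
    return {"subtotal": subtotal, "gst": gst, "total": subtotal + gst}

print(round(effective_hourly(250_000)))        # 342 (INR/hr)
print(round(effective_hourly(2_799), 2))       # 3.83 (USD/hr)
print(monthly_invoice(250_000, managed_ops_inr=15_000))
# {'subtotal': 265000, 'gst': 47700, 'total': 312700}
```

For a GST-registered business that claims the input tax credit, the net cost stays at the pre-tax subtotal; the gross total matters mainly for cash flow.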

Will your model fit on one H200?

Weight sizes are at the stated precision; the KV cache needs headroom on top. If you are unsure, a benchmark run settles it.

Model	Params	Precision	Fits on 1× H200?	Notes
Llama 3.1 70B	70B	FP8	Yes	~70GB weights; big KV headroom
Llama 3.1 70B	70B	FP16	Tight	~140GB weights; use FP8 KV cache or 2× H200
Qwen2.5 72B	72B	INT8	Yes	~72GB weights
Mixtral 8x22B	141B	INT4	Yes	~70GB weights quantized
Mixtral 8x22B	141B	FP16	No	~282GB weights; 2–4× H200
Llama 3.1 405B	405B	FP8	No	~405GB weights; 8× H200 cluster
DeepSeek V3	671B	FP8	No	~671GB weights; one 8× H200 node (1,128GB); talk to us
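To check a model that is not in the table, a rough estimate is weights (parameters × bytes per parameter) plus KV cache. The sketch below assumes the standard KV formula of 2 (K and V) × layers × KV heads × head dimension × bytes per element × cached tokens, and uses Llama 3.1 70B's published attention shape (80 layers, 8 KV heads, head dimension 128). It ignores activations, CUDA context, and serving-framework overhead, so a pass here is necessary but not sufficient. `fits_one_h200` is a hypothetical helper.

```python
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT8": 1.0, "INT4": 0.5}
H200_GB = 141.0

def weights_gb(params_billion: float, precision: str) -> float:
    """Weight memory in GB: parameters x bytes per parameter."""
    return params_billion * BYTES_PER_PARAM[precision]

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                tokens: int, kv_bytes: float = 2.0) -> float:
    """KV cache in GB (1e9 bytes) for `tokens` cached tokens across all sequences."""
    return 2 * layers * kv_heads * head_dim * kv_bytes * tokens / 1e9

def fits_one_h200(params_billion: float, precision: str, kv_gb: float) -> bool:
    """Hypothetical check: weights + KV cache within the card's 141GB."""
    return weights_gb(params_billion, precision) + kv_gb <= H200_GB

# Llama 3.1 70B: 80 layers, 8 KV heads (grouped-query attention), head_dim 128.
kv_128k = kv_cache_gb(80, 8, 128, tokens=128 * 1024)  # one 128K sequence, FP16 KV
print(f"KV cache: {kv_128k:.1f} GB")                   # KV cache: 42.9 GB
print(fits_one_h200(70, "FP8", kv_128k))               # True
print(fits_one_h200(70, "FP16", kv_128k))              # False
```

For one 128K-token sequence with an FP16 KV cache, Llama 3.1 70B needs roughly 43GB of cache, which is why FP8 weights leave comfortable room on one H200 while FP16 weights do not.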

H200 — or something else?

Who needs an H200?

Teams serving models whose weights plus KV cache exceed the H100’s 80GB: Llama 3.1 70B at FP16, Mixtral 8x22B quantized, and long-context (128K+) inference where KV cache dominates memory. The 141GB of HBM3e removes quantization as a forced choice at 70B scale.

H200 vs H100

Same Hopper compute, 76% more memory (141GB vs 80GB) and 43% more bandwidth (4.8 vs 3.35 TB/s). At our Mumbai location the H200 costs ₹2,50,000 per month vs ₹1,80,000 for the H100. Pay the ₹70,000 premium when your model does not fit in 80GB, or when inference is memory-bandwidth-bound and the extra 1.45 TB/s converts into more tokens per second.
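The premium argument can be made concrete with an idealised model of memory-bound decoding: at batch size 1, each generated token streams the full weights from HBM once, so tokens per second per sequence cannot exceed bandwidth ÷ weight bytes. The sketch assumes throughput scales linearly with bandwidth and uses this page's monthly prices; real batched serving lands below these ceilings. The function names are illustrative.

```python
CARDS = {  # monthly INR price from this page, HBM bandwidth in TB/s
    "H100": {"price_inr": 180_000, "tbps": 3.35},
    "H200": {"price_inr": 250_000, "tbps": 4.8},
}

def decode_ceiling_tok_s(tbps: float, weight_gb: float) -> float:
    """Batch-1 upper bound: HBM bytes per second / weight bytes read per token."""
    return tbps * 1e12 / (weight_gb * 1e9)

def relative_cost_per_token(a: str, b: str) -> float:
    """Cost per token of card a relative to card b, assuming tokens/s tracks bandwidth."""
    price_ratio = CARDS[a]["price_inr"] / CARDS[b]["price_inr"]
    speed_ratio = CARDS[a]["tbps"] / CARDS[b]["tbps"]
    return price_ratio / speed_ratio

WEIGHT_GB = 70  # Llama 3.1 70B at FP8, per the fit table
for name, card in CARDS.items():
    # prints H100 48 and H200 69 (tok/s ceilings)
    print(name, round(decode_ceiling_tok_s(card["tbps"], WEIGHT_GB)), "tok/s ceiling")
print(f"H200 cost per token vs H100: {relative_cost_per_token('H200', 'H100'):.2f}")  # 0.97
```

Under that assumption the H200 costs about 3% less per token than the H100 despite the higher monthly price. Compute-bound work gains nothing from the extra bandwidth, because both cards share the same Hopper compute.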

H200 vs B200

The B200 is one generation newer (Blackwell): 192GB vs 141GB (36% more memory), 8 vs 4.8 TB/s (67% more bandwidth), and FP4 support. At our Mumbai location it costs ₹3,95,000 vs ₹2,50,000 per month. Choose the B200 for frontier-scale training or for models that outgrow 141GB per GPU; for serving 70B–141B-class models that already fit, the H200 keeps the monthly bill ₹1,45,000 lower.

Want the bare-metal box instead?

If you want the full dedicated machine with root access rather than a managed deployment, see our GPU dedicated server rental page: same Mumbai hardware, same monthly price book, full control of the OS and stack.

NVIDIA H200 141GB — chip reference

Architecture	Hopper
Form factor	SXM5
VRAM	141GB HBM3e
Memory bandwidth	4.8 TB/s
CUDA cores	16,896
Tensor cores	528 (4th gen)
FP32	67 TFLOPS
FP16 Tensor	1,979 TFLOPS (with sparsity)
FP8 Tensor	3,958 TFLOPS (with sparsity)
Interconnect	NVLink 4.0, 900 GB/s
PCIe	Gen5 x16
MIG	Up to 7 instances
TDP	700W
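A roofline view explains why decoding tends to be bandwidth-bound. The ridge point, peak FLOPs divided by memory bandwidth, is the arithmetic intensity above which a kernel becomes compute-bound. NVIDIA's 1,979 and 3,958 TFLOPS tensor figures include structured sparsity, so the sketch halves them to get dense throughput. The `gemv_intensity` helper is a simplification that assumes 2 FLOPs (multiply and add) per weight per sequence, weights read once per decode step, and KV-cache traffic ignored.

```python
HBM_TBPS = 4.8                      # HBM3e bandwidth, TB/s
FP16_DENSE_TFLOPS = 1979 / 2        # 989.5 dense FP16 tensor TFLOPS
FP8_DENSE_TFLOPS = 3958 / 2         # 1979 dense FP8 tensor TFLOPS

def ridge_point(tflops: float, tbps: float) -> float:
    """FLOPs per HBM byte at which the compute and bandwidth limits meet."""
    return tflops / tbps            # the 1e12 factors cancel

def gemv_intensity(batch: int, bytes_per_weight: float) -> float:
    """Decode-step FLOPs per weight byte: 2 FLOPs per weight per sequence,
    weights read once per step (KV-cache reads ignored)."""
    return 2 * batch / bytes_per_weight

print(round(ridge_point(FP16_DENSE_TFLOPS, HBM_TBPS)))  # 206
print(round(ridge_point(FP8_DENSE_TFLOPS, HBM_TBPS)))   # 412
print(gemv_intensity(batch=1, bytes_per_weight=2))      # 1.0
```

At FP16 a batch-1 decode step does about 1 FLOP per byte, roughly 200× below the ridge point, so more bandwidth rather than more FLOPs is what speeds it up.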

H200 hosting questions

How much does an H200 cost per month in India?

₹2,50,000 ($2,799) per node per month at our Mumbai location, about ₹342/hr effective, as of July 2026. Monthly commitment, 1-month minimum; no hourly product. Managed ops adds ₹15,000 ($179) per node per month. 18% GST applies on INR invoices and is claimable as input tax credit.

H200 vs H100: which should I choose?

Choose the H200 when weights plus KV cache exceed 80GB: Llama 3.1 70B FP16, Mixtral 8x22B, or 128K-context serving. Its 141GB and 4.8 TB/s also lift memory-bound inference throughput. If your model fits in 80GB, the H100 at ₹1,80,000 per month saves ₹70,000 monthly for the same compute.

What models need an H200?

Llama 3.1 70B at FP16 (about 140GB of weights), Qwen2.5 72B, Mixtral 8x22B quantized, and any workload with very long context where KV cache dominates memory. Falcon 180B and Llama 3.1 405B need multi-GPU H200 or B200 configurations, which we quote on scoping.

How long does H200 provisioning take?

A single H200 node is ready in 5–7 business days. Multi-GPU NVLink configurations are confirmed during scoping, typically 5–7 business days as well. We share the exact lead time before you commit and keep you updated through provisioning.

Does H200 hosting satisfy DPDP data residency?

Yes. The node runs at our Mumbai location under Indian jurisdiction. Inference payloads and responses stay on your GPU server; we collect only infrastructure metrics. We sign a Data Processing Agreement confirming no data is used for training or leaves India.

Is there an H200 trial before the monthly commitment?

For qualified teams, yes — we provision an H200 benchmark node so you can load your model, measure tokens per second at your context length, and validate memory fit before the 1-month commitment.

Ready to run on an H200?

Tell us your workload — an engineer replies with a firm monthly quote in one business day. Qualified teams can benchmark on the exact hardware before committing.

Get an H200 quote or request a benchmark node

Other classes: H100 ₹1.8L ($2,099) · B200 ₹3.95L ($4,499) · A100 80GB ₹97K ($1,099) · RTX PRO 6000 ₹1.1L ($1,249) · L40S ₹55K ($599) · L4 on request
