infrastructure

Latency

The time between sending a prompt and receiving the first token of the response, often called time to first token (TTFT). Lower latency means the response starts sooner. Affected by model size, server load, and network distance.
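As a rough illustration of the definition above, the sketch below times how long a streaming response takes to produce its first token. The names `time_to_first_token` and `fake_stream` are hypothetical, and the stream is simulated with a sleep instead of a real model call, so treat it as a measurement pattern under those assumptions, not any specific provider's API.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_token(start_stream: Callable[[], Iterable[str]]) -> Tuple[float, str]:
    """Return (seconds until the first token arrives, that token).

    `start_stream` is a hypothetical callable that sends the prompt and
    returns an iterable of tokens; substitute a real streaming client here.
    """
    start = time.perf_counter()        # monotonic, high-resolution clock
    stream = iter(start_stream())      # "send the prompt"
    first = next(stream)               # blocks until the first token arrives
    return time.perf_counter() - start, first


def fake_stream(delay_s: float = 0.05) -> Iterator[str]:
    """Simulated model (assumption): waits, then yields two tokens."""
    time.sleep(delay_s)                # stands in for network + queueing + compute
    yield "Hello"
    yield " world"


latency_s, token = time_to_first_token(fake_stream)
print(f"first token {token!r} after {latency_s * 1000:.0f} ms")
```

The timer starts before the request is issued and stops at the first token, so the measured value includes network distance and server load as well as model compute. Timing the whole response instead would measure total generation time, which is a different quantity.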
