Title

Add Qwen 2.5 Coder to KAITO supported model list

Glossary

N/A

Model description: Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). It brings significantly improvements in code generation, code reasoning and code fixing with long-context support up to 128K tokens. For more information, refer to the Qwen2.5 Documentation and access the model on Hugging Face.
Model usage statistics: In the past month, Qwen2.5-Coder-7B-Instruct has garnered 118,568 downloads on Hugging Face, reflecting its widespread popularity. Google Trends data shows a high level of search interest in "qwen2.5", indicating strong market curiosity.
Model license: Qwen2.5-Coder series is distributed under the Apache 2.0 license, ensuring broad usability and modification rights.

The following table describes the basic model characteristics and the resource requirements of running it.

Field	Notes
Family name	Qwen 2.5 Coder
Type	`text-generation`
Download site	https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
Version	0eb6b1ed2d0c4306bc637d09ecef51e59d3dfe05
Storage size	100GB
GPU count	1 GPU
Total GPU memory	24 GB
Per GPU memory	`N/A`

This section describes how to configure the runtime framework to support the inference calls.

Options	Notes
Runtime	Huggingface Transformer
Distributed Inference	False
Custom configurations	Precision: BF16. Can run on one machine with 24 GB of GPU memory.