## Title

Add phi-2 to KAITO supported model list.

## Glossary

N/A

## Summary

  • Model description: Launched during Microsoft Ignite last November, the Phi-2 model is intended for QA, chat, and code generation. With only 2.7 billion parameters, Phi-2 surpasses the performance of the 7B and 13B Mistral and Llama-2 models on various aggregated benchmarks. Most notably, it achieves better performance than the 25x larger Llama-2-70B model on particular reasoning tasks such as coding and math.
  • Model usage statistics: In the past month, phi-2 has garnered 535,163 downloads on Hugging Face, reflecting its widespread popularity.
  • Model license: phi-2 is distributed under the MIT license, ensuring broad usability and modification rights.

## Requirements

The following table describes the basic model characteristics and the resource requirements for running it.

| Field | Notes |
|-------|-------|
| Family name | phi-2 |
| Type | text generation |
| Download site | https://huggingface.co/microsoft/phi-2 |
| Version | b10c3eba545ad279e7208ee3a5d644566f001670 |
| Storage size | 30GB |
| GPU count | 1 |
| Total GPU memory | 12GB |
| Per GPU memory | N/A |
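
With these requirements, deploying the model through KAITO amounts to declaring a workspace that requests a matching GPU SKU and names the preset. The sketch below is illustrative only: the `instanceType` value and label names are assumptions chosen to satisfy the 12GB GPU memory requirement above, not part of this proposal.

```yaml
# Hypothetical KAITO workspace for phi-2 (field values are illustrative assumptions).
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-phi-2
resource:
  # Any single-GPU SKU with >= 12GB of GPU memory should satisfy the table above.
  instanceType: "Standard_NC12s_v3"
  labelSelector:
    matchLabels:
      apps: phi-2
inference:
  preset:
    name: "phi-2"
```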

## Runtimes

This section describes how to configure the runtime framework to support the inference calls.

| Options | Notes |
|---------|-------|
| Runtime | Hugging Face Transformers |
| Distributed inference | False |
| Custom configurations | Precision: FP16. Can run on one machine with a total of 12GB of GPU memory. |
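
The runtime configuration above (Hugging Face Transformers, FP16, single device) can be sketched as a minimal inference call. This is a generic Transformers usage example, not the KAITO runtime implementation; the prompt and generation parameters are arbitrary.

```python
# Minimal sketch: load phi-2 in FP16 via Hugging Face Transformers on one device.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16 precision, fits within ~12GB of GPU memory
    device_map="auto",          # place weights on the available GPU
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```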

## History

  • 02/06/2024: Open proposal PR.