Multimodal Interaction: Supports text, audio, video, and other interaction modes for natural human-machine dialogue.
Modular Architecture: A highly modular design in which the ASR, LLM, TTS, and avatar components can each be replaced independently.
Diverse Avatar Options: Supports LiteAvatar, LAM, MuseTalk, FlashHead, and other digital human technologies.
Low Latency: Optimized through voice activity detection (VAD), audio buffering, and frame rate control, with an average response time of about 2.2 s.
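The swappable-component idea behind the modular architecture can be illustrated with a minimal Python sketch. The interface names (`ASR`, `LLM`, `TTS`, `Avatar`, `ChatPipeline`) and the stub classes are hypothetical and are not this project's actual API. The sketch only shows how a pipeline that depends on narrow interfaces lets one stage be replaced without touching the others.

```python
from dataclasses import dataclass, replace
from typing import Protocol


# Hypothetical stage interfaces (not the project's real API).
class ASR(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class Avatar(Protocol):
    def render(self, speech: bytes) -> list[str]: ...


@dataclass
class ChatPipeline:
    """Chains ASR -> LLM -> TTS -> Avatar; each stage is independently swappable."""
    asr: ASR
    llm: LLM
    tts: TTS
    avatar: Avatar

    def handle_turn(self, audio: bytes) -> tuple[str, list[str]]:
        text = self.asr.transcribe(audio)      # speech -> text
        answer = self.llm.reply(text)          # text -> response text
        speech = self.tts.synthesize(answer)   # response text -> audio
        frames = self.avatar.render(speech)    # audio -> avatar frames
        return answer, frames


# Toy stub components standing in for real models (illustration only).
class EchoASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperLLM:
    def reply(self, text: str) -> str:
        return text.upper()


class ReverseLLM:
    def reply(self, text: str) -> str:
        return text[::-1]


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class FrameAvatar:
    def render(self, speech: bytes) -> list[str]:
        # One placeholder "frame" per byte of synthesized audio.
        return [f"frame-{i}" for i in range(len(speech))]


if __name__ == "__main__":
    pipeline = ChatPipeline(EchoASR(), UpperLLM(), BytesTTS(), FrameAvatar())
    answer, frames = pipeline.handle_turn(b"hello")
    print(answer, len(frames))  # HELLO 5

    # Swap only the LLM stage; ASR, TTS, and Avatar are reused unchanged.
    swapped = replace(pipeline, llm=ReverseLLM())
    print(swapped.handle_turn(b"hello")[0])  # olleh
```

Because `ChatPipeline` depends only on the method each stage exposes, replacing a component is a single-field change, which `dataclasses.replace` expresses directly.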