New Features and Improvements - More Model Support and Many Practical Enhancements

March 14, 2024 · 2 min read

Yue Zhang

Founder of EvalsOne

EvalsOne will continue to strive for improvement and innovation, delivering an outstanding experience for AI model evaluation. We are thrilled to announce some major updates that bring an entirely new experience to our users.

Model Support Updates:

Added support for models on Amazon Bedrock and Groq platforms, expanding the range of models that can be evaluated.
Integration with Ollama, allowing you to evaluate locally hosted models through a tunnel, so models running on your own machine are no longer out of reach.
Expanded our Chinese model providers with 8 new options: Baidu, ChatGLM, Moonshot, Qwen, Baichuan, Xunfei, TianGong, and MiniMax. This provides more choices for evaluating Chinese models.
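For readers new to Ollama, here is a rough sketch of what reaching a local model through a tunnel can look like. It only builds a request for Ollama's `/api/generate` endpoint and does not send it. The tunnel URL and model name are placeholders, and the helper `build_generate_request` is a hypothetical name for illustration; how EvalsOne itself connects to your tunnel is not described in this post.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request to Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder tunnel URL (assumption) that forwards to a local Ollama server,
# which listens on port 11434 by default.
request = build_generate_request("https://example-tunnel.invalid", "llama2", "Say hello.")

# To actually send it, pass `request` to urllib.request.urlopen and read the
# "response" field of the returned JSON.
```

Any tunneling tool that exposes the local Ollama port over HTTPS should fit this pattern; only the base URL changes.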

Feature Enhancements:

You can now export samples and variables, facilitating data archiving and sharing.
Cloning runs is more flexible with multi-level cloning, catering to diverse scenarios.
When creating/cloning runs, you can customize temperature and maximum tokens, enabling more granular control.
Set maximum threads for private models, optimizing resource utilization.
Save conversation messages as templates for samples, streamlining the preparation for subsequent evaluations.
Manual evaluation with scoring is now available, making subjective evaluations more convenient.
Added average completion time and Model Generation Stability Index (MGSI) as new reporting metrics.
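To make the thread limit and the average completion time more concrete, here is a minimal sketch of the general idea. It is not EvalsOne's implementation: `call_model`, `timed_call`, and `run_samples` are hypothetical names, and the model call is a stub that simply sleeps. MGSI is left out because its formula is not given in this post.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    time.sleep(0.01)  # simulate generation latency
    return prompt.upper()


def timed_call(prompt: str) -> tuple[str, float]:
    """Run one call and return its output with the elapsed time in seconds."""
    start = time.perf_counter()
    output = call_model(prompt)
    return output, time.perf_counter() - start


def run_samples(prompts: list[str], max_threads: int) -> tuple[list[str], float]:
    """Run all prompts with at most `max_threads` calls in flight at once."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map returns results in the same order as the input prompts.
        results = list(pool.map(timed_call, prompts))
    outputs = [output for output, _ in results]
    average_seconds = mean(elapsed for _, elapsed in results)
    return outputs, average_seconds


outputs, average_seconds = run_samples(["a", "b", "c", "d"], max_threads=2)
```

A lower thread limit keeps a self-hosted model from being overwhelmed at the cost of a longer overall run; the per-call average is unaffected by how many calls are queued.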
These updates provide users with more model options, better customization capabilities, and improved efficiency. If you have any questions, feel free to reach out to us.

We are now inviting a limited number of seed users (over 200) to join our private beta and help us shape the future of LLM prompt evaluation. Don't hesitate to join us and experience our LLM evaluation platform firsthand. Sign up for the private beta waitlist at https://evalsone.com and start building better AI apps!