Using Models for Generation and Evaluation

Large language models can be used in both the generation and evaluation stages.

Using Models for Generation

When creating a run with the "Generate and Evaluate" run type, the first step is to select the large language model used for generation. Several preset shared models are available that can be used directly without configuring an API key; running them consumes the balance in your account.

If you prefer to use your own private models for generation, EvalsOne also supports adding them (this requires a Builder or higher membership plan). Most commonly used models and providers are supported, including OpenAI, Anthropic, Google Gemini, Mistral, Microsoft Azure, and Ollama. When generating with a large language model, you can also set options such as the sampling temperature and the number of generation rounds.
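As an illustration of how the temperature and rounds settings interact, here is a minimal sketch. This is not EvalsOne's actual API; `generate` is a hypothetical stand-in for a provider call, and the variant list is invented for the example:

```python
import random

def generate(prompt: str, temperature: float) -> str:
    """Hypothetical stand-in for a chat-completion call.

    A real provider call (e.g. OpenAI or Anthropic) would send the prompt
    to the model; here we only simulate that a higher temperature yields
    more varied wording, while temperature 0 is deterministic.
    """
    variants = [
        "Paris.",
        "The capital of France is Paris.",
        "Paris is the capital of France.",
    ]
    if temperature == 0.0:
        return variants[0]          # deterministic at temperature 0
    return random.choice(variants)  # sampled wording at higher temperature

def run_generation(prompt: str, temperature: float = 0.7, rounds: int = 3) -> list[str]:
    """Run the same prompt for several rounds, as configured in a run."""
    return [generate(prompt, temperature) for _ in range(rounds)]

outputs = run_generation("What is the capital of France?", temperature=0.0, rounds=3)
print(outputs)  # at temperature 0, every round returns the same answer
```

Running multiple rounds per sample is useful when the output is stochastic: the evaluator can then score each round and report aggregate results rather than a single draw.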

Using Models for Evaluation

Compared with traditional rule-based evaluation methods, using large language models as evaluators to assess the generated results can significantly improve the efficiency and flexibility of evaluation.

If the evaluator you choose requires a large language model, you can select which model to use for evaluation when creating the run. Because the evaluation model's capability directly affects the quality of the results, a powerful model such as GPT-4 or Claude 3 is recommended for evaluation.
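The model-as-evaluator pattern can be sketched as follows. This is a generic illustration, not EvalsOne's internal implementation: the prompt wording and the `Score: N` reply format are assumptions made for the example, and in a real run the judge's reply would come from the evaluation model you selected:

```python
import re

def build_judge_prompt(question: str, answer: str) -> str:
    """Assemble a grading prompt for the evaluation model (illustrative wording)."""
    return (
        "You are an impartial evaluator. Rate the answer to the question "
        "on a scale of 1-5 and reply in the form 'Score: N'.\n"
        f"Question: {question}\n"
        f"Answer: {answer}"
    )

def parse_score(judge_reply: str) -> int:
    """Extract the numeric score from the evaluation model's reply."""
    match = re.search(r"Score:\s*([1-5])", judge_reply)
    if match is None:
        raise ValueError(f"Unparseable judge reply: {judge_reply!r}")
    return int(match.group(1))

# A canned reply stands in for the evaluation model's output here,
# to show the prompt-build / reply-parse round trip.
prompt = build_judge_prompt("What is 2 + 2?", "4")
print(parse_score("Score: 5"))  # -> 5
```

Constraining the judge to a fixed reply format, then parsing it, is what lets model-based scores be aggregated across a whole run the way rule-based metrics are.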