Setting Evaluators
EvalsOne supports both automated and manual evaluations.
Automated Evaluation
Automated evaluation refers to assessing generated results with specific evaluation algorithms/rules or with automated tools such as large language models. Its advantages are high efficiency and low cost; however, for some complex evaluation objectives, automated evaluation may be less effective than manual evaluation.
In automated evaluation, you can set evaluators appropriate to the purpose and needs of the evaluation. Evaluators define the standards by which the quality of generated results is judged, and different evaluators correspond to different evaluation objectives. Choosing appropriate evaluators is crucial to the effectiveness of the final evaluation.
We provide some preset evaluators for common evaluation scenarios. Users can also add custom evaluators as needed to meet more personalized evaluation requirements.
Evaluators can be divided into the following categories based on evaluation methods:
- Rule-based evaluators
- Large language model prompt-based evaluators
- Other model-based evaluators (e.g., embedding models)
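To make these categories concrete, here is a minimal sketch (hypothetical helper functions, not the EvalsOne API): a rule-based evaluator can be as simple as a deterministic keyword check, while an embedding-model evaluator typically scores the cosine similarity between embedding vectors (toy vectors below stand in for real embeddings).

```python
import math

def rule_based_evaluator(output: str, required_keyword: str) -> bool:
    # Rule-based: a deterministic check, here simple keyword presence.
    return required_keyword.lower() in output.lower()

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Other-model-based: embedding evaluators commonly compare vectors
    # with cosine similarity (1.0 = identical direction, 0.0 = orthogonal).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(rule_based_evaluator("Paris is the capital of France.", "Paris"))  # True
print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 3))  # 0.707
```

An LLM-prompt-based evaluator would instead send the output and a grading rubric to a language model, which is omitted here because it requires an API call.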
The results produced by each type of evaluator can take one of three forms:
- Grades, such as A/B/C/D/E levels
- Scores, such as any value between 0 and 1
- Assertions, classified as Pass or Fail
Manual Evaluation
Manual evaluation refers to the assessment of generated results by human evaluators. Its advantage is that it draws on human expertise and judgment, allowing a more detailed and comprehensive evaluation of the generated results. However, manual evaluation is costly and less efficient.
When creating a manual evaluation, you need to specify whether the evaluation takes the form of a score or an assertion. For a score, the evaluator rates the generated results against preset evaluation standards, using any integer between 0 and 10. For an assertion, the evaluator judges the generated results against preset evaluation standards, with the result being Pass or Fail.
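These input rules can be captured in two small validators (a sketch with hypothetical function names, reflecting only the constraints stated above: integer scores 0-10, assertions of Pass or Fail):

```python
def validate_manual_score(value: int) -> int:
    # A manual score must be an integer from 0 to 10 inclusive.
    # (bool is excluded explicitly, since bool is a subclass of int.)
    if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 10:
        raise ValueError("score must be an integer between 0 and 10")
    return value

def validate_manual_assertion(value: str) -> str:
    # A manual assertion result is exactly "Pass" or "Fail".
    if value not in ("Pass", "Fail"):
        raise ValueError('assertion must be "Pass" or "Fail"')
    return value

print(validate_manual_score(7), validate_manual_assertion("Pass"))  # 7 Pass
```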
After all samples are evaluated, the system will generate an evaluation report based on the results. The report includes detailed scores for each metric and visual representations in the form of charts.
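The per-metric figures in such a report boil down to simple aggregation over the sample-level results. A minimal sketch, assuming a flat list of result records (the field names `metric`, `score`, and `passed` are illustrative, not the EvalsOne data model):

```python
from collections import defaultdict

def summarize(results: list[dict]) -> dict:
    # Group sample-level results by metric and compute the report
    # statistics: average score and pass rate, where applicable.
    by_metric = defaultdict(list)
    for record in results:
        by_metric[record["metric"]].append(record)
    report = {}
    for metric, rows in by_metric.items():
        scores = [r["score"] for r in rows if "score" in r]
        passes = [r["passed"] for r in rows if "passed" in r]
        report[metric] = {
            "avg_score": sum(scores) / len(scores) if scores else None,
            "pass_rate": sum(passes) / len(passes) if passes else None,
        }
    return report

results = [
    {"metric": "relevance", "score": 0.8},
    {"metric": "relevance", "score": 0.6},
    {"metric": "safety", "passed": True},
    {"metric": "safety", "passed": False},
]
print(summarize(results))
```

Charting these aggregates (e.g., a bar per metric) is then a straightforward visualization step on top of this summary.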