
Understanding the Concept of Run

What is a Run?

In EvalsOne, a "Run" is one complete evaluation of one or more models and prompts. You use runs to test models and prompts and to improve the results they generate. Each run produces a detailed report containing the evaluation results and their analysis.

Benefits of a Run

  • Efficient Evaluation: Runs are automated and support batch operations, which cuts the time spent on manual evaluation.
  • Detailed Analysis: Each run generates a comprehensive report that gives you insight into how your generative AI performs and where it has issues.
  • Continuous Optimization: By conducting multiple runs and comparing them, you can systematically improve models and prompts, and with them the quality of your generative AI application.

Run Hierarchy

Each run in EvalsOne is independent, but runs are also organized in a hierarchy:

  1. Root Run (R0): An evaluation run created from scratch is called a Root Run and sits at level R0.
  2. Fork Runs (L1~L4): You can quickly create a fork run from any existing run. Each fork lets you change individual settings, such as the template version, generation model, or evaluation metrics. This makes forks well suited to iterative improvement and side-by-side comparison.

In the run list, you can browse all runs in Flat View or Tree View and see the hierarchical relationships between runs.
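
The hierarchy can be pictured as a small tree. The following is a minimal sketch, not EvalsOne code: the `Run` class, its `fork` method, the setting names, and the view helpers are all hypothetical, and it assumes the root sits at level 0 and forks go no deeper than level 4, as the L1~L4 range above suggests.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_FORK_LEVEL = 4  # assumption: forks reach at most four levels below the root


@dataclass
class Run:
    """Hypothetical stand-in for a run; not an EvalsOne class."""
    name: str
    settings: dict
    parent: Optional["Run"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    @property
    def level(self) -> int:
        # A root run is level 0; each fork sits one level below its parent.
        return 0 if self.parent is None else self.parent.level + 1

    def fork(self, name: str, **overrides) -> "Run":
        """Copy this run's settings, overriding only the ones passed in."""
        if self.level >= MAX_FORK_LEVEL:
            raise ValueError("maximum fork depth reached")
        child = Run(name, {**self.settings, **overrides}, parent=self)
        self.children.append(child)
        return child


def flat_view(root: Run) -> list:
    """Names of all runs in depth-first order, without indentation."""
    names = [root.name]
    for child in root.children:
        names.extend(flat_view(child))
    return names


def tree_view(root: Run) -> str:
    """Same order as flat_view, with each run indented by its level."""
    lines = ["  " * root.level + f"{root.name} (level {root.level})"]
    lines.extend(tree_view(child) for child in root.children)
    return "\n".join(lines)


# Illustrative settings only; the keys and values are made up.
root = Run("baseline", {"model": "model-a", "template": "v1", "metrics": ["accuracy"]})
fork_model = root.fork("try-model-b", model="model-b")
fork_template = fork_model.fork("try-template-v2", template="v2")

print(tree_view(root))
```

Each fork starts from its parent's settings and changes only what is passed to `fork`, so two neighboring runs differ in a known, small set of settings. That is what makes comparing them meaningful.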

Figure: Run List

Steps of a Run

  1. Select Model: Choose one or more of the models supported by EvalsOne. These can be cloud-based or locally deployed.
  2. Prepare Data: EvalsOne supports several ways to prepare evaluation samples: adding them manually, importing them in batches, synthesizing them quickly from templates and variable value lists, and extending variable value lists automatically with an LLM.
  3. Set Evaluation Metrics: EvalsOne includes multiple built-in evaluation metrics and supports custom metrics, covering use cases from simple to complex.
  4. Start Run: Once configuration is done, start the run. EvalsOne performs the evaluation automatically and generates a detailed result report.
  5. View Report: When the run is complete, you can view the report, which includes detailed scores and visualizations for each metric.
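
The steps above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not EvalsOne's implementation or API: the functions `synthesize_samples` and `run_evaluation`, the stub "models", and the `preserves_case` metric are all hypothetical. It assumes that synthesizing samples from a template and variable value lists means filling the template with every combination of the values.

```python
from itertools import product
from statistics import mean


def synthesize_samples(template: str, variables: dict) -> list:
    """Fill the template with every combination of the variable values."""
    names = list(variables)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*(variables[name] for name in names))
    ]


def run_evaluation(models: dict, samples: list, metrics: dict) -> dict:
    """Score every model on every sample; return the mean score per model and metric."""
    report = {}
    for model_name, generate in models.items():
        outputs = [generate(sample) for sample in samples]
        report[model_name] = {
            metric_name: mean(score(s, o) for s, o in zip(samples, outputs))
            for metric_name, score in metrics.items()
        }
    return report


# Step 1: stub "models" (plain functions standing in for real model calls).
models = {
    "echo": lambda prompt: prompt,
    "shout": lambda prompt: prompt.upper(),
}

# Step 2: samples synthesized from a template and variable value lists.
samples = synthesize_samples(
    "Translate '{word}' into {language}.",
    {"word": ["cat", "dog"], "language": ["French", "German"]},
)

# Step 3: a toy custom metric that checks whether the output equals the prompt.
metrics = {
    "preserves_case": lambda prompt, output: 1.0 if output == prompt else 0.0,
}

# Steps 4 and 5: run the evaluation and look at the report.
report = run_evaluation(models, samples, metrics)
print(report)
```

Two values for each of two variables yield four samples, and each model is scored on all of them. A real run would call actual models and use real metrics, but the shape is the same: models × samples × metrics, summarized per model.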

Once you understand what a run is and how to operate one, you can evaluate and optimize generative AI applications more efficiently and improve how they perform in practical use.