How do I create an evaluator with a custom YAML configuration?
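When you create an evaluator of the Custom YAML configuration type, you have much more freedom to define the prompt used during evaluation and the form the rating takes. The following is a fairly complete YAML configuration example: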
prompt: |-
  You are comparing a submitted answer to an expert answer on a given question. Here is the data:
  [BEGIN DATA]
  ************
  [Question]: {input}
  ************
  [Expert]: {ideal}
  ************
  [Submission]: {completion}
  ************
  [END DATA]
  Compare the factual content of the submitted answer with the expert answer. Ignore any differences in style, grammar, or punctuation.
  The submitted answer may either be a subset or superset of the expert answer, or it may conflict with it. Determine which case applies. Answer the question by selecting one of the following options:
  (A) The submitted answer is a subset of the expert answer and is fully consistent with it.
  (B) The submitted answer is a superset of the expert answer and is fully consistent with it.
  (C) The submitted answer contains all the same details as the expert answer.
  (D) There is a disagreement between the submitted answer and the expert answer.
  (E) The answers differ, but these differences don't matter from the perspective of factuality.
eval_type: cot_classify
choice_strings:
  - "A"
  - "B"
  - "C"
  - "D"
  - "E"
choice_scores:
  "A": 0.8
  "B": 0.8
  "C": 0.8
  "D": 0.0
  "E": 0.5
threshold: 0.5
reverse_score: 0
answer_prompt: ""
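Supported properties:
prompt
: The conversation template content; placeholders can be used in it.
eval_type
: The reasoning mode; one of cot_classify, classify_cot, or classify. The instruction text that each of these three modes stands for is as follows: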
# e.g. "Yes"
"classify": "Answer the question by printing only a single choice from {choices} (without quotes or punctuation) corresponding to the correct answer with no other text."
# e.g. "Yes\n The reasons are: ..."
"classify_cot": "First, answer by printing a single choice from {choices} (without quotes or punctuation) corresponding to the correct answer. Then, from the next line, explain your reasonings step by step."
# e.g. "Let's think step by step. ...\nYes"
"cot_classify": """
First, write out in a step by step manner your reasoning to be sure that your conclusion is correct. Avoid simply stating the correct answer at the outset. Then print only a single choice from {choices} (without quotes or punctuation) on its own line corresponding to the correct answer. At the end, repeat just the answer by itself on a new line.
"""
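Here, {choices} is a placeholder that is replaced by the choice_strings value from the YAML evaluation configuration.
choice_strings
: A list of strings, or a single string, giving the grade options used to express the evaluation result.
choice_scores
: The grade-to-score conversion table, expressed as a dictionary that maps each grade to a score.
threshold
: The threshold used to turn a score into a pass/fail assertion; it must be used together with reverse_score. If reverse_score is 0, a sample whose score is greater than or equal to the threshold is treated as passing; otherwise, a sample whose score is less than the threshold is treated as passing.
reverse_score
: Reverse scoring; the value is 0 or 1 and defaults to 0. It must be used together with threshold.
answer_prompt
: The answer prompt. It takes priority over eval_type: if it is set and not empty, it overrides the reasoning mode that eval_type represents.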
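The way these properties interact can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluator's actual implementation: the names `EVAL_TYPE_PROMPTS`, `build_instruction`, and `judge` are hypothetical, and both the `" or "` joining of options and the substitution of {choices} inside a custom answer_prompt are guesses about behavior the text above does not specify.

```python
# Hypothetical sketch; names and the joining format are assumptions.
EVAL_TYPE_PROMPTS = {
    "classify": (
        "Answer the question by printing only a single choice from {choices} "
        "(without quotes or punctuation) corresponding to the correct answer "
        "with no other text."
    ),
    "classify_cot": (
        "First, answer by printing a single choice from {choices} (without "
        "quotes or punctuation) corresponding to the correct answer. Then, "
        "from the next line, explain your reasonings step by step."
    ),
    "cot_classify": (
        "First, write out in a step by step manner your reasoning to be sure "
        "that your conclusion is correct. Avoid simply stating the correct "
        "answer at the outset. Then print only a single choice from {choices} "
        "(without quotes or punctuation) on its own line corresponding to the "
        "correct answer. At the end, repeat just the answer by itself on a "
        "new line."
    ),
}


def build_instruction(eval_type, choice_strings, answer_prompt=""):
    """Select the instruction text and fill in the {choices} placeholder."""
    # A non-empty answer_prompt takes priority over eval_type.
    template = answer_prompt if answer_prompt else EVAL_TYPE_PROMPTS[eval_type]
    # choice_strings may be a single string or a list of strings.
    if isinstance(choice_strings, str):
        choices = choice_strings
    else:
        # Assumption: options are quoted and joined with " or ".
        choices = " or ".join(f'"{c}"' for c in choice_strings)
    return template.replace("{choices}", choices)


def judge(choice, choice_scores, threshold, reverse_score=0):
    """Map a grade to its score, then turn the score into a pass/fail result."""
    score = choice_scores[choice]
    if reverse_score == 0:
        passed = score >= threshold  # normal scoring: at or above the threshold passes
    else:
        passed = score < threshold   # reverse scoring: below the threshold passes
    return score, passed
```

With the example configuration above (threshold 0.5, reverse_score 0), a grade of "E" maps to 0.5 and passes, while "D" maps to 0.0 and fails.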
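When the final evaluation prompt used at run time is generated, the following placeholders in prompt are replaced with the corresponding content from the evaluation sample:
{input}
: The prompt
{ideal}
: The ideal answer
{completion}
: The generated answer
{context}
: Background information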
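The placeholder substitution described above can be illustrated with a minimal Python sketch. The function name `fill_prompt` and the dictionary-shaped sample are hypothetical, and leaving a placeholder untouched when the sample lacks that field is an assumption; the evaluator may handle missing fields differently.

```python
import re

# Matches only the four placeholders listed above; other braces are left alone.
_PLACEHOLDER = re.compile(r"\{(input|ideal|completion|context)\}")


def fill_prompt(prompt, sample):
    """Replace {input}, {ideal}, {completion}, {context} with sample content."""
    def substitute(match):
        name = match.group(1)
        # Assumption: a placeholder with no matching field stays as-is.
        return str(sample[name]) if name in sample else match.group(0)

    # A single pass means text inserted from the sample is never re-scanned,
    # so a completion that happens to contain "{ideal}" is not expanded.
    return _PLACEHOLDER.sub(substitute, prompt)


sample = {
    "input": "What is the capital of France?",
    "ideal": "Paris",
    "completion": "The capital is Paris.",
}
template = "[Question]: {input}\n[Expert]: {ideal}\n[Submission]: {completion}"
print(fill_prompt(template, sample))
```

A single regular-expression pass is used here instead of chained `str.replace` calls so that the content of one field cannot accidentally be treated as another placeholder.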
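With a custom YAML configuration, even users without programming experience can freely create evaluation prompts that fit complex evaluation scenarios. In practice, you can refer to the examples provided when you create an evaluator.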