[Original] The secret weapon for igniting testing efficiency: AI-assisted test case generation!

    Posted on 2024-1-8 13:13:42
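    To write tests, we first need a program. To keep the exercise itself from already being in the AI's training data, in which case it would simply know the answer, we use a small and slightly unusual problem: have Python take an integer number of days and format it as a short natural-language description of the duration.
    Definitions: 1 week is 7 days, 1 month is 30 days, and 1 year is 365 days. For example, an input of 1 returns 1d, 8 returns 1w1d, 32 returns 1m2d, and 375 returns 1y1w3d.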

    Requirement
    Write a Python function that formats a duration for output, where 1 week is 7 days, 1 month is 30 days, and 1 year is 365 days. For example:
    Input  Output
    1      1d
    8      1w1d
    61     2m1d
    375    1y1w3d
    The output only needs to be formatted up to years (?y?m?w?d).
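    The expected outputs in the requirement can be checked by hand with repeated `divmod`. The following is only a minimal sketch of that arithmetic; the helper name `breakdown` is hypothetical and is not part of the original post.

```python
def breakdown(days):
    """Split a day count into (years, months, weeks, days) using the
    fixed units from the requirement: 365, 30 and 7 days."""
    years, rest = divmod(days, 365)   # whole years first
    months, rest = divmod(rest, 30)   # then whole months from the remainder
    weeks, rest = divmod(rest, 7)     # then whole weeks
    return years, months, weeks, rest


# The sample inputs quoted in the requirement.
for n in (1, 8, 32, 61, 375):
    print(n, breakdown(n))
```

    For 375, one year leaves 10 days, which is one week plus 3 days, matching the expected 1y1w3d.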
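    We simply had ChatGPT write the program.
    Since ChatGPT can write code, we can naturally have it write the unit tests for us as well.
    The resulting test cases already cover the scenarios quite thoroughly, including both basic functional checks and some abnormal-input cases.
    Verifying the process through the OpenAI API
    1. Break the task down into step-by-step prompts
    OpenAI's example suggests a good approach: split the problem into multiple steps.
    • Hand the code to the large language model and ask it to explain what the code does.
    • Hand the code together with its explanation to the model and ask it to plan which test cases we should write for this logic. If there are too few, ask the AI again to generate more.
    • Submit the detailed test case descriptions to the model and have it generate concrete test code from them. We also run a syntax check on the generated code; if the check fails, we have the AI generate it again.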
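    The three steps can be sketched without any API call. This is an illustration under stated assumptions, not the post's implementation: `three_step_pipeline` and `fake_llm` are hypothetical names, and `fake_llm` merely stands in for a real model so the flow of context from one prompt to the next is visible.

```python
from typing import Callable, Dict


def three_step_pipeline(code: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Run explain -> plan -> generate, feeding each answer into the next prompt."""
    explain_prompt = f"Explain what this code does:\n{code}\n"
    explanation = llm(explain_prompt)

    # The second prompt is the first prompt, its answer, and a new instruction.
    plan_prompt = explain_prompt + explanation + "\nList the test cases we should write:\n"
    plan = llm(plan_prompt)

    # The third prompt again extends everything that came before.
    test_prompt = plan_prompt + plan + "\nWrite the test code:\n"
    tests = llm(test_prompt)
    return {"explanation": explanation, "plan": plan, "tests": tests}


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call: reports how much context it saw.
    return f"<answer based on {len(prompt)} chars of context>"


result = three_step_pipeline("def add(a, b):\n    return a + b", fake_llm)
for step, text in result.items():
    print(step, "->", text)
```

    Each later prompt starts with the full text of the earlier one, which is why the reported context length grows from step to step.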
    2. Ask the AI to explain the code under test
    import openai  # uses the legacy Completion API, so this needs openai<1.0


    def gpt35(prompt, model="text-davinci-002", temperature=0.4, max_tokens=1000,
              top_p=1, stop=["\n\n", "\n\t\n", "\n    \n"]):
        # Send one completion request and return only the generated text.
        response = openai.Completion.create(
            model=model,
            prompt=prompt,
            temperature=temperature,
            max_tokens=max_tokens,
            top_p=top_p,
            stop=stop
        )
        message = response["choices"][0]["text"]
        return message


    # The function under test, kept as a string so it can be embedded in prompts.
    code = """
    def format_time(days):
        years, days = divmod(days, 365)
        months, days = divmod(days, 30)
        weeks, days = divmod(days, 7)
        time_str = ""
        if years > 0:
            time_str += str(years) + "y"
        if months > 0:
            time_str += str(months) + "m"
        if weeks > 0:
            time_str += str(weeks) + "w"
        if days > 0:
            time_str += str(days) + "d"
        return time_str
    """


    def explain_code(function_to_test, unit_test_package="pytest"):
        # The prompt reads like the opening of a tutorial and ends on "- First,"
        # so that the model continues with a bullet-by-bullet explanation.
        prompt = f"""# How to write great unit tests with {unit_test_package}

    In this advanced tutorial for experts, we'll use Python 3.8 and `{unit_test_package}` to write a suite of unit tests to verify the behavior of the following function.
    ```python
    {function_to_test}
    ```

    Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.
    - First,"""
        response = gpt35(prompt)
        return response, prompt


    code_explaination, prompt_to_explain_code = explain_code(code)
    print(code_explaination)
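    First we define a function named gpt35, which does the following:
    It uses the text-davinci-002 model, a text-generation model fine-tuned with supervised learning, because we want a focused, on-target explanation of the code. The stop parameter gets special treatment: generation halts as soon as two consecutive newlines, or something equivalent to two consecutive newlines, appear. This keeps the model from going on to generate the test code in the same breath.
    Then a carefully designed prompt gets the GPT model to explain the code for us:
    • It specifies pytest as the testing package.
    • It gives the GPT model the code to be tested.
    • It asks the AI to describe precisely what the code does.
    • It ends with "- First," to steer the GPT model into describing the code under test step by step, one bullet per line.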
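    The effect of the stop setting can be reproduced locally. The sketch below only mimics the truncation on a made-up string; `truncate_at_stop` is a hypothetical helper, and the sample text is invented for illustration rather than taken from a real completion.

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest occurrence of any stop sequence,
    mimicking locally what a stop parameter does to a completion."""
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# The same stop sequences that gpt35 uses by default.
stop = ["\n\n", "\n\t\n", "\n    \n"]

# Invented example: an explanation followed by a blank line and then test code.
raw = "- First, it splits days.\n- Next, it builds a string.\n\nimport pytest\n"
print(truncate_at_stop(raw, stop))
```

    Only the two bullet lines survive; everything from the blank line onward, including the start of the test code, is dropped.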

    Output:
    the function takes an integer value representing days as its sole argument.
    - Next, the `divmod` function is used to calculate the number of years and days, the number of months and days, and the number of weeks and days.
    - Finally, a string is built up and returned that contains the number of years, months, weeks, and days.
    3. Have the AI draw up a test plan from the code explanation
    def generate_a_test_plan(full_code_explaination, unit_test_package="pytest"):
        # Append the test-plan instructions to the explanation prompt and its answer.
        prompt_to_explain_a_plan = f"""

    A good unit test suite should aim to:
    - Test the function's behavior for a wide range of possible inputs
    - Test edge cases that the author may not have foreseen
    - Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintain
    - Be easy to read and understand, with clean code and descriptive names
    - Be deterministic, so that the tests always pass or fail in the same way

    `{unit_test_package}` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.

    For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):
    -"""
        prompt = full_code_explaination+prompt_to_explain_a_plan
        response = gpt35(prompt)
        return response, prompt


    test_plan, prompt_to_get_test_plan = generate_a_test_plan(prompt_to_explain_code+code_explaination)
    print(test_plan)
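    The prompt sets out several requirements for the test plan the AI generates:
    • The test cases should cover a wide range of inputs.
    • They should probe edge cases the author may not have foreseen.
    • They should make full use of pytest's features.
    • They should be clean and easy to understand.
    • Their results should be deterministic, always passing or failing in the same way.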


    Output:
    Normal inputs:
        - `days` is a positive integer
        - `days` is 0
    - Edge cases:
        - `days` is a negative integer
        - `days` is a float
        - `days` is a string
    - Invalid inputs:
        - `days` is `None`
        - `days` is a list
    4. Generate test code from the test plan
    def generate_test_cases(function_to_test, unit_test_package="pytest"):
        starter_comment = "Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator"
        # The prompt deliberately ends inside an open code block so the model continues the code.
        prompt_to_generate_the_unit_test = f"""

    Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.
    ```python
    import {unit_test_package}  # used for our unit tests

    {function_to_test}

    #{starter_comment}"""
        # prompt_to_get_test_plan already holds the explanation prompt, the explanation,
        # and the test-plan instructions, so the model sees the whole conversation so far.
        full_unit_test_prompt = prompt_to_get_test_plan + test_plan + prompt_to_generate_the_unit_test
        return gpt35(model="text-davinci-003", prompt=full_unit_test_prompt, stop="```"), prompt_to_generate_the_unit_test


    unit_test_response, prompt_to_generate_the_unit_test = generate_test_cases(code)
    print(unit_test_response)
    Output:
    @pytest.mark.parametrize("days, expected", [
        (1, "1d"),  # normal input
        (7, "1w"),  # normal input
        (30, "1m"),  # normal input
        (365, "1y"),  # normal input
        (731, "2y"),  # normal input
        (-1, pytest.raises(ValueError)),  # abnormal input
        (0, pytest.raises(ValueError)),  # abnormal input
        (1.5, pytest.raises(TypeError)),  # abnormal input
        ("1", pytest.raises(TypeError)),  # abnormal input
    ])
    def test_format_time(days, expected):
        """
        Test the format_time() function.
        """
        if isinstance(expected, type):
            # check that the expected result is a type, i.e. an exception
            with pytest.raises(expected):
                # if so, check that the function raises the expected exception
                format_time(days)
        else:
            # otherwise, check that the function returns the expected value
            assert format_time(days) == expected
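    5. Check the syntax with the AST library
    Finally, it is best to check the syntax of the generated test code, which Python's ast module can do. The check needs not only the generated test code but also the original function code from the prompt. The completion continues from the tail of the prompt, including the unfinished starter comment, so on its own it would not pass the syntax check.
    import ast

    # The code section of the prompt starts right after the "```python" line.
    code_start_index = prompt_to_generate_the_unit_test.find("```python\n") + len("```python\n")
    # Prepend that prompt tail (import, function, starter comment) to the completion.
    code_output = prompt_to_generate_the_unit_test[code_start_index:] + unit_test_response
    try:
        ast.parse(code_output)
    except SyntaxError as e:
        print(f"Syntax error in generated code: {e}")

    print(code_output)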
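    It helps to be clear about what this check does and does not catch. The sketch below wraps `ast.parse` in a hypothetical helper, `syntax_error_of`, and feeds it two invented snippets: one that is syntactically valid but calls an undefined name, and one with a real syntax error.

```python
import ast


def syntax_error_of(source):
    """Return None if source parses as Python, else a short error description."""
    try:
        ast.parse(source)
    except SyntaxError as e:
        return f"line {e.lineno}: {e.msg}"
    return None


# Valid syntax, even though undefined_name does not exist anywhere.
good = "def test_ok():\n    assert undefined_name(1) == 2\n"
# Invalid syntax: stray colon inside the parameter list.
bad = "def test_broken(:\n    pass\n"

print(syntax_error_of(good))  # parsing neither runs the code nor resolves names
print(syntax_error_of(bad))
```

    Because parsing never executes anything, a generated test with a wrong expected value passes this check just as easily as a correct one. The check filters out malformed output only.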
    Output:
    import pytest  # used for our unit tests


    def format_time(days):
        years, days = divmod(days, 365)
        months, days = divmod(days, 30)
        weeks, days = divmod(days, 7)
        time_str = ""
        if years > 0:
            time_str += str(years) + "y"
        if months > 0:
            time_str += str(months) + "m"
        if weeks > 0:
            time_str += str(weeks) + "w"
        if days > 0:
            time_str += str(days) + "d"
        return time_str


    #Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator.
    #The first element of the tuple is the name of the test case, and the second element is a list of arguments to pass to the function.
    #The @pytest.mark.parametrize decorator allows us to write a single test function that can be used to test multiple input values.
    @pytest.mark.parametrize("test_input,expected", [
        ("Valid Inputs", [
            (0, "0d"),  # test for 0 days
            (1, "1d"),  # test for 1 day
            (7, "7d"),  # test for 7 days
            (30, "1m"),  # test for 30 days
            (365, "1y"),  # test for 365 days
            (400, "1y35d"),  # test for 400 days
            (800, "2y160d"),  # test for 800 days
            (3650, "10y"),  # test for 3650 days
            (3651, "10y1d"),  # test for 3651 days
        ]),
        ("Invalid Inputs", [
            ("string", None),  # test for string input
            ([], None),  # test for list input
            ((), None),  # test for tuple input
            ({}, None),  # test for set input
            ({1: 1}, None),  # test for dictionary input
            (1.5, None),  # test for float input
            (None, None),  # test for None input
        ]),
        ("Edge Cases", [
            (10000000000, "274247y5m2w6d"),  # test for large positive integer
            (1, "1d"),  # test for small positive integer
            (-10000000000, "-274247y5m2w6d"),  # test for large negative integer
            (-1, "-1d")  # test for small negative integer
        ])
    ])
    def test_format_time(test_input, expected):
        # This test function uses the @pytest.mark.parametrize decorator to loop through each test case.
        # The test_input parameter contains the name of the test case, and the expected parameter contains a list of arguments to pass to the function.
        # The test_input parameter is not used in the test, but is included for readability.
        for days, expected_result in expected:
            # For each argument in the expected parameter, we call the format_time() function and compare the result to the expected result.
            assert format_time(days) == expected_result
    As the output above shows, some of the generated test cases still differ from what the function should return. For example (generated value -> correct value):
    @pytest.mark.parametrize("test_input,expected", [
        ("Valid Inputs", [
            (7, "7d" -> "1w"),  # test for 7 days
            (30, "1m"),  # test for 30 days
            (365, "1y"),  # test for 365 days
            (400, "1y35d" -> "1y1m5d"),  # test for 400 days
            (800, "2y160d" -> "2y2m1w3d"),  # test for 800 days
            (3650, "10y"),  # test for 3650 days
            (3651, "10y1d"),  # test for 3651 days
        ]),
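    Rather than working out the corrections by hand, the generated expectations can be run against the function itself. The sketch below reuses `format_time` exactly as given in the post; the list `generated_expectations` is an assumption, a hand-copied subset of the "Valid Inputs" group from the generated output.

```python
def format_time(days):
    # Same function as in the post.
    years, days = divmod(days, 365)
    months, days = divmod(days, 30)
    weeks, days = divmod(days, 7)
    time_str = ""
    if years > 0:
        time_str += str(years) + "y"
    if months > 0:
        time_str += str(months) + "m"
    if weeks > 0:
        time_str += str(weeks) + "w"
    if days > 0:
        time_str += str(days) + "d"
    return time_str


# (input, value the generated test expected), copied from the generated suite.
generated_expectations = [
    (0, "0d"), (7, "7d"), (30, "1m"),
    (400, "1y35d"), (800, "2y160d"), (3651, "10y1d"),
]

for days, expected in generated_expectations:
    actual = format_time(days)
    status = "ok" if actual == expected else "MISMATCH"
    print(f"{days:>5}  expected={expected!r:<10} actual={actual!r:<12} {status}")
```

    A loop like this makes it easy to see which expectations the model got wrong before trusting the suite, including cases such as 0, for which the function returns an empty string.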
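    Wrapping it further with LangChain
    OpenAI's large language models expose just two simple core interfaces, Completion and Embedding. By using these two sensibly, we have been able to complete all kinds of complex tasks:
    • By including the chat history in the prompt, we let the AI answer questions correctly in context.
    • By indexing embeddings in advance and storing them, we let the AI answer questions from external knowledge.
    • And through multi-round conversation, putting the AI's answers into new questions, we let the AI write unit tests for our own code.
    llama-index focuses on building indexes for large language model applications. LangChain has similar functionality, but that is not its main selling point. LangChain's first selling point is right there in its name: chained calls.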

    1. Writing unit tests automatically with LangChain
    Above, we wrote unit tests for the code automatically through a multi-step prompt. LangChain can call OpenAI's GPT models sequentially with multiple prompts, which is exactly the capability needed to automate that process.
    # Import paths as used by early (0.0.x) LangChain releases.
    from langchain import PromptTemplate, OpenAI, LLMChain
    from langchain.chains import SequentialChain
    import ast


    def write_unit_test(function_to_test, unit_test_package="pytest"):
        # Step 1: explain the source code
        explain_code = """# How to write great unit tests with {unit_test_package}

        In this advanced tutorial for experts, we'll use Python 3.8 and `{unit_test_package}` to write a suite of unit tests to verify the behavior of the following function.
        ```python
        {function_to_test}
        ```

        Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.
        - First,"""

        explain_code_template = PromptTemplate(
            input_variables=["unit_test_package", "function_to_test"],
            template=explain_code
        )
        explain_code_llm = OpenAI(model_name="text-davinci-002", temperature=0.4, max_tokens=1000,
                                  top_p=1, stop=["\n\n", "\n\t\n", "\n    \n"])
        explain_code_step = LLMChain(llm=explain_code_llm, prompt=explain_code_template, output_key="code_explaination")

        # Step 2: draw up the test plan
        test_plan = """

        A good unit test suite should aim to:
        - Test the function's behavior for a wide range of possible inputs
        - Test edge cases that the author may not have foreseen
        - Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintain
        - Be easy to read and understand, with clean code and descriptive names
        - Be deterministic, so that the tests always pass or fail in the same way

        `{unit_test_package}` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.

        For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):
        -"""

        # The plan prompt is the explanation prompt, its answer, and the plan instructions.
        test_plan_template = PromptTemplate(
            input_variables=["unit_test_package", "function_to_test", "code_explaination"],
            template=explain_code+"{code_explaination}"+test_plan
        )
        test_plan_llm = OpenAI(model_name="text-davinci-002", temperature=0.4, max_tokens=1000,
                               top_p=1, stop=["\n\n", "\n\t\n", "\n    \n"])
        test_plan_step = LLMChain(llm=test_plan_llm, prompt=test_plan_template, output_key="test_plan")

        # Step 3: write the test code
        starter_comment = "Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator"
        prompt_to_generate_the_unit_test = """

    Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.
    ```python
    import {unit_test_package}  # used for our unit tests

    {function_to_test}

    #{starter_comment}"""

        unit_test_template = PromptTemplate(
            input_variables=["unit_test_package", "function_to_test", "code_explaination", "test_plan", "starter_comment"],
            template=explain_code+"{code_explaination}"+test_plan+"{test_plan}"+prompt_to_generate_the_unit_test
        )
        unit_test_llm = OpenAI(model_name="text-davinci-002", temperature=0.4, max_tokens=1000, stop="```")
        unit_test_step = LLMChain(llm=unit_test_llm, prompt=unit_test_template, output_key="unit_test")

        # Run the three steps in order; each output_key becomes an input of the later steps.
        sequential_chain = SequentialChain(chains=[explain_code_step, test_plan_step, unit_test_step],
                                           input_variables=["unit_test_package", "function_to_test", "starter_comment"],
                                           verbose=True)
        answer = sequential_chain.run(unit_test_package=unit_test_package, function_to_test=function_to_test,
                                      starter_comment=starter_comment)
        # The completion continues the starter comment, so put the comment back in front of it.
        return f"""#{starter_comment}"""+answer


    code = """
    def format_time(days):
        years, days = divmod(days, 365)
        months, days = divmod(days, 30)
        weeks, days = divmod(days, 7)
        time_str = ""
        if years > 0:
            time_str += str(years) + "y"
        if months > 0:
            time_str += str(months) + "m"
        if weeks > 0:
            time_str += str(weeks) + "w"
        if days > 0:
            time_str += str(days) + "d"
        return time_str
    """


    def write_unit_test_automatically(code, retry=3):
        unit_test_code = write_unit_test(code)
        all_code = code+unit_test_code
        tried = 0
        while tried < retry:
            try:
                # Only the syntax is checked; the tests are not executed.
                ast.parse(all_code)
                return all_code
            except SyntaxError as e:
                print(f"Syntax error in generated code: {e}")
                all_code = code+write_unit_test(code)
                tried += 1
        # Falls through and returns None if every attempt fails the syntax check.


    print(write_unit_test_automatically(code))
    Output:
    def format_time(days):
        years, days = divmod(days, 365)
        months, days = divmod(days, 30)
        weeks, days = divmod(days, 7)
        time_str = ""
        if years > 0:
            time_str += str(years) + "y"
        if months > 0:
            time_str += str(months) + "m"
        if weeks > 0:
            time_str += str(weeks) + "w"
        if days > 0:
            time_str += str(days) + "d"
        return time_str
    #Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator.
    #The first element of the tuple is the name of the test case, and the second element is a list of tuples.
    #Each tuple in the list of tuples represents an individual test.
    #The first element of each tuple is the input to the function (days), and the second element is the expected output of the function.

    @pytest.mark.parametrize('test_case_name, test_cases', [
        # Test cases for when the days argument is a positive integer
        ('positive_int', [
            (1, '1d'),
            (10, '10d'),
            (100, '1y3m2w1d')
        ]),

        # Test cases for when the days argument is 0
        ('zero', [
            (0, '')
        ]),

        # Test cases for when the days argument is negative
        ('negative_int', [
            (-1, '-1d'),
            (-10, '-10d'),
            (-100, '-1y-3m-2w-1d')
        ]),

        # Test cases for when the days argument is not an integer
        ('non_int', [
            (1.5, pytest.raises(TypeError)),
            ('1', pytest.raises(TypeError))
        ])
    ])
    def test_format_time(days, expected_output):
        # This test function is called once for each test case.
        # days is set to the input for the function, and expected_output is set to the expected output of the function.
        # We can use the pytest.raises context manager to test for exceptions.
        if isinstance(expected_output, type) and issubclass(expected_output, Exception):
            with pytest.raises(expected_output):
                format_time(days)
        else:
            assert format_time(days) == expected_output
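    Summary:
    Completing a complex task with a large language model often requires asking the AI several questions, where the answers to earlier questions may become part of the input to later ones.
    LangChain greatly simplifies the development of such tasks by combining multiple LLMChains into one SequentialChain and running them in order.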
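    What running several chains in order amounts to can be shown in a few lines of plain Python. This is a conceptual sketch only and not LangChain's implementation: `Step`, `run_sequential`, and `fake_llm` are hypothetical names, and the templates are shortened stand-ins for the real prompts.

```python
class Step:
    """One prompt template plus the key its answer is stored under."""

    def __init__(self, template, output_key):
        self.template = template
        self.output_key = output_key


def run_sequential(steps, inputs, llm):
    """Run steps in order; each answer is added to the shared variables
    so that later templates can reference it by name."""
    variables = dict(inputs)
    for step in steps:
        prompt = step.template.format(**variables)
        variables[step.output_key] = llm(prompt)
    return variables


def fake_llm(prompt):
    # Hypothetical stand-in for a model call: reports the prompt length.
    return f"[{len(prompt)} chars seen]"


steps = [
    Step("Explain: {function_to_test}", "code_explaination"),
    Step("Explain: {function_to_test}\n{code_explaination}\nPlan tests:", "test_plan"),
    Step("{code_explaination}\n{test_plan}\nWrite tests:", "unit_test"),
]
result = run_sequential(steps, {"function_to_test": "def f(): pass"}, fake_llm)
print(result["unit_test"])
```

    The output key of one step is simply a variable name that later templates may use, which is the same wiring the `output_key` and `input_variables` arguments express in the LangChain code.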



