    # Corpus of sample sentences
    corpus = [
        '花呗更改绑定银行卡',  # "Change the bank card linked to Huabei"
        '我什么时候开通了花呗',  # "When did I activate Huabei?"
        'A man is eating food.',
        'A man is eating a piece of bread.',
        'The girl is carrying a baby.',
        'A man is riding a horse.',
        'A woman is playing violin.',
        'Two men pushed carts through the woods.',
        'A man is riding a white horse on an enclosed ground.',
        'A monkey is playing drums.',
        'A cheetah is running behind its prey.',
    ]

    queries = [
        '如何更换花呗绑定银行卡',  # "How do I change the bank card linked to Huabei?"
        'A man is eating pasta.',
        'Someone in a gorilla costume is playing a set of drums.',
        'A cheetah chases prey on across a field.',
    ]

We then iterate over the queries and retrieve the closest corpus entries for each one:

    # `embedder` and `corpus_embeddings` are assumed to be defined earlier;
    # `semantic_search` comes from `sentence_transformers.util`.
    for query in queries:
        query_embedding = embedder.encode(query)
        hits = semantic_search(query_embedding, corpus_embeddings, top_k=3)
        print("\n\n======================\n\n")
        print("Query:", query)
        print("\nTop 3 most similar sentences in the corpus:")
        hits = hits[0]  # results for the first (and only) query
        for hit in hits:
            print(corpus[hit['corpus_id']], "(Score: {:.4f})".format(hit['score']))

Output:

    ======================
    Query: 如何更换花呗绑定银行卡

    Top 3 most similar sentences in the corpus:
    花呗更改绑定银行卡 (Score: 0.8551)
    我什么时候开通了花呗 (Score: 0.7212)
    A man is eating food. (Score: 0.3118)

    ======================
    Query: A man is eating pasta.

    Top 3 most similar sentences in the corpus:
    A man is eating food. (Score: 0.7840)
    A man is riding a white horse on an enclosed ground. (Score: 0.6906)
    A man is eating a piece of bread. (Score: 0.6831)

    ======================
    Query: Someone in a gorilla costume is playing a set of drums.

    Top 3 most similar sentences in the corpus:
    A monkey is playing drums. (Score: 0.6758)
    A man is riding a white horse on an enclosed ground. (Score: 0.6351)
    The girl is carrying a baby. (Score: 0.5438)

    ======================
    Query: A cheetah chases prey on across a field.

    Top 3 most similar sentences in the corpus:
    A cheetah is running behind its prey. (Score: 0.6736)
    A man is riding a white horse on an enclosed ground. (Score: 0.5731)
    A monkey is playing drums. (Score: 0.4977)

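To make the ranking step less of a black box, here is a minimal pure-Python sketch of what a top-k semantic search does: score every corpus vector against the query by cosine similarity (the default scoring function of `semantic_search`) and keep the best few. The helper names `cosine_similarity` and `top_k_search` and the two-dimensional toy vectors are hypothetical and only illustrate the idea; real embeddings have hundreds of dimensions and the library works on tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_search(query_vec, corpus_vecs, top_k=3):
    """Return the top_k corpus entries as dicts with 'corpus_id' and 'score'."""
    scored = [
        {"corpus_id": idx, "score": cosine_similarity(query_vec, vec)}
        for idx, vec in enumerate(corpus_vecs)
    ]
    # Highest similarity first
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return scored[:top_k]


# Toy 2-D "embeddings" (made up for illustration)
toy_corpus_vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
toy_query_vec = [1.0, 0.1]

toy_hits = top_k_search(toy_query_vec, toy_corpus_vecs, top_k=3)
for hit in toy_hits:
    print(hit["corpus_id"], "(Score: {:.4f})".format(hit["score"]))
```

The returned dictionaries mirror the `corpus_id` and `score` keys used in the loop above, so the printing code stays the same.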
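>> 'Python is a high-level programming language developed by Guido van Rossum in 1989. It is concise, readable, and easy to learn, and is widely used in software development, data analysis, artificial intelligence, and other fields. Python has a rich standard library and many third-party libraries that can be used to build all kinds of applications. It supports multiple programming paradigms, including object-oriented, functional, and procedural programming. The syntax of Python is clean and its code is highly readable, which is why it is called an “elegant programming language”.'
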
    from langchain.embeddings import SentenceTransformerEmbeddings
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.vectorstores import Chroma
    from langchain.document_loaders import TextLoader

    # Embed the corpus and store it in the vector database
    embeddings = SentenceTransformerEmbeddings()
    db = Chroma.from_documents(docs, embeddings)

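Note that `docs` is not defined in this excerpt. Given the `TextLoader` and `CharacterTextSplitter` imports, it is presumably a list of text chunks produced by loading a file and splitting it. The following is a minimal pure-Python sketch of the idea behind character-based chunking. The function `split_into_chunks`, its parameters, and the sample text are hypothetical and are not LangChain's implementation.

```python
def split_into_chunks(text, separator="\n\n", chunk_size=200):
    """Greedily merge separator-delimited pieces into chunks of at most
    chunk_size characters. A single piece longer than chunk_size is kept
    whole rather than cut in the middle."""
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces caused by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = (
    "First paragraph.\n\n"
    "Second paragraph.\n\n"
    "Third paragraph, which is a little longer."
)
chunks = split_into_chunks(sample, chunk_size=40)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

Each resulting chunk would then be embedded and stored as its own entry, which is why a similarity search returns passages rather than the whole file.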
First, let's see what a plain similarity search returns:

    query = "What did the president say about Ketanji Brown Jackson"
    docs = db.similarity_search(query)
    print(docs[0].page_content)

>> Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
>> (Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': 'state_of_the_union.txt'}), 1.2032095193862915)
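The second element of the tuple above is a score of about 1.2. Vector stores commonly report a distance here rather than a similarity, in which case a lower number means a closer match. Which metric produced this value is not stated in the excerpt. As a sketch under the assumption of squared Euclidean distance on unit-length embeddings, the identity `||u - v||^2 = 2 - 2 * cos(u, v)` lets you convert such a distance back to a cosine similarity. All helper names below are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


u = normalize([3.0, 4.0])
v = normalize([4.0, 3.0])

dist = squared_l2(u, v)
sim = dot(u, v)
print("squared L2:", dist, "| 2 - 2*cos:", 2 - 2 * sim)

# Under the stated assumption, the score from the output above would map to:
implied_cos = 1.0 - 1.2032095193862915 / 2.0
print("implied cosine similarity: {:.4f}".format(implied_cos))
```

If the store uses a different metric, the conversion does not apply, but the reading direction (smaller distance, closer match) still holds for any distance.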
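Now we bring in a large language model so that it can interpret the question together with the retrieved text and compose a coherent answer.
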
    from langchain.prompts import ChatPromptTemplate

    template = """You are an assistant for question-answering tasks.
    Use the following pieces of retrieved context to answer the question.
    If you don't know the answer, just say that you don't know.
    Use three sentences maximum and keep the answer concise.
    Question: {question}
    Context: {context}
    Answer:
    """
    prompt = ChatPromptTemplate.from_template(template)
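To see what the model actually receives, the sketch below fills the same template with plain `str.format`, which uses the same `{question}` and `{context}` placeholder syntax. It is only an illustration of the substitution step, not how `ChatPromptTemplate` is implemented. The helper `format_context` and the variable `retrieved_chunks` are hypothetical; the chunk text is taken from the search result shown earlier.

```python
template = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""


def format_context(chunks):
    """Join retrieved chunks into one context string (hypothetical helper)."""
    return "\n\n".join(chunks)


# Stand-in for the passages a retriever would return
retrieved_chunks = [
    "And I did that 4 days ago, when I nominated Circuit Court of Appeals "
    "Judge Ketanji Brown Jackson.",
    "One of our nation’s top legal minds, who will continue Justice "
    "Breyer’s legacy of excellence.",
]

filled_prompt = template.format(
    question="What did the president say about Ketanji Brown Jackson",
    context=format_context(retrieved_chunks),
)
print(filled_prompt)
```

Because the prompt ends with `Answer:`, the model is nudged to continue with the answer itself, grounded in whatever text was placed under `Context:`.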