文章主题:数据集, ChatGPT, Embedbase API
在这个例子中,我们将要求GPT在用户提出问题时挑选合适的数据集进行搜索,然后回答用户的问题。
⚠️,否则你可以通过使用Embedbase Cloud而不是自己运行它来简化这个例子。
如果是这样,你可以跳到种子数据集部分。
安装
在虚拟环境中安装所需的依赖项:
virtualenv env
source env/bin/activate
pip install embedbase pgvector psycopg2 openai
启动Postgres作为一个Embedbase数据库
为Embedbase数据库运行一个Postgres实例。
docker run -d –name pgvector -p 8080:8080 -p 5432:5432 \
-e POSTGRES_DB=embedbase -e POSTGRES_PASSWORD=localdb \
-v data:/var/lib/postgresql/data ankane/pgvector
启动嵌入基地
创建一个新文件main.py,代码如下:
import os
from embedbase import get_app
from embedbase.database.postgres_db import Postgres
from embedbase.embedding.openai import OpenAI
import uvicorn
OPENAI_API_KEY = os.environ.get(“OPENAI_API_KEY”)
app = (
get_app()
.use_embedder(OpenAI(OPENAI_API_KEY))
.use_db(Postgres())
.run()
)
if __name__ == “__main__”:
uvicorn.run(“main:app”, reload=True)
用以下命令启动Embedbase应用程序:
python3 main.py
种子数据集
我们需要在Embedbase中添加一些数据来询问ChatGPT。
import json
import requests
import fire
# Set the Embedbase API URL
EMBEDBASE_API_URL = “http://localhost:8000”
# if using embedbase cloud, add your api key to the headers
# EMBEDBASE_API_KEY = “<your embedbase api key>”
def seed_dataset():
animals = {
“lion”: {“weight”: 190, “height”: 1.2, “speed”: 80},
“elephant”: {“weight”: 5000, “height”: 3.2, “speed”: 40},
“giraffe”: {“weight”: 800, “height”: 5.5, “speed”: 60},
“zebra”: {“weight”: 350, “height”: 1.5, “speed”: 60},
“rhinoceros”: {“weight”: 2300, “height”: 1.8, “speed”: 45},
“crocodile”: {“weight”: 1000, “height”: 4.5, “speed”: 20},
“hippopotamus”: {“weight”: 1500, “height”: 1.5, “speed”: 30},
“cheetah”: {“weight”: 60, “height”: 0.8, “speed”: 110},
“kangaroo”: {“weight”: 80, “height”: 1.5, “speed”: 56},
“penguin”: {“weight”: 30, “height”: 1.1, “speed”: 10},
}
cars = [
{“make”: “Toyota”, “model”: “Camry”, “year”: 2022},
{“make”: “Honda”, “model”: “Civic”, “year”: 2021},
{“make”: “Ford”, “model”: “F-150”, “year”: 2023},
{“make”: “Tesla”, “model”: “Model S”, “year”: 2022},
{“make”: “Chevrolet”, “model”: “Corvette”, “year”: 2021},
{“make”: “Jeep”, “model”: “Wrangler”, “year”: 2022},
{“make”: “BMW”, “model”: “X5”, “year”: 2023},
{“make”: “Mercedes-Benz”, “model”: “S-Class”, “year”: 2022},
{“make”: “Audi”, “model”: “A4”, “year”: 2021},
{“make”: “Lamborghini”, “model”: “Aventador”, “year”: 2022},
]
# clear the dataset just in case it already exists
requests.get(f”{EMBEDBASE_API_URL}/v1/animals/clear”,
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)
requests.get(f”{EMBEDBASE_API_URL}/v1/cars/clear”,
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)
# seed the animals dataset
requests.post(
f”{EMBEDBASE_API_URL}/v1/animals”,
json={“documents”: [{“data”: json.dumps(animal)} for animal in animals]},
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)
# seed the cars dataset
requests.post(
f”{EMBEDBASE_API_URL}/v1/cars”,
json={“documents”: [{“data”: json.dumps(car)} for car in cars]},
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)
if __name__ == “__main__”:
fire.Fire({
“seed”: seed_dataset,
})python3 ask.py seed
搜索
我们现在将创建我们应用程序的主要逻辑。我们将要求GPT在用户提出问题时挑选正确的数据集进行搜索。
这个过程将按原样进行:
1. 用户提出一个问题
2. GPT将查询`/datasets`以获得数据集的列表
3. GPT将用所选择的数据集和问题查询`/search`。
4. GPT将返回结果
import re
import os
import json
import requests
import openai
import fire
# Set the Embedbase API URL
EMBEDBASE_API_URL = “http://localhost:8000”
def get_datasets():
response = requests.get(
f”{EMBEDBASE_API_URL}/v1/datasets”,
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)
return [e[“dataset_id”] for e in response.json()[“datasets”]]
def search_dataset(dataset_id, query):
payload = {“query”: query, “top_k”: 3}
response = requests.post(
f”{EMBEDBASE_API_URL}/v1/{dataset_id}/search”, json=payload,
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)
return [e[“data”] for e in response.json()[“similarities”]]
上述代码将被用来查询Embedbase API。
# …
def ask_question(question, openai_model: str = “gpt-3.5-turbo”):
datasets = get_datasets()
# Prompt for GPT
prompt = f”Given the following datasets:\n”
for dataset in datasets:
prompt += f”- {dataset}\n”
prompt += f”\nChoose the best dataset to search and answer the following question:\n{question}“
# Call GPT
response = openai.ChatCompletion.create(
model=openai_model,
messages=[
{
“role”: “system”,
“content”: “You are a helpful assistant that select a dataset to search for a given question.”
“You always say ONLY the dataset name, nothing else. You are given a list of datasets and a question. “
“For example, if the list of datasets is – plants\n- animals\n- cars\n- fruits\n- vegetables\n”
“and the question is: What is the fastest animal?, you would say: [animals]”,
},
{“role”: “user”, “content”: prompt},
],
)
chosen_dataset = response.choices[0].message.content.strip()
print(f”GPT chose the dataset: {chosen_dataset}“)
# extract the dataset name from the output of GPT
# eg [animals] -> animals
chosen_dataset = re.sub(r”\[|\]”, “”, chosen_dataset)
search_results = search_dataset(chosen_dataset, question)
# Call GPT again to answer the question based on the search results
prompt = (
f”Based on the following search results, answer the question: {question}\n”
)
for result in search_results:
prompt += f”- {result}\n”
response = openai.ChatCompletion.create(
model=openai_model,
messages=[
{
“role”: “system”,
“content”: “You are a helpful assistant that answers questions based on the provided search results.”,
},
{“role”: “user”, “content”: prompt},
],
)
answer = response.choices[0].message.content.strip()
return answer
上述代码将被用来调用GPT来提问。现在添加一些小的逻辑,当用户提出问题时调用上述函数。
def main(openai_key: str = None, openai_model: str = “gpt-3.5-turbo”):
openai.api_key = openai_key or os.environ.get(“OPENAI_API_KEY”)
question = input(“Ask a question: “)
answer = ask_question(question, openai_model)
print(f”Answer: {answer}“)
if __name__ == “__main__”:
fire.Fire({
“ask”: main,
“seed”: seed_dataset,
})
现在你可以运行应用程序并提出问题。
python3 ask.py ask –openai_key <your-openai-key>
# feel free to add “–openai_model gpt-4” if you have access to it
AI时代,掌握AI大模型第一手资讯!AI时代不落人后!
免费ChatGPT问答,办公、写作、生活好得力助手!
扫码右边公众号,驾驭AI生产力!