🔥如何让ChatGPT搜索Embedbase数据？一键召唤超能API！🌍

学会提问 2年前 (2023) lida

83 0 0

文章主题：数据集, ChatGPT, Embedbase API

666ChatGPT办公新姿势，助力做AI时代先行者！

在这个例子中，我们将要求GPT在用户提出问题时挑选合适的数据集进行搜索，然后回答用户的问题。

⚠️，否则你可以通过使用Embedbase Cloud而不是自己运行它来简化这个例子。

如果是这样，你可以跳到种子数据集部分。

安装

在虚拟环境中安装所需的依赖项：

virtualenv env
source env/bin/activate
pip install embedbase pgvector psycopg2 openai

启动Postgres作为一个Embedbase数据库

为Embedbase数据库运行一个Postgres实例。

docker run -d –name pgvector -p 8080:8080 -p 5432:5432 \
-e POSTGRES_DB=embedbase -e POSTGRES_PASSWORD=localdb \
-v data:/var/lib/postgresql/data ankane/pgvector

启动嵌入基地

创建一个新文件main.py，代码如下：

import os
from embedbase import get_app
from embedbase.database.postgres_db import Postgres
from embedbase.embedding.openai import OpenAI
import uvicorn
OPENAI_API_KEY = os.environ.get(“OPENAI_API_KEY”)
app = (
get_app()
.use_embedder(OpenAI(OPENAI_API_KEY))
.use_db(Postgres())
.run()
)
if __name__ == “__main__”:
uvicorn.run(“main:app”, reload=True)

用以下命令启动Embedbase应用程序：

python3 main.py

种子数据集

我们需要在Embedbase中添加一些数据来询问ChatGPT。

import json
import requests
import fire

# Set the Embedbase API URL
EMBEDBASE_API_URL = “http://localhost:8000”
# if using embedbase cloud, add your api key to the headers
# EMBEDBASE_API_KEY = “<your embedbase api key>”
def seed_dataset():
animals = {
“lion”: {“weight”: 190, “height”: 1.2, “speed”: 80},
“elephant”: {“weight”: 5000, “height”: 3.2, “speed”: 40},
“giraffe”: {“weight”: 800, “height”: 5.5, “speed”: 60},
“zebra”: {“weight”: 350, “height”: 1.5, “speed”: 60},
“rhinoceros”: {“weight”: 2300, “height”: 1.8, “speed”: 45},
“crocodile”: {“weight”: 1000, “height”: 4.5, “speed”: 20},
“hippopotamus”: {“weight”: 1500, “height”: 1.5, “speed”: 30},
“cheetah”: {“weight”: 60, “height”: 0.8, “speed”: 110},
“kangaroo”: {“weight”: 80, “height”: 1.5, “speed”: 56},
“penguin”: {“weight”: 30, “height”: 1.1, “speed”: 10},
}
cars = [
{“make”: “Toyota”, “model”: “Camry”, “year”: 2022},
{“make”: “Honda”, “model”: “Civic”, “year”: 2021},
{“make”: “Ford”, “model”: “F-150”, “year”: 2023},
{“make”: “Tesla”, “model”: “Model S”, “year”: 2022},
{“make”: “Chevrolet”, “model”: “Corvette”, “year”: 2021},
{“make”: “Jeep”, “model”: “Wrangler”, “year”: 2022},
{“make”: “BMW”, “model”: “X5”, “year”: 2023},
{“make”: “Mercedes-Benz”, “model”: “S-Class”, “year”: 2022},
{“make”: “Audi”, “model”: “A4”, “year”: 2021},
{“make”: “Lamborghini”, “model”: “Aventador”, “year”: 2022},
]

# clear the dataset just in case it already exists
requests.get(f”{EMBEDBASE_API_URL}/v1/animals/clear”,
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)
requests.get(f”{EMBEDBASE_API_URL}/v1/cars/clear”,
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)

# seed the animals dataset
requests.post(
f”{EMBEDBASE_API_URL}/v1/animals”,
json={“documents”: [{“data”: json.dumps(animal)} for animal in animals]},
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)

# seed the cars dataset
requests.post(
f”{EMBEDBASE_API_URL}/v1/cars”,
json={“documents”: [{“data”: json.dumps(car)} for car in cars]},
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)

if __name__ == “__main__”:
fire.Fire({
“seed”: seed_dataset,
})python3 ask.py seed

搜索

我们现在将创建我们应用程序的主要逻辑。我们将要求GPT在用户提出问题时挑选正确的数据集进行搜索。

这个过程将按原样进行：

1. 用户提出一个问题

2. GPT将查询`/datasets`以获得数据集的列表

3. GPT将用所选择的数据集和问题查询`/search`。

4. GPT将返回结果

import re
import os
import json
import requests
import openai
import fire

# Set the Embedbase API URL
EMBEDBASE_API_URL = “http://localhost:8000”
def get_datasets():
response = requests.get(
f”{EMBEDBASE_API_URL}/v1/datasets”,
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)
return [e[“dataset_id”] for e in response.json()[“datasets”]]

def search_dataset(dataset_id, query):
payload = {“query”: query, “top_k”: 3}
response = requests.post(
f”{EMBEDBASE_API_URL}/v1/{dataset_id}/search”, json=payload,
# if using embedbase cloud, add your api key to the headers
# headers={
# “Authorization”: “Bearer ” + EMBEDBASE_API_KEY,
# },
)
return [e[“data”] for e in response.json()[“similarities”]]

上述代码将被用来查询Embedbase API。

# …
def ask_question(question, openai_model: str = “gpt-3.5-turbo”):
datasets = get_datasets()

# Prompt for GPT
prompt = f”Given the following datasets:\n”
for dataset in datasets:
prompt += f”- {dataset}\n”
prompt += f”\nChoose the best dataset to search and answer the following question:\n{question}“
# Call GPT
response = openai.ChatCompletion.create(
model=openai_model,
messages=[
{
“role”: “system”,
“content”: “You are a helpful assistant that select a dataset to search for a given question.”
“You always say ONLY the dataset name, nothing else. You are given a list of datasets and a question. “
“For example, if the list of datasets is – plants\n- animals\n- cars\n- fruits\n- vegetables\n”
“and the question is: What is the fastest animal?, you would say: [animals]”,
},
{“role”: “user”, “content”: prompt},
],
)

chosen_dataset = response.choices[0].message.content.strip()
print(f”GPT chose the dataset: {chosen_dataset}“)

# extract the dataset name from the output of GPT
# eg [animals] -> animals
chosen_dataset = re.sub(r”\[|\]”, “”, chosen_dataset)
search_results = search_dataset(chosen_dataset, question)

# Call GPT again to answer the question based on the search results
prompt = (
f”Based on the following search results, answer the question: {question}\n”
)
for result in search_results:
prompt += f”- {result}\n”

response = openai.ChatCompletion.create(
model=openai_model,
messages=[
{
“role”: “system”,
“content”: “You are a helpful assistant that answers questions based on the provided search results.”,
},
{“role”: “user”, “content”: prompt},
],
)

answer = response.choices[0].message.content.strip()

return answer

上述代码将被用来调用GPT来提问。现在添加一些小的逻辑，当用户提出问题时调用上述函数。

def main(openai_key: str = None, openai_model: str = “gpt-3.5-turbo”):
openai.api_key = openai_key or os.environ.get(“OPENAI_API_KEY”)
question = input(“Ask a question: “)
answer = ask_question(question, openai_model)
print(f”Answer: {answer}“)
if __name__ == “__main__”:
fire.Fire({
“ask”: main,
“seed”: seed_dataset,
})