mirror of https://github.com/fauxpilot/fauxpilot.git synced 2025-03-12 04:36:10 -07:00


Brendan Dolan-Gavitt 02f7887f17 Support newer upstream Triton

The main change is that the model config format has changed. To
deal with this we have a new script in the converter that will
upgrade a model to the new version.

Aside from that, we also no longer need to maintain our own fork
of Triton since they have fixed the bug with GPT-J models. This
should make it a lot easier to stay synced with upstream (although
we still have to build our own container since there doesn't seem
to be a prebuilt Triton+FT container hosted by NVIDIA).

Newer Triton should let us use some nice features:

- Support for more models, like GPT-NeoX
- Streaming token support (this still needs to be implemented in
  the proxy though)
- Dynamic batching

Still TODO:

- Proxy support for streaming tokens
- Add stuff to setup.sh and launch.sh to detect if a model upgrade
  is needed and do it automatically.

2023-02-13 16:30:28 -05:00

- .github: last commit "GitHub: added PULL_REQEUEST_TEMPLATE for PR maintenance" (2023-01-02 15:25:35 +09:00)
- converter: last commit "Support newer upstream Triton" (2023-02-13 16:30:28 -05:00)
- copilot_proxy: last commit "Support newer upstream Triton" (2023-02-13 16:30:28 -05:00)
- documentation: last commit "Add GitLab VS Code extension" (2022-10-19 14:37:06 +02:00)
- img: last commit "doc: added logo image of FauxPilot" (2022-11-04 14:39:00 +09:00)
- python_backend: last commit "Fix segfault issue" (2022-11-19 18:32:50 +00:00)
- tests/python_backend: last commit "Some minor ergonomic changes for python backend" (2023-01-02 18:54:51 +05:30)
- .dockerignore: last commit "Update .dockerignore" (2022-11-26 22:14:50 +08:00)
- .editorconfig: last commit "Create .editorconfig" (2022-10-23 05:03:33 +08:00)
- .gitignore: last commit "Ignore huggingface cache" (2022-11-26 22:14:02 +08:00)
- docker-compose.yaml: last commit "Fix #119" (2022-11-25 00:19:22 +08:00)
- Dockerfile: last commit "Support newer upstream Triton" (2023-02-13 16:30:28 -05:00)
- launch.sh: last commit "Small fixes to launch.sh option parsing" (2022-12-20 17:28:28 -05:00)
- LICENSE: last commit "add missing license" (2022-08-03 09:42:32 -04:00)
- README.md: last commit "doc: added logo image of FauxPilot" (2022-11-04 14:39:00 +09:00)
- setup.sh: last commit "fix: improved the docker compose statement to handle various docker env" (2023-01-03 11:28:34 +09:00)
- shutdown.sh: last commit "Now that launch.sh runs in the background, add shutdown.sh to stop the server" (2022-10-19 17:37:59 -04:00)


FauxPilot

This is an attempt to build a locally hosted version of GitHub Copilot. It uses the Salesforce CodeGen models inside NVIDIA's Triton Inference Server with the FasterTransformer backend.

Prerequisites

You'll need:

- Docker
- docker compose >= 1.28
- An NVIDIA GPU with Compute Capability >= 6.0 and enough VRAM to run the model you want
- nvidia-docker
- curl and zstd for downloading and unpacking the models
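A quick pre-flight check for the tool prerequisites can be sketched with the Python standard library. This is a sketch under assumptions, not part of FauxPilot: the helper names are hypothetical, nvidia-smi (which ships with the NVIDIA driver) stands in for "a working GPU stack", and the script does not check Compute Capability, VRAM, or the nvidia-docker runtime. It accepts either the docker compose v2 plugin or the older standalone docker-compose binary.

```python
import re
import shutil
import subprocess

# Tools from the prerequisites list; nvidia-smi is an assumed proxy for the GPU driver.
REQUIRED_TOOLS = ["docker", "curl", "zstd", "nvidia-smi"]
MIN_COMPOSE_VERSION = (1, 28)


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def parse_version(text):
    """Extract the first dotted version in text, e.g. 'v2.15.1' -> (2, 15, 1)."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def compose_version():
    """Return the installed compose version, trying the v2 plugin first, else None."""
    for cmd in (["docker", "compose", "version"], ["docker-compose", "version"]):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, check=True, timeout=10
            )
            return parse_version(result.stdout)
        except (OSError, subprocess.CalledProcessError,
                subprocess.TimeoutExpired, ValueError):
            continue  # command missing, failed, hung, or printed no version
    return None


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing tools:", ", ".join(missing))
    version = compose_version()
    if version is None or version < MIN_COMPOSE_VERSION:
        print("docker compose >= 1.28 is required; found:", version)
```

Versions are compared as integer tuples, so (1, 27, 4) correctly sorts below (1, 28) and any 2.x plugin release passes.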

Note that the VRAM requirements listed by setup.sh are totals across all GPUs: if you have multiple GPUs, you can split the model across them. For example, with two NVIDIA RTX 3080 GPUs you should be able to run the 6B model by putting half of it on each GPU.
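The arithmetic behind splitting a model can be sketched as follows, assuming the weights are divided evenly across GPUs and ignoring per-GPU overhead such as the CUDA context and activation buffers, so leave some headroom in practice. The function names are hypothetical, and the 13 GB total in the usage note is a placeholder: use the figure setup.sh prints for the model you choose.

```python
import math


def share_per_gpu_gb(total_gb: float, num_gpus: int) -> float:
    """VRAM each GPU must hold if the model is split evenly across num_gpus."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    return total_gb / num_gpus


def fits(total_gb: float, gpu_vram_gb: list) -> bool:
    """True if an even split fits on every card; the smallest card is the limit."""
    return share_per_gpu_gb(total_gb, len(gpu_vram_gb)) <= min(gpu_vram_gb)


def min_gpus(total_gb: float, vram_per_gpu_gb: float) -> int:
    """Fewest identical GPUs whose even shares each fit in vram_per_gpu_gb."""
    return max(1, math.ceil(total_gb / vram_per_gpu_gb))
```

For example, `fits(13, [10, 10])` checks two 10 GB cards (the original RTX 3080 capacity) against a 13 GB total: each card needs 6.5 GB, so it returns True, while `fits(13, [10])` returns False and `min_gpus(13, 10)` returns 2.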

Support and Warranty

lmao

Okay, fine, we now have some minimal information on the wiki and a discussion forum where you can ask questions. Still no formal support or warranty though!

Setup

This section describes how to install a FauxPilot server and clients.

Setting up a FauxPilot Server

Run the setup script (setup.sh) to choose a model. It downloads the model from the moyix account on Hugging Face in GPT-J format and then converts it for use with FasterTransformer.

Please refer to the guide "How to set-up a FauxPilot server".

Client configuration for FauxPilot

There are several ways to connect a client to a FauxPilot server, for example through the OpenAI API, the Copilot plugin, or the REST API.

Please refer to the guide "How to set-up a client".
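As an illustration of the OpenAI-style API, here is a minimal client that uses only the Python standard library. It assumes the proxy listens on localhost:5000 and serves completions at /v1/engines/codegen/completions; adjust BASE_URL and ENGINE if your setup differs. The helper names are hypothetical, and the response handling assumes the OpenAI completion format (a "choices" list whose entries carry a "text" field).

```python
import json
import urllib.request

# Assumed defaults; change these to match your FauxPilot deployment.
BASE_URL = "http://localhost:5000"
ENGINE = "codegen"


def build_completion_request(prompt, max_tokens=100, temperature=0.1,
                             stop=None, base_url=BASE_URL, engine=ENGINE):
    """Build (but do not send) a POST request for an OpenAI-style completion."""
    payload = {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    if stop is not None:
        payload["stop"] = stop  # strings at which generation should halt
    return urllib.request.Request(
        f"{base_url}/v1/engines/{engine}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of each choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return [choice["text"] for choice in body["choices"]]
```

With a server running, `complete("def hello_world():", stop=["\n\n"])` returns a list of completion strings. Separating request construction from sending makes it easy to inspect or log the exact payload before it reaches the proxy.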

Terminology

- API: Application Programming Interface
- CC: Compute Capability
- CUDA: Compute Unified Device Architecture
- FT: FasterTransformer
- JSON: JavaScript Object Notation
- gRPC: A Remote Procedure Call framework originally developed by Google
- GPT-J: A transformer model trained using Ben Wang's Mesh Transformer JAX
- REST: Representational State Transfer