Mirror of https://github.com/fauxpilot/fauxpilot.git (synced 2025-03-12 04:36:10 -07:00)
The main change is that the model config format has changed. To deal with this we have a new script in the converter that will upgrade a model to the new version. Aside from that, we also no longer need to maintain our own fork of Triton since they have fixed the bug with GPT-J models. This should make it a lot easier to stay synced with upstream (although we still have to build our own container since there doesn't seem to be a prebuilt Triton+FT container hosted by NVIDIA).

Newer Triton should let us use some nice features:

- Support for more models, like GPT-NeoX
- Streaming token support (this still needs to be implemented in the proxy though)
- Dynamic batching

Still TODO:

- Proxy support for streaming tokens
- Add stuff to setup.sh and launch.sh to detect if a model upgrade is needed and do it automatically (a rough sketch follows this list).
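The last TODO item could end up as a small hook in setup.sh/launch.sh along these lines. This is only a hedged sketch: the version-marker file (.converted_version), the expected version number, and the upgrade script name (converter/upgrade_model.py) are placeholders, not names confirmed by this change.

# Hypothetical upgrade check for launch.sh; the marker file, version number,
# and upgrade script name below are all placeholders, not the repo's actual names.
MODEL_DIR="${1:-models/codegen-6B-multi-1gpu}"
MARKER="$MODEL_DIR/.converted_version"
WANTED_VERSION=2

# Treat a missing marker as version 0 (i.e. a model converted before this change).
CURRENT_VERSION=0
if [ -f "$MARKER" ]; then
    CURRENT_VERSION=$(cat "$MARKER")
fi

# Run the converter's upgrade script once, then record the new version.
if [ "$CURRENT_VERSION" -lt "$WANTED_VERSION" ]; then
    echo "Model at $MODEL_DIR uses the old config format; upgrading..."
    python3 converter/upgrade_model.py --model-dir "$MODEL_DIR"
    echo "$WANTED_VERSION" > "$MARKER"
fi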
Dockerfile (6 lines, 279 B):
FROM moyix/triton_with_ft:23.01

# Install dependencies: torch
RUN python3 -m pip install --disable-pip-version-check -U torch --extra-index-url https://download.pytorch.org/whl/cu116

RUN python3 -m pip install --disable-pip-version-check -U transformers bitsandbytes accelerate
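For reference, this Dockerfile just layers a few extra Python packages on top of the self-built moyix/triton_with_ft:23.01 base image mentioned above, and it can be rebuilt with a standard docker build; the output tag here is only an example:

docker build -t fauxpilot/triton_with_ft:23.01 .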