The main change is that the model config format has changed. To deal with this, we have a new script in the converter that will upgrade a model to the new version. Aside from that, we also no longer need to maintain our own fork of Triton, since they have fixed the bug with GPT-J models. This should make it a lot easier to stay synced with upstream (although we still have to build our own container, since there doesn't seem to be a prebuilt Triton+FT container hosted by NVIDIA).

Newer Triton should let us use some nice features:

- Support for more models, like GPT-NeoX
- Streaming token support (this still needs to be implemented in the proxy; see the sketch below)
- Dynamic batching

Still TODO:

- Proxy support for streaming tokens
- Add logic to setup.sh and launch.sh to detect whether a model upgrade is needed and perform it automatically
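As a rough illustration of the streaming-token TODO, here is a minimal sketch of how a proxy could stream tokens to a client via server-sent events. The framework (FastAPI), the route, and the canned token source are assumptions for illustration, not the project's actual code.

```python
# A minimal sketch of streaming tokens from a proxy, assuming an
# OpenAI-style completions route. FastAPI and the fake token source
# are illustrative assumptions, not the project's implementation.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def stream_tokens(prompt: str):
    # Placeholder generator: a real proxy would forward tokens from
    # Triton's streaming inference interface as they arrive.
    for token in ["def", " add", "(a, b):", "\n", "    return a + b"]:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"

@app.post("/v1/completions")
def completions(prompt: str = ""):
    return StreamingResponse(stream_tokens(prompt),
                             media_type="text/event-stream")
```

In the real proxy, the generator would consume Triton's gRPC streaming interface and forward each token to the client as it is produced.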
This section describes the scripts and supporting files used for converting deep learning models:
Dockerfile
: A Dockerfile used to build an image based on Ubuntu 20.04 that includes the Transformers library.

download_and_convert_model.sh
: A shell script that converts the codegen-6B-multi model with the provided number of GPUs.

codegen_gptj_convert.py
: A Python script for converting Salesforce CodeGen models to GPT-J (e.g., Salesforce/codegen-350M-multi); see the loading sketch after this list.

huggingface_gptj_convert.py
: A Python script for converting the HF model to the GPT-J format (e.g., a GPTJForCausalLM model).

triton_config_gen.py
: A Python script that creates a config and weight file for running a CodeGen model with Triton; see the config-generation sketch after this list.

config_template.pbtxt
: A template file defining the config file's data format.
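To make the CodeGen-to-GPT-J conversion concrete: CodeGen's architecture closely matches GPT-J's, so conversion largely amounts to renaming and reshaping weight tensors. The sketch below loads a hypothetical converted checkpoint with the stock GPT-J class; the local path is an assumption, not the converter's documented output.

```python
# A hedged sketch: loads a hypothetical converted checkpoint with the
# standard HuggingFace GPT-J class. The "./codegen-350M-multi-gptj" path
# is an assumption; the converter's actual output location may differ.
from transformers import AutoTokenizer, GPTJForCausalLM

model = GPTJForCausalLM.from_pretrained("./codegen-350M-multi-gptj")
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-multi")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```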
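In the same spirit, the sketch below shows the general pattern behind generating a Triton config from config_template.pbtxt: fill placeholders with per-model values and write the result into the model repository. The placeholder names, values, and output path are assumptions, and the actual script's templating mechanism may differ.

```python
# A minimal sketch of template-driven config generation, not the actual
# script. Placeholder names (model_name, tensor_para_size) and the output
# path are assumptions for illustration.
from string import Template

with open("config_template.pbtxt") as f:
    template = Template(f.read())

config = template.substitute(
    model_name="codegen-6B-multi",  # hypothetical placeholder
    tensor_para_size="2",           # e.g., one model shard per GPU
)

with open("model/fastertransformer/config.pbtxt", "w") as f:
    f.write(config)
```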