How to create a local expert LLM
If you want a subject matter expert tailored to your use case, training your own local LLM is your best bet. Follow the steps below to train an LLM that runs locally on your device.
LLM Dataset Generation
Before you generate your dataset, gather all the documents you want your model to be built from. These documents must be in a single folder and in one of the following formats: PDF, text, or docx.
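To confirm the folder is ready before you point the element at it, a short script along these lines can list any files in unsupported formats. This is a minimal sketch that runs outside the Canvas, not part of the LLM Dataset Generator itself. The function name and the extension list (`.pdf`, `.txt`, `.docx`) are assumptions based on the formats named above.

```python
from pathlib import Path

# Assumed file extensions for the supported formats (PDF, text, docx).
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx"}


def check_references_folder(folder):
    """Split the files directly inside `folder` into supported and unsupported names."""
    supported, unsupported = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # subfolders are skipped in this sketch
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path.name)
        else:
            unsupported.append(path.name)
    return supported, unsupported
```

Call `check_references_folder` with the path to your references folder, then move or convert anything it returns in the second list before generating the dataset.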
- Start by creating a new Canvas. Then open the Elements Drawer and drag the LLM Dataset Generator element onto the Canvas.
- Open the LLM Dataset Generator Element settings and adjust the following settings:
- Topic: This can be anything you would like.
- References folder path: Using the “Select Directory” button, choose the folder where your documents are located.
- Output folder path: Using the “Select Directory” button, choose the folder where you would like to save the output of the dataset generation.
- Dataset size: Enter the number of topics to generate for your dataset.
Note: We recommend starting with 5 while you test and get familiar with dataset generation. This generates a list of five topics and is quicker to train on, but it will not produce as accurate a model as a larger dataset size. The higher the dataset size, the more accurate your dataset and trained model will be, but the longer the dataset will take to generate. Generating a large dataset can take several hours, so be patient.
- Next, enter your GPT, Claude, and Gemini API keys.
- You can now hit run. Dependencies are installed the first time this flow is run, so the first run may take a while.
LLM Model Training
Now that you have generated your LLM Dataset, you can train your LLM Model.
- Start by creating a new Canvas. Drag the LLM Trainer Element onto the Canvas.
- Open the LLM Trainer Element settings and make the following adjustments:
- Dataset Folder Path: Using the “Select Directory” button, choose the folder where you saved your LLM dataset during LLM Dataset Generation.
- Artifact Save Path: Using the “Select Directory” button, choose the folder where you would like to save your trained adapter.
- Base Model Assets Path: Using the “Select Directory” button, choose the folder where you would like to save your base model.
- Evaluator API Key: Add a Groq, OpenAI, Claude, or Gemini API key to enable the Faithfulness and Relevancy benchmarks in your training metrics. If you need a free API key, you can generate one for Groq here.
- Batch Size: 4 is recommended for testing.
- Leave all other settings as the default.
- You can now hit run. This process may take a while, so be patient.
LLM Model Inference
You have generated your LLM Dataset and trained your LLM Model. Now you can run inference and interact with your trained expert.
- Start by creating a new Canvas and drag the LLM Chat element onto the Canvas.
- Open the LLM Chat Element settings and make the following adjustments:
- Max token: 256 is recommended for testing.
- Model Storage Path: Using the “Select Directory” button, choose the folder where you saved your base model during LLM Model training.
- Model Adapter Folder Path: Using the “Select Directory” button, choose the same folder where you saved your trained adapter during LLM Model training.
- Leave all other settings as the default.
- Now drag the Prompt API and Response API elements onto the Canvas.
- Connect the Prompt API to the LLM Chat input. Then connect the LLM Chat output to the Response API. The flow on your Canvas should look like this: Prompt API, then LLM Chat, then Response API.
- You can now hit run.
Note: Dependencies will be installed the first time this flow is run and may take a while.
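If the chat flow fails to start, one thing worth checking is that the two folders selected in the LLM Chat settings actually hold the output of the training step. The sketch below is a standalone check, not a feature of the elements. The function name and the labels are hypothetical, and it only tests that each folder exists and is not empty; it does not validate the model files themselves.

```python
from pathlib import Path


def missing_or_empty(folders):
    """Return the labels of folders that do not exist or contain nothing."""
    problems = []
    for label, folder in folders.items():
        path = Path(folder)
        # A usable folder must exist and hold at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(label)
    return problems
```

Pass a dictionary that maps a label of your choice (for example, the setting name) to each folder path. An empty list back means both folders exist and contain files.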