# llamacpp Server

## Quick Start Guide
Note: `AI_MODEL` should stay `default` unless there is a folder in `prompts/` specific to the model that you're using. You can also create one and add your own prompts.
### Run the llama.cpp server
Follow the instructions for setting up the llama.cpp server from its repository.
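Once the server is running, you can confirm it is reachable before configuring your agent. This is a minimal sketch, assuming an OpenAI-compatible `/v1/completions` endpoint on `http://localhost:8000`; the exact route and port depend on which llama.cpp server build you start, so adjust them to match your setup.

```python
# Minimal sketch: send a test completion request to a running llama.cpp server.
# Assumption: an OpenAI-compatible /v1/completions endpoint on http://localhost:8000.
import requests

AI_PROVIDER_URI = "http://localhost:8000"  # same value you will use in the agent settings

response = requests.post(
    f"{AI_PROVIDER_URI}/v1/completions",
    json={
        "prompt": "Hello, world.",
        "max_tokens": 16,    # small value for a quick check; the agent default for MAX_TOKENS is 2000
        "temperature": 0.7,  # matches the AI_TEMPERATURE default
        "stop": ["</s>"],    # matches the STOP_SEQUENCE default
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```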
### Update your agent settings
- Make sure your model is placed in the `models/` folder.
- Create a new agent.
- Set `AI_PROVIDER` to `llamacpp`.
- Set `AI_PROVIDER_URI` to the URI of your llama.cpp server. For example, if you're running the server locally, set this to `http://localhost:8000`.
- Set the following parameters as needed (an example settings sketch follows this list):
  - `MAX_TOKENS`: The maximum number of tokens that the model can generate. Default value is 2000.
  - `AI_TEMPERATURE`: Controls the randomness of the model's outputs. A higher value produces more random outputs and a lower value produces more deterministic outputs. Default value is 0.7.
  - `AI_MODEL`: The type of AI model to use. Default value is `default`.
  - `STOP_SEQUENCE`: The sequence at which the model will stop generating more tokens. Default value is `</s>`.
  - `PROMPT_PREFIX`: This will prefix any prompt sent to the model so that it can generate outputs properly.
  - `PROMPT_SUFFIX`: This will suffix any prompt sent to the model so that it can generate outputs properly.
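The sketch below pulls the parameters above into one place and shows how `PROMPT_PREFIX` and `PROMPT_SUFFIX` wrap a prompt before it is sent. The setting names and default values come from this guide; representing them as a plain Python dict (and the `build_prompt` helper) is only an illustration, since how your agent framework actually stores and applies them may differ.

```python
# Example agent settings using the parameters described above.
# Assumption: a plain dict stands in for however your agent stores its settings.
agent_settings = {
    "AI_PROVIDER": "llamacpp",
    "AI_PROVIDER_URI": "http://localhost:8000",
    "MAX_TOKENS": 2000,
    "AI_TEMPERATURE": 0.7,
    "AI_MODEL": "default",
    "STOP_SEQUENCE": "</s>",
    "PROMPT_PREFIX": "",  # e.g. the opening part of your model's instruction template
    "PROMPT_SUFFIX": "",  # e.g. the closing part / assistant cue of that template
}

def build_prompt(user_prompt: str, settings: dict) -> str:
    """Illustrates how PROMPT_PREFIX and PROMPT_SUFFIX wrap a prompt before it is sent."""
    return f"{settings['PROMPT_PREFIX']}{user_prompt}{settings['PROMPT_SUFFIX']}"

print(build_prompt("Summarize the quick start guide.", agent_settings))
```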
There are other configurable settings that match the ones provided by the llama.cpp server.