
koboldcpp-1.93 (Latest)

@LostRuins released this 07 Jun 03:36

(Release artwork: "those left behind" / "duality")

  • NEW: Added Windows Shell integration. You can now associate .gguf files to open automatically in KoboldCpp (e.g. by double-clicking a .gguf file). If another KoboldCpp instance is already running locally on the same port, it will be replaced. The default handler can be installed/uninstalled from the 'Extras' tab (thanks @henk717)
    • This is handled by the /api/extra/shutdown API, which can only be triggered from localhost (see the sketch after this list).
    • Instances started without the --singleinstance flag are not affected. All of this is automatic when you launch via Windows shell integration.
  • NEW: Added an option to simply unload a model via the admin API; the server will free the memory but continue to run. You can then switch to a different model via the admin panel in Lite.
  • NEW: Added Save and Load States (sessions). This allows you to take a Savestate Snapshot of the current context, and then reload it again later at any time. Available over the admin API, you can trigger it from the admin panel in Lite.
    • Works similarly to 'session files' in llama.cpp, but the snapshot states are stored entirely in memory.
    • Used correctly, it can allow you to swap between multiple different sessions/chats without any reprocessing at all.
    • There are 3 available slots to use (total 4 including the current session).
  • Fixed a regression with flash attention not working for some GPUs in the previous version.
  • Added a text LoRA scale option. Removed the text LoRA base option, as it is no longer used in modern GGUFs; if provided, it will be silently ignored.
  • Function/Tool calling can now use higher temperatures (up to 1.0).
  • Added more Ollama compatibility endpoints.
  • Fixed a few clip skip issues in image generation.
  • Added an adapter flag add_sd_step_limit to limit max image generation step counts.
  • Fixed crash on thread count 0.
  • Matched a few common OpenAI TTS voice IDs.
  • Fixed a ctx bug with embeddings (still does not work with Qwen3 embedding models, but should work with most others).
  • KoboldCpp Colab now uses KoboldCpp's internal downloader instead of downloading the models first externally.
  • Updated Kobold Lite with multiple fixes and improvements:
    • Added support for embeddings models into KoboldAI Lite's TextDB (thanks @esolithe)
    • Added support for saving and loading world info files independently (thanks @esolithe)
    • NEW: Added new "Smart" Image Autogeneration mode. This allows the AI to decide when it should generate images, and create image prompt automatically.
    • Added a new scenario: Replaced defunct aetherroom.club with prompts.forthisfeel.club
    • Added support for importing cards from character-tavern.com
    • Improved Tavern World Info support
    • Added support for welcome messages in corpo mode.
    • Fixed copy to clipboard not working for some browsers.
    • Interactive Storywriter scenario fix: now no longer overwrites your regex settings. However, hiding input text is now off by default.
    • Added a toggle to make a usermod permanent. Use with caution.
    • Markdown fixes; also prevents your username from being overwritten when changing the chat scenario.
  • Merged fixes and improvements from upstream
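
The single-instance replacement above relies on the /api/extra/shutdown endpoint. Below is a minimal sketch of calling it from the same machine (assumptions not spelled out in these notes: the server listens on the default port 5001 and the endpoint accepts a plain POST; adjust for your setup):

    import urllib.request

    # /api/extra/shutdown can only be triggered from localhost, per the notes above.
    # Assumption: default port 5001 and a plain POST with an empty body.
    req = urllib.request.Request(
        "http://localhost:5001/api/extra/shutdown",
        data=b"",
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status, resp.read().decode("utf-8", errors="replace"))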

Important Breaking Changes (File Naming Change Notice):

  • CUDA 12 builds are being upgraded from CUDA 12.1 to CUDA 12.4. This has already happened for Windows builds and will happen soon for Linux builds.
  • If you are using cloud platforms that do not support CUDA 12.4, you can continue using the oldpc builds, which remain on CUDA 11 and AVX1 and will continue to be maintained.
  • For improved clarity and ease of use, many binaries are being RENAMED.
  • Please note the new names below and update your automated scripts to avoid disruption (a download sketch follows this list):
  • Linux:
    • koboldcpp-linux-x64-cuda1210 is now koboldcpp-linux-x64 (Cuda12, AVX2, Newer PCs)
    • koboldcpp-linux-x64-cuda1150 is now koboldcpp-linux-x64-oldpc (Cuda11, AVX1, Older PCs)
    • koboldcpp-linux-x64-nocuda is still koboldcpp-linux-x64-nocuda (No CUDA)
  • Windows:
    • koboldcpp_cu12.exe is now koboldcpp.exe (Cuda12, AVX2, Newer PCs)
    • koboldcpp_oldcpu.exe is now koboldcpp-oldpc.exe (Cuda11, AVX1, Older PCs)
    • koboldcpp_nocuda.exe is now koboldcpp-nocuda.exe (No CUDA)
  • If you are using our official URLs or docker images, this should be handled automatically, but ensure your docker image is up-to-date.
  • For now, both filenames are uploaded to avoid breaking existing scripts. The old filenames will be removed soon, so please update.
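
If your automation scripts download release binaries by name, here is a minimal sketch of fetching the renamed Linux build (assumption: the standard GitHub "latest release asset" URL pattern, which is not spelled out in these notes):

    import urllib.request

    # New name for the CUDA 12 / AVX2 Linux build (was koboldcpp-linux-x64-cuda1210).
    asset = "koboldcpp-linux-x64"
    url = f"https://github.com/LostRuins/koboldcpp/releases/latest/download/{asset}"

    urllib.request.urlretrieve(url, asset)
    print(f"Downloaded {asset}")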

Download and run koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file PyInstaller build for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version, which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support.
If you're on a modern macOS machine (M-series), you can use the koboldcpp-mac-arm64 binary.

Deprecation Warning: The files named koboldcpp_cu12.exe, koboldcpp_oldcpu.exe, koboldcpp_nocuda.exe, koboldcpp-linux-x64-cuda1210, and koboldcpp-linux-x64-cuda1150 will be removed very soon. Please switch to the new filenames.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
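
Besides the browser UI, the server can also be driven programmatically. Here is a minimal sketch using the KoboldAI-style /api/v1/generate endpoint (the payload fields and response shape follow the common KoboldAI API convention, not this release note; verify against the server's built-in API documentation):

    import json
    import urllib.request

    payload = {
        "prompt": "Once upon a time,",
        "max_length": 64,       # number of tokens to generate
        "temperature": 0.7,
    }
    req = urllib.request.Request(
        "http://localhost:5001/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        result = json.loads(resp.read())
        print(result["results"][0]["text"])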

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.








