
Author: GPT4All Team

Introducing Nomic GPT4All v3.5: Faster LLM Generation And Improved Chat Customization


We're thrilled to announce the release of Nomic GPT4All v3.5.0, featuring some important upgrades to both user experience and GPT4All core infrastructure.

Chat Editing without Sacrificing Latency

A key feature in v3.5 is Chat Editing, giving you precise control over your conversations. This capability allows you to edit any message in your chat history to clarify ambiguous questions or refine unclear responses. For example, the prompt "Tell me about the llama family of llms" returns information about the animals; editing it to "Tell me about the Llama family of LLMs" guides the chat toward Meta's language models instead.

Chat editing lets you explore different conversation paths by editing earlier messages, without needing to start over from the beginning. You can try variations of your message mid-conversation to guide the chat in new directions while keeping the existing chat context intact. To implement chat editing without sacrificing latency, GPT4All performs live surgery on the KV cache with every edit!
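The idea behind that "surgery" can be sketched in a few lines: tokens up to the first point where the edited chat diverges from the original keep their cached key/value entries, and only the tail is recomputed. This is an illustrative sketch, not GPT4All's actual implementation; the function name and token values are invented for the example.

```python
def common_prefix_len(old_tokens, new_tokens):
    """Number of leading tokens shared by the original and edited chat."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

old = [1, 5, 9, 2, 7, 7, 3]   # token ids of the original conversation
new = [1, 5, 9, 4, 8]         # token ids after editing a message mid-chat

keep = common_prefix_len(old, new)
# KV-cache entries for the first `keep` tokens remain valid; entries past
# that point are discarded, and only new[keep:] needs a forward pass.
```

Because only the divergent tail is re-processed, editing a late message in a long chat costs far less than replaying the whole conversation.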

Step 1: Click edit on a message

Step 2: Enter your new message

Step 3: See your updated chat

Faster Time-To-First-Token With Chat Prefix Caching

Prefix caching is a technique that brings a speed upgrade to LLM chats for any model you use in GPT4All. It lets the model reuse pre-computed attention calculations stored in its KV cache instead of recomputing them, which can significantly reduce the time-to-first-token of an LLM generation.

Users don't need to do anything differently to get the benefits of prefix caching: it's a behind-the-scenes improvement to LLM generation for all models running in GPT4All!
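As a toy illustration of the mechanism (invented names, not GPT4All's code): the cache remembers the token sequence whose KV state it holds, and each new prompt only pays for tokens past the shared prefix.

```python
class PrefixCache:
    """Toy model of prefix caching: track the token sequence whose KV
    state is cached; a new prompt only computes tokens past the shared
    prefix. Purely illustrative."""

    def __init__(self):
        self.cached_tokens = []

    def tokens_to_process(self, prompt_tokens):
        shared = 0
        for a, b in zip(self.cached_tokens, prompt_tokens):
            if a != b:
                break
            shared += 1
        self.cached_tokens = list(prompt_tokens)
        # Only the un-cached suffix requires a forward pass.
        return prompt_tokens[shared:]

cache = PrefixCache()
first = cache.tokens_to_process([10, 20, 30])       # cold cache: all 3 tokens
second = cache.tokens_to_process([10, 20, 30, 40])  # warm cache: just token 40
```

In a chat, each turn extends the previous context, so the shared prefix is usually the entire prior conversation, which is why time-to-first-token improves most on long chats.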

New Data Integrations

As of this version, you now have the ability to attach .txt, .md (Markdown), and .rst (reStructuredText) files to your chats.

Previously, you could only work with these file types through GPT4All LocalDocs collections. Now you can attach them directly to your chats!

Foundational System and Prompt Templating Updates: Jinja Templating

We've added Jinja templating support for chat templates, which standardizes how GPT4All formats chats and makes customization simpler for developers. You can learn more about how to work with and customize these templates in our documentation.

Jinja templating enables broader compatibility with models found on Hugging Face and lays the foundation for agentic tool-calling support.

Incoming: Tool Calling and Agentic Workflows

This release lays the groundwork for an exciting future feature: comprehensive tool calling support. The new templating infrastructure will enable seamless integration with external tools, direct data fetching capabilities, and safe code execution within your private LLM chats.
