In this article, I’ll go over how to use the LLM (or GPT) capabilities of taubyte-llama-satellite. I’m assuming that you already know how to create a project and a function on a Taubyte-based Cloud Computing Network. If not, please refer to Taubyte’s Documentation.
A Taubyte Cloud can provide LLM capabilities through what we call a Satellite. In this case it’s taubyte-llama-satellite, which exports llama.cpp capabilities to the Taubyte Virtual Machine, thus giving Serverless Functions (or DFunctions, in Taubyte’s terminology) LLM features.
Satellites export low-level functions that aren’t very intuitive to use directly. Fortunately, it’s possible to address that with a user-friendly SDK. You can find the Go SDK for taubyte-llama-satellite at https://github.com/samyfodil/taubyte-llama-satellite.
Before proceeding, let’s ensure you have a project and a DFunction ready to go. If not, please refer to “Create a Function”.
A good practice is to clone your code locally using git or the tau command-line tool. Make sure you have Go installed, then run:
go get github.com/samyfodil/taubyte-llama-satellite
If you followed the steps from Taubyte’s Documentation, your basic function should look something like this:
package lib

import (
	"github.com/taubyte/go-sdk/event"
)

//export ping
func ping(e event.Event) uint32 {
	h, err := e.HTTP()
	if err != nil {
		return 1
	}

	h.Write([]byte("PONG"))
	return 0
}
Let’s modify it so it uses the POST body as the prompt. Note: I’ve changed the function’s name to predict. Ensure this change is reflected in your configuration by setting the entry point to predict and modifying the method from GET to POST.
package lib

import (
	"io"

	"github.com/taubyte/go-sdk/event"
)

//export predict
func predict(e event.Event) uint32 {
	h, err := e.HTTP()
	if err != nil {
		return 1
	}
	defer h.Body().Close()

	// Read the POST body; it will serve as the prompt.
	prompt, err := io.ReadAll(h.Body())
	if err != nil {
		panic(err)
	}
	_ = prompt // used in the next step, where we submit it to the satellite

	return 0
}
The LLAMA SDK exports two main methods, Predict and Next. Let’s start by creating a prediction:
package lib

import (
	"io"

	"github.com/samyfodil/taubyte-llama-satellite/sdk"
	"github.com/taubyte/go-sdk/event"
)

//export predict
func predict(e event.Event) uint32 {
	h, err := e.HTTP()
	if err != nil {
		return 1
	}
	defer h.Body().Close()

	prompt, err := io.ReadAll(h.Body())
	if err != nil {
		panic(err)
	}

	// Submit the prompt to the satellite.
	p, err := sdk.Predict(
		string(prompt),
	)
	if err != nil {
		panic(err)
	}
	_ = p // we'll pull tokens from p shortly

	return 0
}
This code submits a prediction request to the satellite, which queues it (predictions are resource-intensive, especially on the GPU) and returns a prediction handle.
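Panicking is fine for a quick experiment, but the satellite can legitimately fail (for example, if it’s unreachable), so in practice you may want to report the error to the caller instead of crashing the function. A minimal sketch, using only the calls we’ve already seen (the error message wording is just illustrative):

p, err := sdk.Predict(string(prompt))
if err != nil {
	// Surface the failure to the HTTP client instead of panicking.
	h.Write([]byte("prediction failed: " + err.Error()))
	return 1
}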
Just like when interacting with any LLM, you can customize the request like so:
p, err := sdk.Predict(
	string(prompt),
	sdk.WithTopK(90),
	sdk.WithTopP(0.86),
	sdk.WithBatch(512),
)
You can find all available options in the SDK repository.
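As a rough intuition: lower TopK/TopP values restrict sampling to the most likely tokens, yielding more focused and repeatable output, while higher values allow more variety. Here’s a sketch using only the options shown above; the values are illustrative, not tuned recommendations:

// More focused, deterministic-leaning completions (illustrative values).
p, err := sdk.Predict(
	string(prompt),
	sdk.WithTopK(20),
	sdk.WithTopP(0.5),
)

// More varied, "creative" completions (illustrative values).
p, err = sdk.Predict(
	string(prompt),
	sdk.WithTopK(100),
	sdk.WithTopP(0.95),
)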
After submitting a prediction to the satellite, you need to collect tokens. You can do so by calling p.Next(), which will block until a new token is available or the prediction is completed or canceled. Note that you can use NextWithTimeout if you’d like to set a deadline.
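For instance, assuming NextWithTimeout takes a time.Duration and otherwise behaves like Next (check the SDK for the exact signature), a deadline-aware collection loop might look like this sketch:

for {
	// Assumption: NextWithTimeout(d time.Duration) returns (token, error)
	// like Next; verify against the SDK before relying on this.
	// Requires importing "time".
	token, err := p.NextWithTimeout(5 * time.Second)
	if err == io.EOF {
		break // the prediction is complete
	} else if err != nil {
		// a timeout or a canceled prediction ends up here
		return 1
	}
	h.Write([]byte(token))
	h.Flush()
}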
Now, let’s wrap up our function:
package lib

import (
	"io"

	"github.com/samyfodil/taubyte-llama-satellite/sdk"
	"github.com/taubyte/go-sdk/event"
)

//export predict
func predict(e event.Event) uint32 {
	h, err := e.HTTP()
	if err != nil {
		return 1
	}
	defer h.Body().Close()

	prompt, err := io.ReadAll(h.Body())
	if err != nil {
		panic(err)
	}

	p, err := sdk.Predict(
		string(prompt),
		sdk.WithTopK(90),
		sdk.WithTopP(0.86),
		sdk.WithBatch(512),
	)
	if err != nil {
		panic(err)
	}

	for {
		token, err := p.Next()
		if err == io.EOF {
			// the prediction is complete
			break
		} else if err != nil {
			panic(err)
		}

		h.Write([]byte(token))
		h.Flush() // stream the token to the client right away
	}

	return 0
}
The call to h.Flush() will send the token to the client (browser) immediately. If you’d like to recreate the AI typing experience provided by ChatGPT, you can use something like:
await axios({
  method: "post",
  data: prompt,
  url: "<URL>",
  onDownloadProgress: (progressEvent) => {
    // responseText accumulates everything received so far,
    // not just the latest token.
    const textSoFar = progressEvent.currentTarget.responseText;
    gotToken(textSoFar);
  },
});
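If you’d rather test the endpoint from a Go program, you can observe the same streaming behavior by reading the response body incrementally. A self-contained sketch; replace <URL> with your DFunction’s endpoint and the prompt with your own:

package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// POST the prompt to the DFunction's endpoint.
	resp, err := http.Post("<URL>", "text/plain", strings.NewReader("Tell me a story"))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Print tokens as they are flushed by the DFunction.
	buf := make([]byte, 256)
	for {
		n, err := resp.Body.Read(buf)
		if n > 0 {
			fmt.Print(string(buf[:n]))
		}
		if err != nil {
			break // io.EOF once the prediction completes
		}
	}
}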
In this guide, we’ve walked through how to leverage the LLM (or GPT) capabilities provided by taubyte-llama-satellite on a Taubyte-based Cloud Computing Network. We’ve explored the concept of a LLAMA Satellite and its role in exporting LLM capabilities to the Taubyte Virtual Machine. Furthermore, we’ve discussed the importance and functionality of the LLAMA SDK, which makes interacting with the Satellite’s low-level functions more intuitive.
We’ve gone through a practical example of how to use these tools in a Taubyte project, specifically demonstrating how to create a prediction with the Predict method and fetch the resulting tokens. We’ve also shown how you can fine-tune your requests to the SDK and manage the tokens returned by the Satellite. By the end of the guide, you should be equipped to create a serverless function on Taubyte that can generate predictions from user-provided prompts, similar to how an AI like ChatGPT works.
Harnessing the power of Taubyte and the LLAMA Satellite, you’re now ready to incorporate large language model capabilities into your projects, bringing a new level of interactivity and AI-driven responses to your applications.
If you’d like to see these tools in action, check out chat.keenl.ink, a practical implementation of the principles outlined in this guide. It’s a great demonstration of the interactive possibilities these technologies provide. Happy coding!