The primary purpose of this extension is to provide a sandbox for experimenting with ideas.
See the :ref:`WebExtensions AI API <WebExtensions AI API>` for reference.
- `npm install`
- `npm run watch`: kicks off webpack and sets up the file watcher
- `npm run start`: launches Nightly with the extension installed
- Add your OpenAI key via the extension's options panel (see the sketch below)
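For the last step, WebExtensions options panels typically persist settings via `browser.storage.local`. A minimal sketch of stashing and reading the key; the storage key name `openAIKey` is an assumption, not taken from the extension-hub source:

```ts
// Hypothetical sketch: the storage key name 'openAIKey' is an
// assumption; check the extension's options code for the real one.
await browser.storage.local.set({ openAIKey: '<your-key-here>' })
const { openAIKey } = await browser.storage.local.get('openAIKey')
console.log('Stored key present:', Boolean(openAIKey))
```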
This tutorial gives a brief overview of creating your own extension and running it using the extension-hub repository. If you are already familiar with how extensions work, the Installation steps above will get you what you need.
Clone the extension-hub repo in the usual way:

```sh
git clone git@github.com:Firefox-AI/extension-hub.git
```
Install the necessary node dependencies:

```sh
npm install
```
Build the package (note: this may not be necessary if you are using `npm run watch` as suggested above, since the watcher runs the build script automatically on every file change):

```sh
npm run build
```
Start Nightly with the extension hub enabled (note: this starts the version of Nightly you currently have installed; for wllama to work, you need to be running Nightly 142.0a1 (2025-07-16) or later):

```sh
npm run start
```
Inside `src/background.ts` you will see code that handles each of the default message types (`page_qa`, `page_summarize`, `tab_summarize`, `chat_message`):
```ts
/**
 * Event Listeners
 */
browser.runtime.onMessage.addListener(
  async (message: { type: MessageTypesT; data: any }) => {
    // TODO - probably need to diversify prompts based on type of request
    const prompt = buildPrompt(message.data.prompt, message.data.textContent)
    if (message.type === 'page_qa') {
      const result = await getOpenAIResponse(prompt)
      browser.runtime.sendMessage({
        type: 'page_qa_result',
        result: result,
      })
    }
    if (message.type === 'page_summarize') {
      const result = await getMlEngineAIResponse(prompt)
      browser.runtime.sendMessage({
        type: 'page_summarize_result',
        result: result,
        prompt: message.data.prompt,
        url: message.data.url,
        siteName: message.data.siteName,
      })
    }
    if (message.type === 'tab_summarize') {
      const result = await summarizeTabs(prompt)
      browser.runtime.sendMessage({
        type: 'tab_summarize_result',
        result: result,
      })
    }
    if (message.type === 'chat_message') {
      const result = await getHuggingFaceChatResponse(message.data)
      browser.runtime.sendMessage({
        type: 'chat_message_result',
        result: result,
      })
    }
  }
)
```
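The `MessageTypesT` union in the listener's signature is defined elsewhere in the repo; based on the four types handled above, it presumably looks something like this (a sketch, not the repo's exact definition):

```ts
// Hypothetical sketch: the actual MessageTypesT in the repo may
// include additional variants beyond the four handled above.
type MessageTypesT =
  | 'page_qa'
  | 'page_summarize'
  | 'tab_summarize'
  | 'chat_message'
```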
Within each branch of the listener, you can see the function being called, along with the data passed to it. For instance, note that `page_qa` messages are handled by `getOpenAIResponse()`:
```ts
if (message.type === 'page_qa') {
  const result = await getOpenAIResponse(prompt)
  browser.runtime.sendMessage({
    type: 'page_qa_result',
    result: result,
  })
}
```
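For context, `getOpenAIResponse` presumably wraps a call to the OpenAI chat completions endpoint using the key from the options panel. A rough sketch under that assumption; the repo's actual implementation, model choice, and error handling may differ:

```ts
// Hypothetical sketch only; see the repo's services code for the
// real implementation.
declare const OPENAI_API_KEY: string // loaded from the options panel

export const getOpenAIResponse = async (prompt: string) => {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini', // assumed model; the repo may use another
      messages: [{ role: 'user', content: prompt }],
    }),
  })
  const json = await response.json()
  return json.choices[0].message.content as string
}
```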
In this tutorial, we will update this call to use a custom `getLocalPirateAIResponse` function, which performs page QA in a pirate voice using a local model.
First, within `src/services/mlEngine.ts`, add the following function:
```ts
export const getLocalPirateAIResponse = async (articleText: string, userQuestion: string) => {
  try {
    console.log("[getLocalPirateAIResponse] called with:", { articleText, userQuestion });
    await ensureEngineIsReady()
    const trial = (browser as unknown as mlBrowserT).trial
    console.log("ENGINE READY!")
    const chatInput = [
      {
        role: "system",
        content:
          `/no_think You are a friendly pirate, who is always willing to help users navigate the difficult seas of the internet. Please examine the following web page carefully and use the information found there to answer the user's question. If the question cannot be answered using the page contents alone, respond by suggesting that the user perform an internet search to find the answer to their question. Do not make anything up. Always talk like a pirate. Always. Arg!!`,
      },
      {
        role: "user",
        content:
          `Yarr! Take a look at this page: ${articleText.slice(0, 500)}. Can ye help me answer this: ${userQuestion} /no_think`,
      },
    ];
    // Run inference against the wllama backend
    const raw_result = await trial?.ml.runEngine({ prompt: chatInput, nPredict: 500, skipPrompt: true });
    console.log(raw_result);
    // Strip the model's empty <think> block from the output
    const final_answer = raw_result["finalOutput"].replace("<think>\n\n</think>\n\n", "");
    console.log("raw result:", raw_result, typeof raw_result, final_answer);
    return final_answer
  } catch (err) {
    console.warn('Error generating response:', err)
  }
}
```

Note: the input to this function differs slightly from the OpenAI function's: it takes two strings (the article text and the user question) rather than one. Also, the model we are calling here uses the wllama backend rather than ONNX. To support this new backend, we have to update the engine creation function `ensureEngineIsReady`. Specifically, we must update the existing `createEngine` call, which looks like this:
```ts
const engineInfo = await trial?.ml.createEngine({
  taskName: engineMetadata.taskName || 'summarization',
  modelHub: engineMetadata.modelHub || 'huggingface',
  modelId: engineMetadata.modelId || 'Xenova/distilbart-cnn-6-6',
})
```

to pass the arguments that the wllama `createEngine` call expects:
```ts
await trial?.ml.createEngine({
  taskName: "text-generation",
  modelId: "QuantFactory/Qwen3-0.6B-GGUF",
  modelFile: "Qwen3-0.6B.Q4_0.gguf",
  modelHubRootUrl: "https://model-hub.mozilla.org",
  modelHubUrlTemplate: "{model}/{revision}",
  modelRevision: "main",
  numContext: 2048,
  backend: "wllama"
});
```

When you have made these changes, your code should resemble the code in `tutorials/localPirate.ts`.
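For orientation, here is a minimal sketch of what the updated `ensureEngineIsReady` might look like after the swap. The readiness guard is an assumption for illustration; only the `createEngine` arguments are taken from this tutorial, and the repo's actual function may track engine state differently:

```ts
// Hypothetical sketch: the guard around createEngine is illustrative;
// compare against tutorials/localPirate.ts for the real version.
let engineReady = false

export const ensureEngineIsReady = async () => {
  if (engineReady) return
  const trial = (browser as unknown as mlBrowserT).trial
  await trial?.ml.createEngine({
    taskName: "text-generation",
    modelId: "QuantFactory/Qwen3-0.6B-GGUF",
    modelFile: "Qwen3-0.6B.Q4_0.gguf",
    modelHubRootUrl: "https://model-hub.mozilla.org",
    modelHubUrlTemplate: "{model}/{revision}",
    modelRevision: "main",
    numContext: 2048,
    backend: "wllama"
  });
  engineReady = true
}
```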
Finally, we need to change the `background.ts` script to call our new function. This can be done by importing the function using:
```ts
import { getLocalPirateAIResponse } from './services/mlEngine'
```

We then need to change the listener code (which defaults to `getOpenAIResponse`) to:
```ts
if (message.type === 'page_qa') {
  console.log("QA MESSAGE")
  const result = await getLocalPirateAIResponse(message.data.textContent, message.data.prompt)
  browser.runtime.sendMessage({
    type: 'page_qa_result',
    result: result,
  })
}
```

Save your changes, and then all that's left to do is build and deploy:
```sh
npm run build
npm run start
```
Once you have done this, Nightly should open with the extension hub showing. Before you can use the model, however, you have to allow the browser to download it. To do this, click the gear icon in the upper right of the extension hub; this opens the extension's settings page. Navigate to Permissions and data and click the button labeled Download and run AI models on your device.
You are all set to use your new Pirate assistant!
One final step that can be helpful when troubleshooting: CMD+Shift+J will open the Browser Console. Make sure that the Multiprocess option is enabled; this will allow you to view any logging that you create in your extension.
Now navigate to a page and ask a question.
In some cases, you may wish to use a custom build of Firefox. In others, you may wish to hand the extension to a product manager. To do either, you will need to load the extension using the process outlined on this Confluence page.



