Conversation

@ericcurtin

From the ollama CLI, this now works:

MODEL_RUNNER_PORT=13434 make run

OLLAMA_HOST="127.0.0.1:13434" ollama run ai/smollm2
OLLAMA_HOST="127.0.0.1:13434" ollama ls
OLLAMA_HOST="127.0.0.1:13434" ollama ps
OLLAMA_HOST="127.0.0.1:13434" ollama stop ai/smollm2
OLLAMA_HOST="127.0.0.1:13434" ollama pull ai/smollm2:135M-Q4_0

Signed-off-by: Eric Curtin <[email protected]>
@ericcurtin ericcurtin marked this pull request as draft November 25, 2025 10:49
@gemini-code-assist

Summary of Changes

Hello @ericcurtin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances model management by introducing new load, start, and stop CLI commands, giving users explicit control over the model lifecycle. It also integrates an Ollama API compatibility layer and refactors the underlying model loading and unloading mechanisms to use this new API, improving interoperability and streamlining internal logic.

Highlights

  • New CLI Commands for Model Management: Introduced docker model load to explicitly load models into memory and docker model start as an alias for load. Also added docker model stop as an alias for unload.
  • Ollama API Compatibility Layer: Integrated an Ollama-compatible API endpoint (/api/generate) to handle model loading and unloading requests, allowing for broader compatibility.
  • Refactored Model Unloading Logic: The Unload function in cmd/cli/desktop/desktop.go was rewritten to use a new UnloadFromMemory method, which leverages the Ollama-compatible API for unloading models.
  • Dedicated Model Load/Unload Functions: Added LoadIntoMemory and UnloadFromMemory methods to the Client in cmd/cli/desktop/desktop.go for direct interaction with the Ollama-compatible API for model lifecycle management.
  • Improved Model ID Resolution: The ResolveModelID function in pkg/inference/models/manager.go now normalizes model names (e.g., "gemma3" to "ai/gemma3:latest") before attempting to retrieve the model, enhancing robustness (a rough sketch of this normalization follows this list).
  • Internal Scheduler API Update: The direct /unload HTTP endpoint in the scheduler (pkg/inference/scheduling/scheduler.go) has been removed, and a new internal UnloadModels function was introduced for programmatic unloading.
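
A rough, hypothetical sketch of the normalization described in the Improved Model ID Resolution highlight; the real ResolveModelID in pkg/inference/models/manager.go may differ in its details, and both defaults here are assumptions based on the example above:

```golang
package main

import (
	"fmt"
	"strings"
)

// normalizeModelName is a stand-in for the normalization ResolveModelID now
// performs; the "ai/" namespace and ":latest" tag defaults are assumptions.
func normalizeModelName(name string) string {
	if !strings.Contains(name, "/") {
		name = "ai/" + name // assume the default "ai/" namespace
	}
	if !strings.Contains(name, ":") {
		name += ":latest" // assume the default tag
	}
	return name
}

func main() {
	fmt.Println(normalizeModelName("gemma3")) // ai/gemma3:latest
}
```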

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • In UnloadFromMemory, the keep_alive value is sent as the string "0" while the comment and Ollama semantics suggest a numeric 0; consider aligning the type with the expected API contract to avoid subtle incompatibilities (see the sketch after this list).
  • The updated unloadModel handler always returns a 200 OK GenerateResponse with Done=true regardless of whether any runners were actually unloaded or if the scheduler failed, whereas the previous implementation propagated scheduler errors/status; consider surfacing failure/zero-unload conditions via HTTP status or response fields to keep clients informed.
  • main.go now registers two handlers for the root path "/" (the new one and the existing one earlier in the file), which is redundant and could be confusing; consider removing the earlier handler and keeping only the final, documented one.
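
A minimal sketch of the first point, using a typed request body with a numeric keep_alive. The field names follow the Ollama /api/generate request shape; note Ollama also accepts duration strings (e.g., "5m") for this field:

```golang
package main

import (
	"encoding/json"
	"fmt"
)

// generateRequest sketches a typed body with a numeric keep_alive.
type generateRequest struct {
	Model     string `json:"model"`
	KeepAlive int    `json:"keep_alive"` // 0 = unload immediately
}

func main() {
	b, _ := json.Marshal(generateRequest{Model: "ai/smollm2", KeepAlive: 0})
	fmt.Println(string(b)) // {"model":"ai/smollm2","keep_alive":0}
}
```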
## Individual Comments

### Comment 1
<location> `cmd/cli/desktop/desktop.go:754-763` </location>
<code_context>
-	var unloadResp UnloadResponse
-	if err := json.Unmarshal(body, &unloadResp); err != nil {
-		return UnloadResponse{}, fmt.Errorf("failed to unmarshal response body: %w", err)
+	// Unload each model using /api/generate with keep_alive: 0
+	unloadedCount := 0
+	for _, model := range modelsToUnload {
+		if err := c.UnloadFromMemory(model); err != nil {
+			// Continue unloading other models even if one fails
+			// This matches the previous behavior where partial unloads were possible
+			continue
+		}
+		unloadedCount++
 	}

-	return unloadResp, nil
+	return UnloadResponse{UnloadedRunners: unloadedCount}, nil
 }

</code_context>

<issue_to_address>
**issue (bug_risk):** Unload now silently succeeds even if all unload operations fail

With this change, callers no longer see errors when unload requests fail. If every UnloadFromMemory call fails, we still return success with UnloadedRunners=0, making it indistinguishable from the "nothing to unload" case and hiding real failures from users/CLI. Please return an error when unloadedCount is 0 and at least one unload failed, or otherwise surface a representative error while still attempting all unloads.
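
A minimal sketch of that behavior; the helper name and error wrapping are illustrative, not the PR's code:

```golang
// unloadAll is a hypothetical helper: it attempts every unload, but fails
// loudly when nothing was unloaded and at least one attempt errored.
func (c *Client) unloadAll(modelsToUnload []string) (UnloadResponse, error) {
	var errs []error
	unloadedCount := 0
	for _, model := range modelsToUnload {
		if err := c.UnloadFromMemory(model); err != nil {
			errs = append(errs, fmt.Errorf("unload %q: %w", model, err))
			continue
		}
		unloadedCount++
	}
	// Everything failed: surface the collected errors instead of a zero count
	// that is indistinguishable from "nothing to unload".
	if unloadedCount == 0 && len(errs) > 0 {
		return UnloadResponse{}, errors.Join(errs...)
	}
	return UnloadResponse{UnloadedRunners: unloadedCount}, nil
}
```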
</issue_to_address>

### Comment 2
<location> `cmd/cli/desktop/desktop.go:960-956` </location>
<code_context>
+func (c *Client) LoadIntoMemory(model string) error {
</code_context>

<issue_to_address>
**suggestion (performance):** LoadIntoMemory fully buffers and parses the response without using it

In both LoadIntoMemory and UnloadFromMemory, the body is fully read and unmarshaled into a map, but that result is never used. If you only need to confirm a 200 status and valid JSON, consider discarding the body (e.g., io.Copy to io.Discard) or using json.Decoder without storing the result, or even skipping JSON parsing if non-200s are already handled as errors. This reduces allocations and overhead, especially for large responses.
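
A sketch of the lighter-weight handling, assuming non-200 statuses are already turned into errors as they are in the PR:

```golang
// Sketch: report non-200s as errors, then drain and discard the body so the
// connection can be reused without buffering or parsing the response.
if resp.StatusCode != http.StatusOK {
	body, _ := io.ReadAll(resp.Body)
	return fmt.Errorf("load failed with status %s: %s", resp.Status, string(body))
}
if _, err := io.Copy(io.Discard, resp.Body); err != nil {
	return fmt.Errorf("failed to drain response body: %w", err)
}
return nil
```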
</issue_to_address>

### Comment 3
<location> `pkg/ollama/handler.go:489-498` </location>
<code_context>
+	// Unload the model directly via the scheduler
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Unload response reports success even if no runners were actually unloaded

The updated unloadModel always returns GenerateResponse{Done: true, DoneReason: "unload"}, even when UnloadModels returns 0 (e.g., unknown model name or no matching runners). Consider using a different DoneReason (e.g., "not_loaded") when unloadedCount == 0, or at least logging a warning so operators can distinguish a successful unload from a no-op.

Suggested implementation:

```golang
	// Unload the model directly via the scheduler
	unloadedCount := h.scheduler.UnloadModels(ctx, []string{modelName}, "")

	if unloadedCount == 0 {
		h.log.Warnf("unloadModel: no runners were unloaded for model %s (model may not be loaded)", sanitizedModelName)
	} else {
		h.log.Infof("unloadModel: unloaded %d runner(s) for model %s", unloadedCount, sanitizedModelName)
	}

```

To fully implement your suggestion about the API response:
1. Locate the return statement for this handler (likely returning a GenerateResponse or similar struct).
2. After the `unloadedCount` is computed, before returning, branch on `unloadedCount` (see the sketch after this list):
   - If `unloadedCount == 0`, set `DoneReason` (or equivalent field) to something like `"not_loaded"` and consider a boolean flag (if available) to indicate that this was a no-op.
   - If `unloadedCount > 0`, keep the existing `"unload"` DoneReason.
3. Ensure any tests that assert on the DoneReason or handler behavior are updated or new tests are added to cover both cases (unloadedCount == 0 and unloadedCount > 0).
4. If there is a centralized place where DoneReasons are enumerated/validated, add `"not_loaded"` there as well.
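
A minimal sketch of the branch in step 2, using the Done/DoneReason fields already discussed in this review:

```golang
// Pick DoneReason based on whether anything was actually unloaded.
resp := GenerateResponse{Done: true}
if unloadedCount == 0 {
	resp.DoneReason = "not_loaded" // no matching runners; effectively a no-op
} else {
	resp.DoneReason = "unload"
}
```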
</issue_to_address>

### Comment 4
<location> `pkg/inference/scheduling/scheduler.go:380-387` </location>
<code_context>
-	if err := json.NewEncoder(w).Encode(unloadedRunners); err != nil {
-		http.Error(w, fmt.Sprintf("Failed to encode response: %v", err), http.StatusInternalServerError)
-		return
+// UnloadModels unloads the specified models from the backend.
+// Returns the number of unloaded runners.
+func (s *Scheduler) UnloadModels(ctx context.Context, models []string, backend string) int {
+	unloadRequest := UnloadRequest{
+		Models:  models,
+		Backend: backend,
 	}
+	return s.loader.Unload(ctx, unloadRequest)
 }

</code_context>

<issue_to_address>
**suggestion (bug_risk):** Expose backend selection in UnloadModels more explicitly

Most callers (e.g., unloadModel) currently pass an empty backend string. Since this is now where backend selection is enforced (replacing the previous HTTP field), please document how an empty backend is handled (e.g., “all” vs “default”) and, if possible, validate the backend name before building the UnloadRequest so typos don’t result in a silent no-op.
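
A sketch of the suggested doc comment and validation, assuming an empty backend means "all backends" (as the client-side filter in this PR does); validBackend is a hypothetical helper, not an existing method:

```golang
// UnloadModels unloads the specified models from the backend. An empty
// backend string matches runners from all backends. It returns the number
// of unloaded runners.
func (s *Scheduler) UnloadModels(ctx context.Context, models []string, backend string) (int, error) {
	// validBackend is hypothetical; some lookup like it would be needed so
	// that a typo'd backend name fails instead of silently unloading nothing.
	if backend != "" && !s.validBackend(backend) {
		return 0, fmt.Errorf("unknown backend %q", backend)
	}
	return s.loader.Unload(ctx, UnloadRequest{Models: models, Backend: backend}), nil
}
```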
</issue_to_address>


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces new CLI commands (load, start, stop) for managing models in memory, leveraging an Ollama-compatible API. The changes are a good step towards improving model lifecycle management. However, the review has identified a critical copy-paste error in main.go that will prevent compilation. Additionally, there are several opportunities for improvement, including refactoring duplicated code in the new command implementations, enhancing error handling to provide better user feedback, and optimizing client methods for better performance and type safety.

Comment on lines +194 to +207
// Add Ollama API compatibility layer (only register with trailing slash to catch sub-paths)
ollamaHandler := ollama.NewHandler(log, modelManager, scheduler, nil)
router.Handle(ollama.APIPrefix+"/", ollamaHandler)

// Register root handler LAST - it will only catch exact "/" requests that don't match other patterns
router.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
	// Only respond to exact root path
	if r.URL.Path != "/" {
		http.NotFound(w, r)
		return
	}
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("Docker Model Runner is running"))
})

critical

This block of code, which registers the Ollama API handler and a root handler, is a duplicate of the block at lines 179-192. This duplication will cause a compilation error because the ollamaHandler variable is redeclared. Please remove this redundant block.

Comment on lines 735 to 766
 func (c *Client) Unload(req UnloadRequest) (UnloadResponse, error) {
-	unloadPath := inference.InferencePrefix + "/unload"
-	jsonData, err := json.Marshal(req)
-	if err != nil {
-		return UnloadResponse{}, fmt.Errorf("error marshaling request: %w", err)
-	}
-
-	resp, err := c.doRequest(http.MethodPost, unloadPath, bytes.NewReader(jsonData))
-	if err != nil {
-		return UnloadResponse{}, c.handleQueryError(err, unloadPath)
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode != http.StatusOK {
-		body, _ := io.ReadAll(resp.Body)
-		return UnloadResponse{}, fmt.Errorf("unloading failed with status %s: %s", resp.Status, string(body))
-	}
-
-	body, err := io.ReadAll(resp.Body)
-	if err != nil {
-		return UnloadResponse{}, fmt.Errorf("failed to read response body: %w", err)
-	}
-
-	var unloadResp UnloadResponse
-	if err := json.Unmarshal(body, &unloadResp); err != nil {
-		return UnloadResponse{}, fmt.Errorf("failed to unmarshal response body: %w", err)
-	}
-
-	return unloadResp, nil
+	var modelsToUnload []string
+
+	// If All is true, get all running models first
+	if req.All {
+		backends, err := c.PS()
+		if err != nil {
+			return UnloadResponse{}, fmt.Errorf("failed to get running models: %w", err)
+		}
+		for _, backend := range backends {
+			// Filter by backend if specified
+			if req.Backend == "" || backend.BackendName == req.Backend {
+				modelsToUnload = append(modelsToUnload, backend.ModelName)
+			}
+		}
+	} else {
+		modelsToUnload = req.Models
+	}
+
+	// Unload each model using /api/generate with keep_alive: 0
+	unloadedCount := 0
+	for _, model := range modelsToUnload {
+		if err := c.UnloadFromMemory(model); err != nil {
+			// Continue unloading other models even if one fails
+			// This matches the previous behavior where partial unloads were possible
+			continue
+		}
+		unloadedCount++
+	}
+
+	return UnloadResponse{UnloadedRunners: unloadedCount}, nil
 }

high

In the Unload function, errors from c.UnloadFromMemory(model) are silently ignored. While it's good to continue processing other models, swallowing errors can hide important issues from the user and make debugging difficult. It would be better to collect these errors and return them as a single, combined error at the end of the function.

func (c *Client) Unload(req UnloadRequest) (UnloadResponse, error) {
	var modelsToUnload []string

	// If All is true, get all running models first
	if req.All {
		backends, err := c.PS()
		if err != nil {
			return UnloadResponse{}, fmt.Errorf("failed to get running models: %w", err)
		}
		for _, backend := range backends {
			// Filter by backend if specified
			if req.Backend == "" || backend.BackendName == req.Backend {
				modelsToUnload = append(modelsToUnload, backend.ModelName)
			}
		}
	} else {
		modelsToUnload = req.Models
	}

	// Unload each model using /api/generate with keep_alive: 0
	var unloadedCount int
	var errs []error
	for _, model := range modelsToUnload {
		if err := c.UnloadFromMemory(model); err != nil {
			// Continue unloading other models even if one fails, but collect errors.
			errs = append(errs, fmt.Errorf("failed to unload model %q: %w", model, err))
			continue
		}
		unloadedCount++
	}

	return UnloadResponse{UnloadedRunners: unloadedCount}, errors.Join(errs...)
}

Comment on lines +11 to +67
func newLoadCmd() *cobra.Command {
	const cmdArgs = "MODEL [MODEL ...]"
	c := &cobra.Command{
		Use:   "load " + cmdArgs,
		Short: "Load models into memory",
		RunE: func(cmd *cobra.Command, modelArgs []string) error {
			if len(modelArgs) == 0 {
				return fmt.Errorf(
					"'docker model load' requires at least one MODEL.\n\n" +
						"Usage: docker model load " + cmdArgs + "\n\n" +
						"See 'docker model load --help' for more information.",
				)
			}

			// Load each model
			for _, model := range modelArgs {
				if err := desktopClient.LoadIntoMemory(model); err != nil {
					return handleClientError(err, fmt.Sprintf("Failed to load model %s", model))
				}
				cmd.Printf("Loaded model: %s\n", model)
			}

			return nil
		},
		ValidArgsFunction: completion.NoComplete,
	}
	return c
}

func newStartCmd() *cobra.Command {
	const cmdArgs = "MODEL [MODEL ...]"
	c := &cobra.Command{
		Use:   "start " + cmdArgs,
		Short: "Start models (alias for load)",
		RunE: func(cmd *cobra.Command, modelArgs []string) error {
			if len(modelArgs) == 0 {
				return fmt.Errorf(
					"'docker model start' requires at least one MODEL.\n\n" +
						"Usage: docker model start " + cmdArgs + "\n\n" +
						"See 'docker model start --help' for more information.",
				)
			}

			// Load each model (start is an alias for load)
			for _, model := range modelArgs {
				if err := desktopClient.LoadIntoMemory(model); err != nil {
					return handleClientError(err, fmt.Sprintf("Failed to start model %s", model))
				}
				cmd.Printf("Started model: %s\n", model)
			}

			return nil
		},
		ValidArgsFunction: completion.NoComplete,
	}
	return c
}

medium

The functions newLoadCmd and newStartCmd contain a significant amount of duplicated code. To improve maintainability and reduce redundancy, you can extract the common logic into a single factory function. This function would create the cobra.Command and accept parameters for the parts that differ, such as the command name, short description, and the verb used in output messages (Loaded/Started).

func newModelLoadCmd(use, short, verb string) *cobra.Command {
	const cmdArgs = "MODEL [MODEL ...]"
	c := &cobra.Command{
		Use:   use + " " + cmdArgs,
		Short: short,
		RunE: func(cmd *cobra.Command, modelArgs []string) error {
			if len(modelArgs) == 0 {
				return fmt.Errorf(
					"'docker model %s' requires at least one MODEL.\n\n"+
						"Usage:  docker model %s "+cmdArgs+"\n\n"+
						"See 'docker model %s --help' for more information.",
					use, use, use,
				)
			}

			// Load each model
			for _, model := range modelArgs {
				if err := desktopClient.LoadIntoMemory(model); err != nil {
					return handleClientError(err, fmt.Sprintf("Failed to %s model %s", use, model))
				}
				cmd.Printf("%s model: %s\n", verb, model)
			}

			return nil
		},
		ValidArgsFunction: completion.NoComplete,
	}
	return c
}

func newLoadCmd() *cobra.Command {
	return newModelLoadCmd("load", "Load models into memory", "Loaded")
}

func newStartCmd() *cobra.Command {
	return newModelLoadCmd("start", "Start models (alias for load)", "Started")
}

Comment on lines +26 to +31
for _, model := range modelArgs {
	if err := desktopClient.UnloadFromMemory(model); err != nil {
		return handleClientError(err, fmt.Sprintf("Failed to stop model %s", model))
	}
	cmd.Printf("Stopped model: %s\n", model)
}

medium

The loop for stopping models currently exits on the first error. This prevents subsequent models in the argument list from being stopped. It would be more robust to attempt to stop all specified models, collect any errors that occur, and report them at the end. This ensures that a single failure doesn't halt the entire operation.

Suggested change
-for _, model := range modelArgs {
-	if err := desktopClient.UnloadFromMemory(model); err != nil {
-		return handleClientError(err, fmt.Sprintf("Failed to stop model %s", model))
-	}
-	cmd.Printf("Stopped model: %s\n", model)
-}
+for _, model := range modelArgs {
+	if err := desktopClient.UnloadFromMemory(model); err != nil {
+		// It's better to report errors and continue, rather than stopping on the first failure.
+		cmd.PrintErrf("Failed to stop model %s: %v\n", model, err)
+		continue
+	}
+	cmd.Printf("Stopped model: %s\n", model)
+}

Comment on lines +959 to +999
// LoadIntoMemory loads a model into memory using the Ollama-compatible API
func (c *Client) LoadIntoMemory(model string) error {
	model = normalizeHuggingFaceModelName(model)

	// Create request body with empty prompt to load model
	reqBody := map[string]interface{}{
		"model": model,
	}

	jsonData, err := json.Marshal(reqBody)
	if err != nil {
		return fmt.Errorf("error marshaling request: %w", err)
	}

	// Use /api/generate endpoint with empty prompt to load model
	generatePath := "/api/generate"
	resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
	if err != nil {
		return c.handleQueryError(err, generatePath)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return fmt.Errorf("loading %s failed with status %s: %s", model, resp.Status, string(body))
	}

	// Read the response to ensure it's valid
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return fmt.Errorf("failed to read response body: %w", err)
	}

	// Parse response to verify success
	var response map[string]interface{}
	if err := json.Unmarshal(body, &response); err != nil {
		return fmt.Errorf("failed to parse response: %w", err)
	}

	return nil
}

medium

This function can be improved in two ways:

  1. Type Safety: Using map[string]interface{} for the request body is not type-safe. Defining a specific struct for the request improves readability and allows the compiler to catch potential errors.
  2. Efficiency: The response body is read and unmarshalled, but the result is never used. If the goal is simply to consume the response body to allow for connection reuse, it's more efficient to use io.Copy(io.Discard, resp.Body).
func (c *Client) LoadIntoMemory(model string) error {
	model = normalizeHuggingFaceModelName(model)

	// Create request body with empty prompt to load model
	type loadRequest struct {
		Model string `json:"model"`
	}
	reqBody := loadRequest{
		Model: model,
	}

	jsonData, err := json.Marshal(reqBody)
	if err != nil {
		return fmt.Errorf("error marshaling request: %w", err)
	}

	// Use /api/generate endpoint with empty prompt to load model
	generatePath := "/api/generate"
	resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
	if err != nil {
		return c.handleQueryError(err, generatePath)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return fmt.Errorf("loading %s failed with status %s: %s", model, resp.Status, string(body))
	}

	// Discard the response body to allow connection reuse.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return fmt.Errorf("failed to read response body: %w", err)
	}

	return nil
}

Comment on lines +1001 to +1042
// UnloadFromMemory unloads a model from memory using the Ollama-compatible API
func (c *Client) UnloadFromMemory(model string) error {
model = normalizeHuggingFaceModelName(model)

// Create request body with empty prompt and keep_alive: "0"
reqBody := map[string]interface{}{
"model": model,
"keep_alive": "0",
}

jsonData, err := json.Marshal(reqBody)
if err != nil {
return fmt.Errorf("error marshaling request: %w", err)
}

// Use /api/generate endpoint with keep_alive: 0 to unload model
generatePath := "/api/generate"
resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
if err != nil {
return c.handleQueryError(err, generatePath)
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
return fmt.Errorf("unloading %s failed with status %s: %s", model, resp.Status, string(body))
}

// Read the response to ensure it's valid
body, err := io.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("failed to read response body: %w", err)
}

// Parse response to verify unload
var response map[string]interface{}
if err := json.Unmarshal(body, &response); err != nil {
return fmt.Errorf("failed to parse response: %w", err)
}

return nil
}

medium

This function can be improved in two ways:

  1. Type Safety: Using map[string]interface{} for the request body is not type-safe. Defining a specific struct for the request improves readability and allows the compiler to catch potential errors.
  2. Efficiency: The response body is read and unmarshalled, but the result is never used. If the goal is simply to consume the response body to allow for connection reuse, it's more efficient to use io.Copy(io.Discard, resp.Body).
func (c *Client) UnloadFromMemory(model string) error {
	model = normalizeHuggingFaceModelName(model)

	// Create request body with empty prompt and keep_alive: "0"
	type unloadRequest struct {
		Model     string `json:"model"`
		KeepAlive string `json:"keep_alive"`
	}
	reqBody := unloadRequest{
		Model:     model,
		KeepAlive: "0",
	}

	jsonData, err := json.Marshal(reqBody)
	if err != nil {
		return fmt.Errorf("error marshaling request: %w", err)
	}

	// Use /api/generate endpoint with keep_alive: 0 to unload model
	generatePath := "/api/generate"
	resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
	if err != nil {
		return c.handleQueryError(err, generatePath)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return fmt.Errorf("unloading %s failed with status %s: %s", model, resp.Status, string(body))
	}

	// Discard the response body to allow connection reuse.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return fmt.Errorf("failed to read response body: %w", err)
	}

	return nil
}

@ericcurtin ericcurtin closed this Nov 27, 2025