Conversation

@ericcurtin

From the ollama CLI, this now works:

MODEL_RUNNER_PORT=13434 make run

OLLAMA_HOST="127.0.0.1:13434" ollama run ai/smollm2
OLLAMA_HOST="127.0.0.1:13434" ollama ls
OLLAMA_HOST="127.0.0.1:13434" ollama ps
OLLAMA_HOST="127.0.0.1:13434" ollama stop ai/smollm2
OLLAMA_HOST="127.0.0.1:13434" ollama pull ai/smollm2:135M-Q4_0

Signed-off-by: Eric Curtin <[email protected]>
@ericcurtin ericcurtin marked this pull request as draft November 25, 2025 10:49
@gemini-code-assist

Summary of Changes

Hello @ericcurtin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances model management by introducing new load, start, and stop CLI commands, giving users explicit control over the model lifecycle. It also integrates an Ollama API compatibility layer and refactors the underlying model loading and unloading mechanisms to use this new API, improving interoperability and streamlining internal logic.

Highlights

  • New CLI Commands for Model Management: Introduced docker model load to explicitly load models into memory and docker model start as an alias for load. Also added docker model stop as an alias for unload.
  • Ollama API Compatibility Layer: Integrated an Ollama-compatible API endpoint (/api/generate) to handle model loading and unloading requests, allowing for broader compatibility.
  • Refactored Model Unloading Logic: The Unload function in cmd/cli/desktop/desktop.go was rewritten to use a new UnloadFromMemory method, which leverages the Ollama-compatible API for unloading models.
  • Dedicated Model Load/Unload Functions: Added LoadIntoMemory and UnloadFromMemory methods to the Client in cmd/cli/desktop/desktop.go for direct interaction with the Ollama-compatible API for model lifecycle management.
  • Improved Model ID Resolution: The ResolveModelID function in pkg/inference/models/manager.go now normalizes model names (e.g., "gemma3" to "ai/gemma3:latest") before attempting to retrieve the model, enhancing robustness (a rough sketch of this normalization follows this list).
  • Internal Scheduler API Update: The direct /unload HTTP endpoint in the scheduler (pkg/inference/scheduling/scheduler.go) has been removed, and a new internal UnloadModels function was introduced for programmatic unloading.
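
A rough, hypothetical sketch of the normalization described in the Improved Model ID Resolution highlight; the real ResolveModelID in pkg/inference/models/manager.go may differ in its details, and both defaults here are assumptions based on the example above:

```golang
package main

import (
	"fmt"
	"strings"
)

// normalizeModelName is a stand-in for the normalization ResolveModelID now
// performs; the "ai/" namespace and ":latest" tag defaults are assumptions.
func normalizeModelName(name string) string {
	if !strings.Contains(name, "/") {
		name = "ai/" + name // assume the default "ai/" namespace
	}
	if !strings.Contains(name, ":") {
		name += ":latest" // assume the default tag
	}
	return name
}

func main() {
	fmt.Println(normalizeModelName("gemma3")) // ai/gemma3:latest
}
```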

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • In UnloadFromMemory, the keep_alive value is sent as the string "0" while the comment and Ollama semantics suggest a numeric 0; consider aligning the type with the expected API contract to avoid subtle incompatibilities (see the sketch after this list).
  • The updated unloadModel handler always returns a 200 OK GenerateResponse with Done=true regardless of whether any runners were actually unloaded or if the scheduler failed, whereas the previous implementation propagated scheduler errors/status; consider surfacing failure/zero-unload conditions via HTTP status or response fields to keep clients informed.
  • main.go now registers two handlers for the root path "/" (the new one and the existing one earlier in the file), which is redundant and could be confusing; consider removing the earlier handler and keeping only the final, documented one.
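
A minimal sketch of the first point, using a typed request body with a numeric keep_alive. The field names follow the Ollama /api/generate request shape; note Ollama also accepts duration strings (e.g., "5m") for this field:

```golang
package main

import (
	"encoding/json"
	"fmt"
)

// generateRequest sketches a typed body with a numeric keep_alive.
type generateRequest struct {
	Model     string `json:"model"`
	KeepAlive int    `json:"keep_alive"` // 0 = unload immediately
}

func main() {
	b, _ := json.Marshal(generateRequest{Model: "ai/smollm2", KeepAlive: 0})
	fmt.Println(string(b)) // {"model":"ai/smollm2","keep_alive":0}
}
```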
## Individual Comments

### Comment 1
<location> `cmd/cli/desktop/desktop.go:754-763` </location>
<code_context>
-	var unloadResp UnloadResponse
-	if err := json.Unmarshal(body, &unloadResp); err != nil {
-		return UnloadResponse{}, fmt.Errorf("failed to unmarshal response body: %w", err)
+	// Unload each model using /api/generate with keep_alive: 0
+	unloadedCount := 0
+	for _, model := range modelsToUnload {
+		if err := c.UnloadFromMemory(model); err != nil {
+			// Continue unloading other models even if one fails
+			// This matches the previous behavior where partial unloads were possible
+			continue
+		}
+		unloadedCount++
 	}

-	return unloadResp, nil
+	return UnloadResponse{UnloadedRunners: unloadedCount}, nil
 }

</code_context>

<issue_to_address>
**issue (bug_risk):** Unload now silently succeeds even if all unload operations fail

With this change, callers no longer see errors when unload requests fail. If every UnloadFromMemory call fails, we still return success with UnloadedRunners=0, making it indistinguishable from the "nothing to unload" case and hiding real failures from users/CLI. Please return an error when unloadedCount is 0 and at least one unload failed, or otherwise surface a representative error while still attempting all unloads.
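
A minimal sketch of that behavior; the helper name and error wrapping are illustrative, not the PR's code:

```golang
// unloadAll is a hypothetical helper: it attempts every unload, but fails
// loudly when nothing was unloaded and at least one attempt errored.
func (c *Client) unloadAll(modelsToUnload []string) (UnloadResponse, error) {
	var errs []error
	unloadedCount := 0
	for _, model := range modelsToUnload {
		if err := c.UnloadFromMemory(model); err != nil {
			errs = append(errs, fmt.Errorf("unload %q: %w", model, err))
			continue
		}
		unloadedCount++
	}
	// Everything failed: surface the collected errors instead of a zero count
	// that is indistinguishable from "nothing to unload".
	if unloadedCount == 0 && len(errs) > 0 {
		return UnloadResponse{}, errors.Join(errs...)
	}
	return UnloadResponse{UnloadedRunners: unloadedCount}, nil
}
```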
</issue_to_address>

### Comment 2
<location> `cmd/cli/desktop/desktop.go:960-956` </location>
<code_context>
+func (c *Client) LoadIntoMemory(model string) error {
</code_context>

<issue_to_address>
**suggestion (performance):** LoadIntoMemory fully buffers and parses the response without using it

In both LoadIntoMemory and UnloadFromMemory, the body is fully read and unmarshaled into a map, but that result is never used. If you only need to confirm a 200 status and valid JSON, consider discarding the body (e.g., io.Copy to io.Discard) or using json.Decoder without storing the result, or even skipping JSON parsing if non-200s are already handled as errors. This reduces allocations and overhead, especially for large responses.
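
A sketch of the lighter-weight handling, assuming non-200 statuses are already turned into errors as they are in the PR:

```golang
// Sketch: report non-200s as errors, then drain and discard the body so the
// connection can be reused without buffering or parsing the response.
if resp.StatusCode != http.StatusOK {
	body, _ := io.ReadAll(resp.Body)
	return fmt.Errorf("load failed with status %s: %s", resp.Status, string(body))
}
if _, err := io.Copy(io.Discard, resp.Body); err != nil {
	return fmt.Errorf("failed to drain response body: %w", err)
}
return nil
```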
</issue_to_address>

### Comment 3
<location> `pkg/ollama/handler.go:489-498` </location>
<code_context>
+	// Unload the model directly via the scheduler
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Unload response reports success even if no runners were actually unloaded

The updated unloadModel always returns GenerateResponse{Done: true, DoneReason: "unload"}, even when UnloadModels returns 0 (e.g., unknown model name or no matching runners). Consider using a different DoneReason (e.g., "not_loaded") when unloadedCount == 0, or at least logging a warning so operators can distinguish a successful unload from a no-op.

Suggested implementation:

```golang
	// Unload the model directly via the scheduler
	unloadedCount := h.scheduler.UnloadModels(ctx, []string{modelName}, "")

	if unloadedCount == 0 {
		h.log.Warnf("unloadModel: no runners were unloaded for model %s (model may not be loaded)", sanitizedModelName)
	} else {
		h.log.Infof("unloadModel: unloaded %d runner(s) for model %s", unloadedCount, sanitizedModelName)
	}

```

To fully implement your suggestion about the API response:
1. Locate the return statement for this handler (likely returning a GenerateResponse or similar struct).
2. After the `unloadedCount` is computed, before returning, branch on `unloadedCount` (see the sketch after this list):
   - If `unloadedCount == 0`, set `DoneReason` (or equivalent field) to something like `"not_loaded"` and consider a boolean flag (if available) to indicate that this was a no-op.
   - If `unloadedCount > 0`, keep the existing `"unload"` DoneReason.
3. Ensure any tests that assert on the DoneReason or handler behavior are updated or new tests are added to cover both cases (unloadedCount == 0 and unloadedCount > 0).
4. If there is a centralized place where DoneReasons are enumerated/validated, add `"not_loaded"` there as well.
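
A minimal sketch of the branch in step 2, using the Done/DoneReason fields already discussed in this review:

```golang
// Pick DoneReason based on whether anything was actually unloaded.
resp := GenerateResponse{Done: true}
if unloadedCount == 0 {
	resp.DoneReason = "not_loaded" // no matching runners; effectively a no-op
} else {
	resp.DoneReason = "unload"
}
```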
</issue_to_address>

### Comment 4
<location> `pkg/inference/scheduling/scheduler.go:380-387` </location>
<code_context>
-	if err := json.NewEncoder(w).Encode(unloadedRunners); err != nil {
-		http.Error(w, fmt.Sprintf("Failed to encode response: %v", err), http.StatusInternalServerError)
-		return
+// UnloadModels unloads the specified models from the backend.
+// Returns the number of unloaded runners.
+func (s *Scheduler) UnloadModels(ctx context.Context, models []string, backend string) int {
+	unloadRequest := UnloadRequest{
+		Models:  models,
+		Backend: backend,
 	}
+	return s.loader.Unload(ctx, unloadRequest)
 }

</code_context>

<issue_to_address>
**suggestion (bug_risk):** Expose backend selection in UnloadModels more explicitly

Most callers (e.g., unloadModel) currently pass an empty backend string. Since this is now where backend selection is enforced (replacing the previous HTTP field), please document how an empty backend is handled (e.g., “all” vs “default”) and, if possible, validate the backend name before building the UnloadRequest so typos don’t result in a silent no-op.
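
A sketch of the suggested doc comment and validation, assuming an empty backend means "all backends" (as the client-side filter in this PR does); validBackend is a hypothetical helper, not an existing method:

```golang
// UnloadModels unloads the specified models from the backend. An empty
// backend string matches runners from all backends. It returns the number
// of unloaded runners.
func (s *Scheduler) UnloadModels(ctx context.Context, models []string, backend string) (int, error) {
	// validBackend is hypothetical; some lookup like it would be needed so
	// that a typo'd backend name fails instead of silently unloading nothing.
	if backend != "" && !s.validBackend(backend) {
		return 0, fmt.Errorf("unknown backend %q", backend)
	}
	return s.loader.Unload(ctx, UnloadRequest{Models: models, Backend: backend}), nil
}
```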
</issue_to_address>


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces new CLI commands (load, start, stop) for managing models in memory, leveraging an Ollama-compatible API. The changes are a good step towards improving model lifecycle management. However, the review has identified a critical copy-paste error in main.go that will prevent compilation. Additionally, there are several opportunities for improvement, including refactoring duplicated code in the new command implementations, enhancing error handling to provide better user feedback, and optimizing client methods for better performance and type safety.

Comment on lines +194 to +207
// Add Ollama API compatibility layer (only register with trailing slash to catch sub-paths)
ollamaHandler := ollama.NewHandler(log, modelManager, scheduler, nil)
router.Handle(ollama.APIPrefix+"/", ollamaHandler)

// Register root handler LAST - it will only catch exact "/" requests that don't match other patterns
router.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
	// Only respond to exact root path
	if r.URL.Path != "/" {
		http.NotFound(w, r)
		return
	}
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("Docker Model Runner is running"))
})

critical

This block of code, which registers the Ollama API handler and a root handler, is a duplicate of the block at lines 179-192. This duplication will cause a compilation error because the ollamaHandler variable is redeclared. Please remove this redundant block.

Comment on lines 735 to 766
 func (c *Client) Unload(req UnloadRequest) (UnloadResponse, error) {
-	unloadPath := inference.InferencePrefix + "/unload"
-	jsonData, err := json.Marshal(req)
-	if err != nil {
-		return UnloadResponse{}, fmt.Errorf("error marshaling request: %w", err)
-	}
-
-	resp, err := c.doRequest(http.MethodPost, unloadPath, bytes.NewReader(jsonData))
-	if err != nil {
-		return UnloadResponse{}, c.handleQueryError(err, unloadPath)
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode != http.StatusOK {
-		body, _ := io.ReadAll(resp.Body)
-		return UnloadResponse{}, fmt.Errorf("unloading failed with status %s: %s", resp.Status, string(body))
-	}
-
-	body, err := io.ReadAll(resp.Body)
-	if err != nil {
-		return UnloadResponse{}, fmt.Errorf("failed to read response body: %w", err)
-	}
-
-	var unloadResp UnloadResponse
-	if err := json.Unmarshal(body, &unloadResp); err != nil {
-		return UnloadResponse{}, fmt.Errorf("failed to unmarshal response body: %w", err)
-	}
-
-	return unloadResp, nil
+	var modelsToUnload []string
+
+	// If All is true, get all running models first
+	if req.All {
+		backends, err := c.PS()
+		if err != nil {
+			return UnloadResponse{}, fmt.Errorf("failed to get running models: %w", err)
+		}
+		for _, backend := range backends {
+			// Filter by backend if specified
+			if req.Backend == "" || backend.BackendName == req.Backend {
+				modelsToUnload = append(modelsToUnload, backend.ModelName)
+			}
+		}
+	} else {
+		modelsToUnload = req.Models
+	}
+
+	// Unload each model using /api/generate with keep_alive: 0
+	unloadedCount := 0
+	for _, model := range modelsToUnload {
+		if err := c.UnloadFromMemory(model); err != nil {
+			// Continue unloading other models even if one fails
+			// This matches the previous behavior where partial unloads were possible
+			continue
+		}
+		unloadedCount++
+	}
+
+	return UnloadResponse{UnloadedRunners: unloadedCount}, nil
 }

high

In the Unload function, errors from c.UnloadFromMemory(model) are silently ignored. While it's good to continue processing other models, swallowing errors can hide important issues from the user and make debugging difficult. It would be better to collect these errors and return them as a single, combined error at the end of the function.

func (c *Client) Unload(req UnloadRequest) (UnloadResponse, error) {
	var modelsToUnload []string

	// If All is true, get all running models first
	if req.All {
		backends, err := c.PS()
		if err != nil {
			return UnloadResponse{}, fmt.Errorf("failed to get running models: %w", err)
		}
		for _, backend := range backends {
			// Filter by backend if specified
			if req.Backend == "" || backend.BackendName == req.Backend {
				modelsToUnload = append(modelsToUnload, backend.ModelName)
			}
		}
	} else {
		modelsToUnload = req.Models
	}

	// Unload each model using /api/generate with keep_alive: 0
	var unloadedCount int
	var errs []error
	for _, model := range modelsToUnload {
		if err := c.UnloadFromMemory(model); err != nil {
			// Continue unloading other models even if one fails, but collect errors.
			errs = append(errs, fmt.Errorf("failed to unload model %q: %w", model, err))
			continue
		}
		unloadedCount++
	}

	return UnloadResponse{UnloadedRunners: unloadedCount}, errors.Join(errs...)
}

Comment on lines +11 to +67
func newLoadCmd() *cobra.Command {
	const cmdArgs = "MODEL [MODEL ...]"
	c := &cobra.Command{
		Use:   "load " + cmdArgs,
		Short: "Load models into memory",
		RunE: func(cmd *cobra.Command, modelArgs []string) error {
			if len(modelArgs) == 0 {
				return fmt.Errorf(
					"'docker model load' requires at least one MODEL.\n\n" +
						"Usage: docker model load " + cmdArgs + "\n\n" +
						"See 'docker model load --help' for more information.",
				)
			}

			// Load each model
			for _, model := range modelArgs {
				if err := desktopClient.LoadIntoMemory(model); err != nil {
					return handleClientError(err, fmt.Sprintf("Failed to load model %s", model))
				}
				cmd.Printf("Loaded model: %s\n", model)
			}

			return nil
		},
		ValidArgsFunction: completion.NoComplete,
	}
	return c
}

func newStartCmd() *cobra.Command {
	const cmdArgs = "MODEL [MODEL ...]"
	c := &cobra.Command{
		Use:   "start " + cmdArgs,
		Short: "Start models (alias for load)",
		RunE: func(cmd *cobra.Command, modelArgs []string) error {
			if len(modelArgs) == 0 {
				return fmt.Errorf(
					"'docker model start' requires at least one MODEL.\n\n" +
						"Usage: docker model start " + cmdArgs + "\n\n" +
						"See 'docker model start --help' for more information.",
				)
			}

			// Load each model (start is an alias for load)
			for _, model := range modelArgs {
				if err := desktopClient.LoadIntoMemory(model); err != nil {
					return handleClientError(err, fmt.Sprintf("Failed to start model %s", model))
				}
				cmd.Printf("Started model: %s\n", model)
			}

			return nil
		},
		ValidArgsFunction: completion.NoComplete,
	}
	return c
}

medium

The functions newLoadCmd and newStartCmd contain a significant amount of duplicated code. To improve maintainability and reduce redundancy, you can extract the common logic into a single factory function. This function would create the cobra.Command and accept parameters for the parts that differ, such as the command name, short description, and the verb used in output messages (Loaded/Started).

func newModelLoadCmd(use, short, verb string) *cobra.Command {
	const cmdArgs = "MODEL [MODEL ...]"
	c := &cobra.Command{
		Use:   use + " " + cmdArgs,
		Short: short,
		RunE: func(cmd *cobra.Command, modelArgs []string) error {
			if len(modelArgs) == 0 {
				return fmt.Errorf(
					"'docker model %s' requires at least one MODEL.\n\n"+
						"Usage:  docker model %s "+cmdArgs+"\n\n"+
						"See 'docker model %s --help' for more information.",
					use, use, use,
				)
			}

			// Load each model
			for _, model := range modelArgs {
				if err := desktopClient.LoadIntoMemory(model); err != nil {
					return handleClientError(err, fmt.Sprintf("Failed to %s model %s", use, model))
				}
				cmd.Printf("%s model: %s\n", verb, model)
			}

			return nil
		},
		ValidArgsFunction: completion.NoComplete,
	}
	return c
}

func newLoadCmd() *cobra.Command {
	return newModelLoadCmd("load", "Load models into memory", "Loaded")
}

func newStartCmd() *cobra.Command {
	return newModelLoadCmd("start", "Start models (alias for load)", "Started")
}

Comment on lines +26 to +31
for _, model := range modelArgs {
	if err := desktopClient.UnloadFromMemory(model); err != nil {
		return handleClientError(err, fmt.Sprintf("Failed to stop model %s", model))
	}
	cmd.Printf("Stopped model: %s\n", model)
}

medium

The loop for stopping models currently exits on the first error. This prevents subsequent models in the argument list from being stopped. It would be more robust to attempt to stop all specified models, collect any errors that occur, and report them at the end. This ensures that a single failure doesn't halt the entire operation.

Suggested change
-for _, model := range modelArgs {
-	if err := desktopClient.UnloadFromMemory(model); err != nil {
-		return handleClientError(err, fmt.Sprintf("Failed to stop model %s", model))
-	}
-	cmd.Printf("Stopped model: %s\n", model)
-}
+for _, model := range modelArgs {
+	if err := desktopClient.UnloadFromMemory(model); err != nil {
+		// It's better to report errors and continue, rather than stopping on the first failure.
+		cmd.PrintErrf("Failed to stop model %s: %v\n", model, err)
+		continue
+	}
+	cmd.Printf("Stopped model: %s\n", model)
+}

Comment on lines +959 to +999
// LoadIntoMemory loads a model into memory using the Ollama-compatible API
func (c *Client) LoadIntoMemory(model string) error {
	model = normalizeHuggingFaceModelName(model)

	// Create request body with empty prompt to load model
	reqBody := map[string]interface{}{
		"model": model,
	}

	jsonData, err := json.Marshal(reqBody)
	if err != nil {
		return fmt.Errorf("error marshaling request: %w", err)
	}

	// Use /api/generate endpoint with empty prompt to load model
	generatePath := "/api/generate"
	resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
	if err != nil {
		return c.handleQueryError(err, generatePath)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return fmt.Errorf("loading %s failed with status %s: %s", model, resp.Status, string(body))
	}

	// Read the response to ensure it's valid
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return fmt.Errorf("failed to read response body: %w", err)
	}

	// Parse response to verify success
	var response map[string]interface{}
	if err := json.Unmarshal(body, &response); err != nil {
		return fmt.Errorf("failed to parse response: %w", err)
	}

	return nil
}

medium

This function can be improved in two ways:

  1. Type Safety: Using map[string]interface{} for the request body is not type-safe. Defining a specific struct for the request improves readability and allows the compiler to catch potential errors.
  2. Efficiency: The response body is read and unmarshalled, but the result is never used. If the goal is simply to consume the response body to allow for connection reuse, it's more efficient to use io.Copy(io.Discard, resp.Body).
func (c *Client) LoadIntoMemory(model string) error {
	model = normalizeHuggingFaceModelName(model)

	// Create request body with empty prompt to load model
	type loadRequest struct {
		Model string `json:"model"`
	}
	reqBody := loadRequest{
		Model: model,
	}

	jsonData, err := json.Marshal(reqBody)
	if err != nil {
		return fmt.Errorf("error marshaling request: %w", err)
	}

	// Use /api/generate endpoint with empty prompt to load model
	generatePath := "/api/generate"
	resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
	if err != nil {
		return c.handleQueryError(err, generatePath)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return fmt.Errorf("loading %s failed with status %s: %s", model, resp.Status, string(body))
	}

	// Discard the response body to allow connection reuse.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return fmt.Errorf("failed to read response body: %w", err)
	}

	return nil
}

Comment on lines +1001 to +1042
// UnloadFromMemory unloads a model from memory using the Ollama-compatible API
func (c *Client) UnloadFromMemory(model string) error {
model = normalizeHuggingFaceModelName(model)

// Create request body with empty prompt and keep_alive: "0"
reqBody := map[string]interface{}{
"model": model,
"keep_alive": "0",
}

jsonData, err := json.Marshal(reqBody)
if err != nil {
return fmt.Errorf("error marshaling request: %w", err)
}

// Use /api/generate endpoint with keep_alive: 0 to unload model
generatePath := "/api/generate"
resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
if err != nil {
return c.handleQueryError(err, generatePath)
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
return fmt.Errorf("unloading %s failed with status %s: %s", model, resp.Status, string(body))
}

// Read the response to ensure it's valid
body, err := io.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("failed to read response body: %w", err)
}

// Parse response to verify unload
var response map[string]interface{}
if err := json.Unmarshal(body, &response); err != nil {
return fmt.Errorf("failed to parse response: %w", err)
}

return nil
}

medium

This function can be improved in two ways:

  1. Type Safety: Using map[string]interface{} for the request body is not type-safe. Defining a specific struct for the request improves readability and allows the compiler to catch potential errors.
  2. Efficiency: The response body is read and unmarshalled, but the result is never used. If the goal is simply to consume the response body to allow for connection reuse, it's more efficient to use io.Copy(io.Discard, resp.Body).
func (c *Client) UnloadFromMemory(model string) error {
	model = normalizeHuggingFaceModelName(model)

	// Create request body with empty prompt and keep_alive: "0"
	type unloadRequest struct {
		Model     string `json:"model"`
		KeepAlive string `json:"keep_alive"`
	}
	reqBody := unloadRequest{
		Model:     model,
		KeepAlive: "0",
	}

	jsonData, err := json.Marshal(reqBody)
	if err != nil {
		return fmt.Errorf("error marshaling request: %w", err)
	}

	// Use /api/generate endpoint with keep_alive: 0 to unload model
	generatePath := "/api/generate"
	resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
	if err != nil {
		return c.handleQueryError(err, generatePath)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return fmt.Errorf("unloading %s failed with status %s: %s", model, resp.Status, string(body))
	}

	// Discard the response body to allow connection reuse.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return fmt.Errorf("failed to read response body: %w", err)
	}

	return nil
}

@ericcurtin ericcurtin closed this Nov 27, 2025