Add apt v2 #452
Conversation
From ollama, this now works:

    MODEL_RUNNER_PORT=13434 make run
    OLLAMA_HOST="127.0.0.1:13434" ollama run ai/smollm2
    OLLAMA_HOST="127.0.0.1:13434" ollama ls
    OLLAMA_HOST="127.0.0.1:13434" ollama ps
    OLLAMA_HOST="127.0.0.1:13434" ollama stop ai/smollm2
    OLLAMA_HOST="127.0.0.1:13434" ollama pull ai/smollm2:135M-Q4_0

Signed-off-by: Eric Curtin <[email protected]>
Summary of Changes

Hello @ericcurtin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the model management capabilities by introducing new CLI commands for loading, starting, and stopping models in memory.
Hey there - I've reviewed your changes - here's some feedback:
- In UnloadFromMemory, the keep_alive value is sent as the string "0" while the comment and Ollama semantics suggest a numeric 0; consider aligning the type with the expected API contract to avoid subtle incompatibilities.
- The updated unloadModel handler always returns a 200 OK GenerateResponse with Done=true regardless of whether any runners were actually unloaded or if the scheduler failed, whereas the previous implementation propagated scheduler errors/status; consider surfacing failure/zero-unload conditions via HTTP status or response fields to keep clients informed.
- main.go now registers two handlers for the root path "/" (the new one and the existing one earlier in the file), which is redundant and could be confusing; consider removing the earlier handler and keeping only the final, documented one.
## Individual Comments
### Comment 1
<location> `cmd/cli/desktop/desktop.go:754-763` </location>
<code_context>
- var unloadResp UnloadResponse
- if err := json.Unmarshal(body, &unloadResp); err != nil {
- return UnloadResponse{}, fmt.Errorf("failed to unmarshal response body: %w", err)
+ // Unload each model using /api/generate with keep_alive: 0
+ unloadedCount := 0
+ for _, model := range modelsToUnload {
+ if err := c.UnloadFromMemory(model); err != nil {
+ // Continue unloading other models even if one fails
+ // This matches the previous behavior where partial unloads were possible
+ continue
+ }
+ unloadedCount++
}
- return unloadResp, nil
+ return UnloadResponse{UnloadedRunners: unloadedCount}, nil
}
</code_context>
<issue_to_address>
**issue (bug_risk):** Unload now silently succeeds even if all unload operations fail
With this change, callers no longer see errors when unload requests fail. If every UnloadFromMemory call fails, we still return success with UnloadedRunners=0, making it indistinguishable from the "nothing to unload" case and hiding real failures from users/CLI. Please return an error when unloadedCount is 0 and at least one unload failed, or otherwise surface a representative error while still attempting all unloads.
</issue_to_address>
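A minimal sketch of one way to address this, reusing the loop from the code context above; the policy shown (fail the call only when nothing was unloaded but at least one attempt errored) is one possible reading of the comment, and `errors.Join` requires Go 1.20+ and an `errors` import:

```golang
unloadedCount := 0
var errs []error
for _, model := range modelsToUnload {
	if err := c.UnloadFromMemory(model); err != nil {
		// Keep going so partial unloads still happen, but remember the failure.
		errs = append(errs, fmt.Errorf("unload %s: %w", model, err))
		continue
	}
	unloadedCount++
}
// Surface failures only when nothing was unloaded; otherwise report the partial success.
if unloadedCount == 0 && len(errs) > 0 {
	return UnloadResponse{}, errors.Join(errs...)
}
return UnloadResponse{UnloadedRunners: unloadedCount}, nil
```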
### Comment 2
<location> `cmd/cli/desktop/desktop.go:960-956` </location>
<code_context>
+func (c *Client) LoadIntoMemory(model string) error {
</code_context>
<issue_to_address>
**suggestion (performance):** LoadIntoMemory fully buffers and parses the response without using it
In both LoadIntoMemory and UnloadFromMemory, the body is fully read and unmarshaled into a map, but that result is never used. If you only need to confirm a 200 status and valid JSON, consider discarding the body (e.g., io.Copy to io.Discard) or using json.Decoder without storing the result, or even skipping JSON parsing if non-200s are already handled as errors. This reduces allocations and overhead, especially for large responses.
</issue_to_address>
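A minimal sketch of the drain-without-parsing pattern the comment describes, assuming the only thing that matters after a 200 is that the body gets consumed so the connection can be reused:

```golang
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
	body, _ := io.ReadAll(resp.Body)
	return fmt.Errorf("request failed with status %s: %s", resp.Status, string(body))
}
// Consume the body without buffering or decoding it; the payload is never used.
if _, err := io.Copy(io.Discard, resp.Body); err != nil {
	return fmt.Errorf("failed to drain response body: %w", err)
}
return nil
```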
### Comment 3
<location> `pkg/ollama/handler.go:489-498` </location>
<code_context>
+ // Unload the model directly via the scheduler
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Unload response reports success even if no runners were actually unloaded
The updated unloadModel always returns GenerateResponse{Done: true, DoneReason: "unload"}, even when UnloadModels returns 0 (e.g., unknown model name or no matching runners). Consider using a different DoneReason (e.g., "not_loaded") when unloadedCount == 0, or at least logging a warning so operators can distinguish a successful unload from a no-op.
Suggested implementation:
```golang
// Unload the model directly via the scheduler
unloadedCount := h.scheduler.UnloadModels(ctx, []string{modelName}, "")
if unloadedCount == 0 {
h.log.Warnf("unloadModel: no runners were unloaded for model %s (model may not be loaded)", sanitizedModelName)
} else {
h.log.Infof("unloadModel: unloaded %d runner(s) for model %s", unloadedCount, sanitizedModelName)
}
```
To fully implement your suggestion about the API response:
1. Locate the return statement for this handler (likely returning a GenerateResponse or similar struct).
2. After the `unloadedCount` is computed, before returning, branch on `unloadedCount`:
- If `unloadedCount == 0`, set `DoneReason` (or equivalent field) to something like `"not_loaded"` and consider a boolean flag (if available) to indicate that this was a no-op.
- If `unloadedCount > 0`, keep the existing `"unload"` DoneReason.
3. Ensure any tests that assert on the DoneReason or handler behavior are updated or new tests are added to cover both cases (unloadedCount == 0 and unloadedCount > 0).
4. If there is a centralized place where DoneReasons are enumerated/validated, add `"not_loaded"` there as well.
</issue_to_address>
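A rough sketch of step 2 above; `GenerateResponse`, its `Done`/`DoneReason` fields, and `h.scheduler.UnloadModels` come from the review context, while `writeJSON` is a hypothetical stand-in for however the handler actually serializes its responses:

```golang
unloadedCount := h.scheduler.UnloadModels(ctx, []string{modelName}, "")
doneReason := "unload"
if unloadedCount == 0 {
	// Nothing was loaded for this model; report a no-op rather than a successful unload.
	doneReason = "not_loaded"
}
writeJSON(w, http.StatusOK, GenerateResponse{Done: true, DoneReason: doneReason})
```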
### Comment 4
<location> `pkg/inference/scheduling/scheduler.go:380-387` </location>
<code_context>
- if err := json.NewEncoder(w).Encode(unloadedRunners); err != nil {
- http.Error(w, fmt.Sprintf("Failed to encode response: %v", err), http.StatusInternalServerError)
- return
+// UnloadModels unloads the specified models from the backend.
+// Returns the number of unloaded runners.
+func (s *Scheduler) UnloadModels(ctx context.Context, models []string, backend string) int {
+ unloadRequest := UnloadRequest{
+ Models: models,
+ Backend: backend,
}
+ return s.loader.Unload(ctx, unloadRequest)
}
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Expose backend selection in UnloadModels more explicitly
Most callers (e.g., unloadModel) currently pass an empty backend string. Since this is now where backend selection is enforced (replacing the previous HTTP field), please document how an empty backend is handled (e.g., “all” vs “default”) and, if possible, validate the backend name before building the UnloadRequest so typos don’t result in a silent no-op.
</issue_to_address>
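A hedged sketch of what that could look like; `s.validBackend` is a hypothetical helper (the real scheduler may track its backends differently), and returning an error changes the method's signature:

```golang
// UnloadModels unloads the specified models from the backend.
// An empty backend is treated as "all backends" (assumed semantics; document whichever applies).
// Returns the number of unloaded runners.
func (s *Scheduler) UnloadModels(ctx context.Context, models []string, backend string) (int, error) {
	if backend != "" && !s.validBackend(backend) {
		return 0, fmt.Errorf("unknown backend %q", backend)
	}
	return s.loader.Unload(ctx, UnloadRequest{Models: models, Backend: backend}), nil
}
```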
Code Review
This pull request introduces new CLI commands (load, start, stop) for managing models in memory, leveraging an Ollama-compatible API. The changes are a good step towards improving model lifecycle management. However, the review has identified a critical copy-paste error in main.go that will prevent compilation. Additionally, there are several opportunities for improvement, including refactoring duplicated code in the new command implementations, enhancing error handling to provide better user feedback, and optimizing client methods for better performance and type safety.
main.go:

    // Add Ollama API compatibility layer (only register with trailing slash to catch sub-paths)
    ollamaHandler := ollama.NewHandler(log, modelManager, scheduler, nil)
    router.Handle(ollama.APIPrefix+"/", ollamaHandler)

    // Register root handler LAST - it will only catch exact "/" requests that don't match other patterns
    router.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Only respond to exact root path
        if r.URL.Path != "/" {
            http.NotFound(w, r)
            return
        }
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("Docker Model Runner is running"))
    })
cmd/cli/desktop/desktop.go, Client.Unload (the old HTTP call to the /unload endpoint is replaced by per-model UnloadFromMemory calls):

     func (c *Client) Unload(req UnloadRequest) (UnloadResponse, error) {
    -    unloadPath := inference.InferencePrefix + "/unload"
    -    jsonData, err := json.Marshal(req)
    -    if err != nil {
    -        return UnloadResponse{}, fmt.Errorf("error marshaling request: %w", err)
    -    }
    -    resp, err := c.doRequest(http.MethodPost, unloadPath, bytes.NewReader(jsonData))
    -    if err != nil {
    -        return UnloadResponse{}, c.handleQueryError(err, unloadPath)
    -    }
    -    defer resp.Body.Close()
    -    if resp.StatusCode != http.StatusOK {
    -        body, _ := io.ReadAll(resp.Body)
    -        return UnloadResponse{}, fmt.Errorf("unloading failed with status %s: %s", resp.Status, string(body))
    -    }
    -    body, err := io.ReadAll(resp.Body)
    -    if err != nil {
    -        return UnloadResponse{}, fmt.Errorf("failed to read response body: %w", err)
    -    }
    -    var unloadResp UnloadResponse
    -    if err := json.Unmarshal(body, &unloadResp); err != nil {
    -        return UnloadResponse{}, fmt.Errorf("failed to unmarshal response body: %w", err)
    -    }
    -    return unloadResp, nil
    +    var modelsToUnload []string
    +    // If All is true, get all running models first
    +    if req.All {
    +        backends, err := c.PS()
    +        if err != nil {
    +            return UnloadResponse{}, fmt.Errorf("failed to get running models: %w", err)
    +        }
    +        for _, backend := range backends {
    +            // Filter by backend if specified
    +            if req.Backend == "" || backend.BackendName == req.Backend {
    +                modelsToUnload = append(modelsToUnload, backend.ModelName)
    +            }
    +        }
    +    } else {
    +        modelsToUnload = req.Models
    +    }
    +    // Unload each model using /api/generate with keep_alive: 0
    +    unloadedCount := 0
    +    for _, model := range modelsToUnload {
    +        if err := c.UnloadFromMemory(model); err != nil {
    +            // Continue unloading other models even if one fails
    +            // This matches the previous behavior where partial unloads were possible
    +            continue
    +        }
    +        unloadedCount++
    +    }
    +    return UnloadResponse{UnloadedRunners: unloadedCount}, nil
     }
In the Unload function, errors from c.UnloadFromMemory(model) are silently ignored. While it's good to continue processing other models, swallowing errors can hide important issues from the user and make debugging difficult. It would be better to collect these errors and return them as a single, combined error at the end of the function.
    func (c *Client) Unload(req UnloadRequest) (UnloadResponse, error) {
        var modelsToUnload []string
        // If All is true, get all running models first
        if req.All {
            backends, err := c.PS()
            if err != nil {
                return UnloadResponse{}, fmt.Errorf("failed to get running models: %w", err)
            }
            for _, backend := range backends {
                // Filter by backend if specified
                if req.Backend == "" || backend.BackendName == req.Backend {
                    modelsToUnload = append(modelsToUnload, backend.ModelName)
                }
            }
        } else {
            modelsToUnload = req.Models
        }
        // Unload each model using /api/generate with keep_alive: 0
        var unloadedCount int
        var errs []error
        for _, model := range modelsToUnload {
            if err := c.UnloadFromMemory(model); err != nil {
                // Continue unloading other models even if one fails, but collect errors.
                errs = append(errs, fmt.Errorf("failed to unload model %q: %w", model, err))
                continue
            }
            unloadedCount++
        }
        return UnloadResponse{UnloadedRunners: unloadedCount}, errors.Join(errs...)
    }

The new newLoadCmd and newStartCmd commands:

    func newLoadCmd() *cobra.Command {
        const cmdArgs = "MODEL [MODEL ...]"
        c := &cobra.Command{
            Use:   "load " + cmdArgs,
            Short: "Load models into memory",
            RunE: func(cmd *cobra.Command, modelArgs []string) error {
                if len(modelArgs) == 0 {
                    return fmt.Errorf(
                        "'docker model load' requires at least one MODEL.\n\n" +
                            "Usage: docker model load " + cmdArgs + "\n\n" +
                            "See 'docker model load --help' for more information.",
                    )
                }
                // Load each model
                for _, model := range modelArgs {
                    if err := desktopClient.LoadIntoMemory(model); err != nil {
                        return handleClientError(err, fmt.Sprintf("Failed to load model %s", model))
                    }
                    cmd.Printf("Loaded model: %s\n", model)
                }
                return nil
            },
            ValidArgsFunction: completion.NoComplete,
        }
        return c
    }

    func newStartCmd() *cobra.Command {
        const cmdArgs = "MODEL [MODEL ...]"
        c := &cobra.Command{
            Use:   "start " + cmdArgs,
            Short: "Start models (alias for load)",
            RunE: func(cmd *cobra.Command, modelArgs []string) error {
                if len(modelArgs) == 0 {
                    return fmt.Errorf(
                        "'docker model start' requires at least one MODEL.\n\n" +
                            "Usage: docker model start " + cmdArgs + "\n\n" +
                            "See 'docker model start --help' for more information.",
                    )
                }
                // Load each model (start is an alias for load)
                for _, model := range modelArgs {
                    if err := desktopClient.LoadIntoMemory(model); err != nil {
                        return handleClientError(err, fmt.Sprintf("Failed to start model %s", model))
                    }
                    cmd.Printf("Started model: %s\n", model)
                }
                return nil
            },
            ValidArgsFunction: completion.NoComplete,
        }
        return c
    }
The functions newLoadCmd and newStartCmd contain a significant amount of duplicated code. To improve maintainability and reduce redundancy, you can extract the common logic into a single factory function. This function would create the cobra.Command and accept parameters for the parts that differ, such as the command name, short description, and the verb used in output messages (Loaded/Started).
    func newModelLoadCmd(use, short, verb string) *cobra.Command {
        const cmdArgs = "MODEL [MODEL ...]"
        c := &cobra.Command{
            Use:   use + " " + cmdArgs,
            Short: short,
            RunE: func(cmd *cobra.Command, modelArgs []string) error {
                if len(modelArgs) == 0 {
                    return fmt.Errorf(
                        "'docker model %s' requires at least one MODEL.\n\n"+
                            "Usage: docker model %s "+cmdArgs+"\n\n"+
                            "See 'docker model %s --help' for more information.",
                        use, use, use,
                    )
                }
                // Load each model
                for _, model := range modelArgs {
                    if err := desktopClient.LoadIntoMemory(model); err != nil {
                        return handleClientError(err, fmt.Sprintf("Failed to %s model %s", use, model))
                    }
                    cmd.Printf("%s model: %s\n", verb, model)
                }
                return nil
            },
            ValidArgsFunction: completion.NoComplete,
        }
        return c
    }

    func newLoadCmd() *cobra.Command {
        return newModelLoadCmd("load", "Load models into memory", "Loaded")
    }

    func newStartCmd() *cobra.Command {
        return newModelLoadCmd("start", "Start models (alias for load)", "Started")
    }

The stop command's unload loop:

    for _, model := range modelArgs {
        if err := desktopClient.UnloadFromMemory(model); err != nil {
            return handleClientError(err, fmt.Sprintf("Failed to stop model %s", model))
        }
        cmd.Printf("Stopped model: %s\n", model)
    }
The loop for stopping models currently exits on the first error. This prevents subsequent models in the argument list from being stopped. It would be more robust to attempt to stop all specified models, collect any errors that occur, and report them at the end. This ensures that a single failure doesn't halt the entire operation.
Suggested change:

    for _, model := range modelArgs {
        if err := desktopClient.UnloadFromMemory(model); err != nil {
            // It's better to report errors and continue, rather than stopping on the first failure.
            cmd.PrintErrf("Failed to stop model %s: %v\n", model, err)
            continue
        }
        cmd.Printf("Stopped model: %s\n", model)
    }
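If the failures should also fail the command rather than only being printed, a variant of the same loop could collect them and return a joined error. Sketch only; it assumes an `errors` import and Go 1.20+ for `errors.Join`:

```golang
var errs []error
for _, model := range modelArgs {
	if err := desktopClient.UnloadFromMemory(model); err != nil {
		cmd.PrintErrf("Failed to stop model %s: %v\n", model, err)
		errs = append(errs, err)
		continue
	}
	cmd.Printf("Stopped model: %s\n", model)
}
// Nil when every model stopped cleanly; otherwise the command exits non-zero.
return errors.Join(errs...)
```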
The new LoadIntoMemory client method (cmd/cli/desktop/desktop.go):

    // LoadIntoMemory loads a model into memory using the Ollama-compatible API
    func (c *Client) LoadIntoMemory(model string) error {
        model = normalizeHuggingFaceModelName(model)

        // Create request body with empty prompt to load model
        reqBody := map[string]interface{}{
            "model": model,
        }

        jsonData, err := json.Marshal(reqBody)
        if err != nil {
            return fmt.Errorf("error marshaling request: %w", err)
        }

        // Use /api/generate endpoint with empty prompt to load model
        generatePath := "/api/generate"
        resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
        if err != nil {
            return c.handleQueryError(err, generatePath)
        }
        defer resp.Body.Close()

        if resp.StatusCode != http.StatusOK {
            body, _ := io.ReadAll(resp.Body)
            return fmt.Errorf("loading %s failed with status %s: %s", model, resp.Status, string(body))
        }

        // Read the response to ensure it's valid
        body, err := io.ReadAll(resp.Body)
        if err != nil {
            return fmt.Errorf("failed to read response body: %w", err)
        }

        // Parse response to verify success
        var response map[string]interface{}
        if err := json.Unmarshal(body, &response); err != nil {
            return fmt.Errorf("failed to parse response: %w", err)
        }

        return nil
    }
This function can be improved in two ways:

- Type Safety: Using `map[string]interface{}` for the request body is not type-safe. Defining a specific struct for the request improves readability and allows the compiler to catch potential errors.
- Efficiency: The response body is read and unmarshalled, but the result is never used. If the goal is simply to consume the response body to allow for connection reuse, it's more efficient to use `io.Copy(io.Discard, resp.Body)`.
    func (c *Client) LoadIntoMemory(model string) error {
        model = normalizeHuggingFaceModelName(model)

        // Create request body with empty prompt to load model
        type loadRequest struct {
            Model string `json:"model"`
        }
        reqBody := loadRequest{
            Model: model,
        }

        jsonData, err := json.Marshal(reqBody)
        if err != nil {
            return fmt.Errorf("error marshaling request: %w", err)
        }

        // Use /api/generate endpoint with empty prompt to load model
        generatePath := "/api/generate"
        resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
        if err != nil {
            return c.handleQueryError(err, generatePath)
        }
        defer resp.Body.Close()

        if resp.StatusCode != http.StatusOK {
            body, _ := io.ReadAll(resp.Body)
            return fmt.Errorf("loading %s failed with status %s: %s", model, resp.Status, string(body))
        }

        // Discard the response body to allow connection reuse.
        if _, err := io.Copy(io.Discard, resp.Body); err != nil {
            return fmt.Errorf("failed to read response body: %w", err)
        }

        return nil
    }

The new UnloadFromMemory client method (cmd/cli/desktop/desktop.go):

    // UnloadFromMemory unloads a model from memory using the Ollama-compatible API
    func (c *Client) UnloadFromMemory(model string) error {
        model = normalizeHuggingFaceModelName(model)

        // Create request body with empty prompt and keep_alive: "0"
        reqBody := map[string]interface{}{
            "model":      model,
            "keep_alive": "0",
        }

        jsonData, err := json.Marshal(reqBody)
        if err != nil {
            return fmt.Errorf("error marshaling request: %w", err)
        }

        // Use /api/generate endpoint with keep_alive: 0 to unload model
        generatePath := "/api/generate"
        resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
        if err != nil {
            return c.handleQueryError(err, generatePath)
        }
        defer resp.Body.Close()

        if resp.StatusCode != http.StatusOK {
            body, _ := io.ReadAll(resp.Body)
            return fmt.Errorf("unloading %s failed with status %s: %s", model, resp.Status, string(body))
        }

        // Read the response to ensure it's valid
        body, err := io.ReadAll(resp.Body)
        if err != nil {
            return fmt.Errorf("failed to read response body: %w", err)
        }

        // Parse response to verify unload
        var response map[string]interface{}
        if err := json.Unmarshal(body, &response); err != nil {
            return fmt.Errorf("failed to parse response: %w", err)
        }

        return nil
    }
This function can be improved in two ways:

- Type Safety: Using `map[string]interface{}` for the request body is not type-safe. Defining a specific struct for the request improves readability and allows the compiler to catch potential errors.
- Efficiency: The response body is read and unmarshalled, but the result is never used. If the goal is simply to consume the response body to allow for connection reuse, it's more efficient to use `io.Copy(io.Discard, resp.Body)`.
    func (c *Client) UnloadFromMemory(model string) error {
        model = normalizeHuggingFaceModelName(model)

        // Create request body with empty prompt and keep_alive: "0"
        type unloadRequest struct {
            Model     string `json:"model"`
            KeepAlive string `json:"keep_alive"`
        }
        reqBody := unloadRequest{
            Model:     model,
            KeepAlive: "0",
        }

        jsonData, err := json.Marshal(reqBody)
        if err != nil {
            return fmt.Errorf("error marshaling request: %w", err)
        }

        // Use /api/generate endpoint with keep_alive: 0 to unload model
        generatePath := "/api/generate"
        resp, err := c.doRequest(http.MethodPost, generatePath, bytes.NewReader(jsonData))
        if err != nil {
            return c.handleQueryError(err, generatePath)
        }
        defer resp.Body.Close()

        if resp.StatusCode != http.StatusOK {
            body, _ := io.ReadAll(resp.Body)
            return fmt.Errorf("unloading %s failed with status %s: %s", model, resp.Status, string(body))
        }

        // Discard the response body to allow connection reuse.
        if _, err := io.Copy(io.Discard, resp.Body); err != nil {
            return fmt.Errorf("failed to read response body: %w", err)
        }

        return nil
    }
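On the overall review comment about `keep_alive` being sent as the string "0": if the server side expects a number (Ollama's API accepts a number of seconds as well as a duration string), the same struct could use a numeric field instead. Sketch only; whether this matches the handler's parsing of `keep_alive` is an assumption to verify:

```golang
type unloadRequest struct {
	Model     string `json:"model"`
	KeepAlive int    `json:"keep_alive"` // 0 unloads the model immediately
}
reqBody := unloadRequest{Model: model, KeepAlive: 0}
```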