Skip to content

Conversation

@e-gineer
Copy link
Contributor

Summary

Fixes a bug where sanitiseTableName() in the interactive client doesn't escape unicode or emoji characters in table names, causing SQL errors and potential security issues.

PostgreSQL requires identifier escaping for non-ASCII characters, but the function only checked for spaces, hyphens, and uppercase letters.

Changes

  • Commit 1: Added comprehensive test demonstrating the bug with various unicode/emoji cases
  • Commit 2: Implemented fix by adding containsNonASCII() helper to detect and escape non-ASCII characters

Test Results

Before fix (Commit 1):

--- FAIL: TestSanitiseTableName/unicode_table_name (0.00s)
--- FAIL: TestSanitiseTableName/emoji_in_table_name (0.00s)
--- FAIL: TestSanitiseTableName/qualified_with_unicode (0.00s)
--- FAIL: TestSanitiseTableName/mixed_unicode_and_ascii (0.00s)
--- FAIL: TestSanitiseTableName/cyrillic_characters (0.00s)
--- FAIL: TestSanitiseTableName/arabic_characters (0.00s)

After fix (Commit 2):

--- PASS: TestSanitiseTableName (0.00s)
    --- PASS: TestSanitiseTableName/unicode_table_name (0.00s)
    --- PASS: TestSanitiseTableName/emoji_in_table_name (0.00s)
    --- PASS: TestSanitiseTableName/qualified_with_unicode (0.00s)
    --- PASS: TestSanitiseTableName/mixed_unicode_and_ascii (0.00s)
    --- PASS: TestSanitiseTableName/cyrillic_characters (0.00s)
    --- PASS: TestSanitiseTableName/arabic_characters (0.00s)
PASS

Impact

  • ✅ Fixes SQL errors with international table names
  • ✅ Improves internationalization support
  • ✅ Prevents potential SQL injection with crafted unicode
  • ✅ Maintains backward compatibility (all existing tests pass)

Severity

MEDIUM - SQL errors / potential security issue

Files Modified

  • pkg/interactive/interactive_helpers_test.go (new file - test)
  • pkg/interactive/interactive_client_autocomplete.go (production fix)

PostgreSQL requires identifier escaping for unicode and emoji characters,
but sanitiseTableName() only checks for spaces, hyphens, and uppercase.

This test demonstrates the bug by showing that table names with:
- Chinese characters (用户)
- Emoji (😀)
- French accented characters (données)
- Cyrillic characters (таблица)
- Arabic characters (جدول)

are not being escaped as they should be.

The test fails on all unicode/emoji cases, proving the bug exists.
@e-gineer e-gineer force-pushed the fix-4801-unicode-escape branch from 8a89595 to 0b0a131 Compare November 14, 2025 10:21
PostgreSQL requires escaping for non-ASCII characters in identifiers.
The sanitiseTableName() function previously only checked for spaces,
hyphens, and uppercase letters, but missed unicode and emoji characters.

Added containsNonASCII() helper function to detect any characters with
rune values > 127 (non-ASCII). Now properly escapes table names with:
- Unicode characters (Chinese, Cyrillic, Arabic, etc.)
- Emoji
- Accented characters

This prevents SQL errors and potential injection issues when using
international table names in the interactive client.
@e-gineer e-gineer requested a review from pskrbasu November 15, 2025 12:33
@e-gineer
Copy link
Contributor Author

This PR correctly fixes the unicode/emoji escaping bug, but it raises an interesting UX question worth discussing:

Current Behavior: Text ≠ Output

Looking at interactive_client_autocomplete.go:87-103:

// Qualified tables
{Text: qualifiedTableName, Output: qualifiedTableName}  // Line 88

// Unqualified tables  
{Text: tableName, Output: sanitiseTableName(tableName)}  // Line 103

The inconsistency:

  • What user sees in autocomplete: MyTable (clean, unescaped)
  • What gets inserted into SQL: "MyTable" (escaped)

This violates WYSIWYG principles and could be surprising to users.

The Question

Should we show the escaped form in autocomplete to match what actually gets inserted?

// Option 1 (current): Clean display, escaped output
{Text: "MyTable", Output: "\"MyTable\""}
           ↑ user seesgets inserted

// Option 2 (WYSIWYG): Show escaped form
{Text: "\"MyTable\"", Output: "\"MyTable\""}
              ↑ user seesgets inserted (same!)

Trade-offs

Current approach (clean display):

  • ✅ Cleaner, more readable autocomplete list
  • ✅ Easier visual scanning
  • ✅ Matches behavior of some SQL tools
  • ❌ Not WYSIWYG - surprising when quotes appear
  • ❌ Users can't tell which tables will be escaped

WYSIWYG approach (show escaped):

  • ✅ Transparent - what you see = what you get
  • ✅ Educational - helps users understand PostgreSQL identifier rules
  • ✅ No surprises
  • ✅ Better for developer tool (vs consumer app)
  • ❌ More visual clutter
  • ❌ Harder to scan when many tables have quotes

My Take

For Steampipe (a developer tool), transparency > visual cleanliness. Developers expect to see the actual SQL that will be used. The "ugliness" of quotes is actually useful information.

Worth discussing whether to show the escaped form in the Text field as well, either in this PR or as a follow-up improvement.

Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants