Skip to content

Conversation

@ericokuma
Copy link
Contributor

WHY: To enable unredacted session replays for non-EU/CA users while maintaining privacy compliance for regulated regions. This change implements capture-time masking based on user's GeoIP location, as recommended by PostHog (APP-582).

WHAT: Configured PostHog session_recording to dynamically apply masking:

  • Utilizes GeoIP data to identify EU/CA users and sets an is_regulated_region person property.
  • Applies a custom maskInputFn that masks inputs only for regulated users (passwords are always masked).
  • Integrated into posthog.init loaded and onFeatureFlags callbacks for timely GeoIP evaluation.

Checklist:

  • Covered by tests
  • Ran it and it works as intended
  • Reviewed the diff before requesting a review
  • Checked for unhandled edge cases
  • Linked the issues it closes
  • Checked if the docs need to be updated. If so, create a separate Linear DOCS issue
  • Intend to cherry-pick into the release branch
  • I'm proud of this work!

Linear Issue: APP-582

Open in Cursor Open in Web

@ericokuma ericokuma marked this pull request as ready for review November 20, 2025 02:53
@ericpgreen2
Copy link
Contributor

Code Organization Suggestion

Lines 57-125 have several function declarations nested inside initPosthog, which makes the code harder to follow, test, and maintain. Consider extracting these functions to the module level:

Current issues:

  • persistRegulatedStatus, evaluateGeoRegulation, and geoMaskInputFn are all declared inside initPosthog
  • Makes unit testing difficult (can't test these functions in isolation)
  • Creates unnecessary cognitive overhead when reading the code
  • The masking function closes over isRegulatedUser which may not update correctly (race condition risk)

Suggested refactoring:

  1. Extract helper functions to module level:

    • normalizeRegionCode(value: unknown): string | undefined
    • isRegulatedRegion(country?: string, subdivision?: string): boolean
  2. Create a well-documented createGeoAwareMaskingFunction() that:

    • Returns the masking function
    • Reads regulation status directly from PostHog properties on each invocation (fixes race condition)
    • Has clear JSDoc explaining behavior for passwords vs other inputs
  3. Extract evaluateAndPersistGeoRegulation() as a standalone function that:

    • Can be easily unit tested
    • Has clear JSDoc explaining the priority logic (existing value → calculate from GeoIP → persist)

This would make the initPosthog function much simpler and focused on just initialization, while the complex logic is extracted into testable, documented functions.


Developed in collaboration with Claude Code

@ericpgreen2
Copy link
Contributor

ericpgreen2 commented Nov 25, 2025

PostHog API Concerns

The PosthogWithGeo type extension is unnecessary and problematic:

type PosthogWithGeo = typeof posthog & {
  get_property?: (property: string) => unknown;
  set_person_properties?: (properties: Record<string, unknown>) => void;
  onFeatureFlags?: (callback: () => void) => void;
};

Issues:

  1. Wrong method name: set_person_properties doesn't exist in PostHog's API. The correct method is setPersonProperties (camelCase). This means the current code is silently failing to persist the regulation status.

  2. Unnecessary type extension: All three methods (get_property, setPersonProperties, onFeatureFlags) are official public PostHog APIs already included in the type definitions. The type extension and cast bypass TypeScript's type safety for no benefit.

  3. Misleading optional chaining: Using ?. suggests these methods might not exist, when they're guaranteed to be present in the PostHog API.

Recommendation:

Remove the type extension entirely and use the posthog object directly:

// Remove this:
type PosthogWithGeo = typeof posthog & { ... };
const geoAwarePosthog = posthog as PosthogWithGeo;

// Just use posthog directly with the correct method names:
posthog.get_property(REGULATED_REGION_PROPERTY);
posthog.setPersonProperties({ [REGULATED_REGION_PROPERTY]: value });
posthog.onFeatureFlags(evaluateGeoRegulation);

This will restore proper TypeScript type checking and fix the bug where regulation status isn't being persisted.


Developed in collaboration with Claude Code

@ericpgreen2
Copy link
Contributor

Performance & Initialization Concerns

Potential Issue: The GeoIP evaluation logic runs synchronously during PostHog initialization, which could impact application load time.

Current flow:

posthog.init(POSTHOG_API_KEY, {
  // ...
  loaded: (client) => {
    client.register_for_session({
      "Rill version": rillVersion,
    });
    evaluateGeoRegulation();  // Runs synchronously in loaded callback
    geoAwarePosthog.onFeatureFlags?.(evaluateGeoRegulation);  // Also runs on every feature flag change
  },
});

Concerns:

  1. loaded callback timing: The loaded callback fires after PostHog's initial setup, but before it's fully ready. If GeoIP data hasn't been fetched yet (likely on first load), the regulation status won't be set, and you'll default to masking everything.

  2. GeoIP data availability: PostHog fetches GeoIP data asynchronously from their backend. On first page load, $geoip_country_code is almost certainly undefined, so the masking will default to regulated=true.

  3. onFeatureFlags overhead: This callback fires every time feature flags are evaluated, which could be frequent. Running evaluateGeoRegulation() on every feature flag change is wasteful since GeoIP data doesn't change during a session.

  4. Network latency: PostHog's GeoIP lookup happens server-side when they process the first event. The data comes back asynchronously and gets cached in the user's properties. This could take 100-500ms depending on network conditions.

Impact on app load:

  • Good news: The code itself is not blocking - it's all synchronous property reads from PostHog's in-memory cache
  • ⚠️ Real issue: The first session recording will likely mask everything because GeoIP data won't be available yet
  • ⚠️ Subsequent sessions: Should work correctly once PostHog has fetched and cached the GeoIP data

Recommendations:

  1. Accept the first-session limitation: Document that the first session recording for a new user will be fully masked until GeoIP data is available. This is actually safer from a privacy perspective.

  2. Remove redundant onFeatureFlags callback: Since GeoIP data is user-level and doesn't change, you only need to evaluate once when it becomes available:

    loaded: (client) => {
      client.register_for_session({ "Rill version": rillVersion });
      
      // Try to evaluate immediately
      evaluateGeoRegulation();
      
      // Only re-evaluate if we didn't get a result
      const status = posthog.get_property(REGULATED_REGION_PROPERTY);
      if (status === undefined) {
        // Wait for GeoIP data via feature flags
        posthog.onFeatureFlags(() => {
          const country = posthog.get_property(GEOIP_COUNTRY_PROPERTY);
          if (country) {
            evaluateGeoRegulation();
          }
        });
      }
    }
  3. Consider a timeout: If GeoIP data doesn't arrive within a reasonable time (e.g., 5 seconds), log a warning but continue with default masking.

Bottom line: The code won't slow down app initialization, but the masking behavior may not work correctly until the second session for new users.


Developed in collaboration with Claude Code

Copy link
Contributor

@ericpgreen2 ericpgreen2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the Claude Code comments above

Copy link
Contributor Author

Got a new direction from @nishant.bangarwa: https://www.notion.so/rilldata/Reduce-Data-Obfuscation-in-PostHog-2b6ba33c8f5780d69f22cdb8d6346dea?source=copy_link

We will move away from region-based masking

@ericokuma ericokuma closed this Dec 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants