Skip to content

Conversation

@mwilsnd
Copy link
Collaborator

@mwilsnd mwilsnd commented Dec 2, 2025

Launch Checklist

This fixes #6730

By returning raw feature data from the web worker instead of serializing, we can regain support for undefined GeoJSON properties and get a nice performance boost.

  • Confirm your changes do not include backports from Mapbox projects (unless with compliant license) - if you are not sure about this, please ask!
  • Briefly describe the changes in this PR.
  • Link to related issues.
  • Include before/after visuals or gifs if this PR includes visual changes.
  • Write tests for all new functionality.
  • Document any changes to public APIs.
  • Post benchmark scores.
  • Add an entry to CHANGELOG.md under the ## main section.

@mwilsnd mwilsnd force-pushed the fix-undefined-mvt-properties branch from 1c98c72 to db722ad Compare December 2, 2025 20:34

abort: AbortController;
vectorTile: VectorTileLike;
geoJsonFeatures?: Feature[];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be inferred from the vectorTile? It is mandatory to add this field?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so - with the update to vt-pbf I can likely simplify things further.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look it is being used in the parse method, so I wonder if this can be removed from WorkerTile class...

}

workerTile.vectorTile = response.vectorTile;
workerTile.geoJsonFeatures = response.isGeojson ? (response.vectorTile as GeoJSONWrapper).features : null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this can be done better considering we have inheritance here with geojson and vector tile...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is reflecting the type of an object here more or less the same, performance-wise? No issues with changing this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less about reflection, more about OOP... if you are doing as GeoJSONWrapper in a class that doesn't suppose to know about GeoJSONWrapper then it seems weird...

'type': 'Point',
'coordinates': [0, 0]
},
'properties': {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to split this out to a different test...

@HarelM
Copy link
Collaborator

HarelM commented Dec 2, 2025

I can't say I'm loving this, seems to heckish, but I can't place my finger on a specific issue, what other alternatives do we have to solve the bug?

return {
vectorTile: geojsonWrapper,
isGeojson: true
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we convert the GeoJSON data to a vector tile here, the data is encoded as a UInt8Array. This can be transferred to the main thread with zero overhead via the WebWorker transfer API.

The plain GeoJSON data cannot take advantage of the transfer API and will be much slower.

See https://developer.chrome.com/blog/transferable-objects-lightning-fast/ for more info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested against my performance benchmark (add 100k points, move one, https://jsbin.com/bubejanoha/edit?html,output) this branch is ~100% slower than main.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running your tool, I'm seeing about a 15% regression on my system over an average of 40 samples in Firefox. Could you add this test to the benchmarks? I think it'd be really valuable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gotta step out for the afternoon but I'll try to get back to these questions in the next ~12 hrs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running your tool, I'm seeing about a 15% regression on my system over an average of 40 samples in Firefox.

Are you sure you're running npm run build-dist on this branch and on main? The dist build is not generated automatically as part of the dev server. I'm certain that you'll see a measurable and noticeable difference.

Could you add this test to the benchmarks? I think it'd be really valuable.

I added a GeoJSONDiff benchmark that measures perf on a relatively small (10k features) source. You can manually edit the bench to test a large (100k features) source but the runtime is prohibitively long to include in the standard suite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Prior to merging main my local branch was behind enough to not include your GeoJSONDiff benchmark.

If you pull in the latest changes from main you can compare this branch to main with one command. It takes a minute to run. Good time to get a glass of water, bother your pet, etc.

npm run benchmark -- GeoJSONSourceUpdateData && npm run benchmark -- GeoJSONSourceSetData 

Indirectly looking up properties after querying the feature index might be the best approach if it works.

I think we can make it work! The one big hurdle to navigate will be supporting GeoJSON sources with string feature IDs or no feature IDs. I'm happy to chat more, help brainstorm or prototype but I respect that you're the primary issue owner here.

Copy link
Contributor

@lucaswoj lucaswoj Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a couple other performance solutions worth considering...

  1. pin to the previous vector-tile-js version
  2. fork vector-tile-js to maintain support for null
  3. accept the breaking change (how many users are really differentiating between null and undefined in queryRenderedFeatures results?)
  4. use a "sentinel" value to represent null in vector tiles (like the string "__MAPLIBRE_NULL_VALUE__")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 & 4 are certainly worth considering but I'm hesitant about using an older or modified version of vector-tile-js. For better or worse the current in-memory tile representation used is coupled to MVT, trying to ignore the specification (even if a minor deviation) doesn't sound too appealing to me.

While I question the usefulness of undefined/null GeoJSON properties, that specification allows for them so if possible I'd prefer a solution that complies with both formats. Other options like custom serialization, a forked vector tile deserializer or tile feature culling on the main thread would all add more complexity compared to a simple sentinel, so I'm currently leaning more towards that approach. An indirect lookup from the feature index would be the runner up, with the perk of not needing a special reserved property value.

@HarelM What are your thoughts on the possible approaches here? If we're fine introducing another reserved string then this whole PR becomes a lot more simple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that specification allows for them so if possible I'd prefer a solution that complies with both formats

queryRenderedFeatures has never supported the GeoJSON specification. This project's stance has consistently been that GeoJSON gets converted to a vector tile internally and that's what comes out of qRF. For example, it's never supported arrays or nested objects. Support for the null value was a quirk and does not indicate a broader commitment to full support of the GeoJSON specification.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, but there are a lot of bugs/feature requests on that topic, people keep finding it cumbersome.
If we could use feature index and be able to return the original feature with the original geometry from updatable that could be a significant win in terms of user expectations as users don't really care that we use MVT under the hood, they want their data back the same way they entered it.
But this might be a breaking change, for good, but still a breaking change.
If we manage to do it properly it's a good reason to start v6 breaking changes version.
So if this is possible I'm in favor of solving geojson quirkiness once and for all.

@mwilsnd
Copy link
Collaborator Author

mwilsnd commented Dec 2, 2025

If it's the case that copying the raw feature data is slower than serializing the tile to protobuf, then the remaining options to fix this issue are a bit more involved:

  • Custom binary serialization for GeoJSON worker tiles
  • Perform feature tile culling on the main thread and pass those results to the feature index (we already get feature culling from geojsonvt on the worker, which has to run anyways for creating the buckets)
  • Revert to mapbox/vector-tile-js v2.0.3 and pin it there, accept operating outside the vector tile specification
    Or highly unlikely: petition to review the vector tile specification with support for undefined properties

I'm not certain what the best approach here is, we can't get away with just serializing to MVT anymore unless we pin an old version and stay there or get the spec changed.

@lucaswoj
Copy link
Contributor

lucaswoj commented Dec 2, 2025

Probably not a small lift, but can we serialize to MLT now?

@HarelM
Copy link
Collaborator

HarelM commented Dec 2, 2025

No, we don't have a js derializer for mlt at the moment... :-/

@lucaswoj
Copy link
Contributor

lucaswoj commented Dec 2, 2025

I'm just getting up to speed on this issue, so I apologize if this is already answered somewhere.

Why can't we simply filter out any GeoJSON property keys that have undefined or null values?

@mwilsnd
Copy link
Collaborator Author

mwilsnd commented Dec 2, 2025

While filtering would avoid serializing the now unsupported properties, that would result in those values not making it into the feature index. Since the GeoJSON spec explicitly supports null/undefined properties, we want to see those keys persist in queryRenderedFeatures.

@HarelM
Copy link
Collaborator

HarelM commented Dec 2, 2025

Because these are valid geojson definitions...
Not a great argument, but still...

@HarelM
Copy link
Collaborator

HarelM commented Dec 2, 2025

I'm guessing mapbox should have the same issue when I think about it.
Have we tried opening an issue in the relevant repo?
Maybe this can be made as an optional flag to the encoder so that by default it throws but it can be configured not to?

@lucaswoj
Copy link
Contributor

lucaswoj commented Dec 2, 2025

Did we return null / undefined from queryRenderedFeatures before #6131?

Is the goal of this PR restore pre-#6131 behavior? or to expand our support for the GeoJSON spec in a new way?

@HarelM
Copy link
Collaborator

HarelM commented Dec 2, 2025

As a side note, performance is less important than "error handling", i.e. if we need to solve this and the only way is by degrading performance, then we'll need to bite the bullet.
In the past this used to work without throwing exceptions, I'm not 100% sure that was the behavior, but it should be easy to check.

@mwilsnd
Copy link
Collaborator Author

mwilsnd commented Dec 2, 2025

Undefined properties were previously supported. vt-pbf explicitly handles undefined so this has been the status quo for quite a while. I'm not sure what prompted Mapbox to make the change in vector-tile-js, or why it was done in a minor version increment, but I'd like to restore the old behavior.

@codecov
Copy link

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.39%. Comparing base (2cf5d85) to head (619322b).

Files with missing lines Patch % Lines
src/source/vector_tile_worker_source.ts 75.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6803      +/-   ##
==========================================
- Coverage   92.42%   92.39%   -0.03%     
==========================================
  Files         288      288              
  Lines       23807    23817      +10     
  Branches     5056     5064       +8     
==========================================
+ Hits        22003    22006       +3     
- Misses       1804     1811       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lucaswoj
Copy link
Contributor

lucaswoj commented Dec 3, 2025

Revert to mapbox/vector-tile-js v2.0.3 and pin it there, accept operating outside the vector tile specification

Are there any immediate consequences to doing this? I can help implement a longer term solution if this is an acceptable temporary bandaid.

I've spent about 100 hours optimizing GeoJSON#updateData performance over the past few months. I am highly motivated to help find a solution that both preserves that perf improvement for me and fixes this behavior for you.

@HarelM
Copy link
Collaborator

HarelM commented Dec 3, 2025

A user who needs this fixed can simply rollback to previous msplibre version, so we can take the time and fix this properly.

Is it possible to use JSON.strinfigy with TextEncoder to pass the geojson data to the main thread instead of coverting it to MVT? There are other limitations of MVT that do no apply to geojson (like objects in properties) that I'll be happy to solve too.

@HarelM
Copy link
Collaborator

HarelM commented Dec 3, 2025

If we are willing to entertain other formats, we can have a look at geobuf and geoarrow.
I don't know the implication of package size though...

@ingalls
Copy link

ingalls commented Dec 3, 2025

@HarelM We've been benefiting enormously from the performance question on GeoJSON sources that @lucaswoj has been working on over the last couple of months. We have 10-100k points updated on a regular basis as we have live first responder locations for all of Colorado, as such we are likely more sensitive to performance loss as we've been working hard to try to get things as smooth as possible.

As such if the undefined/null values are something that we want to support moving forward, would it make more sense to simply fork the vector-tile-js library or pin it to the previous versions as a stop gap?

@lucaswoj
Copy link
Contributor

lucaswoj commented Dec 3, 2025

Another thought: the GeoJSONSource already has all the raw GeoJSON features on the main thread (GeoJSONSource._data), so we could read the properties directly from there instead of sending it b back and forth between threads.

@HarelM
Copy link
Collaborator

HarelM commented Dec 3, 2025

I think we came to the conclusion this is not the case for data set as irl, didn't we?

@lucaswoj
Copy link
Contributor

lucaswoj commented Dec 3, 2025

I think we came to the conclusion this is not the case for data set as irl, didn't we?

It wouldn't be very hard to send the full GeoJSON data to the main thread in just that case

@lucaswoj
Copy link
Contributor

lucaswoj commented Dec 4, 2025

EDIT: moved to thread #6803 (comment)

@HarelM
Copy link
Collaborator

HarelM commented Dec 4, 2025

Fetching the features on the main thread would still need to slice them up to tiles for querying which is also heavy.
It maybe can be done lazily...

@lucaswoj
Copy link
Contributor

lucaswoj commented Dec 4, 2025

Fetching the features on the main thread would still need to slice them up to tiles for querying which is also heavy. It maybe can be done lazily...

Could we keep using FeatureIndex for the spatial query, then look up the original feature properties by ID from GeoJSONSource#_data.updatable at the end? That way we get the undefined/null properties without re-slicing on the main thread.

@lucaswoj
Copy link
Contributor

lucaswoj commented Dec 4, 2025

I think we came to the conclusion this is not the case for data set as irl, didn't we?

Just created #6811 to provide the GeoJSON data back to the main thread when it's loaded from a URL

@HarelM
Copy link
Collaborator

HarelM commented Dec 5, 2025

@lucaswoj that's a very interesting solution, see my comment here:
#6803 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geoJSON source's updateData throws with undefined properties

4 participants