The Geoconnex scheduler crawls water data metadata on a schedule and synchronizes it with a graph database.
- It crawls sitemaps with `nabu harvest` and downloads the results to an S3 bucket
- It syncs data between the S3 bucket and the graph database using `nabu sync`
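The two-step flow above can be sketched as a small wrapper. This is illustrative only: the scheduler's real invocations (flags, arguments, and how Dagster wires the steps together) are not shown in this README, so `run_pipeline` and the bare subcommands are assumptions.

```python
import subprocess

def run_pipeline(runner=subprocess.run):
    """Hypothetical sketch: crawl sitemaps into S3, then sync S3 into the graph DB."""
    # `nabu harvest`: crawl sitemaps and download metadata to the S3 bucket
    runner(["nabu", "harvest"], check=True)
    # `nabu sync`: reconcile the S3 bucket with the graph database
    runner(["nabu", "sync"], check=True)
```

The `runner` parameter exists only so the flow can be exercised without the real `nabu` binary installed.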
For more information about the Geoconnex project generally and how it aims to improve water data infrastructure, see the Geoconnex docs.
> **Important:** You must have `uv` installed for package management.
Install dependencies and spin up the necessary Docker services:

```shell
make deps && make dev
```

Then go to `localhost:3000`.
- `make prod` — spin up all services as containers, including user code and local db/s3 containers (make sure to set the `DAGSTER_POSTGRES_HOST` env var to `dagster_postgres`)
- `make cloudProd` — spin up user code and essential services but not storage (you will need to specify your db/s3 endpoints and any other remote services in the `.env` file)

All cloud deployment and infrastructure-as-code work is contained within the harvest.geoconnex.us repo.
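For the `make prod` target, the containerized Postgres instance must be reachable under its Docker network name. A minimal `.env` entry for that case (the value `dagster_postgres` comes from the note above; any other variables your deployment needs go in the same file):

```
# .env (repo root)
DAGSTER_POSTGRES_HOST=dagster_postgres
```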
- All env vars must be defined in `.env` at the root of the repo
- The `.env.example` file will be copied to `.env` if it does not exist
- Spin up the local dev environment
- Run `make test` from the root to execute tests
- If you use VSCode, run the `dagster dev` task in the debug panel to run the full pipeline with the ability to set breakpoints
This repository is a heavily modified version of gleanerio/scheduler and is licensed under Apache 2.0.