internetofwater/scheduler

Geoconnex Scheduler

The Geoconnex scheduler crawls water data metadata on a schedule and synchronizes it with a graph database.

  • It crawls sitemaps with nabu harvest and downloads the crawled metadata to an S3 bucket
  • It syncs data between the S3 bucket and the graph database using nabu sync

For more information about the Geoconnex project generally and how it aims to improve water data infrastructure, see the Geoconnex docs.

Local / Development Quickstart

Important

You must have uv installed for package management
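
A quick preflight check can catch a missing uv before the make targets fail partway through. The sketch below is not part of this repo; require_cmd is a hypothetical helper name, and it relies only on the POSIX `command -v` builtin.

```shell
# Hypothetical helper (not from this repo): fail early if a tool is missing.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' is not on PATH; install it first" >&2
    return 1
  fi
}

# Usage (from the repo root):
#   require_cmd uv && make deps && make dev
```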

Install dependencies and spin up necessary Docker services:

make deps && make dev

Then go to localhost:3000

Dockerized / Production Quickstart

  • Spin up all services as containers, including user code and local db/s3 containers (make sure to set the DAGSTER_POSTGRES_HOST env var to dagster_postgres):

make prod

  • Spin up user code and essential services but not storage (you will need to specify your db/s3 endpoints and any other remote services in the .env file):

make cloudProd
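
For the fully containerized setup, DAGSTER_POSTGRES_HOST has to be set to dagster_postgres. A minimal sketch of one way to set it is shown below, assuming the .env file uses plain KEY=value lines. set_env_var is a hypothetical helper, not something this repo provides.

```shell
# Hypothetical helper (not from this repo): set KEY=value in an env file,
# replacing any existing assignment of KEY. Assumes plain KEY=value lines.
set_env_var() {
  file="$1"; key="$2"; value="$3"
  touch "$file"
  tmp="$(mktemp)"
  # Keep every line except an existing assignment of this key.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its original permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (from the repo root, before running make prod):
#   set_env_var .env DAGSTER_POSTGRES_HOST dagster_postgres
```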

All cloud deployment and infrastructure-as-code work lives in the harvest.geoconnex.us repo.

Configuration

  • All env vars must be defined in .env at the root of the repo
  • The .env.example file will be copied to .env if it does not exist
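
The copy-if-missing behavior described above can be sketched in shell as follows. This illustrates the logic only and is not necessarily how the repo implements it; ensure_env_file is a hypothetical name.

```shell
# Hypothetical helper (not from this repo): create .env from .env.example
# only when .env does not already exist, so local edits are never overwritten.
ensure_env_file() {
  dir="${1:-.}"
  if [ ! -f "$dir/.env" ]; then
    cp "$dir/.env.example" "$dir/.env"
  fi
}

# Usage (from the repo root):
#   ensure_env_file .
```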

Testing

  • Spin up the local dev environment
  • Run make test from the root to execute tests
  • If you use VSCode, run the task dagster dev in the debug panel to run the full pipeline with the ability to set breakpoints

Licensing

This repository is a heavily modified version of gleanerio/scheduler and is licensed under Apache 2.0.
