CSV parsing library written in pure Mojo
Add the Modular community channel (https://repo.prefix.dev/modular-community) to your pixi.toml file in the channels section:

```toml
channels = ["conda-forge", "https://conda.modular.com/max", "https://repo.prefix.dev/modular-community"]
```

Then add the package:

```
pixi add mojo_csv
```
By default, CsvReader uses all logical cores minus 2.

```mojo
CsvReader(
    in_csv: Path,
    delimiter: String = ",",
    quotation_mark: String = '"',
    num_threads: Int = 0,  # default = 0 = use all available cores - 2
)
```

```mojo
from mojo_csv import CsvReader
from pathlib import Path
from sys import exit

fn main() raises:
    var csv_path = Path("path/to/csv/file.csv")
    try:
        var reader = CsvReader(csv_path)
        for i in range(len(reader)):
            print(reader[i])
    except:
        exit()
```

Custom delimiter and quotation mark:

```mojo
CsvReader(csv_path, delimiter=";", quotation_mark='|')
```

Force single-threaded parsing:
```mojo
CsvReader(csv_path, num_threads=1)
```

Use all logical cores:
```mojo
from sys import num_logical_cores

var reader = CsvReader(
    csv_path, num_threads=num_logical_cores()
)
```

```mojo
reader.raw : String             # raw csv string
reader.raw_length : Int         # total number of Chars
reader.headers : List[String]   # first row of csv file
reader.row_count : Int          # total number of rows (top to bottom)
reader.column_count : Int       # total number of columns (left to right)
reader.elements : List[String]  # all delimited elements
reader.length : Int             # total number of elements
```

Currently the element array is only 1D, so indexing is fairly manual:

```mojo
reader[0]  # first element
```
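Until 2D indexing lands (see the roadmap at the end), a row/column lookup can be computed by hand. A minimal sketch, assuming the flat `elements` list is stored row-major and that the header row occupies the first `column_count` slots like any other row; the `cell` helper is hypothetical, not part of the library:

```mojo
from mojo_csv import CsvReader
from pathlib import Path

# Hypothetical helper: row-major lookup into the flat element list.
# Assumes headers sit at indices 0..column_count-1.
fn cell(reader: CsvReader, row: Int, col: Int) -> String:
    return reader.elements[row * reader.column_count + col]

fn main() raises:
    var reader = CsvReader(Path("path/to/csv/file.csv"))
    # First column of the first data row (row 0 is the header
    # row under the assumption above).
    print(cell(reader, 1, 0))
```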
See BENCHMARK.md for expanded info.

Benchmark files:

- micro (3 rows)
- mini (100 rows)
- small (1k rows)
- medium (100k rows)
- large (2m rows)
✨ Pixi task (bench): `mojo bench.mojo`

```text
running benchmark for micro csv:
average time in ms for micro file:
0.0094 ms
-------------------------
running benchmark for mini csv:
average time in ms for mini file:
0.0657 ms
-------------------------
running benchmark for small csv:
average time in ms for small file:
0.317 ms
-------------------------
running benchmark for medium csv:
average time in ms for medium file:
24.62 ms
-------------------------
running benchmark for large csv:
average time in ms for large file:
878.6 ms
```
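For a quick spot-check outside the bundled bench task, parsing can be timed directly with the standard library's `time.perf_counter`. A minimal sketch; the iteration count and averaging loop are illustrative, not how bench.mojo measures:

```mojo
from mojo_csv import CsvReader
from pathlib import Path
from time import perf_counter

fn main() raises:
    var csv_path = Path("path/to/csv/file.csv")
    alias iterations = 10  # illustrative sample size

    var total_s: Float64 = 0.0
    for _ in range(iterations):
        var start = perf_counter()
        var reader = CsvReader(csv_path)
        total_s += perf_counter() - start
        _ = reader^  # destroy the reader outside the timed region

    print("average time in ms:", total_s / Float64(iterations) * 1000.0)
```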
CsvDictReader and CsvWriter are in beta.

Large file benchmark (2,000,000 rows): 1280.5 ms
- 2D indexing
- CsvWriter
- CsvDictReader
- SIMD optimization within each thread
- Async Chunking
- Streaming support for very large files
- Memory pool for reduced allocations
- Progress callbacks for long-running operations
