CSV parsing library written in pure Mojo
Add the Modular community channel (https://repo.prefix.dev/modular-community) to your pixi.toml file in the channels section:

```toml
channels = ["conda-forge", "https://conda.modular.com/max", "https://repo.prefix.dev/modular-community"]
```

Then add the package:

```
pixi add mojo_csv
```
By default, CsvReader uses all logical cores minus 2.

```mojo
CsvReader(
    in_csv: Path,
    delimiter: String = ",",
    quotation_mark: String = '"',
    num_threads: Int = 0,  # default = 0 = use all available cores - 2
)
```

```mojo
from mojo_csv import CsvReader
from pathlib import Path
from sys import exit

fn main() raises:
    var csv_path = Path("path/to/csv/file.csv")
    try:
        var reader = CsvReader(csv_path)
        for i in range(len(reader)):
            print(reader[i])
    except:
        exit()
```

Custom delimiter and quotation mark:

```mojo
CsvReader(csv_path, delimiter=";", quotation_mark='|')
```

Force single-threaded parsing:
```mojo
CsvReader(csv_path, num_threads=1)
```

Use all logical cores:
```mojo
from sys import num_logical_cores

var reader = CsvReader(
    csv_path, num_threads=num_logical_cores()
)
```

```mojo
reader.raw : String             # raw csv string
reader.raw_length : Int         # total number of Chars
reader.headers : List[String]   # first row of csv file
reader.row_count : Int          # total number of rows (top to bottom)
reader.column_count : Int       # total number of columns (left to right)
reader.elements : List[String]  # all delimited elements
reader.length : Int             # total number of elements
```

Currently the element array is only 1D, so indexing is fairly manual:

```mojo
reader[0]  # first element
```
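Until 2D indexing lands (see the roadmap at the end), a row/column lookup can be computed by hand. A minimal sketch, assuming the flat `elements` list is stored row-major and that the header row occupies the first `column_count` slots like any other row; the `cell` helper is hypothetical, not part of the library:

```mojo
from mojo_csv import CsvReader
from pathlib import Path

# Hypothetical helper: row-major lookup into the flat element list.
# Assumes headers sit at indices 0..column_count-1.
fn cell(reader: CsvReader, row: Int, col: Int) -> String:
    return reader.elements[row * reader.column_count + col]

fn main() raises:
    var reader = CsvReader(Path("path/to/csv/file.csv"))
    # First column of the first data row (row 0 is the header
    # row under the assumption above).
    print(cell(reader, 1, 0))
```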
See BENCHMARK.md for expanded info.

Benchmark files:

- micro (3 rows)
- mini (100 rows)
- small (1k rows)
- medium (100k rows)
- large (2m rows)
✨ Pixi task (bench): `mojo bench.mojo`

```text
running benchmark for micro csv:
average time in ms for micro file:
0.0094 ms
-------------------------
running benchmark for mini csv:
average time in ms for mini file:
0.0657 ms
-------------------------
running benchmark for small csv:
average time in ms for small file:
0.317 ms
-------------------------
running benchmark for medium csv:
average time in ms for medium file:
24.62 ms
-------------------------
running benchmark for large csv:
average time in ms for large file:
878.6 ms
```
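For a quick spot-check outside the bundled bench task, parsing can be timed directly with the standard library's `time.perf_counter`. A minimal sketch; the iteration count and averaging loop are illustrative, not how bench.mojo measures:

```mojo
from mojo_csv import CsvReader
from pathlib import Path
from time import perf_counter

fn main() raises:
    var csv_path = Path("path/to/csv/file.csv")
    alias iterations = 10  # illustrative sample size

    var total_s: Float64 = 0.0
    for _ in range(iterations):
        var start = perf_counter()
        var reader = CsvReader(csv_path)
        total_s += perf_counter() - start
        _ = reader^  # destroy the reader outside the timed region

    print("average time in ms:", total_s / Float64(iterations) * 1000.0)
```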
CsvDictReader and CsvWriter are in beta.

Large file benchmark (2,000,000 rows): 1280.5 ms
- 2D indexing
- CsvWriter
- CsvDictReader
- SIMD optimization within each thread
- Async Chunking
- Streaming support for very large files
- Memory pool for reduced allocations
- Progress callbacks for long-running operations
