Feather.jl Documentation
Feather.jl provides a pure Julia library for reading and writing feather-formatted binary files, an efficient on-disk representation of a DataFrame.
For more information on feather and the related Arrow project, see the links below:
feather: https://github.com/wesm/feather
Arrow: https://arrow.apache.org/
High-level interface
Feather.read - Function.
Feather.read{T <: Data.Sink}(file, sink_type::Type{T}, sink_args...) => T
Feather.read(file, sink::Data.Sink) => Data.Sink
Feather.read takes a feather-formatted binary file argument and "streams" the data to the provided sink argument, a DataFrame by default. A fully constructed sink can be provided as the 2nd argument (the 2nd method above), or a Sink can be constructed "on the fly" by providing the type of Sink and any necessary positional arguments (the 1st method above).
Keyword arguments:
nullable::Bool=true: columns are returned as NullableVector{T} types by default, regardless of the number of null values. When set to false, columns without null values are returned as regular Vector{T}.
use_mmap::Bool=true: indicates whether to use the system's mmap capabilities when reading the feather file; on some systems or environments, mmap may not be available or reliable (a VirtualBox environment using shared directories can be problematic).
append::Bool=false: indicates whether the feather file should be appended to the provided sink argument; note that column types between the feather file and the existing sink must match to allow appending.
transforms: a Dict{Int,Function} or Dict{String,Function} that provides transform functions to be applied to feather fields or columns as they are parsed from the feather file; note that feather files can be parsed field-by-field or entire columns at a time, so transform functions need to operate on scalars or vectors appropriately, depending on the sink argument's preferred streaming type; by default, a Feather.Source will stream entire columns at a time, so a transform function would take a single NullableVector{T} argument and return an equal-length NullableVector.
Examples:
# default read method, returns a DataFrame
df = Feather.read("cool_feather_file.feather")
# read a feather file directly into a SQLite database table
db = SQLite.DB()
Feather.read("cool_feather_file.feather", SQLite.Sink, db, "cool_feather_table")
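The nullable and use_mmap keyword arguments described above can be combined in a single call. The following is a minimal sketch, assuming the version of Feather.jl documented here together with a compatible DataFrames.jl; the file path, column names, and values are hypothetical and chosen only for illustration.

```julia
using Feather, DataFrames

# Hypothetical sample data and path, used only for illustration.
df = DataFrame(id = [1, 2, 3], score = [0.5, 1.5, 2.5])
path = joinpath(tempdir(), "kwargs_example.feather")

# Write a small feather file so the read below has something to load.
Feather.write(path, df)

# nullable=false: columns without null values come back as regular Vector{T}.
# use_mmap=false: read without memory-mapping, e.g. on a VirtualBox shared directory.
plain = Feather.read(path; nullable=false, use_mmap=false)
```

Disabling mmap is only worth doing where memory-mapping is known to be unreliable; otherwise the default is the reasonable choice.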
Feather.write - Function.
Feather.write{T <: Data.Source}(io, source::Type{T}, source_args...) => Feather.Sink
Feather.write(io, source::Data.Source) => Feather.Sink
Write a Data.Source out to disk as a feather-formatted binary file. The two methods allow the passing of a fully constructed Data.Source (2nd method), or the type of Source and any necessary positional arguments (1st method).
Keyword arguments:
append::Bool=false: indicates whether the source argument should be appended to an existing feather file; note that column types between the source argument and the feather file must match to allow appending.
transforms: a Dict{Int,Function} or Dict{String,Function} that provides transform functions to be applied to source fields or columns as they are streamed to the feather file; note that feather sinks can receive data field-by-field or entire columns at a time, so transform functions need to operate on scalars or vectors appropriately, depending on the source argument's allowed streaming types; by default, a Feather.Sink will stream entire columns at a time, so a transform function would take a single NullableVector{T} argument and return an equal-length NullableVector.
Examples:
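# write a DataFrame to a feather file
df = DataFrame(...)
Feather.write("shiny_new_feather_file.feather", df)
# write the result of a SQLite query to a feather file (db is an existing SQLite.DB)
Feather.write("sqlite_query_result.feather", SQLite.Source, db, "select * from cool_table")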
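Appending with the append keyword argument described above might look like the following minimal sketch. It assumes the version of Feather.jl documented here and a compatible DataFrames.jl; the path and data are hypothetical, and both batches deliberately share identical column types, since appending requires them to match.

```julia
using Feather, DataFrames

# Hypothetical data and path, used only for illustration.
first_batch  = DataFrame(id = [1, 2], score = [0.5, 1.5])
second_batch = DataFrame(id = [3, 4], score = [2.5, 3.5])
path = joinpath(tempdir(), "append_example.feather")

# Create the file with the first batch (append defaults to false).
Feather.write(path, first_batch)

# Append the second batch; its column types match those already in the file.
Feather.write(path, second_batch; append=true)
```

Reading the file back with Feather.read is a simple way to confirm that both batches are present.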