Odo migrates data between many formats. These include
in-memory structures as well as
data outside of Python, such as CSV/JSON/HDF5 files, SQL databases,
data on remote machines, and the Hadoop File System.
odo takes two arguments, a source and a target for a data transfer.
>>> from odo import odo
>>> odo(source, target)  # load source into target
It efficiently migrates data from the source to the target.
The target and source can take on the following forms
|Object||Object||An instance of a
So the following lines would be valid inputs to odo:
>>> odo(df, list)  # create new list from Pandas DataFrame
>>> odo(df, )  # append onto existing list
>>> odo(df, 'myfile.json')  # Dump dataframe to line-delimited JSON
>>> odo('myfiles.*.csv', Iterator)  # Stream through many CSV files
>>> odo(df, 'postgresql://hostname::tablename')  # Migrate dataframe to Postgres
>>> odo('myfile.*.csv', 'postgresql://hostname::tablename')  # Load CSVs to Postgres
>>> odo('postgresql://hostname::tablename', 'myfile.json')  # Dump Postgres to JSON
>>> odo('mongodb://hostname/db::collection', pd.DataFrame)  # Dump Mongo to DataFrame
If the target in odo(source, target) already exists, it must be of a type
that supports in-place append.
>>> odo('myfile.csv', df) # this will raise TypeError because DataFrame is not appendable
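The rule above (a type as target creates a new object, an existing object as target must support in-place append) can be pictured with a small toy function. This is only a sketch under stated assumptions: migrate is a hypothetical name, the check for an extend method is a stand-in for "supports in-place append", and none of this is odo's actual implementation.

```python
def migrate(source, target):
    """Toy stand-in for the target rule; hypothetical, not odo's code."""
    rows = list(source)
    if isinstance(target, type):
        # Target is a type: build a brand-new container of that type.
        return target(rows)
    if hasattr(target, "extend"):
        # Target is an existing object that supports in-place append.
        target.extend(rows)
        return target
    # Target exists but cannot be appended to in place.
    raise TypeError(f"{type(target).__name__} does not support in-place append")


existing = [0]
new_list = migrate((1, 2), list)      # a new list is created from the source
same_obj = migrate((1, 2), existing)  # rows are appended onto the existing list
```

Passing an existing tuple as the target would raise TypeError in this sketch, which mirrors the DataFrame case above.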
To convert data between any pair of formats, odo.odo relies on a network of
pairwise conversions. We visualize that network below.
A single call to odo may traverse several intermediate formats, calling on
several conversion functions. These functions are chosen because they are
fast, often far faster than converting through a central serialization format.
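One way to picture a call that traverses several intermediate formats is as a search for the cheapest route through a weighted graph of pairwise conversions. The sketch below is illustrative only: the NETWORK dictionary, its format names, its edge costs, and the cheapest_path helper are all assumptions made for this example, and the text above does not say that odo selects routes this way.

```python
import heapq

# Hypothetical conversion network: each edge is one pairwise conversion,
# and each weight is an assumed relative cost (lower means faster).
NETWORK = {
    "csv": {"dataframe": 1.0, "iterator": 2.0},
    "dataframe": {"ndarray": 0.5, "list": 3.0},
    "iterator": {"list": 1.0},
    "ndarray": {"list": 1.0},
    "list": {},
}


def cheapest_path(network, source, target):
    """Return (total_cost, path) for the cheapest chain of conversions."""
    queue = [(0.0, source, [source])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in network[node].items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    raise ValueError(f"no conversion path from {source} to {target}")


cost, path = cheapest_path(NETWORK, "csv", "list")
print(path)  # ['csv', 'dataframe', 'ndarray', 'list']
```

With these made-up costs, the three-step route through dataframe and ndarray (total 2.5) beats the direct dataframe-to-list conversion (total 4.0), which shows how chaining fast pairwise steps can win over a single slower one.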