Welcome to Optimus’s documentation!
As data scientists, we care about extracting the best information out of our data. Data is the new soil: you have to get in and get your hands dirty, because without cleaning and preparing it, it is just useless.
Data preparation accounts for about 80% of the work of data scientists, so it is really important to have a solution that connects to your database or file system, uses the most important framework for machine learning and data science at the moment (Apache Spark), and can handle lots of information, working either in parallel on a cluster or locally on your laptop.
Say Hi! to Optimus and visit our web page.
Prepare, process and explore your Big Data with the fastest open source library on the planet using Apache Spark and Python (PySpark).
- Column Operations
- cols.append(col_name=None, value=None)
- cols.select(columns=None, regex=None, data_type=None)
- cols.rename(columns_old_new=None, func=None)
- cols.keep(columns=None, regex=None)
- cols.move(column, position, ref_col)
- cols.unnest(columns, mark=None, n=None, index=None)
- cols.impute(input_cols, output_cols, strategy="mean")
- cols.apply_by_dtypes(columns, func, func_return_type, args=None, func_type=None, data_type=None)
- User Defined Functions in Optimus
- cols.apply_expr(columns, func=None, args=None, filter_col_by_dtypes=None, verbose=True)
- cols.count_uniques(columns, estimate=True)
- cols.replace(columns, search_and_replace=None, value=None, regex=None)
- cols.nest(input_cols, output_col, shape=None, separator=" ")
- Row Operations
- Feature Engineering with Optimus
- Machine Learning with Optimus
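
To give a feel for what two of the column operations listed above are about, here is a minimal sketch in plain Python, with no Spark or Optimus required. It is not Optimus code and not how Optimus implements anything: `nest_columns` and `impute_mean` are hypothetical helpers written only for this illustration, and they assume that nesting joins several columns into one string using a separator and that mean imputation fills missing values with the column mean.

```python
from statistics import mean


def nest_columns(rows, input_cols, output_col, separator=" "):
    """Hypothetical helper: join the values of input_cols into one string column."""
    return [
        {**row, output_col: separator.join(str(row[c]) for c in input_cols)}
        for row in rows
    ]


def impute_mean(rows, input_col, output_col):
    """Hypothetical helper: fill missing (None) values with the mean of the present ones."""
    present = [row[input_col] for row in rows if row[input_col] is not None]
    fill = mean(present)
    return [
        {**row, output_col: fill if row[input_col] is None else row[input_col]}
        for row in rows
    ]


# A tiny in-memory "table": each dict is one row.
rows = [
    {"first": "Optimus", "last": "Prime", "height": 28.0},
    {"first": "Bumble", "last": "Bee", "height": None},
    {"first": "Iron", "last": "Hide", "height": 26.0},
]

nested = nest_columns(rows, ["first", "last"], "full_name", separator=" ")
imputed = impute_mean(nested, "height", "height_imputed")

for row in imputed:
    print(row["full_name"], row["height_imputed"])
```

The real operations run on Spark DataFrames and take the parameters shown in the list above; see the Column Operations section for their actual behavior.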