PolyFrame is a Python data exploration library that pairs a Pandas-like interface with several database systems. Analysts work in a familiar dataframe environment while the analytical operations scale out over a large data cluster for Big Data analysis.
Note: This library is proof-of-concept code accompanying our research paper. Not all dataframe functions are implemented (some cannot be), and those that are implemented are not all fully tested. Proceed with caution.
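The general idea of a database-backed dataframe can be illustrated with a toy sketch. This is not PolyFrame's API: the `ToyFrame` class, its method names, and the `listings` table are all hypothetical, and SQLite from the Python standard library stands in for a real backend. The sketch only shows how dataframe-style calls can be recorded lazily and sent to the database as a single query, so the data itself stays in the database until results are requested.

```python
import sqlite3


class ToyFrame:
    """Hypothetical lazy dataframe (NOT PolyFrame's API).

    Each method returns a new ToyFrame describing a larger query;
    nothing is executed until collect() is called.
    """

    def __init__(self, conn, table, columns="*", predicates=(), limit=None):
        self._conn = conn
        self._table = table
        self._columns = columns
        self._predicates = tuple(predicates)
        self._limit = limit

    def select(self, *columns):
        # Record a projection; no data is fetched here.
        return ToyFrame(self._conn, self._table, ", ".join(columns),
                        self._predicates, self._limit)

    def where(self, predicate):
        # Record a filter as a raw SQL fragment (toy only: no escaping).
        return ToyFrame(self._conn, self._table, self._columns,
                        self._predicates + (predicate,), self._limit)

    def head(self, n=5):
        # Record a row limit, similar in spirit to DataFrame.head().
        return ToyFrame(self._conn, self._table, self._columns,
                        self._predicates, int(n))

    def to_sql(self):
        # Translate the recorded operations into one SQL statement.
        sql = "SELECT {} FROM {}".format(self._columns, self._table)
        if self._predicates:
            sql += " WHERE " + " AND ".join(self._predicates)
        if self._limit is not None:
            sql += " LIMIT {}".format(self._limit)
        return sql

    def collect(self):
        # Only now does the database do any work.
        return self._conn.execute(self.to_sql()).fetchall()


# Made-up sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [("loft", 120.0), ("studio", 80.0), ("villa", 300.0)],
)

frame = ToyFrame(conn, "listings").where("price > 100").select("name", "price").head(2)
query = frame.to_sql()   # the single query that will be sent to the database
rows = frame.collect()   # executes the query and fetches the result rows
print(query)
print(rows)
```

Because each call only extends a query description, the filtering and projection run inside the database rather than in the Python process, which is what lets this style of interface work on data that does not fit in local memory.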
- Python >= 3.3
- Pip
- Java >= 1.8
- Clone the repository
- Install from source (pip install .)
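Before installing, the prerequisites listed above can be checked with a short script. This is a minimal sketch using only the Python standard library. The function name `check_prerequisites` is hypothetical, and the script only checks that `pip` and `java` are on the PATH. It does not verify that the Java version is at least 1.8.

```python
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True/False.

    Hypothetical helper, not part of the library.
    """
    return {
        # The interpreter running this script must be Python 3.3 or newer.
        "python": sys.version_info >= (3, 3),
        # pip may be installed as either `pip` or `pip3`.
        "pip": shutil.which("pip") is not None or shutil.which("pip3") is not None,
        # Presence only; the Java version itself is not inspected.
        "java": shutil.which("java") is not None,
    }


report = check_prerequisites()
for name, found in sorted(report.items()):
    print("{}: {}".format(name, "ok" if found else "missing"))
```

If every entry reports `ok`, running `pip install .` from the root of the cloned repository should be able to proceed.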
- AsterixDB
- MongoDB (Community Edition)
- Neo4j
- PostgreSQL
- InfluxDB (In progress)
Usage examples for PolyFrame can be found in the 'notebooks' folder.
- Scale-independent Data Analysis with Database-backed Dataframes: a Case Study. EDBT/ICDT Workshops 2021 [notebook] [paper]
- Exploratory Data Analysis with Database-backed Dataframes: A Case Study on Airbnb Data. IEEE BigData 2021: 3119-3129 [notebook] [paper]