Data Preprocessing
- The first 3 steps
- Raw data (input) → Data extraction → Data analysis → Data preparation
<use slides …>
- Data are extracted from the raw data source.
- Data extraction - extract only the required data
- The extracted data can be separated into three main categories
- Structured - data stored in tabular format
- Unstructured - Media files
- Streaming
- Data analysis - the extracted data undergo analysis.
- Data preparation - the data must be prepared before it is used in the model, by transforming it (vectorizing) and by feature engineering (identifying the actual set of data that needs to be fed into the model)
- Data preparation = Transforming data + Feature engineering
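The three steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the inline CSV, the column names, the choice of required columns, and the helper names `extract`, `analyze`, and `prepare` are all hypothetical and do not come from the slides.

```python
import csv
import io
from collections import Counter

# Hypothetical raw source; in practice this could be a file, a database, or a stream.
RAW_CSV = """id,city,age,signup_note
1,Colombo,34,called support
2,Kandy,27,
3,Colombo,45,upgraded plan
"""

# Assumption: these are the only fields the model needs.
REQUIRED_COLUMNS = ["city", "age"]


def extract(raw_text, columns):
    """Data extraction: keep only the required columns of each record."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [{col: row[col] for col in columns} for row in reader]


def analyze(rows):
    """Data analysis: simple summary statistics over the extracted rows."""
    ages = [int(r["age"]) for r in rows]
    return {
        "row_count": len(rows),
        "mean_age": sum(ages) / len(ages),
        "city_counts": Counter(r["city"] for r in rows),
    }


def prepare(rows):
    """Data preparation: vectorize each row.

    Transforming: one-hot encode the city.
    Feature engineering: add age scaled by the largest age in the data.
    """
    cities = sorted({r["city"] for r in rows})
    max_age = max(int(r["age"]) for r in rows)
    vectors = []
    for r in rows:
        one_hot = [1.0 if r["city"] == c else 0.0 for c in cities]
        vectors.append(one_hot + [int(r["age"]) / max_age])
    return cities + ["age_scaled"], vectors


rows = extract(RAW_CSV, REQUIRED_COLUMNS)
summary = analyze(rows)
feature_names, vectors = prepare(rows)
```

Here `vectors` holds one numeric list per record, which is the form a model can consume; `summary` is what a person would inspect during the analysis step before deciding how to prepare the data.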
Model Validation
- Means ?
- Sometimes this validation phase is automated, and sometimes it is done manually.
- To get online predictions, we need to deploy the model to an online prediction service (such as an endpoint in Vertex AI).
- Both model.joblib and model.pkl files are supported by GCP services.
- AI Platform Prediction service
- TensorFlow
- XGBoost
- Scikit-Learn
- PyTorch
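As a rough illustration of the model file naming mentioned above, the sketch below serializes a model to a file called `model.pkl` with Python's built-in `pickle` module and loads it back. The `model` dict and the `predict` helper are hypothetical stand-ins; in practice the object would be a trained estimator from one of the listed frameworks, and the upload and deployment steps are not shown.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: a plain dict of parameters.
# A real export would pickle the trained estimator object instead.
model = {"weights": [0.5, -1.0], "bias": 2.0}


def predict(params, row):
    """Linear prediction from the stand-in parameters (illustration only)."""
    return sum(w * x for w, x in zip(params["weights"], row)) + params["bias"]


# Write the artifact under the expected file name.
export_dir = tempfile.mkdtemp()
model_path = os.path.join(export_dir, "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back, as a serving environment would before answering requests.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

prediction = predict(restored, [4.0, 1.0])
```

With the third-party `joblib` package installed, the equivalent export is `joblib.dump(model, "model.joblib")`.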