A4.2 Data preprocessing (HL only)

This section covers the data preprocessing techniques used to improve the performance and accuracy of machine learning models. Students examine how data cleaning removes errors and inconsistencies so that input data is reliable and usable. They also explore how feature selection and dimensionality reduction simplify datasets: both retain the most relevant information while reducing complexity, which improves model efficiency and lowers the risk of overfitting in high-dimensional data.
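
As an illustration only, the two ideas above can be sketched in a few lines of Python. The dataset, the column meanings, and the helper names `clean` and `select_features` are hypothetical and chosen for this example. Dropping incomplete rows is just one possible cleaning strategy (imputing values is another), and removing zero-variance columns is a very simple form of feature selection, not a full dimensionality reduction method such as PCA.

```python
from statistics import pvariance

# Hypothetical toy dataset: each row is (height_cm, weight_kg, sensor_flag).
raw_rows = [
    (170.0, 65.0, 1.0),
    (170.0, 65.0, 1.0),   # exact duplicate of the first row
    (182.0, None, 1.0),   # missing weight
    (158.0, 52.0, 1.0),
    (175.0, 80.0, 1.0),
]

def clean(rows):
    """Data cleaning: drop rows with missing values and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row):
            continue  # incomplete record
        if row in seen:
            continue  # duplicate record
        seen.add(row)
        cleaned.append(row)
    return cleaned

def select_features(rows, min_variance=1e-9):
    """Feature selection: keep column indices whose variance exceeds min_variance."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if pvariance(col) > min_variance]

cleaned_rows = clean(raw_rows)
kept = select_features(cleaned_rows)
reduced_rows = [tuple(row[i] for i in kept) for row in cleaned_rows]

print(cleaned_rows)
print(kept)
print(reduced_rows)
```

In this sketch the third column has the same value in every row, so it carries no information that could help a model distinguish between records, and it is dropped. The remaining dataset has fewer rows and fewer columns but keeps the values that vary.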