Introduction to residual analysis
Residual analysis is a critical step in validating a model. By examining the residuals, we can see how well the model's predictions align with the observed data; here we take Zwift's velocities as the observed data. The kernel density estimate (KDE) plot of the residuals, along with summary statistics such as the mean, skewness, standard deviation (std), minimum (min), maximum (max), and percentiles, gives a comprehensive view of how the residuals are distributed. In this analysis, we explore the characteristics of the residuals.
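To make the term concrete, here is a minimal sketch of how residuals could be computed. The sign convention (observed minus predicted), the function name `residuals`, and the sample velocities are all assumptions for illustration; the source does not state them.

```python
def residuals(observed, predicted):
    """Return observed minus predicted, element by element.

    The sign convention (observed - predicted) is an assumption; the
    source does not state which way the difference is taken.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [o - p for o, p in zip(observed, predicted)]


# Hypothetical velocities in arbitrary units, for illustration only.
zwift_velocity = [8.0, 9.5, 10.0, 7.5]
model_velocity = [8.25, 9.0, 10.0, 7.0]
res = residuals(zwift_velocity, model_velocity)
```

Under this convention a positive residual means the model predicted a lower velocity than the observed one.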
KDE plot of residuals:
The KDE plot visualizes the distribution of the residuals. It provides a smooth estimate of the probability density function of the data. By analyzing the KDE plot, we can make the following observations:
Centered around Zero: A KDE plot whose peak sits at zero on the x-axis indicates that, on average, the model's predictions are close to the observed values. It suggests that the model is relatively unbiased.
Symmetric and Bell-shaped: A symmetric and bell-shaped KDE plot indicates that the residuals are approximately normally distributed. This is what we expect when the remaining error is mostly random noise rather than structure the model failed to capture.
Spread: The width of the KDE plot reflects the spread or variability of the residuals. A narrow plot indicates low variability, suggesting consistent predictions. Conversely, a wide plot suggests higher variability, indicating more dispersed predictions.
Outliers: Secondary peaks or small isolated bumps away from the main mass of the KDE plot can indicate outliers or regions with large errors in the model's predictions. These points warrant further investigation to understand potential model deficiencies.
Skewness: The KDE plot's symmetry or skewness gives insights into the distribution's asymmetry. A symmetric plot indicates minimal skewness, while an asymmetric plot suggests potential biases in the model.
Tails: The tails of the KDE plot reveal the presence of extreme residuals, which may influence the model's overall performance.
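The observations above can be tried out on a small example. The following is a sketch of a Gaussian KDE written with the standard library only. The bandwidth rule (Silverman's rule of thumb), the helper name `gaussian_kde`, and the toy residuals are assumptions chosen for illustration, not the method used to produce the plot in this analysis.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density, using Gaussian kernels.

    If no bandwidth is given, Silverman's rule of thumb is used; that
    choice is an assumption, not something the source specifies.
    """
    n = len(samples)
    if bandwidth is None:
        sigma = statistics.stdev(samples)
        q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
        scale = min(sigma, (q3 - q1) / 1.34)
        bandwidth = 0.9 * scale * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, each centered on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical residuals: a tight cluster near zero plus one extreme value.
toy_residuals = [-0.3, -0.1, -0.05, 0.0, 0.02, 0.1, 0.25, 4.0]
kde = gaussian_kde(toy_residuals)

# Evaluate the density on a grid from -1 to 5 in steps of 0.01.
grid = [-1.0 + 0.01 * i for i in range(601)]
dens = [kde(x) for x in grid]
peak_x = grid[dens.index(max(dens))]
```

In this toy data the main peak lies close to zero, while the single value at 4.0 produces a small separate bump far out in the right tail, which is the kind of feature the outlier and tail observations refer to.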
Residual distribution:
Summary Statistics:
Summary statistics provide numerical measures that further quantify the characteristics of the residuals.
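Mean: The mean represents the central tendency of the residuals. A mean close to zero indicates that, on average, the model neither over-predicts nor under-predicts. It does not by itself show that individual predictions are accurate, because large positive and negative residuals can cancel out.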
Skewness: Skewness measures the asymmetry of the residual distribution. A skewness value near zero suggests symmetric data, while positive or negative values indicate right or left skew, respectively.
Standard Deviation (std): The standard deviation quantifies the spread of the residuals around the mean. A smaller std indicates less variability and tighter clustering around the mean.
Min and Max: The minimum and maximum values identify the range of the residuals. They help detect potential outliers or extreme values.
Percentiles: The 25th, 50th, and 75th percentiles (the quartiles) divide the data into four equal parts. The 50th percentile is the median, and the distance between the 25th and 75th percentiles (the interquartile range) describes the spread of the central half of the residuals.
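As a sketch of how these quantities could be computed, the function below uses only the Python standard library. The name `summarize`, the toy data, the linear-interpolation percentile method, and the moment-based skewness formula are assumptions; the source does not say which tool or estimator produced its numbers, and a bias-corrected skewness estimator would give slightly different values.

```python
import statistics


def summarize(values):
    """Compute the summary statistics discussed above for a list of floats."""
    n = len(values)
    mean = statistics.fmean(values)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Population central moments, used for the moment-based skewness.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return {
        "mean": mean,
        "std": statistics.stdev(values),  # sample std (n - 1 denominator)
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
        "skew": m3 / m2 ** 1.5,
    }


# Hypothetical residuals with one large positive value.
toy = [-0.2, -0.1, 0.0, 0.1, 0.2, 1.5]
stats = summarize(toy)
```

With the toy data, the single large positive value pulls the mean above the median and makes the skewness positive, while a perfectly symmetric sample gives a skewness of zero.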
mean | std | min | 25% | 50% | 75% | max | skew |
---|---|---|---|---|---|---|---|
0.0208 | 0.3130 | -0.5508 | -0.1218 | -0.0002 | 0.1197 | 4.1808 | 7.4240 |
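The table values can be checked against the interpretations given earlier with a little arithmetic. The sketch below measures how far the extremes lie from the mean in units of the standard deviation and compares them with Tukey's fences (1.5 interquartile ranges beyond the quartiles). The use of Tukey's fences is an assumption added here as one common outlier heuristic; it is not part of the original analysis.

```python
# Values copied from the summary table.
mean, std = 0.0208, 0.3130
q1, q3 = -0.1218, 0.1197
res_min, res_max = -0.5508, 4.1808

# Distance of the extremes from the mean, in standard deviations.
z_max = (res_max - mean) / std
z_min = (res_min - mean) / std

# Tukey fences: 1.5 interquartile ranges beyond the quartiles.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"z_max = {z_max:.2f}, z_min = {z_min:.2f}")
print(f"fences = [{lower_fence:.4f}, {upper_fence:.4f}]")
```

The maximum lies roughly 13 standard deviations above the mean, while the minimum is less than 2 standard deviations below it, and the fences fall at about -0.48 and 0.48. This suggests that the central half of the residuals is tightly and almost symmetrically clustered around zero, and that the large positive skewness (7.4240) is driven by a small number of extreme positive residuals in the right tail.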