Finley Breeze, Ruhella Hossain, Michael Mayo, James McKelvie
Background: Clinic non-attendance is associated with poorer health outcomes and costs $29m per annum. Initiatives to improve attendance typically involve expensive and ineffective brute-force strategies.
Purpose: To develop a predictive model for ophthalmic clinic attendance.
Methods: Nationwide ophthalmology clinic data were aggregated for analysis. Variables included patient age, District Health Board (DHB), ethnicity, clinic appointment type, sex and deprivation quintile. Feature engineering of the training dataset used binary encoding of the predictive categorical variables; age was the only numerical feature. Logistic regression models were evaluated using area under the curve (AUC), sensitivity, specificity and precision. Model weighting was adjusted to account for the highly imbalanced dataset. Ten-fold cross-validation was used.
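The encoding and weighting steps described in the Methods can be illustrated with a minimal sketch. It assumes that "binary encoding" means one 0/1 indicator column per category and that class weights follow the common "balanced" heuristic, n_samples / (n_classes * class_count); the abstract does not state the exact scheme used. The function names and toy data are hypothetical.

```python
from collections import Counter

def one_hot(values):
    """Map category labels to rows of 0/1 indicator columns (assumed encoding)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This is an assumed heuristic, not necessarily the study's weighting.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical appointment types, for illustration only.
appointment_type = ["first", "follow-up", "follow-up", "procedure"]
categories, encoded = one_hot(appointment_type)

# 1000 toy appointments at the reported 5.9% non-attendance rate
# (1 = did not attend, 0 = attended).
labels = [1] * 59 + [0] * 941
weights = balanced_class_weights(labels)

print(categories, encoded)
print(weights)
```

Under this heuristic the rare non-attendance class receives a much larger weight than the attendance class, so misclassified non-attendances contribute more to the training loss.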
Results: Data included 3.1 million clinic appointments with a 5.9% non-attendance rate. Raw data were divided into training (90%) and testing (10%) sets to enable a robust validation framework. Overall, the model achieved a sensitivity of 73%, specificity of 69%, AUC of 0.777 and precision of 12.8%. Precision increased significantly when the model was constrained to DHBs with modestly higher non-attendance rates. A DHB with 9.9% non-attendance achieved precision of up to 22%.
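The link between non-attendance rate and precision follows from Bayes' rule, and can be checked with a short calculation. The sketch below assumes, purely for illustration, that the overall sensitivity and specificity also apply within a single DHB; the abstract does not report DHB-level sensitivity or specificity, and the function name is hypothetical.

```python
def precision_from_rates(sensitivity, specificity, prevalence):
    """Precision (positive predictive value) implied by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Reported overall operating point and non-attendance rate.
overall = precision_from_rates(0.73, 0.69, 0.059)

# Same operating point (an assumption) at the 9.9% non-attendance rate.
higher = precision_from_rates(0.73, 0.69, 0.099)

print(f"overall: {overall:.3f}, higher-prevalence DHB: {higher:.3f}")
```

With these inputs the implied overall precision is about 12.9%, close to the reported 12.8%, and the implied precision at 9.9% non-attendance is roughly 20.6%, in the same range as the reported figure of up to 22%.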
Conclusion: It is possible to use machine learning algorithms to predict clinic non-attendance. The AUC indicates that this model enables clinically useful predictions of clinic attendance. The model AUC in the current study outperforms most previously published predictive models of attendance. This level of discrimination is high enough to be used in advanced scheduling methods and targeted public health interventions.