Forecasting admissions in psychiatric hospitals before and during Covid-19: a retrospective study with routine data | Scientific Reports


We included all inpatient admissions to nine hospitals in Hesse, Germany, from 01 January 2017 to 31 December 2020. These hospitals belong to a common service provider and account for about half of all inpatient mental health care in the state of Hesse. Aggregated daily admission numbers were obtained from the hospital administrations and did not contain individual patient data. Returns after planned interruptions, such as home leave, were excluded. Multiple separate admissions of the same patient were counted individually. Admissions to departments of child and adolescent psychiatry and to departments of psychosomatic medicine were excluded.

We obtained weather and climate data from the Climate Data Centre of Germany’s National Meteorological Service17. We used the gtrendsR package version 1.5.1 to query Google Trends data for Hesse, Germany18. School holidays and public holidays were obtained from publicly available calendars.


We used machine learning and time series models to predict the number of hospital admissions for each day of 2019 and 2020. The machine learning models were (a) gradient boosting with trees (XGB)19, (b) support vector machines (SVM)20 and (c) elastic nets21. The time series models were (a) exponential smoothing state space models (ETS)22, (b) exponential smoothing state space models that screen for a Box-Cox transformation, ARMA errors, and trend and seasonal components (TBATS)23 and (c) additive models in which non-linear trends are fitted with seasonal effects (PROPHET)24. The selection of modelling approaches was based on their performance in previous research and is not exhaustive. Several other approaches have been used successfully for forecasting in the context of Covid-19 and may be of interest to the reader16,25,26. We compared models forecasting one week in advance, one month in advance, and a whole year, with the yearly forecast made one week before the year began.


Our machine learning models used calendrical variables, climate and weather data, Google Trends data, Fourier terms and lagged numbers of admissions as features. All features are described in detail in Table S1. The calendrical features were day of the week, weekend, public holiday, school holiday, quarter of the year, month of the year, bridge days (i.e. days between a public holiday and the weekend) and end of the year (i.e. the days between Christmas and New Year’s Eve). The climate and weather data were wind speed, cloudiness, air pressure, precipitation depth and type, duration of sunshine, snow height, air temperature and humidity. Since the weather on future days was unknown at the time of prediction, we used lagged values: the weekly models used the weather 7 days before the predicted day, and the monthly models used the weather 28 days before the predicted day. We did not use weather data for the yearly models.
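As an illustration of the calendrical and Fourier features, the following Python sketch builds them for a single day. It is not the authors' code (the study worked in R), and the holiday sets are hypothetical inputs. Three details are assumptions not stated in the text: a bridge day is taken to be a Friday after a Thursday holiday or a Monday before a Tuesday holiday, the end-of-year period is taken as 27 to 30 December, and the Fourier terms use two yearly harmonics over a 365.25-day period.

```python
import math
from datetime import date, timedelta


def calendar_features(day, public_holidays, school_holidays, n_harmonics=2):
    """Calendrical and Fourier features for one day.

    public_holidays and school_holidays are hypothetical inputs: sets of
    datetime.date objects taken from a holiday calendar.
    """
    weekday = day.weekday()  # Monday = 0, ..., Sunday = 6
    is_public_holiday = day in public_holidays

    # Assumed bridge-day rule: a Friday after a Thursday holiday or a Monday
    # before a Tuesday holiday (and not itself a holiday).
    after_holiday = (weekday == 4) and ((day - timedelta(days=1)) in public_holidays)
    before_holiday = (weekday == 0) and ((day + timedelta(days=1)) in public_holidays)
    is_bridge_day = (after_holiday or before_holiday) and not is_public_holiday

    features = {
        "day_of_week": weekday,
        "weekend": weekday >= 5,
        "public_holiday": is_public_holiday,
        "school_holiday": day in school_holidays,
        "quarter": (day.month - 1) // 3 + 1,
        "month": day.month,
        "bridge_day": is_bridge_day,
        # Assumed end-of-year window: 27 to 30 December.
        "end_of_year": day.month == 12 and 27 <= day.day <= 30,
    }

    # Fourier terms for yearly seasonality; harmonics and period are assumptions.
    day_of_year = day.timetuple().tm_yday
    for k in range(1, n_harmonics + 1):
        angle = 2 * math.pi * k * day_of_year / 365.25
        features[f"sin_{k}"] = math.sin(angle)
        features[f"cos_{k}"] = math.cos(angle)
    return features


ascension_2019 = date(2019, 5, 30)  # Ascension Day 2019, a Thursday
friday = calendar_features(date(2019, 5, 31), {ascension_2019}, set())
```

Here Friday 31 May 2019, the day after Ascension Day, is flagged as a bridge day.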

Google Trends data were retrieved using the gtrendsR package18 in the R environment for statistical computing27. We queried the German translations of the following keywords: depression, sadness, sad, suicide, mania, fear, panic, dread, addiction, dependence, alcohol, drugs, schizophrenia, psychosis and hallucinations. The relative search frequency for each keyword in the region of Hesse, Germany, was used as a feature. As for the weather data, we used lagged values of the Google Trends data. As an additional feature, the weekly models used the number of admissions 14 days before the predicted day, because the number of admissions 7 days before the predicted day was not yet known at the time of prediction; the monthly models used the number of admissions with a lag of 35 days. Our time series models did not use any features.
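The lag structure can be sketched as follows, assuming each weather or Google Trends variable is a plain list of daily values aligned with the admission counts. The lags of 7 and 28 days for weather and of 14 and 35 days for admissions come from the text; reusing the weather lags for the Google Trends data is an assumption based on the phrase "as for the weather data". The names `LAGS` and `build_lagged_features` are hypothetical.

```python
def lagged_values(series, lag):
    """Return the series shifted forward by `lag` days.

    Element i of the result is series[i - lag]; the first `lag` days have no
    observed history and are set to None.
    """
    n = len(series)
    k = min(lag, n)
    return [None] * k + list(series[: n - k])


# Lags (in days) per forecast horizon. Weather and admission lags follow the
# text; reusing the weather lag for Google Trends is an assumption.
LAGS = {
    "weekly": {"weather": 7, "trends": 7, "admissions": 14},
    "monthly": {"weather": 28, "trends": 28, "admissions": 35},
}


def build_lagged_features(admissions, weather, trends, horizon):
    """Hypothetical helper: lag one weather series, one Google Trends series
    (the study used several of each) and the daily admission counts."""
    lags = LAGS[horizon]
    return {
        "admissions_lag": lagged_values(admissions, lags["admissions"]),
        "weather_lag": lagged_values(weather, lags["weather"]),
        "trends_lag": lagged_values(trends, lags["trends"]),
    }
```

Rows whose lagged values are still None (the first days of the series) would be dropped before model training.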

Training and testing

We used prospectively sliding time windows to validate model performance on 2018 and to test it on 2019 and 2020. The final weekly models predicted the hospital admissions for each day of one full week, seven days in advance. We tested one model for each week and study site in 2019 and 2020, extending the training period and advancing the 7-day test period by one week at each step. The monthly models each predicted 28 days of hospital admissions in advance, and the window advanced in steps of 28 days. The yearly models predicted the whole of 2019 and of 2020, each one week before the respective year began.
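The sliding test windows for the weekly and monthly models can be sketched with a hypothetical helper like the one below (not the authors' code). It only generates the test periods; training would use all data observed before each forecast origin, and the exact position of that origin relative to the test window is not specified further here. Because 365 days do not divide evenly into 7- or 28-day steps, this sketch shortens the last window of a year; how the study handled the leftover day is not stated.

```python
from datetime import date, timedelta


def sliding_test_windows(first_test_day, last_test_day, step_days):
    """Yield (start, end) dates of consecutive test periods.

    Each period covers `step_days` days and the next period starts directly
    after it, so the test window slides forward by `step_days` while the
    training history before it grows. The final period is truncated at
    `last_test_day`.
    """
    start = first_test_day
    while start <= last_test_day:
        end = min(start + timedelta(days=step_days - 1), last_test_day)
        yield start, end
        start += timedelta(days=step_days)


weekly_2019 = list(sliding_test_windows(date(2019, 1, 1), date(2019, 12, 31), 7))
monthly_2019 = list(sliding_test_windows(date(2019, 1, 1), date(2019, 12, 31), 28))
```

For 2019, weekly steps give 53 test windows, the last of which covers only 31 December.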

We compared model performance using the root mean squared error (RMSE), R² (the squared correlation between observations and forecasts), the mean absolute error (MAE) and a seasonal mean absolute scaled error (sMASE), defined as follows28:

$$\text{Observation at time } t = Y_{t}$$

$$\text{Forecast of } Y_{t} = F_{t}$$

$$\text{Forecast error} = e_{t} = Y_{t} - F_{t}$$

$$\mathrm{MSE} = \operatorname{mean}\left( e_{t}^{2} \right)$$

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$

$$R^{2} = \operatorname{correlation}\left( Y_{t}, F_{t} \right)^{2}$$

$$\mathrm{MAE} = \operatorname{mean}\left( \left| e_{t} \right| \right)$$

$$\mathrm{sMASE} = \frac{\mathrm{MAE}}{\text{seasonally adjusted naive MAE}}$$
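The metrics above translate directly into code. In this minimal Python sketch, the naïve forecast for the sMASE denominator is the admission count `lag` days earlier (14, 35 or 364 days, depending on the horizon), and its MAE is assumed to be computed over the same test days as the model forecast; the classical MASE instead scales by the in-sample naïve MAE. All function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def seasonal_naive(series, lag, test_indices):
    """Naive forecast: the value observed `lag` days before each test day."""
    if min(test_indices) < lag:
        raise ValueError("not enough history for the requested lag")
    return [series[i - lag] for i in test_indices]


def forecast_metrics(observed, forecast, naive_forecast):
    """RMSE, R^2, MAE and sMASE as defined in the equations."""
    errors = [y - f for y, f in zip(observed, forecast)]  # e_t = Y_t - F_t
    mse = _mean([e ** 2 for e in errors])
    mae = _mean([abs(e) for e in errors])
    # Assumption: the naive MAE is computed over the same test days.
    naive_mae = _mean([abs(y - n) for y, n in zip(observed, naive_forecast)])
    return {
        "RMSE": math.sqrt(mse),
        "R2": _pearson(observed, forecast) ** 2,
        "MAE": mae,
        "sMASE": mae / naive_mae,
    }


series = [8, 10, 12, 14, 10, 12, 14, 16]  # toy daily admission counts
naive = seasonal_naive(series, 4, range(4, 8))
metrics = forecast_metrics(series[4:], [11, 12, 13, 16], naive)
# metrics["MAE"] == 0.5 and metrics["sMASE"] == 0.25
```

An sMASE below 1 means the model beats the lagged naïve forecast on the same days.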

The sMASE was calculated by dividing the MAE of our weekly, monthly and yearly forecasts by the MAE of a naïve forecast based on the number of admissions observed 14, 35 and 364 days before the predicted day, respectively. Variable importance was calculated for each variable in the best-performing model using model-specific metrics; for elastic nets, this was the absolute value of each coefficient after standardizing the features. An advantage of model-specific metrics over model-agnostic measures is that they are expected to account better for collinearity between features29.
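To make the elastic-net importance measure concrete, the following sketch fits an elastic net by cyclic coordinate descent on standardized features and returns the absolute coefficients. It is a numpy illustration, not the implementation used in the study; the penalty follows the common parameterization with overall strength `alpha` and mixing weight `l1_ratio` (as in scikit-learn's `ElasticNet`), and the default values here are arbitrary. It assumes no feature is constant, since a constant column cannot be standardized.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma (the lasso proximal step)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net_importance(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Return |coefficient| per feature from an elastic net fitted on
    standardized features by cyclic coordinate descent.

    Objective: (1/2n)||y - Xb||^2 + alpha * l1_ratio * ||b||_1
               + 0.5 * alpha * (1 - l1_ratio) * ||b||^2
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize features (mean 0, population SD 1) and center the outcome,
    # so coefficients are comparable and no intercept is needed.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    n, p = Xs.shape
    beta = np.zeros(p)
    residual = yc.copy()  # yc - Xs @ beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual; uses
            # (1/n) * Xs[:, j] @ Xs[:, j] == 1 after standardization.
            rho = Xs[:, j] @ residual / n + beta[j]
            new_bj = soft_threshold(rho, alpha * l1_ratio) / (1.0 + alpha * (1.0 - l1_ratio))
            residual -= Xs[:, j] * (new_bj - beta[j])
            beta[j] = new_bj
    return np.abs(beta)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
importance = elastic_net_importance(X, y)
```

Standardizing first matters: without it, a coefficient's size would also reflect the scale of its feature (for example, air pressure in hPa versus a 0/1 holiday flag) rather than its contribution to the forecast.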

Ethics approval and consent to participate

Our study did not involve individual patient data, only aggregated numbers of admissions per day. The ethics committee of the Medical School Hannover confirmed that our study did not require ethical oversight.