Is it possible to estimate the number of patients with COVID-19 admitted to intensive care units and general wards using clinical and telemedicine data?

BACKGROUND
Gabaldi et al. utilized telemedicine data, web search trends, hospitalized patient characteristics, and resource usage data to estimate bed occupancy during the COVID-19 pandemic. The results showcase the potential of data-driven strategies to enhance resource allocation decisions for an effective pandemic response.


OBJECTIVE
To develop and validate predictive models to estimate the number of COVID-19 patients hospitalized in the intensive care units and general wards of a private not-for-profit hospital in São Paulo, Brazil.


METHODS
Two main models were developed. The first model calculated hospital occupation as the difference between predicted COVID-19 patient admissions, transfers between departments, and discharges, estimating admissions based on their weekly moving averages, segmented by general wards and intensive care units. Patient discharge predictions were based on a length of stay predictive model, assessing the clinical characteristics of patients hospitalized with COVID-19, including age group and usage of mechanical ventilation devices. The second model estimated hospital occupation based on the correlation with the number of telemedicine visits by patients diagnosed with COVID-19, utilizing correlational analysis to define the lag that maximized the correlation between the studied series. Both models were monitored for 365 days, from May 20th, 2021, to May 20th, 2022.


RESULTS
The first model predicted the number of hospitalized patients by department within an interval of up to 14 days. The second model estimated the total number of hospitalized patients for the following 8 days, considering calls attended by Hospital Israelita Albert Einstein's telemedicine department. Considering the average daily predicted values for the intensive care unit and general ward across a forecast horizon of 8 days, as limited by the second model, the first and second models obtained R² values of 0.900 and 0.996, respectively and mean absolute errors of 8.885 and 2.524 beds, respectively. The performances of both models were monitored using the mean error, mean absolute error, and root mean squared error as a function of the forecast horizon in days.


CONCLUSION
The model based on telemedicine use was the most accurate in the current analysis and was used to estimate COVID-19 hospital occupancy 8 days in advance, validating predictions of this nature in similar clinical contexts. The results encourage the expansion of this method to other pathologies, aiming to guarantee the standards of hospital care and conscious consumption of resources.


BACKGROUND
Developed models to forecast bed occupancy for up to 14 days and monitored errors for 365 days.


BACKGROUND
Telemedicine calls from COVID-19 patients correlated with the number of patients hospitalized in the next 8 days.


❚ INTRODUCTION
COVID-19 transmission rates in Brazil sustain a scenario that can be classified as a pandemic state, and the country faces waves of variable demands for physical and human resources to treat patients with COVID-19. (1,2)Meanwhile, the priority of health service managers and the government (3) has been to provide proper sizing of its resources to guarantee the access of the population to appropriate care.It is necessary to ensure that patients, at all severity levels, who seek assistance during the pandemic have access to supplies, medicines, equipment, beds, and medical staff appropriate to their needs; (4) aiming to improve their outcomes and minimize the negative impacts of the pandemic.
In this context, initiatives to support the decisionmaking process regarding demand predictability and resource sizing have become more relevant as Brazil emerged as the epicenter of the COVID-19 pandemic. (5)lanning a volume-based resource allocation is an optimization alternative, (6) considering that the demand for experienced intensive care unit professionals, ventilation and monitoring devices, drugs used for tracheal intubation protocols, and hospital beds has varied significantly since the beginning of the pandemic, competing with other demands for hospital care.
An increasing application of machine learning techniques has been observed in datasets containing patients' clinical variables to predict the degradation of clinical conditions, mortality, and length of stay (LOS), (7)(8)(9) as well as an approach utilizing time-series analysis to estimate hospitalizations and admissions in health institutions using the number of telemedicine visits (10) or interest in terms related to pathology symptoms in web search engines (e.g., Google Trends). (11)his context highlights the opportunity to develop statistical models to predict the number of patients hospitalized due to COVID-19 and help hospital managers plan for beds, human resources, and other input sizing and availabilities.

❚ OBJECTIVE
To develop, implement, and monitor a predictive model to estimate the number of patients hospitalized due to COVID-19, segmenting patients by departmentintensive care units and general wards-whenever possible.
❚ METHODS This retrospective study was conducted at Hospital Israelita Albert Einstein (HIAE), a private not-for-profit 624-bed hospital in São Paulo, Brazil, which earmarked 300 beds for COVID-19 patients in March 2021, at the peak of the pandemic.
Two separate models were developed to achieve the study objectives: Model 1: Hospital occupancy was estimated by projecting the hospital admissions and discharges of COVID-19 patients separately.
Hospital admissions were estimated separately for the general ward and intensive care unit (ICU), including semi-intensive care beds, and analyzed using a time-series approach, including the evaluation of seasonality, trends, and residues.
To capture the internal transfer behavior that relocated patients between the general ward and the ICU, an estimation was also proposed considering the rate of patients transferred from the general ward to the ICU as a function of LOS, considering data from the previous 30 days.
Discharges for the 14 subsequent days were estimated using an individual LOS predictive model.To build this model, the first stage consisted of a retrospective investigation of the existing correlations between clinical variables and patients' LOS.Data were extracted from the medical charts of 4,741 patients admitted to the hospital with COVID-19 between March 2020 and July 2021.
Patients with a total LOS >45 days were excluded from the analysis because of the low representativeness of the group (<5% of the study population), and the possibility of reallocating those patients to non-COVID beds during their hospitalization, considering the clinical assessment, testing, and criteria established by global health institutions.
Additionally, for the estimation of LOS, patients who died during hospitalization, those who were transferred to another hospital, or whose discharge was requested by the patient were excluded from the analysis to isolate a pattern of behavior from observed hospitalizations and eliminate confounding factors.Firstly, feeding the model with patients who died, opted for home-care treatment, or transferred to another hospital may distort the expected impact for severity indicators such as the ventilation device use or ICU bed occupancy, resulting in an increase in the expected LOS in cases of regular discharge.Secondly, death, home care, and transfers to other hospitals represented >5% of the total number of COVID-19 hospitalizations in HIAE.
The clinical variables considered and analyzed for the LOS model included age group, sex, use of an extracorporeal membrane oxygenation (ECMO) device during the stay, use of non-invasive and invasive ventilation devices in the last 24 hours, occupancy of ICU or semi-ICU in the previous 24 hours, prescription of vasoactive drugs, and prescription of hemodialysis, with the considered criteria for inclusion detailed in the results section.
To facilitate data comprehension, an initial exploratory univariate analysis was performed, calculating the median einstein (São Paulo).2024;22:1-9 and mean LOS of discharged patients.,Comparisons between groups were carried out using the Mann-Whitney U test (12) and the Kruskal-Wallis test. (13)dditionally, as a multivariate approach, the impact of daily changes registered in the selected set of variables was evaluated using a random forest algorithm to dynamically predict the LOS for each individual, where the most important features were ranked and selected using the Gini importance index. (14)odel 2: The correlation between the number of telemedicine visits related to COVID-19 symptoms, the volume of web searches for terms associated with COVID-19 originating in São Paulo, Brazil, and the number of patients hospitalized with COVID-19 at HIAE were evaluated.
The data used to build the second model were collected from three sources: i) the daily volume of searches in Google for the term "sintomas COVID-19" in São Paulo, Brazil, extracted from Google Trends-a public database containing a normalized interest score calculated based on the search volume for a given search term and location over a selected period; ii) A dataset containing the daily number of visits to the telemedicine department of HIAE, considering three distinct groupings: Group 1, patients assigned diagnosis codes related to COVID-19 (ICD-10-CM from chapter X: J00-J99, Diseases of the respiratory system; chapter XXI: Z00-Z99, Factors influencing health status and contact with health services; COVID-19' specific ICD-10-CM code U07.1; and ICD-10-CM codes related to its symptoms: R05, R06, R07.0 and R50); Group 2: patients exclusively assigned the COVID-19-specific ICD-10-CM code U07.1; and, Group 3: the daily number of visits to the telemedicine department, without filtering ICD-10-CM codes; iii) A dataset containing the target variables: daily number of admissions and patients hospitalized due to COVID-19 at HIAE, considering only the coronavirus specific ICD-10-CM code B34.2.
At the time of data collection, the telemedicine department still did not assign the COVID-19 specific ICD-10 CM code B34.2 to identify confirmed cases of COVID-19 in its calls; therefore, this specific code was not considered in the development of the model.
The Pearson Correlation between telemedicine attendance volumes, Google Trends interest scores, and the daily normalized weekly moving averages of hospitalized patients was calculated to investigate the lag that maximized the correlation function between the number of hospitalized patients and the given series, using a regression model. (15)erformance metrics: performance monitoring of the proposed models employed accuracy metrics of point forecasts-mean error (ME), mean absolute error (MAE), and root mean squared error (RMSE) as a function of the forecast horizon in days.Considering the average daily predicted values for the ICU and general ward across a forecast horizon, R² was also calculated to demonstrate the adherence of the projected values to the observed values. (16)

Confidentiality and ethical approval
This study was approved by the Ethics Committee of Hospital Israelita Albert Einstein (HIAE), CAAE: 51937121.6.0000.0071;# 5.136.309.Patient confidentiality was preserved by anonymizing the medical records.The requirement for informed consent was waived by the institutional review board prior to data collection and analysis.

❚ RESULTS
Model 1: between March 2020 and July 2021, 5,414 patients were admitted to HIAE due to COVID-19, considering all types of discharges (home care, external transfers, and deaths, which were excluded from LOS modeling) and patients who were hospitalized for more than 45 days.
Admissions: by analyzing the behavior of the overall number of new hospitalizations during the specified period, segmentation of the number of new admissions by department (general ward and ICU) was proposed to minimize the error of the expected flow for each projected day, attempting to extract seasonal factors and decrease the order of magnitude of the residues.
During the analyzed period, the average (± standard deviation) number of daily patient admissions in the general ward was 7.74 (± 4.66), while new hospitalizations in the ICU was 2.85 (± 2.08).
Based on the decomposition of the time series of admissions, a seasonal component was identified for admissions in the general ward by day of the week, registering lower volumes during weekends (Table 1) and, considering the presenting characteristics, the naïve estimation of the number of admissions in the general ward included the seasonal components and the moving average of hospitalizations in the seven previous days.No seasonal patterns were identified for ICU admissions; attaching this fact to the low magnitude of average daily hospitalizations, the study modeled the daily ICU hospitalizations exclusively with its weekly moving average.Transfers: transfers between departments were estimated using a population model, considering the rate of patients transferred by the day of hospitalization since admission, calculated for 30 days prior to the projection date.
Discharges: between March 2020 and July 2021, 4,741 COVID-19 patients were discharged from HIAE after applying the exclusion criteria detailed in the Methods section to reduce the confounding factors.The selected group presented an average LOS of 8.59 days with a standard deviation of 5.78 days.A Shapiro-Wilk normality test showed with 95% confidence that the data were not normally distributed, directing the modeling of the phenomenon of interest to nonlinear strategies.
Variables that could impact LOS were discussed with ICU physicians, and their impact was evaluated by calculating the average LOS segmented by epidemiological characteristics (sex and age ranges) and severity, proxied by the use of supportive ventilation, prescription of ECMO, hemodialysis, vasoactive drugs, and ICU occupancy.For the ICU analysis, the separate impact of hospitalization in an ICU or semi-ICU on the previous day was evaluated.
The significance of the difference in LOS between the groups was tested using the Mann-Whitney U test with a 95% confidence interval.For age group stratification, the Kruskal-Wallis test was applied, followed by the Dunn test, which compared the groups individually.The proposed univariate approximation estimated the predictive potential of the subsequent inclusion of variables in the model.The results are summarized in table 2. Age, sex, use of ICU and semi-ICU, and use of noninvasive and invasive ventilation, including ECMO, were statistically correlated with patients' LOS.
The prescription of ECMO was discarded while training the model because, although it indicated hospitalizations of greater complexity and severity, it was a rare intervention, observed in only 0.34% of the sample, and the group of patients with an ECMO prescription showed the highest variability among the analyzed groups, represented by its interquartile range.
A multivariate approach was proposed with a random forest regression algorithm intended to dynamically estimate the LOS for each patient on a daily basis, evaluating the impact of the class of bed occupied (ICU, semi-ICU, or general adopted treatments, and specific medication prescription (mechanical ventilation devices, among others) in the last 24 hours alongside the personal characteristics of each patient, such as sex and age group.
The final step regarding the selection of available variables was performed alongside the model training routine, setting a cutoff for the Gini Importance Index, also known as the Mean Decrease in Impurity, to evaluate the feature relevance in a multivariate context.The variables included in the LOS predictive model had Gini importance indices >0.001, as presented in Table 3.
The Random Forest Model, which was proposed to estimate the total daily LOS for all hospitalized patients, including the detailed criteria above, achieved the following metrics: R² = 0.69, MAE = 2.93 days, and RMSE = 4.09 days.
To deliver outputs that the hospital managers could directly interpret, the results of the estimators were combined so that their composition established a predictive model capable of estimating the number of patients hospitalized in the ICU and general ward per day for the subsequent 14 days, with daily updates and performance monitoring.
Model 2: in the second model, the historical series explored in the analysis were pre-processed following two main operations-the calculation of weekly moving averages to smooth their daily variations, and standardization of the model using the z-score to favor visualization and allow the intuition of an optimal correlation value for pairs of series.
The three time series were plotted: i) Telemedicine: number of callers diagnosed with COVID-19 specific coding (ICD-10-CM U07.1); number of callers with COVID-19-related ICD-10-CM codes (specific coding, symptoms and chapters related to respiratory diseases); and ii) Google Trends: daily interest scores for the search term "sintomas COVID-19" in São Paulo.The time series of hospitalized patients lagged by 1-14 days, and for each combination of series, the Pearson Correlation Index was calculated iteratively.The results are presented in figure 1.
residuals tend to be observed as we move away from the projection date.
A comparison between the averages of the predicted and observed values by department is illustrated in figures 2, 3, and 4.   The number of telemedicine visits by patients coded with a COVID-19 diagnosis (ICD-10-CM U07.1) was highlighted as the best predictor of hospitalized patients.Once peak values differed by >0.001 for delays of seven and eight days, the delay that had the greatest anticipation power for the proposed model (8 days) was chosen.
In a scenario in which telemedicine data were unavailable, Google Trends interest scores for the terms related to symptoms appeared as a predictor, resulting in a Pearson Correlation Index of 0.895 when lagging the hospitalized patients' series by 9 days.The final model consisted of a time series linear regression comprising the predictors to estimate the number of hospitalized patients based on the telemedicine series. (17)

Performance monitoring
The results of the two models were followed for 365 days (between May 20, 2021, and May 20, 2022), considering that the first model has predictions for hospitalized patients segmented by department (general ward and ICU) over a 14-day interval and that the second model using the number of telemedicine visits predicts the overall number of hospitalized patients over an 8-day interval.
When comparing models, the evaluation of their performance indicators is limited by the lowest forecast horizon of 8 days.Computing the average error values, including different time windows, could benefit the model with the smallest forecast horizon, as larger einstein (São Paulo).2024;22:1-9 The adherence of the values predicted by the proposed models to the numbers observed during these 365 days can be reinforced by the performance indicators detailed in the Methods section, which also allowed the identification of the strengths and opportunities in both models.Figures 5, 6, and 7 present the ME, MAE, and RMSE values as the forecast horizon increases (the illustrated indicators are accessible in table 1S form in Supplementary Material).
The ME statistic allows the identification of possible biases in the proposed models.A positive value indicates that the predictions obtained using the referred model are consistently higher than the observed value; in this case, the number of hospitalized patients is overestimated.All predictions have a positive bias, and the model that processes clinical data overestimates the number of hospitalized patients the most, mainly because of error propagation involving the prediction of patient occupancy in the ICU.
The MAE accuracy measure, which is a measure minimized by the median, allows the identification of the models that can most accurately estimate the daily number of hospitalized patients.Despite the ME values tending to zero for the predictions of hospitalizations in the general ward for the first model in 10-day forecast horizons, rejecting the presence of bias in the set of estimations, the result is obtained through predictions with greater residues.As a complementary approach, the RMSE highlights the model that provides the best estimates for the average number of cases in a given period.
The analysis of all three statistics showed that the model built with telemedicine data outperformed the first model when considering the predictions made for the total number of hospitalized patients, also highlighting its capability of maintaining errors below 10 beds for the integrity of its forecast horizon, as evidenced by the MAE.
The first model shows significant deterioration over a period >3 days; however, its ability to distinguish between the classifications of occupied beds sets a gain for resource planning, even though higher reliability indices are limited to short-term predictions.

❚ DISCUSSION
The developed models resulted in reliable estimations of the expected number of COVID-19 hospitalizations at HIAE, which were updated daily and used by HIAE managers to define the allocation of beds to COVID-19 and non-COVID-19 patients, mainly by adjusting the volume of elective surgeries that the hospital would be able to perform in the subsequent 7 days, the hiring of qualified professionals to meet the expected demand, and the proper sizing of its supplies.
Daily capacity adjustments were performed at HIAE during the worst days of the pandemic based on the proposed telemedicine model and were also used by the hospital to prepare for the Delta and Omicron phases of increase in the observed number of hospitalized patients.The development and implementation of a datadriven tool to support the decision-making process in a hospital management environment showed that at an atypical moment of great concern with the evolution of the COVID-19 pandemic, maturity in data management, quality, security, and analysis allows a health institution to benefit from the information generated during dayto-day operations.
The detailed approach assessed the inclusion of clinical variables and hospital data aimed at delivering estimations that accurately capture the nuances of demand behavior, trends of increases and decreases in new cases and hospitalizations, and the epidemiological profile of hospitalized patients. (18)As an alternative to population models, which became popular throughout the COVID-19 pandemic, and aiming for estimates that comprise geographic regions in their entirety, models can be developed by applying algorithms oriented to the projection of time series, such as ARIMA (19) and Convolutional Neural Networks.
The isolation of the hospital context and its variables implies a trade-off, given that the variables that can explain the remaining variance of the studied phenomenon may be unavailable as the pandemic context evolves dynamically.Among these caveats are the emergence of new variants, vaccines and vaccine coverage, effective medications, mask usage, isolation rates, and the guidelines adopted in care.
The validation of predictions of this nature encourages the creation of new models for other pathologies, exploring correlations of events before hospitalizations, and studying patterns of care involving a diagnosed patient and their clinical evolution.It also highlights the importance of joining forces, between different areas of the same institution and between various institutions, sharing complementary data, and improving the use of data for decision-making.

❚ CONCLUSION
The model that estimates the number of COVID-19 hospitalizations in a private not-for-profit hospital based on telemedicine could accurately anticipate the increase and decrease in the volume of patients with a lag of 8 days, demonstrating its usefulness for the effectivemanagement of beds and general resources for caring for patients with COVID-19.The readiness to provide data regarding the volume of care, hospitalizations, patient characteristics, and clinical interventions can help identify patterns to understand the pathology better and provide more accurate decisions regarding the allocation of resources.

❚ ACKNOWLEDGEMENTS
We state that the only sponsor of the study in question was the Hospital Israelita Albert Einstein, where all authors are employed.Because the coordination and preparation of studies and articles was established as a core activity for the positions occupied by the participants in this study, prospecting for new sponsors and requesting additional funding was not necessary.The publication of this study was not contingent upon sponsor approval.

Figure 2 .Figure 3 .
Figure 2. Comparison between the daily number of hospitalized patients and the mean values predicted by the described models

Figure 4 .
Figure 4. Comparison between the daily number of hospitalized patients in the intensive care unit and the mean values predicted by Model 1

Figure 1 .
Figure 1.Pearson Correlation Index calculated between all available time series and the lagged series of daily hospitalized patients

Figure 5 .Figure 6 .Figure 7 .
Figure 5. Prediction performance monitoring.Calculated mean error for the proposed models by forecast horizon (days)

Table 1 .
Seasonal factors calculated after analyzing the behavior of admissions to general wards per day of the week einstein (São Paulo).2024;22:1-9

Table 1S .
Performance metrics as a function of the forecast horizon