Predicting Survival of Intensive Care Unit Patients with Support Vector Machines

Authors

Josh Hollandsworth

Brad Lipson

Eric Miller

Published

December 1, 2023

Introduction

Every day, sensors and systems capture a virtual flood of data points and feed those values into powerful artificial intelligence and machine learning systems to derive classifications and predict a myriad of outcomes. These systems support a broad spectrum of applications, from fraud detection to interactions with smart home systems. Given the number of data points present in the medical field, it is no surprise that machine learning is increasingly being leveraged to elevate patient care.

For as long as most of us can remember, our interactions with physicians have included the gathering of data points to help them detect illness and track the progression of disease. Common data points include age, weight, height, temperature, blood pressure, and a list of current symptoms. Physicians then use their education and years of practice to provide diagnoses and help us live happy and healthy lives. But what if this process could be supported with machine learning and brought into a more critical care setting?

It can. Thanks to machine learning, we can use data from the countless sensors and measurements taken by medical staff in Intensive Care Units (ICUs) to predict the survivability of patients under care. This data can then help care teams organize around particular cases, ensuring the best allotment of resources and the highest attention to the most dire cases. The method we use in this report is the construction of Support Vector Machines, or SVMs.

SVMs are a supervised learning methodology that trains models on historical cases so they can then be applied to new cases. A Support Vector Machine separates the data into two distinct groups by constructing a maximal-margin hyperplane (Han, Kamber, and Pei 2012). A maximal-margin hyperplane can be explained as the line drawn between two clusters that separates their members with the greatest possible distance to the nearest members of each cluster.

While we did not find many cases of support vector machines being used for ICU patient survival, the use of Support Vector Machines in medicine is not a novel approach. In “Utilizing Support Vector Machines for Diabetes Mellitus Classification from Electronic Medical Records,” Adeoye et al. leveraged electronic medical records to classify individuals with and without diabetes (Adeoye et al. 2021). Meanwhile, Zhou et al. were able to use Support Vector Machines to predict the prognosis of severe, acute myocardial infarction from electronic medical records with 92% accuracy (Zhou et al. 2023).

While Support Vector Machines are a great utility for clustering data and predicting outcomes, their use in medicine is not without problems. One problem noted by Liu et al. is that the sheer number of available data points can make it hard to select features (data points of importance) for use in model construction (Liu et al. 2018). In fact, Liu goes as far as describing the use of Principal Component Analysis to choose which features to include in model construction. Additionally, support vector machines work best when the data can be separated linearly; however, unlike other methodologies, this is not a strict requirement. Should the data not be easily linearly separable, the use of a kernel function, or “kernel trick,” allows the data to be mapped from one space to another for optimal construction of the maximal-margin hyperplane (Han, Kamber, and Pei 2012; Mohan et al. 2020).

Methods

Data Source

To build our model for predicting patient survivability, we are using data from the Medical Information Mart for Intensive Care (MIMIC), specifically the MIMIC III data set. More information about the MIMIC III dataset is available at https://mimic.mit.edu/docs/iii/. The data set itself is rather large, so we pre-processed it slightly to generate a file of comma-separated values rather than consuming the data directly from its highly normalized format.
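
Loading that pre-processed extract amounts to reading a single CSV file into the mimic_data data frame used throughout this report. The sketch below is illustrative only; the file name is a placeholder, not the actual name of our extract.

Code
# Minimal loading sketch; the file name is a placeholder, and the object name
# matches the mimic_data data frame used in the code that follows.
library(readr)
mimic_data <- readr::read_csv("mimic_iii_icu_extract.csv")  # placeholder file name
str(mimic_data)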

Statistical Modeling

Given that Support Vector Machines operate on both linear and non-linear data, we must look at how a Support Vector Machine maps non-linear data into a linear space to perform classification. This operation is done via a kernel function, or “kernel trick.” The general form of a kernel function is as follows.

\[ K(X_i, X_j) = \Phi(X_i) \cdot \Phi(X_j) \]

where \(X_i\) and \(X_j\) are tuples (observations) and \(\Phi\) is the mapping that transforms each tuple into the new space.
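
One widely used kernel, and the form e1071 applies when kernel="radial" in the models fitted later in this report, is the Gaussian radial basis function, where \(\gamma\) is the hyperparameter tuned in the code below:

\[ K(X_i, X_j) = e^{-\gamma \lVert X_i - X_j \rVert^2} \]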

Once the data are mapped into a linear space (and are therefore linearly separable), we can define our maximal-margin hyperplane. A general formula for the hyperplane is as follows \[ W \cdot X + b = 0 \]

This formula has two components of import: \(W\), a weight vector, and \(b\), a scalar bias.

Since \(W\) is a simple weight vector, it takes the form

\[ W = \{w_1, w_2, \dots, w_n \} \]

The tuples that lie closest to the margins of the maximal-margin hyperplane are the actual support vectors.

Using the formulas above, for tuples with class labels \(y_i = 1\) and \(y_i = -1\) respectively, the margins of the hyperplane are defined as (with \(W_0\) playing the role of the bias \(b\))

\[ H_1: W_0 + W_{1}X_{1} + W_{2}X_{2} + \dots + W_{n}X_{n} \ge 1 \]

and

\[ H_2: W_0 + W_{1}X_{1} + W_{2}X_{2} + \dots + W_{n}X_{n} \le -1 \]
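
To make the notation concrete, the short sketch below (illustrative only, not part of the study's code) fits a linear SVM with e1071 on two separable toy clusters and recovers the weight vector \(W\) and bias \(b\) from the fitted object, where \(W\) is a linear combination of the support vectors and \(b = -\rho\).

Code
# Illustrative toy example: fit a linear SVM on two separable clusters
# and recover the hyperplane parameters W and b described above.
library(e1071)

set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2))
y <- factor(rep(c(-1, 1), each = 20))

toy_fit <- svm(x, y, kernel = "linear", scale = FALSE)

# W is a linear combination of the support vectors; the bias b equals -rho
W <- t(toy_fit$coefs) %*% toy_fit$SV
b <- -toy_fit$rho

W            # weight vector {w_1, w_2} of the separating hyperplane
b            # scalar bias
toy_fit$SV   # the support vectors: the tuples lying closest to the margins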

Model

To forecast the possibility of patient death during a hospital stay, a support vector machine (SVM) model was developed and evaluated. A particular subset of the MIMIC dataset, a vast collection of ICU electronic health records, was used to train the model. Age, gender, hospital expiration flag, mean heart rate, mean systolic blood pressure, mean respiration rate, mean temperature, mean white blood cell count, minimum platelet count, maximum creatinine, and mean lactate were among the variables included in the dataset.

The dataset included 4,591 ICU patients from a single center. The outcome was binary: the patient either died or survived during hospitalization. 741 patients (16%) died while 3,850 (84%) survived. The model aimed to predict this binary mortality outcome using demographics, vital signs, and lab results as predictors. An SVM with a radial kernel was chosen for its ability to handle classification problems in which the classes are not linearly separable.

Several assumptions were made in the modeling approach. The dataset was assumed to be representative of the broader ICU population at the institution. It was also assumed that the selected predictors sufficiently captured factors associated with mortality risk.

Initial attempts at developing a model leveraged a linear kernel and optimized the cost hyperparameter. Our data were partitioned into two sets: a training set and a test set. The training set included 80% of all records, selected randomly; the test set comprised the remaining 20% of the records. Predictors selected included age, gender, heart rate, blood pressure, respiratory rate, temperature, white blood cell count, platelet count, creatinine, and lactate levels.
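
The sketch below approximates that initial linear-kernel tuning step. It is a hedged illustration rather than the code actually used: the cost grid is an assumption, and for brevity it reuses the train data frame constructed in the Code section that follows rather than reproducing the original 80/20 partition.

Code
# Approximate sketch of the initial linear-kernel tuning step; the cost grid
# shown here is an assumption rather than the grid actually searched.
library(e1071)

linear_tune <- e1071::tune(
  svm,
  survival ~ . - icustay_id,
  data   = train,                                  # training partition built below
  kernel = "linear",
  ranges = list(cost = c(0.01, 0.1, 1, 10, 100))   # assumed search grid
)
summary(linear_tune)
initial_linear_model <- linear_tune$best.model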

These models had high F1 scores as well as good ROC-AUC values; however, they were flawed and predicted that every patient would survive. Upon further investigation, this was found to be due to a class imbalance problem: because roughly 84% of patients survived, a model that simply predicted survival for everyone could achieve high accuracy while providing no real discrimination.

To resolve this problem and still build a meaningful model, the training set was down-sampled by selecting 95% of the minority cases (in which the patient died) and an equivalent number of majority cases (in which the patient survived). The resulting training set was comprised of 50% minority cases and 50% majority cases. Consequently, the test set was much larger and was constructed from the observations not included in the training set.

Code
# Packages used in this report (assumed here; if an earlier setup chunk already loads them, this is redundant)
library(dplyr)     # data manipulation pipelines
library(tidyr)     # drop_na()
library(e1071)     # support vector machines
library(caret)     # confusionMatrix()
library(pROC)      # ROC / AUC calculation
library(ROCR)      # ROC curve plotting
library(ggplot2)   # visualizations
library(ggthemes)  # theme_economist(), scale_fill_economist()

# Create a subset of the entire data set that includes only the columns we have determined are of interest.
# Convert the gender field to an `is_male` flag, and discretize the heart rate and age.
model_data <- mimic_data %>%
  select(
    icustay_id,
    age,
    gender,
    hospital_expire_flag,
    heartrate_mean,
    sysbp_mean,
    resprate_mean,
    tempc_mean,
    wbc_mean,
    platelet_min,
    creatinine_max,
    lactate_mean
  ) %>%
  mutate(
    is_male = factor(case_when(gender == "M" ~ 1, TRUE ~ 0)),
    age_range = factor(
      case_when(
        age <= 18 ~ "<=18",
        age > 18 & age <= 40 ~ "19 to 40",
        age > 40 & age <= 60 ~ "41 to 60",
        age > 60 ~ ">=61"
      )
    ),
    heart_rate = factor(
      case_when(
        heartrate_mean < 60 ~ "<60",
        heartrate_mean >= 60 & heartrate_mean <= 80 ~ "60 to 80",
        heartrate_mean > 80 & heartrate_mean <= 100 ~ "81 to 100",
        heartrate_mean > 100 ~ "above 100"
      )
    ),
    survival = factor(case_when(
      hospital_expire_flag == 1 ~ "DIED", TRUE ~ "SURVIVED"
    ))
  ) %>%
  select(-gender,-heartrate_mean,-age,-hospital_expire_flag) %>%
  drop_na()


# apply z-score standardization to our numeric features
scaled_model_data <- model_data %>%
  select(-icustay_id) %>%
  mutate_if(is.numeric, scale) %>%
  cbind(icustay_id = model_data$icustay_id)

# capture the survival factor levels for building the ROC label orderings below
survivalLevels <- attributes(model_data$survival)$levels

# set a random seed so that this is repeatable.
set.seed(params$rand_seed)

# Our data are imbalanced such that patients tend to survive (hospital_expire_flag == 0). To resolve this issue we down-sample
# the surviving patients: the training set is built from 95% of the observations where survival == "DIED",
# plus an equal number of randomly selected observations where survival == "SURVIVED".
amount_to_sample <- floor(sum(scaled_model_data$survival == "DIED") * 0.95)
train <- scaled_model_data %>%
  group_by(survival) %>%
  sample_n(size = amount_to_sample) %>%
  ungroup()
test <- scaled_model_data %>% anti_join(train, by = "icustay_id")
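
A quick check, not part of the original output, confirms the 50/50 class balance of the down-sampled training set and shows how many observations remain for testing.

Code
# Verify the down-sampled training set is balanced and inspect the split sizes
table(train$survival)
prop.table(table(train$survival))
c(train = nrow(train), test = nrow(test))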

Nu Classification

Code
set.seed(params$rand_seed)
nuModel <- e1071::svm(survival ~ . - icustay_id,
                      type="nu-classification",
                      kernel="radial",
                      data=train,
                      probability=TRUE,
                      gamma=params$nu_gamma,
                      nu=params$nu_nu,
                      scale=TRUE)

nuPredictions <- predict(nuModel, test, probability=TRUE)
nuProbabilities <- attr(nuPredictions, "probabilities")[,1]
nuROC <- pROC::roc(as.factor(test$survival), nuProbabilities)
nuConfusionMatrix <- confusionMatrix(nuPredictions, test$survival, mode="everything", positive = "SURVIVED")

# to appropriately build the ROC graph, build a label ordering based on survivalLevels and the positive class from the confusion matrix
nuOrdering <- c( nuConfusionMatrix$positive,survivalLevels[which(survivalLevels != nuConfusionMatrix$positive)])

nuROCRPredictions <- ROCR::prediction(nuProbabilities, test$survival, label.ordering = nuOrdering)
nuPerformance <- ROCR::performance(nuROCRPredictions, "tpr", "fpr")

fourfoldplot(as.table(nuConfusionMatrix), color=c("navy", "lightblue"), main="nu-classification Confusion Matrix")

Code
plot(nuPerformance, colorize = FALSE, main="ROC Curve for nu-classification")
abline(a=0,b=1)
mtext(paste("AUC =", round(as.numeric(nuROC$auc), 4)," F1 =", round(as.numeric(nuConfusionMatrix$byClass['F1']), 4)))

C Classification

Code
set.seed(params$rand_seed)

cModel <- e1071::svm(survival ~ . - icustay_id,
                      type="C-classification",
                      kernel="radial",
                      data=train,
                      probability=TRUE,
                      cost=params$c_cost,
                      gamma=params$c_gamma,
                      scale=TRUE)

cPredictions <- predict(cModel, test, probability=TRUE)
cProbabilities <- attr(cPredictions, "probabilities")[,1]
cROC <- pROC::roc(as.factor(test$survival), cProbabilities)
cConfusionMatrix <- confusionMatrix(cPredictions, test$survival)

# to appropriately build the ROC graph, build a label ordering based on survivalLevels and the positive class from the confusion matrix
cOrdering <- c(survivalLevels[which(survivalLevels != cConfusionMatrix$positive)], cConfusionMatrix$positive)

cROCRPredictions <- ROCR::prediction(cProbabilities, test$survival, label.ordering = cOrdering)
cPerformance <- ROCR::performance(cROCRPredictions, "tpr", "fpr")

fourfoldplot(as.table(cConfusionMatrix), color=c("navy", "lightblue"), main="C-classification Confusion Matrix")

Code
plot(cPerformance, colorize = FALSE, main="ROC Curve for C-Classification")
abline(a=0,b=1)
mtext(paste("AUC =", round(as.numeric(cROC$auc), 4)))

Analysis and results (Primary Model)

Multiple models were built, tuned, and evaluated to maximize accuracy, ROC AUC, and F1 score. The final model leveraged nu-classification with a nu value of 0.75 and a gamma value of 0.02.
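
A grid search along the lines below is one way such values can be selected. The sketch is hypothetical: the grids shown are assumptions rather than the exact values we searched, and it relies on e1071::tune with the same model formula used elsewhere in this report.

Code
# Hypothetical sketch of the hyperparameter search behind the final model;
# the grids below are assumptions, not the exact values searched.
library(e1071)

nu_tune <- e1071::tune(
  svm,
  survival ~ . - icustay_id,
  data   = train,
  type   = "nu-classification",
  kernel = "radial",
  ranges = list(gamma = c(0.01, 0.02, 0.05, 0.1),
                nu    = c(0.25, 0.5, 0.75))
)
summary(nu_tune)   # the reported best values were gamma = 0.02 and nu = 0.75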

Our support vector machine (SVM) model demonstrated moderate accuracy in predicting hospital mortality, and its predictive accuracy was higher for patients who survived than for those who did not. According to the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve, the model can distinguish between patients who will die and those who will not: the area under the ROC curve was 0.7711 and the F1 score was 0.85.

Our investigation has a number of limitations. First, the model was trained and assessed using a relatively small dataset. Additionally, the dataset used in this research was sourced from only one hospital, so there is a chance that the model’s performance will not generalize to patients from other institutions. Moreover, it is important to remember that not all attributes that might be involved in mortality prediction were included in our analysis.

Despite these limitations, our study provides evidence that Support Vector Machine (SVM) models can be used to predict survival in hospitalized patients. Further research is needed to validate the model’s performance on a larger and more diverse dataset and to investigate the practical implications of using the model to predict patient survival.

The goal of applying the support vector machine (SVM) model is to improve patient care across various therapeutic settings. Possible uses for the model include identifying patients who are more likely to die, allowing for the administration of more intensive medical measures, and creating a mechanism to decide which patients are admitted to intensive care units and in what order. The broader goal of this research is to offer a framework that can direct treatment decisions for patients who present with complicated medical issues.

This support vector machine (SVM) model shows a great deal of promise as a useful tool for improving patient care and predicting mortality. Further research is necessary to validate the model’s performance on a larger and more diverse dataset and investigate the practical implications of using the model to predict survival.

Challenger Modeling

A challenger variant was constructed by reducing the number of features. The objective was to assess whether a more modest model could attain comparable performance.

Age, gender, heart rate, systolic blood pressure, respiratory rate, body temperature, and platelet count were utilized as predictors by the challenger (these are the fields retained in the code below). Once more, an SVM with a radial kernel was implemented for classification.

The evaluation and selection of models adhered to the same procedure as the initial model. The optimal hyperparameters found were a gamma of 0.17 and a nu of 0.87.

The challenger model achieved marginally inferior accuracy and specificity scores of 83.45%. Nevertheless, its reduced feature count enhances interpretability, and the comparable performance achieved with fewer inputs suggests that a smaller set of predictors may be sufficient.

Code
set.seed(params$rand_seed)
challnger_test <- test %>% select(sysbp_mean, resprate_mean, tempc_mean, platelet_min, is_male, age_range, heart_rate, survival, icustay_id)
challnger_train <- train %>% select(sysbp_mean, resprate_mean, tempc_mean, platelet_min, is_male, age_range, heart_rate, survival, icustay_id)

Nu Classification

Code
set.seed(params$rand_seed)
nuChallengerModel <- e1071::svm(survival ~ . - icustay_id,
                      type="nu-classification",
                      kernel="radial",
                      data=challnger_train,
                      probability=TRUE,
                      gamma=params$nu_challenger_gamma,
                      nu=params$nu_challenger_nu,
                      scale=TRUE)

nuChallengerPredictions <- predict(nuChallengerModel, challnger_test, probability=TRUE)
nuChallengerProbabilities <- attr(nuChallengerPredictions, "probabilities")[,1]
nuChallengerROC <- pROC::roc(as.factor(challnger_test$survival), nuChallengerProbabilities)
nuChallengerConfusionMatrix <- confusionMatrix(nuChallengerPredictions, challnger_test$survival)

# to appropriately build the ROC graph, build a label ordering based on survivalLevels and the positive class from the confusion matrix
nuChallengerOrdering <- c(survivalLevels[which(survivalLevels != nuChallengerConfusionMatrix$positive)], nuChallengerConfusionMatrix$positive)

nuChallengerROCRPredictions <- ROCR::prediction(nuChallengerProbabilities, challnger_test$survival, label.ordering = nuChallengerOrdering)
nuChallengerPerformance <- ROCR::performance(nuChallengerROCRPredictions, "tpr", "fpr")

fourfoldplot(as.table(nuChallengerConfusionMatrix), color=c("navy", "lightblue"), main="nu-classification Challenger Confusion Matrix")

Code
plot(nuChallengerPerformance, colorize = FALSE, main="ROC Curve for nu-classification (Challenger)")
abline(a=0,b=1)
mtext(paste("AUC =", round(as.numeric(nuChallengerROC$auc), 4)))

C Classification

Code
set.seed(params$rand_seed)
cChallengerModel <- e1071::svm(survival ~ . - icustay_id,
                      type="C-classification",
                      kernel="radial",
                      data=challnger_train,
                      probability=TRUE,
                      cost=params$c_challenger_cost,
                      gamma=params$c_challenger_gamma,
                      scale=TRUE)

cChallengerPredictions <- predict(cChallengerModel, challnger_test, probability=TRUE)
cChallengerProbabillities <- attr(cChallengerPredictions, "probabilities")[,1]
cChallengerROC <- pROC::roc(as.factor(challnger_test$survival), cChallengerProbabillities)
cChallengerConfusionMatrix <- confusionMatrix(cChallengerPredictions, challnger_test$survival)

# to appropriately build the ROC graph, build a label ordering based on survivalLevels and the positive class from the confusion matrix
cChallengerOrdering <- c(survivalLevels[which(survivalLevels != cChallengerConfusionMatrix$positive)], cChallengerConfusionMatrix$positive)

cChallengerROCRPredictions <- ROCR::prediction(cChallengerProbabillities, challnger_test$survival, label.ordering = cChallengerOrdering)
cChallengerPerformance <- ROCR::performance(cChallengerROCRPredictions, "tpr", "fpr")

fourfoldplot(as.table(cChallengerConfusionMatrix), color=c("navy", "lightblue"), main="C-classification Confusion Matrix (Challenger)")

Code
plot(cChallengerPerformance, colorize = FALSE, main="ROC Curve for C-Classification (Challenger)")
abline(a=0,b=1)
mtext(paste("AUC =", round(as.numeric(cChallengerROC$auc), 4)))

Data and Visualization

Code
library(gtsummary)
library(gt)        # tab_options()
library(gtExtras)  # gt_theme_dark()

mimic_data %>%
mutate(
  gender = case_when(gender == "M" ~ "male",
                     gender == "F" ~ "female",
                     TRUE ~ gender),
  survived = case_when(hospital_expire_flag == 1 ~ "Died",
                     hospital_expire_flag == 0 ~ "Survived")
) %>%  
select(
  age, 
  gender, 
  survived, 
  heartrate_mean,
  sysbp_mean,
  resprate_mean,
  tempc_mean,
  wbc_mean,
  platelet_min,
  creatinine_max,
  lactate_mean
) %>%
  tbl_summary(
    type = list(age ~ 'continuous2',
    gender ~ 'categorical', resprate_mean ~ 'continuous2',
    heartrate_mean ~ 'continuous2',
    tempc_mean ~ 'continuous2',
    wbc_mean ~ 'continuous2',
    platelet_min ~ 'continuous2',
    creatinine_max ~ 'continuous2',
    lactate_mean ~ 'continuous2',
    sysbp_mean ~ 'continuous2'),
    label = list(
      age ~ "Patient Age",
      gender ~ "Patient Sex",
      heartrate_mean ~ "Heart Rate",
      sysbp_mean ~ "Systolic Blood Pressure",
      resprate_mean ~ "Respiration Rate",
      tempc_mean ~ "Body Temperature (c)",
      wbc_mean ~ "White Blood Cell Count",
      platelet_min ~"Platelet Count",
      creatinine_max ~"Creatinine Level",
      lactate_mean ~"Lactate Level"
       ),
      statistic = all_continuous() ~ c("{median}({p25}, {p75})", "{min}, {max}"),
      by = survived 
  ) %>%
  add_overall(last = TRUE) %>%
  bold_labels() %>%
  italicize_levels() %>%   as_gt() %>%
  gt_theme_dark() %>%
  tab_options(
    table.background.color = "#d8e4ea",
    column_labels.background.color="#5092c2",
    table.align = "left"
  )
Characteristic              Died, N = 741          Survived, N = 3,818    Overall, N = 4,559
Patient Age
    Median (IQR)            73 (60, 83)            65 (53, 78)            67 (54, 80)
    Range                   17, 91                 17, 91                 17, 91
Patient Sex
    female                  339 (46%)              1,639 (43%)            1,978 (43%)
    male                    402 (54%)              2,179 (57%)            2,581 (57%)
Heart Rate
    Median (IQR)            92 (77, 106)           87 (76, 99)            88 (76, 100)
    Range                   47, 155                36, 139                36, 155
Systolic Blood Pressure
    Median (IQR)            108 (100, 120)         115 (106, 126)         114 (105, 126)
    Range                   70, 175                76, 195                70, 195
    Unknown                 2                      6                      8
Respiration Rate
    Median (IQR)            21.5 (18.4, 24.9)      18.9 (16.6, 21.9)      19.3 (16.8, 22.4)
    Range                   11.3, 40.6             9.5, 40.4              9.5, 40.6
    Unknown                 0                      1                      1
Body Temperature (c)
    Median (IQR)            36.62 (36.11, 37.19)   36.87 (36.47, 37.32)   36.82 (36.41, 37.31)
    Range                   31.60, 39.71           32.61, 40.10           31.60, 40.10
    Unknown                 20                     83                     103
White Blood Cell Count
    Median (IQR)            13 (9, 18)             12 (8, 15)             12 (8, 16)
    Range                   0, 404                 0, 207                 0, 404
Platelet Count
    Median (IQR)            166 (95, 253)          180 (126, 245)         178 (122, 246)
    Range                   8, 951                 5, 1,297               5, 1,297
    Unknown                 1                      5                      6
Creatinine Level
    Median (IQR)            1.60 (1.00, 2.60)      1.10 (0.80, 1.70)      1.20 (0.90, 1.90)
    Range                   0.20, 14.40            0.10, 27.80            0.10, 27.80
    Unknown                 1                      1                      2
Lactate Level
    Median (IQR)            2.55 (1.70, 4.50)      1.80 (1.30, 2.55)      1.90 (1.35, 2.75)
    Range                   0.40, 20.85            0.30, 16.80            0.30, 20.85

Note: categorical variables are presented as n (%).

Patient Demographics

Patient Age

Code
ggplot(mimic_data, aes(x=age))+ 
  geom_histogram( color="#e9ecef", fill="#188bc2", alpha=0.9, position = 'identity') +
  theme_economist(base_family="ITC Officina Sans")

Patient Sex

Code
mimic_data <- mimic_data %>% mutate(
  gender = case_when(gender == "M" ~ "male",
                     gender == "F" ~ "female",
                     TRUE ~ gender)
)

patient_sex_viz <- mimic_data %>%
  group_by(gender) %>%
  summarise(N = n()) %>%
  mutate(
    gender = as.factor(gender),
    pos = cumsum(N) - N/2,
    label = paste(N,  " ", gender, "\npatients\n(", 100*round(N/sum(N), 2), "%)", sep="")
  )

ggplot(patient_sex_viz, aes(x = "", y = N, fill = gender)) +
  geom_bar(stat = "identity", width=1, color="white", position = "stack") +  
  coord_polar(theta = "y", direction = -1, clip = "off") +
  theme_economist(base_family="ITC Officina Sans") + 
  theme(
    legend.position="none",
    line=element_blank(),
    axis.title.x=element_blank(),
    axis.text.x=element_blank(), #remove x axis labels
    axis.ticks.x=element_blank(), #remove x axis ticks
    axis.title.y=element_blank(),
    axis.text.y=element_blank(),  #remove y axis labels
    axis.ticks.y=element_blank()  #remove y axis ticks
  ) + 
  geom_text(aes(y = pos, label = label), color = "white", size=6) +
  scale_fill_economist(labels=NULL)

Patient Survival

Code
mimic_data <- mimic_data %>% mutate(
  survived = case_when(hospital_expire_flag == 1 ~ "Died",
                     hospital_expire_flag == 0 ~ "Survived")
)

ggplot(mimic_data, aes(x = survived, fill=survived)) +
  geom_bar(color="white") +  
  theme_economist(base_family="ITC Officina Sans") +
  scale_fill_economist(labels=NULL)

Vital Signs

Heart rate

Code
ggplot(mimic_data, aes(x=heartrate_mean)) + 
  geom_histogram( color="#e9ecef", fill="#188bc2", alpha=0.9, position = 'identity') +
  theme_economist(base_family="ITC Officina Sans")

Blood pressure: Median systolic blood pressure 134 mmHg (Q1–Q3: 116–154 mmHg); median diastolic blood pressure 78 mmHg (Q1–Q3: 66–90 mmHg)

Code
ggplot(mimic_data, aes(x=sysbp_mean)) + 
  geom_histogram( color="#e9ecef", fill="#188bc2", alpha=0.9, position = 'identity') +
  theme_economist(base_family="ITC Officina Sans")

Respiratory rate

Code
ggplot(mimic_data, aes(x=resprate_mean)) + 
  geom_histogram( color="#e9ecef", fill="#188bc2", alpha=0.9, position = 'identity') +
  theme_economist(base_family="ITC Officina Sans")

Temperature

Code
ggplot(mimic_data, aes(x=tempc_mean)) + 
  geom_histogram( color="#e9ecef", fill="#188bc2", alpha=0.9, position = 'identity') +
  theme_economist(base_family="ITC Officina Sans")

Oxygen saturation: Median 96% (Q1–Q3: 93–99%)

Code
ggplot(mimic_data, aes(x=spo2_mean)) + 
  geom_histogram( color="#e9ecef", fill="#188bc2", alpha=0.9, position = 'identity') +
  theme_economist(base_family="ITC Officina Sans")

Laboratory Values

White blood cell count: Median 10.5 × 10^9 cells/L (Q1–Q3: 7.5–14.5 × 10^9 cells/L)

Code
ggplot(mimic_data, aes(x=wbc_mean)) + 
  geom_histogram( color="#e9ecef", fill="#188bc2", alpha=0.9, position = 'identity') +
  theme_economist(base_family="ITC Officina Sans")

Neutrophil count: Median 7.5 × 10^9 cells/L (Q1–Q3: 5.4–11.2 × 10^9 cells/L)

NOT FOUND IN DATA

Lymphocyte count: Median 1.7 × 10^9 cells/L (Q1–Q3: 1.0–2.5 × 10^9 cells/L)

NOT FOUND IN DATA

Platelet count: Median 178 × 10^9 cells/L (Q1–Q3: 125–240 × 10^9 cells/L)

Code
ggplot(mimic_data, aes(x=platelet_min)) + 
  geom_histogram( color="#e9ecef", fill="#188bc2", alpha=0.9, position = 'identity') +
  theme_economist(base_family="ITC Officina Sans")

Creatinine: Median 1.0 mg/dL (Q1–Q3: 0.8–1.3 mg/dL)

Code
ggplot(mimic_data, aes(x=creatinine_max)) + 
  geom_histogram( color="#e9ecef", fill="#188bc2", alpha=0.9, position = 'identity') +
  theme_economist(base_family="ITC Officina Sans")

Bilirubin: Median 0.8 mg/dL (Q1–Q3: 0.5–1.2 mg/dL)

NOT FOUND IN DATA

Lactate dehydrogenase: Median 250 U/L (Q1–Q3: 190–330 U/L)

Code
ggplot(mimic_data, aes(x=lactate_mean)) + 
  geom_histogram( color="#e9ecef", fill="#188bc2", alpha=0.9, position = 'identity') +
  theme_economist(base_family="ITC Officina Sans")

Conclusion

In conclusion, our support vector machine (SVM) model demonstrated moderate accuracy in predicting hospital mortality. The predictive accuracy of the model was higher for patients who survived compared to those who did not. According to the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve, the model can distinguish between patients who will die and those who will not.

The findings of this study indicate that Support Vector Machines (SVMs) have the potential to serve as a valuable tool for forecasting patient mortality in hospital settings. Nevertheless, additional investigation is required to substantiate these results among a broader and more heterogeneous sample.

The study has certain limitations that should be acknowledged. First, the investigation was carried out on a limited cohort of individuals. Second, the research was conducted exclusively within the confines of a single medical facility, potentially limiting the applicability of the findings to other healthcare institutions. Finally, the study did not account for other potential confounders among the ICU patients.

Areas for Future Exploration

Subsequent investigations should examine the outcomes of this study in a broader and more heterogeneous sample. Further investigation is also warranted to explore the application of Support Vector Machines (SVMs) in predicting additional clinical outcomes, including the duration of hospitalization and rates of patient readmission.

References

Adeoye, Abiodun O., et al. 2021. “Utilizing Support Vector Machines for Diabetes Mellitus Classification from Electronic Medical Records.” International Journal of Advanced Computer Science and Information Technology (IJACSIT) 11 (10): 120–14.
Bansal, Malti, Apoorva Goyal, and Apoorva Choudhary. 2022. “A Comparative Analysis of k-Nearest Neighbor, Genetic, Support Vector Machine, Decision Tree, and Long Short Term Memory Algorithms in Machine Learning.” Decision Analytics Journal 3: 100071. https://doi.org/https://doi.org/10.1016/j.dajour.2022.100071.
Sidey-Gibbons, Jenni A. M., and Chris J. Sidey-Gibbons. 2019. “Machine Learning in Medicine: A Practical Introduction.” BMC Medical Research Methodology 19 (64).
Cristianini, Nello, and Bernhard Scholkopf. 2002. “Support Vector Machines and Kernel Methods: The New Generation of Learning Machines.” AI Magazine 23 (3): 31.
Fouodo, Cesaire, et al. 2022. “Support Vector Machines for Survival Analysis with R.” R Journal 14 (2): 92–107.
Greco, Massimiliano, Pier F. Caruso, and Maurizio Cecconi. 2020. “Artificial Intelligence in the Intensive Care Unit.” Seminars in Respiratory and Critical Care Medicine 42 (1): 2–9. Thieme Medical Publishers.
Han, Jiawei, Micheline Kamber, and Jian Pei. 2012. Data Mining: Concepts and Techniques. Morgan Kaufmann.
Houthooft, Rein, Joeri Ruyssinck, Joachim van der Herten, Sean Stijven, Ivo Couckuyt, Bram Gadeyne, Femke Ongenae, et al. 2015. “Predictive Modelling of Survival and Length of Stay in Critically Ill Patients Using Sequential Organ Failure Scores.” Artificial Intelligence in Medicine 63 (3): 191–207.
Hu, Wei Huang, Xiangfen, and Qiang Wu. 2016. “A New Support Vector Machine Algorithm for Data Mining.” Knowledge-Based Systems 112: 118–28.
Ismail, Gaber A., et al. 2020. “An Approach Using Support Vector Machines to Predict Hospital Readmission.” Journal of Medical Systems 44 (9): 1–10.
Karatzoglou, Alexandros, David Meyer, and Kurt Hornik. 2006. “Support Vector Machines in r.” Journal of Statistical Software 15 (9): 1–28. https://doi.org/10.18637/jss.v015.i09.
Liu, X., X. Chen, et al. 2018. “Mortality Prediction Based on Imbalanced High-Dimensional ICU Big Data.” Computers in Industry 98: 218–25.
Mantovani, Rafael G., André L. D. Rossi, Joaquin Vanschoren, Bernd Bischl, and André C. P. L. F. de Carvalho. 2015. “Effectiveness of Random Search in SVM Hyper-Parameter Tuning.” In 2015 International Joint Conference on Neural Networks (IJCNN), 1–8. https://doi.org/10.1109/IJCNN.2015.7280664.
Mohan, Lalit, Janmejay Pant, Priyanka Suyal, and Arvind Kumar. 2020. “Support Vector Machine Accuracy Improvement with Classification.” In 2020 12th International Conference on Computational Intelligence and Communication Networks (CICN), 477–81. https://doi.org/10.1109/CICN49253.2020.9242572.
Pölsterl, Sebastian, Nassir Navab, and Amin Katouzian. 2015. “Fast Training of Support Vector Machines for Survival Analysis.” In Machine Learning and Knowledge Discovery in Databases, edited by Annalisa Appice, Pedro Pereira Rodrigues, Vítor Santos Costa, João Gama, Alípio Jorge, and Carlos Soares, 243–59. Cham: Springer International Publishing.
Sapankevych, Nicholas I., and Ravi Sankar. 2009. “Time Series Prediction Using Support Vector Machines: A Survey.” IEEE Computational Intelligence Magazine 4 (2): 24–38. https://doi.org/10.1109/MCI.2009.932254.
Veropoulos, Konstantinos, Colin Campbell, Nello Cristianini, et al. 1999. “Controlling the Sensitivity of Support Vector Machines.” In Proceedings of the International Joint Conference on AI, 55:60. Stockholm.
Xu, Lihong Li, Fei, and Zhihua Zhou. 2010. “SVM Kernels for Data Mining: A Comparative Study.” Proceedings of the 2010 SIAM International Conference on Data Mining (SDM), 585–96.
Yonas B. Dibike, Dimitri Solomatine, Slavco Velickov, and Michael B. Abbott. 2001. “Model Induction with Support Vector Machines: Introduction and Applications.” Journal of Computing in Civil Engineering 15 (3).
Zeng, Zhi-Qiang, Hong-Bin Yu, Hua-Rong Xu, Yan-Qi Xie, and Ji Gao. 2008. “Fast Training Support Vector Machines Using Parallel Sequential Minimal Optimization.” In 2008 3rd International Conference on Intelligent System and Knowledge Engineering, 1:997–1001. https://doi.org/10.1109/ISKE.2008.4731075.
Zhou, Xingyu, et al. 2023. “Using Support Vector Machines for Deep Mining of Electronic Medical Records in Order to Predict Prognosis of Severe, Acute Myocardial Infarction.” Frontiers in Cardiovascular Medicine 10: 918.