Predicting Survival of Intensive Care Unit Patients with Support Vector Machines
2023-12-01
SVMs and Our Data Source
With Eric Miller
History and Background
Developed by Vapnik and Chervonenkis in 1964, then revised in 1992 to incorporate non-linear classifiers
Support Vector Machines (SVMs) use supervised learning, training on past data to classify new cases
An SVM model separates the data into two distinct groups
SVM establishes a hyperplane that maximizes the margin between the two groups
In two dimensions, the hyperplane acts as a line separating the groups
The goal is to achieve the greatest possible separation between the nearest members of each group
Data Source
Our team is utilizing data from the Medical Information Mart for Intensive Care (MIMIC)
We are focusing on the MIMIC III dataset.
The dataset was pre-processed to isolate the most statistically significant features
Additional information about the MIMIC III dataset can be found online at the Massachusetts Institute of Technology's website: https://mimic.mit.edu/docs/iii/.
Methods - Kernel Function
Support Vector Machines operate on both linear and non-linear data by using a kernel function or “kernel trick” to manipulate non-linear data into a linear space for classification. The general formula for a kernel function is as follows, where \(X_i\) and \(X_j\) are tuples and \(\Phi\) is the mapping into the transformed space.
\[
K(X_i, X_j) = \Phi(X_i) \cdot \Phi(X_j)
\]
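For concreteness, three standard kernel functions that fit this form are the linear, polynomial, and radial basis function (RBF) kernels; the linear and RBF kernels are the two we use later.
\[
K_{\text{linear}}(X_i, X_j) = X_i \cdot X_j, \qquad
K_{\text{poly}}(X_i, X_j) = (\gamma\, X_i \cdot X_j + r)^d, \qquad
K_{\text{RBF}}(X_i, X_j) = e^{-\gamma \lVert X_i - X_j \rVert^2}
\]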
Methods - Hyperplane
Once the data is linearly separable, we can define our maximum marginal hyperplane. A general formula for the hyperplane is as follows \[
W \cdot X + b = 0
\]
This formula has two components of note: \(W\), a weight vector, and \(b\), a scalar bias.
Since \(W\) is a simple weight vector, it takes the form
\[
W = \{w_1, w_2, \dots, w_n \}
\]
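Combining the weight vector and the bias, a new tuple \(X\) is then classified by which side of the hyperplane it falls on, i.e. by the sign of the decision function:
\[
f(X) = \operatorname{sign}(W \cdot X + b)
\]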
Methods - Support Vectors
The tuples that lie closest to the margins of the maximum marginal hyperplane are the actual support vectors.
Using the formulas on the previous slide, for the classes \(y_i = 1\) and \(y_i = -1\), our hyperplane margins are defined as
\[
H_1: W_0 + W_{1}X_{1} + W_{2}X_{2} + \dots + W_{n}X_{n} \ge 1
\]
and
\[
H_2: W_0 + W_{1}X_{1} + W_{2}X_{2} + \dots + W_{n}X_{n} \le -1
\]
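The distance between \(H_1\) and \(H_2\) is the margin the SVM maximizes; from the two boundary conditions above it works out to
\[
\text{margin} = \frac{2}{\lVert W \rVert},
\]
so maximizing the margin is equivalent to minimizing \(\lVert W \rVert\).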
SVM Visualization
Modelling ICU Patient survival from the MIMIC dataset
With Josh Hollandsworth
Hello all, I'm Josh, and I'm going to walk you through our journey developing a model to predict ICU patient survival.
We started with 12 features that we thought would play an important role in predicting patient survival. We also wanted to see whether our full feature set was necessary, or whether we could build a reduced model that trained more quickly while producing similar outcomes. To facilitate this we built a challenger model.
Of course, the most important part of modelling was kernel selection and tuning, which we will jump into next.
Important Concepts
used the e1071 R package
leveraged the tune(svm, ...) function for tuning our model hyperparameters
However, before we go much further, let's get some basic terminology out of the way and outline some technological decisions we made.
First, we selected the e1071 R package to build, train, and test our model. There are alternatives out there, but e1071 appears to be the most popular.
Secondly, we didn't really know what the best hyperparameters, or even kernel, were, so we leveraged the tune method to try a range of tunings and see what produced the best accuracy.
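As a minimal sketch of what this looks like with e1071 (the data frames `train_data`/`test_data` and the response column `survived` are illustrative placeholders, not our actual variable names):

```r
library(e1071)

# Fit a basic SVM classifier; names below are placeholders for illustration
model <- svm(survived ~ ., data = train_data,
             type = "C-classification", kernel = "linear")

# Predict on held-out data and compare to the true labels
preds <- predict(model, newdata = test_data)
table(Predicted = preds, Actual = test_data$survived)
```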
Our initial primary model
was of type “C-Classification”
leveraged a linear kernel
only tuned the cost hyperparameter
We selected C-classification, which is the default, because it treats the response variable as a binary classifier. We also tested a linear kernel, because nothing jumped out at us during data exploration to suggest that our data would not be easily linearly separable. We tuned cost, which is the penalty for an incorrect prediction, by passing a list of cost values from 0.001 to 100, with each step being a factor of 10.
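A sketch of that cost sweep with tune(), reusing the placeholder names from before; the grid matches the 0.001 to 100 range described above:

```r
library(e1071)

# Candidate cost values: 0.001, 0.01, ..., 100 (each step a factor of 10)
cost_grid <- 10^seq(-3, 2, by = 1)

# tune() cross-validates an SVM for each cost value and keeps the best one
tuned <- tune(svm, survived ~ ., data = train_data,
              type = "C-classification", kernel = "linear",
              ranges = list(cost = cost_grid))

summary(tuned)                  # performance for each cost value
best_model <- tuned$best.model  # model refit with the winning cost
```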
Initial Model Analysis
On our ROC curve we can see that our model has roughly a 74% area under the curve (AUC). Ideally that number would be higher, but it was acceptable, since anything above the 50% diagonal line means our model is more accurate than flipping a coin.
Our confusion matrix, however, was a disaster. As you can see, despite having an acceptable accuracy, our model was very optimistic and predicted that everyone who entered the ICU would survive. While we would love for this to match reality, we know it's not true. Additionally, a model that assumes everyone survives is of little practical value.
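One way to produce the confusion matrix and ROC curve discussed here is sketched below; it assumes the pROC package (not part of e1071) and the same placeholder names as earlier:

```r
library(e1071)
library(pROC)  # assumed here for ROC/AUC; any ROC package would do

# Confusion matrix from the class predictions
preds <- predict(best_model, newdata = test_data)
table(Predicted = preds, Actual = test_data$survived)

# ROC curve / AUC from the SVM decision values
pred_dv <- predict(best_model, newdata = test_data, decision.values = TRUE)
dv      <- attr(pred_dv, "decision.values")[, 1]
roc_obj <- roc(response = test_data$survived, predictor = dv)
auc(roc_obj)
plot(roc_obj)
```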
What was wrong with our model
Class imbalance !!!
86% of the time, patients survived
14% of the time, patients died
Random sampling could exacerbate this problem
Our problem was that we had class imbalance. It stands to reason that many patients who enter the ICU will survive; they just need higher levels of monitoring and care. Of course, this is not always the case. The end result was that our data set was largely composed of patients who survived.
This problem could be exacerbated by random sampling, which could choose an even smaller percentage of the minority case (patient death).
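A quick way to check this balance, assuming the full data frame is called `icu_data` with a `survived` factor column (placeholder names):

```r
# Proportion of each outcome in the full data set (placeholder names)
prop.table(table(icu_data$survived))
# roughly: died 0.14, survived 0.86 -- matching the counts above
```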
Fixing the model
We attempted two strategies to fix the class imbalance:
Oversampling of minority case
Ensure that minority case was a larger percentage of the training set via selection with replacement
Downsampling majority case
Take a large percentage of the minority cases, then select an EQUIVALENT number of majority cases for a 50/50 split
First, we attempted to oversample our minority case by ensuring it made up a certain percentage of the training data set and allowing random sampling to occur with replacement (meaning we could select the same observation multiple times). This improved the model, but not significantly. Secondly, we downsampled the majority case by selecting 95% of the minority cases, then randomly selecting an EQUIVALENT number of majority cases for a 50/50 split. This proved better than oversampling, but still wasn't optimal.
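A sketch of the downsampling strategy, assuming an `icu_data` data frame whose `survived` factor has the levels "died" and "survived" (all names illustrative):

```r
set.seed(42)  # reproducible sampling

died_rows     <- icu_data[icu_data$survived == "died", ]
survived_rows <- icu_data[icu_data$survived == "survived", ]

# Take 95% of the minority (death) cases...
minority_train <- died_rows[sample(nrow(died_rows),
                                   size = floor(0.95 * nrow(died_rows))), ]

# ...then an EQUIVALENT number of majority (survival) cases for a 50/50 split
majority_train <- survived_rows[sample(nrow(survived_rows),
                                       size = nrow(minority_train)), ]

train_balanced <- rbind(minority_train, majority_train)
prop.table(table(train_balanced$survived))  # ~0.5 / 0.5
```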
Tuning the new model
tested using an RBF (radial basis function) kernel
switched to nu-classification
tuned for nu and gamma
using a grid search strategy and a few supporting packages
When we tuned, we decided to try a few other options of the e1071 package. First, we switched to a radial basis function kernel. Secondly, we switched to nu-classification, which according to the documentation is the same as C-classification except that it constrains the nu value between 0 and 1; nu itself is related to the ratio of support vectors and the ratio of training error. We now had to tune the nu and gamma hyperparameters. We had no clue how best to do this, and since our model took a large hit in accuracy from downsampling, we wanted to try a range of settings and select the best. To do this we leveraged a grid search strategy, which is effectively a brute-force attempt at finding hyperparameters.
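A sketch of that grid search with tune(), using the balanced training set from the previous step; the nu and gamma grids shown are illustrative, not our exact values:

```r
library(e1071)

# Cross-validated grid search over nu and gamma with an RBF kernel
tuned_nu <- tune(svm, survived ~ ., data = train_balanced,
                 type = "nu-classification", kernel = "radial",
                 ranges = list(nu    = seq(0.1, 0.7, by = 0.1),
                               gamma = 10^seq(-3, 1, by = 1)))

summary(tuned_nu)  # performance for every (nu, gamma) pair
```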
Tuning the new model…more problems
tune(svm, ...) is VERY VERY slow
does not leverage multiple CPU cores
operates sequentially
doParallel and foreach to the rescue
still slow, but much quicker than before
Given we had two hyperparameters to tune and wanted the best accuracy, we attempted to use grid search with tune(svm, ...). I set this up on my machine, walked away, came back a few hours later, and nothing had completed; eventually my machine crashed. We then discovered doParallel and foreach, which let us pass the parameter combinations in, build and validate the models in parallel, aggregate the results, and pick the best parameters.
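A sketch of the parallel grid search with doParallel and foreach; the grid, the `valid_data` hold-out set, and the accuracy scoring are simplified placeholders rather than our exact code:

```r
library(e1071)
library(doParallel)
library(foreach)

cl <- makeCluster(parallel::detectCores() - 1)
registerDoParallel(cl)

# Every (nu, gamma) combination to try (illustrative grid)
grid <- expand.grid(nu = seq(0.1, 0.7, by = 0.1), gamma = 10^seq(-3, 1, by = 1))

# Fit and score one model per combination, in parallel
results <- foreach(i = seq_len(nrow(grid)), .combine = rbind,
                   .packages = "e1071") %dopar% {
  fit <- svm(survived ~ ., data = train_balanced,
             type = "nu-classification", kernel = "radial",
             nu = grid$nu[i], gamma = grid$gamma[i])
  acc <- mean(predict(fit, valid_data) == valid_data$survived)
  data.frame(nu = grid$nu[i], gamma = grid$gamma[i], accuracy = acc)
}

stopCluster(cl)
results[which.max(results$accuracy), ]  # best hyperparameter pair
```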
Tuning the new model and paying more for it
This is a fun graph showing the outcomes, run on an 8-core 10th-gen Intel CPU. Training took around 45 minutes for nu-classification and 20 minutes for C-classification.
My CPU ran near 96 °C at 4.8 GHz for the duration of the test.
Tuning Results
nu-classification won out
Primary Model
Challenger model
Model Results (Challenger)
Results and Conclusions
With Brad Lipson
Data
| Characteristic | Died, N = 741 | Survived, N = 3,818 | Overall, N = 4,559 |
|---|---|---|---|
| Patient Age, Median (IQR) | 73 (60, 83) | 65 (53, 78) | 67 (54, 80) |
| Patient Age, Range | 17, 91 | 17, 91 | 17, 91 |
| Patient Sex, female | 339 (46%) | 1,639 (43%) | 1,978 (43%) |
| Patient Sex, male | 402 (54%) | 2,179 (57%) | 2,581 (57%) |
| Heart Rate, Median (IQR) | 92 (77, 106) | 87 (76, 99) | 88 (76, 100) |
| Heart Rate, Range | 47, 155 | 36, 139 | 36, 155 |
| Systolic Blood Pressure, Median (IQR) | 108 (100, 120) | 115 (106, 126) | 114 (105, 126) |
| Systolic Blood Pressure, Range | 70, 175 | 76, 195 | 70, 195 |
| Systolic Blood Pressure, Unknown | 2 | 6 | 8 |
Data Continued
| Characteristic | Died, N = 741 | Survived, N = 3,818 | Overall, N = 4,559 |
|---|---|---|---|
| Respiration Rate, Median (IQR) | 21.5 (18.4, 24.9) | 18.9 (16.6, 21.9) | 19.3 (16.8, 22.4) |
| Respiration Rate, Range | 11.3, 40.6 | 9.5, 40.4 | 9.5, 40.6 |
| Respiration Rate, Unknown | 0 | 1 | 1 |
| Body Temperature (°C), Median (IQR) | 36.62 (36.11, 37.19) | 36.87 (36.47, 37.32) | 36.82 (36.41, 37.31) |
| Body Temperature (°C), Range | 31.60, 39.71 | 32.61, 40.10 | 31.60, 40.10 |
| Body Temperature (°C), Unknown | 20 | 83 | 103 |
| White Blood Cell Count, Median (IQR) | 13 (9, 18) | 12 (8, 15) | 12 (8, 16) |
| White Blood Cell Count, Range | 0, 404 | 0, 207 | 0, 404 |
Data Continued again
| Characteristic | Died, N = 741 | Survived, N = 3,818 | Overall, N = 4,559 |
|---|---|---|---|
| Platelet Count, Median (IQR) | 166 (95, 253) | 180 (126, 245) | 178 (122, 246) |
| Platelet Count, Range | 8, 951 | 5, 1,297 | 5, 1,297 |
| Platelet Count, Unknown | 1 | 5 | 6 |
| Creatinine Level, Median (IQR) | 1.60 (1.00, 2.60) | 1.10 (0.80, 1.70) | 1.20 (0.90, 1.90) |
| Creatinine Level, Range | 0.20, 14.40 | 0.10, 27.80 | 0.10, 27.80 |
| Creatinine Level, Unknown | 1 | 1 | 2 |
| Lactate Level, Median (IQR) | 2.55 (1.70, 4.50) | 1.80 (1.30, 2.55) | 1.90 (1.35, 2.75) |
| Lactate Level, Range | 0.40, 20.85 | 0.30, 16.80 | 0.30, 20.85 |
Results
The findings indicate that our support vector machine (SVM) model had a test set accuracy of 74.74%.
The model exhibited a sensitivity rate of 74.87% and a specificity rate of 63.89%
The study yielded a positive predictive value of 99.41% and a negative predictive value of 3%.
The area under the receiver operating characteristic (ROC) curve was determined to be 0.7711
The F1 score was determined to be 0.8541.
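As a consistency check, the F1 score follows directly from the reported positive predictive value (precision) and sensitivity (recall):
\[
F_1 = \frac{2 \cdot \text{PPV} \cdot \text{Sensitivity}}{\text{PPV} + \text{Sensitivity}}
    = \frac{2 \times 0.9941 \times 0.7487}{0.9941 + 0.7487} \approx 0.854
\]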
Model Conclusions
Our support vector machine (SVM) model demonstrated moderate accuracy in predicting hospital mortality.
The predictive accuracy of the model was higher for patients who survived compared to those who did not.
The model can tell the difference between patients who will die and those who will not.
Considerations for improvement
Potential to serve as a valuable tool for forecasting patient mortality in hospital settings.
Additional investigation is required to substantiate these results among a broader and more heterogeneous sample.
The investigation was constrained by being carried out on a limited cohort of individuals.
Data came from just one hospital; future work should include data from institutions globally.
Considerations for improvement (Cont’d)
Should improve diversity in future studies
Limited diversity may reduce the applicability of the findings to other healthcare institutions.
The study did not account for other potential confounders among the ICU patients.
Future Studies
Examine the outcomes of this study in a larger, more heterogeneous sample.
Further investigation is warranted to explore the application of SVMs in predicting additional clinical outcomes
Should study duration of hospitalization and rates of patient readmission
Predict length of stay in hospital
Predict length of time patients may live with certain conditions, depending on severity
Thank you! Any Questions?