Using EEG Signals & ML to Identify Genetic Predisposition to Alcoholism

Abstract

This project implements and analyzes several machine learning algorithms applied to electroencephalography (EEG) data. The data was collected by the Neurodynamics Laboratory at the State University of New York Health Center at Brooklyn and is hosted by the University of California, Irvine. We explored several algorithms, including neural networks, random forests, and support vector machines, to classify subjects as alcoholic or control from their EEG recordings.

Data

The dataset comprises measurements taken from 64 electrodes placed on the scalps of subjects, sampled at 256 Hz (one sample roughly every 3.9 milliseconds) for a duration of 1 second. The subjects were divided into two groups: alcoholic and control. Each subject was exposed to either a single stimulus (labeled S1) or two stimuli (S1 and S2), which were images selected from the 1980 Snodgrass and Vanderwart picture set. When two stimuli were shown, they were presented in one of two conditions: a matched condition, where S1 was identical to S2, or a non-matched condition, where S1 differed from S2. Each subject provided between 50 and 120 readings, and each reading was treated as an individual sample for the analysis.
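
As a quick check on the figures above, the sketch below works out the size of a single reading from the sampling rate, duration, and electrode count. The constant names are illustrative and do not come from the dataset files.

```python
# Figures taken from the dataset description above.
N_CHANNELS = 64          # scalp electrodes
SAMPLING_RATE_HZ = 256   # samples per second, per electrode
DURATION_S = 1           # length of one reading in seconds

sample_period_ms = 1000 / SAMPLING_RATE_HZ            # time between samples
samples_per_channel = SAMPLING_RATE_HZ * DURATION_S   # time points per electrode
values_per_reading = N_CHANNELS * samples_per_channel

print(f"sample period: {sample_period_ms:.2f} ms")
print(f"one reading: {N_CHANNELS} x {samples_per_channel} = {values_per_reading} values")
```

Each reading can therefore be viewed as a 64 × 256 matrix with one row per electrode and one column per time point.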

Process

Most classifiers shared the same preprocessing of the EEG data. The dataset was first split into training (70%) and testing (30%) sets, and both sets were then normalized. For the neural network models, preprocessing additionally rescaled the data to the range 0 to 255, resized each sample to a 224×224 matrix, and replicated each value across three channels so that each sample could be treated as an image.
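
The preprocessing steps can be sketched roughly as follows. This is a minimal illustration using NumPy on synthetic data, not the code used for the reported results. The helper names (`split_train_test`, `scale_to_0_255`, `to_image`), the nearest-neighbour resizing, and the choice to take the scaling range from the training set only are all assumptions, since the text does not specify them.

```python
import numpy as np

def split_train_test(X, y, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(round(train_fraction * len(X)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def scale_to_0_255(X, lo, hi):
    """Linearly map the range [lo, hi] onto [0, 255], clipping values outside it."""
    scaled = (X - lo) / (hi - lo) * 255.0
    return np.clip(scaled, 0.0, 255.0)

def to_image(reading, size=224):
    """Nearest-neighbour resize of a 2-D reading to size x size, copied into 3 channels."""
    rows = np.arange(size) * reading.shape[0] // size
    cols = np.arange(size) * reading.shape[1] // size
    resized = reading[rows][:, cols]
    return np.repeat(resized[:, :, np.newaxis], 3, axis=2)

# Synthetic stand-in data: 20 readings of 64 electrodes x 256 time points.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 64, 256))
y = rng.integers(0, 2, size=20)

X_train, X_test, y_train, y_test = split_train_test(X, y)

# Assumption: the scaling range is estimated on the training set only,
# then reused for the testing set.
lo, hi = X_train.min(), X_train.max()
train_images = np.stack([to_image(r) for r in scale_to_0_255(X_train, lo, hi)])
test_images = np.stack([to_image(r) for r in scale_to_0_255(X_test, lo, hi)])

print(train_images.shape, test_images.shape)
```

Reusing the training range for the testing set keeps information about the test samples out of the preprocessing, at the cost of clipping any test values that fall outside that range.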

After preprocessing, we fit each model on the training data, then used it to predict labels for the testing data and measured its accuracy.
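
The fit, predict, and score loop can be illustrated with a small stand-in. The sketch below uses a closed-form ridge regression classifier written in NumPy (ridge regression is one of the models in the results) on synthetic two-class data. The function names, the regularization strength `alpha`, the standardization step, and the data itself are assumptions made for illustration, and the printed accuracy says nothing about the EEG results.

```python
import numpy as np

def fit_ridge_classifier(X, y, alpha=1.0):
    """Closed-form ridge regression on labels recoded as -1/+1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append a bias column
    targets = np.where(y == 1, 1.0, -1.0)
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0                         # leave the bias unpenalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ targets)

def predict(weights, X):
    """Predict class 1 where the regression output is positive, else class 0."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ weights > 0).astype(int)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return float(np.mean(y_true == y_pred))

# Synthetic two-class data standing in for flattened EEG readings.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 20
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_features)) + (y[:, None] - 0.5)

# 70% / 30% split, as described in the text.
n_train = int(round(0.7 * n_samples))
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Assumption: standardize both sets with statistics from the training set.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std

weights = fit_ridge_classifier(X_train, y_train)
y_pred = predict(weights, X_test)
test_accuracy = accuracy(y_test, y_pred)
print(f"test accuracy: {test_accuracy:.2f}")
```

Accuracy here is the fraction of test samples whose predicted label matches the true label, which is the quantity reported in the results.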

Results

Model                             Accuracy
Neural Net                        75%
Convolutional Neural Net (CNN)    75%
Ridge Regression                  75%
Stochastic Gradient Descent       73%
Perceptron                        73%
Inception v3                      71%
Passive-Aggressive                71%
Random Forest                     70%
VGG16                             68%
K-nearest Neighbors               63%
Support Vector Machine (SVM)      62%

Conclusion

The results suggest that EEG data can help identify an individual’s susceptibility to alcoholism using fairly straightforward classifiers, with the best models reaching 75% accuracy on the testing data. Accuracy could likely be improved by identifying the most informative metrics and by combining models through hybrid approaches such as ensemble learning.
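
One simple form of ensemble learning is majority voting over the predictions of several classifiers. The sketch below is a hypothetical illustration only: the three prediction lists are made up, and no ensemble was evaluated in this project.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by majority vote."""
    combined = []
    for sample_votes in zip(*predictions_per_model):
        label, _ = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical predictions from three classifiers on five test samples
# (1 = alcoholic, 0 = control).
model_a = [1, 0, 1, 1, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Using an odd number of models avoids ties for a two-class problem. With an even number, a tie-breaking rule would need to be chosen explicitly.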