Year 1 Medic PRE SSC Practical

2024-02-13

Part 1: Overview

Learning Objectives

  • Learn about the steps involved in research.
  • Learn about some statistical techniques (regression and meta analysis).
  • Get experience with data visualization and data analysis in JASP.
  • Interpret and understand our results.
  • Understand what you’ll be assessed on.

Introduction

There is a lot of variability in drug doses. This is partly because different drugs have different potencies, and partly because each person has a different optimal dose based on their unique physiology and symptoms. The plots below show, first, daily antipsychotic dose with all drugs mixed in together and, second, daily doses converted to chlorpromazine equivalents.

In the first plot, you can see that the distribution is right-skewed (positively skewed): lots of people are on low doses, which suggests a high drug potency. At the other end you have a tail with a few people taking really big doses. This suggests that those people are on less potent drugs. In other words, they need more of the drug for the same effect.

Antipsychotic Dose variability in Cardiff COGS


In the second plot, we have our chlorpromazine-equivalent daily doses. There is still a lot of variability here, but in this instance we know it’s likely not due to differences in the type of drug taken and has more to do with the person taking the drug. There is a huge range in the amount of drug people are taking, so it’s quite evident that what works for one person won’t necessarily work for someone else. This is where personalised medicine could come into play.

Chlorpromazine-equivalent Antipsychotic Dose variability in Cardiff COGS



This PRE

In our practical, we are going to be looking at what factors predict this variability. Everyone will have a different data set with information on their daily dose, pharmacogenomics-inferred enzyme activity, and some demographic variables.

Each data set contains 100 different people, with information on all of the variables shown in the example table below. The analysis will be performed in two stages:

  1. Regression analysis exploring the predictors of daily antipsychotic dose. This will allow us to see whether any of these variables are associated with variation in antipsychotic dose, or whether it is down to chance/other non-included variables.
  2. Meta analysis of pharmacogenomic variables and daily antipsychotic dose. This will allow us to combine the results from all of our separate analyses to get a better idea of the impact of pharmacogenomic variation on antipsychotic dose.


The table below shows data from the first 6 people in my data set, just as an example of what this might look like.
ID | age | sex    | weight | daily_dose | CYP1A2_MP               | CYP1A2_AS | CYP2D6_MP          | CYP2D6_AS | CYP3A4_MP
1  | 53  | Female | 62     | 300        | Normal Metaboliser      | 1         | Rapid Metaboliser  | 2.5       | Normal Metaboliser
2  | 55  | Female | 56     | 400        | Ultra-rapid Metaboliser | 3         | Rapid Metaboliser  | 2.5       | Normal Metaboliser
3  | 65  | Female | 72     | 425        | Ultra-rapid Metaboliser | 3         | Normal Metaboliser | 1.0       | Normal Metaboliser
4  | 35  | Male   | 94     | 100        | Rapid Metaboliser       | 2         | Normal Metaboliser | 1.5       | Normal Metaboliser
5  | 45  | Female | 62     | 450        | Ultra-rapid Metaboliser | 3         | Normal Metaboliser | 1.5       | Normal Metaboliser
6  | 61  | Male   | 67     | 200        | Normal Metaboliser      | 1         | Rapid Metaboliser  | 2.5       | Normal Metaboliser

The statistical analysis in the practical session will be conducted in JASP, which is freely available here (https://jasp-stats.org/download/), with tutorials here (https://jasp-stats.org/how-to-use-jasp/).



Part 2: The research process

Overall, we are going to be breaking down our practical session into 5 main steps:

  1. Identifying our research question / hypothesis
  2. Collecting data: our data is simulated
  3. Descriptive Statistics
  4. Data Visualisation
  5. Inferential Statistics: this is our linear regression + meta analyses



1. Research Question / Hypothesis

  • A hypothesis is a testable statement: it states a potential relationship between variables. For example, “pharmacogenomic variation is associated with antipsychotic dose in people with schizophrenia”.

    • This may be divided up into an ‘alternative’ or ‘null’ hypothesis. The alternative hypothesis (also called H1) is a statement of difference, such as the example given above. In contrast, the null hypothesis (or H0) is a statement of no difference, for example, “pharmacogenomic variation is not associated with antipsychotic dose in people with schizophrenia”.
    • This distinction is very important for inferential statistics, as we are using these statistical tests to see whether the data is consistent with the null hypothesis or not:
      • If our data is consistent with the null hypothesis, we fail to reject it (often loosely described as ‘accepting’ it) - this means that our data does not provide evidence that any effect or association is present.
      • If our data is not consistent with the null hypothesis, we reject the null hypothesis - this means our data suggests that there is an effect or association present, and that the variability we are seeing is likely not down to chance variation.
    • Finally, hypotheses may be directional or non-directional. As the name suggests, directional hypotheses give a predicted direction of effect (e.g., “faster enzyme activity is associated with increased antipsychotic dose in people with schizophrenia”). For non-directional hypotheses, just saying that you predict a difference is enough.
  • In contrast to this, a research question is less precise, and doesn’t make any claims. For example, “what is the impact of pharmacogenomic variation on antipsychotic dose in people with schizophrenia?”



2. Data Collection

The data is simulated because it’s easier from an ethical perspective not to use ‘real’ data. However, the variables that are available for you to analyse are similar to what you’d see in a real-world data set, and you’ll be analysing everything in the same way too.

The data can be downloaded from this page (put a link to Shiny App in here). Everyone will download a different data set, which is linked to their student IDs.



3. Descriptive Statistics

Descriptive statistics help you to get a general overview, or a summary of the data. This can be through looking at measures of central tendency (e.g., mean, median, mode), or dispersion (e.g., standard deviation, standard error, range). Additionally, you may use plots to explore the distribution of your data. Plotting your data can help you to see its spread, the variability, help you spot outliers, see the direction of effects, and they can even be used to test assumptions and select statistical models.

These are some of the descriptive statistics we’ll be looking at today. This may be because they are useful in their own right, or because they can be used to calculate our inferential statistics:

Central Tendency

  • Mean: a number that is representative or a typical value of the data (‘average’). To get it, we add everything up and divide by the number of values. Unfortunately, it can be affected by outliers/skewed data sets.
  • Median: the centre of the data. We find this by ordering all of our data and then finding the mid-point. This works best for ordinal level data, or for skewed data sets.
  • Mode: this is the most frequently seen score in the data set. It is most often used for categorical data, and is not so helpful for continuous variables, particularly when they’re precise.

Dispersion

  • Standard Deviation: a measure of dispersion around the mean, so you can get an idea of the spread of the data.
    • Standard deviation is related to variance, with the difference being that standard deviation is the square root of the variance. Taking the square root of the variance puts everything back into the original units that the data is in, making things easier to understand.
  • Range: the difference between the highest value and the lowest value in a data set.
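For anyone curious about what these summaries involve outside of JASP, here is a minimal sketch in Python using only the built-in `statistics` module. It uses the daily doses and sex of the six example participants from the table in Part 1. Six people is far too few for a real analysis, so treat the numbers as purely illustrative; the variable names are made up for this example.

```python
import statistics

# Values for the six example participants from the table in Part 1.
daily_dose = [300, 400, 425, 100, 450, 200]
sex = ["Female", "Female", "Female", "Male", "Female", "Male"]

# Central tendency
mean_dose = statistics.mean(daily_dose)      # add everything up, divide by the number of values
median_dose = statistics.median(daily_dose)  # mid-point of the ordered values
modal_sex = statistics.mode(sex)             # most frequent category

# Dispersion
variance = statistics.variance(daily_dose)   # sample variance (divides by n - 1)
sd = statistics.stdev(daily_dose)            # square root of the sample variance
dose_range = max(daily_dose) - min(daily_dose)

print(f"mean={mean_dose}, median={median_dose}, mode(sex)={modal_sex}")
print(f"variance={variance:.1f}, sd={sd:.1f}, range={dose_range}")
```

Note that the mode is calculated on the categorical variable (sex) rather than on dose, because every dose value in this small example is different.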


This is how you can run descriptive statistics in JASP:

Calculating descriptive statistics in JASP




4. Data Visualisation

This is how you can create plots in JASP. These might be to explore the distribution of different variables:

Plotting in JASP



Or to look at the relationships between different variables:

Plotting in JASP 2




5. Inferential Statistics

Inferential statistics can be used to test our hypothesis – specifically whether the data is consistent with the null hypothesis.

  • Hypotheses Recap: A hypothesis is a testable statement, which can be directional or non-directional.
  • H0 (Null) = statement of no difference (i.e., drug dose does not vary as a function of pharmacogenomic variables)
  • H1 (Alternative) = statement of difference (i.e., drug dose does vary as a function of pharmacogenomic variables).
  • If the data is consistent with the null, we fail to reject the null hypothesis, H0 (often loosely described as ‘accepting’ it).
  • If not, and the data we have is unlikely to have occurred by chance under the conditions of the null hypothesis, then we reject the null hypothesis. This leads us to favour the alternative hypothesis.


To do inferential testing we need to pick a test and an alpha value:

  • The alpha value is an arbitrarily defined probability that we use to determine whether results are significant or not.
    • In a lot of research, the p < 0.05 threshold is used.
    • This means that if our test statistic has a p-value below 0.05, we reject the null hypothesis. However, if it is above 0.05, then we fail to reject it.
  • The p-value is the probability of seeing a result at least as extreme as the one observed if the null hypothesis is true.
    • As your p-value gets smaller and smaller, there is increasing evidence against the null hypothesis. We call results below the alpha threshold ‘significant’.
    • However, as we mentioned these p-value thresholds are defined by us, so really it’s more useful to look at the size of the p-value as opposed to whether it crosses a given threshold.
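To make the link between an estimate, its standard error, the p-value and alpha more concrete, here is a rough sketch in Python. It assumes a normal approximation (a z-test); regression software typically uses a t-distribution instead, so the p-values it reports will differ slightly. The function name and the example numbers are hypothetical.

```python
import math

def two_sided_p(estimate, standard_error):
    """Two-sided p-value for H0: true effect = 0, using a normal approximation."""
    z = estimate / standard_error
    # erfc(|z| / sqrt(2)) is the probability of a standard normal value
    # at least as far from zero as z, in either direction.
    return math.erfc(abs(z) / math.sqrt(2))

ALPHA = 0.05  # the conventional (arbitrary) threshold

# Hypothetical results: the same estimate measured with two different precisions.
for estimate, se in [(50.0, 20.0), (50.0, 30.0)]:
    p = two_sided_p(estimate, se)
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"estimate={estimate}, SE={se}, p={p:.3f} -> {decision}")
```

The two hypothetical results have the same estimate but land on opposite sides of the threshold purely because of their precision, which is one reason to look at the size of the p-value and the effect rather than only at whether p < 0.05.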


  • Effect size = quantification of the size of an effect.
    • Significance ≠ Big Effect
  • GWAS often find many significant genetic loci that have small effect sizes.
    • For example, there are many genetic loci significantly associated with schizophrenia at genome-wide level (a stricter p-value cut-off than p = 0.05) but each locus alone has a very small effect.
  • Effect sizes can be standardised (e.g., Cohen’s d) or unstandardised (e.g., regression coefficient).

“The effect size is the main finding of a quantitative study. While a P value can inform the reader whether an effect exists, the P value will not reveal the size of the effect.” Sullivan and Feinn (2012).
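As an illustration of the difference between an unstandardised and a standardised effect size, the sketch below compares daily dose between the female and male participants in the six-person example table from Part 1. It reports the raw mean difference (in original dose units) and Cohen’s d (in pooled standard deviation units). The groups are tiny, so this only shows the arithmetic and is not a meaningful result; the function name is made up for this example.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

# Daily doses from the six example participants, split by sex.
female_dose = [300, 400, 425, 450]
male_dose = [100, 200]

raw_difference = statistics.mean(female_dose) - statistics.mean(male_dose)  # unstandardised
d = cohens_d(female_dose, male_dose)                                        # standardised

print(f"raw mean difference = {raw_difference}")
print(f"Cohen's d = {d:.2f}")
```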


Regression / Linear Model

This is a model, or an equation, that describes the strength and direction of associations between an outcome variable (also called a dependent variable) and any included predictors or covariates. In this way, we can see precisely how changing a predictor X affects our outcome Y.

The model can also be used for prediction, whereby we’d use the Y intercept (a) and the slope (b) to predict Y for a given value of X, i.e., Y = a + bX.

Components of a Regression Analysis


The random error term, also referred to as a residual component, is the difference between the actual value of Y and the value predicted by the regression model. It captures everything not included in our model that also affects our outcome; in other words, all the variation that’s left over once we’ve included our predictors in the model. These residuals are seen in the graph here as the distance between each data point and the regression line. Residuals are a particularly important concept with respect to the assumptions that underlie regression analysis (more information on this can be found in the reference section).
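The intercept, slope and residuals described above can be sketched in a few lines of Python. This is a minimal least-squares fit with a single predictor, using CYP1A2 activity score and daily dose from the six example participants in Part 1. The model you fit in JASP will include several predictors at once, so this is a simplified illustration rather than the analysis for the practical; `fit_line` is a name made up for this sketch.

```python
import statistics

# Predictor (X) and outcome (Y) for the six example participants.
cyp1a2_as = [1, 3, 3, 2, 3, 1]
daily_dose = [300, 400, 425, 100, 450, 200]

def fit_line(x, y):
    """Least-squares intercept (a) and slope (b) for the line Y = a + bX."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope: change in Y per one-unit change in X
    a = y_bar - b * x_bar    # intercept: predicted Y when X = 0
    return a, b

a, b = fit_line(cyp1a2_as, daily_dose)

# Prediction and residuals (actual Y minus predicted Y).
predicted = [a + b * xi for xi in cyp1a2_as]
residuals = [yi - pi for yi, pi in zip(daily_dose, predicted)]

print(f"intercept a = {a:.1f}, slope b = {b:.1f}")
print("residuals:", [round(r, 1) for r in residuals])
```

With a least-squares fit that includes an intercept, the residuals always sum to (approximately) zero, which is a quick sanity check on the calculation.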

In JASP, a linear model can be performed by following these steps:

Performing a Linear Regression in JASP



Meta Analysis

Meta analyses are a way of using statistics to combine results from many studies into an overall estimate of effect. This statistical method is preferred to pooling the raw data from many studies into a single data set, which can be problematic for a few reasons:

  • Accessing data can be tricky. First and foremost, because of ethics - people might not have ethical approval to share the data with you. Secondly because it can be difficult to get in contact with researchers, or they may not want to share their data with you.
  • There can also be problems with measurement. People might have measured their variables in different ways, which would make it difficult to combine data. Additionally, you also wouldn’t be able to compare the respective effect statistics directly.


Meta analysis solves these problems in two ways.

  1. Instead of using raw data, all you need to do a meta-analysis is an effect size and measure of standard error for each study. These should be reported in published articles or pre-prints so it’s a lot easier to get started with as you just need to start combing through the literature.
  2. Effect sizes used in meta-analyses are standardised. The idea behind standardisation is that, across the whole data set, the mean becomes 0 and the standard deviation becomes 1: for each value, you take away the mean of the data set and then divide by the standard deviation. Standardisation may be applied to the raw variables prior to data analysis, but also to effect sizes derived from inferential statistics. In practice, this means that results are expressed in standard deviation units as opposed to the original units, which allows for easy comparison. We don’t need to do this for our practical today because we are all using the same units, but it’s something to bear in mind for the future! If you do end up doing this, you don’t need to standardise by hand; you can just get a computer to do it for you, so it’s really no effort at all.
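As a small illustration of standardising a raw variable, the sketch below converts the six example daily doses from Part 1 into z-scores in Python. It assumes the sample standard deviation (dividing by n - 1) is used; `standardise` is a hypothetical helper name.

```python
import statistics

def standardise(values):
    """Convert values to z-scores: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in values]

daily_dose = [300, 400, 425, 100, 450, 200]
z_scores = standardise(daily_dose)

print([round(z, 2) for z in z_scores])
# After standardising, the mean is (approximately) 0 and the SD is (approximately) 1.
print(round(statistics.mean(z_scores), 10), round(statistics.stdev(z_scores), 10))
```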

Meta analyses in medicine are a fantastic way to evaluate the strength of evidence available on a disease or treatment. You’re working with a much bigger sample size because you’re combining so many studies together – this should give you much more statistical power to detect effects (if present). The ultimate goal of a meta analysis is to summarise existing knowledge from all suitable previous studies, to get the most accurate estimate for your effect of interest.



Meta Analysis In Depth

A meta-analysis is a pooled estimate of the true effect. It is pooled because it is aggregating observed effects from a number of different studies; it weights these effects, and then averages them out.

An important idea to understand is the distinction between the observed effect and the true effect. The true effect is the actual relationship between variables in a population. However, when this is measured or assessed, there’s usually some form of error introduced, which is why the observed effect might vary from study to study and will probably not fall exactly on the ‘true effect’. Some error might be in the form of sampling variation; other error might come about because the studies are not the same, for example because they include different variables or measure variables in different ways.


One key thing to consider when performing your meta-analysis is what kind of model you’re going to go for. There are two kinds that we need to be aware of here, fixed and random effect models.

  • Fixed effect models are concerned with estimating the ‘true’ effect size for the studies that are included in your analysis, instead of talking about what the true effect size might be more generally. They assume that the true effect size is the same across all included studies, and thus that all factors that could possibly affect effect size are the same across all included studies. In other words, the only variation in the observed effects should arise from sampling error (within-study variance) as opposed to anything else.
  • The way that observed effects are weighted in fixed-effect models is by the inverse of each study’s variance. Studies with more people should have less variance, which results in them being weighted more heavily during the pooling of effect sizes than smaller studies, which tend to have higher variance.

In fixed-effect models, only the true effect, and sampling variation should be influencing your observed effect.

As you might guess, this is quite unlikely to happen in practice. However, for the conditions that we have, where everyone has a simulated data set arising from the same code, a fixed-effect model is more than fine because the only error will be sampling error. These are artificial circumstances, though, and you will rarely have such similar studies to work with, so people will usually opt for a random-effect model.
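The inverse-variance weighting used in a fixed-effect model can be sketched as follows. Each ‘study’ contributes an effect estimate and a standard error; its weight is 1 divided by the squared standard error. The three studies below are hypothetical numbers chosen to keep the arithmetic easy, and `fixed_effect` is a name made up for this example.

```python
import math

def fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]  # weight = 1 / variance
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se, weights

# Hypothetical regression coefficients and standard errors from three 'studies'.
estimates = [90.0, 100.0, 110.0]
standard_errors = [10.0, 20.0, 10.0]

pooled, pooled_se, weights = fixed_effect(estimates, standard_errors)

# Approximate 95% confidence interval for the pooled estimate.
ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se

print(f"pooled = {pooled:.1f}, SE = {pooled_se:.2f}, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

The second study has twice the standard error of the others, so it receives a quarter of their weight, and the pooled standard error is smaller than that of any single study.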

  • Random effect models aim to make inferences about a larger set of studies that we haven’t necessarily included in our analysis. In this way, we can say that the effect is more generalisable. This model assumes that studies can vary for a number of reasons, and thus that there can be study-specific effects in addition to the random error and ‘true effect’ that we look at in the fixed effect model.
  • In other words, you have two sources of error: the within-study error coming from sampling error, and the between-study error coming from differences across the studies. While larger studies are still weighted more heavily than smaller ones, the difference is not quite so great this time around, so it’s more balanced than a fixed-effect model.
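To show how the between-study variance changes the weights, here is a sketch of a random-effects meta-analysis in Python. It uses the DerSimonian-Laird estimator of the between-study variance (often written as tau squared), which is one common choice; JASP may use a different estimator, so its numbers need not match this sketch. The three studies are hypothetical and deliberately spread out, and `random_effects_dl` is a name made up for this example.

```python
import math

def random_effects_dl(estimates, standard_errors):
    """Random-effects pooling with the DerSimonian-Laird between-study variance."""
    k = len(estimates)
    w = [1 / se ** 2 for se in standard_errors]  # fixed-effect weights
    pooled_fe = sum(wi * est for wi, est in zip(w, estimates)) / sum(w)

    # Q measures how much the studies disagree beyond what sampling error predicts.
    q = sum(wi * (est - pooled_fe) ** 2 for wi, est in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at 0

    # Random-effects weights add tau2 to each study's own variance.
    w_star = [1 / (se ** 2 + tau2) for se in standard_errors]
    pooled_re = sum(wi * est for wi, est in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1 / sum(w_star))
    return pooled_re, pooled_se, tau2, w, w_star

# Hypothetical 'studies' whose estimates differ more than sampling error alone would suggest.
estimates = [60.0, 100.0, 140.0]
standard_errors = [10.0, 20.0, 10.0]

pooled_re, pooled_se, tau2, w, w_star = random_effects_dl(estimates, standard_errors)

print(f"tau^2 = {tau2:.1f}, pooled = {pooled_re:.1f}, SE = {pooled_se:.1f}")
print(f"fixed-effect weight ratio (study 1 / study 2): {w[0] / w[1]:.2f}")
print(f"random-effects weight ratio (study 1 / study 2): {w_star[0] / w_star[1]:.2f}")
```

Because the same tau squared is added to every study’s variance, the weights become more similar to each other than in the fixed-effect case, and the pooled standard error becomes larger.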

Comparison of Fixed- and Random-Effect Meta Analyses. Table adapted from: https://slideplayer.com/slide/13808226/


In JASP, a meta-analysis can be performed by following these steps:

Performing a Meta Analysis in JASP



Part 3: Analysis and Interpretation of our Data


Download your data using your student ID from here.


Download your dataset by selecting SSC from the drop down menu and putting in your student number.



Import data into JASP.


Loading data into JASP



Explore your data with descriptive statistics.


Medication           | PGx          | Control
Daily Clozapine Dose | CYP1A2 MP/AS | Age
                     | CYP2D6 MP/AS | Sex
                     | CYP3A4 MP/AS | Weight


Have a go at thinking about the following questions:

  1. How many people are in your sample?
  2. What is the age range and sex split?
  3. What is our outcome variable?
  4. Which variables are categorical and which are continuous?
  5. What is the mean daily dose of antipsychotic in your sample?
  6. How would you describe the distribution of the outcome variable?
  7. What is the most common and least common metabolism phenotype for each enzyme?


Run a linear regression model.


Fitting a linear model



Have a go at thinking about the following questions:

  1. Are any predictor variables significantly associated with the outcome variable?
  2. What is the likelihood that this is due to chance?
  3. How much variance in the outcome variable is explained by the included predictor variables?
  4. How much variance in the outcome variable is explained after taking into account the number of variables included in the model?
  5. Which predictor variables have the largest positive effect, and the largest negative effect on the outcome variable?
  6. Do these results make sense with respect to your knowledge of biology?


Add your effect sizes / standard errors to the summary file on the Microsoft Teams Page.


Adding your results to the combined results table.



Run a fixed-effect meta-analysis.


Have a go at thinking about the following questions:

  1. Based on the output, do you think we have used the correct type of meta-analysis (e.g., fixed- vs random-effects)?
  2. Based on all of our analyses, is there evidence that pharmacogenomic variables influence daily antipsychotic dose? If so, which variables appear to be associated with antipsychotic dose, and in what direction?
  3. Is there any evidence of bias among the included ‘studies’?



Part 4: Assessment

This is some key information relating to the assessment component for the PRE part of your SSC module (copied from the Year 1 SSC Handbook for 2022-2023). Please check the updated version to make sure you have everything sorted for your assessment.

Overall

The presentation assessment will consist of an 8-minute (PowerPoint) presentation followed by 5 minutes for questions from the assessment panel.

It is expected that students work as a group to prepare their presentation. In addition, groups should dedicate one 3-hour IL session during SSC week 3 of the Spring semester to finalise the presentation.

One member of each group will submit the presentation file to Learning Central, prior to the assessment day (see assessment calendar). The presentation must contain one slide explaining how this research could help promote health or advise patients.


As part of the professionalism domain (ME1103) and students’ professional development, in accordance with the GMC, students must provide feedback to peers by completing a PDFE online survey. A personalised link to the group’s PDFE will be sent via email by the administrative team. See assessment calendar for specific dates.

It is important to remember that all student behaviour is based on a neutral rating, i.e., 3 out of 5 until an individual has proven themselves to be more or less than this level. A score of 3 is considered appropriate for most students and means they have worked to the standard expected of a medical student. It is predicted that only 1 or 2 members of each group may attain a score of 5 for their exceptional contribution to the group activity. It is therefore inappropriate to score everyone in the group as exceptional as this demeans the contribution made by the students who went above and beyond.

This activity must be completed prior to the presentation.


Students have previously had problems with referencing correctly and therefore with avoiding unfair practice. The library can offer support and guidance in this area throughout the SSC, as well as in other areas such as using research databases. Students should seek advice from the library where necessary (developmental LO22), as unfair practices are actively monitored, and incidents of unfair practice will be reported and may result in action being taken.

Presentation

Presentations will be uploaded to a One Drive folder prior to the commencement of the assessment session.

To maintain a professional environment, groups have been divided into 4 sessions. Each group will be allocated to a specific session during the assessment period. Attendance is compulsory for the entire session to which students are allocated. We do not anticipate that students will need to leave the lecture theatre during the 1.5-2 hr session.

During the presentation, half the group will present to the assessment panel and the other half will subsequently answer questions from the academic staff on the question panel. The assessment panel, made up of members of the SSC working team and experienced academics, will be assessing your presentations. Refer to the marking sheet (Rubric) containing the assessment criteria in Appendix B.

A warning buzzer will sound at 7 minutes and a final buzzer will sound at 8 minutes to indicate that the presentation is over. Groups must stop presenting at the final buzzer (at 8 minutes).

Checklist

To complete the project, students are required to:

Reference the material on the slides accurately
Research around the topic
Submit the presentation
Complete the online PDFE survey
Complete the presentation
Review the other presentations

Ideas

Some useful things to include in your presentation might be:

  • Introduction
    • Background - introduce the topic, define key terms, why is this important to know about
    • Your study - What are your aims and why is this important? Add in your research question / hypotheses.
  • Methods
    • Sample - participants, data collection
    • Variables - dependent variable, predictor variables, how they are measured
    • Data analysis - software used, linear model, meta analysis
  • Results
    • Descriptive statistics - not too important as the key result is the meta analysis but these would usually be included.
    • Data visualisation - a forest plot might be nice here!
    • Report the findings of your analysis
  • Discussion
    • Interpret your results - how do they fit in with the wider literature, why are they important, and how might they be applied in a medical context.
    • Strengths and limitations
    • Future Directions
    • Conclusions

For the Q & A portion of the assessment you’ll want to be confident explaining and justifying some of the decisions you made during the analysis. If you don’t know the answer it is perfectly fine to say that you do not know, and maybe follow that up with a guess.

References

This is not a reading list, but rather some helpful resources if you’re having any trouble with some of the concepts we’ve gone through today :)