{ "cells": [ { "cell_type": "markdown", "id": "a1b69222-3851-4c8f-ad5f-eda29c6c562b", "metadata": {}, "source": [ "# Conditional Probability (Discrete)" ] }, { "cell_type": "code", "execution_count": 1, "id": "ba05e68d-03c8-478e-a9c2-b66c8d75c158", "metadata": {}, "outputs": [], "source": [ "# Import some helper functions (please ignore this!)\n", "from utils import * " ] }, { "cell_type": "markdown", "id": "9e7b000d-2aa4-423e-b049-0cb5827051bb", "metadata": {}, "source": [ "**Context:** You've already spent some time conducting a preliminary exploratory data analysis (EDA) of IHH's ER data. You noticed that considering variables separately can result in misleading information. As such, today you will continue your EDA, this time also considering the *relationship between variables*. For example, you may want to know:\n", "\n", "* Are there certain conditions that are more likely to occur on certain days?\n", "* What makes a patient likely to need hospitalization?\n", "\n", "**Challenge:** So far, however, we've only seen ways of characterizing the variability/stochasticity of a univariate random phenomenon independently of other variables. So how can we consider the relationship between variables? Answer: conditional probability. \n", "\n", "**Outline:** \n", "1. Introduce and practice the concepts, terminology, and notation behind discrete conditional probability distributions (leaving continuous distributions to a later time).\n", "2. Answer the above questions using this new toolset.\n", "\n", "Before getting started, let's load in our IHH ER data:" ] }, { "cell_type": "code", "execution_count": 2, "id": "24cc97e7-cb98-4f23-a388-3973fa3cde63", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| Patient ID | Day-of-Week | Condition | Hospitalized | Antibiotics | Knots |\n",
"|---|---|---|---|---|---|\n",
"| 9394 | Friday | Allergic Reaction | No | No | 0 |\n",
"| 898 | Sunday | Allergic Reaction | Yes | Yes | 0 |\n",
"| 2398 | Saturday | Entangled Antennas | No | No | 3 |\n",
"| 5906 | Saturday | Allergic Reaction | No | No | 0 |\n",
"| 2343 | Monday | High Fever | Yes | No | 0 |\n",
"| 8225 | Thursday | High Fever | Yes | No | 0 |\n",
"| 5506 | Tuesday | High Fever | No | No | 0 |\n",
"| 6451 | Thursday | Allergic Reaction | No | No | 0 |\n",
"| 2670 | Sunday | Intoxication | No | No | 0 |\n",
"| 3497 | Tuesday | Allergic Reaction | No | No | 0 |\n",
"| 1087 | Monday | High Fever | Yes | No | 0 |\n",
"| 1819 | Tuesday | High Fever | Yes | No | 0 |\n",
"| 2308 | Tuesday | Allergic Reaction | No | No | 0 |\n",
"| 6084 | Monday | High Fever | No | No | 0 |\n",
"| 3724 | Tuesday | Allergic Reaction | Yes | Yes | 0 |\n", "