{ "cells": [ { "cell_type": "markdown", "id": "a1b69222-3851-4c8f-ad5f-eda29c6c562b", "metadata": {}, "source": [ "# Conditional Probability (Discrete)" ] }, { "cell_type": "code", "execution_count": 1, "id": "ba05e68d-03c8-478e-a9c2-b66c8d75c158", "metadata": {}, "outputs": [], "source": [ "# Import some helper functions (please ignore this!)\n", "from utils import * " ] }, { "cell_type": "markdown", "id": "9e7b000d-2aa4-423e-b049-0cb5827051bb", "metadata": {}, "source": [ "**Context:** You've already spent some time conducting a preliminary exploratory data analysis (EDA) of IHH's ER data. You noticed that considering variables separately can result in misleading information. As such, today you will continue your EDA, this time also considering the *relationship between variables*. For example, you may want to know:\n", "\n", "* Are there certain conditions that are more likely to occur on certain days?\n", "* What makes a patient likely to need hospitalization?\n", "\n", "**Challenge:** So far, however, we've only seen ways of characterizing the variability/stochasticity of a univariate random phenomenon independently of other variables. So how can we consider the relationship between variables? Answer: conditional probability. \n", "\n", "**Outline:** \n", "1. Introduce and practice the concepts, terminology, and notation behind discrete conditional probability distributions (leaving continuous distributions to a later time).\n", "2. Answer the above questions using this new toolset.\n", "\n", "Before getting started, let's load in our IHH ER data:" ] }, { "cell_type": "code", "execution_count": 2, "id": "24cc97e7-cb98-4f23-a388-3973fa3cde63", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Day-of-Week Condition Hospitalized Antibiotics Knots\n", "Patient ID \n", "9394 Friday Allergic Reaction No No 0\n", "898 Sunday Allergic Reaction Yes Yes 0\n", "2398 Saturday Entangled Antennas No No 3\n", "5906 Saturday Allergic Reaction No No 0\n", "2343 Monday High Fever Yes No 0\n", "8225 Thursday High Fever Yes No 0\n", "5506 Tuesday High Fever No No 0\n", "6451 Thursday Allergic Reaction No No 0\n", "2670 Sunday Intoxication No No 0\n", "3497 Tuesday Allergic Reaction No No 0\n", "1087 Monday High Fever Yes No 0\n", "1819 Tuesday High Fever Yes No 0\n", "2308 Tuesday Allergic Reaction No No 0\n", "6084 Monday High Fever No No 0\n", "3724 Tuesday Allergic Reaction Yes Yes 0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Import a bunch of libraries we'll be using below\n", "import pandas as pd\n", "import matplotlib.pylab as plt\n", "import numpyro\n", "import numpyro.distributions as D\n", "import jax\n", "import jax.numpy as jnp\n", "\n", "# Load the data into a pandas dataframe\n", "csv_fname = 'data/IHH-ER.csv'\n", "data = pd.read_csv(csv_fname, index_col='Patient ID')\n", "\n", "# Print a random sample of patients, just to see what's in the data\n", "data.sample(15, random_state=0)" ] }, { "cell_type": "markdown", "id": "bf4f3873-14b9-4011-984e-9a449230cb81", "metadata": {}, "source": [ "## Terminology and Notation\n", "\n", "As with (non-conditional) discrete probability, the statistical language---terminology and notation---we introduce here will allow us to precisely specify to a computer how to model our data. In the future, we will translate statements in this language directly into code that a computer can run.\n", "\n", "**Concept.** Conditional probabilities allow us to ask questions of the form, \"given that $A$ is true, what's the probability of $B$?\". Although simple, this idea is actually quite powerful; all *predictive models* you may have heard of (e.g. 
regression, classification, etc.) are formulated using *conditional distributions*. To see what we mean, let's start with an example.\n", "\n", "**Example.** Suppose you're working at the IHH ER, and you want to *predict* the probability that the next patient comes in with `Condition == \"Intoxication\"`. Given previously collected data, you can estimate this probability by counting the number of patients for which `Condition == \"Intoxication\"` and dividing by the total number of patients:\n", "\\begin{align}\n", "\\text{Probability of intoxication} = \\frac{\\text{Number of patients with intoxication}}{\\text{Total number of patients}}\n", "\\end{align}\n", "\n", "We'll call this probability the \"naive predictor.\" Now, let's compute this naive predictor on our IHH ER data:" ] }, { "cell_type": "code", "execution_count": 3, "id": "847891ff-d424-4755-a966-203e2b733c47", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Portion with Intoxication (Naive Predictor) = 0.171\n" ] } ], "source": [ "num_intoxicated = len(data[data['Condition'] == 'Intoxication'])\n", "num_total = len(data)\n", "naive_probability_of_intoxication = num_intoxicated / float(num_total)\n", "\n", "print('Portion with Intoxication (Naive Predictor) =', round(naive_probability_of_intoxication, 3))" ] }, { "cell_type": "markdown", "id": "61fa7e31-d8b3-438c-9fe1-6cfc754ba5cc", "metadata": {}, "source": [ "However, you also know that even in the far reaches of the outer universe, beings work Mondays through Fridays, taking Saturdays and Sundays off. Therefore, you suspect intoxication may be more likely to occur on weekends. You decide to check whether your intuition is true here. 
If it's true, will you improve your ability to predict how likely the next patient is to come with intoxication?\n", "\n", "We can modify the naive predictor above as follows to condition on the day of the week:\n", "\\begin{align}\n", "\\text{Probability of intoxication given day $d$} = \\frac{\\text{Number of patients with intoxication on day $d$}}{\\text{Total number of patients on day $d$}}\n", "\\end{align}" ] }, { "cell_type": "code", "execution_count": 4, "id": "12b07c7f-73c3-46b4-a08c-f16b643e30c4", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']\n", "\n", "probabilities = []\n", "\n", "# Iterate over the days of the week\n", "for day in days_of_week:\n", " # Select all patients that came in on the specific day of the week\n", " patients_on_day = data[(data['Day-of-Week'] == day)]\n", "\n", " # Of the selected patients, further select patients with intoxication\n", " patient_intoxicated_on_day = patients_on_day[patients_on_day['Condition'] == 'Intoxication']\n", "\n", " # Compute the portion of patients with intoxication on this day\n", " portion_intoxicated_on_day = float(len(patient_intoxicated_on_day)) / float(len(patients_on_day))\n", "\n", " probabilities.append(portion_intoxicated_on_day)\n", "\n", "# Plot!\n", "plt.bar(days_of_week, probabilities, label='Conditional Predictor')\n", "plt.axhline(naive_probability_of_intoxication, color='red', label='Naive Predictor')\n", "\n", "# Add axis labels and titles\n", "plt.xticks(rotation=30)\n", "plt.xlabel('Day of Week')\n", "plt.ylabel('Probability of Intoxication')\n", "plt.title('Conditional Probability of Intoxication Given Day at the IHH ER')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "4e211da2-3a2e-40d6-b554-b88c1f8cf714", "metadata": {}, "source": [ "As you can see, the probability of a patient arriving with intoxication changes *significantly* from the naive predictor (above) if we consider the day of the week. Specifically, the above plot shows us that our naive predictor\n", "1. significantly *over-estimates* the probability of intoxication on weekdays, and\n", "2. significantly *under-estimates* the probability of intoxication on weekends.\n", "\n", "Using a conditional distribution, we can leverage additional information (day of the week) to improve our prediction!" 
] }, { "cell_type": "markdown", "id": "9800ff5a-202d-4971-8a7d-aac8912565ea", "metadata": {}, "source": [ "**Definition and Notation:** A conditional probability distribution is a probability distribution that changes as a function of another random variable. \n", "\n", "> Continuing with the above example, \n", "> * Let $D$ denote the day of the week.\n", "> * Let $I$ denote whether the patient arrives with intoxication.\n", "> \n", "> Here, $p_I(\\cdot)$ describes the (non-conditional) probability that a patient arrives with intoxication. It represents our *naive*, inaccurate prediction. In contrast, $p_{I | D}(\\cdot | d)$ describes the *conditional* probability of \"intoxication given the day\"---the probability of intoxication changes from weekdays to weekends. In this notation, what comes on the right side of the vertical line is the \"condition\" (here, $D = d$)." ] }, { "cell_type": "markdown", "id": "cc146ed7-d09b-44b2-aebb-15e0d9ab2cc9", "metadata": {}, "source": [ "**Sample Space or Support:** Since a discrete conditional distribution is still a discrete distribution, all notation/terminology from discrete probability still holds. \n", "\n", "> For our running example, the sample space is that of the variable to the left of the line, $I$. That is, the sample space of $p_{I | D}(\\cdot | d)$ is $I \\in \\{ 0, 1 \\}$ (where 1 means intoxicated and 0 means not intoxicated). " ] }, { "cell_type": "markdown", "id": "9ef3cc8e-8bca-41ea-a913-90561854b110", "metadata": {}, "source": [ "**Probability Mass Function (PMF):** The PMF is, again, that of the variable to the left of the vertical line. What makes a conditional distribution different from a non-conditional distribution, however, is that the parameter of the distribution is now a *function of the condition*.\n", "> In our example, the PMF is that of a Bernoulli random variable (since $I$ can only take on two values). Since it's a *conditional* distribution, its parameter depends on the condition (the day $D = d$). 
We can write this as follows:\n", "> \\begin{align}\n", "p_{I | D}(\\cdot | d) = \\mathrm{Ber}(\\rho(d)),\n", "\\end{align}\n", "> where\n", "> \\begin{align}\n", "p_{I | D}(i | d) = \\underbrace{\\rho(d)^{i} \\cdot \\left(1 - \\rho(d) \\right)^{1 - i}}_{\\text{Bernoulli PMF (see Wikipedia)}},\n", "\\end{align}\n", "> and where\n", "> \\begin{align} \\rho(d) &= \\begin{cases}\n", "0.1 & \\text{if $d$ is weekday} \\\\\n", "0.4 & \\text{if $d$ is weekend} \n", "\\end{cases} \n", "\\end{align}\n", "> In a sense, a conditional probability is the \"if/else-expression of probability.\"" ] }, { "cell_type": "markdown", "id": "eb5b992a-2b37-4b51-8c44-797eb1f1487e", "metadata": {}, "source": [ "**Independent, Identically Distributed (i.i.d):** Just as before, a variable can be sampled i.i.d from a distribution.\n", "\n", "> Given $D = d$, we write that $I$ is sampled i.i.d from the conditional as follows: $I | d \\sim p_{I | D}(\\cdot | d)$. This means that, given the day (e.g. $d = \\mathrm{Monday}$), observing one patient with intoxication tells us nothing about the probability of observing another patient with intoxication. Note that without conditioning on the day, this is not true: observing many patients with intoxication could tell us that the current day is on a weekend, which means the probability of intoxication is higher overall." ] }, { "cell_type": "markdown", "id": "222bc7b2-8e5b-4ea5-9e0e-9ba4ba84153b", "metadata": {}, "source": [ "**Summary of Notation:**\n", "* Let $R$ and $C$ denote two RVs. \n", "* $R | c$ is then an RV describing \"$R$ given $C = c$\".\n", "* $p_{R | C}(r | c)$ is the evaluation of the conditional PMF at $r$: i.e. given that $C = c$, what's the probability that $R = r$?\n", "* $p_{R | C}(\\cdot | c)$ is the conditional PMF of $R | c$. 
The dot represents the fact that we're representing the *whole* distribution---we haven't yet asked about the probability of $R = r$ as above.\n", "* $R | c \\sim p_{R | C}(\\cdot | c)$ denotes that $R | c$ is sampled i.i.d. from $p_{R | C}(\\cdot | c)$" ] }, { "cell_type": "markdown", "id": "116a650c-51d2-4e4a-8b4b-f70662b015a7", "metadata": {}, "source": [ "```{admonition} Exercise: Fit conditional distributions by hand\n", "Let us define the following RVs:\n", "* $D$: Day-of-Week\t\n", "* $C$: Condition\t\n", "* $H$: Hospitalized\t\n", "* $A$: Antibiotics\n", "* $K$: Knots\n", "\n", "Our goal is to learn the distributions of the following conditional RVs:\n", "1. $C | D$\n", "2. $H | C$\n", "3. $K | C$\n", "4. $A | C, H$ (here, we condition on *two* RVs)\n", "\n", "Each one of these conditional distributions represents a *predictive model*. For example, (1) says \"given that the day is $D = d$, predict how likely a patient is to arrive with condition $C = c$.\" \n", "\n", "**Part 1:** By exploring the data (as we did in the above example for \"intoxication given day\"), empirically estimate each conditional distribution above. When we say, \"estimate the conditional distribution,\" we mean you estimate the distribution for every condition; for example, for $C | D$, we want you to empirically estimate $C | D$ for *every* $D = d$. Use the notation we introduced to write your answer. Don't forget to show your work with all the plots you generate!\n", "\n", "**Part 2:** Compare each conditional distribution with its corresponding non-conditional version (these are called *marginals*) from before. What differences do you notice? 
How can the differences mislead the IHH ER?\n", "```" ] }, { "cell_type": "markdown", "id": "829e2060-8c62-4194-a978-cfe77e27caad", "metadata": {}, "source": [ "## Getting Familiar with Distributions in `NumPyro`\n", "\n", "Now that we've learned some conditional distributions by hand, we'll introduce the framework we'll use to implement our ML models: `NumPyro`. And specifically, we'll introduce one of the main building blocks in `NumPyro`: distributions.\n", "\n", "**What is `NumPyro`?** `NumPyro` is a \"Probabilistic Programming Language\" built on `Jax`. It provides an interface for (nearly) direct translation of the stats/math we wrote above into code that we can use to fit to data, make predictions, and more. This will allow us to focus on the conceptual ideas behind probabilistic ML. \n", "\n", "**Instantiating Distributions in `NumPyro`.** `NumPyro` comes with many distributions already implemented. For a complete list of all available discrete distributions, check out [this part of the documentation](https://num.pyro.ai/en/stable/distributions.html#discrete-distributions). So why use `NumPyro` instead of implementing the distributions on our own? It's easy to write subtle bugs that are hard to catch when implementing mathematical formulas in code. Also, using `NumPyro`'s distributions will help us highlight the overall *logic* of the code, instead of getting bogged down by the mathematical details. \n", "\n", "Distributions in `NumPyro` have several notable properties and methods we will rely on. Let's explore them together. 
First, we import the necessary components of `NumPyro`:" ] }, { "cell_type": "code", "execution_count": 5, "id": "ca957c4a-a908-48b7-8061-c504577151e5", "metadata": {}, "outputs": [], "source": [ "import jax.numpy as jnp\n", "import jax.random as jrandom\n", "import numpyro\n", "import numpyro.distributions as D" ] }, { "cell_type": "markdown", "id": "6e74c5f2-d631-48d9-993a-96a3266ab676", "metadata": {}, "source": [ "Now, let's instantiate the simplest discrete distribution we know, the Bernoulli distribution, to describe the naive predictor from earlier.\n", "\\begin{align}\n", "p_I(i) &= \\mathrm{Ber}(\\rho) = \\rho^i \\cdot (1 - \\rho)^{1 - i}\n", "\\end{align}\n", "Recall that a Bernoulli distribution takes in just one parameter, $\\rho \\in [0, 1]$, which determines the probability of sampling $I = 1$ vs. $I = 0$ (or Yes vs. No). Here let's instantiate the Bernoulli distribution with $\\rho = 0.2$." ] }, { "cell_type": "code", "execution_count": 6, "id": "f5f25a53-7f1b-41c9-97e2-460b30f4b8f2", "metadata": {}, "outputs": [], "source": [ "rho = jnp.array(0.2)\n", "p_I = D.Bernoulli(rho)" ] }, { "cell_type": "markdown", "id": "5318648b-3b9d-42cd-ae95-bf0b597df45b", "metadata": {}, "source": [ "That's it! 
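For intuition, the Bernoulli PMF formula above can also be evaluated by hand. The following is a minimal plain-Python sketch, not part of `NumPyro`: the helper name `bernoulli_pmf` is made up for illustration, and the parameter value 0.2 is the one used to instantiate the distribution above.

```python
def bernoulli_pmf(i, rho):
    # Hypothetical helper (not a NumPyro function): evaluates
    # rho^i * (1 - rho)^(1 - i) for an outcome i in {0, 1}.
    if i not in (0, 1):
        raise ValueError('i must be 0 or 1')
    return rho ** i * (1.0 - rho) ** (1 - i)

rho = 0.2  # same parameter value as the Bernoulli instantiated above

pmf_at_1 = bernoulli_pmf(1, rho)  # probability that I = 1
pmf_at_0 = bernoulli_pmf(0, rho)  # probability that I = 0

print('p_I(1) =', pmf_at_1)
print('p_I(0) =', pmf_at_0)
print('Sum over the sample space =', pmf_at_1 + pmf_at_0)
```

The two values should add up to 1 (up to floating-point error), since they cover the whole sample space. A hand-written check like this is only a sanity test; the library distribution remains the thing to use in actual models.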
\n", "\n", "**Evaluating the PMF of `NumPyro` Distributions.** Now, if we want to evaluate the PMF, $p_I(i)$, we can use the `log_prob` method as follows (note that this returns the *log* of the PMF, so we'll have to exponentiate the result):" ] }, { "cell_type": "code", "execution_count": 7, "id": "34a2893c-1049-4f88-99c9-baa703e43086", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Probability of sampling a 1: 0.2\n", "Probability of sampling a 0: 0.8\n" ] } ], "source": [ "log_p_I_eq_1 = p_I.log_prob(jnp.array(1.0))\n", "print('Probability of sampling a 1:', jnp.exp(log_p_I_eq_1))\n", "\n", "log_p_I_eq_0 = p_I.log_prob(jnp.array(0.0))\n", "print('Probability of sampling a 0:', jnp.exp(log_p_I_eq_0))" ] }, { "cell_type": "markdown", "id": "c8d7f0ef-00af-4ec2-8bd5-3f1929f1553c", "metadata": {}, "source": [ "**Sampling from `NumPyro` Distributions.** `NumPyro` distributions all have a `sample` method, which can be used to draw samples. It takes in two arguments:\n", "1. A random number generator \"key,\" which controls the randomness of the sample.\n", "2. 
A shape, describing the number of i.i.d samples you want to draw.\n", "\n", "Let's give it a go:" ] }, { "cell_type": "code", "execution_count": 8, "id": "e020379e-51ac-49b8-ac98-a532c9b1e53a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First batch drawn with key1: [0 0 1 0 0 0 0 0 0 0 1 0 0 0 0]\n", "Second batch drawn with key1: [0 0 1 0 0 0 0 0 0 0 1 0 0 0 0]\n", "Third batch drawn with key2: [0 1 1 0 0 0 0 0 0 1 0 1 0 1 1]\n" ] } ], "source": [ "shape = (15,) # Shape of i.i.d samples we wish to draw \n", "\n", "key1 = jrandom.PRNGKey(seed=0) # Create a random number generator key\n", "print('First batch drawn with key1: ', p_I.sample(key1, shape))\n", "print('Second batch drawn with key1:', p_I.sample(key1, shape))\n", "\n", "key2 = jrandom.PRNGKey(seed=1) # Create a random number generator key\n", "print('Third batch drawn with key2: ', p_I.sample(key2, shape))" ] }, { "cell_type": "markdown", "id": "f99df4ee-a979-4192-8b9a-adff43fef033", "metadata": {}, "source": [ "Notice in the above code, when using the same key twice (or the same `seed`), we get the *exact same batch of samples*. This is both a blessing and a curse. It's a blessing because this allows us to precisely control the randomness of our ML code. This will prove crucial for debugging later on. However, it can also be a curse if we accidentally use the same key in a place where we need two different sources of randomness.\n", "\n", "**Best Practice: How to Manage Your Keys.** We will follow two rules of thumb:\n", "1. Make only ONE CALL to `jrandom.PRNGKey` in your entire code.\n", "2. Never use the same key twice.\n", "\n", "But if we're restricting ourselves to only creating one key with `jrandom.PRNGKey`, how can we possibly call `sample` multiple times with different keys? `Jax` allows us to take a random key and split it into multiple different keys, each of which can be used for different purposes. 
This means we can create ONE KEY to control the randomness of our entire code. We can then split this key into multiple keys as needed. Here's how we can do this:" ] }, { "cell_type": "code", "execution_count": 9, "id": "70033186-6baf-4390-9cb2-18e182f54d0a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First batch drawn with key_first: [0 0 0 0 1 0 1 1 0 0 0 0 0 0 0]\n", "Second batch drawn with key_second: [0 0 1 0 0 0 0 0 0 0 0 0 1 0 0]\n", "Third batch drawn with key_third: [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]\n" ] } ], "source": [ "# Create ONE KEY to be used by your ENTIRE CODE\n", "key = jrandom.PRNGKey(seed=0)\n", "\n", "# Whenever you need to use the key for multiple purposes, split it into parts:\n", "key_first, key_second, key_third = jrandom.split(key, 3) \n", "\n", "# Use a different key for each need\n", "print('First batch drawn with key_first: ', p_I.sample(key_first, shape))\n", "print('Second batch drawn with key_second:', p_I.sample(key_second, shape))\n", "print('Third batch drawn with key_third: ', p_I.sample(key_third, shape))" ] }, { "cell_type": "markdown", "id": "2914c1e6-a93a-4c64-9309-a23d8cb62353", "metadata": {}, "source": [ "**Conditional Distributions in `NumPyro`:** Now that we've implemented the naive predictor, let's implement our better predictor, $p_{I | D}(i | d)$. Recall the only difference between this predictor and the naive predictor is that the parameter of the distribution, $\\rho$, now depends on the day, $d$." 
] }, { "cell_type": "code", "execution_count": 10, "id": "925708c9-e4c5-4651-9a6e-242bb7c12ac9", "metadata": {}, "outputs": [], "source": [ "def p_intoxication_given_day(day):\n", " '''\n", " Assume day is an integer from 0 to 6 (Monday to Sunday)\n", " ''' \n", "\n", " rho_given_d = jnp.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.4, 0.4])\n", " \n", " p_I_given_d = D.Bernoulli(rho_given_d[day])\n", "\n", " return p_I_given_d\n", "\n", "# Example uses\n", "p_I_given_Monday = p_intoxication_given_day(jnp.array(0))\n", "p_I_given_Saturday = p_intoxication_given_day(jnp.array(5))" ] }, { "cell_type": "markdown", "id": "0302a8cd-331f-4c0a-8706-6684b87dd8be", "metadata": {}, "source": [ "In the above, `p_I_given_Monday` and `p_I_given_Saturday` are just `NumPyro` Bernoulli distributions, so you can use their `log_prob` and `sample` functions just like before." ] }, { "cell_type": "markdown", "id": "6cb03f3a-2531-43fc-92e9-6af8d0ba4666", "metadata": {}, "source": [ "```{admonition} Exercise: Implement conditional distributions \n", "\n", "**Part 1:** Implement each one of the conditional distributions from the previous exercise in `NumPyro`, following the example of `p_intoxication_given_day` above. \n", "\n", "Note: `NumPyro` discrete distributions only work with integers, not strings. For example, instead of using $d = \\text{Monday}$, you should convert the days of the week into integers from 0 to 6 (Monday to Sunday), and instead use $d = 0$ for \"Monday\". We've created two helper functions to help you with this conversion: `convert_day_of_week_to_int` and `convert_condition_to_int`. You can use them as follows:\n", " * `convert_day_of_week_to_int(data['Day-of-Week'])`\n", " * `convert_condition_to_int(data['Condition'])`\n", "\n", "**Part 2:** Verify that your implementation is correct by sampling from each conditional distribution and eye-balling that the samples look correct. Be sure to follow the best practices above when using a random number generator key. 
\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "adb71ab1-628a-415d-907c-db4f3faa6a8c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "1af262fc-e160-4fef-ae88-39235f400719", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }