{ "cells": [ { "cell_type": "markdown", "id": "192f9035-a727-482e-9e19-edd6379a11f1", "metadata": {}, "source": [ "# Conformal classification using CV+ in a [`Pipeline`](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html)" ] }, { "cell_type": "markdown", "id": "64f29941-60c1-4f93-bc00-6cd2c8ad20dd", "metadata": {}, "source": [ "This tutorial demonstrates how to use `CoverForestClassifier` in a scikit-learn pipeline for CV+ conformal classification on the breast cancer dataset." ] }, { "cell_type": "code", "execution_count": 1, "id": "a1480a5a-a9e6-4e6e-a4fc-7f139bfa4be3", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from matplotlib import pyplot as plt\n", "from sklearn import tree\n", "from sklearn.datasets import load_breast_cancer\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "from coverforest import CoverForestClassifier\n", "from coverforest.metrics import average_set_size_loss, classification_coverage_score" ] }, { "cell_type": "markdown", "id": "14abb88b-19fb-41b0-88c2-3e8411ac9549", "metadata": {}, "source": [ "Load the dataset and split it into training (80%) and test (20%) sets." ] }, { "cell_type": "code", "execution_count": 2, "id": "a22d7730-643b-491c-ba0a-61f02df2790f", "metadata": {}, "outputs": [], "source": [ "X, y = load_breast_cancer(return_X_y=True, as_frame=True)\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)" ] }, { "cell_type": "markdown", "id": "0f00091c-c344-4059-b084-fb3f92869f7b", "metadata": {}, "source": [ "## Making a pipeline" ] }, { "cell_type": "markdown", "id": "be1a9971-81df-48f5-97be-aff809806491", "metadata": {}, "source": [ "We'll create a scikit-learn pipeline that standardizes the features with `StandardScaler` before fitting `CoverForestClassifier`."
] }, { "cell_type": "code", "execution_count": 3, "id": "b6628ee7-4fc0-4eae-88cb-7675cb36d7a8", "metadata": {}, "outputs": [], "source": [ "pipe = Pipeline(\n", "    [\n", "        (\"scaler\", StandardScaler()),  # zero mean, unit variance per feature\n", "        (\n", "            \"clf\",\n", "            # method=\"cv\" with cv=10: CV+ conformal prediction over 10 folds\n", "            CoverForestClassifier(\n", "                n_estimators=100, method=\"cv\", cv=10, random_state=0, verbose=1\n", "            ),\n", "        ),\n", "    ]\n", ")" ] }, { "cell_type": "markdown", "id": "d230a0c3-9848-4bf2-b3db-94ffd7e8a615", "metadata": {}, "source": [ "We'll now fit the pipeline on the training data." ] }, { "cell_type": "code", "execution_count": 4, "id": "743a90f8-c03e-49e8-950a-388181763411", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Searching regularization parameters...\n", "Fitting with k = 1 and lambda = 0.1.\n" ] }, { "data": { "text/html": [ "
Pipeline(steps=[('scaler', StandardScaler()),\n", " ('clf',\n", " CoverForestClassifier(cv=10, n_estimators=100, random_state=0,\n", " verbose=1))])
StandardScaler()
CoverForestClassifier(cv=10, n_estimators=100, random_state=0, verbose=1)
FastRandomForestClassifier(random_state=1162135467)