Create a regression model with BigQuery DataFrames
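Create a linear regression model on the body mass of penguins using the BigQuery DataFrames API.
Explore further
For detailed documentation that includes this code sample, see the following:
- [Use BigQuery DataFrames](/bigquery/docs/use-bigquery-dataframes)
Code sample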
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
### Python

Before trying this sample, follow the Python setup instructions in the [BigQuery quickstart using client libraries](/bigquery/docs/quickstarts/quickstart-client-libraries). For more information, see the [BigQuery Python API reference documentation](/python/docs/reference/bigquery/latest).

To authenticate to BigQuery, set up Application Default Credentials. For more information, see [Set up authentication for client libraries](/bigquery/docs/authentication#client-libs).

    from bigframes.ml.linear_model import LinearRegression
    import bigframes.pandas as bpd

    # Load data from BigQuery
    query_or_table = "bigquery-public-data.ml_datasets.penguins"
    bq_df = bpd.read_gbq(query_or_table)

    # Filter the data down to the Adelie Penguin species
    adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]

    # Drop the species column
    adelie_data = adelie_data.drop(columns=["species"])

    # Drop rows with nulls to get training data
    training_data = adelie_data.dropna()

    # Specify your feature (or input) columns and the label (or output) column:
    feature_columns = training_data[
        ["island", "culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "sex"]
    ]
    label_columns = training_data[["body_mass_g"]]

    # Rows without a body mass have no label, so use them as the rows to predict
    test_data = adelie_data[adelie_data.body_mass_g.isnull()]

    # Create the linear model
    model = LinearRegression()
    model.fit(feature_columns, label_columns)

    # Score the model (on the same rows it was trained on)
    score = model.score(feature_columns, label_columns)

    # Predict using the model
    result = model.predict(test_data)

The sample loads the `bigquery-public-data.ml_datasets.penguins` table, keeps only the Adelie Penguin species, drops the `species` column, and uses the rows without nulls as training data. It then creates, trains, and scores a `LinearRegression` model on the chosen feature and label columns, and predicts body mass for the rows where `body_mass_g` is null.

What's next
-----------

To search and filter code samples for other Google Cloud products, see the [Google Cloud sample browser](/docs/samples?product=bigquery).
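The code sample on this page selects its training rows with `dropna()` and its prediction rows with `body_mass_g.isnull()`, then scores the model on the training rows. The following is a minimal local sketch of that row-selection and scoring logic, using only the Python standard library and a single feature; it does not call BigQuery DataFrames. The rows, the helper names `split_rows`, `fit_line`, and `r_squared`, and the one-feature least-squares fit are all assumptions made for illustration, not part of the sample or of the library.

```python
ADELIE = "Adelie Penguin (Pygoscelis adeliae)"

# Hypothetical rows shaped like the penguins table; the values are invented
# for illustration and are not taken from the public dataset.
rows = [
    {"species": ADELIE, "flipper_length_mm": 180.0, "body_mass_g": 3400.0},
    {"species": ADELIE, "flipper_length_mm": 190.0, "body_mass_g": 3750.0},
    {"species": ADELIE, "flipper_length_mm": 200.0, "body_mass_g": 4000.0},
    {"species": ADELIE, "flipper_length_mm": 185.0, "body_mass_g": None},
    {"species": ADELIE, "flipper_length_mm": None, "body_mass_g": 3500.0},
    {"species": "Gentoo penguin (Pygoscelis papua)",
     "flipper_length_mm": 215.0, "body_mass_g": 5000.0},
]


def split_rows(rows, species, label):
    """Mimic the sample's filter, drop, dropna, and isnull steps on dicts."""
    selected = [
        {key: value for key, value in row.items() if key != "species"}
        for row in rows
        if row["species"] == species
    ]
    # Training rows: no null in any remaining column (like dropna()).
    training = [r for r in selected if all(v is not None for v in r.values())]
    # Rows to predict: the label is missing (like body_mass_g.isnull()).
    to_predict = [r for r in selected if r[label] is None]
    return training, to_predict


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predicted):
    """Coefficient of determination of the predictions against ys."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


training, to_predict = split_rows(rows, ADELIE, "body_mass_g")
xs = [r["flipper_length_mm"] for r in training]
ys = [r["body_mass_g"] for r in training]

slope, intercept = fit_line(xs, ys)
# Scored on the training rows, as in the sample.
r2 = r_squared(ys, [slope * x + intercept for x in xs])
# Only rows that still have the feature can be predicted in this sketch.
predictions = [
    slope * r["flipper_length_mm"] + intercept
    for r in to_predict
    if r["flipper_length_mm"] is not None
]

print(len(training), len(to_predict), round(r2, 4), predictions)
```

Because `r2` is measured on the rows used for fitting, it describes how well the line fits those rows rather than how it would perform on unseen penguins. Holding out some labeled rows for evaluation is a common alternative. Note also that the row with a missing feature but a known body mass ends up in neither set, which matches what `dropna()` followed by an `isnull()` filter on the label would do.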