Auto PII Tagging

Auto PII tagging assigns Sensitive/NonSensitive PII tags at the column level using the two approaches described below.

PII Tagging is only available during Profiler Ingestion.

  1. Column Name Scanner: We validate the table's column names against a set of regex rules that match common English naming patterns for email addresses, SSNs, bank accounts, etc.
  2. Entity Recognition: If sample data ingestion is enabled, we validate the sample rows against an Entity Recognition engine that flags any sensitive information from a list of supported entities. In that case, the confidence parameter lets you tune the minimum score required to tag a column as PII.Sensitive.
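As a rough illustration of the Column Name Scanner approach, the sketch below matches column names against a few regex rules. The rule set, the labels, and the `scan_column_name` helper are all hypothetical and written only for this example; they are not the rules the profiler actually ships with.

```python
import re
from typing import Optional

# Hypothetical rules for illustration only; the profiler's real rule set may differ.
COLUMN_NAME_RULES = {
    "EMAIL": re.compile(r"^(e[-_ ]?mail)([-_ ]?(address|addr))?$", re.IGNORECASE),
    "SSN": re.compile(
        r"^(ssn|social[-_ ]?security([-_ ]?(number|num|no))?)$", re.IGNORECASE
    ),
    "BANK_ACCOUNT": re.compile(
        r"^(bank[-_ ]?)?(account|acct)[-_ ]?(number|num|no)$", re.IGNORECASE
    ),
}


def scan_column_name(column_name: str) -> Optional[str]:
    """Return the label of the first rule matching the column name, or None."""
    cleaned = column_name.strip()
    for label, pattern in COLUMN_NAME_RULES.items():
        if pattern.match(cleaned):
            return label
    return None


if __name__ == "__main__":
    for name in ["email_address", "SSN", "bank_account_number", "created_at"]:
        print(name, "->", scan_column_name(name))
```

Because the patterns in this sketch are anchored to the whole name, a column such as `created_at` is left untagged, while `email_address` is picked up without looking at any row data.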

Note that if a column is already tagged as PII, we skip it and do not run auto tagging on it.
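The interaction between the confidence parameter and the skip rule can be sketched as follows. This is a minimal illustration under stated assumptions: the `Column` class, the `choose_pii_tag` helper, the entity names, and the 0 to 1 score scale are hypothetical and are not taken from the profiler's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Column:
    """Hypothetical stand-in for a table column and its existing tags."""
    name: str
    tags: List[str] = field(default_factory=list)


def choose_pii_tag(
    column: Column,
    entity_scores: Dict[str, float],
    confidence: float = 0.8,  # assumed 0-1 scale; the real parameter may use another scale
) -> Optional[str]:
    """Return "PII.Sensitive" when the best entity score reaches the threshold."""
    # Columns that already carry a PII tag are skipped entirely.
    if any(tag.startswith("PII.") for tag in column.tags):
        return None
    # No recognized entities in the sample rows means nothing to tag.
    if not entity_scores:
        return None
    best_score = max(entity_scores.values())
    return "PII.Sensitive" if best_score >= confidence else None


if __name__ == "__main__":
    print(choose_pii_tag(Column("contact"), {"EMAIL_ADDRESS": 0.95}))
    print(choose_pii_tag(Column("contact", ["PII.Sensitive"]), {"EMAIL_ADDRESS": 0.95}))
    print(choose_pii_tag(Column("notes"), {"PERSON": 0.5}))
```

Raising the threshold in this sketch reduces false positives at the cost of missing borderline columns; lowering it does the opposite.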

If you see an error similar to:

This is a scenario that we identified on some corporate Windows laptops. The profiler tries to download the Entity Recognition model, but the request fails because of certificate issues.

One solution is to manually download the model on the ingestion container / Airflow host by running:

If you are using Docker, you might want to customize the openmetadata-ingestion image so that this command runs there by default.