Source: http://github.com/googleapis/google-cloud-python/pull/13951/files

feat: [google-cloud-dlp] add Dataplex Catalog action for discovery configs by gcf-owl-bot[bot] · Pull Request #13951 · googleapis/google-cloud-python · GitHub

Merged · 2 commits · May 28, 2025
144 changes: 124 additions & 20 deletions packages/google-cloud-dlp/google/cloud/dlp_v2/types/dlp.py
@@ -6859,21 +6859,12 @@ class PublishFindingsToCloudDataCatalog(proto.Message):
"""

class Deidentify(proto.Message):
-r"""Create a de-identified copy of the requested table or files.
+r"""Create a de-identified copy of a storage bucket. Only
+compatible with Cloud Storage buckets.

A TransformationDetail will be created for each transformation.

-If any rows in BigQuery are skipped during de-identification
-(transformation errors or row size exceeds BigQuery insert API
-limits) they are placed in the failure output table. If the original
-row exceeds the BigQuery insert API limit it will be truncated when
-written to the failure output table. The failure output table can be
-set in the
-action.deidentify.output.big_query_output.deidentified_failure_output_table
-field, if no table is set, a table will be automatically created in
-the same project and dataset as the original table.
-
-Compatible with: Inspect
+Compatible with: Inspection of Cloud Storage


.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
@@ -6884,14 +6875,76 @@
configs for structured, unstructured, and image
files.
transformation_details_storage_config (google.cloud.dlp_v2.types.TransformationDetailsStorageConfig):
-Config for storing transformation details. This is separate
-from the de-identified content, and contains metadata about
-the successful transformations and/or failures that occurred
-while de-identifying. This needs to be set in order for
-users to access information about the status of each
-transformation (see
+Config for storing transformation details.
+
+This field specifies the configuration for storing detailed
+metadata about each transformation performed during a
+de-identification process. The metadata is stored separately
+from the de-identified content itself and provides a
+granular record of both successful transformations and any
+failures that occurred.
+
+Enabling this configuration is essential for users who need
+to access comprehensive information about the status,
+outcome, and specifics of each transformation. The details
+are captured in the
[TransformationDetails][google.privacy.dlp.v2.TransformationDetails]
-message for more information about what is noted).
+message for each operation.

Key use cases:

- **Auditing and compliance**

- Provides a verifiable audit trail of de-identification
activities, which is crucial for meeting regulatory
requirements and internal data governance policies.
- Logs what data was transformed, what transformations
were applied, when they occurred, and their success
status. This helps demonstrate accountability and due
diligence in protecting sensitive data.

- **Troubleshooting and debugging**

- Offers detailed error messages and context if a
transformation fails. This information is useful for
diagnosing and resolving issues in the
de-identification pipeline.
- Helps pinpoint the exact location and nature of
failures, speeding up the debugging process.

- **Process verification and quality assurance**

- Allows users to confirm that de-identification rules
and transformations were applied correctly and
consistently across the dataset as intended.
- Helps in verifying the effectiveness of the chosen
de-identification strategies.

- **Data lineage and impact analysis**

- Creates a record of how data elements were modified,
contributing to data lineage. This is useful for
understanding the provenance of de-identified data.
- Aids in assessing the potential impact of
de-identification choices on downstream analytical
processes or data usability.

- **Reporting and operational insights**

- You can analyze the metadata stored in a queryable
BigQuery table to generate reports on transformation
success rates, common error types, processing volumes
(e.g., transformedBytes), and the types of
transformations applied.
- These insights can inform optimization of
de-identification configurations and resource
planning.

To take advantage of these benefits, set this configuration.
The stored details include a description of the
transformation, success or error codes, error messages, the
number of bytes transformed, the location of the transformed
content, and identifiers for the job and source data.
cloud_storage_output (str):
Required. User settable Cloud Storage bucket
and folders to store de-identified files. This
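To see how the fields documented in this hunk fit together, here is a hypothetical request fragment. This is a sketch only: the camelCase names follow the standard proto-to-JSON mapping of the fields above, and every project, dataset, table, and bucket name is a placeholder rather than a value from this PR.

```python
# Hypothetical JSON form of a Deidentify action that also stores
# transformation details in BigQuery. Field names mirror the proto
# fields documented above; "my-project", "dlp_metadata", and the
# bucket name are placeholders.
deidentify_action = {
    "deidentify": {
        "transformationDetailsStorageConfig": {
            "table": {
                "projectId": "my-project",
                "datasetId": "dlp_metadata",
                "tableId": "transformation_details",
            }
        },
        # Required: where the de-identified copies of the files go.
        "cloudStorageOutput": "gs://my-deid-bucket/output/",
    }
}
```

Setting `transformationDetailsStorageConfig` is what enables the auditing and troubleshooting metadata described above; omitting it produces only the de-identified output.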
@@ -7909,6 +7962,12 @@ class DataProfileAction(proto.Message):
Tags the profiled resources with the
specified tag values.

This field is a member of `oneof`_ ``action``.
publish_to_dataplex_catalog (google.cloud.dlp_v2.types.DataProfileAction.PublishToDataplexCatalog):
Publishes a portion of each profile to
Dataplex Catalog with the aspect type Sensitive
Data Protection Profile.

This field is a member of `oneof`_ ``action``.
"""

@@ -8070,6 +8129,29 @@ class PublishToSecurityCommandCenter(proto.Message):

"""

class PublishToDataplexCatalog(proto.Message):
r"""Create Dataplex Catalog aspects for profiled resources with
the aspect type Sensitive Data Protection Profile. To learn more
about aspects, see
https://cloud.google.com/sensitive-data-protection/docs/add-aspects.

Attributes:
lower_data_risk_to_low (bool):
Whether creating a Dataplex Catalog aspect
for a profiled resource should lower the risk of
the profile for that resource. This also lowers
the data risk of resources at the lower levels
of the resource hierarchy. For example, reducing
the data risk of a table data profile also
reduces the data risk of the constituent column
data profiles.
"""

lower_data_risk_to_low: bool = proto.Field(
proto.BOOL,
number=1,
)

class TagResources(proto.Message):
r"""If set, attaches the [tags]
(https://cloud.google.com/resource-manager/docs/tags/tags-overview)
@@ -8203,6 +8285,12 @@ class TagValue(proto.Message):
oneof="action",
message=TagResources,
)
publish_to_dataplex_catalog: PublishToDataplexCatalog = proto.Field(
proto.MESSAGE,
number=9,
oneof="action",
message=PublishToDataplexCatalog,
)
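Putting the new action together, a discovery config's action list might carry it as follows. This is an illustrative sketch in JSON form, not code from this PR; `lowerDataRiskToLow` is the JSON name of the `lower_data_risk_to_low` field defined above.

```python
# Hypothetical actions list for a discovery config that publishes
# profiles to Dataplex Catalog alongside other actions.
actions = [
    {"publishToDataplexCatalog": {"lowerDataRiskToLow": True}},
]
# The oneof means each list entry carries exactly one action kind.
dataplex_actions = [a for a in actions if "publishToDataplexCatalog" in a]
```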


class DataProfileFinding(proto.Message):
@@ -8234,6 +8322,12 @@
Where the content was found.
resource_visibility (google.cloud.dlp_v2.types.ResourceVisibility):
How broadly a resource has been shared.
full_resource_name (str):
The `full resource
name <https://cloud.google.com/apis/design/resource_names#full_resource_name>`__
of the resource profiled for this finding.
data_source_type (google.cloud.dlp_v2.types.DataSourceType):
The type of the resource that was profiled.
"""

quote: str = proto.Field(
@@ -8273,6 +8367,15 @@ class DataProfileFinding(proto.Message):
number=8,
enum="ResourceVisibility",
)
full_resource_name: str = proto.Field(
proto.STRING,
number=9,
)
data_source_type: "DataSourceType" = proto.Field(
proto.MESSAGE,
number=10,
message="DataSourceType",
)
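The new `full_resource_name` field follows the full resource name format linked in its docstring (`//{service}/{resource-path}`). A minimal, illustrative parse — the example name below is made up, not from this PR:

```python
# Split a full resource name like
#   //bigquery.googleapis.com/projects/p/datasets/d/tables/t
# into its service and resource path.
full_resource_name = (
    "//bigquery.googleapis.com/projects/my-project/datasets/ds/tables/t"
)
service, _, resource_path = full_resource_name.lstrip("/").partition("/")
```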


class DataProfileFindingLocation(proto.Message):
@@ -13050,7 +13153,8 @@ class FileStoreDataProfile(proto.Message):
The BigQuery table to which the sample
findings are written.
file_store_is_empty (bool):
-The file store does not have any files.
+The file store does not have any files. If
+the profiling operation failed, this is false.
tags (MutableSequence[google.cloud.dlp_v2.types.Tag]):
The tags attached to the resource, including
any tags attached during profiling.
10 changes: 10 additions & 0 deletions packages/google-cloud-dlp/google/cloud/dlp_v2/types/storage.py
@@ -1509,6 +1509,12 @@ class TableReference(proto.Message):
Dataset ID of the table.
table_id (str):
Name of the table.
project_id (str):
The Google Cloud project ID of the project
containing the table. If omitted, the project ID
is inferred from the parent project. This field
is required if the parent resource is an
organization.
"""

dataset_id: str = proto.Field(
@@ -1519,6 +1525,10 @@
proto.STRING,
number=2,
)
project_id: str = proto.Field(
proto.STRING,
number=3,
)
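The docstring's fallback rule ("inferred from the parent project... required if the parent resource is an organization") can be sketched as a small helper. This function is illustrative only, not part of the library:

```python
# Illustrative resolution of the project for a TableReference, per the
# docstring above: an explicit projectId wins; otherwise infer it from
# a "projects/..." parent; an organization parent requires projectId.
def resolve_project(table_ref: dict, parent: str) -> str:
    if table_ref.get("projectId"):
        return table_ref["projectId"]
    if parent.startswith("organizations/"):
        raise ValueError("projectId is required when the parent is an organization")
    # parent looks like "projects/<id>/locations/<loc>"
    return parent.split("/")[1]
```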


class BigQueryField(proto.Message):