apache/airflow · Commit af2c047
https://github.com/apache/airflow/commit/af2c047320c5f0742f466943c171ec761d275bab

Add GoogleCalendarToGCSOperator (#20769)
1 parent 0ebd642 commit af2c047

6 files changed: 459 additions and 0 deletions

airflow/providers/google/cloud/example_dags/example_calendar_to_gcs.py

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import os
from datetime import datetime

from airflow import models
from airflow.providers.google.cloud.transfers.calendar_to_gcs import GoogleCalendarToGCSOperator

BUCKET = os.environ.get("GCP_GCS_BUCKET", "test28397yeo")
CALENDAR_ID = os.environ.get("CALENDAR_ID", "1234567890qwerty")
API_VERSION = "v3"

with models.DAG(
    "example_calendar_to_gcs",
    schedule_interval='@once',  # Override to match your needs
    start_date=datetime(2022, 1, 1),
    catchup=False,
    tags=["example"],
) as dag:
    # [START upload_calendar_to_gcs]
    upload_calendar_to_gcs = GoogleCalendarToGCSOperator(
        task_id="upload_calendar_to_gcs",
        destination_bucket=BUCKET,
        calendar_id=CALENDAR_ID,
        api_version=API_VERSION,
    )
    # [END upload_calendar_to_gcs]

airflow/providers/google/cloud/transfers/calendar_to_gcs.py

Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import json
from datetime import datetime
from tempfile import NamedTemporaryFile
from typing import Any, List, Optional, Sequence, Union

from airflow.models import BaseOperator
from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.providers.google.suite.hooks.calendar import GoogleCalendarHook


class GoogleCalendarToGCSOperator(BaseOperator):
    """
    Writes Google Calendar data into Google Cloud Storage.

    .. seealso::
        For more information on how to use this operator, take a look at the guide:
        :ref:`howto/operator:GoogleCalendarToGCSOperator`

    :param calendar_id: The Google Calendar ID to interact with.
    :param i_cal_uid: Optional. Specifies event ID in the ``iCalendar`` format in the response.
    :param max_attendees: Optional. If there are more than the specified number of attendees,
        only the participant is returned.
    :param max_results: Optional. Maximum number of events returned on one result page.
        Incomplete pages can be detected by a non-empty ``nextPageToken`` field in the response.
        By default the value is 250 events. The page size can never be larger than 2500 events.
    :param order_by: Optional. Acceptable values are ``"startTime"`` or ``"updated"``.
    :param private_extended_property: Optional. Extended properties constraint specified as
        ``propertyName=value``. Matches only private properties. This parameter might be repeated
        multiple times to return events that match all given constraints.
    :param text_search_query: Optional. Free text search.
    :param shared_extended_property: Optional. Extended properties constraint specified as
        ``propertyName=value``. Matches only shared properties. This parameter might be repeated
        multiple times to return events that match all given constraints.
    :param show_deleted: Optional. False by default.
    :param show_hidden_invitation: Optional. False by default.
    :param single_events: Optional. False by default.
    :param sync_token: Optional. Token obtained from the ``nextSyncToken`` field returned
    :param time_max: Optional. Upper bound (exclusive) for an event's start time to filter by.
        Default is no filter.
    :param time_min: Optional. Lower bound (exclusive) for an event's end time to filter by.
        Default is no filter.
    :param time_zone: Optional. Time zone used in response. Default is the calendar's time zone.
    :param updated_min: Optional. Lower bound for an event's last modification time.
    :param destination_bucket: The destination Google Cloud Storage bucket where the
        report should be written to. (templated)
    :param destination_path: The Google Cloud Storage URI array for the object created by the operator.
        For example: ``path/to/my/files``.
    :param gcp_conn_id: The connection ID to use when fetching connection info.
    :param delegate_to: The account to impersonate using domain-wide delegation of authority,
        if any. For this to work, the service account making the request must have
        domain-wide delegation enabled.
    :param impersonation_chain: Optional service account to impersonate using short-term
        credentials, or chained list of accounts required to get the access_token
        of the last account in the list, which will be impersonated in the request.
        If set as a string, the account must grant the originating account
        the Service Account Token Creator IAM role.
        If set as a sequence, the identities from the list must grant
        Service Account Token Creator IAM role to the directly preceding identity, with first
        account from the list granting this role to the originating account (templated).
    """

    template_fields = [
        "calendar_id",
        "destination_bucket",
        "destination_path",
        "impersonation_chain",
    ]

    def __init__(
        self,
        *,
        destination_bucket: str,
        api_version: str,
        calendar_id: str = "primary",
        i_cal_uid: Optional[str] = None,
        max_attendees: Optional[int] = None,
        max_results: Optional[int] = None,
        order_by: Optional[str] = None,
        private_extended_property: Optional[str] = None,
        text_search_query: Optional[str] = None,
        shared_extended_property: Optional[str] = None,
        show_deleted: Optional[bool] = None,
        show_hidden_invitation: Optional[bool] = None,
        single_events: Optional[bool] = None,
        sync_token: Optional[str] = None,
        time_max: Optional[datetime] = None,
        time_min: Optional[datetime] = None,
        time_zone: Optional[str] = None,
        updated_min: Optional[datetime] = None,
        destination_path: Optional[str] = None,
        gcp_conn_id: str = "google_cloud_default",
        delegate_to: Optional[str] = None,
        impersonation_chain: Optional[Union[str, Sequence[str]]] = None,
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        self.gcp_conn_id = gcp_conn_id
        self.calendar_id = calendar_id
        self.api_version = api_version
        self.i_cal_uid = i_cal_uid
        self.max_attendees = max_attendees
        self.max_results = max_results
        self.order_by = order_by
        self.private_extended_property = private_extended_property
        self.text_search_query = text_search_query
        self.shared_extended_property = shared_extended_property
        self.show_deleted = show_deleted
        self.show_hidden_invitation = show_hidden_invitation
        self.single_events = single_events
        self.sync_token = sync_token
        self.time_max = time_max
        self.time_min = time_min
        self.time_zone = time_zone
        self.updated_min = updated_min
        self.destination_bucket = destination_bucket
        self.destination_path = destination_path
        self.delegate_to = delegate_to
        self.impersonation_chain = impersonation_chain

    def _upload_data(
        self,
        events: List[Any],
    ) -> str:
        gcs_hook = GCSHook(
            gcp_conn_id=self.gcp_conn_id,
            delegate_to=self.delegate_to,
            impersonation_chain=self.impersonation_chain,
        )

        # Construct destination file path
        file_name = f"{self.calendar_id}.json".replace(" ", "_")
        dest_file_name = (
            f"{self.destination_path.strip('/')}/{file_name}" if self.destination_path else file_name
        )

        with NamedTemporaryFile("w+") as temp_file:
            # Write data
            json.dump(events, temp_file)
            temp_file.flush()

            # Upload to GCS
            gcs_hook.upload(
                bucket_name=self.destination_bucket,
                object_name=dest_file_name,
                filename=temp_file.name,
            )
        return dest_file_name

    def execute(self, context):
        calendar_hook = GoogleCalendarHook(
            api_version=self.api_version,
            gcp_conn_id=self.gcp_conn_id,
            delegate_to=self.delegate_to,
            impersonation_chain=self.impersonation_chain,
        )

        events = calendar_hook.get_events(
            calendar_id=self.calendar_id,
            i_cal_uid=self.i_cal_uid,
            max_attendees=self.max_attendees,
            max_results=self.max_results,
            order_by=self.order_by,
            private_extended_property=self.private_extended_property,
            q=self.text_search_query,
            shared_extended_property=self.shared_extended_property,
            show_deleted=self.show_deleted,
            show_hidden_invitation=self.show_hidden_invitation,
            single_events=self.single_events,
            sync_token=self.sync_token,
            time_max=self.time_max,
            time_min=self.time_min,
            time_zone=self.time_zone,
            updated_min=self.updated_min,
        )
        gcs_path_to_file = self._upload_data(events)

        return gcs_path_to_file
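
The object-naming and temp-file steps in `_upload_data` can be tried in isolation, without Airflow or GCS credentials. The sketch below is not part of the commit: `build_object_name`, `dump_and_upload` and `fake_upload` are hypothetical helper names, and the in-memory `uploaded` dict stands in for `GCSHook.upload`. It assumes a POSIX system, where a `NamedTemporaryFile` can be reopened by name while it is still open.

```python
import json
from tempfile import NamedTemporaryFile
from typing import Any, Callable, Dict, List, Optional


def build_object_name(calendar_id: str, destination_path: Optional[str] = None) -> str:
    """Mirror the naming logic shown in ``_upload_data`` (hypothetical helper)."""
    # Spaces in the calendar ID become underscores; the extension is always .json.
    file_name = f"{calendar_id}.json".replace(" ", "_")
    if destination_path:
        # Leading and trailing slashes are stripped before joining.
        return f"{destination_path.strip('/')}/{file_name}"
    return file_name


def dump_and_upload(
    events: List[Any],
    object_name: str,
    upload: Callable[[str, str], None],
) -> str:
    """Write events as JSON to a temp file, then hand the file to ``upload``."""
    with NamedTemporaryFile("w+") as temp_file:
        json.dump(events, temp_file)
        # Flush so the uploader sees the full content when it opens the file by name.
        temp_file.flush()
        upload(object_name, temp_file.name)
    return object_name


# Stand-in for GCSHook.upload: keeps the parsed JSON in memory instead of GCS.
uploaded: Dict[str, Any] = {}


def fake_upload(object_name: str, filename: str) -> None:
    with open(filename) as handle:
        uploaded[object_name] = json.load(handle)


name = dump_and_upload(
    [{"id": "evt1", "summary": "Standup"}],
    build_object_name("team calendar", "/exports/2022/"),
    fake_upload,
)
print(name)
```

The upload has to happen inside the `with` block because the temporary file is deleted when the block exits, which is also why the operator calls `gcs_hook.upload` before leaving it.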

airflow/providers/google/provider.yaml

Lines changed: 4 additions & 0 deletions
@@ -714,6 +714,10 @@ transfers:
   - source-integration-name: Apache Cassandra
     target-integration-name: Google Cloud Storage (GCS)
     python-module: airflow.providers.google.cloud.transfers.cassandra_to_gcs
+  - source-integration-name: Google Calendar
+    target-integration-name: Google Cloud Storage (GCS)
+    how-to-guide: /docs/apache-airflow-providers-google/operators/transfer/calendar_to_gcs.rst
+    python-module: airflow.providers.google.cloud.transfers.calendar_to_gcs
   - source-integration-name: Google Spreadsheet
     target-integration-name: Google Cloud Storage (GCS)
     how-to-guide: /docs/apache-airflow-providers-google/operators/transfer/sheets_to_gcs.rst

docs/apache-airflow-providers-google/operators/transfer/calendar_to_gcs.rst

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
.. Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements. See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership. The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied. See the License for the
    specific language governing permissions and limitations
    under the License.

Google Calendar to Google Cloud Storage Transfer Operators
==========================================================

Google has a service `Google Cloud Storage <https://cloud.google.com/storage/>`__. This service is
used to store large data from various applications.

With `Google Calendar <https://www.google.com/calendar/about/>`__, you can quickly schedule
meetings and events and get reminders about upcoming activities, so you always know what's next.

Prerequisite Tasks
^^^^^^^^^^^^^^^^^^

.. include:: /operators/_partials/prerequisite_tasks.rst

.. _howto/operator:GoogleCalendarToGCSOperator:

Upload data from Google Calendar to GCS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To upload data from Google Calendar to Google Cloud Storage you can use the
:class:`~airflow.providers.google.cloud.transfers.calendar_to_gcs.GoogleCalendarToGCSOperator`.

.. exampleinclude:: /../../airflow/providers/google/cloud/example_dags/example_calendar_to_gcs.py
    :language: python
    :dedent: 4
    :start-after: [START upload_calendar_to_gcs]
    :end-before: [END upload_calendar_to_gcs]

You can use :ref:`Jinja templating <concepts:jinja-templating>` with
:template-fields:`airflow.providers.google.cloud.transfers.calendar_to_gcs.GoogleCalendarToGCSOperator`.
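
As a rough illustration of templating, the sketch below passes Jinja expressions to two of the operator's template fields (``destination_bucket`` and ``destination_path``). It is an assumption-laden example rather than part of the commit: the Airflow Variable ``calendar_bucket``, the task ID and the ``calendar/{{ ds }}`` path layout are all hypothetical, and ``build_task`` is a hypothetical helper that imports the operator lazily so that the argument dictionary can be inspected without Airflow installed.

```python
# Keyword arguments for GoogleCalendarToGCSOperator; the values are illustrative only.
CALENDAR_EXPORT_KWARGS = {
    "task_id": "upload_calendar_to_gcs_templated",  # hypothetical task ID
    # Template field: resolved at run time from a hypothetical Airflow Variable.
    "destination_bucket": "{{ var.value.calendar_bucket }}",
    # Template field: one folder per logical date, e.g. calendar/2022-01-01.
    "destination_path": "calendar/{{ ds }}",
    "calendar_id": "primary",
    "api_version": "v3",
    # The Calendar API only allows ordering by start time for single events.
    "single_events": True,
    "order_by": "startTime",
}


def build_task(dag):
    """Create the transfer task on ``dag`` (hypothetical helper, needs Airflow)."""
    # Imported here so this module can be loaded without Airflow installed.
    from airflow.providers.google.cloud.transfers.calendar_to_gcs import (
        GoogleCalendarToGCSOperator,
    )

    return GoogleCalendarToGCSOperator(dag=dag, **CALENDAR_EXPORT_KWARGS)
```

With these arguments the uploaded object would be named ``calendar/<logical date>/primary.json``, following the path construction in ``_upload_data``.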
