Content-Length: 647992 | pFad | http://github.com/googleapis/java-bigquery/pull/3747

E9 feat(bigquery): Integrate Otel in client lib by whuffman36 · Pull Request #3747 · googleapis/java-bigquery · GitHub
Skip to content

feat(bigquery): Integrate Otel in client lib #3747

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

feat(bigquery): Integrate Otel in client lib #3747

wants to merge 16 commits into from

Conversation

whuffman36
Copy link
Contributor

@whuffman36 whuffman36 commented Apr 8, 2025

Adds OpenTelemetry tracing into the BigQueryImpl class, enabling in the following methods:

  • (Dataset, Table, Job, Routine) Create
  • (Dataset, Table, Job, Routine, Model, IamPolicy) Get
  • (Dataset, Table, Job, Routine, Model, IamPolicy) Update
  • (Dataset, Table, Routine, Model) Delete
  • List (Datasets, Tables, Jobs, Routines, Models)
  • Job Cancel
  • insertAll
  • testIamMethods
  • queryRpc
  • getQueryResults

This is one PR of several to fully integrate OTel into the client library, broken up into chunks to make review easier.

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/java-bigquery API. labels Apr 8, 2025
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: m Pull request size is medium. labels Apr 29, 2025
@whuffman36 whuffman36 marked this pull request as ready for review April 29, 2025 19:24
@whuffman36 whuffman36 requested review from a team as code owners April 29, 2025 19:24
@whuffman36 whuffman36 requested review from suzmue, shollyman and PhongChuong and removed request for suzmue April 29, 2025 19:24
Copy link
Contributor

@PhongChuong PhongChuong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case you are not aware. There are 2 additional locations where we perform network calls in TableDataWriteChannel.java and Job.java.

@whuffman36
Copy link
Contributor Author

In the case you are not aware. There are 2 additional locations where we perform network calls in TableDataWriteChannel.java and Job.java.

Yes, I wanted to keep this PR scoped to just the BigQueryImpl file (though with the refactor that has changed). I was planning to include the calls in TableDataWriteChannel.java and Job.java in a separate PR with a few other methods, such as runWithRetries to make it easier to review and reason about.

Copy link
Contributor

@shollyman shollyman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest worry with this is the attribute structure here. This PR appears to define every field of every of every API as it's own attribute, which means we're more likely to introduce conflicts for users trying to filter based on attributes downstream.

This leads me to two questions:

  1. should we be putting the full contents of the responses into attributes, vs a more constrained approach (critical fields like identifiers, status, etc)?
  2. if we do want to capture the API response, should it just be stringified as a single attribute?

@whuffman36
Copy link
Contributor Author

My biggest worry with this is the attribute structure here. This PR appears to define every field of every of every API as it's own attribute, which means we're more likely to introduce conflicts for users trying to filter based on attributes downstream.

This leads me to two questions:

  1. should we be putting the full contents of the responses into attributes, vs a more constrained approach (critical fields like identifiers, status, etc)?
  2. if we do want to capture the API response, should it just be stringified as a single attribute?
  1. I'm of the opinion that since tracing is opt-in and meant to make debugging easier, the more visibility into the data the better. It's better to have visibility into the full contents and not need it than it is to want that visibility and not have it. What do you think?
  2. I'm not convinced we should be capturing the response, as it essentially just passed to the user. The user can decide to capture or log the response once received outside of the library code. This design was to capture just the inputs to the API calls. That being said, I think stringifying a response as a single attribute makes the attribute harder to parse and read when debugging, which can defeat the purpose of creating a better debug system.

@PhongChuong
Copy link
Contributor

My 2 cents regarding attributes:
Recall that we have 2 layers for the span. The upper level interface (BigQueryImpl.java, Job.java, etc.) and the HttpBigQueryRpc level. I believe we should capture the request and response (not in this PR). Specifically, it should be captured as a request/response* attribute at the HttpBigQueryRpc level. For the interface level (this PR), we should capture any metadata information that would help with debugging any issue that may arise and we should capture it in a structured manner so it is visible. Specifically, attributes in the options are great as they are rarely captured elsewhere. Meanwhile, we can be a little more selective with datatype attributes (TableId vs all the fields in TableInfo). And if needed, then the request attribute can be used to deep dive.

On a side note, I think we should also clearly state that "Span names and attributes are subject to change without notice" to allow us the flexibility to change the structure as needed.

  • For certain responses such as table row data, we probably won't want to capture those values.

@whuffman36
Copy link
Contributor Author

I've cleaned up the attributes that we are collecting by removing some fields that are not very useful in debugging. I also added in attributes for language and "db.name" to make the tracing more consistent with python. I've updated the span naming pattern to be more descriptive.

I think it's a good idea to include "Span names and attributes are subject to change without notice", but that seems like it would better to put in the docs rather than the code. What do you guys think?

PhongChuong
PhongChuong previously approved these changes May 12, 2025
Copy link
Contributor

@PhongChuong PhongChuong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the changes.
@shollyman , can you take a look at the attributes to see if we missed anything?

Copy link
Contributor

@shollyman shollyman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I continue to be wary of replicating many of the API response fields in the span attributes, favoring the "less is more" approach, but I can be convinced otherwise.

@whuffman36
Copy link
Contributor Author

I've gone through and removed most attributes that we were collecting before, keeping mainly just the OP identifiers and time information such as creationTime, lastModifiedTime, etc. Let me know if you think we should shave off any more

Copy link
Contributor

@shollyman shollyman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some more reading, two other considerations for these attributes (which may require a lot of callsite touches):

  • Should the attributes be namespaced so it's clear they're fields from the BQ API? Right now they're just single term attributes and could conflict with other telemetry.

  • I believe otel attribute conventions are snake case rather than camel, so we may want to revisit the multi word attributes.

.getOpenTelemetryTracer()
.spanBuilder("com.google.cloud.bigquery.BigQuery.createJob")
.setAllAttributes(jobInfo.getJobId().getOtelAttributes())
.setAttribute("status", getFieldAsString(jobInfo.getStatus()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status is a submessage. Should we just report status fields individually? status.state (pending/running/done), and error presence in the job?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, though I've decided to move those attributes to the Job class instead since during the create() function, job.status() is not yet populated. The field would always be set to null and thus be practically useless for debugging.

getOptions()
.getOpenTelemetryTracer()
.spanBuilder("com.google.cloud.bigquery.BigQuery.listDatasets")
.setAttribute("projectId", projectId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: worth pulling the current page token into an attribute alongside the project for the list RPCs? It'll be visible in the HTTP request below it as part of the URL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the page token is supplied as a DatasetListOption, it will already be captured in an attribute here. I agree that adding the page token as an attribute is a good idea, though I think it would be better suited to the http layer rather than the interface.

Attributes attributes = Attributes.builder().build();
for (Field field : this.getFields()) {
attributes =
attributes.toBuilder().put("Field: " + field.getName(), field.toString()).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want BQ schema in the otel attributes at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is likely overkill, I went ahead and removed the schema from the attributes.

@whuffman36
Copy link
Contributor Author

I renamed all the attributes to be snake case and gave them namespace prefixes to remove any overloading of attributes. I went with the namespace "bq" and created further namespaces depending on the objects, such as "bq.dataset", "bq.table", "bq.query", etc.

Let me know if you think a different namespace style would be better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/java-bigquery API. size: xl Pull request size is extra large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: http://github.com/googleapis/java-bigquery/pull/3747

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy