Description
I originally filed this as piskvorky/smart_open#858, but on reflection I think the smart_open maintainers are right that it should be treated as a python-storage bug.
Environment details
- OS type and version:
- Python version: Python 3.11.11
- pip version: pip 23.3.1
- google-cloud-storage version:
Name: google-cloud-storage
Version: 2.18.2
Summary: Google Cloud Storage API client library
Home-page: https://github.com/googleapis/python-storage
Author: Google LLC
Author-email: googleapis-packages@google.com
License: Apache 2.0
Location: /root/.pyenv/versions/3.11.11/lib/python3.11/site-packages
Requires: google-api-core, google-auth, google-cloud-core, google-crc32c, google-resumable-media, requests
Required-by: gcsfs, google-cloud-aiplatform
---
Name: google-resumable-media
Version: 2.7.2
Summary: Utilities for Google Media Downloads and Resumable Uploads
Home-page: https://github.com/googleapis/google-resumable-media-python
Author: Google Cloud Platform
Author-email: googleapis-publisher@google.com
License: Apache 2.0
Location: /root/.pyenv/versions/3.11.11/lib/python3.11/site-packages
Requires: google-crc32c
Required-by: google-cloud-bigquery, google-cloud-storage
Steps to reproduce
- Given a (bucket, key) pair where you have permission to write to some parts of the bucket, but not to the specific key
- Attempt to write to the key using python-storage
- Attempt to close() the write file handle multiple times
- The first attempt will fail with an InvalidResponse and a 403 error, as expected
- The second and subsequent attempts will fail with a confusing ValueError arising from an invariant check inside the library
Note that if you don't have permissions anywhere on the bucket, the problem doesn't occur: close() consistently raises InvalidResponse.
Code example
Example snippet that demonstrates the problem:
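from google.cloud import storage

# `bucket` and `key` are placeholders: a bucket you can partly write to,
# and a key inside it that you lack write permission on.
client = storage.Client()
blob = client.get_bucket(bucket).blob(key)
fh = blob.open("wb")
fh.write(b"hello\n")

# Call close() repeatedly to show how the error changes between attempts.
for i in range(3):
    try:
        print(f"before {i=} {fh.closed=}")
        fh.close()
        print(f"attempt={i} success!")
    except Exception as ex:
        print(f"attempt={i} failed with: {type(ex)}: {ex}")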
Sample output
- From a bucket I control, to a key I lack permission on:
before i=0 fh.closed=False
attempt=0 failed with: <class 'google.resumable_media.common.InvalidResponse'>: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PERMANENT_REDIRECT: 308>)
before i=1 fh.closed=False
attempt=1 failed with: <class 'ValueError'>: Upload is in an invalid state. To recover call `recover()`.
before i=2 fh.closed=False
attempt=2 failed with: <class 'ValueError'>: Upload is in an invalid state. To recover call `recover()`.
- From an attempt to write to a public bucket (gs://gcp-public-data-arco-era5/co/eperm.dat):
before i=0 fh.closed=False
attempt=0 failed with: <class 'google.resumable_media.common.InvalidResponse'>: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.CREATED: 201>)
before i=1 fh.closed=False
attempt=1 failed with: <class 'google.resumable_media.common.InvalidResponse'>: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.CREATED: 201>)
before i=2 fh.closed=False
attempt=2 failed with: <class 'google.resumable_media.common.InvalidResponse'>: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.CREATED: 201>)
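The difference between the two outputs can be modelled with a toy state machine. This is a simplified sketch, not the library's code. It assumes, from the "Expected one of" codes in the messages, that in the public-bucket case the 403 comes from the request that creates the resumable session (expecting 200/201), so no upload object is ever stored and each close() retries that request. In the first case the session is created and the 403 arrives on the chunk upload (expecting 200/308), which marks the upload invalid. `Forbidden`, `ToyUpload`, `ToyWriter` and `attempts` are hypothetical names.

```python
class Forbidden(Exception):
    """Stands in for google.resumable_media.common.InvalidResponse (403)."""


class ToyUpload:
    def __init__(self):
        self.invalid = False

    def transmit_next_chunk(self, allowed):
        # Mirrors the invariant check seen in ResumableUpload._prepare_request.
        if self.invalid:
            raise ValueError("Upload is in an invalid state. To recover call `recover()`.")
        if not allowed:
            self.invalid = True  # like callback=self._make_invalid
            raise Forbidden("403, expected 200 or 308")


class ToyWriter:
    def __init__(self, can_initiate, can_write_key):
        self.can_initiate = can_initiate
        self.can_write_key = can_write_key
        self.upload = None
        self.buffer_closed = False

    def close(self):
        if self.buffer_closed:
            return
        if self.upload is None:
            if not self.can_initiate:
                # Session creation failed: nothing is stored, so the next
                # close() tries to create the session again.
                raise Forbidden("403, expected 200 or 201")
            self.upload = ToyUpload()
        self.upload.transmit_next_chunk(self.can_write_key)
        self.buffer_closed = True  # only reached if the upload succeeded


def attempts(writer, n=3):
    results = []
    for _ in range(n):
        try:
            writer.close()
            results.append("ok")
        except Exception as ex:
            results.append(type(ex).__name__)
    return results


print(attempts(ToyWriter(can_initiate=True, can_write_key=False)))
# ['Forbidden', 'ValueError', 'ValueError']
print(attempts(ToyWriter(can_initiate=False, can_write_key=False)))
# ['Forbidden', 'Forbidden', 'Forbidden']
```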
Stack trace
Re-pasting the trace from piskvorky/smart_open#858, in which smart_open tries to close the file handle twice (once via a TextIOWrapper, and once directly):
---------------------------------------------------------------------------
InvalidResponse Traceback (most recent call last)
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/smart_open/utils.py:220, in FileLikeProxy.__exit__(self, *args, **kwargs)
219 try:
--> 220 return super().__exit__(*args, **kwargs)
221 finally:
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/smart_open/utils.py:207, in TextIOWrapper.__exit__(self, exc_type, exc_val, exc_tb)
206 if exc_type is None:
--> 207 self.close()
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py:437, in BlobWriter.close(self)
436 if not self._buffer.closed:
--> 437 self._upload_chunks_from_buffer(1)
438 self._buffer.close()
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py:417, in BlobWriter._upload_chunks_from_buffer(self, num_chunks)
416 for _ in range(num_chunks):
--> 417 upload.transmit_next_chunk(transport, **kwargs)
419 # Wipe the buffer of chunks uploaded, preserving any remaining data.
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/requests/upload.py:515, in ResumableUpload.transmit_next_chunk(self, transport, timeout)
513 return result
--> 515 return _request_helpers.wait_and_retry(
516 retriable_request, self._get_status_code, self._retry_strategy
517 )
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/requests/_request_helpers.py:155, in wait_and_retry(func, get_status_code, retry_strategy)
154 try:
--> 155 response = func()
156 except _CONNECTION_ERROR_CLASSES as e:
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/requests/upload.py:511, in ResumableUpload.transmit_next_chunk.<locals>.retriable_request()
507 result = transport.request(
508 method, url, data=payload, headers=headers, timeout=timeout
509 )
--> 511 self._process_resumable_response(result, len(payload))
513 return result
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/_upload.py:690, in ResumableUpload._process_resumable_response(self, response, bytes_sent)
670 """Process the response from an HTTP request.
671
672 This is everything that must be done after a request that doesn't
(...)
688 .. _sans-I/O: https://sans-io.readthedocs.io/
689 """
--> 690 status_code = _helpers.require_status_code(
691 response,
692 (http.client.OK, http.client.PERMANENT_REDIRECT),
693 self._get_status_code,
694 callback=self._make_invalid,
695 )
696 if status_code == http.client.OK:
697 # NOTE: We use the "local" information of ``bytes_sent`` to update
698 # ``bytes_uploaded``, but do not verify this against other
(...)
703 # * ``stream.tell()`` (relying on fact that ``initiate()``
704 # requires stream to be at the beginning)
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/_helpers.py:108, in require_status_code(response, status_codes, get_status_code, callback)
107 callback()
--> 108 raise common.InvalidResponse(
109 response,
110 "Request failed with status code",
111 status_code,
112 "Expected one of",
113 *status_codes
114 )
115 return status_code
InvalidResponse: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PERMANENT_REDIRECT: 308>)
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[6], line 1
----> 1 with smart_open.smart_open(path, 'w') as fh:
2 fh.write("hello\n")
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/smart_open/utils.py:222, in FileLikeProxy.__exit__(self, *args, **kwargs)
220 return super().__exit__(*args, **kwargs)
221 finally:
--> 222 self.__inner.__exit__(*args, **kwargs)
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py:437, in BlobWriter.close(self)
435 def close(self):
436 if not self._buffer.closed:
--> 437 self._upload_chunks_from_buffer(1)
438 self._buffer.close()
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py:417, in BlobWriter._upload_chunks_from_buffer(self, num_chunks)
415 # Upload chunks. The SlidingBuffer class will manage seek position.
416 for _ in range(num_chunks):
--> 417 upload.transmit_next_chunk(transport, **kwargs)
419 # Wipe the buffer of chunks uploaded, preserving any remaining data.
420 self._buffer.flush()
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/requests/upload.py:503, in ResumableUpload.transmit_next_chunk(self, transport, timeout)
424 def transmit_next_chunk(
425 self,
426 transport,
(...)
430 ),
431 ):
432 """Transmit the next chunk of the resource to be uploaded.
433
434 If the current upload was initiated with ``stream_final=False``,
(...)
501 does not match or is not available.
502 """
--> 503 method, url, payload, headers = self._prepare_request()
505 # Wrap the request business logic in a function to be retried.
506 def retriable_request():
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/_upload.py:613, in ResumableUpload._prepare_request(self)
611 raise ValueError("Upload has finished.")
612 if self.invalid:
--> 613 raise ValueError(
614 "Upload is in an invalid state. To recover call `recover()`."
615 )
616 if self.resumable_url is None:
617 raise ValueError(
618 "This upload has not been initiated. Please call "
619 "initiate() before beginning to transmit chunks."
620 )
ValueError: Upload is in an invalid state. To recover call `recover()`.
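The BlobWriter.close frame in the trace shows that the buffer is only closed after `_upload_chunks_from_buffer(1)` returns, so a failed upload leaves the writer open and a second close() re-enters the now-invalid upload. One possible direction, sketched on a toy class rather than the real BlobWriter, is the try/finally shape used by Python's own buffered writers, whose close() still closes the underlying stream when the final flush raises; `io.IOBase.close()` is documented to have no effect on an already-closed stream. `Forbidden`, `ToyBlobWriter` and `upload_remaining` are hypothetical names.

```python
import io


class Forbidden(Exception):
    """Stands in for the 403 InvalidResponse raised by the real upload."""


class ToyBlobWriter:
    """Hypothetical writer whose close() is idempotent even when the upload fails."""

    def __init__(self, fail_upload):
        self._buffer = io.BytesIO()
        self._fail_upload = fail_upload
        self.upload_calls = 0

    @property
    def closed(self):
        return self._buffer.closed

    def write(self, data):
        return self._buffer.write(data)

    def upload_remaining(self):
        # Hypothetical stand-in for _upload_chunks_from_buffer(1).
        self.upload_calls += 1
        if self._fail_upload:
            raise Forbidden("Request failed with status code 403")

    def close(self):
        if self._buffer.closed:
            return  # second and later calls are no-ops
        try:
            self.upload_remaining()
        finally:
            # Close the buffer whether or not the upload succeeded, so a
            # repeated close() never re-enters a failed upload.
            self._buffer.close()


fh = ToyBlobWriter(fail_upload=True)
fh.write(b"hello\n")
for i in range(3):
    try:
        fh.close()
        print(f"attempt={i} success! {fh.closed=}")
    except Forbidden as ex:
        print(f"attempt={i} failed with: {ex} {fh.closed=}")
```

With this shape the 403 is reported exactly once and later calls return quietly. The trade-off is that a caller who swallows the first exception gets no further signal; an alternative is to remember the first error and re-raise it on every later close().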