
Ability to skip JSON parsing for DynamoDB get_item, query, and the like #3238

Closed as not planned
@pechersky

Description

Describe the feature

When doing result = session.client("dynamodb").query(**kwargs), have an option to skip json-parsing of the output to speed up collection of results.

Use Case

We load on the order of thousands of items from DynamoDB using aioboto3/aiobotocore. Each item is ~16 KB; we combine and filter the items and forward them to a client. In a loop retrieving these items, AioJSONParser.parse is the largest contributor to the run time (as measured by pyinstrument).
With botocore, 1000 DynamoDB queries took 50 s (20% of it inside JSONParser.parse).
[image: profiler screenshot]
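
For readers who want to reproduce this kind of measurement without pyinstrument, a minimal sketch using the standard library's cProfile (a swapped-in profiler, not the one used above) might look like the following. The helper name `profile_call` is hypothetical; the report lists the functions with the highest cumulative time, which is where a parser's `parse` method would show up if it dominates.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report is sorted by cumulative time and limited to 15 rows.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(15)
    return result, buffer.getvalue()
```

Wrapping the synchronous `main()` from a reproduction script in `profile_call(main)` and printing the report would be one way to use it. For asyncio code a sampling profiler such as pyinstrument is usually the more readable choice.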

Proposed Solution

One possibility is a keyword argument on client and/or resource calls that turns off parsing, so that the caller receives the raw body of the response.

Other Information

botocore:

import json
from botocore.session import get_session

def main():
    session = get_session()
    results = []
    # kwargs.json holds a list of 1000 keyword-argument dicts for client.query
    with open("kwargs.json") as f:
        query_kwarg_list = json.load(f)
    assert len(query_kwarg_list) == 1000
    client = session.create_client("dynamodb")
    # Synchronous client: each call blocks until its response has been parsed
    for kwargs in query_kwarg_list:
        results.append(client.query(**kwargs))
    assert len(results) == 1000


if __name__ == "__main__":
    main()

aiobotocore:

import asyncio
import json
from aiobotocore.session import get_session

async def main():
    session = get_session()
    tasks = []
    # kwargs.json holds a list of 1000 keyword-argument dicts for client.query
    with open("kwargs.json") as f:
        query_kwarg_list = json.load(f)
    assert len(query_kwarg_list) == 1000
    async with session.create_client("dynamodb") as client:
        # Create all query coroutines first, then run them concurrently
        for kwargs in query_kwarg_list:
            tasks.append(client.query(**kwargs))
        result = await asyncio.gather(*tasks)
    assert len(result) == 1000


if __name__ == "__main__":
    asyncio.run(main())

Cf: aio-libs/aiobotocore#1132

Acknowledgements

  • I may be able to implement this feature request
    This feature might incur a breaking change

SDK version used

1.34.131 and higher

Environment details (OS name and version, etc.)

python 3.10, 3.11; Ubuntu 20.04

Activity

pechersky (Author) commented on Aug 14, 2024

The developer of aiobotocore suggested:

May I advise to raise an issue with botocore to improve performance of JSON coding, e.g. by supporting alternative JSON libraries such as orjson? That would benefit users of both botocore and aiobotocore.

pechersky (Author) commented on Aug 14, 2024

Further inspection indicates that the time is not even spent in JSON decoding, but rather in the shape parsing:
[images: two profiler screenshots]
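
To make the distinction concrete, here is a toy sketch that separates the two steps: decoding the JSON text, and then walking the decoded value against a shape description. It is not botocore's code. The shape dictionaries, `parse_shape`, and `make_body` are hypothetical and much simpler than the real service model; the point is only that the second step touches every value again in Python.

```python
import json
import timeit

# Hypothetical, simplified shape descriptions (not botocore's real model).
ATTRIBUTE_VALUE = {"type": "structure", "members": {
    "S": {"type": "string"},
    "N": {"type": "string"},
}}
QUERY_OUTPUT = {"type": "structure", "members": {
    "Count": {"type": "integer"},
    "Items": {"type": "list", "member": {
        "type": "map", "value": ATTRIBUTE_VALUE}},
}}


def parse_shape(shape, value):
    """Recursively rebuild `value` by following `shape` (toy shape walk)."""
    kind = shape["type"]
    if kind == "structure":
        return {name: parse_shape(member, value[name])
                for name, member in shape["members"].items()
                if name in value}
    if kind == "list":
        return [parse_shape(shape["member"], v) for v in value]
    if kind == "map":
        return {k: parse_shape(shape["value"], v) for k, v in value.items()}
    if kind == "integer":
        return int(value)
    return value  # strings pass through unchanged


def make_body(n_items=50, n_attrs=20):
    """Build a synthetic Query-like response body as a JSON string."""
    item = {f"attr{i}": {"S": "x" * 16} for i in range(n_attrs)}
    return json.dumps({"Count": n_items, "Items": [item] * n_items})


if __name__ == "__main__":
    body = make_body()
    decoded = json.loads(body)
    t_json = timeit.timeit(lambda: json.loads(body), number=200)
    t_shape = timeit.timeit(
        lambda: parse_shape(QUERY_OUTPUT, decoded), number=200)
    print(f"json.loads: {t_json:.3f}s  shape walk: {t_shape:.3f}s")
```

The absolute numbers depend on the machine and payload, so the script prints timings rather than asserting which step is slower.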

pechersky (Author) commented on Aug 14, 2024

Likely relevant issue: boto/boto3#2928

self-assigned this on Aug 21, 2024

tim-finnigan (Contributor) commented on Aug 21, 2024

Thanks for reaching out. I brought this issue up for discussion with the team, and the consensus was that there are no plans to change the current behavior. SDKs like Boto3 rely on the JSON parsing — you would need to call service APIs directly in order to get the raw responses. We can continue tracking the issue in boto/boto3#2928 for now to get more feedback and explore potential optimizations.
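
As a rough sketch of what calling the service API directly could involve: DynamoDB's low-level API accepts a JSON body over an HTTPS POST, with the operation named in the `X-Amz-Target` header and the request signed with Signature Version 4. The code below builds such a request and signs it with botocore's `SigV4Auth`. The helper names `build_query_request` and `sign` are hypothetical, the sketch assumes the query kwargs contain only JSON-serializable values (no binary attributes), and sending, retries, and error handling are left out.

```python
import json


def build_query_request(region, query_kwargs):
    """Return (url, headers, body) for an unsigned DynamoDB Query call."""
    url = f"https://dynamodb.{region}.amazonaws.com/"
    headers = {
        "Content-Type": "application/x-amz-json-1.0",
        "X-Amz-Target": "DynamoDB_20120810.Query",
    }
    body = json.dumps(query_kwargs).encode("utf-8")
    return url, headers, body


def sign(url, headers, body, region):
    """Return the headers with SigV4 authentication added.

    Uses botocore's default credential chain; add_auth raises if no
    credentials can be found.
    """
    # Imported lazily so that building the request needs only the stdlib.
    from botocore.auth import SigV4Auth
    from botocore.awsrequest import AWSRequest
    from botocore.session import get_session

    credentials = get_session().get_credentials()
    request = AWSRequest(method="POST", url=url, data=body, headers=headers)
    SigV4Auth(credentials, "dynamodb", region).add_auth(request)
    return dict(request.headers.items())
```

The signed headers and body can then be sent with any HTTP client. The response body is the raw JSON bytes, which can be forwarded as-is or decoded with a JSON library of one's choosing.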

github-actions commented on Aug 21, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.


Metadata

Labels: feature-request (This issue requests a feature.)
Participants: @pechersky, @tim-finnigan