Add function on total size of Delta table #2

Open
MrPowers opened this issue Mar 7, 2023 · 1 comment
Labels
good first issue Good for newcomers

Comments

@MrPowers
Collaborator

MrPowers commented Mar 7, 2023

This function should return the total number of bytes in the Delta table.

MrPowers added the "good first issue" label Mar 7, 2023
@puneetsharma04

Hello @MrPowers, I tried writing some code to fulfil this requirement.
Could you please check the code below and let me know whether it is what you are looking for?

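from delta.tables import DeltaTable
from pyspark.sql import SparkSession

def get_delta_table_size(spark: SparkSession, path: str) -> int:
    """Return the size in bytes of the data files in the table's latest snapshot."""
    delta_table = DeltaTable.forPath(spark, path)
    # history() has no 'size' column; detail() (Delta Lake 2.1+) returns a
    # single row that includes sizeInBytes, like DESCRIBE DETAIL in SQL.
    return delta_table.detail().select("sizeInBytes").collect()[0][0]

# Assumes `spark` is an active, Delta-enabled SparkSession.
table_size = get_delta_table_size(spark, '/path/to/my/delta/table')
print(f"The size of the Delta table is {table_size} bytes.")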
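
For a cross-check that does not need Spark, here is a minimal sketch that replays the JSON commits in `_delta_log` and sums the `size` of every data file that was added and not later removed. It rests on a few assumptions: the table lives on a local filesystem, every commit is still present as a JSON file (no log cleanup, nothing that exists only in a checkpoint), and "size of the table" means the files in the latest version rather than everything on disk. `delta_log_size_in_bytes` is a hypothetical helper name, not a Delta Lake API.

```python
import json
from pathlib import Path


def delta_log_size_in_bytes(table_path: str) -> int:
    """Sum the sizes of the data files that are live in the latest version.

    Hypothetical helper. Assumes all commits are available as JSON files
    in _delta_log; checkpoint files are ignored.
    """
    log_dir = Path(table_path) / "_delta_log"
    live_files = {}  # data file path -> size in bytes

    # Commit files are named with zero-padded version numbers, so sorting
    # by name replays them in version order. Skip anything that is not a
    # plain <version>.json commit file.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())

    for commit in commits:
        with commit.open() as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    # A file becomes part of the table.
                    live_files[action["add"]["path"]] = action["add"]["size"]
                elif "remove" in action:
                    # A file is logically deleted from the table.
                    live_files.pop(action["remove"]["path"], None)

    return sum(live_files.values())
```

Calling `delta_log_size_in_bytes('/path/to/my/delta/table')` should give a number comparable to the `sizeInBytes` value reported by Spark. Files that have been removed but not yet vacuumed are deliberately left out, so the result can be smaller than the directory's size on disk.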
