read PNG and extract data #6

Open
ernestguevarra opened this issue Apr 16, 2024 · 0 comments

Something like this, preprocessing the image with magick before passing it to tesseract for OCR:

library(magick)
#> Linking to ImageMagick 7.1.0.31
#> Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp, x11
#> Disabled features: fftw, ghostscript
#> Using 12 threads
library(tesseract)
input <- image_read("https://i.stack.imgur.com/JxGHc.png") %>% 
  # preprocess image to make it easier to ocr
  image_convert(type = 'Grayscale') %>% 
  image_deskew() %>% 
  image_resize("2000x") %>% 
  ocr()

df <- data.table::fread(text = input)
#> Warning in data.table::fread(text = input): Detected 11 column names but the
#> data has 12 columns (i.e. invalid file). Added 1 extra default column name for
#> the first column which is guessed to be row names or an index. Use setnames()
#> afterwards if this guess is not correct, or fix the file write command that
#> created the file to create a valid file.
df
#>     V1     info    tmax ACREAGE                               GLOBALID
#>  1:  1 PRISM_tm 30.3976  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#>  2:  2 PRISM_tm 26.0226  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#>  3:  3 PRISM_tm 27.1775  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#>  4:  4 PRISM_tm  24,164  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#>  5:  5 PRISM_tm  24.458  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#>  6:  6 PRISM_tm  26.118  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#>  7:  7 PRISM_tm  27.259  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#>  8:  8 PRISM_tm  30.105  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#>  9:  9 PRISM_tm  30.697  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 10: 10 PRISM_tm   32949  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 11: 11 PRISM_tm  32,966  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 12: 12 PRISM_tm  32.081  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 13: 13 PRISM_tm  29.847  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 14: 14 PRISM_tm  27.576  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 15: 15 PRISM_tm  24.671  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 16: 16 PRISM_tm  24.382  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 17: 17 PRISM_tm  24.382  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 18: 18 PRISM_tm  26.365  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 19: 19 PRISM_tm  29.246  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 20: 20 PRISM_tm  30.737  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 21: 21 PRISM_tm  31.658  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 22: 22 PRISM_tm  31.386  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 23: 23 PRISM_tm   32457  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 24: 24 PRISM_tm  32.093  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 25: 25 PRISM_tm  30.303  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 26: 26 PRISM_tm  26.231  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 27: 27 PRISM_tm  25.956  783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#>     V1     info    tmax ACREAGE                               GLOBALID
#>     datasource variable    datatype resolutior    Date year month
#>  1:      PRISM     tmax provisional      4kmM3 2021-10 2021    10
#>  2:      PRISM     tmax provisional      4kmM3 2021-11 2021    11
#>  3:      PRISM     tmax provisional      4kmM3 2021-12 2021    12
#>  4:      PRISM     tmax      stable      4kmM3 2005-01 2005     1
#>  5:      PRISM     tmax      stable      4kmM3 2005-02 2005     2
#>  6:      PRISM     tmax      stable      4kmM3 2005-03 2005     3
#>  7:      PRISM     tmax      stable      4kmM3 2005-04 2005     4
#>  8:      PRISM     tmax      stable      4kmM3 2005-05 2005     5
#>  9:      PRISM     tmax      stable      4kmM3 2005-06 2005     6
#> 10:      PRISM     tmax      stable      4kmM3 2005-07 2005     7
#> 11:      PRISM     tmax      stable      4kmM3 2005-08 2005     8
#> 12:      PRISM     tmax      stable      4kmM3 2005-09 2005     9
#> 13:      PRISM     tmax      stable      4kmM3 2005-10 2005    10
#> 14:      PRISM     tmax      stable      4kmM3 2005-11 2005    11
#> 15:      PRISM     tmax      stable      4kmM3 2005-12 2005    12
#> 16:      PRISM     tmax      stable      4kmM3 2006-01 2006     1
#> 17:      PRISM     tmax      stable      4kmM3 2006-02 2006     2
#> 18:      PRISM     tmax      stable      4kmM3 2006-03 2006     3
#> 19:      PRISM     tmax      stable      4kmM3 2006-04 2006     4
#> 20:      PRISM     tmax      stable      4kmM3 2006-05 2006     5
#> 21:      PRISM     tmax      stable      4kmM3 2006-06 2006     6
#> 22:      PRISM     tmax      stable      4kmM3 2006-07 2006     7
#> 23:      PRISM     tmax      stable      4kmM3 2006-08 2006     8
#> 24:      PRISM     tmax      stable      4kmM3 2006-09 2006     9
#> 25:      PRISM     tmax      stable      4kmM3 2006-10 2006    10
#> 26:      PRISM     tmax      stable      4kmM3 2006-11 2006    11
#> 27:      PRISM     tmax      stable      4kmM3 2006-12 2006    12
#>     datasource variable    datatype resolutior    Date year month

Source: https://stackoverflow.com/questions/73238598/r-extract-text-from-image-and-export-it-as-a-csv
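
The OCR output above still has a few misreads in `tmax` (`24,164`, `32,966`, `32949`, `32457`), where the decimal point was read as a comma or dropped. A minimal sketch of a clean-up step follows. `repair_tmax` is a hypothetical helper, not part of the original snippet, and it assumes every true value has exactly two digits before the decimal point, which holds for the other rows shown but is not guaranteed for other images.

```r
# Hypothetical helper (not from the original snippet): repair OCR misreads
# in a column of decimal numbers.
# Assumption: every true value has exactly two digits before the decimal point.
repair_tmax <- function(x) {
  x <- as.character(x)

  # Decimal point read as a comma: "24,164" -> "24.164"
  x <- gsub(",", ".", x, fixed = TRUE)

  # Decimal point dropped entirely: "32949" -> "32.949"
  no_point <- !grepl(".", x, fixed = TRUE)
  x[no_point] <- sub("^([0-9]{2})([0-9]+)$", "\\1.\\2", x[no_point])

  as.numeric(x)
}

# Small check using values taken from the printed table
repaired <- repair_tmax(c("30.3976", "24,164", "32949"))
stopifnot(isTRUE(all.equal(repaired, c(30.3976, 24.164, 32.949))))
```

Applied to the table above, this would be `df[, tmax := repair_tmax(tmax)]`. The header could probably be tidied the same way with `data.table::setnames(df, "resolutior", "resolution")`, assuming `resolutior` is a misread of `resolution`. The `V1` column that `fread()` added looks like the row index from the image.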
