diff --git a/01-BNHM-Data.ipynb b/01-BNHM-Data.ipynb index b772a1b..2fe7652 100644 --- a/01-BNHM-Data.ipynb +++ b/01-BNHM-Data.ipynb @@ -22,7 +22,7 @@ "import time\n", "import folium\n", "from datetime import datetime\n", - "from shapely.geometry import Point\n", + "from shapely.geometry import Point, mapping\n", "from shapely.geometry.polygon import Polygon\n", "import matplotlib as mpl\n", "from matplotlib.collections import PatchCollection\n", @@ -58,11 +58,9 @@ "\n", "2 - [GBIF API](#gbif)\n", "\n", - "3 - [Cal-Adapt API](#adapt)\n", + "3 - [Comparing California Oak Species](#oak)\n", "\n", - "4 - [Comparing California Oak Species](#oak)\n", - "\n", - "5 - [Sagehen Field Station](#sagehen)" + "4 - [Cal-Adapt API](#adapt)" ] }, { @@ -358,7 +356,7 @@ "\n", "For this example, we are going to ask: **where have [*Argia agrioides*](https://www.google.com/search?q=Argia+agrioides&rlz=1C1CHBF_enUS734US734&source=lnms&tbm=isch&sa=X&ved=0ahUKEwji9t29kNTWAhVBymMKHWJ-ANcQ_AUICygC&biw=1536&bih=694) (the California Dancer dragonfly) been documented? Are there records at any of our field stations?**\n", "\n", - "The code to ask the API has already been written for us! This is often the case with programming, someone has already written the code, so we don't have to. We'll just set up the `GBIFRequest` object and assign that to the variable `req`:" + "The code to ask the API has already been written for us! This is often the case with programming, someone has already written the code, so we don't have to. 
We'll just set up the `GBIFRequest` object and assign that to the variable `req`, short for \"request\":" ] }, { @@ -399,6 +397,7 @@ "metadata": {}, "outputs": [], "source": [ + "params = {'scientificName': 'Argia agrioides'} # setting our parameters (the specific species we want)\n", "pages = req.get_pages(params) # using those parameters to complete the request\n", "records = [rec for page in pages for rec in page['results'] if rec.get('decimalLatitude')] # sift out valid records\n", "records[:5] # print first 5 records" @@ -419,15 +418,15 @@ "metadata": {}, "outputs": [], "source": [ - "records_df = pd.read_json(json.dumps(records))\n", - "records_df.head()" + "records_df = pd.read_json(json.dumps(records)) # converts the JSON above to a dataframe\n", + "records_df.head() # prints the first five rows of the dataframe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Since each column (or row) above can be thought of as a `list`, that means we can use list functions to interact with them! Recall that we can use the `len` function to get the number of elements in a `list`:" + "Since each column (or row) above can be thought of as a `list`, that means we can use list functions to interact with them! One such function is the `len` function to get the number of elements in a `list`:" ] }, { @@ -443,7 +442,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "So we have 99 columns and 177 rows! That's a lot of information. What variables do we have?" + "So we have 99 columns and 177 rows! That's a lot of information. What variables do we have in the columns?" ] }, { @@ -462,6 +461,15 @@ "We can use two methods from `pandas` to do a lot more. 
The `value_counts()` method will tabulate the frequency of the row value in a column, and the `plot.barh()` will plot us a horizontal bar chart:" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "records_df['country'].value_counts()" + ] + }, { "cell_type": "code", "execution_count": null, @@ -498,13 +506,28 @@ "records_df['collectionCode'].value_counts().plot.barh()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `groupby()` method allows us to count the values in one column grouped by another, and then color the bar chart differently depending on a variable of our choice:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "records_df.groupby([\"collectionCode\", \"basisOfRecord\"])['basisOfRecord'].count()" + ] + }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "# df.groupby() allows us to color the bar chart differently depending on a variable of our choice.\n", "records_df.groupby([\"collectionCode\", \"basisOfRecord\"])['basisOfRecord'].count().unstack().plot.barh(stacked=True)" ] }, @@ -531,7 +554,7 @@ "---\n", "\n", "<div class=\"alert alert-success\">
\n", - "**Exercise**: Edit the code below to search for a different species you're interested in, then use some of the graphing cells above to explore your data!\n", + "**Exercise**: Edit the code below to search for a different species you're interested in, then adapt some of the graphing cells above to explore your data!\n", "
" ] }, @@ -566,7 +589,7 @@ "\n", "### Mapping\n", "\n", - "Now we can map our latitude and longitude points. We'll want to color code by `collectionCode` so we can see which institution made the observation. We'll use a function that does this automatically, but it will randomly generate a color, so if you get a lot of a similar color maybe run the cell a couple times!" + "Now we can map our latitude and longitude points. We'll want to color code by `collectionCode` so we can see which collection made the observation or has the specimen. We'll use a function that does this automatically, but it will randomly generate a color, so if you get a lot of a similar color maybe run the cell a couple times!" ] }, { @@ -609,7 +632,7 @@ "source": [ "---\n", "\n", - "To get the boundraies for all the reserves, we will need to send a request to GeoJSON, which is a format for encoding a variety of geographic data structures. With this code, we can request GeoJSON for all reserves and plot ocurrences of the species. First we'll assign the API URL that has the data to a new variable `url`:" + "To get the boundries for all the reserves, we will need to send a request to get GeoJSON, which is a format for encoding a variety of geographic data structures. With this code, we can request GeoJSON for all reserves and plot ocurrences of the species. 
First we'll assign the API URL that has the data to a new variable `url`:" ] }, { @@ -634,7 +657,17 @@ "metadata": {}, "outputs": [], "source": [ - "reserves = requests.get(url, params={'page_size': 30}).json()" + "reserves = requests.get(url, params={'page_size': 30}).json()\n", + "reserves" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you look closely, this is just bounding boxes for the latitude and longitude of the reserves.\n", + "\n", + "There are some reserves that the EcoEngine didn't catch, we'll add the information for \"Blodgett\", \"Hopland\", and \"Sagehen\":" ] }, { @@ -643,13 +676,29 @@ "metadata": {}, "outputs": [], "source": [ - "reserve_polygons = []\n", - "\n", - "for r in reserves['features']:\n", - " name = r['properties']['name']\n", - " poly = sg.shape(r['geometry'])\n", - " reserve_polygons.append({\"id\": name,\n", - " \"geometry\": poly})" + "station_urls = {\n", + " 'Blodgett': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/BlodgettForestResearchStation.wkt',\n", + " 'Hopland': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/HoplandREC.wkt',\n", + " 'Sagehen': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/SagehenCreekFieldStation.wkt'\n", + "}\n", + "reserves['features'] += [{'type': \"Feature\", 'properties': {\"name\": name}, 'geometry': mapping(wkt.loads(requests.get(url).text))}\n", + " for name, url in station_urls.items()]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see all the station names by indexing the `name` value for the reserves:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[r['properties']['name'] for r in reserves['features']]" ] }, { @@ -720,9 +769,7 @@ "source": [ "---\n", "\n", - "# Part 3: Cal-Adapt API\n", - "\n", - "Now we will use the [Cal-Adapt](http://api.cal-adapt.org/api/) web API to work with time series raster 
data. It will request an entire time series for any geometry and return a Pandas `DataFrame` object." + "We can also find out which stations have how many *Argia agrioides*. First we'll have to add a column to our `DataFrame` that makes points out of the latitude and longitude coordinates:" ] }, { "cell_type": "code", "execution_count": null, @@ -731,17 +778,17 @@ "metadata": {}, "outputs": [], "source": [ - "req = CalAdaptRequest()\n", - "records_g = [dict(rec, geometry=sg.Point(rec['decimalLongitude'], rec['decimalLatitude']))\n", - " for rec in records]\n", - "ca_df = req.concat_features(records_g, 'gbifID')" + "def make_point(row):\n", + "    return Point(row['decimalLongitude'], row['decimalLatitude'])\n", + "\n", + "records_df[\"point\"] = records_df.apply(lambda row: make_point (row),axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's look at the first five rows:" + "Now we can write a little function to check whether that point is in one of the stations, and if it is, we'll add that station in a new column called `station`:" ] }, { "cell_type": "code", "execution_count": null, @@ -750,14 +797,31 @@ "metadata": {}, "outputs": [], "source": [ - "ca_df.head()" + "def in_station(reserves, row):\n", + "    \n", + "    reserve_polygons = []\n", + "\n", + "    for r in reserves['features']:\n", + "        name = r['properties']['name']\n", + "        poly = sg.shape(r['geometry'])\n", + "        reserve_polygons.append({\"id\": name,\n", + "                                 \"geometry\": poly})\n", + "    \n", + "    sid = False\n", + "    for r in reserve_polygons:\n", + "        if r['geometry'].contains(row['point']):\n", + "            sid = r['id']\n", + "    if sid:\n", + "        return sid\n", + "    else:\n", + "        return False" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "This looks like the time series data we want for each observation (the unique ID numbers as the columns). 
Plot predictions for the *Argia agrioides* in Fahrenheit:" + "Now apply this function to the `DataFrame`:" ] }, { @@ -766,22 +831,16 @@ "metadata": {}, "outputs": [], "source": [ - "# Make a line plot using the first 9 columns of df.\n", - "ca_df.iloc[:,:9].plot()\n", - "\n", - "# Use matplotlib to title your plot.\n", - "plt.title('Argia agrioides - %s' % req.slug)\n", - "\n", - "# Use matplotlib to add labels to the x and y axes of your plot.\n", - "plt.xlabel('Year', fontsize=18)\n", - "plt.ylabel('Degrees (Fahrenheit)', fontsize=16)" + "records_df[\"station\"] = records_df.apply(lambda row: in_station(reserves, row),axis=1)\n", + "in_stations_df = records_df[records_df[\"station\"] != False]\n", + "in_stations_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We can map the projected means for occurrence locations." + "Let's see if this corresponds to what we observed on the map:" ] }, { @@ -790,22 +849,24 @@ "metadata": {}, "outputs": [], "source": [ - "# Use .mean() to compute the average of each column in your data frame,\n", - "# and store results in a Pandas series called tmax_means.\n", - "tmax_means = ca_df.mean(axis=0)\n", + "in_stations_df.groupby([\"species\", \"station\"])['station'].count().unstack().plot.barh(stacked=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", "\n", - "# Filter GBIF records where we have climate data.\n", - "records_rvals = [rec for rec in records_g if rec['gbifID'] in tmax_means.index]\n", - "coords = [(r['decimalLongitude'], r['decimalLatitude']) for r in records_rvals]\n", - "xs2, ys2 = zip(*coords)\n", + "# Part 3: Comparing California Oak species:\n", "\n", - "norm = mpl.colors.Normalize()\n", "\n", - "# Create an array of the sizes of points to be plotted in a scatterplot of occurrence locations\n", - "size = np.pi * (15 * norm(tmax_means)) ** 2\n", - "# Create scatter plot below.\n", - "plt.scatter(xs2, ys2, c=tmax_means, s=size, cmap='viridis', 
alpha=0.5)\n", - "plt.colorbar()" + "| ![quercus douglassi](https://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Large_Blue_Oak.jpg/220px-Large_Blue_Oak.jpg) | ![quercus lobata](https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Valley_Oak_Mount_Diablo.jpg/220px-Valley_Oak_Mount_Diablo.jpg) | ![quercus durata](https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Quercusduratadurata.jpg/220px-Quercusduratadurata.jpg) | ![quercus agrifolia](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d1/Quercus_agrifolia_foliage.jpg/220px-Quercus_agrifolia_foliage.jpg) |\n", + "|:---:|:---:|:---:|:---:|\n", + "| *Quercus douglassi* | *Quercus lobata* | *Quercus durata* | *Quercus agrifolia*|\n", + "\n", + "\n", + "Let's search for these different species of oak using our `GBIF` API and collect the observations:" ] }, { @@ -814,28 +875,24 @@ "metadata": {}, "outputs": [], "source": [ - "ca_df.iloc[:,:7].plot.box()\n", - "plt.title('Argia agrioides - %s' % req.slug)" + "species_records = []\n", + "\n", + "species = [\"Quercus douglassi\", \"Quercus lobata\", \"Quercus durata\", \"Quercus agrifolia\"]\n", + "\n", + "for s in species:\n", + " req = GBIFRequest() # creating a request to the API\n", + " params = {'scientificName': s} # setting our parameters (the specific species we want)\n", + " pages = req.get_pages(params) # using those parameters to complete the request\n", + " records = [rec for page in pages for rec in page['results'] if rec.get('decimalLatitude')] # sift out valid records\n", + " species_records.extend(records)\n", + " time.sleep(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Taking a look at minimum temperatures now, here are the temperature distributions by decade." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dfs = []\n", - "names = ('tasmin_year_CanESM2_rcp45', 'tasmin_year_CanESM2_rcp85')\n", - "for name in names:\n", - " req = CalAdaptRequest(name)\n", - " dfs.append(req.concat_features(records_g, 'gbifID'))" + "We can convert this JSON to a `DataFrame` again:" ] }, { @@ -844,18 +901,15 @@ "metadata": {}, "outputs": [], "source": [ - "for name, df in zip(names, dfs):\n", - " t = df.resample('10A').mean().transpose()\n", - " t.columns = t.columns.year\n", - " t.plot.box()\n", - " plt.title(name)" + "records_df = pd.read_json(json.dumps(species_records))\n", + "records_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Make plots for historical and projected temperatures at three station locations. Once again, Pandas gives us a lot of functionality in displaying our data beautifully. " + "How many records do we have now?" ] }, { @@ -864,27 +918,14 @@ "metadata": {}, "outputs": [], "source": [ - "station_urls = {\n", - " 'Blodgett': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/BlodgettForestResearchStation.wkt',\n", - " 'Hopland': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/HoplandREC.wkt',\n", - " 'Sagehen': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/SagehenCreekFieldStation.wkt'\n", - "}\n", - "stn_features = [{'id': name, 'geometry': wkt.loads(requests.get(url).text)}\n", - " for name, url in station_urls.items()]\n", - "# Reduce geometric complexity, we need smaller geometry serializations to circumvent \n", - "# URL length restrictions.\n", - "tol = .0001\n", - "for feat in stn_features:\n", - " feat['geometry'] = feat['geometry'].buffer(tol).simplify(tol)" + "len(records_df)" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "stn_features" + "Let's see how they're distributed among 
our different species queries:" ] }, { @@ -893,25 +934,14 @@ "metadata": {}, "outputs": [], "source": [ - "dfstns_lst = []\n", - "names = ('tasmax_year_CanESM2_historical', 'tasmax_year_CanESM2_rcp85')\n", - "# Iterate through each name and make requests using the CalAdapt API. We can also use Pandas to display this data efficiently.\n", - "for name in names:\n", - " req = CalAdaptRequest(name)\n", - " d = req.concat_features(stn_features)\n", - " dfstns_lst.append(d)\n", - "\n", - "dfstns = pd.concat(dfstns_lst)" + "records_df['scientificName'].value_counts().plot.barh()" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "# Use .describe() to display summary statistics of dfstns.\n", - "dfstns.describe()" + "We can group this again by `collectionCode`:" ] }, { @@ -920,15 +950,14 @@ "metadata": {}, "outputs": [], "source": [ - "# Use dfstns to create a line graph.\n", - "dfstns.plot()" + "records_df.groupby([\"scientificName\", \"collectionCode\"])['collectionCode'].count().unstack().plot.barh(stacked=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Here we'll print the decadal average and difference from 20 years prior." 
+ "We can also map these like we did with the *Argia arioides* above:" ] }, { @@ -937,7 +966,8 @@ "metadata": {}, "outputs": [], "source": [ - "dfstns.resample('10A').mean().diff(periods=2).dropna()" + "color_dict, html_key = assign_colors(records_df, \"scientificName\")\n", + "display(HTML(html_key))" ] }, { @@ -946,24 +976,22 @@ "metadata": {}, "outputs": [], "source": [ - "dfstns.resample('10A').mean().diff().plot()" + "mapc = folium.Map([37.359276, -122.179626], zoom_start=5)\n", + "\n", + "points = folium.features.GeoJson(reserves)\n", + "mapc.add_child(points)\n", + "for r in records_df.iterrows():\n", + " lat = r[1]['decimalLatitude']\n", + " long = r[1]['decimalLongitude']\n", + " folium.Circle((lat, long), color=color_dict[r[1]['scientificName']]).add_to(mapc)\n", + "mapc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "---\n", - "\n", - "# Part 4: Comparing California Oak species:\n", - "\n", - "\n", - "| ![quercus douglassi](https://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Large_Blue_Oak.jpg/220px-Large_Blue_Oak.jpg) | ![quercus lobata](https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Valley_Oak_Mount_Diablo.jpg/220px-Valley_Oak_Mount_Diablo.jpg) | ![quercus durata](https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Quercusduratadurata.jpg/220px-Quercusduratadurata.jpg) | ![quercus agrifolia](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d1/Quercus_agrifolia_foliage.jpg/220px-Quercus_agrifolia_foliage.jpg) |\n", - "|:---:|:---:|:---:|:---:|\n", - "| *Quercus douglassi* | *Quercus lobata* | *Quercus durata* | *Quercus agrifolia*|\n", - "\n", - "\n", - "Let's search for these different species of oak using our `GBIF` API and collect the observations:" + "We can use the same code we wrote earlier to see which reserves have which species of oak:" ] }, { @@ -972,24 +1000,17 @@ "metadata": {}, "outputs": [], "source": [ - "species_records = []\n", - "\n", - "species = [\"Quercus douglassi\", \"Quercus lobata\", 
\"Quercus durata\", \"Quercus agrifolia\"]\n", - "\n", - "for s in species:\n", - " req = GBIFRequest() # creating a request to the API\n", - " params = {'scientificName': s} # setting our parameters (the specific species we want)\n", - " pages = req.get_pages(params) # using those parameters to complete the request\n", - " records = [rec for page in pages for rec in page['results'] if rec.get('decimalLatitude')] # sift out valid records\n", - " species_records.extend(records)\n", - " time.sleep(3)" + "records_df[\"point\"] = records_df.apply(lambda row: make_point (row),axis=1)\n", + "records_df[\"station\"] = records_df.apply(lambda row: in_station(reserves, row),axis=1)\n", + "in_stations_df = records_df[records_df[\"station\"] != False]\n", + "in_stations_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We can convert this JSON to a `DataFrame` again:" + "Now we can make a bar graph like we did before, grouping by `species` and stacking the bar based on `station`:" ] }, { @@ -998,15 +1019,18 @@ "metadata": {}, "outputs": [], "source": [ - "records_df = pd.read_json(json.dumps(species_records))\n", - "records_df.head()" + "in_stations_df.groupby([\"species\", \"station\"])['station'].count().unstack().plot.barh(stacked=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "How many records do we have now?" 
+ "---\n", + "\n", + "# Part 3: Cal-Adapt API\n", + "\n", + "Let's get back the data from *Argia agrioides* with the GBIF API:" ] }, { @@ -1015,14 +1039,18 @@ "metadata": {}, "outputs": [], "source": [ - "len(records_df)" + "req = GBIFRequest() # creating a request to the API\n", + "params = {'scientificName': 'Argia agrioides'} # setting our parameters (the specific species we want)\n", + "pages = req.get_pages(params) # using those parameters to complete the request\n", + "records = [rec for page in pages for rec in page['results'] if rec.get('decimalLatitude')] # sift out valid records\n", + "records[:5] # print first 5 records" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's see how they're distributed among our different species queries:" + "We'll make a `DataFrame` again for later use:" ] }, { @@ -1031,14 +1059,15 @@ "metadata": {}, "outputs": [], "source": [ - "records_df['scientificName'].value_counts().plot.barh()" + "records_df = pd.read_json(json.dumps(records))\n", + "records_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We can group this again by `collectionCode`:" + "Now we will use the [Cal-Adapt](http://api.cal-adapt.org/api/) web API to work with time series raster data. 
It will request an entire time series for any geometry and return a Pandas `DataFrame` object for each record in all of our *Argia agrioides* records:" ] }, { @@ -1047,24 +1076,17 @@ "metadata": {}, "outputs": [], "source": [ - "records_df.groupby([\"scientificName\", \"collectionCode\"])['collectionCode'].count().unstack().plot.barh(stacked=True)" + "req = CalAdaptRequest()\n", + "records_g = [dict(rec, geometry=sg.Point(rec['decimalLongitude'], rec['decimalLatitude']))\n", + " for rec in records]\n", + "ca_df = req.concat_features(records_g, 'gbifID')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We can also map these like we did with the *Argia arioides* above:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "color_dict, html_key = assign_colors(records_df, \"scientificName\")\n", - "display(HTML(html_key))" + "Let's look at the first five rows:" ] }, { @@ -1073,22 +1095,14 @@ "metadata": {}, "outputs": [], "source": [ - "mapc = folium.Map([37.359276, -122.179626], zoom_start=5)\n", - "\n", - "points = folium.features.GeoJson(reserves)\n", - "mapc.add_child(points)\n", - "for r in records_df.iterrows():\n", - " lat = r[1]['decimalLatitude']\n", - " long = r[1]['decimalLongitude']\n", - " folium.Circle((lat, long), color=color_dict[r[1]['scientificName']]).add_to(mapc)\n", - "mapc" + "ca_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We can also find out which stations have which species. First we'll have to add a column to our `DataFrame` that makes points out of the latitude and longitude coordinates:" + "This looks like the time series data we want for each record (the unique ID numbers as the columns). Each record has the projected temperature in Fahrenheit for 170 years (every row!). 
We can plot predictions for a given record:" ] }, { "cell_type": "code", "execution_count": null, @@ -1097,17 +1111,22 @@ "metadata": {}, "outputs": [], "source": [ - "def make_point(row):\n", - "    return Point(row['decimalLongitude'], row['decimalLatitude'])\n", "\n", - "records_df[\"point\"] = records_df.apply(lambda row: make_point (row),axis=1)" + "# Make a line plot using the first 9 columns of df.\n", + "ca_df.iloc[:,:9].plot()\n", "\n", + "# Use matplotlib to title your plot.\n", + "plt.title('Argia agrioides - %s' % req.slug)\n", "\n", + "# Use matplotlib to add labels to the x and y axes of your plot.\n", + "plt.xlabel('Year', fontsize=18)\n", + "plt.ylabel('Degrees (Fahrenheit)', fontsize=16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now we can write a little function to check whether that point is in one of the stations, and if it is, we'll add that station in a new column called `station`:" + "It looks like temperature is increasing across the board wherever these observations are occurring. We can calculate the average temperature for each year across observations:" ] }, { "cell_type": "code", "execution_count": null, @@ -1116,26 +1135,15 @@ "metadata": {}, "outputs": [], "source": [ - "def in_station(stations, row):\n", - "    sid = False\n", - "    for s in stations:\n", - "        if s['geometry'].contains(row['point']):\n", - "            sid = s['id']\n", - "    if sid:\n", - "        return sid\n", - "    else:\n", - "        return False\n", - "\n", - "records_df[\"station\"] = records_df.apply(lambda row: in_station(reserve_polygons + stn_features, row),axis=1)\n", - "in_stations_df = records_df[records_df[\"station\"] != False]\n", - "in_stations_df.head()" + "tmax_means = ca_df.mean(axis=1)\n", + "tmax_means" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now we can make a bar graph like we did before, grouping by `species` and stacking the bar based on `station`:" + "What's happening to the average temperature that *Argia agrioides* is going to experience in the coming years?" 
] }, { @@ -1144,31 +1152,16 @@ "metadata": {}, "outputs": [], "source": [ - "in_stations_df.groupby([\"species\", \"station\"])['station'].count().unstack().plot.barh(stacked=True)" + "tmax_means.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "---\n", + "Is there a temperature at which the *Argia agrioides* cannot survive?\n", "\n", - "
\n", - "**Exercise**: Edit the code above and look at your own selection for species!\n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "# Part 5: Sagehen Creek Field Station\n", - "\n", - "![sagehen](http://gk12calbio.berkeley.edu/images/sagehen_view.jpg)\n", - "\n", - "What are the specimens that occur in Sagehen Creek FS?" + "What if we look specifically at the field stations and reserves? We can grab our same code that checked whether a record was within a station, and then map those `gbifID`s back to this temperature dataset:" ] }, { @@ -1177,18 +1170,17 @@ "metadata": {}, "outputs": [], "source": [ - "req = GBIFRequest() # creating a request to the API\n", - "params = {'institutionCode': 'scfs'} # setting our parameters (the specific species we want)\n", - "pages = req.get_pages(params) # using those parameters to complete t he request\n", - "records = [rec for page in pages for rec in page['results'] if rec.get('decimalLatitude')] # sift out valid records\n", - "records[:5] # print first 5 records" + "records_df[\"point\"] = records_df.apply(lambda row: make_point (row),axis=1)\n", + "records_df[\"station\"] = records_df.apply(lambda row: in_station(reserves, row),axis=1)\n", + "in_stations_df = records_df[records_df[\"station\"] != False]\n", + "in_stations_df[['gbifID', 'station']].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's make this a `DataFrame` again:" + "Recall the column headers of our `ca_df` are the `gbifID`:" ] }, { @@ -1197,15 +1189,14 @@ "metadata": {}, "outputs": [], "source": [ - "records_df = pd.read_json(json.dumps(records))\n", - "records_df.head()" + "ca_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "How many records did we get?" 
+ "Now we subset the temperature dataset:" ] }, { @@ -1214,14 +1205,23 @@ "metadata": {}, "outputs": [], "source": [ - "len(records_df)" + "station_obs = [str(id) for id in list(in_stations_df['gbifID'])]\n", + "ca_df[station_obs]" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "We can plot the different classes:" + "ca_df[station_obs].plot()\n", + "# Use matplotlib to title your plot.\n", + "plt.title('Argia agrioides and temperatures in Santa Cruz Island')\n", + "\n", + "# Use matplotlib to add labels to the x and y axes of your plot.\n", + "plt.xlabel('Year', fontsize=18)\n", + "plt.ylabel('Degrees (Fahrenheit)', fontsize=16)" ] }, { @@ -1229,15 +1229,13 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "records_df['class'].value_counts().plot.barh()" - ] + "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "And species, but there are a lot, so we'll look at the most common 10:" + "We can then take all the records for which we have climate data and grab the coordinates:" ] }, { @@ -1246,14 +1244,16 @@ "metadata": {}, "outputs": [], "source": [ - "records_df['species'].value_counts()[:10].plot.barh()" + "# Filter GBIF records where we have climate data.\n", + "records_rvals = [rec for rec in records_g if rec['gbifID'] in tmax_means.index]\n", + "coords = [(r['decimalLongitude'], r['decimalLatitude']) for r in records_rvals]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "And genus:" + "We can make a heat map of these temperatures:" ] }, { @@ -1262,14 +1262,21 @@ "metadata": {}, "outputs": [], "source": [ - "records_df['genus'].value_counts()[:10].plot.barh()" + "xs2, ys2 = zip(*coords)\n", + "norm = mpl.colors.Normalize()\n", + "\n", + "# Create an array of the sizes of points to be plotted in a scatterplot of occurrence locations\n", + "size = np.pi * (15 * norm(tmax_means)) ** 2\n", + "# Create scatter plot 
below.\n", + "plt.scatter(xs2, ys2, c=tmax_means, s=size, cmap='viridis', alpha=0.5)\n", + "plt.colorbar()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "So what are all the unique species observed?" + "We can also make a box plot for each record over all the years:" ] }, { @@ -1278,24 +1285,15 @@ "metadata": {}, "outputs": [], "source": [ - "unique_obs = set(records_df['species'].dropna())\n", - "unique_obs" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "records_df[records_df[\"species\"].notnull()]" + "ca_df.iloc[:,:7].plot.box()\n", + "plt.title('Argia agrioides - %s' % req.slug)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We can also map these observations in their reserve:" + "Taking a look at minimum temperatures now, here are the temperature distributions by decade." ] }, { @@ -1304,37 +1302,31 @@ "metadata": {}, "outputs": [], "source": [ - "color_dict, html_key = assign_colors(records_df[records_df[\"species\"].notnull()], \"species\")\n", - "display(HTML(html_key))" + "dfs = []\n", + "names = ('tasmin_year_CanESM2_rcp45', 'tasmin_year_CanESM2_rcp85')\n", + "for name in names:\n", + " req = CalAdaptRequest(name)\n", + " dfs.append(req.concat_features(records_g, 'gbifID'))" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "scrolled": false - }, + "metadata": {}, "outputs": [], "source": [ - "mape = folium.Map([39.274061, -120.394561], zoom_start=10)\n", - "\n", - "for r in records_df.iterrows():\n", - " lat = r[1]['decimalLatitude']\n", - " long = r[1]['decimalLongitude']\n", - " try:\n", - " folium.CircleMarker((lat, long), color=color_dict[r[1]['species']], popup=r[1]['species']).add_to(mape)\n", - " except:\n", - " pass\n", - "mape" + "for name, df in zip(names, dfs):\n", + " t = df.resample('10A').mean().transpose()\n", + " t.columns = t.columns.year\n", + " t.plot.box()\n", + " plt.title(name)" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ - "### Ecoengine API\n", - "\n", - "We can use UCB's [EcoEngine API](https://ecoengine.berkeley.edu/) just like we used the GBIF API to get back a checklist of species for a specific station. We'll ask for what's on the checklist for Sagehen:" + "Make plots for historical and projected temperatures at three station locations. Once again, Pandas gives us a lot of functionality in displaying our data beautifully. " ] }, { @@ -1343,18 +1335,27 @@ "metadata": {}, "outputs": [], "source": [ - "eco_req = EcoEngineRequest()\n", - "\n", - "params = {\"footprint\": \"sagehen\"}\n", - "checklist_sagehen = eco_req.get_scientific_names_from_checklists(params)\n", - "checklist_sagehen" + "# station_urls = {\n", + "# 'Blodgett': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/BlodgettForestResearchStation.wkt',\n", + "# 'Hopland': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/HoplandREC.wkt',\n", + "# 'Sagehen': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/SagehenCreekFieldStation.wkt'\n", + "# }\n", + "# stn_features = [{'id': name, 'geometry': wkt.loads(requests.get(url).text)}\n", + "# for name, url in station_urls.items()]\n", + "# Reduce geometric complexity, we need smaller geometry serializations to circumvent \n", + "# URL length restrictions.\n", + "tol = .0001\n", + "for feat in reserves['features']:\n", + " feat['geometry'] = feat['geometry']['coordinates'].buffer(tol).simplify(tol)" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "How much does this overlapped with what's been observed and recorded in the GBIF API? 
Let's grab all the observations from the Sagehen Creek FS:" + "stn_features" ] }, { @@ -1363,21 +1364,25 @@ "metadata": {}, "outputs": [], "source": [ - "req = GBIFRequest() # creating a request to the API\n", - "params = {'basisOfRecord': \"HUMAN_OBSERVATION\",\n", - " 'stateProvince': \"California\",\n", - " 'locality': \"Sagehen Creek Field Station\"} # setting our parameters (the specific species we want)\n", - "pages = req.get_pages(params, thresh=500) # using those parameters to complete the request\n", - "records = [rec for page in pages for rec in page['results'] if rec.get('decimalLatitude')] # sift out valid records\n", - "records_df = pd.read_json(json.dumps(records))\n", - "records_df.head()" + "dfstns_lst = []\n", + "names = ('tasmax_year_CanESM2_historical', 'tasmax_year_CanESM2_rcp85')\n", + "# Iterate through each name and make requests using the CalAdapt API. We can also use Pandas to display this data efficiently.\n", + "for name in names:\n", + " req = CalAdaptRequest(name)\n", + " d = req.concat_features(stn_features)\n", + " dfstns_lst.append(d)\n", + "\n", + "dfstns = pd.concat(dfstns_lst)" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "How many do we have?" + "# Use .describe() to display summary statistics of dfstns.\n", + "dfstns.describe()" ] }, { @@ -1386,14 +1391,15 @@ "metadata": {}, "outputs": [], "source": [ - "len(records_df)" + "# Use dfstns to create a line graph.\n", + "dfstns.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "That's not too many and it looks mostly like birds, so it probably won't overlap with any of the checklists:" + "Here we'll print the decadal average and difference from 20 years prior." 
] }, { @@ -1402,9 +1408,7 @@ "metadata": {}, "outputs": [], "source": [ - "sagehen_geom = stn_features[1][\"geometry\"]\n", - "observed_scientific_names = [s.split(\"(\")[0].strip().split(\",\")[0] for s in records_df['scientificName']]\n", - "observed_scientific_names" + "dfstns.resample('10A').mean().diff(periods=2).dropna()" ] }, { @@ -1413,14 +1417,7 @@ "metadata": {}, "outputs": [], "source": [ - "set(unique_obs).intersection(set(observed_scientific_names))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - ":(" + "dfstns.resample('10A').mean().diff().plot()" ] } ], diff --git a/drafts/sagehen.ipynb b/drafts/sagehen.ipynb new file mode 100644 index 0000000..604a389 --- /dev/null +++ b/drafts/sagehen.ipynb @@ -0,0 +1,331 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install --no-cache-dir shapely\n", + "!pip install -U folium\n", + "\n", + "%matplotlib inline\n", + "import os\n", + "import time\n", + "import folium\n", + "from datetime import datetime\n", + "from shapely.geometry import Point\n", + "from shapely.geometry.polygon import Polygon\n", + "import matplotlib as mpl\n", + "from matplotlib.collections import PatchCollection\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import pandas as pd\n", + "import requests\n", + "from datascience import *\n", + "from shapely import geometry as sg, wkt\n", + "from scripts.espm_module import *\n", + "import json\n", + "import random\n", + "from IPython.core.display import display, HTML\n", + "import ipywidgets as widgets\n", + "plt.style.use('seaborn')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "# Sagehen Creek Field Station\n", + "\n", + "![sagehen](http://gk12calbio.berkeley.edu/images/sagehen_view.jpg)\n", + "\n", + "What are the specimens that occur in Sagehen Creek FS?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "req = GBIFRequest() # creating a request to the API\n", + "params = {'institutionCode': 'scfs'} # setting our parameters (the institution code for Sagehen Creek Field Station)\n", + "pages = req.get_pages(params) # using those parameters to complete the request\n", + "records = [rec for page in pages for rec in page['results'] if rec.get('decimalLatitude')] # sift out valid records\n", + "records[:5] # print first 5 records" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's make this a `DataFrame` again:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "records_df = pd.read_json(json.dumps(records))\n", + "records_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "How many records did we get?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "len(records_df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can plot the different classes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "records_df['class'].value_counts().plot.barh()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And species, but there are a lot, so we'll look at the most common 10:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "records_df['species'].value_counts()[:10].plot.barh()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And the 10 most common genera:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "records_df['genus'].value_counts()[:10].plot.barh()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So what are all the 
unique species observed?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "unique_obs = set(records_df['species'].dropna())\n", + "unique_obs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "records_df[records_df[\"species\"].notnull()]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also map these observations in their reserve:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "color_dict, html_key = assign_colors(records_df[records_df[\"species\"].notnull()], \"species\")\n", + "display(HTML(html_key))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "mape = folium.Map([39.274061, -120.394561], zoom_start=10)\n", + "\n", + "for _, row in records_df.iterrows():\n", + " lat = row['decimalLatitude']\n", + " long = row['decimalLongitude']\n", + " try:\n", + " folium.CircleMarker((lat, long), color=color_dict[row['species']], popup=row['species']).add_to(mape)\n", + " except KeyError: # skip records whose species has no assigned color\n", + " pass\n", + "mape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### EcoEngine API\n", + "\n", + "We can use UCB's [EcoEngine API](https://ecoengine.berkeley.edu/) just like we used the GBIF API to get back a checklist of species for a specific station. 
We'll ask for what's on the checklist for Sagehen:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "eco_req = EcoEngineRequest()\n", + "\n", + "params = {\"footprint\": \"sagehen\"}\n", + "checklist_sagehen = eco_req.get_scientific_names_from_checklists(params)\n", + "checklist_sagehen" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "How much does this overlap with what's been observed and recorded in the GBIF API? Let's grab all the observations from the Sagehen Creek FS:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "req = GBIFRequest() # creating a request to the API\n", + "params = {'basisOfRecord': \"HUMAN_OBSERVATION\",\n", + " 'stateProvince': \"California\",\n", + " 'locality': \"Sagehen Creek Field Station\"} # setting our parameters (human observations at Sagehen)\n", + "pages = req.get_pages(params, thresh=500) # using those parameters to complete the request\n", + "records = [rec for page in pages for rec in page['results'] if rec.get('decimalLatitude')] # sift out valid records\n", + "records_df = pd.read_json(json.dumps(records))\n", + "records_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "How many do we have?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "len(records_df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "That's not too many and it looks mostly like birds, so it probably won't overlap with any of the checklists:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagehen_geom = stn_features[1][\"geometry\"]\n", + "observed_scientific_names = [s.split(\"(\")[0].strip().split(\",\")[0] for s in records_df['scientificName']]\n", + "observed_scientific_names" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "set(unique_obs).intersection(set(observed_scientific_names))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ":(" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python [conda root]", + "language": "python", + "name": "conda-root-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.2" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}