API
PyWhat has its own API. identify() returns a JSON-like Python dictionary such as:
{
    "File Signatures": None,
    "Regexes": {
        "text": [
            {
                "Matched": "127.0.0.1",
                "Regex Pattern": {
                    "Name": "Internet Protocol (IP) Address Version 4",
                    "Regex": "^((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?::[0-9]{1,5})?)$",
                    "plural_name": False,
                    "Description": None,
                    "Rarity": 0.7,
                    "URL": "https://www.shodan.io/host/",
                    "Tags": [
                        "Identifiers",
                        "Networking",
                        "IPv4"
                    ],
                    "Boundaryless Regex": "((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?::[0-9]{1,5})?)"
                }
            }
        ]
    }
}
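Because the result is an ordinary nested dictionary, you can walk it with plain Python. Here is a minimal sketch that assumes the structure shown above. The helper name summarize_matches is hypothetical, and the sample dictionary is a trimmed copy of the example rather than a live PyWhat call:

```python
# Trimmed copy of the example result above (not produced by a live PyWhat call).
result = {
    "File Signatures": None,
    "Regexes": {
        "text": [
            {
                "Matched": "127.0.0.1",
                "Regex Pattern": {
                    "Name": "Internet Protocol (IP) Address Version 4",
                    "Rarity": 0.7,
                    "Tags": ["Identifiers", "Networking", "IPv4"],
                },
            }
        ]
    },
}


def summarize_matches(result):
    """Hypothetical helper: return (source, matched string, regex name) tuples.

    The keys of result["Regexes"] are the searched sources ("text" or a filename),
    and each value is a list of match dictionaries.
    """
    summary = []
    # Assume "Regexes" may be missing or None when nothing matched.
    for source, matches in (result.get("Regexes") or {}).items():
        for match in matches:
            summary.append((source, match["Matched"], match["Regex Pattern"]["Name"]))
    return summary


for source, matched, name in summarize_matches(result):
    print(f"{source}: {matched} -> {name}")
```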
To use this API, run this code:
from pywhat import Identifier
id = Identifier()
text = "127.0.0.1"  # the input you want to identify
id.identify(text)
All parameters to identify() are keyword-only except the text itself.
id.identify(
    text,
    only_text=True,  # If True, PyWhat treats the input only as text and will not try to read it as a file or folder
    dist=None,  # Distribution to use (see below for more info regarding distributions)
    key=None,  # Key used for sorting, defaults to Keys.NONE (see below for more info regarding sorting)
    reverse=False,  # If True, the output is sorted in descending order
    boundaryless=None,  # Filter that defines which regexes should be boundaryless (see below for more info regarding boundaryless mode)
    search_filenames=False,  # If True, PyWhat will also search file names for identifiable info
)
PyWhat has its own filtration system. The core of it is a Filter class.
To control which regexes are used or shown, use distributions. A distribution is a filter combined with the list of regexes that pass it.
A nice use case is WannaCry. Using distributions, you can extract only the domains from malware (no cryptocurrency addresses) and use them to auto-buy those domains where possible, potentially stopping the malware if it has a built-in kill switch!
We start by importing the necessary libraries:
from pywhat import pywhat_tags, Distribution, Filter
Now we can make a filter and a distribution:
filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
dist = Distribution(filter1)
Filters support only the following keys:
- MinRarity. Rarity is a measure of how unlikely it is for a match to be a false positive. A rarity of 1 means it can't be a false positive; a rarity of 0.1 means it is very likely to be one. MinRarity is the absolute minimum rarity you want to see. Raise it to avoid false positives!
- MaxRarity. The absolute maximum rarity you want to see.
- Tags. Every regex is tagged. To use only AWS-specific regexes, use "AWS" as the tag. To see all tags, run what --tags 😄 or:
  from pywhat import *
  print(pywhat_tags)
- ExcludeTags. The tags you do not want to see.
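To make the four keys concrete, here is a minimal sketch of how such a filter could decide whether a regex is used. It is not PyWhat's own code. It assumes that a regex passes when its rarity lies between MinRarity and MaxRarity, when it has at least one tag from Tags, and when it has no tag from ExcludeTags, and that missing keys mean "no restriction". The IPv4 entry comes from the example output. The other regex entries and their rarities are made up:

```python
# Hypothetical regex entries, shaped like the "Regex Pattern" dicts in the API output.
REGEXES = [
    {"Name": "IPv4", "Rarity": 0.7, "Tags": ["Identifiers", "Networking", "IPv4"]},
    {"Name": "URL", "Rarity": 0.5, "Tags": ["Networking"]},
    {"Name": "Phone Number", "Rarity": 0.1, "Tags": ["Identifiers"]},
    {"Name": "YouTube Video", "Rarity": 0.6, "Tags": ["Media"]},
]


def passes(regex, filter_dict):
    """Sketch of the assumed filter semantics (not PyWhat's implementation)."""
    min_rarity = filter_dict.get("MinRarity", 0)
    max_rarity = filter_dict.get("MaxRarity", 1)
    tags = set(regex["Tags"])
    # Assumed default: with no "Tags" key, every tag is allowed.
    wanted = set(filter_dict["Tags"]) if "Tags" in filter_dict else tags
    excluded = set(filter_dict.get("ExcludeTags", []))
    return (
        min_rarity <= regex["Rarity"] <= max_rarity
        and bool(tags & wanted)  # at least one wanted tag
        and not (tags & excluded)  # no excluded tag
    )


def select(filter_dict):
    """Names of the hypothetical regexes that pass the filter."""
    return [r["Name"] for r in REGEXES if passes(r, filter_dict)]


print(select({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}))
```

With the first filter from above, IPv4 is dropped because of its "Identifiers" tag, and only the URL entry remains.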
Let's make another filter:
from pywhat import pywhat_tags, Distribution, Filter
filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})
Distributions and Filters support logical operators! Want only what is allowed by both filter1 and filter2?
from pywhat import Distribution, Filter, Identifier
filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})
dist = Distribution(filter1 & filter2)
r = Identifier(dist=dist)
text = "127.0.0.1"  # example input
r.identify(text)
Or:
from pywhat import Distribution, Filter, Identifier
filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})
dist = Distribution(filter1)
dist &= Distribution(filter2)
r = Identifier(dist=dist)
text = "127.0.0.1"  # example input
r.identify(text)
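The in-place form dist &= ... relies on standard Python behaviour: if a class defines __and__ but not __iand__, then x &= y evaluates x = x & y and rebinds x to a new object. Here is a minimal sketch of that, using a hypothetical ToyDistribution that models a distribution as the set of regex names it allows. This is only a mental model. PyWhat may combine the underlying filter settings instead of raw name sets, so edge cases can differ:

```python
class ToyDistribution:
    """Hypothetical stand-in for a distribution: just a set of allowed regex names."""

    def __init__(self, names):
        self.names = frozenset(names)

    def __and__(self, other):
        # AND keeps only the regexes allowed by both distributions.
        return ToyDistribution(self.names & other.names)


d1 = ToyDistribution({"URL", "Email Address"})
d2 = ToyDistribution({"IPv4", "URL"})

both = d1 & d2
d = d1
d &= d2  # no __iand__ is defined, so Python falls back to __and__ and rebinds d
print(sorted(both.names), sorted(d.names))
```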
We also support logical or! Get all the items in distribution1 or distribution2!
from pywhat import Distribution, Filter, Identifier
filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking", "AWS"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})
filter3 = Filter({"ExcludeTags": ["AWS"]})
dist = Distribution(filter1) | Distribution(filter2)
dist |= Distribution(filter3)
r = Identifier(dist=dist)
text = "127.0.0.1"  # example input
r.identify(text)
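Logical OR can be pictured the same way, as a union of what each distribution allows. Here is a minimal sketch with the same hypothetical ToyDistribution model. It is an approximation and not PyWhat's implementation, and the regex names are made up:

```python
class ToyDistribution:
    """Hypothetical stand-in: a distribution modelled as a set of allowed regex names."""

    def __init__(self, names):
        self.names = frozenset(names)

    def __or__(self, other):
        # OR keeps every regex allowed by either distribution.
        return ToyDistribution(self.names | other.names)


d1 = ToyDistribution({"URL"})
d2 = ToyDistribution({"IPv4", "URL"})
d3 = ToyDistribution({"Phone Number"})

dist = d1 | d2
dist |= d3  # falls back to __or__, mirroring `dist |= Distribution(filter3)` above
print(sorted(dist.names))
```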
There are 2 ways to use distributions with identifiers.
You can assign one per object:
r = Identifier(dist=dist)
r.identify(text)
Or you can pass one to an individual identify() call:
no_networking_tags = Distribution(Filter({"ExcludeTags": ["Networking"]}))
r.identify(text, dist=no_networking_tags)
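One way to picture the two options is as "object-level default, per-call override". Here is a minimal sketch of that pattern using a hypothetical ToyIdentifier class. It assumes, without confirmation from the text above, that a dist passed to identify() takes precedence over the one given to the constructor:

```python
class ToyIdentifier:
    """Hypothetical sketch of 'object default, per-call override' (not PyWhat's code)."""

    def __init__(self, dist=None):
        self.dist = dist  # default used when identify() is called without dist

    def identify(self, text, *, dist=None):
        # Assumed precedence: a per-call dist wins over the object-level one.
        active = dist if dist is not None else self.dist
        # A real identifier would run every regex allowed by `active` over `text`;
        # this sketch only reports which distribution would be used.
        return active


r = ToyIdentifier(dist="object-level dist")
print(r.identify("127.0.0.1"))
print(r.identify("127.0.0.1", dist="per-call dist"))
```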
To get more information use:
from pywhat import *
help(Filter)
help(Distribution)
PyWhat supports sorting. You can get sorted output this way:
from pywhat import *
r = Identifier()
r.identify(text, key=Keys.RARITY) # returns matches sorted by rarity in ascending order
r2 = Identifier(key=Keys.MATCHED, reverse=True)
r2.identify(text) # returns matches sorted alphabetically in descending order
Available keys:
Keys.NAME  # Sort by the name of the regex pattern
Keys.RARITY  # Sort by rarity
Keys.MATCHED  # Sort by the matched string
Keys.NONE  # No sorting is done (the default)
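Conceptually, each key picks one field of a match to sort by, and reverse flips the order. Here is a minimal sketch with Python's sorted(). The match list is made up, and the key functions are plain stand-ins for Keys.NAME, Keys.RARITY and Keys.MATCHED based on their descriptions above, not PyWhat's own definitions:

```python
# Hypothetical matches, shaped like the entries in result["Regexes"]["text"].
matches = [
    {"Matched": "127.0.0.1", "Regex Pattern": {"Name": "IPv4", "Rarity": 0.7}},
    {"Matched": "https://example.com", "Regex Pattern": {"Name": "URL", "Rarity": 0.5}},
    {"Matched": "alice@example.com", "Regex Pattern": {"Name": "Email Address", "Rarity": 0.5}},
]


# Plain key functions standing in for Keys.NAME, Keys.RARITY and Keys.MATCHED (assumed behaviour).
def by_name(match):
    return match["Regex Pattern"]["Name"]


def by_rarity(match):
    return match["Regex Pattern"]["Rarity"]


def by_matched(match):
    return match["Matched"]


def sort_matches(matches, key=None, reverse=False):
    """key=None plays the role of Keys.NONE: the original order is kept."""
    if key is None:
        return list(matches)
    # sorted() is stable, so matches with equal keys keep their relative order.
    return sorted(matches, key=key, reverse=reverse)


print([m["Regex Pattern"]["Name"] for m in sort_matches(matches, key=by_rarity)])
print([m["Matched"] for m in sort_matches(matches, key=by_matched, reverse=True)])
```

Because the sort is stable, the two 0.5-rarity matches stay in their original order when sorted by rarity.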
PyWhat can check whether the input is a valid file or folder name, or a path to a file. If the input is a folder, PyWhat searches it recursively and returns matches for each file, keyed by the filename. When PyWhat is searching only text, the key is "text". This behaviour is disabled in the API by default. To search within files and folders, pass the only_text=False parameter:
out = r.identify("/Desktop/file.txt", only_text=False)
File searching is enabled in the CLI. To disable it, pass the -o or --only-text option.
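The recursive, per-file behaviour described above can be sketched with os.walk. This is a minimal sketch and not PyWhat's implementation. The search_path helper is hypothetical, it uses only the boundaryless IPv4 pattern from the example output instead of PyWhat's full regex set, and it keys results by full path:

```python
import os
import re

# Boundaryless IPv4 pattern copied from the "Boundaryless Regex" field in the example output.
IPV4 = re.compile(
    r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?::[0-9]{1,5})?)"
)


def search_path(path):
    """Hypothetical sketch: map each file under `path` to the IPv4 strings it contains."""
    if os.path.isfile(path):
        files = [path]
    else:
        # Walk the folder recursively and collect every file in it.
        files = [
            os.path.join(root, name)
            for root, _dirs, names in os.walk(path)
            for name in names
        ]
    results = {}
    for filename in sorted(files):
        with open(filename, encoding="utf-8", errors="ignore") as handle:
            found = IPV4.findall(handle.read())
        if found:  # only report files that contain at least one match
            results[filename] = found
    return results
```

Calling search_path("some_folder") would return a dictionary such as {"some_folder/sub/log.txt": ["10.0.0.1:8080"]}, which mirrors the "one key per file" shape described above.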
The API does not match inputs like "abcthm{kgh}jk" because boundaryless mode is disabled by default. Boundaryless mode allows regexes to match inside longer strings (for "abcthm{kgh}jk", PyWhat can find the "thm{kgh}" match). To enable it, create a filter denoting which regexes should be in boundaryless mode (see above for more info regarding the filtration system) and pass it as the boundaryless parameter.
from pywhat import *
# All regexes that have 'Identifiers' or 'Cyber Security' tags and a rarity of 0.6 or higher will be in boundaryless mode.
boundaryless = Filter({"Tags": ["Identifiers", "Cyber Security"], "MinRarity": 0.6})
id = Identifier()
id.identify("abcthm{kgh}jk", boundaryless=boundaryless)
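To see what boundaryless mode changes, compare the two IPv4 patterns from the example output. "Regex" is anchored with ^ and $, while "Boundaryless Regex" is the same pattern without the anchors. The minimal sketch below rebuilds both patterns with Python's re module instead of calling PyWhat. The input string is made up:

```python
import re

# The IPv4 patterns from the example output: "Boundaryless Regex" and the
# anchored "Regex" (identical apart from the ^...$ anchors).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
BOUNDARYLESS = rf"((?:{OCTET}\.){{3}}{OCTET}(?::[0-9]{{1,5}})?)"
ANCHORED = rf"^{BOUNDARYLESS}$"

text = "abc127.0.0.1jk"
print(re.search(ANCHORED, text))  # None: the anchors require the whole input to be an address
print(re.search(BOUNDARYLESS, text).group(0))  # 127.0.0.1, found inside the longer string
```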