Remote web browser automation.
`mokr` is a spiritual successor to pyppeteer, which it was originally forked from. However, `mokr` isn't meant to be a 1:1 drop-in replacement for it, and it doesn't seek to keep parity with puppeteer. Some functionality remains the same, but a lot has changed. Some elements are based on puppeteer proper and python-playwright.
`mokr` is named after MOCR, NASA's Mission Operations Control Rooms that were used to control launches.
Run `pip install mokr` to install the package.
Run `mokr install` to install browsers.
Run `mokr scrape <url>` to load the target page and dump its contents to the console.
See the full documentation.
Launch a headless browser, navigate to a site, and dump the HTML to the console.

    import asyncio

    from mokr import launch


    async def main():
        # launch() is used as an async context manager to handle a graceful exit.
        async with launch() as browser:
            page = await browser.first_page()
            response = await page.goto("https://example.com")
            content = await response.content()
            print(content)


    asyncio.run(main())

Launch a headful browser, attach handlers for requests and responses, and navigate to the Wikipedia page for Python. The handlers intercept the request for the Python logo, fetch a picture of a python snake instead, and fulfill the original request with it.

    import asyncio

    from mokr import launch
    from mokr.network import Request, Response


    async def main():
        snake_url = "https://upload.wikimedia.org/wikipedia/commons/3/32/Python_molurus_molurus_2.jpg"
        async with launch(headless=False) as browser:
            page = await browser.first_page()

            async def intercept_request(request: Request) -> Request | None:
                print(f"Intercepted request for: {request.url}")
                if request.url.endswith("Python-logo-notext.svg.png"):
                    print("Getting a cute python picture to use as the new logo...")
                    # Fetch the replacement image and answer the original request with it.
                    response = await page.fetch(snake_url)
                    await request.fulfill(response)
                else:
                    # Hand all other requests back untouched.
                    return request

            def log_response(response: Response) -> None:
                print(f"Got {response.status} from: {response.url}")

            page.on("request", intercept_request)
            page.on("response", log_response)
            await page.goto("https://en.wikipedia.org/wiki/Python_(programming_language)")


    asyncio.run(main())

Screenshot from running the above example.
While `mokr` was forked from pyppeteer, there are some notable changes beyond reformatting, refactoring, and restructuring, including but not limited to the following.
Changed:
- The `NetworkManager` has been overhauled. The new Chrome implementation is heavily based on puppeteer, but is not 1:1 with it. It uses the Fetch domain instead of just the Network domain.
- Request interception is enabled by default. On Chrome it can be disabled with `Page.set_request_interception_enabled(False)`; on Firefox it is always on.
- `Browser.create` has been replaced with `Browser.ready` and accepts no keyword arguments. This means a `Browser` can be instantiated and target discovery postponed until `.ready()` is called.
- The `launch` method is top-level and offers an async context manager to better handle graceful exits.
- Firefox only: Temporary extensions can be installed at browser launch.
- `CDPSession` is now `DevtoolsSession` and shares a base class with `Connection`, called `RemoteConnection`.
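
The two-step construction and the context-manager launch described above can be illustrated with a small standalone sketch. It does not use mokr itself: `SketchBrowser` and `launch_sketch` are hypothetical names invented for this example, and the "target discovery" is simulated. It only shows the general pattern of keeping `__init__` free of I/O, deferring setup to an awaited `ready()`, and closing in a `finally` block.

```python
import asyncio
from contextlib import asynccontextmanager


class SketchBrowser:
    """Hypothetical stand-in for a browser object; not mokr's real class."""

    def __init__(self) -> None:
        # Construction is cheap and synchronous: no I/O happens here.
        self.targets = []
        self.is_ready = False
        self.closed = False

    async def ready(self) -> "SketchBrowser":
        # Deferred setup (simulated target discovery) runs only when awaited.
        await asyncio.sleep(0)
        self.targets = ["page-1"]
        self.is_ready = True
        return self

    async def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def launch_sketch():
    browser = SketchBrowser()
    try:
        await browser.ready()
        yield browser
    finally:
        # Runs on a normal exit and when the body raises.
        await browser.close()


async def demo() -> SketchBrowser:
    async with launch_sketch() as browser:
        assert browser.is_ready
    return browser


if __name__ == "__main__":
    print(asyncio.run(demo()).closed)
```

Because cleanup lives in `finally`, the simulated browser is closed even if the body of the `async with` block raises.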
New:
- Partial Firefox support.
- There is a new class, `FetchDomain`, that can be used to send fetch requests via `Page.fetch` (this calls the page's instantiated `FetchDomain` object).
- Another new class, `HttpDomain`, is available to send ad hoc requests via an `httpx`, HTTP/2-enabled client that syncs its cookies with the parent `Page` and vice versa.
- Proxy support is baked in, meaning you can pass a `proxy` string to `mokr.launch` directly.
- New `EventWaiter` class, based on the `pyppeteer.helper.waitForEvent` method.
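
For readers unfamiliar with the wait-for-event idea behind `EventWaiter`, here is a minimal standalone sketch of the general technique. It is not mokr's implementation: `TinyEmitter` and `wait_for_event` are hypothetical names, and the event payloads are made-up dictionaries. The sketch resolves an `asyncio` future on the first event that satisfies a predicate and always removes its listener afterwards.

```python
import asyncio
from collections import defaultdict


class TinyEmitter:
    """Minimal event emitter used only for this sketch."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def on(self, event, listener):
        self._listeners[event].append(listener)

    def off(self, event, listener):
        self._listeners[event].remove(listener)

    def emit(self, event, payload):
        # Iterate over a copy so listeners may unregister themselves safely.
        for listener in list(self._listeners[event]):
            listener(payload)


async def wait_for_event(emitter, event, predicate=lambda payload: True, timeout=1.0):
    """Return the first payload of `event` accepted by `predicate`."""
    future = asyncio.get_running_loop().create_future()

    def listener(payload):
        if not future.done() and predicate(payload):
            future.set_result(payload)

    emitter.on(event, listener)
    try:
        return await asyncio.wait_for(future, timeout)
    finally:
        # Remove the listener on success, timeout, and cancellation alike.
        emitter.off(event, listener)


async def demo():
    emitter = TinyEmitter()
    loop = asyncio.get_running_loop()
    # Two simulated events; only the second one satisfies the predicate.
    loop.call_later(0.01, emitter.emit, "response", {"status": 404})
    loop.call_later(0.02, emitter.emit, "response", {"status": 200})
    return await wait_for_event(emitter, "response", lambda r: r["status"] == 200)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If no matching event arrives within `timeout` seconds, `asyncio.wait_for` raises `asyncio.TimeoutError` and the listener is still removed.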
Removed:
- Tracing has been removed.
- `ElementHandle.querySelectorEval` and `.querySelectorAllEval` have been removed.
Huge thanks are owed to the contributors of all the projects below. Without them, this project would be quite different.
The disadvantages listed below are not a knock on any of these projects or their contributors.
Package | Advantages | Disadvantages |
---|---|---|
playwright-python | | |
puppeteer | | |
pyppeteer | | |
- Finish/publish tests.
- Fully support Firefox. Currently only a subset of CDP is implemented in Firefox, so functionality is lacking. While BiDi is in development, it is not certain when it will be feature-complete. There are a few options here:
  - puppeteer is tracking its own progress. We could wait for this to be closer to parity and port it.
  - Another option would be to port the implementation Microsoft has done, since they will not be abandoning their custom Firefox distribution for BiDi anytime soon. This would create an abstract dependency, though.
  - A third option would be to use temporary extensions to mimic as much behaviour as possible. We could already potentially use the webRequest API to intercept, abort, and alter requests. However, the `Runtime.addBinding` CDP method is not implemented in Firefox, so it can be difficult to call back to Python methods in a blocking manner.
- Explore decorating `Page.wait_for_<x>` methods with `contextlib.asynccontextmanager` so the syntax is more straightforward.
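
As a rough illustration of the last idea, the sketch below wraps a waiter in `contextlib.asynccontextmanager` so that listening starts before the triggering action and the result is collected when the block exits. Everything here is hypothetical: `FakePage`, `deliver`, and `expect_response` are invented for the example and do not reflect mokr's current or planned API.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class FakePage:
    """Hypothetical page-like object; stands in for a real page."""

    def __init__(self):
        self._waiters = []

    def wait_for_response(self, url):
        # Register a future that resolves when a response for `url` arrives.
        future = asyncio.get_running_loop().create_future()
        self._waiters.append((url, future))
        return future

    def deliver(self, url):
        # Simulates the browser reporting a response for `url`.
        for wanted, future in self._waiters:
            if wanted == url and not future.done():
                future.set_result(f"response for {url}")


@asynccontextmanager
async def expect_response(page, url, timeout=1.0):
    future = page.wait_for_response(url)  # start listening before the body runs
    info = SimpleNamespace(value=None)
    try:
        yield info
    except BaseException:
        # The body failed, so nobody will consume the result.
        future.cancel()
        raise
    # The body finished: wait for the event and expose it on `info.value`.
    info.value = await asyncio.wait_for(future, timeout)


async def demo():
    page = FakePage()
    async with expect_response(page, "https://example.com/api") as info:
        page.deliver("https://example.com/api")  # the action that triggers the event
    return info.value


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The appeal of this shape is that the waiter cannot be registered too late: the listener is in place before any code inside the `async with` block runs.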