Fix: Interface being busy prevented instance creation #579
Merged
When attempting to schedule a Firecracker instance, the error below came in a loop.

Solution: Log a warning instead of raising an exception.

```
2024-03-19 15:45:49,346 | ERROR |   File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/aleph-vm/aleph/vm/orchestrator/__main__.py", line 4, in <module>
    main()
  File "/opt/aleph-vm/aleph/vm/orchestrator/cli.py", line 371, in main
    supervisor.run()
  File "/opt/aleph-vm/aleph/vm/orchestrator/supervisor.py", line 163, in run
    web.run_app(app, host=settings.SUPERVISOR_HOST, port=settings.SUPERVISOR_PORT)
  File "/opt/aleph-vm/aiohttp/web.py", line 544, in run_app
    loop.run_until_complete(main_task)
  File "/usr/lib/python3.11/asyncio/base_events.py", line 640, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.11/asyncio/base_events.py", line 607, in run_forever
    self._run_once()
  File "/usr/lib/python3.11/asyncio/base_events.py", line 1922, in _run_once
    handle._run()
  File "/usr/lib/python3.11/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/aleph-vm/aiohttp/web_protocol.py", line 452, in _handle_request
    resp = await request_handler(request)
  File "/opt/aleph-vm/sentry_sdk/integrations/aiohttp.py", line 129, in sentry_app_handle
    response = await old_handle(self, request)
  File "/opt/aleph-vm/aiohttp/web_app.py", line 543, in _handle
    resp = await handler(request)
  File "/opt/aleph-vm/aiohttp/web_middlewares.py", line 114, in impl
    return await handler(request)
  File "/opt/aleph-vm/aleph/vm/orchestrator/supervisor.py", line 65, in server_version_middleware
    resp: web.StreamResponse = await handler(request)
  File "/opt/aleph-vm/aiohttp/web_urldispatcher.py", line 200, in handler_wrapper
    result = await result
  File "/opt/aleph-vm/aleph/vm/orchestrator/run.py", line 129, in run_code_on_request
    execution = await create_vm_execution_or_raise_http_error(vm_hash=vm_hash, pool=pool)
  File "/opt/aleph-vm/aleph/vm/orchestrator/run.py", line 90, in create_vm_execution_or_raise_http_error
    return await create_vm_execution(vm_hash=vm_hash, pool=pool)
  File "/opt/aleph-vm/aleph/vm/orchestrator/run.py", line 60, in create_vm_execution
    execution = await pool.create_a_vm(
  File "/opt/aleph-vm/aleph/vm/pool.py", line 113, in create_a_vm
    await self.network.create_tap(vm_id, tap_interface)
  File "/opt/aleph-vm/aleph/vm/network/hostnetwork.py", line 221, in create_tap
    await interface.create()
  File "/opt/aleph-vm/aleph/vm/network/interfaces.py", line 128, in create
    create_tap_interface(ipr, self.device_name)
  File "/opt/aleph-vm/aleph/vm/network/interfaces.py", line 32, in create_tap_interface
    ipr.link("add", ifname=device_name, kind="tuntap", mode="tap")
  File "/opt/aleph-vm/pyroute2/iproute/linux.py", line 1696, in link
    ret = self.nlm_request(msg, msg_type=msg_type, msg_flags=msg_flags)
  File "/opt/aleph-vm/pyroute2/netlink/nlsocket.py", line 870, in nlm_request
    return tuple(self._genlm_request(*argv, **kwarg))
  File "/opt/aleph-vm/pyroute2/netlink/nlsocket.py", line 1209, in nlm_request
    self.put(msg, msg_type, msg_flags, msg_seq=msg_seq)
  File "/opt/aleph-vm/pyroute2/netlink/nlsocket.py", line 906, in put
    return self.engine.put(
  File "/opt/aleph-vm/pyroute2/netlink/nlsocket.py", line 443, in put
    self.socket.sendto_gate(msg, addr)
  File "/opt/aleph-vm/pyroute2/netlink/rtnl/iprsocket.py", line 52, in sendto_gate
    ret = self._sproxy.handle(msg)
  File "/opt/aleph-vm/pyroute2/netlink/proxy.py", line 61, in handle
    log.error(''.join(traceback.format_stack()))
2024-03-19 15:45:49,353 | ERROR | Traceback (most recent call last):
  File "/opt/aleph-vm/pyroute2/netlink/proxy.py", line 43, in handle
    ret = plugin(msg, self.nl)
          ^^^^^^^^^^^^^^^^^^^^
  File "/opt/aleph-vm/pyroute2/netlink/rtnl/ifinfmsg/proxy.py", line 73, in proxy_newlink
    return manage_tuntap(msg)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/aleph-vm/pyroute2/netlink/rtnl/ifinfmsg/sync.py", line 60, in decorated
    ret = f(msg)
          ^^^^^^
  File "/opt/aleph-vm/pyroute2/netlink/rtnl/ifinfmsg/tuntap.py", line 135, in manage_tuntap
    ioctl(fd, TUNSETIFF, ifr)
OSError: [Errno 16] Device or resource busy
2024-03-19 15:45:49,356 | ERROR | Interface vmtap4 is busy - is there another process using it ?
Traceback (most recent call last):
  File "/opt/aleph-vm/aleph/vm/network/interfaces.py", line 32, in create_tap_interface
    ipr.link("add", ifname=device_name, kind="tuntap", mode="tap")
  File "/opt/aleph-vm/pyroute2/iproute/linux.py", line 1696, in link
    ret = self.nlm_request(msg, msg_type=msg_type, msg_flags=msg_flags)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/aleph-vm/pyroute2/netlink/nlsocket.py", line 870, in nlm_request
    return tuple(self._genlm_request(*argv, **kwarg))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/aleph-vm/pyroute2/netlink/nlsocket.py", line 1214, in nlm_request
    for msg in self.get(
               ^^^^^^^^^
  File "/opt/aleph-vm/pyroute2/netlink/nlsocket.py", line 873, in get
    return tuple(self._genlm_get(*argv, **kwarg))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/aleph-vm/pyroute2/netlink/nlsocket.py", line 550, in get
    raise msg['header']['error']
pyroute2.netlink.exceptions.NetlinkError: (16, 'Device or resource busy')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/aleph-vm/aleph/vm/orchestrator/run.py", line 90, in create_vm_execution_or_raise_http_error
    return await create_vm_execution(vm_hash=vm_hash, pool=pool)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/aleph-vm/aleph/vm/orchestrator/run.py", line 60, in create_vm_execution
    execution = await pool.create_a_vm(
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/aleph-vm/aleph/vm/pool.py", line 113, in create_a_vm
    await self.network.create_tap(vm_id, tap_interface)
  File "/opt/aleph-vm/aleph/vm/network/hostnetwork.py", line 221, in create_tap
    await interface.create()
  File "/opt/aleph-vm/aleph/vm/network/interfaces.py", line 128, in create
    create_tap_interface(ipr, self.device_name)
  File "/opt/aleph-vm/aleph/vm/network/interfaces.py", line 37, in create_tap_interface
    raise InterfaceBusyError(
aleph.vm.network.interfaces.InterfaceBusyError: Interface vmtap4 is busy - is there another process using it ?
2024-03-19 15:45:49,362 | INFO | 127.0.0.1 [19/Mar/2024:15:45:30 +0000] "GET /vm/3fc0aa9569da840c43e7bd2033c3c580abb4
```
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

Coverage Diff

| Metric | main | #579 |
|---|---|---|
| Coverage | 35.25% | 35.25% |
| Files | 53 | 53 |
| Lines | 4862 | 4862 |
| Branches | 577 | 577 |
| Hits | 1714 | 1714 |
| Misses | 3127 | 3127 |
| Partials | 21 | 21 |
github-actions bot added the BLACK label ("This PR has critical implications and must be reviewed by a senior engineer.") on Mar 19, 2024
nesitor approved these changes on Mar 19, 2024