Recovery from unresponsive system #7

Open
mthuurne opened this issue Sep 9, 2010 · 4 comments
Comments

@mthuurne
Owner

mthuurne commented Sep 9, 2010

Sometimes the system becomes unresponsive while no telnet connection is available and there is no way to press the reset pin. We should have one or more measures to regain control of the system.

There are different unresponsive situations that need to be handled separately:

  • If the kernel has crashed without panicking, for example because it was overclocked too high, there is nothing we can do: reset is the only way out.
  • If the kernel has panicked, we can use the feature to automatically reboot on panics.
  • If the kernel is still running but the userland is broken beyond repair, a forced reboot is useful. This would be similar to "eisub" ("reisub" without "r", see Magic SysRq).
  • If the application that controls the current virtual terminal is hanging, killing that application would return control to the menu and make the system responsive again. This is probably the most common kind of unresponsiveness.
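The panic and forced-reboot cases map onto two standard Linux interfaces: the `kernel.panic` sysctl (seconds to wait before rebooting after a panic, 0 meaning never) and `/proc/sysrq-trigger`. A minimal sketch in C, assuming a kernel built with Magic SysRq support and a caller running as root. The function names are hypothetical, and the paths are parameters so the helpers can be tried on ordinary files. Note that a user-space process cannot usefully issue "e" and "i" itself, since those signals would terminate the caller too, so the sketch only sends "s", "u" and "b".

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write `text` to an existing file such as a /proc entry.
 * Returns 0 on success, -1 on any error. */
int write_string(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    int rc = (write(fd, text, len) == (ssize_t)len) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Ask the kernel to reboot `seconds` after a panic.
 * On a real system, panic_path would be /proc/sys/kernel/panic. */
int enable_panic_reboot(const char *panic_path, int seconds)
{
    char buf[16];
    snprintf(buf, sizeof buf, "%d\n", seconds);
    return write_string(panic_path, buf);
}

/* Sync, remount read-only, reboot ("sub"), one key per write() call.
 * On a real system, trigger_path would be /proc/sysrq-trigger and the
 * final write would not return. */
int sysrq_emergency_reboot(const char *trigger_path, unsigned pause_s)
{
    static const char keys[] = { 's', 'u', 'b' };
    assert(trigger_path != NULL);
    int fd = open(trigger_path, O_WRONLY);
    if (fd < 0)
        return -1;
    for (size_t i = 0; i < sizeof keys; i++) {
        if (write(fd, &keys[i], 1) != 1) {
            close(fd);
            return -1;
        }
        /* Give sync and remount some time before the next key. */
        if (i + 1 < sizeof keys)
            sleep(pause_s);
    }
    return close(fd);
}
```

A caller might run `enable_panic_reboot("/proc/sys/kernel/panic", 10)` once at boot and keep `sysrq_emergency_reboot("/proc/sysrq-trigger", 2)` as a last resort; the 10 and 2 second values are placeholders, not tested settings.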
@mthuurne
Owner Author

Actually, we can use the watchdog timer feature of the JZ4740 to automatically restart the system if the kernel is no longer responding.

@mthuurne
Owner Author

Ayla wrote a watchdog driver for the JZ4740. It should be integrated into the OpenDingux kernel, then activated from user space, which must also send it regular "still alive" signals. This could be done either by a dedicated daemon or by the power slider daemon. A dedicated daemon is the cleaner solution, but we have to check what the impact is on memory use.
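For reference, the "still alive" part can be quite small if the driver follows the standard Linux watchdog device interface: opening the device arms the timer, any write resets it, and writing `V` just before closing disarms it on drivers that support "magic close". A rough sketch of the feeding loop in C, under those assumptions. Whether the JZ4740 driver supports magic close is not confirmed here, and the function names and the `count` parameter are made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Reset the watchdog timer: any write to the device counts as a ping. */
int watchdog_ping(int fd)
{
    assert(fd >= 0);
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Disarm via "magic close" (if the driver supports it) and close. */
int watchdog_release(int fd)
{
    assert(fd >= 0);
    int rc = write(fd, "V", 1) == 1 ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Open `device` (normally /dev/watchdog) and ping it every
 * `interval_s` seconds. A negative `count` means "forever";
 * a finite count is only useful for trying the loop out. */
int watchdog_feed(const char *device, unsigned interval_s, long count)
{
    assert(device != NULL);
    int fd = open(device, O_WRONLY);
    if (fd < 0)
        return -1;
    for (long i = 0; count < 0 || i < count; i++) {
        if (watchdog_ping(fd) < 0) {
            /* Close without 'V': the timer stays armed, so the
             * hardware still resets the system if we cannot recover. */
            close(fd);
            return -1;
        }
        sleep(interval_s);
    }
    return watchdog_release(fd);
}
```

A dedicated daemon would call something like `watchdog_feed("/dev/watchdog", 5, -1)`, with the interval chosen well below the driver's timeout; the 5 second value is a placeholder.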

The feature to kill the application that uses the current virtual terminal could be implemented in either the kernel or the power slider daemon. I prefer the daemon, since we could add visual confirmation to that once we have the overlay framebuffer.
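If this ends up in the daemon, the kill itself could look roughly like the sketch below: send SIGTERM to the application's process group, wait a short grace period, then fall back to SIGKILL. This assumes the daemon already knows the process group ID of the application on the current virtual terminal (for example because the menu starts each application in its own group); how that ID is obtained is left open, and `terminate_group` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Terminate every process in group `pgid`: SIGTERM first, then
 * SIGKILL if the group is still around after `grace_ms` milliseconds.
 * Returns 0 if the group is gone or was sent SIGKILL, -1 on error. */
int terminate_group(pid_t pgid, unsigned grace_ms)
{
    /* Never signal group 0 (our own group) or init. */
    assert(pgid > 1);

    if (kill(-pgid, SIGTERM) < 0)
        return errno == ESRCH ? 0 : -1;

    const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    for (unsigned waited = 0; waited < grace_ms; waited += 10) {
        nanosleep(&step, NULL);
        /* Signal 0 only checks whether the group still exists. */
        if (kill(-pgid, 0) < 0 && errno == ESRCH)
            return 0;
    }

    if (kill(-pgid, SIGKILL) < 0 && errno != ESRCH)
        return -1;
    return 0;
}
```

Trying SIGTERM first gives a merely slow application the chance to save its state, while the SIGKILL fallback covers the hanging case this feature is meant for.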

mthuurne pushed a commit that referenced this issue Aug 2, 2011
Affected kernels 2.6.36 - 3.0

AppArmor may do a GFP_KERNEL memory allocation with task_lock(tsk->group_leader);
held when called from security_task_setrlimit.  This will only occur when the
task's current policy has been replaced, and the task's creds have not been
updated before entering the LSM security_task_setrlimit() hook.

BUG: sleeping function called from invalid context at mm/slub.c:847
 in_atomic(): 1, irqs_disabled(): 0, pid: 1583, name: cupsd
 2 locks held by cupsd/1583:
  #0:  (tasklist_lock){.+.+.+}, at: [<ffffffff8104dafa>] do_prlimit+0x61/0x189
  #1:  (&(&p->alloc_lock)->rlock){+.+.+.}, at: [<ffffffff8104db2d>] do_prlimit+0x94/0x189
 Pid: 1583, comm: cupsd Not tainted 3.0.0-rc2-git1 #7
 Call Trace:
  [<ffffffff8102ebf2>] __might_sleep+0x10d/0x112
  [<ffffffff810e6f46>] slab_pre_alloc_hook.isra.49+0x2d/0x33
  [<ffffffff810e7bc4>] kmem_cache_alloc+0x22/0x132
  [<ffffffff8105b6e6>] prepare_creds+0x35/0xe4
  [<ffffffff811c0675>] aa_replace_current_profile+0x35/0xb2
  [<ffffffff811c4d2d>] aa_current_profile+0x45/0x4c
  [<ffffffff811c4d4d>] apparmor_task_setrlimit+0x19/0x3a
  [<ffffffff811beaa5>] security_task_setrlimit+0x11/0x13
  [<ffffffff8104db6b>] do_prlimit+0xd2/0x189
  [<ffffffff8104dea9>] sys_setrlimit+0x3b/0x48
  [<ffffffff814062bb>] system_call_fastpath+0x16/0x1b

Signed-off-by: John Johansen <[email protected]>
Reported-by: Miles Lane <[email protected]>
Cc: [email protected]
Signed-off-by: James Morris <[email protected]>
@mthuurne
Owner Author

mthuurne commented Sep 4, 2011

Current status: Watchdog is integrated and the rootfs starts a daemon for it. Auto-reboot on panic is enabled. The power slider daemon has button combinations for reboot and poweroff. What is still missing is a forced reboot when user space is half-functional (watchdog operational but "reboot" command not working) and a way to terminate the active application.

@pcercuei

It is now possible to kill the active application via a key shortcut.
