- PLANNED FOR AFTER DMTCP-1.2.4 (as time and resources permit):
1. Separate real from virtual pid/tid: user code always uses virtual
pid/tid, while wrappers always translate into real pid/tid.
2. Create MTCP ckpt module making it easy to choose the preferred
compression routine on the fly, remote checkpoints, etc.
3. Modify ckpt image format to note zero-mapped pages and use /proc/*/pagemap.
(An intermediate solution to replace runs of zeros
by a region of zero-mapped pages at checkpoint time has already
been implemented as of DMTCP-1.2.4.)
4. Replace temporary files of $DMTCP_TMPDIR (created on restart)
by shared memory objects.
5. Faster checkpoint/restart: Don't remap libs
6. ELF honors a segment of type PT_LOAD. This can be used in the future
to reserve address ranges to prevent conflicts with vdso, etc.
This will aid process migration between different Linux kernels,
by stopping a loader using address space randomization from
placing vdso, etc., in the wrong place.
7. dmtcp_checkpoint --disable-randomization should be implemented,
which would also disable randomization on restart.
This can be useful for internal testing and further development,
but also useful for users with specialized needs. Note that
gdb disables randomization of the target process by default starting
in GDB 7.x.
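Item 1 above (user code sees only virtual pid/tid, wrappers translate to real pid/tid) can be pictured as a small translation table consulted by every wrapper. The following is a minimal sketch in C, not DMTCP's implementation: the names pid_table, pid_table_set, virtual_to_real and real_to_virtual, and the fixed capacity, are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define PID_TABLE_MAX 64   /* hypothetical fixed capacity for this sketch */

/* One virtual <-> real mapping. All names are illustrative, not DMTCP's. */
struct pid_entry {
    pid_t virt;   /* what user code sees; stable across restart */
    pid_t real;   /* what the kernel uses; changes on restart */
};

struct pid_table {
    struct pid_entry entries[PID_TABLE_MAX];
    size_t count;
};

/* Record a mapping, or update the real pid of an existing virtual pid
 * (as would happen on restart). Returns 0 on success, -1 if full. */
int pid_table_set(struct pid_table *t, pid_t virt, pid_t real)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].virt == virt) {
            t->entries[i].real = real;
            return 0;
        }
    }
    if (t->count == PID_TABLE_MAX)
        return -1;
    t->entries[t->count].virt = virt;
    t->entries[t->count].real = real;
    t->count++;
    return 0;
}

/* On wrapper entry: translate a pid supplied by user code into the
 * real pid. Unknown pids pass through unchanged. */
pid_t virtual_to_real(const struct pid_table *t, pid_t virt)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].virt == virt)
            return t->entries[i].real;
    return virt;
}

/* On wrapper exit: translate a pid returned by the kernel back into
 * the virtual pid. Unknown pids pass through unchanged. */
pid_t real_to_virtual(const struct pid_table *t, pid_t real)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].real == real)
            return t->entries[i].virt;
    return real;
}
```

A wrapper around a call such as kill() would then pass virtual_to_real(&table, pid) to the real system call, while a wrapper around getpid() would return real_to_virtual(&table, result). On restart, only the real half of each entry needs updating.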
- When environ is not yet initialized, currently dmtcphijack.so simply returns.
(This occurs when executing libc.so.6.)
We should spend more effort to find environ at top of stack.
Note that bash also plays tricks with environ, but bash still gives
us a working getenv().
- We currently wrap fork(), but not vfork(). Usually, the exec happens
soon after vfork(). But if we checkpoint after vfork() and before
exec, then we would miss the child process. This should be fixed.
- setsighandler fails if the user has reset our signal handler.
Do we cleanly tell the user which signal is being used by the app,
and to rerun dmtcp with --sigckpt set to another signal?
- Extend list of syscalls using netdb to put wrappers around them.
The danger is that in invoking the netdb, they might want to
create a socket. Then we detect this, and we try to create
a socket to the coordinator. This hits our wrapper function, which
then creates an infinite loop. At least, we think we saw this in
Ubuntu 6.06 (dapper).
- Many more exotic fcntl()/ioctl() flags are not checkpointed
- Should refuse to call DMTCP recursively, perhaps by maintaining
environment variable $DMTCPLVL (similar to $SHLVL), and refusing
to recurse if it's already 1. (Maybe in the future, we might
support this.)
- dmtcp_coordinator should refuse to allow other users to connect to it.
(Are there security issues with respect to spoofing, also?)
- Tab completion fails on tcsh. (?)
- Wrapper protection lock for vfprintf (called by all functions of printf
family) and iofputs(used for )
- Analyze why the return from maskstderr() and unmaskstderr() solves the problem.
- --enable-debug, when combined with --enable-pid-virtualization, exposes some
problems that are visible in OpenMPI and some other programs.
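The $DMTCPLVL idea in the list above (refuse to recurse if the level is already 1) might look roughly like the sketch below. Only the variable name DMTCPLVL comes from this file; the function names dmtcp_current_level and dmtcp_enter_level are hypothetical, and treating an unparsable value as level 0 is an assumption of the sketch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse $DMTCPLVL. Returns 0 if it is unset or not a sane number,
 * which this sketch treats as "not running under DMTCP". */
int dmtcp_current_level(void)
{
    const char *s = getenv("DMTCPLVL");
    if (s == NULL)
        return 0;
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 0 || v > 1000)
        return 0;
    return (int)v;
}

/* Call once at startup. Returns 0 and exports DMTCPLVL=1 for child
 * processes, or -1 if we are already running under DMTCP. */
int dmtcp_enter_level(void)
{
    if (dmtcp_current_level() >= 1) {
        fprintf(stderr,
                "DMTCPLVL is already set: refusing to run DMTCP recursively\n");
        return -1;
    }
    if (setenv("DMTCPLVL", "1", 1) != 0)
        return -1;
    assert(dmtcp_current_level() == 1);   /* post-condition of the export */
    return 0;
}
```

Because the variable is inherited through the environment, any nested launch in a child process would see DMTCPLVL=1 and stop early. If recursion is supported later, the same variable could be incremented instead, as $SHLVL is.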
PID-VIRTUALIZATION:
1. wait status of zombie processes
Store the status of processes that were zombies at checkpoint time. If
the parent process calls wait() for such a child after restart,
we should return its status.
2. update dmtcp_restart.cpp to check and restore orphaned child processes at restart.
3. better error handling for missing processes.
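Item 1 of this list (returning the wait status of processes that were zombies at checkpoint time) could be approached with a small table that is saved in the checkpoint and consulted by the wait() wrapper after restart. This is a minimal sketch under stated assumptions: the names zombie_table, zombie_save and zombie_reap are hypothetical, pids are assumed to be virtual pids, and how the status is captured at checkpoint time without reaping the child is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define ZOMBIE_MAX 32   /* hypothetical fixed capacity for this sketch */

/* Saved exit information for one child that was a zombie at checkpoint. */
struct zombie_record {
    pid_t virt_pid;     /* virtual pid of the zombie child */
    int   wait_status;  /* status word as wait() would have reported it */
    int   reaped;       /* nonzero once the parent has collected it */
};

struct zombie_table {
    struct zombie_record recs[ZOMBIE_MAX];
    size_t count;
};

/* At checkpoint time: remember a zombie child's status.
 * Returns 0 on success, -1 if the table is full. */
int zombie_save(struct zombie_table *t, pid_t virt_pid, int wait_status)
{
    assert(t != NULL);
    if (t->count == ZOMBIE_MAX)
        return -1;
    t->recs[t->count].virt_pid = virt_pid;
    t->recs[t->count].wait_status = wait_status;
    t->recs[t->count].reaped = 0;
    t->count++;
    return 0;
}

/* After restart, inside a wait()/waitpid() wrapper: if virt_pid is a
 * saved zombie that has not been collected yet, store its status in
 * *status_out and return virt_pid. Otherwise return 0, so the wrapper
 * can fall through to the real system call. */
pid_t zombie_reap(struct zombie_table *t, pid_t virt_pid, int *status_out)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        struct zombie_record *r = &t->recs[i];
        if (r->virt_pid == virt_pid && !r->reaped) {
            if (status_out != NULL)
                *status_out = r->wait_status;
            r->reaped = 1;   /* a zombie can only be waited for once */
            return virt_pid;
        }
    }
    return 0;
}
```

Marking the record as reaped mirrors normal wait() semantics: a second wait for the same child finds no saved entry and falls through to the real call, which is where the error for a missing child would be reported.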
REPLACE all temporary files present in $DMTCP_TMPDIR with some sort of shared-memory files.