% cox and mullender, semaphores.
%
% process has one thread with two stacks
%
% pike et al, sleep and wakeup
\chapter{Scheduling}
\label{CH:SCHED}
Any operating system is likely to run with more processes than the
computer has CPUs, so a plan is needed to time-share the
CPUs among the processes. Ideally the sharing would be transparent to user
processes. A common approach is to provide each process
with the illusion that it has its own virtual CPU by
\indextext{multiplexing}
the processes onto the hardware CPUs.
This chapter explains how xv6 achieves this multiplexing.
%%
\section{Multiplexing}
%%
Xv6 multiplexes by switching each CPU from one process to
another in two situations. First, xv6's
\lstinline{sleep}
and
\lstinline{wakeup}
mechanism switches when a process waits
for device or pipe I/O to complete, or waits for a child
to exit, or waits in the
\lstinline{sleep}
system call.
Second, xv6 periodically forces a switch to cope with
processes that compute for long periods without sleeping.
This multiplexing creates the illusion that
each process has its own CPU, much as xv6 uses the memory allocator and hardware
page tables to create the illusion that each process has its own memory.
Implementing multiplexing poses a few challenges. First, how to switch from one
process to another?
Although the idea of context switching
is simple, the implementation is some of the most opaque code
in xv6. Second, how to force
switches in a way that is transparent to user processes? Xv6 uses the
standard technique in which a hardware timer's interrupts drive context switches.
Third, all of the CPUs switch among the same shared set of processes, and a
locking plan is necessary to avoid races. Fourth, a process's
memory and other resources must be freed when the process exits,
but it cannot do all of this itself
because (for example) it can't free its own kernel stack while still using it.
Fifth, each core of a multi-core machine must remember which
process it is executing so that system calls affect the correct
process's kernel state.
Finally, \lstinline{sleep} and \lstinline{wakeup} allow a process to give up
the CPU and wait to be woken up by another process or interrupt.
Care is needed to avoid races that result in
the loss of wakeup notifications.
Xv6 tries to solve these problems as
simply as possible, but nevertheless the resulting code is
tricky.
%%
\section{Code: Context switching}
%%
\begin{figure}[t]
\center
\includegraphics[scale=0.5]{fig/switch.pdf}
\caption{Switching from one user process to another. In this example, xv6 runs with one CPU (and thus one scheduler thread).}
\label{fig:switch}
\end{figure}
Figure~\ref{fig:switch}
outlines the steps involved in switching from one
user process to another:
a user-kernel transition (system
call or interrupt) to the old process's kernel thread,
a context switch to the current CPU's scheduler thread, a context
switch to a new process's kernel thread, and a trap return
to the user-level process.
The xv6 scheduler has a dedicated thread (saved registers and stack)
per CPU because
it is not safe for the scheduler to execute on
the old process's kernel stack: some other core might
wake the process up and run it, and it would be a disaster
to use the same stack on two different cores.
%
% I think it may never be safe. if the old process is a ZOMBIE, the
% parent may mark it UNUSED and another core may allocate it in
% fork(). if the old process is SLEEPING or RUNNABLE, another core may
% wake it up and run it.
%
% what has changed is that, if we deleted wait(), it would
% be safe for an exit()ing process to set its state to UNUSED
% directly (no ZOMBIE, no need for parent cleanup).
%
In this section we'll examine the mechanics of switching
between a kernel thread and a scheduler thread.
Switching from one thread to another involves saving the old thread's
CPU registers, and restoring the previously-saved registers of the
new thread; the fact that
the stack pointer and program counter
are saved and restored means that the CPU will switch stacks and
switch what code it is executing.
The function
\indexcode{swtch}
performs the saves and restores for a kernel thread switch.
\lstinline{swtch}
doesn't directly know about threads; it just saves and
restores sets of RISC-V registers, called
\indextext{contexts}.
When it is time for a process to give up the CPU,
the process's kernel thread calls
\lstinline{swtch}
to save its own context and return to the scheduler context.
Each context is contained in a
\lstinline{struct}
\lstinline{context}
\lineref{kernel/proc.h:/^struct.context/},
itself contained in a process's
\lstinline{struct proc}
or a CPU's
\lstinline{struct cpu}.
\lstinline{swtch}
takes two arguments:
\indexcode{struct context}
\lstinline{*old}
and
\lstinline{struct}
\lstinline{context}
\lstinline{*new}.
It saves the current registers in
\lstinline{old},
loads registers from
\lstinline{new},
and returns.
Let's follow a process through
\lstinline{swtch}
into the scheduler.
We saw in Chapter~\ref{CH:TRAP}
that one possibility at the end of an interrupt is that
\indexcode{usertrap}
calls
\indexcode{yield}.
\lstinline{yield}
in turn calls
\indexcode{sched},
which calls
\indexcode{swtch}
to save the current context in
\lstinline{p->context}
and switch to the scheduler context previously saved in
\indexcode{cpu->context}
\lineref{kernel/proc.c:/swtch..p/}.
\lstinline{swtch}
\lineref{kernel/swtch.S:/swtch/}
saves only callee-saved registers;
the C compiler generates code in the caller to save
caller-saved registers on the stack.
\lstinline{swtch} knows the offset of each
register's field in
\lstinline{struct context}.
It does not save the program counter.
Instead,
\lstinline{swtch}
saves the
\lstinline{ra} register,
which holds the return address from which
\lstinline{swtch}
was called.
Now
\lstinline{swtch}
restores registers from the new context,
which holds register values saved by a previous
\lstinline{swtch}.
When
\lstinline{swtch}
returns, it returns to the instruction pointed to
by the restored
\lstinline{ra}
register, that is,
the instruction from which the new thread previously
called \lstinline{swtch}.
In addition, it returns on the new thread's stack,
since that's where the restored \lstinline{sp} points.
In our example,
\indexcode{sched}
called
\indexcode{swtch}
to switch to
\indexcode{cpu->context},
the per-CPU scheduler context.
That context was saved at the point in the past when
\lstinline{scheduler}
called
\lstinline{swtch}
\lineref{kernel/proc.c:/swtch.&c/}
to switch to the process that's now giving up the CPU.
When the
\indexcode{swtch}
we have been tracing returns,
it returns not to
\lstinline{sched}
but to
\indexcode{scheduler},
with the stack pointer in the current CPU's
scheduler stack.
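The heart of the trick is that saving and restoring \lstinline{ra} and
\lstinline{sp} changes both where the CPU will execute next and which
stack it will use. As a rough analogy (hypothetical illustration, not
xv6 code), C's \lstinline{setjmp}/\lstinline{longjmp} also capture a
program point and stack context and later resume there:

\begin{lstlisting}
#include <assert.h>
#include <setjmp.h>

static jmp_buf ctx;   // a saved context, loosely like struct context

static void
worker(void)
{
  // Transfer control back to the point saved in demo(), much as
  // swtch() restores a previously saved ra and sp.
  longjmp(ctx, 1);
}

// Returns 1 if control resumed at the saved point after worker()
// jumped away, demonstrating the save/restore control transfer.
int
demo(void)
{
  if(setjmp(ctx) == 0){  // save this point, like swtch saving a context
    worker();            // control leaves this call...
    return 0;            // never reached
  }
  return 1;              // ...and resumes here via the restored context
}

int
main(void)
{
  assert(demo() == 1);
  return 0;
}
\end{lstlisting}

Unlike \lstinline{swtch}, \lstinline{longjmp} can only unwind to a
still-live frame on the same stack; a kernel context switch must also
switch stacks, which is why \lstinline{swtch} is written in assembly.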
%%
\section{Code: Scheduling}
%%
The last section looked at the low-level details of
\indexcode{swtch};
now let's take
\lstinline{swtch}
as a given and examine
switching from one process's kernel thread
through the scheduler to another process.
The scheduler exists in the form of a special thread per CPU, each running the
\lstinline{scheduler}
function.
This function is in charge of choosing which process to run next.
A process
that wants to give up the CPU must
acquire its own process lock
\indexcode{p->lock},
release any other locks it is holding,
update its own state
(\lstinline{p->state}),
and then call
\indexcode{sched}.
You can see this sequence in
\lstinline{yield}
\lineref{kernel/proc.c:/^yield/},
\texttt{sleep}
and
\texttt{exit}.
\lstinline{sched}
double-checks some of those requirements
\linerefs{kernel/proc.c:/if..holding/,/running/}
and then checks an implication:
since a lock is held, interrupts should be disabled.
Finally,
\indexcode{sched}
calls
\indexcode{swtch}
to save the current context in
\lstinline{p->context}
and switch to the scheduler context in
\indexcode{cpu->context}.
\lstinline{swtch}
returns on the scheduler's stack
as though
\indexcode{scheduler}'s
\lstinline{swtch}
had returned
\lineref{kernel/proc.c:/swtch.*contex.*contex/}.
The scheduler continues its
\lstinline{for}
loop, finds a process to run,
switches to it, and the cycle repeats.
We just saw that xv6 holds
\indexcode{p->lock}
across calls to
\lstinline{swtch}:
the caller of
\indexcode{swtch}
must already hold the lock, and control of the lock passes to the
switched-to code. This convention is unusual with locks; usually
the thread that acquires a lock is also responsible for
releasing the lock, which makes it easier to reason about correctness.
For context switching it is necessary to break this convention because
\indexcode{p->lock}
protects invariants on the process's
\lstinline{state}
and
\lstinline{context}
fields that are not true while executing in
\lstinline{swtch}.
One example of a problem that could arise if
\lstinline{p->lock}
were not held during
\indexcode{swtch}:
a different CPU might decide
to run the process after
\indexcode{yield}
had set its state to
\lstinline{RUNNABLE},
but before
\lstinline{swtch}
caused it to stop using its own kernel stack.
The result would be two CPUs running on the same stack,
which would cause chaos.
The only place a kernel thread gives up its CPU is in
\lstinline{sched},
and it always switches to the same location in \lstinline{scheduler}, which
(almost) always switches to some kernel thread that previously called
\lstinline{sched}.
Thus, if one were to print out the line numbers where xv6 switches
threads, one would observe the following simple pattern:
\lineref{kernel/proc.c:/swtch..c/},
\lineref{kernel/proc.c:/swtch..p/},
\lineref{kernel/proc.c:/swtch..c/},
\lineref{kernel/proc.c:/swtch..p/},
and so on.
Procedures that intentionally transfer control to each other via thread
switch are sometimes referred to as
\indextext{coroutines};
in this example,
\indexcode{sched}
and
\indexcode{scheduler}
are coroutines of each other.
There is one case when the scheduler's call to
\indexcode{swtch}
does not end up in
\indexcode{sched}.
\lstinline{allocproc} sets the context \lstinline{ra}
register of a new process to
\indexcode{forkret}
\lineref{kernel/proc.c:/^forkret/},
so that its first \lstinline{swtch} ``returns''
to the start of that function.
\lstinline{forkret}
exists to release the
\indexcode{p->lock};
otherwise, since the new process needs to
return to user space as if returning from \lstinline{fork},
it could instead start at
\lstinline{usertrapret}.
\lstinline{scheduler}
\lineref{kernel/proc.c:/^scheduler/}
runs a loop:
find a process to run, run it until it yields, repeat.
The scheduler
loops over the process table
looking for a runnable process, one that has
\lstinline{p->state}
\lstinline{==}
\lstinline{RUNNABLE}.
Once it finds a process, it sets the per-CPU current process
variable
\lstinline{c->proc},
marks the process as
\lstinline{RUNNING},
and then calls
\indexcode{swtch}
to start running it
\linerefs{kernel/proc.c:/Switch.to/,/swtch/}.
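As a toy sketch of that search (hypothetical code, not xv6's;
\lstinline{p->lock}, \lstinline{c->proc}, and the call to
\lstinline{swtch} are omitted), the scan over the process table looks
like this:

\begin{lstlisting}
#include <assert.h>
#include <stddef.h>

enum procstate { UNUSED, RUNNABLE, RUNNING };

struct proc { enum procstate state; };

#define NPROC 4
static struct proc ptable[NPROC];

// Scan the table for a RUNNABLE process and mark it RUNNING,
// as scheduler() does before switching to it; locking omitted.
static struct proc *
pick_runnable(void)
{
  for(int i = 0; i < NPROC; i++){
    if(ptable[i].state == RUNNABLE){
      ptable[i].state = RUNNING;
      return &ptable[i];
    }
  }
  return NULL;  // nothing runnable; the real loop just keeps scanning
}

int
main(void)
{
  ptable[2].state = RUNNABLE;
  struct proc *p = pick_runnable();
  assert(p == &ptable[2] && p->state == RUNNING);
  assert(pick_runnable() == NULL);  // no other runnable process
  return 0;
}
\end{lstlisting}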
One way to think about the structure of the scheduling code is
that it enforces a set of invariants about each process,
and holds
\indexcode{p->lock}
whenever those invariants are not true.
One invariant is that if a process is
\lstinline{RUNNING},
a timer interrupt's
\indexcode{yield}
must be able to safely switch away from the process;
this means that the CPU registers must hold the process's register values
(i.e.
\lstinline{swtch}
hasn't moved them to a
\lstinline{context}),
and
\lstinline{c->proc}
must refer to the process.
Another invariant is that if a process is
\indexcode{RUNNABLE},
it must be safe for
an idle CPU's
\indexcode{scheduler}
to run it;
this means that
\indexcode{p->context}
must hold the process's registers (i.e., they are
not actually in the real registers),
that no CPU is executing on the process's kernel stack,
and that no CPU's
\lstinline{c->proc}
refers to the process.
Observe that these properties are often not true while
\lstinline{p->lock} is held.
Maintaining the above invariants is the reason why xv6 often acquires
\indexcode{p->lock}
in one thread and releases it in another,
for example acquiring in
\lstinline{yield}
and releasing in
\lstinline{scheduler}.
Once \lstinline{yield} has started to modify a running process's state
to make it
\lstinline{RUNNABLE},
the lock must remain held until the invariants are restored:
the earliest correct release point is after
\lstinline{scheduler}
(running on its own stack)
clears
\lstinline{c->proc}.
Similarly, once
\lstinline{scheduler}
starts to convert a \lstinline{RUNNABLE} process to
\lstinline{RUNNING},
the lock cannot be released until the kernel thread
is completely running (after the
\lstinline{swtch},
for example in
\lstinline{yield}).
%%
\section{Code: mycpu and myproc}
%%
Xv6 often needs a pointer to the current process's \lstinline{proc}
structure. On a uniprocessor one could have a global variable pointing
to the current \lstinline{proc}. This doesn't work on a multi-core
machine, since each core executes a different process. The way to
solve this problem is to exploit the fact that each core has its own
set of registers; we can use one of those registers to help find
per-core information.
Xv6 maintains a
\indexcode{struct cpu}
for each CPU
\lineref{kernel/proc.h:/^struct.cpu/},
which records
the process currently running
on that CPU (if any),
saved registers for the CPU's scheduler thread,
and the count of nested spinlocks needed to manage
interrupt disabling.
The function
\indexcode{mycpu}
\lineref{kernel/proc.c:/^mycpu/}
returns a pointer to the current CPU's
\lstinline{struct cpu}.
RISC-V numbers its CPUs, giving each
a \indextext{hartid}.
Xv6 ensures that each CPU's hartid is stored in that CPU's \lstinline{tp} register
while in the kernel.
This allows
\lstinline{mycpu} to use \lstinline{tp} to index an array
of \lstinline{cpu} structures to find the right one.
Ensuring that a CPU's \lstinline{tp} always holds the CPU's
hartid is a little involved. \lstinline{start} sets the \lstinline{tp}
register early in the CPU's boot sequence, while still in machine mode
\lineref{kernel/start.c:/w_tp/}.
\lstinline{usertrapret} saves \lstinline{tp} in the trampoline
page, because the user process might modify \lstinline{tp}.
Finally, \lstinline{uservec} restores that saved \lstinline{tp}
when entering the kernel from user space
\lineref{kernel/trampoline.S:/make tp hold/}.
The compiler guarantees never to use the \lstinline{tp}
register.
It would be more convenient if xv6 could ask the RISC-V
hardware for the current hartid whenever needed,
but RISC-V allows that only in
machine mode, not in supervisor mode.
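The array-indexing idea can be sketched as follows (a hypothetical
stand-in: here \lstinline{r_tp} returns a fake value, whereas xv6's
\lstinline{r_tp} reads the real \lstinline{tp} register):

\begin{lstlisting}
#include <assert.h>

#define NCPU 8

struct cpu { int ncli; };      // stand-in for xv6's struct cpu
static struct cpu cpus[NCPU];

static int fake_hartid = 3;    // pretend tp holds hartid 3
static int r_tp(void) { return fake_hartid; }

// mycpu(): index the per-CPU array by the hartid held in tp.
static struct cpu *
mycpu(void)
{
  return &cpus[r_tp()];
}

int
main(void)
{
  assert(mycpu() == &cpus[3]);
  fake_hartid = 5;             // "running on" a different core
  assert(mycpu() == &cpus[5]);
  return 0;
}
\end{lstlisting}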
The return values of
\lstinline{cpuid}
and
\lstinline{mycpu}
are fragile: if the timer were to interrupt and cause
the thread to yield and then move to a different CPU,
a previously returned value would no longer be correct.
To avoid this problem, xv6 requires that callers
disable interrupts, and only enable
them after they finish using the returned
\lstinline{struct cpu}.
The function
\indexcode{myproc}
\lineref{kernel/proc.c:/^myproc/}
returns the
\lstinline{struct proc}
pointer
for the process that is running on the current CPU.
\lstinline{myproc}
disables interrupts, invokes
\lstinline{mycpu},
fetches the current process pointer
(\lstinline{c->proc})
out of the
\lstinline{struct cpu},
and then enables interrupts.
The return value of
\lstinline{myproc}
is safe to use even if interrupts are enabled:
if a timer interrupt moves the calling process to a
different CPU, its
\lstinline{struct proc}
pointer will stay the same.
%%
\section{Sleep and wakeup}
\label{sec:sleep}
%%
Scheduling and locks help conceal the actions of one thread
from another,
but we also need abstractions that help
threads intentionally interact.
For example, the reader of a pipe in xv6 may need to wait
for a writing process to produce data;
a parent's call to \lstinline{wait} may need to
wait for a child to exit; and
a process reading the disk needs to wait
for the disk hardware to finish the read.
The xv6 kernel uses a mechanism called sleep and wakeup
in these situations (and many others).
Sleep allows a kernel thread to
wait for a specific event; another thread can call wakeup
to indicate that threads waiting for an event should resume.
Sleep and wakeup are often called
\indextext{sequence coordination}
or
\indextext{conditional synchronization}
mechanisms.
Sleep and wakeup provide a relatively low-level synchronization
interface. To motivate the way they work in xv6,
we'll use them to build a higher-level
synchronization mechanism called a \indextext{semaphore}~\cite{dijkstra65} that
coordinates producers and consumers
(xv6 does not use semaphores).
A semaphore maintains a count and provides two operations.
The ``V'' operation (for the producer) increments the count.
The ``P'' operation (for the consumer) waits until the count is non-zero,
and then decrements it and returns.
If there were only one producer thread and one consumer thread,
and they executed on different CPUs,
and the compiler didn't optimize too aggressively,
this implementation would be correct:
\begin{lstlisting}[numbers=left,firstnumber=100]
struct semaphore {
  struct spinlock lock;
  int count;
};

void
V(struct semaphore *s)
{
  acquire(&s->lock);
  s->count += 1;
  release(&s->lock);
}

void
P(struct semaphore *s)
{
  while(s->count == 0)
    ;
  acquire(&s->lock);
  s->count -= 1;
  release(&s->lock);
}
\end{lstlisting}
The implementation above
is expensive. If the producer acts
rarely, the consumer will spend most
of its time spinning in the
\lstinline{while}
loop hoping for a non-zero count.
The consumer's CPU could probably find more productive work than
\indextext{busy waiting}
by repeatedly
\indextext{polling}
\lstinline{s->count}.
Avoiding busy waiting requires
a way for the consumer to yield the CPU
and resume only after
\lstinline{V}
increments the count.
Here's a step in that direction, though as we
will see it is not enough.
Let's imagine a pair of calls,
\indexcode{sleep}
and
\indexcode{wakeup},
that work as follows.
\lstinline{sleep(chan)}
sleeps on the arbitrary value
\indexcode{chan},
called the
\indextext{wait channel}.
\lstinline{sleep}
puts the calling process to sleep, releasing the CPU
for other work.
\lstinline{wakeup(chan)}
wakes all processes sleeping on
\lstinline{chan}
(if any), causing their
\lstinline{sleep}
calls to return.
If no processes are waiting on
\lstinline{chan},
\lstinline{wakeup}
does nothing.
We can change the semaphore implementation to use
\lstinline{sleep}
and
\lstinline{wakeup} (changes highlighted in yellow):
\begin{lstlisting}[numbers=left,firstnumber=200]
void
V(struct semaphore *s)
{
  acquire(&s->lock);
  s->count += 1;
  (*@\hl{wakeup(s);}@*)
  release(&s->lock);
}

void
P(struct semaphore *s)
{
  while(s->count == 0) (*@\label{line:test}@*)
    (*@\hl{sleep(s);}@*) (*@\label{line:sleep}@*)
  acquire(&s->lock);
  s->count -= 1;
  release(&s->lock);
}
\end{lstlisting}
% \begin{figure}[t]
% \center
% \includegraphics[scale=0.5]{fig/deadlock.pdf}
% \caption{Example lost wakeup problem}
% \label{fig:deadlock}
% \end{figure}
\lstinline{P}
now gives up the CPU instead of spinning, which is nice.
However, it turns out not to be straightforward to design
\lstinline{sleep}
and
\lstinline{wakeup}
with this interface without suffering
from what is known as the \indextext{lost wake-up} problem.
% (see
% Figure~\ref{fig:deadlock}).
Suppose that
\lstinline{P}
finds that
\lstinline{s->count}
\lstinline{==}
\lstinline{0}
on line~\ref{line:test}.
While
\lstinline{P}
is between lines~\ref{line:test} and~\ref{line:sleep},
\lstinline{V}
runs on another CPU:
it changes
\lstinline{s->count}
to be nonzero and calls
\lstinline{wakeup},
which finds no processes sleeping and thus does nothing.
Now
\lstinline{P}
continues executing at line~\ref{line:sleep}:
it calls
\lstinline{sleep}
and goes to sleep.
This causes a problem:
\lstinline{P}
is asleep waiting for a \lstinline{V} call
that has already happened.
Unless we get lucky and the producer calls
\lstinline{V} again, the consumer will wait
forever even
though the count is non-zero.
The root of this problem is that the
invariant that
\lstinline{P}
only sleeps when
\lstinline{s->count}
\lstinline{==}
\lstinline{0}
is violated by
\lstinline{V}
running at just the wrong moment.
An incorrect way to protect the invariant would be to
move the lock acquisition (highlighted in yellow below) in
\lstinline{P}
so that its check of the count and its call to \lstinline{sleep}
are atomic:
\begin{lstlisting}[numbers=left,firstnumber=300]
void
V(struct semaphore *s)
{
  acquire(&s->lock);
  s->count += 1;
  wakeup(s);
  release(&s->lock);
}

void
P(struct semaphore *s)
{
  (*@\hl{acquire(\&s->lock);}@*)
  while(s->count == 0) (*@\label{line:test1}@*)
    sleep(s); (*@\label{line:sleep1}@*)
  s->count -= 1;
  release(&s->lock);
}
\end{lstlisting}
One might hope that this version of
\lstinline{P}
would avoid the lost wakeup because the lock prevents
\lstinline{V}
from executing between lines~\ref{line:test1} and~\ref{line:sleep1}.
It does that, but it also deadlocks:
\lstinline{P}
holds the lock while it sleeps,
so \lstinline{V} will block forever waiting for the lock.
We'll fix the preceding scheme by changing
\lstinline{sleep}'s
interface:
the caller must pass the \indextext{condition lock} to
\lstinline{sleep}
so it can release the lock after
the calling process is marked as asleep and waiting on the
sleep channel.
The lock will force a concurrent
\lstinline{V}
to wait until \lstinline{P} has finished putting itself to sleep,
so that the
\lstinline{wakeup}
will find the sleeping consumer and wake it up.
Once the consumer is awake again
\indexcode{sleep}
reacquires the lock before returning.
Our new correct sleep/wakeup scheme is usable as follows (change
highlighted in yellow):
\begin{lstlisting}[numbers=left,firstnumber=400]
void
V(struct semaphore *s)
{
  acquire(&s->lock);
  s->count += 1;
  wakeup(s);
  release(&s->lock);
}

void
P(struct semaphore *s)
{
  acquire(&s->lock);
  while(s->count == 0)
    (*@\hl{sleep(s, \&s->lock);}@*)
  s->count -= 1;
  release(&s->lock);
}
\end{lstlisting}
The fact that
\lstinline{P}
holds
\lstinline{s->lock}
prevents
\lstinline{V}
from trying to wake it up between
\lstinline{P}'s
check of
\lstinline{s->count}
and its call to
\lstinline{sleep}.
Note, however, that we need
\lstinline{sleep}
to atomically release
\lstinline{s->lock}
and put the consuming process to sleep,
in order to avoid lost wakeups.
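For comparison, POSIX condition variables offer the same atomic
release-and-sleep contract: \lstinline{pthread_cond_wait} releases the
mutex while blocking and reacquires it before returning, just as
\lstinline{sleep} does with the condition lock. Here is a portable
sketch of the final semaphore using that API (an analogy, not xv6
code):

\begin{lstlisting}
#include <assert.h>
#include <pthread.h>

struct semaphore {
  pthread_mutex_t lock;
  pthread_cond_t cond;
  int count;
};

void
V(struct semaphore *s)
{
  pthread_mutex_lock(&s->lock);
  s->count += 1;
  pthread_cond_broadcast(&s->cond);         // like wakeup(s)
  pthread_mutex_unlock(&s->lock);
}

void
P(struct semaphore *s)
{
  pthread_mutex_lock(&s->lock);
  while(s->count == 0)
    pthread_cond_wait(&s->cond, &s->lock);  // like sleep(s, &s->lock)
  s->count -= 1;
  pthread_mutex_unlock(&s->lock);
}

int
main(void)
{
  struct semaphore s = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
  };
  V(&s);          // producer runs first, so P() won't block here
  P(&s);
  assert(s.count == 0);
  return 0;
}
\end{lstlisting}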
%%
\section{Code: Sleep and wakeup}
%%
Xv6's
\indexcode{sleep}
\lineref{kernel/proc.c:/^sleep/}
and
\indexcode{wakeup}
\lineref{kernel/proc.c:/^wakeup/}
provide the interface shown in the last example above,
and their implementation (plus rules for how to use them)
ensures that there are no lost wakeups.
The basic idea is to have
\lstinline{sleep}
mark the current process as
\indexcode{SLEEPING}
and then call
\indexcode{sched}
to release the CPU;
\lstinline{wakeup}
looks for a process sleeping on the given wait channel
and marks it as
\indexcode{RUNNABLE}.
Callers of
\lstinline{sleep}
and
\lstinline{wakeup}
can use any mutually convenient number as the channel.
Xv6 often uses the address
of a kernel data structure involved in the waiting.
\lstinline{sleep}
acquires
\indexcode{p->lock}
\lineref{kernel/proc.c:/DOC: sleeplock1/}.
Now the process going to sleep holds both
\lstinline{p->lock}
and
\lstinline{lk}.
Holding
\lstinline{lk}
was necessary in the caller (in the example,
\lstinline{P}):
it
ensured that no other process (in the example,
one running
\lstinline{V})
could start a call to
\lstinline{wakeup(chan)}.
Now that
\lstinline{sleep}
holds
\lstinline{p->lock},
it is safe to release
\lstinline{lk}:
some other process may start a call to
\lstinline{wakeup(chan)},
but
\indexcode{wakeup}
will wait to acquire
\indexcode{p->lock},
and thus will wait until
\lstinline{sleep}
has finished putting the process to sleep,
keeping the
\lstinline{wakeup}
from missing the
\lstinline{sleep}.
Now that
\lstinline{sleep}
holds
\lstinline{p->lock}
and no others,
it can put the process to sleep by recording
the sleep channel,
changing the process state to \texttt{SLEEPING},
and calling
\lstinline{sched}
\linerefs{kernel/proc.c:/chan.=.chan/,/sched/}.
In a moment it will be clear why it's critical that
\lstinline{p->lock} is not released (by \lstinline{scheduler}) until after
the process is marked \texttt{SLEEPING}.
At some point, a process will acquire the condition lock,
set the condition that the sleeper is waiting for,
and call \lstinline{wakeup(chan)}.
It's important that \lstinline{wakeup} is called
while holding the condition lock\footnote{%
%
Strictly speaking it is sufficient if
\lstinline{wakeup}
merely follows the
\lstinline{acquire}
(that is, one could call
\lstinline{wakeup}
after the
\lstinline{release}).%
%
}.
\lstinline{wakeup}
loops over the process table
\lineref{kernel/proc.c:/^wakeup\(/}.
It acquires the
\lstinline{p->lock}
of each process it inspects,
both because it may manipulate that process's state
and because
\lstinline{p->lock}
ensures that
\lstinline{sleep}
and
\lstinline{wakeup}
do not miss each other.
When \lstinline{wakeup} finds a process in state
\indexcode{SLEEPING}
with a matching
\indexcode{chan},
it changes that process's state to
\indexcode{RUNNABLE}.
The next time the scheduler runs, it will
see that the process is ready to be run.
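A toy version of that scan (hypothetical code; the real
\lstinline{wakeup} takes each process's \lstinline{p->lock} around the
check and update):

\begin{lstlisting}
#include <assert.h>
#include <stddef.h>

enum procstate { UNUSED, SLEEPING, RUNNABLE };

struct proc {
  enum procstate state;
  void *chan;              // wait channel, as in xv6's struct proc
};

#define NPROC 4
static struct proc ptable[NPROC];

// Make every process sleeping on chan runnable, as wakeup() does;
// locking omitted in this sketch.
static void
toy_wakeup(void *chan)
{
  for(int i = 0; i < NPROC; i++){
    if(ptable[i].state == SLEEPING && ptable[i].chan == chan)
      ptable[i].state = RUNNABLE;
  }
}

int
main(void)
{
  int a, b;                // addresses serve as two distinct channels
  ptable[0].state = SLEEPING; ptable[0].chan = &a;
  ptable[1].state = SLEEPING; ptable[1].chan = &b;
  toy_wakeup(&a);
  assert(ptable[0].state == RUNNABLE);  // matching sleeper woken
  assert(ptable[1].state == SLEEPING);  // other channel untouched
  return 0;
}
\end{lstlisting}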
Why do the locking rules for
\lstinline{sleep}
and
\lstinline{wakeup}
ensure a sleeping process won't miss a wakeup?
The sleeping process holds either
the condition lock or its own
\lstinline{p->lock}
or both from a point before it checks the condition
to a point after it is marked \texttt{SLEEPING}.
The process calling \texttt{wakeup} holds \textit{both}
of those locks in \texttt{wakeup}'s loop.
Thus the waker either makes the condition true before
the consuming thread checks the condition;
or the waker's \lstinline{wakeup} examines the sleeping
thread strictly after it has been marked \texttt{SLEEPING}.
Then
\lstinline{wakeup}
will see the sleeping process and wake it up
(unless something else wakes it up first).
It is sometimes the case that multiple processes are sleeping
on the same channel; for example, more than one process
reading from a pipe.
A single call to
\lstinline{wakeup}
will wake them all up.
One of them will run first and acquire the lock that
\lstinline{sleep}
was called with, and (in the case of pipes) read whatever
data is waiting in the pipe.
The other processes will find that, despite being woken up,
there is no data to be read.
From their point of view the wakeup was ``spurious,'' and
they must sleep again.
For this reason \lstinline{sleep} is always called inside a loop that
checks the condition.
No harm is done if two uses of sleep/wakeup accidentally
choose the same channel: they will see spurious wakeups,
but looping as described above will tolerate this problem.
Much of the charm of sleep/wakeup is that it is both
lightweight (no need to create special data
structures to act as sleep channels) and provides a layer
of indirection (callers need not know which specific process
they are interacting with).
%%
\section{Code: Pipes}
%%
A more complex example that uses \lstinline{sleep}
and \lstinline{wakeup} to synchronize producers and
consumers is xv6's implementation of pipes.
We saw the interface for pipes in Chapter~\ref{CH:UNIX}:
bytes written to one end of a pipe are copied
to an in-kernel buffer and then can be read from
the other end of the pipe.
Future chapters will examine the file descriptor support
surrounding pipes, but let's look now at the
implementations of
\indexcode{pipewrite}
and
\indexcode{piperead}.
Each pipe
is represented by a
\indexcode{struct pipe},
which contains
a
\lstinline{lock}
and a
\lstinline{data}
buffer.
The fields
\lstinline{nread}
and
\lstinline{nwrite}
count the total number of bytes read from
and written to the buffer.
The buffer wraps around:
the next byte written after
\lstinline{buf[PIPESIZE-1]}
is
\lstinline{buf[0]}.
The counts do not wrap.
This convention lets the implementation
distinguish a full buffer
(\lstinline{nwrite}
\lstinline{==}