#lang pollen

headline-link{Why have thread pools?}

(define snooze-time "two")
(define snooze-time-double "four")

p{A keyword{thread pool}, also known as a keyword{execution
context}, is a way of managing keyword{parallelism}.}

p{To demonstrate, let's have a look at a simple task: code-inline{snooze}.}

snippet[#:name "snooze"]{}

p{code-inline{snooze} does absolutely nothing. More precisely, it
does absolutely nothing for |snooze-time| seconds. We can double check this by
running it using our handy code-inline{time} function:}

snippet[#:name "run-snooze-no-runtime"]{}

p{Whoops! We need an code-inline{IORuntime}. Let's use our own
code-inline{basicRuntime} explicitly:}

snippet[#:name "run-snooze"]{}

p{As expected, it took |snooze-time| seconds to run.}
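
p{In case you're wondering what such a task and timer might look
like, here's a rough sketch. The definitions in the real snippets may
differ: code-inline{snooze} just wraps code-inline{Thread.sleep} in an
code-inline{IO}, and code-inline{time} measures how long an
code-inline{IO} takes to run.}

pre{
import cats.effect.IO
import scala.concurrent.duration._

// A sketch only: the blog's actual snooze and time helpers may differ.
// snooze blocks its thread for two seconds, doing nothing useful.
val snooze: IO[Unit] = IO(Thread.sleep(2.seconds.toMillis))

// time runs an IO and reports how long it took.
def time[A](io: IO[A]): IO[FiniteDuration] =
  for {
    start  <- IO(System.currentTimeMillis())
    _      <- io
    finish <- IO(System.currentTimeMillis())
  } yield (finish - start).millis
}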

p{What if we have multiple snooze tasks?}

snippet[#:name "snooze-list"]{}

p{We can combine a list of tasks using code-inline{parSequence}:}

snippet[#:name "snooze-parallel"]{}

p{The code-inline{parSequence} function produces an code-inline{IO}
that runs multiple tasks in
parallel. If each task takes |snooze-time| seconds, how long should
code-inline{parallelSnoozes} take?}

snippet[#:name "snooze-parallel-run"]{}

p{Both tasks were run at the same time, so the total elapsed time was
still only |snooze-time| seconds.}
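
p{At the type level, you can think of code-inline{parSequence} as
turning a code-inline{List[IO[A]]} into an code-inline{IO[List[A]]}
that runs its elements in parallel. Roughly, the code above looks
something like this sketch (the real snippets may name things
differently):}

pre{
import cats.effect.IO
import cats.syntax.all._ // provides the parSequence syntax

// Two snooze tasks, combined into one IO that runs them in parallel.
val snoozes: List[IO[Unit]] = List(snooze, snooze)
val parallelSnoozes: IO[List[Unit]] = snoozes.parSequence
}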

p{If you're used to parallel computations, you may look at
code-inline{parSequence} with a degree of suspicion. It lets us run
many tasks in parallel, but how many?}

p{For instance, we can declare a thousand code-inline{snooze} tasks:}

snippet[#:name "snooze-parallel-thousand"]{}

p{Will they really only take |snooze-time| seconds?}

snippet[#:name "snooze-parallel-thousand-run"]{}

p{Nice! There seems to be no upper limit on the number of tasks we
can run in parallel.}

headline-link{Knowing our limits}

p{Unlimited parallelism seems like a great idea, but it has
significant downsides. Let's have a look at a different task to
demonstrate.}

snippet[#:name "factorial"]{}

p{Whoa! That's an odd bit of code.}

p{If you're a functional programming enthusiast, you're probably so
fond of factorials that you compute them in your sleep.}

p{Those skills, elegant as they are, aren't too important here. Don't
worry if you've never heard of a factorial, haven't seen
code-inline{@scala.annotation.tailrec} before, or get a headache
reading this Escher-like code.}
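
p{If it helps, here's a plainer sketch of the same idea (not the
exact code above): a tail-recursive loop that multiplies a long run
of numbers together.}

pre{
import scala.annotation.tailrec

// A sketch only: the real snippet may use different numbers and a
// more contorted shape. The point is that it keeps a processor busy.
def factorial(n: BigInt): BigInt = {
  @tailrec
  def go(i: BigInt, acc: BigInt): BigInt =
    if (i <= 1) acc else go(i - 1, acc * i)
  go(n, BigInt(1))
}
}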

p{The key part of code-inline{factorial} is that, unlike code-inline{snooze}, it does a lot of multiplication.}

p{Running this on my rusty old laptop takes approximately two seconds.}

snippet[#:name "factorial-run"]{}

p{The functional programmer within you might point out that this
code is pure: there's no reason to wrap it in an
code-inline{IO}. While that's true, doing so lets us parallelize it
with code-inline{parSequence}:}

snippet[#:name "factorial-io-parallelized"]{}

p{How long should code-inline{factorials} take to run?}
p{It took two seconds to run code-inline{factorial} once, so it
should also take two seconds to run in parallel, shouldn't it?}

snippet[#:name "factorial-io-parallelized-run"]{}

p{If you ran the code above, you probably felt your laptop heat up a
bit. You might have also found that the code em{didn't} take two
seconds: it took longer.}

p{This is different to our code-inline{snooze} task, which always took
|snooze-time| seconds regardless of whether we ran one, ten or a
thousand in parallel.}

p{Why would that be?}

p{To answer that question, we need to take a closer look at our computers.}

headline-link{The processor beneath}

p{Despite what we might wish, our laptops are not magical boxes with
unlimited compute power:
they're made of physical devices, and those devices have limits. A
computer has a limited number of processors, each of which can compute
one thing at once.}

p{We can check that number in Scala by taking a look at the
code-inline{Runtime} object:}

snippet[#:name "runtime-available-processors"]{}

p{My humble laptop has eight processors: it can execute a maximum of
eight computations at once. Even if I ask it to calculate ten
factorials in parallel, it won't actually do so.}

p{You might rightly wonder: why didn't we hit this limit for the
code-inline{snooze} task? This is because the code-inline{Thread.sleep}
operation in code-inline{snooze} didn't occupy a processor as it
ran.}

todo{Go into more detail about the OS?}

headline-link{Setting our limits}

p{We can take a closer look at how our factorial task is getting run
by timing each task:}

snippet[#:name "factorial-io-parallelized-time-each"]{}

p{This gives a string description for each of the
twenty tasks, saying how long that task took to run.}
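
p{The shape of code-inline{timedFactorials} is roughly this, reusing
the hypothetical code-inline{time} and code-inline{factorial} sketches
from earlier (the real snippet may differ):}

pre{
import cats.effect.IO
import cats.syntax.all._

// Time each factorial individually, then run them all in parallel.
val timedFactorials: IO[List[String]] =
  List.fill(20)(time(IO(factorial(BigInt(50000)))))
    .map(_.map(elapsed => s"took ${elapsed.toSeconds} seconds"))
    .parSequence
}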

p{Let's run it to check the times:}

snippet[#:name "factorial-io-parallelized-time-each-run"]{}

p{That's strange! The code-inline{factorials} task may have taken
six seconds in total, but shouldn't each task have taken two seconds?}

p{Instead, we see times of anywhere between two and six seconds.}

p{This is because all tasks are fired off at the same time, but our
processors switch between them as they run. A processor might start
computing a task, but put it on hold in order to compute a different
one, switching back to it at a later time.

Tasks are started, halted and restarted as they all compete for
processors.}

note[#:title "By the way"]{
If you're confused, take a moment to poke at the code. Why not insert
an code-inline{IO.println} at the end of code-inline{factorial} and
see when it's printed out?}
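
p{That probe might look something like this (again, just a sketch):}

pre{
import cats.effect.IO

// Print a line whenever a factorial finishes, to see how the tasks
// are interleaved. The argument to factorial is made up.
val noisyFactorial: IO[BigInt] =
  IO(factorial(BigInt(50000))).flatMap(n => IO.println("finished a factorial").as(n))
}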

p{The more tasks we parallelize, the more switching each processor
has to do. This is problematic for a few reasons:}

items{
item{Switching between tasks is expensive: a processor has to unload all
the information associated with the computation it's about to pause,
and reload the information for the next.}
item{A paused computation still has resources hanging around. Our
code-inline{factorial} task doesn't need too much memory, but we
could easily write a task that used a lot of heap space. Running too
many memory-intensive computations would give us
code-inline{OutOfMemoryError} failures.}
}

p{This is why it's generally much more useful to limit the number of
tasks that can be run in parallel.}

p{We can do this using keyword{thread pools}.}

headline-link{Bounded and unbounded thread pools}

p{Our current limit, or lack thereof, is specified by our
thread pool. The cats-effect code-inline{IORuntime} has a
thread pool under the hood. The code-inline{basicRuntime} we've been
using has an em{unbounded} thread pool: it can execute an unlimited
number of tasks in parallel.}
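
p{Under the hood, an code-inline{IORuntime} is built around an
code-inline{ExecutionContext} backed by a thread pool. As a rough
sketch of what something like code-inline{basicRuntime} might look
like (the actual code-inline{Threading} setup may well differ):}

pre{
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import cats.effect.unsafe.{IORuntime, IORuntimeConfig, Scheduler}

// A cached thread pool spins up a new thread whenever no free thread
// is available, so it puts no bound on parallelism.
val unboundedPool = Executors.newCachedThreadPool()
val unboundedEc   = ExecutionContext.fromExecutor(unboundedPool)

val (scheduler, shutdownScheduler) = Scheduler.createDefaultScheduler()

// Sketch of an unbounded runtime: all work is handed to the
// unbounded pool.
val basicRuntime: IORuntime = IORuntime(
  compute = unboundedEc,
  blocking = unboundedEc,
  scheduler = scheduler,
  shutdown = () => { shutdownScheduler(); unboundedPool.shutdown() },
  config = IORuntimeConfig()
)

// Swapping in Executors.newFixedThreadPool(n) would give a bounded
// runtime with at most n threads.
}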

p{In our code-inline{Threading} setup code, we declared another
code-inline{boundedRuntime} function. Let's give it a spin.}

p{We can pick a bound of two for ten code-inline{factorial} tasks:}

snippet[#:name "factorial-bounded-threadpool"]{}

p{It's much slower than before: only two tasks are run at once.}

p{How long does each task take to run? We can check with code-inline{timedFactorials}:}

snippet[#:name "factorial-time-bounded-threadpool"]{}

p{Unlike with the previous unbounded thread pool, each task takes two
seconds: the tasks might all be em{scheduled} at once, but they're
fired off over time, as a keyword{thread} becomes free to compute them.}

note[#:title "Definition"]{
A keyword{thread pool} controls the number of tasks that can be
executed in parallel. Each task is run using a keyword{thread} in the
pool. A keyword{bounded thread pool} has a limited number of threads,
so it limits the number of parallel tasks.}

p{What if we set the bound higher?}

snippet[#:name "factorial-time-bounded-threadpool-20"]{}

p{The code-inline{timedFactorials} task behaves as if we're running
on the code-inline{basicRuntime}: it's as if we didn't have
a bound at all.}

p{If you think about it, this makes sense: if we have more
computations running than the number of processors, each processor
will still need to switch between them. With twenty two-second tasks
sharing eight processors, there are forty seconds of multiplication to
get through, so each code-inline{factorial} task ends up being paused
by the processor and smeared over several seconds of wall-clock time.}


p{So far, we've experimented with bounds of two and twenty. Having
two tasks run at once gets around our thread-switching
problem: each processor can focus on a single task. But having only
two isn't too useful: most of our processors aren't doing anything
Scala-related.}

p{The best limit probably corresponds to the number of
processors. Let's check:}

snippet[#:name "factorial-time-bounded-threadpool-available"]{}

p{Sure enough, each task takes two seconds.}

note[#:title "Key takeaway"]{A thread pool bounded at the number of
available processors makes the best use of your computer.}
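
p{Put into code, and assuming a code-inline{boundedRuntime} function
that takes the bound as a parameter, that looks something like:}

pre{
// Bound the pool at the number of processors the machine reports.
val recommendedBound = Runtime.getRuntime.availableProcessors()
val recommendedRuntime = boundedRuntime(recommendedBound)
}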

headline-link{Snoozing}

p{A thread pool bounded at code-inline{numProcessors} is the
best option for the code-inline{factorial} task. But what about
code-inline{snooze}?}

p{We know that we can run an unlimited number of parallel
code-inline{snooze} tasks using the code-inline{basicRuntime}, which
had an unbounded thread pool. What about our code-inline{boundedRuntime}?}

p{Lets test it by running more tasks than processors. We can
construct an code-inline{IO} that runs ten tasks in parallel:}

snippet[#:name "snooze-10"]{}

p{Let's try running this on our bounded thread pool. Each task takes
|snooze-time| seconds. How long should the code-inline{tenSnoozes}
task take?}

snippet[#:name "snooze-10-run"]{}

p{Not |snooze-time|, but em{|snooze-time-double|} seconds.}

p{The code-inline{Thread.sleep} call might not hog a processor, but
it does hog a thread in our pool. Each thread can only run one snooze
at a time, so once every thread in the pool is busy sleeping, the
remaining snoozes have to wait for a thread to free up.}

p{By choosing a bounded thread pool for our code-inline{tenSnoozes} task,
we cause it to take longer. If we want our task to complete as fast
as possible, it seems better to have an unbounded pool.}