~mrlee/www.kamelasa.dev

ref: ae9510002ff41be75a94f289a248a4970d089200 www.kamelasa.dev/posts/ruby-sorcery-ractor-2.poly.pm -rw-r--r-- 10.1 KiB
ae951000Lee Meichin Fix syntax error a month ago
                                                                                
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
#lang pollen

define-meta[title]{Ruby Sorcery Part 2: Ractor, Chapter 2}
define-meta[date]{2021-10-09}
define-meta[published #t]
define-meta[category]{ruby}

In the previous chapter of this excurusion into Ractor^[1], the concept of the actor model was introduced and a toy TCP server was created. It was a naive implementation that created new Ractors for every TCP connection made to the server, and it looked like this:

codeblock['ruby]{
  require 'socket'

  tcp_server = Ractor.new do
    server = TCPServer.new(1337)

    loop do
      Ractor.new(server.accept) do |client|
        loop do
          input = client.gets
          client.puts(input.upcase)
        end
      end
    end
  end
}

While simple, this isn't something you'd call 'production ready'. There is at least one notable shortcoming: see if you can guess what it is, it will be addressed in a later chapter. Meanwhile, it's a good time to turn this toy example into something more serious.

aside{Hint: what does or does not happen when the client's connection is lost or terminated?}

◊h2{Building an HTTP server with Ractor}

In the beginning, the HyperText Transfer Protocol was (and still is!) a purely text-based specification layered on top of TCP. It has grown a hell of a lot of complexity over the years, in order to power more complex interactions over the web, but it is still easier to build a server that understands HTTP (even just HTTP 1.1) than it is to build the browser that handles all of the rendering.

So, it sounds like a good time to turn this toy TCP server into a basic HTTP server, using Ractors, and by the end of this chapter it should be possible for the the server to understand a simple request:

◊codeblock['text]{
  GET /hello-world HTTP/1.1
  Host: localhost:1337
}

There is, of course, a lot more to serving HTTP than this. For a start, there's no HTTP response! And then there are the other HTTP methods. Much of that will be covered in the following chapters.

◊h2{Parsing the request}

The anatomy of an HTTP request is divided, essentially, into three parts:

◊codeblock['text]{
  GET /hello-world HTTP/1.1        | 1. Method, location/target, HTTP version
  Host: localhost:1337             | 2. Headers (host, content-type, accept,
  Content-Type: application/json       caching, user-agent, etc.)
                                   | Empty line to separate headers and body
  { data: "Hello, world!" }        | 3. Body
                                   | Final empty line to denote end of request
}

The first line states the request method (i.e. code{GET}, code{POST}, etc.) and the target of the request, which will often be a relative path on the server but can also be a full URL.

aside{Note that there is also a concept of HTTP 'trailers', which are headers that appear em{after} the body^[2]. These are only used for certain kinds of chunked requests and are way out of scope for this post.}

What follows is a list of headers, which are key/value pairs used to provide extra information about the request. For the sake of simplicity, most of these will be ignored, and there are em{many} of them.

Finally, there is a place for the body of the request. This is optional, but it must have an empty line both before and afterwards if it is present. A typical request body may contain URLEncoded form data, which is how your typical HTML forms work, but there is not much of a restriction provided that the code{Content Type} header describes the format of the payload, e.g if it's JSON, XML, or perhaps even something like an image or a video.

In this chapter, the main focus is on the first section, and some of the second. And since the purpose of the post is to demonstrate Ractor, and because in the Actor model, everything is an actor... the thing that parses the request will be an Actor too.

◊aside{This is also a good time to use some of that pattern matching knowledge from the first part of this series.◊^[3]}

Something like this should do the trick, and provide a foundation to build on.

◊codeblock['ruby]{
  require 'strscan'

  HttpRequest = Struct.new(
    :method, :location, :version, :host, :content_type,
    keyword_init: true
  )
  
  HttpRequestParser = Ractor.new do
    while raw_req = StringScanner.new(receive)
      http_req = HttpRequest.new

      case raw_req.scan_until(/\n/).strip.split(" ")
      in "GET" => method, location, "HTTP/1.1" => http_version
        http_req.method = method
        http_req.location = location
        http_req.version = http_version
      end

      raw_req.scan_until(/(\r\n){2}/).split("\n").map(&:strip).each do |header|
        case header.split(": ")
        in "Content-Type", content_type
          http_req.content_type = content_type
        in "Host", host
          http_req.host = host
        else
          next
        end
      end

      # `move` the object as this Ractor no longer needs ownership
      # the Ractor that calls `take` will... take... ownership
      Ractor.yield(http_req, move: true)
    end
  ensure
    raw_req.terminate
  end
}

aside{Why all the code{loop}s and code{while} loops? Ractors behave a bit like Enumerators^[5], which means that if they stop yielding values or actually return a value, the Ractors close and can no longer be used.}

What we have here is a Ractor that waits for incoming HTTP request messages, and then parses them into something that the server can more easily work with by pulling out important info like the request location, the HTTP method, the content type, and the body. In this example, Ruby's pattern matching features are liberally employed to handle the parsing in some places; this is more for the sake of demonstration to show that it ◊em{can} be done, not necessarily that it always ◊em{should} be.

In any case, once the ◊code{HttpRequest} object is constructed, it is yielded so that another Ractor can use the object, and therefore it will sit in a queue (or a mailbox, in actor model parlance) until it is taken from it. As a final housekeeping step, the string scanner instance used to parse the request is terminated. It's always a good idea to clean up after yourself if the language provides you the mechanism to do so.

Going back to the functionality at hand; this basically shunts the parsing of HTTP requests into another thread, which means that the Ractors responsible for managing the TCP layer can stay responsible for that, and hand over the application-layer responsibilities to other actors/processes/Ractors.

The TCP server now requires an upgrade: it's going to read input but it can no longer work on a line-by-line basis, because a HTTP message takes up many lines. The only thing we can really depend on is that it always ends with ◊em{two} carriage returns (CR-LF characters, or '\r\n' in a string).

◊codeblock['ruby]{
  Ractor.new do
    tcp_server = TCPServer.new(1337)

    loop do
      Ractor.new(tcp_server.accept) do |client|
        HttpRequestParser.send(client.gets("\r\n\r\n"))
        request = HttpRequestParser.take
        client.puts("requested: #{request.location}")
        client.close
      end
    end
  end
}

The most significant change, here, is that the innnermost Ractor sends input over to the new HttpRequestParser Ractor. It then immediately waits for a response.

aside{This works for basic requests with no body element, but consider why it fails if a body is also supplied. Would the connection not have already closed?}

In our toy examples, this works fine, but try this with many clients at once and you will experience chaos. This is because we're using a single global ractor to parse input from any number of connections. Perhaps it shouldn't be a ractor at all, or it should work a little differently. This will be addressed in another chapter, as it becomes clear that building a concurrent HTTP server isn't as simple as it looks.

Note that this won't work with code{curl} yet, because the server isn't returning an appropriate response.

With that said, it's a good time to combine these two things, to make a functioning server:

codeblock['ruby]{
  require 'socket'
  require 'strscan'

  HttpRequest = Struct.new(
    :method, :location, :version, :host, :content_type, :headers, :body,
    keyword_init: true
  )

  HttpRequestParser = Ractor.new do
    while raw_req = StringScanner.new(receive)
      http_req = HttpRequest.new

      case raw_req.scan_until(/$/).strip.split(" ")
      in "GET" => method, location, "HTTP/1.1" => http_version
        http_req.method = method
        http_req.location = location
        http_req.version = http_version
      end

      raw_req.scan_until(/(\r\n){2}/).split("\n").map(&:strip).each do |header|
        case header.split(": ")
        in "Content-Type", content_type
          http_req.content_type = content_type
        in "Host", host
          http_req.host = host
        else
          next
        end
      end

      # `move` the object as this Ractor no longer needs ownership
      # the Ractor that calls `take` will... take... ownership
      Ractor.yield(http_req, move: true)
    end
  ensure
    raw_req.terminate
  end
  
  Ractor.new do
    tcp_server = TCPServer.new(1337)

    loop do
      Ractor.new(tcp_server.accept) do |client|
        HttpRequestParser.send(client.gets("\r\n\r\n"))
        request = HttpRequestParser.take
        client.puts("requested: #{request.location}")
        client.close
      end
    end
  end
}

Let's see it in action!

script[#:id "asciicast-qsU8HUdrJBIR7S2BUqpdk6KFU" #:src "https://asciinema.org/a/qsU8HUdrJBIR7S2BUqpdk6KFU.js" #:async "true" #:data-cols "190"]{}

It's gonna take a little bit more work to turn this into a workable HTTP server, but let's recap:

- The TCP server now knows about HTTP, even if it's just a little bit
- There is a parser for HTTP requests which can learn how to parse more of the protocol in future
- It primarily uses Ractors for communication

footnotes{
  ^[1]{<>["https://www.kamelasa.dev/posts/ruby-sorcery-ractor.html"]}
  ^[2]{<>["https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Trailer"]}
  ^[3]{<>["https://www.kamelasa.dev/posts/ruby-sorcery.html"]}
  ^[4]{<>["https://ruby-doc.com/core-3.0.0/Enumerator.html"]}
}