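Preface
I recently worked on capturing traffic from the built-in browser of an Android app, so here are my notes.
The existing approaches to packet capture are summarized in this diagram from r0capture: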
https://github.com/r0ysue/r0capture/blob/main/pic/summary2.jpg
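Of these, hook-based capture is the most convenient: it yields the data without configuring or importing any certificate. The conventional approach found online is to locate the SSL_read and SSL_write functions in the SSL library and hook them. This works across the board and does capture data, and the APIs are fairly easy to locate even in a built-in browser that bundles its own SSL library. It still has the following drawbacks:
- Fragmented packets: the hook sits at a low level where traffic from different requests is interleaved, so the captured data usually has to go through a traffic analysis tool (such as Wireshark) before it is usable. With HTTP/2, several streams share one TCP connection, which makes stitching reads and writes back together even more tedious. For scenarios that need data in real time and in bulk, this is fatal.
- No tampering: because of the fragmentation above, an attacker who hooks outgoing data this way cannot modify or forge requests in real time.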
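To make the fragmentation problem concrete, here is a minimal sketch of the regrouping a consumer of SSL_write-level data has to do under HTTP/2. `Frame` and `Demultiplex` are hypothetical names introduced for illustration. The sketch assumes the frames have already been parsed into a stream ID plus payload; real HTTP/2 traffic would also need frame-header parsing and HPACK decoding, which are omitted.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of an HTTP/2 DATA frame as it might be
// recovered from an SSL_write hook: only the stream ID and payload are kept.
struct Frame {
  std::uint32_t stream_id;
  std::string payload;
};

// Regroups interleaved frames by stream ID, preserving per-stream order.
std::map<std::uint32_t, std::string> Demultiplex(
    const std::vector<Frame>& frames) {
  std::map<std::uint32_t, std::string> streams;
  for (const Frame& frame : frames) {
    streams[frame.stream_id] += frame.payload;
  }
  return streams;
}
```

Even in this idealized form the consumer has to buffer every open stream until it ends, which is why this layer is a poor fit for real-time use.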
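To address these scenarios, I did a rough analysis of the Chromium core used by the built-in browser, aiming to intercept complete messages at a higher layer of the browser.
Chromium network stack
Let's start with a classic interview question: what happens after you type a URL into the browser and press Enter?
We don't care about the boilerplate answer. What matters here is how Chromium actually sends an HTTP request.
This article covers it, so I won't repeat it: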
https://www.cnblogs.com/bigben0123/p/12650519.html
It skips everything related to the cache, but we don't care about that part either.
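Our goal is to capture complete requests, and all of them, so we need a point in the request lifecycle that satisfies two conditions: it must not be too early, and it must not be too late.
Too early, and the request has not been fully assembled, so we cannot capture all of it. Too late, and the request has already been dispatched by protocol, so we would only see requests for one particular protocol and miss the rest.
Verifying the first condition is tedious, so instead we look for the lowest point that still satisfies the second: the moment just before the request is handed to a transport stream.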
After a few hours of slogging through the source, I settled on the class HttpNetworkTransaction.
https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_network_transaction.cc?q=HttpNetworkTransaction&ss=chromium%2Fchromium
What we care about is how HttpNetworkTransaction drives a request:
```cpp
int HttpNetworkTransaction::DoLoop(int result) {
  DCHECK(next_state_ != STATE_NONE);
  int rv = result;
  do {
    State state = next_state_;
    next_state_ = STATE_NONE;
    switch (state) {
      case STATE_NOTIFY_BEFORE_CREATE_STREAM:
        DCHECK_EQ(OK, rv);
        rv = DoNotifyBeforeCreateStream();
        break;
      case STATE_CREATE_STREAM:
        DCHECK_EQ(OK, rv);
        rv = DoCreateStream();
        break;
      case STATE_CREATE_STREAM_COMPLETE:
        rv = DoCreateStreamComplete(rv);
        break;
      case STATE_INIT_STREAM:
        DCHECK_EQ(OK, rv);
        rv = DoInitStream();
        break;
      case STATE_INIT_STREAM_COMPLETE:
        rv = DoInitStreamComplete(rv);
        break;
      case STATE_CONNECTED_CALLBACK:
        rv = DoConnectedCallback();
        break;
      case STATE_CONNECTED_CALLBACK_COMPLETE:
        rv = DoConnectedCallbackComplete(rv);
        break;
      case STATE_GENERATE_PROXY_AUTH_TOKEN:
        DCHECK_EQ(OK, rv);
        rv = DoGenerateProxyAuthToken();
        break;
      case STATE_GENERATE_PROXY_AUTH_TOKEN_COMPLETE:
        rv = DoGenerateProxyAuthTokenComplete(rv);
        break;
      case STATE_GENERATE_SERVER_AUTH_TOKEN:
        DCHECK_EQ(OK, rv);
        rv = DoGenerateServerAuthToken();
        break;
      case STATE_GENERATE_SERVER_AUTH_TOKEN_COMPLETE:
        rv = DoGenerateServerAuthTokenComplete(rv);
        break;
      case STATE_INIT_REQUEST_BODY:
        DCHECK_EQ(OK, rv);
        rv = DoInitRequestBody();
        break;
      case STATE_INIT_REQUEST_BODY_COMPLETE:
        rv = DoInitRequestBodyComplete(rv);
        break;
      case STATE_BUILD_REQUEST:
        DCHECK_EQ(OK, rv);
        net_log_.BeginEvent(NetLogEventType::HTTP_TRANSACTION_SEND_REQUEST);
        rv = DoBuildRequest();
        break;
      case STATE_BUILD_REQUEST_COMPLETE:
        rv = DoBuildRequestComplete(rv);
        break;
      case STATE_SEND_REQUEST:
        DCHECK_EQ(OK, rv);
        rv = DoSendRequest();
        break;
      case STATE_SEND_REQUEST_COMPLETE:
        rv = DoSendRequestComplete(rv);
        net_log_.EndEventWithNetErrorCode(
            NetLogEventType::HTTP_TRANSACTION_SEND_REQUEST, rv);
        break;
      case STATE_READ_HEADERS:
        DCHECK_EQ(OK, rv);
        net_log_.BeginEvent(NetLogEventType::HTTP_TRANSACTION_READ_HEADERS);
        rv = DoReadHeaders();
        break;
      case STATE_READ_HEADERS_COMPLETE:
        rv = DoReadHeadersComplete(rv);
        net_log_.EndEventWithNetErrorCode(
            NetLogEventType::HTTP_TRANSACTION_READ_HEADERS, rv);
        break;
      case STATE_READ_BODY:
        DCHECK_EQ(OK, rv);
        net_log_.BeginEvent(NetLogEventType::HTTP_TRANSACTION_READ_BODY);
        rv = DoReadBody();
        break;
      case STATE_READ_BODY_COMPLETE:
        rv = DoReadBodyComplete(rv);
        net_log_.EndEventWithNetErrorCode(
            NetLogEventType::HTTP_TRANSACTION_READ_BODY, rv);
        break;
      case STATE_DRAIN_BODY_FOR_AUTH_RESTART:
        DCHECK_EQ(OK, rv);
        net_log_.BeginEvent(
            NetLogEventType::HTTP_TRANSACTION_DRAIN_BODY_FOR_AUTH_RESTART);
        rv = DoDrainBodyForAuthRestart();
        break;
      case STATE_DRAIN_BODY_FOR_AUTH_RESTART_COMPLETE:
        rv = DoDrainBodyForAuthRestartComplete(rv);
        net_log_.EndEventWithNetErrorCode(
            NetLogEventType::HTTP_TRANSACTION_DRAIN_BODY_FOR_AUTH_RESTART, rv);
        break;
      default:
        NOTREACHED_IN_MIGRATION() << "bad state";
        rv = ERR_FAILED;
        break;
    }
  } while (rv != ERR_IO_PENDING && next_state_ != STATE_NONE);
  return rv;
}
```
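This state machine covers every step of a request. As long as no error or reconnect occurs, you can think of the states as running in order (strictly speaking they do not: the loop exits whenever a step returns ERR_IO_PENDING and is re-entered later).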
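The loop pattern can be reduced to a small sketch. `MiniTransaction`, its three states, and `kErrIoPending` are hypothetical stand-ins rather than Chromium types, and the "asynchronous write" is simulated. It is only meant to show why the flow is sequential between suspension points but not overall: the loop returns on pending I/O and is re-entered from a completion callback.

```cpp
#include <string>
#include <vector>

constexpr int kOk = 0;
constexpr int kErrIoPending = -1;  // Stand-in for net::ERR_IO_PENDING.

class MiniTransaction {
 public:
  enum State { kNone, kBuildRequest, kSendRequest, kSendRequestComplete };

  // Starts the state machine; returns kErrIoPending if it suspended.
  int Start() {
    next_state_ = kBuildRequest;
    return DoLoop(kOk);
  }

  // Called when the simulated asynchronous I/O finishes.
  int OnIOComplete(int result) { return DoLoop(result); }

  const std::vector<std::string>& trace() const { return trace_; }

 private:
  int DoLoop(int result) {
    int rv = result;
    do {
      State state = next_state_;
      next_state_ = kNone;
      switch (state) {
        case kBuildRequest:
          trace_.push_back("build");
          next_state_ = kSendRequest;
          rv = kOk;
          break;
        case kSendRequest:
          trace_.push_back("send");
          next_state_ = kSendRequestComplete;
          rv = kErrIoPending;  // Pretend the socket write is asynchronous.
          break;
        case kSendRequestComplete:
          trace_.push_back("send_complete");
          break;  // rv carries the I/O result; no next state, so we stop.
        default:
          break;
      }
    } while (rv != kErrIoPending && next_state_ != kNone);
    return rv;
  }

  State next_state_ = kNone;
  std::vector<std::string> trace_;
};
```

Calling `Start()` runs "build" and "send" and then returns `kErrIoPending`; only a later `OnIOComplete(kOk)` runs "send_complete". A hook placed on any single step therefore cannot assume that the following step runs on the same call stack.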
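Getting the RequestBody
Our target is a browser rather than a networking library, and Chromium has no reason to keep an entire request in plaintext in one place. Instead, the headers, body, and other data are scattered across structures that are handed to the concrete transport stream for assembly and transmission.
In this situation the RequestBody is the easiest field to obtain. Unlike the other fields we need, it comes straight from the front-end application. A request that Chromium creates on its own is, as far as I can tell, a body-less GET used to fetch a resource, so it carries no RequestBody. Therefore, when a RequestBody is present, the request was issued by the front end: the body is generated outside Chromium and is already complete when it enters Chromium.
In Chromium this takes the form of an UploadDataStream, which is passed in when the request is created and handed down layer by layer to the underlying transport.
Unfortunately, the place where I expected to find the body does not contain it: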
```cpp
int HttpNetworkTransaction::DoInitRequestBody() {
  next_state_ = STATE_INIT_REQUEST_BODY_COMPLETE;
  int rv = OK;
  if (request_->upload_data_stream)
    rv = request_->upload_data_stream->Init(
        base::BindOnce(&HttpNetworkTransaction::OnIOComplete,
                       base::Unretained(this)),
        net_log_);
  return rv;
}
```
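We know that by the time upload_data_stream reaches HttpNetworkTransaction it already carries the data we want, or at least that the data can be read through it. In a hook, however, actively reading from a stream is a poor choice: first, we would have to analyze how the stream works; second, there is no guarantee that our read would not consume the data and leave Chromium with nothing to send.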
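The second concern can be illustrated with a toy forward-only stream. `OneShotStream` and `ConsumerSeesAfterHookRead` are hypothetical and do not model UploadDataStream's real interface; the sketch only assumes that a read advances the stream position, which is the behaviour we cannot rule out without further analysis.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical forward-only stream: every Read() advances the position, so
// bytes handed to one reader are never seen by the next.
class OneShotStream {
 public:
  explicit OneShotStream(std::string data) : data_(std::move(data)) {}

  // Copies up to `len` bytes into `out` and returns the number copied.
  std::size_t Read(char* out, std::size_t len) {
    const std::size_t n = std::min(len, data_.size() - pos_);
    std::memcpy(out, data_.data() + pos_, n);
    pos_ += n;
    return n;
  }

 private:
  std::string data_;
  std::size_t pos_ = 0;
};

// Simulates a hook that "peeks" by calling Read() itself, then lets the real
// consumer read. Returns what the real consumer ends up receiving.
std::string ConsumerSeesAfterHookRead(const std::string& body) {
  OneShotStream stream(body);
  char hook_buf[64];
  stream.Read(hook_buf, sizeof(hook_buf));  // The hook drains the stream.
  char real_buf[64];
  const std::size_t n = stream.Read(real_buf, sizeof(real_buf));
  return std::string(real_buf, n);
}
```

For any body shorter than 64 bytes the real consumer receives an empty string, which is exactly the failure mode a passive hook avoids.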
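So we look instead at how the transport stream uses this data stream when it sends the request.
Although different protocols go through different transport streams, they all have to consume this structure in the same way. I picked the most basic protocol, HTTP/1.1, and an arbitrary victim class, HttpStreamParser: https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_stream_parser.cc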
```cpp
int HttpStreamParser::DoLoop(int result) {
  do {
    DCHECK_NE(ERR_IO_PENDING, result);
    DCHECK_NE(STATE_DONE, io_state_);
    DCHECK_NE(STATE_NONE, io_state_);
    State state = io_state_;
    io_state_ = STATE_NONE;
    switch (state) {
      case STATE_SEND_HEADERS:
        DCHECK_EQ(OK, result);
        result = DoSendHeaders();
        DCHECK_NE(STATE_NONE, io_state_);
        break;
      case STATE_SEND_HEADERS_COMPLETE:
        result = DoSendHeadersComplete(result);
        DCHECK_NE(STATE_NONE, io_state_);
        break;
      case STATE_SEND_BODY:
        DCHECK_EQ(OK, result);
        result = DoSendBody();
        DCHECK_NE(STATE_NONE, io_state_);
        break;
      case STATE_SEND_BODY_COMPLETE:
        result = DoSendBodyComplete(result);
        DCHECK_NE(STATE_NONE, io_state_);
        break;
      case STATE_SEND_REQUEST_READ_BODY_COMPLETE:
        result = DoSendRequestReadBodyComplete(result);
        DCHECK_NE(STATE_NONE, io_state_);
        break;
      case STATE_SEND_REQUEST_COMPLETE:
        result = DoSendRequestComplete(result);
        break;
      case STATE_READ_HEADERS:
        net_log_.BeginEvent(NetLogEventType::HTTP_STREAM_PARSER_READ_HEADERS);
        DCHECK_GE(result, 0);
        result = DoReadHeaders();
        break;
      case STATE_READ_HEADERS_COMPLETE:
        result = DoReadHeadersComplete(result);
        net_log_.EndEventWithNetErrorCode(
            NetLogEventType::HTTP_STREAM_PARSER_READ_HEADERS, result);
        break;
      case STATE_READ_BODY:
        DCHECK_GE(result, 0);
        result = DoReadBody();
        break;
      case STATE_READ_BODY_COMPLETE:
        result = DoReadBodyComplete(result);
        break;
      default:
        NOTREACHED_IN_MIGRATION();
        break;
    }
  } while (result != ERR_IO_PENDING &&
           (io_state_ != STATE_DONE && io_state_ != STATE_NONE));
  return result;
}
```
The part we care about is DoSendBody:
```cpp
int HttpStreamParser::DoSendBody() {
  if (request_body_send_buf_->BytesRemaining() > 0) {
    io_state_ = STATE_SEND_BODY_COMPLETE;
    return stream_socket_->Write(
        request_body_send_buf_.get(), request_body_send_buf_->BytesRemaining(),
        io_callback_, NetworkTrafficAnnotationTag(traffic_annotation_));
  }
  if (upload_data_stream_->is_chunked() && sent_last_chunk_) {
    // Finished sending the request.
    io_state_ = STATE_SEND_REQUEST_COMPLETE;
    return OK;
  }
  request_body_read_buf_->Clear();
  io_state_ = STATE_SEND_REQUEST_READ_BODY_COMPLETE;
  return upload_data_stream_->Read(
      request_body_read_buf_.get(), request_body_read_buf_->capacity(),
      base::BindOnce(&HttpStreamParser::OnIOComplete,
                     weak_ptr_factory_.GetWeakPtr()));
}
```
When a transport stream is about to send the RequestBody, it first calls UploadDataStream::Read to read the body and then uploads it. This gives us our first victim function:
```cpp
int UploadDataStream::Read(IOBuffer* buf,
                           int buf_len,
                           CompletionOnceCallback callback) {
  DCHECK(!callback.is_null() || IsInMemory());
  DCHECK(initialized_successfully_);
  DCHECK_GT(buf_len, 0);
  net_log_.BeginEvent(NetLogEventType::UPLOAD_DATA_STREAM_READ,
                      [&] { return CreateReadInfoParams(current_position_); });
  int result = 0;
  if (!is_eof_)
    result = ReadInternal(buf, buf_len);
  if (result == ERR_IO_PENDING) {
    DCHECK(!IsInMemory());
    callback_ = std::move(callback);
  } else {
    OnReadCompleted(result);
  }
  return result;
}
```
Hook this function and read the IOBuffer it fills: once the call returns, the buffer holds the RequestBody (or the next chunk of it).
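A sketch of the hook logic, written as a plain wrapper so it can be compiled on its own. `FakeIOBuffer`, `FakeUploadStream`, `HookedRead`, and `g_bodies` are all hypothetical; a real hook would be installed with an instrumentation framework and would use the real object layouts. Judging from the listing above, the buffer is filled inside the call, so the wrapper copies the data after the original returns and treats only a positive return value as a byte count. The ERR_IO_PENDING case, where the data arrives later through the callback, is not handled here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for net::IOBuffer and net::UploadDataStream.
struct FakeIOBuffer {
  char data[16];
};

class FakeUploadStream {
 public:
  explicit FakeUploadStream(std::string body) : body_(std::move(body)) {}

  // Fills `buf` with the next chunk; returns bytes read, or 0 at EOF.
  // `buf_len` must not exceed sizeof(FakeIOBuffer::data).
  int Read(FakeIOBuffer* buf, int buf_len) {
    const int remaining = static_cast<int>(body_.size() - pos_);
    const int n = std::min(buf_len, remaining);
    std::memcpy(buf->data, body_.data() + pos_, static_cast<std::size_t>(n));
    pos_ += static_cast<std::size_t>(n);
    return n;
  }

 private:
  std::string body_;
  std::size_t pos_ = 0;
};

// Captured request bodies, keyed by the stream object's address (`this`).
std::map<const void*, std::string> g_bodies;

// Hook wrapper: run the original first, then copy out what it produced.
int HookedRead(FakeUploadStream* self, FakeIOBuffer* buf, int buf_len) {
  const int result = self->Read(buf, buf_len);  // Original function.
  if (result > 0) {
    g_bodies[self].append(buf->data, static_cast<std::size_t>(result));
  }
  return result;
}
```

Because the wrapper only observes the buffer that the caller already owns, it never touches the stream position itself, which sidesteps the draining problem described earlier.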
For the request headers we again start from DoLoop, this time following STATE_BUILD_REQUEST into DoBuildRequest:
```cpp
int HttpNetworkTransaction::DoBuildRequest() {
  next_state_ = STATE_BUILD_REQUEST_COMPLETE;
  headers_valid_ = false;
  // This is constructed lazily (instead of within our Start method), so that
  // we have proxy info available.
  if (request_headers_.IsEmpty()) {
    bool using_http_proxy_without_tunnel = UsingHttpProxyWithoutTunnel();
    return BuildRequestHeaders(using_http_proxy_without_tunnel);
  }
  return OK;
}

int HttpNetworkTransaction::BuildRequestHeaders(
    bool using_http_proxy_without_tunnel) {
  request_headers_.SetHeader(HttpRequestHeaders::kHost,
                             GetHostAndOptionalPort(request_->url));
  // For compat with HTTP/1.0 servers and proxies:
  if (using_http_proxy_without_tunnel) {
    request_headers_.SetHeader(HttpRequestHeaders::kProxyConnection,
                               "keep-alive");
  } else {
    request_headers_.SetHeader(HttpRequestHeaders::kConnection, "keep-alive");
  }
  // Add a content length header?
  if (request_->upload_data_stream) {
    if (request_->upload_data_stream->is_chunked()) {
      request_headers_.SetHeader(
          HttpRequestHeaders::kTransferEncoding, "chunked");
    } else {
      request_headers_.SetHeader(
          HttpRequestHeaders::kContentLength,
          base::NumberToString(request_->upload_data_stream->size()));
    }
  } else if (request_->method == "POST" || request_->method == "PUT") {
    // An empty POST/PUT request still needs a content length. As for HEAD,
    // IE and Safari also add a content length header. Presumably it is to
    // support sending a HEAD request to an URL that only expects to be sent a
    // POST or some other method that normally would have a message body.
    // Firefox (40.0) does not send the header, and RFC 7230 & 7231
    // specify that it should not be sent due to undefined behavior.
    request_headers_.SetHeader(HttpRequestHeaders::kContentLength, "0");
  }
  // Honor load flags that impact proxy caches.
  if (request_->load_flags & LOAD_BYPASS_CACHE) {
    request_headers_.SetHeader(HttpRequestHeaders::kPragma, "no-cache");
    request_headers_.SetHeader(HttpRequestHeaders::kCacheControl, "no-cache");
  } else if (request_->load_flags & LOAD_VALIDATE_CACHE) {
    request_headers_.SetHeader(HttpRequestHeaders::kCacheControl, "max-age=0");
  }
  if (ShouldApplyProxyAuth() && HaveAuth(HttpAuth::AUTH_PROXY))
    auth_controllers_[HttpAuth::AUTH_PROXY]->AddAuthorizationHeader(
        &request_headers_);
  if (ShouldApplyServerAuth() && HaveAuth(HttpAuth::AUTH_SERVER))
    auth_controllers_[HttpAuth::AUTH_SERVER]->AddAuthorizationHeader(
        &request_headers_);
  if (net::features::kIpPrivacyAddHeaderToProxiedRequests.Get() &&
      proxy_info_.is_for_ip_protection()) {
    CHECK(!proxy_info_.is_direct() ||
          net::features::kIpPrivacyDirectOnly.Get());
    if (!proxy_info_.is_direct()) {
      request_headers_.SetHeader("IP-Protection", "1");
    }
  }
  request_headers_.MergeFrom(request_->extra_headers);
  if (modify_headers_callbacks_) {
    modify_headers_callbacks_.Run(&request_headers_);
  }
  response_.did_use_http_auth =
      request_headers_.HasHeader(HttpRequestHeaders::kAuthorization) ||
      request_headers_.HasHeader(HttpRequestHeaders::kProxyAuthorization);
  return OK;
}
```
It is fairly obvious by now: our next victim function is HttpRequestHeaders::SetHeader.
```cpp
void HttpRequestHeaders::SetHeader(std::string_view key,
                                   std::string_view value) {
  SetHeader(key, std::string(value));
}
```
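What a hook on this function would record can be sketched as follows. `FakeRequestHeaders`, `HookedSetHeader`, and `g_seen_headers` are hypothetical names, and the fake takes `std::string` arguments for brevity where the real function takes `std::string_view`. The point is only that each call yields one key/value pair plus the `this` pointer of the header container, and the pointer is what lets us group the pairs.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for net::HttpRequestHeaders.
class FakeRequestHeaders {
 public:
  void SetHeader(const std::string& key, const std::string& value) {
    headers_[key] = value;
  }

 private:
  std::map<std::string, std::string> headers_;
};

using HeaderList = std::vector<std::pair<std::string, std::string>>;

// Observed headers, keyed by the address of the headers object (`this`).
std::map<const void*, HeaderList> g_seen_headers;

// Hook wrapper: record the arguments, then forward to the original.
void HookedSetHeader(FakeRequestHeaders* self,
                     const std::string& key,
                     const std::string& value) {
  g_seen_headers[self].emplace_back(key, value);
  self->SetHeader(key, value);
}
```

This sketch appends every call in order, so a header that is set twice appears twice; whether the later value should replace the earlier one is a decision left to the real implementation.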
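Assembling the Request
So far we have obtained all the Request data, but the two pieces are captured in completely unrelated places, and the RequestHeaders even arrive one header at a time. We need a way to put them back together.
There is a handy function for this: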
```cpp
int HttpNetworkTransaction::Start(const HttpRequestInfo* request_info,
                                  CompletionOnceCallback callback,
                                  const NetLogWithSource& net_log) {
  if (request_info->load_flags & LOAD_ONLY_FROM_CACHE)
    return ERR_CACHE_MISS;
  DCHECK(request_info->traffic_annotation.is_valid());
  DCHECK(request_info->IsConsistent());
  net_log_ = net_log;
  request_ = request_info;
  url_ = request_->url;
  network_anonymization_key_ = request_->network_anonymization_key;
#if BUILDFLAG(ENABLE_REPORTING)
  // Store values for later use in NEL report generation.
  request_method_ = request_->method;
  request_->extra_headers.GetHeader(HttpRequestHeaders::kReferer,
                                    &request_referrer_);
  request_->extra_headers.GetHeader(HttpRequestHeaders::kUserAgent,
                                    &request_user_agent_);
  request_reporting_upload_depth_ = request_->reporting_upload_depth;
  start_timeticks_ = base::TimeTicks::Now();
#endif  // BUILDFLAG(ENABLE_REPORTING)
  if (request_->idempotency == IDEMPOTENT ||
      (request_->idempotency == DEFAULT_IDEMPOTENCY &&
       HttpUtil::IsMethodSafe(request_info->method))) {
    can_send_early_data_ = true;
  }
  if (request_->load_flags & LOAD_PREFETCH) {
    response_.unused_since_prefetch = true;
  }
  if (request_->load_flags & LOAD_RESTRICTED_PREFETCH) {
    DCHECK(response_.unused_since_prefetch);
    response_.restricted_prefetch = true;
  }
  next_state_ = STATE_NOTIFY_BEFORE_CREATE_STREAM;
  int rv = DoLoop(OK);
  if (rv == ERR_IO_PENDING)
    callback_ = std::move(callback);
  // This always returns ERR_IO_PENDING because DoCreateStream() does, but
  // GenerateNetworkErrorLoggingReportIfError() should be called here if any
  // other net::Error can be returned.
  DCHECK_EQ(rv, ERR_IO_PENDING);
  return rv;
}
```
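As is well known, a member function call receives a hidden first argument, this, the address of the object it is called on. HttpNetworkTransaction reaches both of the objects above through its members (request_headers_ directly, the upload stream through request_), so if we record a mapping from each of them to the owning HttpNetworkTransaction when the transaction starts, we can join the pieces. The same class also helps us assemble the Response later.
The request URL and method are also available here, from the request_info argument.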
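The mapping idea can be sketched with a toy layout. `FakeTransaction`, `FakeHeaders`, `OnTransactionStart`, and `FindOwner` are hypothetical, and the field layout is invented: in a real target the offset of the header member has to be recovered from the binary and will differ between builds. The sketch covers only the embedded-member case, where the member's address is the owner's `this` plus a fixed offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical layout; real offsets must be recovered from the target binary.
struct FakeHeaders {
  int dummy = 0;
};

struct FakeTransaction {
  void* placeholder = nullptr;    // Stands in for whatever precedes the member.
  const void* request = nullptr;  // Stand-in for request_.
  FakeHeaders request_headers;    // Stand-in for request_headers_.
};

// Maps the address of an embedded member back to the owning transaction.
std::unordered_map<const void*, FakeTransaction*> g_owner_by_member;

// Called from a hook on Start(): `self` is the hidden `this` argument.
void OnTransactionStart(FakeTransaction* self) {
  const auto base = reinterpret_cast<std::uintptr_t>(self);
  const auto* member = reinterpret_cast<const void*>(
      base + offsetof(FakeTransaction, request_headers));
  g_owner_by_member[member] = self;
}

// Called from a hook on SetHeader(): `headers_this` is that call's `this`.
FakeTransaction* FindOwner(const void* headers_this) {
  const auto it = g_owner_by_member.find(headers_this);
  return it == g_owner_by_member.end() ? nullptr : it->second;
}
```

A real implementation would also need to remove entries when a transaction is destroyed, since a freed address can be reused by a later, unrelated object.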
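The Request comes in pieces, but surely the Response is read as a whole?
Not quite. Chromium notifies the upper layer once the ResponseHeaders have been read, and the upper layer then calls back down to read the ResponseBody.
Still, the ResponseHeaders can indeed be obtained in one go.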
```cpp
int HttpNetworkTransaction::DoReadHeaders() {
  next_state_ = STATE_READ_HEADERS_COMPLETE;
  return stream_->ReadResponseHeaders(io_callback_);
}
```
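The stream varies with the protocol in use, which is inconvenient, so we look at how the ResponseHeaders are parsed instead.
Among the members of HttpNetworkTransaction is an HttpResponseInfo, which in turn holds an HttpResponseHeaders. Here is one of its constructors: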
```cpp
HttpResponseHeaders::HttpResponseHeaders(
    BuilderPassKey,
    HttpVersion version,
    std::string_view status,
    base::span<const std::pair<std::string_view, std::string_view>> headers)
    : http_version_(version) {
  // This must match the behaviour of Parse(). We don't use Parse() because
  // avoiding the overhead of parsing is the point of this constructor.
  std::string formatted_status;
  formatted_status.reserve(status.size() + 1);  // ParseStatus() may add a space
  response_code_ = ParseStatus(status, formatted_status);
  // First calculate how big the output will be so that we can allocate the
  // right amount of memory.
  size_t expected_size = 8;  // "HTTP/x.x"
  expected_size += formatted_status.size();
  expected_size += 1;  // "\0"
  size_t expected_parsed_size = 0;
  // Track which headers (by index) have a comma in the value. Since bools are
  // only 1 byte, we can afford to put 100 of them on the stack and avoid
  // allocating more memory 99.9% of the time.
  absl::InlinedVector<bool, 100> header_contains_comma;
  for (const auto& [key, value] : headers) {
    expected_size += key.size();
    expected_size += 1;  // ":"
    expected_size += value.size();
    expected_size += 1;  // "\0"
    // It's okay if we over-estimate the size of `parsed_`, so treat all ','
    // characters as if they might split the value to avoid parsing the value
    // carefully here.
    const size_t comma_count = base::ranges::count(value, ',') + 1;
    expected_parsed_size += comma_count;
    header_contains_comma.push_back(comma_count);
  }
  expected_size += 1;  // "\0"
  raw_headers_.reserve(expected_size);
  parsed_.reserve(expected_parsed_size);
  // Now fill in the output.
  const uint16_t major = version.major_value();
  const uint16_t minor = version.minor_value();
  CHECK_LE(major, 9);
  CHECK_LE(minor, 9);
  raw_headers_.append("HTTP/");
  raw_headers_.push_back('0' + major);
  raw_headers_.push_back('.');
  raw_headers_.push_back('0' + minor);
  raw_headers_.append(formatted_status);
  raw_headers_.push_back('\0');
  // It is vital that `raw_headers_` iterators are not invalidated after this
  // point.
  const char* const data_at_start = raw_headers_.data();
  size_t index = 0;
  for (const auto& [key, value] : headers) {
    CheckDoesNotHaveEmbeddedNulls(key);
    CheckDoesNotHaveEmbeddedNulls(value);
    // Because std::string iterators are random-access, end() has to point to
    // the position where the next character will be appended.
    const auto name_begin = raw_headers_.cend();
    raw_headers_.append(key);
    const auto name_end = raw_headers_.cend();
    raw_headers_.push_back(':');
    auto values_begin = raw_headers_.cend();
    raw_headers_.append(value);
    auto values_end = raw_headers_.cend();
    raw_headers_.push_back('\0');
    // The HTTP/2 standard disallows header values starting or ending with
    // whitespace (RFC 9113 8.2.1). Hopefully the same is also true of HTTP/3.
    // TODO(crbug.com/40282642): Validate that our implementations
    // actually enforce this constraint and change this TrimLWS() to a DCHECK.
    HttpUtil::TrimLWS(&values_begin, &values_end);
    AddHeader(name_begin, name_end, values_begin, values_end,
              header_contains_comma[index] ? ContainsCommas::kYes
                                           : ContainsCommas::kNo);
    ++index;
  }
  raw_headers_.push_back('\0');
  CHECK_EQ(expected_size, raw_headers_.size());
  CHECK_EQ(data_at_start, raw_headers_.data());
  DCHECK_LE(parsed_.size(), expected_parsed_size);
  DCHECK_EQ('\0', raw_headers_[raw_headers_.size() - 2]);
  DCHECK_EQ('\0', raw_headers_[raw_headers_.size() - 1]);
}
```
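The constructor writes the entire header block into raw_headers_, a std::string in which the status line and each header are terminated by a NUL byte. This means that at any point after the headers have been read, we can reach them directly from the HttpNetworkTransaction this pointer.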
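Once the string has been located, turning it back into readable lines is straightforward. The sketch below assumes the layout implied by the constructor above: a status line, then `Key:Value` entries, each terminated by a NUL byte, with one extra NUL at the end. `SplitRawHeaders` is a hypothetical helper; locating the string in memory is out of scope.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a raw header blob of the form
//   "HTTP/1.1 200 OK\0Key:Value\0...\0\0"
// into its individual lines.
std::vector<std::string> SplitRawHeaders(const std::string& raw) {
  std::vector<std::string> lines;
  std::size_t start = 0;
  while (start < raw.size()) {
    const std::size_t end = raw.find('\0', start);
    if (end == std::string::npos || end == start) {
      break;  // Missing terminator, or the empty entry that ends the blob.
    }
    lines.push_back(raw.substr(start, end - start));
    start = end + 1;
  }
  return lines;
}
```

Note that the blob must be read with an explicit length: treating it as a C string would stop at the first NUL and return only the status line.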
Getting the ResponseBody
The hard part here would have been determining when the ResponseBody has been fully read, but Chromium's own cleanup logic solves that for us:
```cpp
int HttpNetworkTransaction::DoReadBody() {
  DCHECK(read_buf_.get());
  DCHECK_GT(read_buf_len_, 0);
  DCHECK(stream_ != nullptr);
  next_state_ = STATE_READ_BODY_COMPLETE;
  return stream_->ReadResponseBody(
      read_buf_.get(), read_buf_len_, io_callback_);
}

int HttpNetworkTransaction::DoReadBodyComplete(int result) {
  // We are done with the Read call.
  bool done = false;
  if (result <= 0) {
    DCHECK_NE(ERR_IO_PENDING, result);
    done = true;
  }
  // Clean up connection if we are done.
  if (done) {
    // Note: Just because IsResponseBodyComplete is true, we're not
    // necessarily "done". We're only "done" when it is the last
    // read on this HttpNetworkTransaction, which will be signified
    // by a zero-length read.
    // TODO(mbelshe): The keep-alive property is really a property of
    // the stream. No need to compute it here just to pass back
    // to the stream's Close function.
    bool keep_alive =
        stream_->IsResponseBodyComplete() && stream_->CanReuseConnection();
    stream_->Close(!keep_alive);
    // Note: we don't reset the stream here. We've closed it, but we still
    // need it around so that callers can call methods such as
    // GetUploadProgress() and have them be meaningful.
    // TODO(mbelshe): This means we closed the stream here, and we close it
    // again in ~HttpNetworkTransaction. Clean that up.
    // The next Read call will return 0 (EOF).
    // This transaction was successful. If it had been retried because of an
    // error with an alternative service, mark that alternative service broken.
    if (!enable_alternative_services_ &&
        retried_alternative_service_.protocol != kProtoUnknown) {
      HistogramBrokenAlternateProtocolLocation(
          BROKEN_ALTERNATE_PROTOCOL_LOCATION_HTTP_NETWORK_TRANSACTION);
      session_->http_server_properties()->MarkAlternativeServiceBroken(
          retried_alternative_service_, network_anonymization_key_);
    }
#if BUILDFLAG(ENABLE_REPORTING)
    GenerateNetworkErrorLoggingReport(result);
#endif  // BUILDFLAG(ENABLE_REPORTING)
  }
  // Clear these to avoid leaving around old state.
  read_buf_ = nullptr;
  read_buf_len_ = 0;
  return result;
}
```
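The body is read into read_buf_, and DoReadBodyComplete clears that pointer before it returns.
In other words, we hook HttpNetworkTransaction::DoReadBodyComplete and save the read_buf_ pointer on entry, because the member is cleared by the time the function returns. Judging from the listing, each call delivers one chunk: a positive result is the number of bytes just read into the buffer, and a result of zero or less marks the end of the body. Appending the chunk from the saved pointer on every call and stopping at the end marker yields the complete Body.
To save work: by this stage all the ResponseHeaders are guaranteed to have been read, so we can print them from the same hook.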
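A sketch of that hook, again as a plain wrapper. `FakeTransaction`, `BodyCapture`, `HookedDoReadBodyComplete`, and `g_captures` are hypothetical, and the fake method only mimics the two behaviours visible in the listing (clearing the buffer fields and returning `result`). One deliberate variation: the wrapper copies the bytes on entry rather than after the original returns, so it does not depend on the buffer staying alive once the transaction drops its reference.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the part of HttpNetworkTransaction we touch.
struct FakeTransaction {
  const char* read_buf = nullptr;  // Stand-in for read_buf_.
  int read_buf_len = 0;            // Stand-in for read_buf_len_.

  // Mimics the listing above: clears the buffer fields, returns `result`.
  int DoReadBodyComplete(int result) {
    read_buf = nullptr;
    read_buf_len = 0;
    return result;
  }
};

struct BodyCapture {
  std::string body;
  bool complete = false;
};

// Response bodies collected so far, keyed by transaction address.
std::map<const FakeTransaction*, BodyCapture> g_captures;

// Hook wrapper: grab the chunk on entry, then run the original.
int HookedDoReadBodyComplete(FakeTransaction* self, int result) {
  BodyCapture& capture = g_captures[self];
  if (result <= 0) {
    capture.complete = true;  // EOF (0) or a network error (< 0).
  } else if (self->read_buf != nullptr) {
    capture.body.append(self->read_buf, static_cast<std::size_t>(result));
  }
  return self->DoReadBodyComplete(result);  // Original function.
}
```

The `complete` flag is the natural trigger for downstream work such as decompression and printing, since only then is the accumulated body whole.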
Locating and implementing
At this point the implementation plan is clear:
- Hook HttpNetworkTransaction::Start
  - Get the url and method from the arguments.
  - Record the HttpRequestHeaders -> HttpNetworkTransaction mapping.
  - Record the UploadDataStream -> HttpNetworkTransaction mapping.
- Hook UploadDataStream::Read
  - Read the IOBuffer argument to get the RequestBody.
  - Use the mapping recorded earlier to attach the RequestBody to its HttpNetworkTransaction.
- Hook HttpRequestHeaders::SetHeader
  - Read the key and value arguments.
  - Attach the newly added header to its HttpNetworkTransaction.
- Hook HttpNetworkTransaction::DoReadBodyComplete
  - Check whether the response is complete; if not, keep receiving.
  - Get the ResponseBody through the read_buf_ pointer cached before the call.
  - Decompress according to Content-Encoding (e.g. gzip).
  - Follow HttpNetworkTransaction->responseinfo->responseheaders->raw_headers_ to get the complete ResponseHeaders.
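The steps above all feed one per-transaction record. A minimal sketch of such a record and of rendering its request half follows; `CapturedExchange` and `FormatRequest` are hypothetical names, the output is meant for reading rather than being wire-accurate HTTP, and decompression is left out.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-transaction record filled in by the four hooks.
struct CapturedExchange {
  std::string method;  // From the Start() hook.
  std::string url;     // From the Start() hook.
  std::vector<std::pair<std::string, std::string>> request_headers;
  std::string request_body;      // From the UploadDataStream::Read hook.
  std::string response_headers;  // From raw_headers_.
  std::string response_body;     // Still encoded if Content-Encoding is set.
};

// Renders the request half as readable text (not wire-accurate HTTP).
std::string FormatRequest(const CapturedExchange& exchange) {
  std::string out = exchange.method + " " + exchange.url + "\n";
  for (const auto& header : exchange.request_headers) {
    out += header.first + ": " + header.second + "\n";
  }
  out += "\n" + exchange.request_body;
  return out;
}
```

Keeping the record keyed by the HttpNetworkTransaction address means every hook only needs to resolve its own `this` to an owner before writing into the matching field.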
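How to determine the member offsets and function offsets is not disclosed here; I trust each reader has their own approach.
Project code: if you want it, you'll have to write it yourself.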