~jpastuszek/blog

eb2b126731fee3418351703905e739bb7c4a88a8 — Jakub Pastuszek 3 years ago 101a240
post about script for bittorrent search
A content/2020-08-27-search-torrent/bittorrent.jpg => content/2020-08-27-search-torrent/bittorrent.jpg +0 -0
A content/2020-08-27-search-torrent/index.md => content/2020-08-27-search-torrent/index.md +163 -0
@@ 0,0 1,163 @@
+++
title = "Rust script: search-torrent"

[taxonomies]
tags = ["rust", "script", "torrent", "search"]
categories = ["programming"]

[extra]
image = "bittorrent.jpg"
image_alt = "BitTorrent Download"
image_credit = "<a href=\"https://www.flickr.com/photos/95021520@N00\">nrkbeta</a> / CC BY-SA 2.0"
+++

A simple HTML scraping script to query for magnet links on BitTorrent sites.
<!-- more -->

## Searching for BitTorrent content

Before you can download something with [BitTorrent] you need to find the `.torrent` file or [magnet] link for the content.

This searching is usually done with the help of one of the many [BitTorrent] hosting sites.
The problem with these sites is that they often require [JavaScript] to be enabled, and I would not trust them to run code safely in my browser.
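
For orientation, a [magnet] link is a URI whose query string carries the torrent's info hash in an `xt=urn:btih:...` parameter. Here is a minimal sketch of picking that hash out using only the standard library. The `info_hash` function and the example link are made up for illustration and play no part in the search script itself.

```rust
/// Returns the BitTorrent info hash from a magnet URI, if it has one.
/// Hypothetical helper for illustration only.
fn info_hash(magnet: &str) -> Option<&str> {
    // Everything after "magnet:?" is a list of `key=value` pairs joined by '&'.
    let params = magnet.strip_prefix("magnet:?")?;
    params
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        // The "exact topic" parameter identifies the content.
        .find(|(key, _)| *key == "xt")
        .and_then(|(_, value)| value.strip_prefix("urn:btih:"))
}

fn main() {
    // Made-up link, not a real torrent.
    let magnet = "magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567&dn=example.iso";
    println!("{:?}", info_hash(magnet));
}
```

The sketch only looks at the first plain `xt` parameter and does no percent-decoding, which is enough to see how the link is put together.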

## Rust command-line script

I found some [JavaScript] [scripts on GitHub](https://github.com/JimmyLaurent/torrent-search-api) that can search these sites, but they would be awkward to install and run as command-line programs.

I have written a simple single-file [Rust] script that can be run using my [denim] scripting crate.


```rust
#!/usr/bin/env denim

/* Cargo.toml
[package]
name = "search-torrent"
version = "0.1.0"
authors = ["Anonymous"]
edition = "2018"

[dependencies]
cotton = "0.0.7"
structopt = "0.3.2"
reqwest = { version = "0.10.8", features = ["blocking"] }
url = "2.1.1"
scraper = "0.12.0"
*/

use cotton::prelude::*;
use url::Url;
use scraper::{Html, Selector};

/// Searches for torrent magnet links given a search term.
#[derive(Debug, StructOpt)]
struct Cli {
	#[structopt(flatten)]
	logging: LoggingOpt,

	#[structopt()]
	query: String,
}

fn main() -> FinalResult {
	let args = Cli::from_args();
	init_logger(&args.logging, vec![module_path!()]);

	// https://github.com/JimmyLaurent/torrent-search-api/blob/master/lib/providers/1337x.js
	let base = Url::parse("https://www.1337x.to/")?;
	let search = base.join("search/")?;

	info!("Searching for {:?} on {}", args.query, base);

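	// Each search result is one table row.
	let item = Selector::parse("tbody > tr").unwrap();

	// An `a` element that is the second child of its parent supplies both
	// the title text and the link to the detail page.
	let title = Selector::parse("a:nth-child(2)").unwrap();
	let link = Selector::parse("a:nth-child(2)").unwrap();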
	let time = Selector::parse(".coll-date").unwrap();
	let seeds = Selector::parse(".seeds").unwrap();
	let peers = Selector::parse(".leeches").unwrap();
	let size = Selector::parse(".size").unwrap();

	// Walk the result pages (search/<query>/<page>/) until one has no rows.
	for page in 1.. {
		let url = search.join(&format!("{}/", &args.query))?.join(&format!("{}/", page))?;
		let resp = reqwest::blocking::get(url)?;
		let body = resp.text()?;
		debug!("{}", body);

		let html = Html::parse_document(&body);

		let items = html.select(&item).collect_vec();
		if items.is_empty() {
			break;
		}

		for item in items {
			debug!("{}", item.inner_html());
			let link = item.select(&link).next().ok_or("no link found")?.value().attr("href").ok_or("no href found")?;
			let title = item.select(&title).next().ok_or("no title found")?.inner_html();
			let time = item.select(&time).next().ok_or("no time found")?.inner_html();
			let seeds = item.select(&seeds).next().ok_or("no seeds found")?.inner_html();
			let peers = item.select(&peers).next().ok_or("no peers found")?.inner_html();
			let size = item.select(&size).next().ok_or("no size found")?.inner_html();
			// Keep only the text before the first nested tag in the size cell.
			let size = size.splitn(2, "<").next().unwrap();

			info!("[{}, {}, seeds: {}, peers: {}] {}", time, size, seeds, peers, title);

			// The magnet links are on the detail page of each result, so fetch it.
			let desc = base.join(link)?;
			let resp = reqwest::blocking::get(desc)?;
			let body = resp.text()?;
			let html = Html::parse_document(&body);

			let links = Selector::parse("a").unwrap();
			let magnets = html
				.select(&links)
				.filter_map(|a| a.value().attr("href"))
				.filter(|href| href.starts_with("magnet:"))
				.sorted()
				.dedup();

			for magnet in magnets {
				println!("{}", magnet);
			}
		}
	}

	Ok(())
}
```
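
The last step of the script, keeping only the unique magnet links found on a page, can be tried in isolation. The sketch below uses only the standard library instead of the `sorted()` and `dedup()` iterator adaptors used in the script. The `unique_magnets` function and the sample links are hypothetical.

```rust
/// Keeps only magnet links, sorted and with duplicates removed.
/// Hypothetical helper mirroring the filter/sort/dedup chain in the script.
fn unique_magnets<'a>(hrefs: &[&'a str]) -> Vec<&'a str> {
    let mut magnets: Vec<&str> = hrefs
        .iter()
        .copied()
        .filter(|href| href.starts_with("magnet:"))
        .collect();
    // `dedup` only drops adjacent duplicates, so sort first.
    magnets.sort();
    magnets.dedup();
    magnets
}

fn main() {
    // Made-up hrefs as they might appear on a detail page.
    let hrefs = [
        "/home/",
        "magnet:?xt=urn:btih:bbbb",
        "magnet:?xt=urn:btih:aaaa",
        "magnet:?xt=urn:btih:bbbb",
    ];
    for magnet in unique_magnets(&hrefs) {
        println!("{}", magnet);
    }
}
```

Sorting before deduplicating matters here: a page may repeat the same magnet link in several places, and `dedup` alone would only remove repeats that happen to sit next to each other.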

## Installation and usage

Install [denim] from cargo:

```
cargo install denim
```

Copy the content of the script to a file named `search-torrent` and make it executable:

```
vim search-torrent
chmod +x search-torrent
```

Now you can run the script as-is, but if you want to see the build progress you can run it via [denim] with:

```
denim exec search-torrent -- -h
```

Once the build is complete you can perform your searches:


```
./search-torrent -v "Debian"
```

You should see a list of Debian ISO files along with their [magnet] links.

[BitTorrent]: https://en.wikipedia.org/wiki/BitTorrent
[magnet]: https://en.wikipedia.org/wiki/Magnet_URI_scheme
[JavaScript]: https://en.wikipedia.org/wiki/JavaScript
[Rust]: https://www.rust-lang.org/
[denim]: https://crates.io/crates/denim