Mining All 30,000 Firefox Extensions for Goodies & Baddies

The good, the bad and the ugly in 1.2 million public files

The browser is the new hot attack surface: everyone uses one every day, and end users can install unsanctioned extensions that siphon off all their data willy-nilly. A security nightmare! But surely no one would publish malicious browser extensions - and even if they did, Google would make it easy to report them, action those reports, and wouldn't slap a "featured" badge on them... right?

On a random weekend, I decided to mine all ~30,000 Firefox extensions and sift for low-hanging fruit. Heads up - I didn't find anything particularly juicy - but in the interest of combating publication bias I'm writing it up anyway, as my thought process might be useful to you or other researchers in the future.

Plugin Composition and Scraping

Firefox plugins come in .xpi files, which are just renamed .zip files. Inside are all the JS/HTML/CSS/static assets and a manifest.json file filled with metadata, requested permissions, when to fire (e.g. only on certain sites) and what assets to load.
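Since an .xpi is just a zip, pulling the manifest out needs nothing beyond a zip reader. Here's a minimal Python sketch of that idea (not the script I actually used); the demo archive and its contents are made up so the example runs without downloading anything.

```python
import io
import json
import zipfile


def read_manifest(xpi_bytes: bytes) -> dict:
    """Return the parsed manifest.json from an .xpi (zip) archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(xpi_bytes)) as archive:
        with archive.open("manifest.json") as handle:
            # utf-8-sig tolerates a leading byte-order mark if one is present.
            return json.loads(handle.read().decode("utf-8-sig"))


def build_demo_xpi() -> bytes:
    """Build a tiny in-memory archive standing in for a real extension (hypothetical content)."""
    manifest = {
        "manifest_version": 2,
        "name": "demo",
        "version": "1.0",
        "permissions": ["activeTab", "storage"],
        "content_scripts": [{"matches": ["*://example.com/*"], "js": ["content.js"]}],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.json", json.dumps(manifest))
        archive.writestr("content.js", "console.log('hi');")
    return buffer.getvalue()


if __name__ == "__main__":
    manifest = read_manifest(build_demo_xpi())
    print(manifest["permissions"])
    print(manifest["content_scripts"][0]["matches"])
```

Real-world manifests are messier than this: some are not strict JSON, so a bulk run would want a try/except around the parse rather than assuming every file loads cleanly.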

There's a frontend Firefox extension search that queries a Firefox API. Tampering with the &page_size parameter (50 is the maximum) and paging through with &page=X lets us dump every extension's metadata, including the .xpi file URL. Yucky script: [1]. We can then extract the raw CDN .xpi URLs: [2]. That lets us download [3] ~30,000 extensions totalling ~20 GB compressed (or ~40 GB uncompressed), from 0_bitches-0.4.xpi through to zxpath-1.0.2.xpi. We can then unzip these to view all the raw files, or grab just each extension's manifest.json file: [4]. From there I started throwing random libraries, APIs, and FOSS projects at the dataset to mine for interesting things.
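The paging logic boils down to something like the Python sketch below. Treat the specifics as assumptions: the endpoint URL and the response field names (results, current_version, file, url) are my guess at the shape of the API and should be checked against a live response. The sketch deliberately makes no network calls, it only builds the page URLs and picks .xpi links out of an already-decoded page.

```python
from urllib.parse import urlencode

# Assumption: the public add-on search endpoint; the exact URL is not given in this post.
SEARCH_ENDPOINT = "https://addons.mozilla.org/api/v5/addons/search/"


def page_url(page: int, page_size: int = 50) -> str:
    """Build the URL for one page of search results (50 per page being the maximum)."""
    query = urlencode({"page_size": page_size, "page": page})
    return f"{SEARCH_ENDPOINT}?{query}"


def xpi_urls(response: dict) -> list:
    """Collect .xpi download URLs from one decoded page of results.

    Field names here are assumptions about the response layout.
    """
    urls = []
    for addon in response.get("results", []):
        file_info = (addon.get("current_version") or {}).get("file") or {}
        if file_info.get("url"):
            urls.append(file_info["url"])
    return urls


if __name__ == "__main__":
    print(page_url(1))
    # Hypothetical response fragment, shaped per the assumptions above.
    sample = {
        "results": [
            {"current_version": {"file": {"url": "https://example.invalid/demo-1.0.xpi"}}},
            {"current_version": None},  # entries without a usable file are skipped
        ]
    }
    print(xpi_urls(sample))
```

In a real run you would fetch each page_url in turn, stop when a page comes back with no results, and be polite about rate limits.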

Permhash (Permission hash)

Permhash is a novel way of clustering similar extensions across authors by hashing the list of requested permissions (e.g. activeTab or storage); Mandiant used it to hunt for and cluster malicious Chrome extensions. Rather than use their Chrome/Android-focused implementation, I just hacked together some jq to unpack the manifest.json files and throw them at sha256sum to get my hashes: [5]. The top six permission sets (and thus hashes) were: <no permissions declared>; storage; activeTab; activeTab,storage; <empty permission set declared>; tabs. Many smaller clusters (<5 extensions) also formed, which would be useful for pivoting if a malicious extension is ever found within one of them. In the following chart, the bottom right datapoint shows there are 9407 extensions that have a unique permhash (e.g. activeTab, tabs, storage, unlimitedStorage) and the top left shows 6789 extensions that share the same permhash (i.e. per above: <no permissions declared>).
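For anyone who prefers Python to jq, the idea looks roughly like the sketch below. It is not Mandiant's reference implementation and not my jq one-liner: sorting and de-duplicating the permissions before hashing, and the sentinel string for a missing permissions key, are assumptions I've made so that a missing key and an empty list land in different clusters.

```python
import hashlib
from collections import Counter


def permhash(manifest: dict) -> str:
    """SHA-256 over a canonical string of the manifest's requested permissions."""
    permissions = manifest.get("permissions")
    if permissions is None:
        # Assumption: a sentinel keeps "key missing" distinct from "empty list".
        canonical = "<no permissions declared>"
    else:
        # Assumption: sort and de-duplicate so ordering does not change the hash.
        names = sorted({p for p in permissions if isinstance(p, str)})
        canonical = ",".join(names)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cluster_sizes(manifests: list) -> Counter:
    """Count how many manifests share each permhash."""
    return Counter(permhash(m) for m in manifests)


if __name__ == "__main__":
    # Hypothetical manifests, reduced to the only key that matters here.
    manifests = [
        {"permissions": ["activeTab", "storage"]},
        {"permissions": ["storage", "activeTab"]},
        {"permissions": []},
        {},
    ]
    for digest, count in cluster_sizes(manifests).most_common():
        print(count, digest[:16])
```

The first two manifests fall into the same cluster despite listing their permissions in a different order, which is the whole point: the hash follows the permission set, not the author or the file layout.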

Scatterplot of hash occurrences vs. number of extensions

Virus Scan

Why not run a virus scan on all the unpacked files? ClamAV, whilst not a leading vendor, does have a static virus DB that was easy to query. Of 1.2 million files, it only flagged two plugins, page_note (since removed) and pagenote, as containing Html.Exploit.CVE_2014_1800-1, which is an exploit for Internet Explorer 8-11 and not Firefox, so I deemed them false positives. Scanning the files with the stock Windows 11 Defender turned up nothing either.

YARA Scan

AV scanning relies on a whole file, or a sizeable portion of one, matching a signature, and can gloss over smaller malicious chunks that YARA rules can pick up. Throwing YARA over the dataset returned the following rule matches: [6]. Manually reviewing the results turned up more false positives - the only offences to be found were plugins with obfuscated code, which is against Firefox add-on policies.

Malicious URLs

Fine, the files don't contain immediate malware artifacts, but what about the URLs the extensions reach out to - are they known-bad? Google provides an API for its Safe Browsing DB that tags URLs with indicators such as malware, unwanted software, social engineering etc. Using a gross regex with grep over the files, I built a urls.txt file to feed through the Safe Browsing API 500 URLs at a time: [7]. 177 hits were returned [8] - surely some true positives, right? Well, the vast majority of hits belonged to ad blocking or safe browsing extensions that simply ship with a static database of known-bad URLs. What remained was reviewed and nothing nasty was found.
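The extract-then-batch flow can be sketched in Python as below. The regex is a stand-in for my grep one and is equally gross. The request body assumes the v4 Lookup API (threatMatches:find) layout, and the client name is made up. Nothing here talks to the network; it only shows how URLs get pulled out and chunked into requests of at most 500 entries.

```python
import re

# A deliberately loose URL matcher (an assumption, not the regex used for the post).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\\]+")
BATCH_SIZE = 500  # URLs per lookup request


def extract_urls(text: str) -> set:
    """Return the distinct http(s) URLs found in a blob of file content."""
    return set(URL_PATTERN.findall(text))


def batches(items: list, size: int = BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_payload(urls: list) -> dict:
    """Build one request body, assuming the Safe Browsing v4 Lookup API format."""
    return {
        "client": {"clientId": "extension-mining-demo", "clientVersion": "0.1"},  # hypothetical
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": url} for url in urls],
        },
    }


if __name__ == "__main__":
    source = 'fetch("https://example.com/a?b=1"); // docs at http://test.invalid/x'
    urls = sorted(extract_urls(source))
    for chunk in batches(urls):
        payload = lookup_payload(chunk)
        print(len(payload["threatInfo"]["threatEntries"]), "URLs in this request")
```

De-duplicating before batching matters more than it looks: the same CDN and analytics URLs show up in thousands of extensions, and there's no point paying API quota to look them up repeatedly.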

Secret Mining

Let's take a break from looking for baddies and look for goodies. Developers make mistakes (or don't know any better) and put API keys and other secrets in their code. So I ran TruffleHog locally over all 1.2 million files and it found hundreds of API keys for GitHub, GitLab, Gemini etc: [9]. Now I'm sure most of these are rotated or expired, but TruffleHog also kindly verifies whether secrets are still live, which gives us 426 valid secrets: [10]!

Did You Mean to Distribute that to the World?

Looking at the raw extension files on disk, are there any interesting files / file extensions [11]? There were some interesting office files (.pptx, .docx, .xlsx) and plenty of 18+ images / audio clips. I also found some high schooler's assignment and even a backup of a French business's customer list and invoices (the owner was notified).
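The file-extension survey is just a tally plus a watch list. A rough Python sketch, where the watch list of "probably shouldn't be in a browser extension" suffixes is my own illustrative pick rather than an exhaustive one:

```python
import os
from collections import Counter
from pathlib import Path

# Hypothetical watch list of suffixes that look out of place in an extension.
UNEXPECTED_SUFFIXES = {".docx", ".xlsx", ".pptx", ".pdf", ".sql", ".bak"}


def walk_files(root: str):
    """Yield every file path underneath root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def extension_counts(paths) -> Counter:
    """Tally lower-cased file suffixes; files without one are counted as '<none>'."""
    return Counter(Path(p).suffix.lower() or "<none>" for p in paths)


def unexpected_files(paths) -> list:
    """Return the paths whose suffix is on the watch list."""
    return [p for p in paths if Path(p).suffix.lower() in UNEXPECTED_SUFFIXES]


if __name__ == "__main__":
    # Made-up paths; in practice, pass list(walk_files("extensions/")) instead.
    paths = [
        "demo/manifest.json",
        "demo/content.js",
        "demo/backup/Customers.XLSX",
        "demo/README",
    ]
    print(extension_counts(paths).most_common())
    print(unexpected_files(paths))
```

Sorting the tally from rarest to most common is the useful view here: the one-off suffixes at the tail are where the accidental uploads hide.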

List of customers
Sample invoice

Other Attempts

Closing

Did I find any malicious plugins? No. Do I think there are dozens of malicious plugins that I failed to find? Yes. Did I amass a cool dataset for future work? Yes. Did I have fun finding secrets and business documents? Yes. And finally, do I believe in publishing failed research? Yes. :)