WebQL: Using CodeQL for JavaScript Security in Modern Web Apps

A Framework for Penetration Testers, Bug Bounty Hunters, and Security Engineers to Identify & Exploit Modern APIs & Web Application Vulnerabilities

Quick Introduction

Earlier in the year I wrote about using Semgrep to analyze and exploit web applications. In that post, I made the case for using Semgrep to supplement your web app pentests by running user-defined SAST templates to find vulnerabilities in web applications, specifically in client-side frameworks. I feel like it’s a valid approach, although maybe not a fully-fledged one, because it misses a lot of the traditional vulnerability checks and analysis you’d expect, namely around authentication, authorization, and business logic. After getting comfortable with Semgrep, I decided to branch out a little bit more and look at CodeQL - I really think you should too. To provide some background, the following Microsoft writeup is about as good as you’re ever going to come across to cover what we’re talking about here:

https://msrc.microsoft.com/blog/2019/11/vulnerability-hunting-with-semmle-ql-dom-xss/

(At the bottom of this article I provide a lot of inspiration, resources, shouts, and works cited. Please check out these resources for an even more complete and accurate picture of what I try to describe here.)

In my Semgrep article, I quipped, “What if I'm performing a black box test with no access to the source? [….] We'll walk through how to scrape a single page application (SPA) to access the contents locally, essentially turning a black box test into a white box(ish) one.” I think we can do that to a higher degree than what is conventionally accepted and communicated as part of a web application assessment & penetration test.

We can do that by automating the orchestration of CodeQL to hunt for vulnerabilities and conduct exploitation against target web applications.

That’s this blog, and that’s this tool. Plain and simple. I want to find vulnerabilities in web apps, expand my web application pentesting skills, my technical frontend knowledge, and become more versed in using tools like Semgrep and CodeQL to accomplish those goals.

I decided to at least give the problem my best effort. To set some ground rules…

I am going to get some things wrong, either in my definitions, approach, tooling, analysis, or prognostication on feasibility. Let me know what I get wrong.
I am not a developer. Let me know what I can do better.
I am a hacker, so I’m stumbling my way through like every good hacker does. Let me know if I’m crazy in my thinking.

With the disclaimer out of the way, I am totally looking for feedback and ways to make this tool, framework, and new (to me) testing approach better for myself and the community. While technically this is all just done on the client side for now, I have this crazy hunch there’s more here. We aren’t replacing BurpSuite, Caido or ZAP. I know that. But this is how I’ve been feeling lately:

What I really believe to be valuable here is perhaps the introduction to an approach - combining CodeQL, user-defined SAST analysis modules, and efficient parsing of scanning results to gain deeper insight into potential vulnerabilities in web applications.

Hopefully after reading this blog and looking over the tool you feel the same way.

PS: As a JavaScript noob, I find the language completely nonsensical at times and filled with terrifyingly complex attributes. I won’t pretend to have a deep understanding of it.

The Evolution of Web Applications: SPAs, PWAs, and Beyond

Modern web development has shifted dramatically from traditional multi-page applications to more dynamic and responsive architectures:

Single-Page Applications (SPAs): SPAs load a single HTML page and dynamically update content as users interact with the app. Frameworks like React, Angular, and Vue.js have popularized this approach, offering seamless user experiences but introducing new security considerations.
Progressive Web Applications (PWAs): PWAs combine the best of web and mobile apps. They're installable, work offline, and provide native-app-like experiences. However, their reliance on service workers and caching mechanisms presents unique security challenges.

These modern architectures, while enhancing user experience and performance, increase the complexity of JavaScript codebases. This complexity, in turn, expands the potential attack surface, making thorough security analysis more crucial than ever.

The Modern Front-End Build Process: A Security Perspective

The front-end build process for contemporary web applications is divided into two distinct phases: development and production. During development, developers utilize various tools and techniques to enhance productivity and maintainability:

HTML Preprocessors (EJS, Handlebars, Mustache, Pug)
CSS Preprocessors (SASS, SCSS, LESS, PostCSS)
JavaScript Frameworks (React, Angular, Vue)
TypeScript

These technologies require a build process to convert the code into standard HTML, JavaScript, and CSS that browsers can interpret. This process, which includes transpilation and bundling, often adds optimizations such as compression and minification to improve performance and reduce file size.

Webpack and Build Tools: Complexity & Security Implications

Build tools like webpack, Vite, or Parcel manage the optimizations and processes described above. The final product of these processes is then deployed to production. However, this build process introduces layers of abstraction that can obscure potential security vulnerabilities.

Sourcemaps: A Double-Edged Sword

Sourcemaps are files that assist developers in translating minified, compressed, and transpiled code back into the original source code, obviously done to aid the debugging process. However, they can inadvertently expose sensitive information such as API keys, hidden API routes, or embedded secrets if not properly managed.
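To make that a bit more concrete, here’s a minimal sketch of how original source can fall out of a sourcemap. It assumes a standard version 3 sourcemap, where the "sources" array lists the original file paths and the optional "sourcesContent" array embeds their text. The function name and the sample map are made up for illustration; this is not how WebQL itself does it.

```python
import json


def recover_sources(sourcemap_text):
    """Map each original file path to its embedded source text, if present."""
    data = json.loads(sourcemap_text)
    sources = data.get("sources", [])
    # "sourcesContent" is optional, and individual entries may be null.
    contents = data.get("sourcesContent") or []
    recovered = {}
    for index, name in enumerate(sources):
        text = contents[index] if index < len(contents) else None
        if text is not None:
            recovered[name] = text
    return recovered


# Hand-written sample map (hypothetical paths and content).
example_map = json.dumps({
    "version": 3,
    "file": "main.js",
    "sources": ["webpack:///./src/api.js", "webpack:///./src/app.js"],
    "sourcesContent": ["const API_BASE = '/internal/v2';\n", None],
    "mappings": "",
})

for path, text in recover_sources(example_map).items():
    print(path, "->", text.strip())
```

If a production bundle ships its .map file with "sourcesContent" populated, anything the developers left in the original source (routes, comments, keys) comes along for the ride.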

Security Considerations in Modern JavaScript Development

With all these tools, frameworks, build tools, and sourcemaps, there are several questions to ask ourselves:

How can organizations effectively manage the security implications of complex build processes without compromising development efficiency? How can we as pentesters take advantage of this?
In an environment where code undergoes multiple transformations, what can we do to maintain visibility into potential vulnerabilities?
With frameworks and build processes abstracting much of the underlying code, how can developers and security professionals ensure they're not inadvertently introducing security vulnerabilities?

These questions, at least partially, underscore the need for analysis tools capable of understanding and evaluating modern JavaScript applications from the perspective of a penetration tester and bug hunter. So let’s bring them all together into a tool that can help us do all of this!

And yes, there are about a million and one existing solutions that will do the job better, faster, and more thoroughly, but I’m not spearheading those efforts, I’m spearheading this one.

Introducing WebQL: Advanced JavaScript Security Analysis

WebQL is an automated JavaScript analysis engine and workflow orchestration framework for modern web application analysis. It combines the power of static analysis tools like CodeQL with dynamic scanning capabilities to provide comprehensive security insights for web applications.

Features

URL scanning and JavaScript file extraction
Automatic JavaScript beautification (thanks Webcrack!)
Secret Scanning Modules (currently only supports TruffleHog, more to come!)
CodeQL database generation
Vulnerability analysis using CodeQL queries
Results parsing and presentation
Easy-to-use CLI interface

TL;DR

If this all sounds like fun and you want to just dive right in and give it a go, the README has a lot more information, including vulnerable examples, test scripts, and easy installation instructions:

https://github.com/queencitycyber/webql

WebQL - Quick Scan

WebQL's scanning capabilities include hitting webpack bundles, sourcemaps, and dynamically imported modules:

This command will scan the specified URL, extract JavaScript files, beautify them, generate a CodeQL database, and run CodeQL analysis:

webql scan https://example.com

As part of the scanning process (and scan command specifically), JavaScript files are downloaded to your local filesystem. Then, beautification and deobfuscation are performed on the files, making AST parsing and CodeQL analysis much more accurate. It also just makes the code more readable, which I unscientifically have determined increases vulnerability coverage and discoverability. Webcrack does the heavy lifting here, so shoutout to the team over there 🫡
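For a feel of what the "extract JavaScript files" step involves, here’s a rough sketch that pulls script URLs out of a page using only the Python standard library. The class and function names are hypothetical and this is not WebQL’s actual implementation; a real scanner would also need to chase dynamically imported chunks and sourcemap references, which this skips.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collect absolute URLs for every <script src=...> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.script_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.script_urls.append(urljoin(self.base_url, src))


def find_script_urls(html, base_url):
    collector = ScriptCollector(base_url)
    collector.feed(html)
    return collector.script_urls


# Hand-written sample page (hypothetical file names).
sample_html = """
<html><head>
<script src="/static/js/main.3f2a.js"></script>
<script src="https://cdn.example.net/vendor.js"></script>
<script>console.log("inline, no src")</script>
</head></html>
"""

print(find_script_urls(sample_html, "https://example.com/app/"))
```

Inline scripts have no src attribute, so they are skipped here; you’d want to save those separately if you care about them.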

Code Analysis with CodeQL

Following the scanning phase, WebQL leverages CodeQL for the actual advanced semantic analysis. This is done in a number of ways, including taint and source/sink analysis. Briefly, we are tracking the flow of user input and taking note of all the places where that input touches the pieces of an application. This can uncover things like SQL injection, directory traversal, cross-site scripting, and prototype pollution.
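If source/sink analysis is new to you, the core idea can be sketched as a reachability check: treat each "value A flows into B" relationship as an edge, then ask whether any attacker-controlled source can reach a dangerous sink. The toy below is only a model of that idea. CodeQL’s real data flow and taint tracking libraries are far more involved (sanitizers, function calls, property reads and so on), and the edge list here is hand-written rather than derived from real code.

```python
from collections import deque


def find_tainted_paths(flows, sources, sinks):
    """For each source, return the shortest path to every sink it can reach."""
    graph = {}
    for origin, target in flows:
        graph.setdefault(origin, []).append(target)

    paths = []
    for source in sources:
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                paths.append(path)
                continue
            for nxt in graph.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    return paths


# Hypothetical flows for JavaScript along the lines of:
#   const q = location.hash; const msg = "Hi " + q; el.innerHTML = msg;
flows = [
    ("location.hash", "q"),
    ("q", "msg"),
    ("msg", "el.innerHTML"),
    ("config.title", "document.title"),  # not attacker-controlled
]
sources = {"location.hash"}
sinks = {"el.innerHTML", "eval"}

for path in find_tainted_paths(flows, sources, sinks):
    print(" -> ".join(path))
```

The one reported path (location.hash through to el.innerHTML) is the shape of a classic DOM XSS finding, while the config.title flow is ignored because nothing tainted feeds it.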

1. This command creates a CodeQL database from the JavaScript files in the specified directory.

webql generate ./output --db-name my_analysis

2. Then, this command runs CodeQL analysis on the generated database and outputs the results in SARIF format:

webql parse ./output/my_analysis --output-file results.sarif

3. Finally, this command parses and displays the vulnerability results from the SARIF file:

webql results results.sarif

💡Check out Trail of Bits SARIF Explorer for a better view of the results: SARIF Explorer
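If you’d rather poke at the SARIF yourself, it’s just JSON. Here’s a small sketch that flattens a SARIF 2.1.0 log into (rule, file, line, message) tuples. The function name and the sample log are made up for illustration, and WebQL’s own results command may well parse things differently.

```python
import json


def summarize_sarif(sarif_text):
    """Return (rule id, file, line, message) for each result in a SARIF log."""
    log = json.loads(sarif_text)
    findings = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # Only the first location is used; results may carry more, or none.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            uri = physical.get("artifactLocation", {}).get("uri", "?")
            line = physical.get("region", {}).get("startLine", 0)
            message = result.get("message", {}).get("text", "")
            findings.append((result.get("ruleId", "?"), uri, line, message))
    return findings


# Hand-written sample log, not real scan output.
sample_sarif = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "CodeQL"}},
        "results": [{
            "ruleId": "js/xss",
            "message": {"text": "Cross-site scripting vulnerability due to user-provided value."},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": "src/app.js"},
                    "region": {"startLine": 42},
                }
            }],
        }],
    }],
})

for rule, uri, line, message in summarize_sarif(sample_sarif):
    print(f"{rule} {uri}:{line} {message}")
```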

These commands create a CodeQL database and execute a series of queries designed to identify security vulnerabilities specific to modern JavaScript applications. This includes detecting XSS vulnerabilities in template literals, identifying SQL injections obscured by ORM abstractions, and uncovering subtle prototype pollution issues that could lead to remote code execution.

Trial Run - Full Analysis

tl;dr: Wanna quickly try it out? Get the tool installed and run it against JuiceShop:

webql full-analysis https://juice-shop.herokuapp.com

Let it rock n roll and see what bugs you can come up with!

Full trial run against random bug bounty target (it's VERY verbose right now):

Source: https://bugcrowd.com/engagements/ynab#:~:text=of-scope exceptions%3A-,Targets,-%3A

Stage 1: Scan, Extract, & Download JavaScript files, sourcemaps and webpacks

We also deobfuscate, prettify, unminify, transpile, and unpack JavaScript bundles to get us back to the original source code (as close as possible at least):

bullrun@ssec:$ webql full-analysis https://staging-app.bany.dev/
(equivalent command): webql scan https://staging-app.bany.dev/

Stage 2: Build a JavaScript CodeQL Database

(equivalent command): webql generate webql_output/staging-app_bany_dev_20240906_160150 --db-name staging-app_bany_dev_20240906_160150_db

Bonus: Here we get to see a snippet of the beautified JavaScript :D

The screenshot below is us building and running CodeQL on our downloaded JavaScript:

As the tool and CodeQL hammer away, your machine might get a little warm, and that’s okay :) After a minute of data crunching, we’ve got some flagged vulnerabilities to look over:

Stage 3 & 4: Analyzing the Database and Parsing the Results

A few quick observations here:

Right now, the analysis is primarily written as a wrapper parsing the CodeQL output. There isn’t really much intelligence happening just yet.
JavaScript is currently hardcoded into the CodeQL command, since that covers both JavaScript and TypeScript which is what you’ll usually find in our target applications (along with a standard set of CodeQL queries). There’s no reason we can’t add the target language as an optional flag and run our custom queries. In fact, that may be a perfect Part 2 to this blog 🙂
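To make the language point concrete, here’s a rough sketch of what a wrapper with an optional language parameter could look like. The function name and its arguments are hypothetical and this is not WebQL’s actual code. The CodeQL CLI pieces it assembles (codeql database create with --language and --source-root, codeql database analyze with --format and --output, and the codeql/<language>-queries packs) are the standard ones. It only builds the command lines and doesn’t run anything, so CodeQL doesn’t need to be installed to try it.

```python
import shlex


def build_codeql_commands(source_dir, db_path, sarif_path,
                          language="javascript", queries=None):
    """Build the create and analyze command lines for the CodeQL CLI."""
    # Default to the standard query pack for the chosen language;
    # pass a path to custom queries to override it.
    if queries is None:
        queries = f"codeql/{language}-queries"
    create = [
        "codeql", "database", "create", db_path,
        f"--language={language}",
        f"--source-root={source_dir}",
    ]
    analyze = [
        "codeql", "database", "analyze", db_path, queries,
        "--format=sarif-latest",
        f"--output={sarif_path}",
    ]
    return create, analyze


create_cmd, analyze_cmd = build_codeql_commands(
    "./output", "./output/my_analysis", "results.sarif"
)
print(shlex.join(create_cmd))
print(shlex.join(analyze_cmd))
```

Each list could be handed to subprocess.run as-is, and pointing queries at a directory of your own .ql files is where the custom query idea would plug in.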

(equivalent command): webql parse webql_output/staging-app_bany_dev_20240906_160150/staging-app_bany_dev_20240906_160150_db --output-file demo_results.sarif
(equivalent command): webql results demo_results.sarif

Now…it won’t do Hail Mary (or db_autopwn for the real ones) and actually exploit anything, but I think the flagged items above are a great place to start! Or at least….that’s the idea and inspiration behind the tool.

And for the real, million dollar question: Is the application actually vulnerable to DOM-based XSS or is CodeQL flagging false positives? Regardless of the answer, how do we modify and reconfigure WebQL to get us closer to the actual answer? This is where I want (and encourage you) to spend some time.

Conclusion: WebQL - Advancing JavaScript Security in the Modern Web Era

In this tutorial, we've walked through the entire process of using WebQL to scan and analyze a web application for security vulnerabilities. We've seen how WebQL can:

Scan and download relevant JavaScript files from a web application
Generate a CodeQL database for deep analysis
Run security queries to detect vulnerabilities
Provide detailed, actionable results

Hopefully you’ve made it this far and at least can see the vision I had when this all began. I’d like to think the community may enjoy something like WebQL and we can continue to work on it together. I’ve been really liking the idea of this type of automation so I do think there’s value. Maybe up next is incorporating some authentication mechanisms. A few cool ideas I have about possible future components if this all seems viable:

Automated Authentication Mechanisms - everything we’ve done so far is an unauthenticated, gray-box style assessment. Giving WebQL credentials to better crawl and analyze the application will uncover more attack surface and higher vulnerability signal.
Transparent Rule Building - CodeQL allows you to create custom queries to search for vulnerabilities. I imagine a submodule or lower-level component of WebQL that could watch and analyze your downloaded JavaScript files to create new rules based on your target applications. Thus, as you browse and continue to use WebQL, you can fine-tune CodeQL queries to uncover additional vulnerabilities.

So let me know what you think - is this a viable approach and attack path for assessing web applications? Is this something you’d be interested in exploring further?

Inspiration, Resources, Creds, Shouts:

https://github.com/zb3/getfrontend

https://news.ycombinator.com/item?id=40855117

https://devtools.tech/blog/understanding-webpacks-require---rid---7VvMusDzMPVh17YyHdyL

https://msrc.microsoft.com/blog/2019/11/vulnerability-hunting-with-semmle-ql-dom-xss/

https://breachforce.net/source-and-sinks

https://medium.com/codex/hunting-for-xss-with-codeql-57f70763b938

https://raz0r.name/articles/using-codeql-to-detect-client-side-vulnerabilities-in-web-applications/

https://medium.com/@rarecoil/spa-source-code-recovery-by-un-webpacking-source-maps-ef830fc2351d