Introducing WebQL, an automated JavaScript analysis tool that leverages CodeQL to identify and exploit vulnerabilities in modern web applications like SPAs and PWAs. By automating the extraction, beautification, and analysis of client-side code, WebQL enhances penetration testing by uncovering security issues obscured by modern development practices.
A Framework for Penetration Testers, Bug Bounty Hunters, and Security Engineers to Identify & Exploit Modern APIs & Web Application Vulnerabilities
Earlier in the year I wrote about using Semgrep to analyze and exploit web applications. In that post, I made the case for using Semgrep to supplement your web app pentests by running user-defined SAST templates to find vulnerabilities in web applications, specifically in client-side frameworks. I still feel it’s a valid approach, although maybe not a fully-fledged one. It’s missing a lot of the traditional vulnerability checks and analysis features you’d expect, namely coverage of authentication, authorization, and business logic. After getting comfortable with Semgrep, I decided to branch out a little more and look at CodeQL, and I really think you should too. For background, the following Microsoft write-up is about the best coverage you’ll find of what we’re talking about here:
https://msrc.microsoft.com/blog/2019/11/vulnerability-hunting-with-semmle-ql-dom-xss/
(At the bottom of this article I provide a lot of inspiration, resources, shouts, and works cited. Please check out these resources for an even more complete and accurate picture of what I try to describe here.)
In my Semgrep article, I quipped, “What if I'm performing a black box test with no access to the source? […] We'll walk through how to scrape a single page application (SPA) to access the contents locally, essentially turning a black box test into a white box(ish) one.” I think we can take that idea further than what is conventionally accepted and communicated as part of a web application assessment and penetration test.
We can do that by automating the orchestration of CodeQL to hunt for vulnerabilities and conduct exploitation against target web applications.
That’s this blog, and that’s this tool. Plain and simple. I want to find vulnerabilities in web apps, expand my web application pentesting skills and my technical frontend knowledge, and become more versed in using tools like Semgrep and CodeQL to accomplish those goals.
I decided to approach the problem with a best effort. To set some ground rules…
With the disclaimer out of the way, I am totally looking for feedback and ways to make this tool, framework, and new (to me) testing approach better for myself and the community. While technically this all happens on the client side for now, I have this crazy hunch there’s more here. We aren’t replacing Burp Suite, Caido, or ZAP. I know that. But this is how I’ve been feeling lately:
What I really believe to be valuable here is perhaps the introduction to an approach - combining CodeQL, user-defined SAST analysis modules, and efficient parsing of scanning results to gain deeper insight into potential vulnerabilities in web applications.
Hopefully after reading this blog and looking over the tool you feel the same way.
PS: As a JavaScript noob, I find the language at times completely nonsensical and filled with terrifyingly complex attributes. I won’t pretend to have a deep understanding of it.
Modern web development has shifted dramatically from traditional multi-page applications to more dynamic and responsive architectures:
These modern architectures, while enhancing user experience and performance, increase the complexity of JavaScript codebases. This complexity, in turn, expands the potential attack surface, making thorough security analysis more crucial than ever.
The front-end build process for contemporary web applications is divided into two distinct phases: development and production. During development, developers use various tools and techniques to enhance productivity and maintainability:
These technologies require a build process to convert the code into standard HTML, JavaScript, and CSS that browsers can interpret. Part of this process is transpilation (for example, TypeScript or JSX compiled down to plain JavaScript). The build usually also applies optimizations such as bundling, compression, and minification to improve performance and reduce file size.
Build tools like webpack, Vite, or Parcel manage the optimizations and processes described above. The final product of these processes is then deployed to production. However, this build process introduces layers of abstraction that can obscure potential security vulnerabilities.
Sourcemaps are files that help developers map minified, compressed, and transpiled code back to the original source code, which makes debugging far easier. However, if they are shipped to production without care, they can inadvertently expose sensitive information such as API keys, hidden API routes, or embedded secrets.
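To make the sourcemap risk concrete, here is a minimal Python sketch that pulls original files out of a Source Map v3 JSON document. It assumes the map embeds a sourcesContent array, which many maps do but not all. recover_sources is a hypothetical helper name and is not part of WebQL. Prefixes such as webpack:// and any ".." segments are stripped so that recovered files can't be written outside an output directory.

```python
import json
from pathlib import PurePosixPath


def recover_sources(sourcemap_text: str) -> dict:
    """Map each embedded source path to its original content.

    Hypothetical helper: only handles maps that embed "sourcesContent";
    entries whose content is null (not embedded) are skipped, and
    "sourceRoot" is ignored for simplicity.
    """
    smap = json.loads(sourcemap_text)
    if smap.get("version") != 3:
        raise ValueError("expected a Source Map v3 document")
    sources = smap.get("sources", [])
    contents = smap.get("sourcesContent") or []
    recovered = {}
    for path, content in zip(sources, contents):
        if content is None:
            continue
        # Drop scheme-like prefixes such as "webpack://" ...
        without_scheme = path.split("://", 1)[-1]
        # ... then drop "/", "." and ".." parts so the key is a safe relative path
        parts = [p for p in PurePosixPath(without_scheme).parts
                 if p not in ("/", ".", "..")]
        recovered["/".join(parts)] = content
    return recovered
```

Writing each value to a file under an output directory gives you a local copy of the original tree that you can grep or feed into CodeQL.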
With all these tools, frameworks, build tools, and sourcemaps in play, there are several questions to ask ourselves:
These questions, at least partially, underscore the need for analysis tools capable of understanding and evaluating modern JavaScript applications from the perspective of a penetration tester and bug hunter. So let’s bring them all together into a tool that can help us do all of this!
And yes, there are about a million and one existing solutions that will do the job better, faster, and more thoroughly, but I’m not spearheading those efforts. I’m spearheading this one.
WebQL is an automated JavaScript analysis engine and workflow orchestration framework for modern web application analysis. It combines the power of static analysis tools like CodeQL with dynamic scanning capabilities to provide comprehensive security insights for web applications.
If this all sounds like fun and you want to just dive right in and give it a go, the README has a lot more information, including vulnerable examples, test scripts, and easy installation instructions:
https://github.com/queencitycyber/webql
WebQL's scanning capabilities cover webpack bundles, sourcemaps, and dynamically imported modules:
webql scan https://example.com
As part of the scanning process (and the scan command specifically), JavaScript files are downloaded to your local filesystem. Beautification and deobfuscation are then performed on the files, which makes AST parsing and CodeQL analysis much more accurate. It also makes the code more readable, which, by my unscientific estimate, increases vulnerability coverage and discoverability.
Webcrack does the heavy lifting here, so shoutout to the team over there 🫡
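The sketch below roughly illustrates the discovery half of the scan step. It collects external script URLs, plus Vite-style modulepreload links, from a page's HTML using only the Python standard library. This is a simplification built on stated assumptions and is not WebQL's implementation. It only sees what the static HTML references, so lazily loaded webpack chunks and scripts injected at runtime would need bundle parsing or a headless browser. ScriptCollector and script_urls are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collect absolute URLs of external scripts referenced by a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def _add(self, ref):
        # Resolve relative references against the page URL, keep first-seen order
        absolute = urljoin(self.base_url, ref)
        if absolute not in self.urls:
            self.urls.append(absolute)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("src"):
            self._add(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            # <link rel="modulepreload" href="..."> points at ES module chunks
            rel = (attributes.get("rel") or "").lower().split()
            if "modulepreload" in rel:
                self._add(attributes["href"])


def script_urls(html: str, base_url: str) -> list:
    collector = ScriptCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls
```

Each URL returned would then be fetched and handed to the beautification step.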
Following the scanning phase, WebQL uses CodeQL for the actual semantic analysis. This is done in a number of ways, including taint tracking, also known as source/sink analysis. Briefly, we track the flow of user input (sources) and note every place where that input reaches a sensitive operation (sinks) in the application. This can uncover issues like SQL injection, directory traversal, cross-site scripting, and prototype pollution.
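To make the source/sink idea concrete, here is a deliberately tiny taint tracker in Python. It walks a hypothetical straight-line program in which each Stmt is one assignment or sink write. Variables derived from a source such as location.hash are marked as tainted, and that taint is cleared when a value passes through a sanitizer. Any tainted value that reaches a sink is reported. The source, sink, and sanitizer names are illustrative assumptions. CodeQL's real data-flow analysis works over a full program graph, across functions and object properties, with far richer models.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative models only; CodeQL ships far more complete ones.
SOURCES = {"location.hash", "location.search", "document.referrer"}
SANITIZERS = {"DOMPurify.sanitize", "encodeURIComponent"}


@dataclass
class Stmt:
    target: str                   # variable written, or sink name when is_sink
    operands: List[str]           # variables or source expressions read
    call: Optional[str] = None    # function wrapping the operands, if any
    is_sink: bool = False


def find_flows(program: List[Stmt]) -> List[Tuple[str, str]]:
    """Return (sink, operand) pairs where tainted data reaches a sink."""
    tainted = set()
    flows = []
    for stmt in program:
        hits = [op for op in stmt.operands if op in SOURCES or op in tainted]
        sanitized = stmt.call in SANITIZERS
        if stmt.is_sink:
            if hits and not sanitized:
                flows.extend((stmt.target, op) for op in hits)
        elif hits and not sanitized:
            tainted.add(stmt.target)       # taint propagates to the assigned variable
        else:
            tainted.discard(stmt.target)   # overwritten with clean (or sanitized) data
    return flows


program = [
    Stmt("frag", ["location.hash"]),                    # frag = location.hash
    Stmt("html", ["'<b>'", "frag"]),                    # html = '<b>' + frag
    Stmt("safe", ["html"], call="DOMPurify.sanitize"),  # safe = DOMPurify.sanitize(html)
    Stmt("innerHTML", ["safe"], is_sink=True),          # el.innerHTML = safe
    Stmt("innerHTML", ["html"], is_sink=True),          # el.innerHTML = html
]
print(find_flows(program))  # [('innerHTML', 'html')]
```

Only the unsanitized write is reported, which is the same shape of finding CodeQL produces, just without the interprocedural reasoning.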
1. This command creates a CodeQL database from the JavaScript files in the specified directory.
webql generate ./output --db-name my_analysis
2. Then, this command runs CodeQL analysis on the generated database and outputs the results in SARIF format:
webql parse ./output/my_analysis --output-file results.sarif
3. Finally, this command parses and displays the vulnerability results from the SARIF file:
webql results results.sarif
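For readers curious about what sits underneath, the sketch below builds roughly equivalent raw CodeQL CLI invocations for steps 1 and 2. I am assuming WebQL does something similar internally, but that is an assumption and not a description of its code. The query pack (codeql/javascript-queries) and the --overwrite flag are my own choices. The pack ships with the standard CodeQL bundle and may need to be downloaded otherwise. codeql_commands and run are hypothetical helper names.

```python
import shlex
import subprocess
from typing import List


def codeql_commands(source_dir: str, db_path: str, sarif_out: str,
                    queries: str = "codeql/javascript-queries") -> List[List[str]]:
    """Build create + analyze invocations (hypothetical helper, not WebQL's code)."""
    create = [
        "codeql", "database", "create", db_path,
        "--language=javascript",
        f"--source-root={source_dir}",
        "--overwrite",                 # replace an existing database at db_path
    ]
    analyze = [
        "codeql", "database", "analyze", db_path, queries,
        "--format=sarif-latest",       # SARIF output, as in step 2
        f"--output={sarif_out}",
    ]
    return [create, analyze]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; only execute when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # needs the codeql binary on PATH


run(codeql_commands("./output", "./output/my_analysis", "results.sarif"))
```

Defaulting to a dry run keeps the sketch harmless to try. Pass dry_run=False to actually build the database and produce results.sarif.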
💡Check out Trail of Bits SARIF Explorer for a better view of the results: SARIF Explorer
These commands create a CodeQL database and execute a series of queries designed to identify security vulnerabilities specific to modern JavaScript applications. This includes detecting XSS vulnerabilities in template literals, identifying SQL injections obscured by ORM abstractions, and uncovering subtle prototype pollution issues that could lead to remote code execution.
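As a contrast with CodeQL's semantic approach, here is the kind of naive line-based grep you might otherwise reach for. This Python regex flags an innerHTML, outerHTML, or document.write sink that receives a template literal containing ${...} interpolation. It is a hypothetical heuristic of my own, not one of WebQL's queries. It cannot tell whether the interpolated value is attacker-controlled, and that is exactly the question taint tracking answers.

```python
import re

# Hypothetical heuristic: an HTML sink assigned (=, +=) or called with a
# template literal whose body contains ${...} interpolation on the same line.
SINK_TEMPLATE = re.compile(
    r"(?:\.innerHTML|\.outerHTML|document\.write(?:ln)?)"  # common DOM XSS sinks
    r"\s*(?:\+?=|\()\s*"                                   # assignment or call
    r"`[^`]*\$\{"                                          # template literal with ${
)


def grep_template_sinks(source: str) -> list:
    """Return 1-based line numbers that match the heuristic."""
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if SINK_TEMPLATE.search(line)]
```

Every hit still needs a human (or CodeQL) to decide whether the interpolated variable can actually be controlled by an attacker.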
tl;dr: Wanna quickly try it out? Install the tool and run it against OWASP Juice Shop:
webql full-analysis https://juice-shop.herokuapp.com
Let it rock n roll and see what bugs you can come up with!
Full trial run against a random bug bounty target (it's VERY verbose right now):
Source: https://bugcrowd.com/engagements/ynab#:~:text=of-scope exceptions%3A-,Targets,-%3A
We also deobfuscate, prettify, unminify, transpile, and unpack JavaScript bundles to get as close to the original source code as possible:
bullrun@ssec:$ webql full-analysis https://staging-app.bany.dev/
(equivalent command): webql scan https://staging-app.bany.dev/
(equivalent command): webql generate webql_output/staging-app_bany_dev_20240906_160150 --db-name staging-app_bany_dev_20240906_160150_db
Bonus: Here we get to see a snippet of the beautified JavaScript :D
The screenshot below is us building and running CodeQL on our downloaded JavaScript:
As the tool and CodeQL hammer away, your machine might get a little warm; that’s okay :) After a minute of data crunching, we’ve got some flagged vulnerabilities to look over:
A few quick observations here:
(equivalent command): webql parse webql_output/staging-app_bany_dev_20240906_160150/staging-app_bany_dev_20240906_160150_db --output-file demo_results.sarif
(equivalent command): webql results demo_results.sarif
Now… it won’t do a Hail Mary (or db_autopwn, for the real ones) and actually exploit anything, but I think the flagged items above are a great place to start! Or at least… that’s the idea and inspiration behind the tool.
And now for the real, million-dollar question: is the application actually vulnerable to DOM-based XSS, or is CodeQL flagging false positives? Regardless of the answer, how do we modify and reconfigure WebQL to get closer to the actual answer? This is where I want to spend some time, and I encourage you to as well.
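One low-effort way to start chipping away at the false-positive question is to post-filter the SARIF output before triage. The Python sketch below keeps only results for the rules you care about and drops locations in paths that look like third-party code. The glob list and the triage helper are hypothetical and would need tuning for each target. Suppressing vendor code can also hide real bugs, so treat this as a prioritization aid rather than a verdict.

```python
import fnmatch
import json
from typing import List, Set, Tuple

# Hypothetical globs for code you probably don't own; tune per target.
VENDOR_GLOBS = ["*node_modules/*", "*vendor*.js", "*polyfill*.js", "*.min.js"]


def triage(sarif_text: str, focus_rules: Set[str]) -> List[Tuple[str, str, int]]:
    """Return (rule, file, line) for focus-rule results in first-party files."""
    keep = []
    for run in json.loads(sarif_text).get("runs", []):
        for result in run.get("results", []):
            # SARIF results carry ruleId, or a rule reference object with an id
            rule = result.get("ruleId") or result.get("rule", {}).get("id", "")
            if rule not in focus_rules:
                continue
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri", "")
                # fnmatchcase: "*" also matches "/", and matching is case-sensitive
                if any(fnmatch.fnmatchcase(uri, g) for g in VENDOR_GLOBS):
                    continue
                line = phys.get("region", {}).get("startLine", 0)
                keep.append((rule, uri, line))
    return keep
```

For example, calling triage on the contents of demo_results.sarif with {"js/xss", "js/xss-through-dom"} would list only the DOM XSS candidates outside vendor bundles.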
In this tutorial, we've walked through the entire process of using WebQL to scan and analyze a web application for security vulnerabilities. We've seen how WebQL can:
Hopefully you’ve made it this far and can at least see the vision I had when this all began. I’d like to think the community may enjoy something like WebQL and that we can continue to work on it together. I’ve really been liking the idea of this type of automation, so I do think there’s value here. Maybe next up is incorporating some authentication mechanisms. A few cool ideas I have for possible future components, if this all seems viable:
So let me know what you think: is this a viable approach and attack path for assessing web applications? Is this something you’d be interested in exploring further?
Inspiration, Resources, Creds, Shouts:
https://github.com/zb3/getfrontend
https://news.ycombinator.com/item?id=40855117
https://devtools.tech/blog/understanding-webpacks-require---rid---7VvMusDzMPVh17YyHdyL
https://msrc.microsoft.com/blog/2019/11/vulnerability-hunting-with-semmle-ql-dom-xss/
https://breachforce.net/source-and-sinks
https://medium.com/codex/hunting-for-xss-with-codeql-57f70763b938
https://raz0r.name/articles/using-codeql-to-detect-client-side-vulnerabilities-in-web-applications/
https://medium.com/@rarecoil/spa-source-code-recovery-by-un-webpacking-source-maps-ef830fc2351d