docs: Add 'Customizing library models for Rust' documentation#21727
coadaflorin wants to merge 5 commits into main
Add documentation for customizing library models for Rust using data extension files. This follows the pattern of existing documentation for other languages (Java, Python, Ruby, Go, C#, C++, JavaScript).

The documentation covers:
- Rust-specific extensible predicates (sourceModel, sinkModel, summaryModel, neutralModel) with their simplified schema
- Canonical path syntax for identifying Rust functions and methods
- Examples using real models from the codebase (sqlx, reqwest, std::env, std::path, Iterator::map)
- Access path token reference (Argument, Parameter, ReturnValue, Element, Field, Reference, Future)
- Source and sink kind reference
- Threat model integration

Also updates codeql-for-rust.rst to include the new page in the toctree.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add barrierModel and barrierGuardModel sections to the Rust library models documentation, following the pattern established in PR #21523 for other languages.

Includes:
- New extensible predicate descriptions in the overview
- Example: barrier for SQL injection using escape_sql
- Example: barrier guard for path injection using is_safe_path
- Reference material for both barrierModel and barrierGuardModel

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
geoffw0
left a comment
Partially reviewed. I need to continue from "Examples of custom model definitions", then check final rendering and links. We will also want a docs team review at some point.
- **Free functions**: ``crate::module::function``, for example ``std::env::var`` or ``std::fs::read_to_string``.
- **Inherent methods**: ``<Type>::method``, for example ``<std::fs::File>::open``.
- **Trait methods with a concrete type**: ``<Type as Trait>::method``, for example ``<std::fs::File as std::io::Read>::read_to_end``.
- **Trait methods with a wildcard type**: ``<_ as Trait>::method``, for example ``<_ as core::clone::Clone>::clone``. This form matches any type that implements the trait and is useful for modeling broadly applicable trait methods.
I don't see this section in the doc for other languages, I think Copilot may have synthesised it entirely ... but it looks really helpful, and as far as I can tell, correct.
As someone with no familiarity with rust, it looks helpful to me. (Assuming it's correct.)
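For readers less familiar with Rust, the four canonical path forms quoted above correspond closely to Rust's own fully qualified call syntax. The following is a minimal sketch, not taken from the PR: the function name `read_back` and the temporary file name are illustrative assumptions, and each comment names the canonical path that the call would presumably be matched against.

```rust
use std::fs::File;
use std::io::{Read, Write};

/// Writes `contents` to a temporary file, reads it back, and returns a clone
/// of the bytes. Each call below illustrates one canonical-path form.
/// (`read_back` and the file name are hypothetical, for illustration only.)
fn read_back(contents: &[u8]) -> std::io::Result<Vec<u8>> {
    // Free function: `std::env::temp_dir` (same form as `std::env::var`).
    let mut path = std::env::temp_dir();
    path.push(format!("canonical_path_demo_{}.txt", std::process::id()));

    // Inherent method: `<std::fs::File>::create`.
    let mut out = <File>::create(&path)?;
    // Trait method with a concrete type: `<std::fs::File as std::io::Write>::write_all`.
    <File as Write>::write_all(&mut out, contents)?;
    drop(out);

    // Inherent method: `<std::fs::File>::open`.
    let mut file = <File>::open(&path)?;
    let mut buf = Vec::new();
    // Trait method with a concrete type: `<std::fs::File as std::io::Read>::read_to_end`.
    <File as Read>::read_to_end(&mut file, &mut buf)?;

    // Trait method with a wildcard type: `<_ as core::clone::Clone>::clone`.
    // The compiler infers `_` here; a model written in this form would apply to any implementor.
    let copy = <_ as Clone>::clone(&buf);

    std::fs::remove_file(&path)?;
    Ok(copy)
}
```

In everyday code these calls are usually written as `File::open(&path)` or `buf.clone()`; the fully qualified spelling is used here only to mirror the canonical paths.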
…for-rust.rst Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
Add the 'Publish data extension files in a CodeQL model pack to share' section, matching the structure used in C#, C++, Go, and Java docs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new Rust language guide page describing how to write CodeQL data extensions for Rust library modeling, and wires it into the Rust docs index.
Changes:
- Added a new documentation page describing Rust-specific modeling concepts (canonical paths, access paths, and extensible predicates).
- Added the new page to the Rust language guide toctree and link list.
| File | Description |
|---|---|
| docs/codeql/codeql-language-guides/customizing-library-models-for-rust.rst | New guide page explaining how to create Rust library models with data extensions. |
| docs/codeql/codeql-language-guides/codeql-for-rust.rst | Adds the new guide page to the Rust documentation navigation. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 2
It would also be possible to merge the two rows into one by using a comma-separated list in the second value:

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/rust-all
         extensible: summaryModel
       data:
         - ["<std::path::Path>::join", "Argument[self,0]", "ReturnValue", "taint", "manual"]

This row defines flow from both the receiver and the first argument to the return value. The second value ``Argument[self,0]`` is shorthand for specifying an access path to both ``Argument[self]`` and ``Argument[0]``.

.. note::

   When using ``Argument[self]`` to refer to the receiver, the ``Reference`` token may need to be appended to follow through the ``&self`` or ``&mut self`` reference to the underlying value. This depends on whether the data you want to track is on the reference itself or on the value behind the reference.
This merged example changes the meaning compared to the two-row version above. In the two-row version, the receiver flow is modeled from Argument[self].Reference (and the text states this is needed because join takes &self), but the merged form drops .Reference entirely. Either remove this merged example, or adjust it so it preserves the same access-path semantics (and clarify any limitations if the shorthand can’t express per-operand tokens).
Suggested change:

- It would also be possible to merge the two rows into one by using a comma-separated list in the second value:
- .. code-block:: yaml
-    extensions:
-      - addsTo:
-          pack: codeql/rust-all
-          extensible: summaryModel
-        data:
-          - ["<std::path::Path>::join", "Argument[self,0]", "ReturnValue", "taint", "manual"]
- This row defines flow from both the receiver and the first argument to the return value. The second value ``Argument[self,0]`` is shorthand for specifying an access path to both ``Argument[self]`` and ``Argument[0]``.
- .. note::
-    When using ``Argument[self]`` to refer to the receiver, the ``Reference`` token may need to be appended to follow through the ``&self`` or ``&mut self`` reference to the underlying value. This depends on whether the data you want to track is on the reference itself or on the value behind the reference.
+ In this case, the two rows should not be merged into one by using a comma-separated list in the second value.
+ The receiver flow is modeled as ``Argument[self].Reference``, while the first argument is modeled as
+ ``Argument[0]``. Since these access paths are different, keeping them as separate rows preserves the
+ correct semantics for ``Path::join``.
+ .. note::
+    When using ``Argument[self]`` to refer to the receiver, the ``Reference`` token may need to be appended to follow through the ``&self`` or ``&mut self`` reference to the underlying value. This depends on whether the data you want to track is on the reference itself or on the value behind the reference. If different operands require different access-path tokens, model them using separate rows instead of a single comma-separated shorthand.
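To make the distinction in this thread concrete, here is a small Rust sketch of a call to `Path::join`. It is not from the PR; the function name `join_user_path` and the parameter names are hypothetical. It only shows where the receiver and the first argument sit in source code, which is what the `Argument[self].Reference` and `Argument[0]` access paths discussed above refer to.

```rust
use std::path::{Path, PathBuf};

/// Joins a caller-supplied segment onto a base directory.
/// (`join_user_path` is a hypothetical helper, for illustration only.)
fn join_user_path(base: &Path, user_segment: &str) -> PathBuf {
    // `Path::join` takes `&self`, so the receiver `base` is passed by reference.
    // Per the review comment, receiver flow is modeled as `Argument[self].Reference`,
    // meaning the value behind the `&self` reference.
    // `user_segment` is the first explicit argument, modeled as `Argument[0]`.
    // The returned `PathBuf` corresponds to `ReturnValue`.
    base.join(user_segment)
}
```

Because the two inputs need different access paths, a comma-separated shorthand such as `Argument[self,0]` cannot attach `.Reference` to only one of them, which is the reviewer's reason for keeping two rows.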
- **Field[**\ ``type::field``\ **]** selects a named field of a struct or enum variant. For example, ``Field[core::option::Option::Some(0)]`` selects the first positional field of the ``Some`` variant.
- **Field[**\ ``type(i)``\ **]** selects the ``i``-th positional field of a tuple struct or tuple enum variant. For example, ``Field[core::result::Result::Ok(0)]`` selects the value inside ``Ok``.
The description of Field[...] is internally inconsistent: the first bullet claims type::field and then gives an example that is not a named field (it’s a variant + positional index), and the second bullet describes type(i) but the example uses type::Variant(i). Please align the prose with the actual expected syntax (and update the examples accordingly) so readers can reliably construct correct Field[...] access paths.
Suggested change:

- - **Field[**\ ``type::field``\ **]** selects a named field of a struct or enum variant. For example, ``Field[core::option::Option::Some(0)]`` selects the first positional field of the ``Some`` variant.
- - **Field[**\ ``type(i)``\ **]** selects the ``i``-th positional field of a tuple struct or tuple enum variant. For example, ``Field[core::result::Result::Ok(0)]`` selects the value inside ``Ok``.
+ - **Field[**\ ``type::field``\ **]** selects a named field of a struct or enum variant. For example, ``Field[my_crate::Config::path]`` selects the named field ``path``.
+ - **Field[**\ ``type(i)``\ **]** selects the ``i``-th positional field of a tuple struct, and **Field[**\ ``type::Variant(i)``\ **]** selects the ``i``-th positional field of a tuple enum variant. For example, ``Field[core::result::Result::Ok(0)]`` selects the value inside ``Ok``.
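The three `Field[...]` shapes in the suggested wording can be lined up against Rust declarations. The sketch below is an assumption-laden illustration: `Config` echoes the `my_crate::Config` example from the suggestion, while `Wrapper` and `extract_fields` are hypothetical names introduced here, and the access paths in the comments simply follow the syntax described in the suggested text.

```rust
/// Hypothetical struct with a named field.
/// If defined in `my_crate`, the field would be `Field[my_crate::Config::path]`.
struct Config {
    path: String,
}

/// Hypothetical tuple struct.
/// Following the `type(i)` form, its only field would be `Field[my_crate::Wrapper(0)]`.
struct Wrapper(String);

/// Pulls the value out of each kind of field and collects the results.
fn extract_fields(cfg: Config, wrapped: Wrapper, res: Result<String, String>) -> Vec<String> {
    let mut out = Vec::new();
    // Named struct field: `type::field`.
    out.push(cfg.path);
    // Positional field of a tuple struct: `type(i)`.
    out.push(wrapped.0);
    // Positional field of a tuple enum variant: `type::Variant(i)`,
    // here `Field[core::result::Result::Ok(0)]`.
    if let Ok(value) = res {
        out.push(value);
    }
    out
}
```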
Summary
Adds a new documentation page: Customizing library models for Rust, following the pattern of existing documentation for other languages.
What's included
The documentation covers Rust-specific concepts:
- `sourceModel`, `sinkModel`, `summaryModel`, `neutralModel` with Rust's simplified 3-5 column schema (vs Java/Go's 9-10 column schema)
- Canonical path syntax (`crate::module::function`, `<Type>::method`, `<Type as Trait>::method`, `<_ as Trait>::method`)
- Access path tokens: `Reference` (for `&T`), `Future` (for async), `Field` with Rust enum variant syntax
- Examples:
  - `sqlx`
  - `reqwest::get`
  - `std::env::var`
  - `reqwest::Response::text` (async)
  - `std::path::Path::join` (multiple inputs)
  - `Iterator::map` (higher-order, wildcard trait)
  - `Option::map`

Changes
- `docs/codeql/codeql-language-guides/customizing-library-models-for-rust.rst`
- `docs/codeql/codeql-language-guides/codeql-for-rust.rst`: added toctree entry and description