docs: Add 'Customizing library models for Rust' documentation #21727

Open
coadaflorin wants to merge 5 commits into main from docs/customizing-library-models-for-rust
Conversation

@coadaflorin
Contributor

Summary

Adds a new documentation page, Customizing library models for Rust, following the pattern of the existing documentation for other languages.

What's included

The documentation covers Rust-specific concepts:

  • Extensible predicates: sourceModel, sinkModel, summaryModel, neutralModel with Rust's simplified 3-5 column schema (vs Java/Go's 9-10 column schema)
  • Canonical paths: How Rust identifies callables using fully-qualified paths (crate::module::function, <Type>::method, <Type as Trait>::method, <_ as Trait>::method)
  • Rust-specific access path tokens: Reference (for &T), Future (for async), Field with Rust enum variant syntax
  • Examples using real models from the codebase:
    • SQL injection sink with sqlx
    • Remote source from reqwest::get
    • Environment variable source from std::env::var
    • Flow summary through reqwest::Response::text (async)
    • Flow summary through std::path::Path::join (multiple inputs)
    • Flow summary through Iterator::map (higher-order, wildcard trait)
    • Neutral model for Option::map
  • Reference sections for predicates, access paths, source/sink/summary kinds, and threat models
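
For illustration, the environment variable source listed above might be written as a data extension row like the one below. This is a sketch rather than a copy of the model in the codebase: it assumes the `sourceModel` columns are (path, output, kind, provenance) and that `environment` is the applicable source kind.

```yaml
extensions:
  - addsTo:
      pack: codeql/rust-all
      extensible: sourceModel
    data:
      # Assumed column order: path, output, kind, provenance.
      # std::env::var returns a Result, so the tainted value is taken to be
      # the payload of the Ok variant.
      - ["std::env::var", "ReturnValue.Field[core::result::Result::Ok(0)]", "environment", "manual"]
```

The four-column shape here is an assumption based on the "3-5 column schema" mentioned above; the new page is the authority on the exact layout.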

Changes

  • New file: docs/codeql/codeql-language-guides/customizing-library-models-for-rust.rst
  • Modified: docs/codeql/codeql-language-guides/codeql-for-rust.rst — added toctree entry and description

Add documentation for customizing library models for Rust using data
extension files. This follows the pattern of existing documentation for
other languages (Java, Python, Ruby, Go, C#, C++, JavaScript).

The documentation covers:
- Rust-specific extensible predicates (sourceModel, sinkModel,
  summaryModel, neutralModel) with their simplified schema
- Canonical path syntax for identifying Rust functions and methods
- Examples using real models from the codebase (sqlx, reqwest,
  std::env, std::path, Iterator::map)
- Access path token reference (Argument, Parameter, ReturnValue,
  Element, Field, Reference, Future)
- Source and sink kind reference
- Threat model integration

Also updates codeql-for-rust.rst to include the new page in the
toctree.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add barrierModel and barrierGuardModel sections to the Rust library
models documentation, following the pattern established in PR #21523
for other languages.

Includes:
- New extensible predicate descriptions in the overview
- Example: barrier for SQL injection using escape_sql
- Example: barrier guard for path injection using is_safe_path
- Reference material for both barrierModel and barrierGuardModel

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

@geoffw0 geoffw0 left a comment

Partially reviewed. I need to continue from "Examples of custom model definitions", then check final rendering and links. We will also want a docs team review at some point.

Comment thread docs/codeql/codeql-language-guides/codeql-for-rust.rst Outdated
Comment thread docs/codeql/codeql-language-guides/customizing-library-models-for-rust.rst Outdated
- **Free functions**: ``crate::module::function``, for example ``std::env::var`` or ``std::fs::read_to_string``.
- **Inherent methods**: ``<Type>::method``, for example ``<std::fs::File>::open``.
- **Trait methods with a concrete type**: ``<Type as Trait>::method``, for example ``<std::fs::File as std::io::Read>::read_to_end``.
- **Trait methods with a wildcard type**: ``<_ as Trait>::method``, for example ``<_ as core::clone::Clone>::clone``. This form matches any type that implements the trait and is useful for modeling broadly applicable trait methods.
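
As a rough sketch of how these four path forms would appear in the first column of a model row, the rows below reuse the five-column `summaryModel` layout shown elsewhere in this PR. The function `my_crate::util::normalize` is hypothetical, and the input, output, and kind columns are placeholders chosen for illustration, not vetted models.

```yaml
extensions:
  - addsTo:
      pack: codeql/rust-all
      extensible: summaryModel
    data:
      # Free function (hypothetical crate and function).
      - ["my_crate::util::normalize", "Argument[0]", "ReturnValue", "taint", "manual"]
      # Inherent method.
      - ["<std::path::Path>::join", "Argument[0]", "ReturnValue", "taint", "manual"]
      # Trait method with a concrete type (access paths are placeholders).
      - ["<std::fs::File as std::io::Read>::read_to_end", "Argument[self]", "Argument[0]", "taint", "manual"]
      # Trait method with a wildcard type: applies to any implementor of Clone.
      - ["<_ as core::clone::Clone>::clone", "Argument[self]", "ReturnValue", "value", "manual"]
```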
Contributor

I don't see this section in the doc for other languages, I think Copilot may have synthesised it entirely ... but it looks really helpful, and as far as I can tell, correct.

Contributor

As someone with no familiarity with rust, it looks helpful to me. (Assuming it's correct.)

coadaflorin and others added 3 commits April 17, 2026 15:11
…for-rust.rst

Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
Add the 'Publish data extension files in a CodeQL model pack to share'
section, matching the structure used in C#, C++, Go, and Java docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@coadaflorin coadaflorin marked this pull request as ready for review April 20, 2026 14:25
@coadaflorin coadaflorin requested a review from a team as a code owner April 20, 2026 14:25
Copilot AI review requested due to automatic review settings April 20, 2026 14:25
@coadaflorin
Contributor Author

@geoffw0 if this looks good to you, I'll ask the docs team if they can take a quick look, and then I'll try to merge at a similar time to @owen-mc so we have the docs up for all languages that can use barriers.

Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new Rust language guide page describing how to write CodeQL data extensions for Rust library modeling, and wires it into the Rust docs index.

Changes:

  • Added a new documentation page describing Rust-specific modeling concepts (canonical paths, access paths, and extensible predicates).
  • Added the new page to the Rust language guide toctree and link list.
Summary per file:

  • docs/codeql/codeql-language-guides/customizing-library-models-for-rust.rst: New guide page explaining how to create Rust library models with data extensions.
  • docs/codeql/codeql-language-guides/codeql-for-rust.rst: Adds the new guide page to the Rust documentation navigation.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

Comment on lines +250 to +265
It would also be possible to merge the two rows into one by using a comma-separated list in the second value:

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/rust-all
         extensible: summaryModel
       data:
         - ["<std::path::Path>::join", "Argument[self,0]", "ReturnValue", "taint", "manual"]

This row defines flow from both the receiver and the first argument to the return value. The second value ``Argument[self,0]`` is shorthand for specifying an access path to both ``Argument[self]`` and ``Argument[0]``.

.. note::

   When using ``Argument[self]`` to refer to the receiver, the ``Reference`` token may need to be appended to follow through the ``&self`` or ``&mut self`` reference to the underlying value. This depends on whether the data you want to track is on the reference itself or on the value behind the reference.

Copilot AI Apr 20, 2026

This merged example changes the meaning compared to the two-row version above. In the two-row version, the receiver flow is modeled from Argument[self].Reference (and the text states this is needed because join takes &self), but the merged form drops .Reference entirely. Either remove this merged example, or adjust it so it preserves the same access-path semantics (and clarify any limitations if the shorthand can’t express per-operand tokens).

Suggested change
In this case, the two rows should not be merged into one by using a comma-separated list in the second value.
The receiver flow is modeled as ``Argument[self].Reference``, while the first argument is modeled as
``Argument[0]``. Since these access paths are different, keeping them as separate rows preserves the
correct semantics for ``Path::join``.

.. note::

   When using ``Argument[self]`` to refer to the receiver, the ``Reference`` token may need to be appended to follow through the ``&self`` or ``&mut self`` reference to the underlying value. This depends on whether the data you want to track is on the reference itself or on the value behind the reference. If different operands require different access-path tokens, model them using separate rows instead of a single comma-separated shorthand.
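
For context, the two-row form that this comment refers to would look roughly like the following. It is reconstructed from the access paths named in the comment (`Argument[self].Reference` and `Argument[0]`), and it assumes the output, kind, and provenance columns match the merged row quoted above.

```yaml
extensions:
  - addsTo:
      pack: codeql/rust-all
      extensible: summaryModel
    data:
      # Receiver: join takes &self, so step through the reference first.
      - ["<std::path::Path>::join", "Argument[self].Reference", "ReturnValue", "taint", "manual"]
      # First argument: no Reference token, as described in the comment.
      - ["<std::path::Path>::join", "Argument[0]", "ReturnValue", "taint", "manual"]
```

Keeping the rows separate lets each operand carry its own token chain, which the `Argument[self,0]` shorthand cannot express.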

Copilot uses AI. Check for mistakes.
Comment on lines +548 to +549
- **Field[**\ ``type::field``\ **]** selects a named field of a struct or enum variant. For example, ``Field[core::option::Option::Some(0)]`` selects the first positional field of the ``Some`` variant.
- **Field[**\ ``type(i)``\ **]** selects the ``i``-th positional field of a tuple struct or tuple enum variant. For example, ``Field[core::result::Result::Ok(0)]`` selects the value inside ``Ok``.

Copilot AI Apr 20, 2026

The description of Field[...] is internally inconsistent: the first bullet claims type::field and then gives an example that is not a named field (it’s a variant + positional index), and the second bullet describes type(i) but the example uses type::Variant(i). Please align the prose with the actual expected syntax (and update the examples accordingly) so readers can reliably construct correct Field[...] access paths.

Suggested change
- **Field[**\ ``type::field``\ **]** selects a named field of a struct or enum variant. For example, ``Field[my_crate::Config::path]`` selects the named field ``path``.
- **Field[**\ ``type(i)``\ **]** selects the ``i``-th positional field of a tuple struct, and **Field[**\ ``type::Variant(i)``\ **]** selects the ``i``-th positional field of a tuple enum variant. For example, ``Field[core::result::Result::Ok(0)]`` selects the value inside ``Ok``.
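
To make the two `Field[...]` forms concrete, here is a hedged sketch of how each might appear inside an access path. `my_crate::Config` and its `path` getter are hypothetical, and the `Option::unwrap` row is an assumption about how such a model could be written, not a row taken from the PR.

```yaml
extensions:
  - addsTo:
      pack: codeql/rust-all
      extensible: summaryModel
    data:
      # Named field: a hypothetical getter that returns the `path` field of
      # a hypothetical struct my_crate::Config.
      - ["<my_crate::Config>::path", "Argument[self].Reference.Field[my_crate::Config::path]", "ReturnValue", "taint", "manual"]
      # Positional field of a tuple enum variant: the value inside Some
      # flows to the return value of unwrap (assumed model).
      - ["<core::option::Option>::unwrap", "Argument[self].Field[core::option::Option::Some(0)]", "ReturnValue", "value", "manual"]
```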

4 participants