Asset scraping

Houdini is flexible in how asset filenames can be specified. A filename may contain:

  • Houdini variables
  • Environment variables
  • Channel references
  • Expressions
  • Tokens, such as UDIMs
  • Any combination of the above

This makes it difficult to evaluate the actual set of files on disk that should be made available to a render farm. We'd either have to figure out how each of those components should be resolved, or evaluate all parameters for all frames, which becomes unacceptably slow for even simple scenes. The method we've chosen involves a little of both.

Evaluate a single frame then generate a glob.

Rather than evaluate parameters for all frames, we evaluate each parameter for ONE frame. We then apply a heuristic to the evaluated string to find the files on disk that could contribute to the parameter on other frames, and for any tokens such as UDIMs.

The heuristic is a regular expression designed to find the patterns that indicate varying parts of the filename, such as the frame number.

Steps

  • Find all file reference parameters with Houdini's hou.fileReferences() Python function.
  • Evaluate each of them with parm.eval(), for one frame only.
  • Use the regular expression to replace the varying patterns with asterisks (*), which makes the path globbable.
  • Use Python's glob module to find all files on disk that match.
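
The last two steps can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the function names to_glob and scan_assets are hypothetical, and keeping the delimiters around a replaced number is an assumption made so that the result matches the glob shown in the example further down.

```python
import glob
import re

# Default pattern quoted on this page. How the plugin performs the
# substitution is not shown here, so the logic below is an assumption.
ASSET_REGEX = re.compile(r"(_|\.|-)\d+(_|\.|-)|<\w+>")


def to_glob(evaluated_path):
    """Replace the varying parts of an evaluated path with '*'."""

    def _replace(match):
        # A delimited number such as ".0036_": keep both delimiters.
        if match.group(1) is not None:
            return match.group(1) + "*" + match.group(2)
        # Otherwise an angle-bracket token such as "<u>".
        return "*"

    return ASSET_REGEX.sub(_replace, evaluated_path)


def scan_assets(evaluated_paths):
    """Glob each evaluated path and return the sorted set of files found."""
    found = set()
    for path in evaluated_paths:
        found.update(glob.glob(to_glob(path)))
    return sorted(found)


# Inside Houdini, the inputs could be gathered roughly like this
# (hou.fileReferences() yields (parm, path) pairs; parm may be None):
#   import hou
#   evaluated_paths = [parm.eval() for parm, _ in hou.fileReferences() if parm]
```

Because glob only returns paths that exist, a pattern that matches nothing on disk simply contributes no files.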

Example

Consider a parameter with the raw value:

$HIP/tex/2/armour.$F4_<u>-<v>.<custom_tok>.jpg

where $F4 is the frame number padded to four digits, <u> and <v> are UDIM tokens, and <custom_tok> is a prim-var.

This evaluates to:

/share/projects/troy/houdini/tex/2/armour.0036_<u>-<v>.<custom_tok>.jpg

Render-time tokens, here <u>, <v>, and <custom_tok>, are not evaluated.

It's fair to assume that there are more frames on disk, and that those files will have varying UDIM values and several variations of the strings representing objects. We want to construct a glob that returns all those files. It should look like this:

/share/projects/troy/houdini/tex/2/armour.*_*-*.*.jpg

The default regular expression achieves this:

(_|\.|-)\d+(_|\.|-)|<\w+>

Breaking it down:

There are two parts, separated by a |, which means OR.

  1. The first part, (_|\.|-)\d+(_|\.|-), finds frame numbers that have one of the delimiters _ . - on either side. (_|\.|-) matches any one of those characters: the parentheses group the options, and the . has to be escaped. \d+ means one or more digits. The same delimiter group then follows on the right. This part of the regular expression finds patterns such as:
       • .0022.
       • _1_
       • -010_
       • .99999_
     Notice that it won't find `2` or `/2/` because the delimiters don't match. So in our example it won't replace the directory named `2`, and the glob therefore won't look in a directory such as `/share/projects/troy/houdini/tex/3/`.
  2. The second part, <\w+>, finds any string of letters, digits, and underscores surrounded by angle brackets. Letters may be mixed case. All of the following match:
       • <u>
       • <UDIM>
       • <uVal>
       • <custom_tok>
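
To check the breakdown, the short sketch below runs the default expression against the sample strings listed above, using Python's standard re module. The samples list and the matches dictionary are names made up for this illustration.

```python
import re

# Default pattern quoted on this page.
ASSET_REGEX = re.compile(r"(_|\.|-)\d+(_|\.|-)|<\w+>")

samples = [
    ".0022.", "_1_", "-010_", ".99999_",          # delimited numbers
    "2", "/2/",                                   # no matching delimiters
    "<u>", "<UDIM>", "<uVal>", "<custom_tok>",    # angle-bracket tokens
]

# True where the pattern is found somewhere in the sample string.
matches = {s: bool(ASSET_REGEX.search(s)) for s in samples}

for sample, hit in matches.items():
    print(f"{sample:>14}  {'match' if hit else 'no match'}")
```

Only `2` and `/2/` should be reported as not matching, which is why the directory named `2` in the example survives the substitution unchanged.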

When the example pattern is globbed against the filesystem, it will find the following files or similar:

/share/projects/troy/houdini/tex/2/armour.1001_01-01.sword.jpg
/share/projects/troy/houdini/tex/2/armour.1001_01-02.sword.jpg
/share/projects/troy/houdini/tex/2/armour.1002_01-01.sword.jpg
/share/projects/troy/houdini/tex/2/armour.1002_01-02.sword.jpg
/share/projects/troy/houdini/tex/2/armour.1001_01-01.shield.jpg
/share/projects/troy/houdini/tex/2/armour.1001_01-02.shield.jpg
/share/projects/troy/houdini/tex/2/armour.1002_01-01.shield.jpg
/share/projects/troy/houdini/tex/2/armour.1002_01-02.shield.jpg

The regular expression described above is stored on a parameter that is exposed to the user: asset_regex. It is locked by default, but a TD may choose to modify it if the default doesn't find everything, or finds too much.

The best way to check the results of the asset scan is to look in the preview panel and click the do_asset_scan button. Then scroll down to the upload_files section to see the results.

Exclusions

Since the asset scan operation is based on Houdini's fileReferences() method, it can sometimes find files that are not needed for the submission. To ignore those files, use the Asset scan excludes parameter to enter a list of exclusion patterns. These are Unix-style wildcard patterns and should be separated by commas.

Pattern   Meaning
*         matches everything
?         matches any single character
[seq]     matches any character in seq
[!seq]    matches any character not in seq
For example, to exclude files ending with sword.jpg and files in any folder called backups, use the following pattern:

*sword.jpg, */backups/*
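
As a rough illustration of how such a list could be applied, the sketch below assumes the patterns behave like those of Python's fnmatch module, which documents the same four wildcards. The plugin's own matching code is not shown on this page, and the function name apply_excludes is hypothetical.

```python
from fnmatch import fnmatchcase


def apply_excludes(paths, excludes):
    """Drop every path that matches one of the comma-separated patterns."""
    # Split on commas and ignore surrounding whitespace and empty entries.
    patterns = [p.strip() for p in excludes.split(",") if p.strip()]
    return [
        path
        for path in paths
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


scanned = [
    "/share/projects/troy/houdini/tex/2/armour.1001_01-01.sword.jpg",
    "/share/projects/troy/houdini/tex/2/armour.1001_01-01.shield.jpg",
    "/share/projects/troy/houdini/backups/old/armour.1001_01-01.shield.jpg",
]

kept = apply_excludes(scanned, "*sword.jpg, */backups/*")
```

Under this assumption, * also matches path separators, so */backups/* excludes files at any depth below a folder called backups, and only the shield texture in tex/2 is kept.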