Asset scraping
Houdini is versatile in the way asset filenames are specified. They can contain:
- Houdini variables
- Environment variables
- Channel references
- Expressions
- Tokens, such as UDIMs
- Any combination of the above
This makes it difficult to evaluate the actual set of files on disk that should be made available to a render farm. We'd either have to figure out how each of those components should be resolved, or evaluate all parameters for all frames, which becomes unacceptably slow for even simple scenes. The method we've chosen involves a little of both.
Evaluate a single frame then generate a glob.
Rather than evaluate parameters for all frames, we evaluate each parameter for ONE frame, and then use a heuristic, based on the evaluated string, to help find the files on disk that could contribute to the parameter at other frames and for any tokens such as UDIMs.
The heuristic is a regular expression designed to find the patterns that indicate varying parts of the filename, such as the frame number.
Steps
- Find all file reference parameters with Houdini's `fileReferences()` Python command.
- Evaluate each of them, using `parm.eval()`, for one frame only.
- Use the regular expression to replace varying patterns with asterisks (`*`) to make the path globbable.
- Use the Python glob package to find all files on disk that match.
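The last two steps can be sketched in plain Python, outside Houdini. This is a minimal illustration under stated assumptions, not the plugin's actual code: `to_glob` and `scan` are hypothetical helper names, the input is assumed to be a path that has already been evaluated for one frame, and keeping the delimiters around the digits is a choice made here so that the result resembles a hand-written glob.

```python
import glob
import re

# Default pattern from this page: delimited digit runs, or <token> placeholders.
ASSET_REGEX = re.compile(r"(_|\.|-)\d+(_|\.|-)|<\w+>")


def to_glob(evaluated_path):
    """Replace the varying parts of an evaluated path with asterisks."""

    def _star(match):
        left, right = match.group(1), match.group(2)
        if left is None:
            # The <token> alternative matched: replace the whole token.
            return "*"
        # The digits alternative matched: keep both delimiters (an assumption).
        return left + "*" + right

    return ASSET_REGEX.sub(_star, evaluated_path)


def scan(evaluated_path):
    """Return the sorted list of files on disk that match the generated glob."""
    return sorted(glob.glob(to_glob(evaluated_path)))
```

In a Houdini session, the string passed to `scan` would be the evaluated value of each file parameter found in the first two steps.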
Example
Consider a parameter with the raw value:
$HIP/tex/2/armour.$F4_<u>-<v>.<custom_tok>.jpg
where `custom_tok` is a prim-var, `$F4` is the frame number, and `u` and `v` are UDIMs.
This evaluates to:
/share/projects/troy/houdini/tex/2/armour.0036_<u>-<v>.<custom_tok>.jpg
Render time variables are not evaluated.
It's fair to assume that there are more frames on disk, and that those files will have varying UDIM values and several variations of strings representing objects. We want to construct a glob to return all those files. It should look like this:
/share/projects/troy/houdini/tex/2/armour.*_*-*.*.jpg
The default regular expression achieves this:
(_|\.|-)\d+(_|\.|-)|<\w+>
Breaking it down:
There are 2 parts separated by a `|`.
1. The first part, `(_|\.|-)\d+(_|\.|-)`, finds frame numbers surrounded on either side by any one of the delimiters `_`, `.`, or `-`. `|` means OR, so `(_|\.|-)` means any one of those characters. The `.` has to be escaped. Brackets are used to group the options. `\d+` means one or more digits. Then the same delimiter options follow on the right. This part of the regular expression finds the following patterns:
   - `.0022.`
   - `_1_`
   - `-010_`
   - `.99999_`

   Notice that it won't find `2` or `/2/` because the delimiters don't match. In our example, it therefore won't replace the directory named `2`, and won't look in a directory named `/share/projects/troy/houdini/tex/3/`.
2. The second part, `<\w+>`, finds any string consisting of letters, digits, and underscores that is surrounded by angle brackets. Letters may be mixed case. Therefore, all of the following match:
   - `<u>`
   - `<UDIM>`
   - `<uVal>`
   - `<custom_tok>`
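One way to check what the default expression does and does not match is to run it against a few sample filenames. This is a small standalone check using Python's `re` module. The helper name `matched` and the sample filenames are made up for this illustration.

```python
import re

ASSET_REGEX = re.compile(r"(_|\.|-)\d+(_|\.|-)|<\w+>")


def matched(text):
    """Return every substring of text that the default expression matches."""
    return [m.group(0) for m in ASSET_REGEX.finditer(text)]


# Digits with a delimiter on both sides match.
print(matched("armour.0022.jpg"))         # ['.0022.']
print(matched("cache-010_v.bgeo"))        # ['-010_']
# A directory named 2 is bounded by slashes, so it is left alone.
print(matched("/tex/2/armour.jpg"))       # []
# Tokens in angle brackets match, whatever their case.
print(matched("wood.<UDIM>.<uVal>.exr"))  # ['<UDIM>', '<uVal>']
```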
When the example pattern is globbed against the filesystem, it will find the following files or similar:
/share/projects/troy/houdini/tex/2/armour.1001_01-01.sword.jpg
/share/projects/troy/houdini/tex/2/armour.1001_01-02.sword.jpg
/share/projects/troy/houdini/tex/2/armour.1002_01-01.sword.jpg
/share/projects/troy/houdini/tex/2/armour.1002_01-02.sword.jpg
/share/projects/troy/houdini/tex/2/armour.1001_01-01.shield.jpg
/share/projects/troy/houdini/tex/2/armour.1001_01-02.shield.jpg
/share/projects/troy/houdini/tex/2/armour.1002_01-01.shield.jpg
/share/projects/troy/houdini/tex/2/armour.1002_01-02.shield.jpg
The regular expression mentioned above is stored on a parameter that is exposed to the user: `asset_regex`. It is locked by default, but a TD may choose to modify it if the default doesn't find everything, or finds too much.
The best way to check the results of the asset scan is to look in the preview panel and click the do_asset_scan button. Then scroll down to the `upload_files` section to see the results.
Exclusions
Since the asset scan operation is based on Houdini's `fileReferences()` method, it can sometimes find files that are not needed for the submission. To ignore those files, use the Asset scan excludes parameter to enter a list of exclusion patterns. These are Unix-style wildcard patterns and should be separated by commas.
| Pattern | Meaning |
|---|---|
| `*` | matches everything |
| `?` | matches any single character |
| `[seq]` | matches any character in seq |
| `[!seq]` | matches any character not in seq |
For example, to exclude files ending with `sword.jpg` and files in any folder called `backups`, use the following pattern:
*sword.jpg, */backups/*
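The wildcard rules in the table are the same ones implemented by Python's `fnmatch` module, so the effect of an exclusion list can be previewed with a few lines of Python. The sketch below is an assumption about how such filtering could work, not the plugin's code. `apply_excludes` is a hypothetical name, and the `backups` file path is invented for the example.

```python
from fnmatch import fnmatchcase


def apply_excludes(paths, excludes):
    """Drop every path that matches any of the comma-separated patterns."""
    patterns = [p.strip() for p in excludes.split(",") if p.strip()]
    return [
        path for path in paths
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


files = [
    "/share/projects/troy/houdini/tex/2/armour.1001_01-01.sword.jpg",
    "/share/projects/troy/houdini/tex/2/armour.1001_01-01.shield.jpg",
    "/share/projects/troy/houdini/backups/scene_v001.hip",
]
kept = apply_excludes(files, "*sword.jpg, */backups/*")
print(kept)  # only the shield texture remains
```

Note that in these patterns `*` also matches path separators, which is why `*/backups/*` catches a `backups` folder at any depth.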