Commit c6bafcdc authored by Imdad Sardharwalla, committed by GitHub Enterprise

REFORM-1005 Update scan.py for integration into ReForm Preprocess (#29)

* Update scan.py to be more flexible with output files

  Previously, scan.py insisted that tests were separated into "C++" and "other"
  tests. While fine for ReForm, this does not fit the gtest model we want to use
  in ReForm Preprocess. scan.py can now be configured to take any number of
  categories of tests, with no restriction on extensions.

* Add ability for scan.py to use a format

  Rather than taking the argument 'command' and prescribing the format

    command(<rel_path>)

  scan.py now takes a 'format' argument, in which instances of {relative_path}
  and {safe_relative_path} are respectively replaced with the relative path and
  a 'safe' version of the relative path (i.e. non-alphanumeric characters
  replaced by _). The above format would be written

    command({relative_path})

  This allows for much greater flexibility in the generated files.
parent 0ecc9eeb
......@@ -639,52 +639,89 @@ Preprocess.
## Scanning for tests
🔴 NOTE: This section will be added to and improved as part of <a
href="https://jira.autodesk.com/browse/REFORM-1005"
target="_blank">REFORM-1005</a>.
Some test frameworks do not automatically compile a list of tests. In this case,
Base/Test/tools/scan.py may be of use. It provides a Python script that exposes
the function `scan()` with the following signature:
```python
def scan(_input_dir, _output_dir,
_cpp_exts, _cpp_cmd, _cpp_out_filename,
_other_exts, _other_cmd, _other_out_filename)
def scan(_input_dir, _output_dir, _params_list, _sort_by_size=True)
```
This function scans an input directory (`_input_dir`) for test files and writes
their paths into `_cpp_out_filename` and `_other_out_filename` in the output
directory (`_output_dir`). These output files are designed to be consumed by
CMake and contain lines of the form:
This function recursively scans an input directory (`_input_dir`) for test files
and writes their relative paths in a prescribed format into files in the output
directory (`_output_dir`).
[comment]: # (Use Java highlighting to simulate CMake highlighting for this block of code.)
```java
cmd(relative_test_path)
```
Individual types of tests are defined by the `_params_list` argument, which must
be of the form:
where `cmd` is either `_cpp_cmd` or `_other_cmd`.
Test files fall into two categories:
```python
[
{
"name": "NAME",
"exts": {"EXT1", "EXT2", ...},
"frmt": "FORMAT",
"file": "FILE"
},
...
]
```
* C++ tests: those with a C++ extension (in the set `_cpp_exts`); and
* Other tests: those with an allowed extension (in the set `_other_exts`)
* `NAME` is the name of the test type, used to display statistics about the
tests found;
* `EXT1`, `EXT2`, ... are allowed file extensions for the current test type;
* `FORMAT` describes the format of the line that is output for each test file:
- `{relative_path}` is replaced by the path of the test file relative to
`_input_dir`,
- `{safe_relative_path}` is replaced by a 'safe' version of `{relative_path}`
(all non-alphanumeric characters replaced by an underscore);
* `FILE` is the path of the output file (relative to `_output_dir`).
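As a rough illustration of the placeholder substitution described above, the following sketch expands a format string for a single test file. The helper names `safe` and `expand`, the example path, and the `add_test(...)` format string are hypothetical and chosen only for this example; they are not part of scan.py's interface.

```python
def safe(text):
    # Keep alphanumeric characters and underscores; replace anything else
    # with an underscore.
    return "".join(c if c.isalnum() or c == "_" else "_" for c in text)


def expand(frmt, relative_path):
    # Normalise Windows separators so the output is identical on every OS.
    rel = relative_path.replace("\\", "/")
    return (frmt.replace("{relative_path}", rel)
                .replace("{safe_relative_path}", safe(rel)))


# Hypothetical format string that uses both placeholders.
line = expand(
    "add_test(NAME {safe_relative_path} COMMAND run {relative_path})",
    "geometry/mesh-io.cpp",
)
print(line)
# add_test(NAME geometry_mesh_io_cpp COMMAND run geometry/mesh-io.cpp)
```

The 'safe' form is useful wherever the path has to serve as an identifier, such as a test name, that cannot contain separators or dots.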
Any test "[name].[ext]" with an associated .ignore file "[name].[ext].ignore"
will be ignored by the scan.
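The effect of an .ignore file can be sketched with a small, self-contained walk over a temporary directory. The function name `find_tests` and the file names are assumptions made for this example; the sketch mirrors the described behaviour rather than reproducing scan.py.

```python
import os
import tempfile


def find_tests(input_dir, exts):
    found, ignored = set(), set()
    for dirpath, _, filenames in os.walk(input_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".ignore"):
                # Record the test path with the ".ignore" suffix removed.
                ignored.add(path[:-len(".ignore")])
            elif os.path.splitext(name)[1] in exts:
                found.add(path)
    # Drop every test that has a matching .ignore file.
    return found - ignored


with tempfile.TemporaryDirectory() as root:
    for name in ("a.cpp", "b.cpp", "b.cpp.ignore"):
        open(os.path.join(root, name), "w").close()
    kept = sorted(os.path.basename(p) for p in find_tests(root, {".cpp"}))

print(kept)  # ['a.cpp']
```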
If `_sort_by_size == True` (default, slower), the tests will be sorted in
descending order by size before being output. If `False`, they will be sorted
alphabetically (faster).
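The two orderings can be compared with a short sketch. The helper name `order_tests` and the file sizes are hypothetical. Sorting by size needs one `os.stat()` call per file, which is why it is the slower option; ties in size are broken by path, also in descending order.

```python
import os
import tempfile


def order_tests(paths, sort_by_size=True):
    if sort_by_size:
        # Largest files first; the path acts as a tie-breaker.
        return sorted(paths, key=lambda p: (os.stat(p).st_size, p), reverse=True)
    # Plain alphabetical order, no file system access required.
    return sorted(paths)


with tempfile.TemporaryDirectory() as root:
    sizes = {"a.cpp": 10, "b.cpp": 300, "c.cpp": 20}
    paths = set()
    for name, size in sizes.items():
        path = os.path.join(root, name)
        with open(path, "w") as f:
            f.write("x" * size)
        paths.add(path)
    by_size = [os.path.basename(p) for p in order_tests(paths)]
    by_name = [os.path.basename(p) for p in order_tests(paths, sort_by_size=False)]

print(by_size)  # ['b.cpp', 'c.cpp', 'a.cpp']
print(by_name)  # ['a.cpp', 'b.cpp', 'c.cpp']
```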
<details>
<summary>Why are tests sorted?</summary><br>
The list of tests generated by scan.py is compared against the output file
(`_output_dir/FILE`) already stored on disk. This file is only updated if the
content differs. Sorting the test lists makes comparing the content very
straightforward.
The reasoning behind the option to sort by file size is that tests with
larger file sizes are likely to take longer, and therefore should be run
first (in a system that runs multiple tests in parallel) to improve overall
test time. For test frameworks that run the tests in the listed order, it
therefore makes sense to sort in descending order by file size.
</details>
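The "only update the file if the content differs" behaviour described above can be sketched as follows. The helper name `write_if_changed` and its boolean return value are assumptions made for this example; scan.py itself prints a message instead of returning a value.

```python
import os
import tempfile


def write_if_changed(path, content):
    # Read the existing file, if there is one.
    try:
        with open(path, "r") as f:
            existing = f.read()
    except FileNotFoundError:
        existing = None
    # Leave the file (and its timestamp) untouched when nothing changed,
    # so that a build system has no reason to reconfigure.
    if existing == content:
        return False
    with open(path, "w") as f:
        f.write(content)
    return True


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "cpp_tests_list.txt")
    results = [
        write_if_changed(target, "add_cpp_test(a.cpp)"),
        write_if_changed(target, "add_cpp_test(a.cpp)"),
        write_if_changed(target, "add_cpp_test(a.cpp)\nadd_cpp_test(b.cpp)"),
    ]

print(results)  # [True, False, True]
```

Because the list is sorted deterministically before it is written, an unchanged set of tests always produces identical content, and the comparison is a simple string equality check.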
### Example usage
scan.py could be used to write the file "cpp_tests_list.txt" with lines of the
form:
If `_params_list` is set to
```python
[{
"name": "C++",
"exts": {".cpp"},
"frmt": "add_cpp_test({relative_path})",
"file": "cpp_tests_list.txt"
}]
```
scan.py will search for files in `_input_dir` with the extension `.cpp` and write
the file "cpp_tests_list.txt" in `_output_dir` with lines of the form:
[comment]: # (Use Java highlighting to simulate CMake highlighting for this block of code.)
```java
add_cpp_test([test_path])
```
The CMake file that builds the test framework could use the following code to
include all of the tests:
This could then be included by a CMake file that builds the test framework using
the following code to include all of the tests:
[comment]: # (Use Java highlighting to simulate CMake highlighting for this block of code.)
```java
......
# (C) Copyright 2021 by Autodesk, Inc.
import os
import sys
from pathlib import Path
......@@ -9,32 +8,65 @@ def get_file_extension(_filename):
return os.path.splitext(_filename)[1]
# Creates a list of tests to be included in CMakeList.txt. If the test file
# already exists, read it in and compare it to the generated test list. Only
# write the generated test list if it differs from the existing list. This
# prevents CMake from reconfiguring if nothing has changed.
def make_test_list(_test_set, _input_dir, _output_dir, _filename, _cmd):
# Sort tests by file size (descending). Tests with larger file sizes are
# likely to take longer, and therefore should be run first to improve
# overall test time. Note: this is only relevant for the first test run, as
# CTest performs its own reordering of the tests for subsequent runs.
sorted_by_size = sorted(_test_set, key=lambda f:(os.stat(f).st_size, f), reverse=True)
# A character is 'safe' if it is alphanumeric or an underscore
def is_safe_char(_c):
return _c.isalnum() or _c == '_'
# Replace non-alphanumeric characters with an underscore
def get_safe_string(_string):
return ''.join('_' if not is_safe_char(c) else c for c in _string)
# Replace instances of '{relative_path}' and '{safe_relative_path}' as described
# in the comments above scan().
def format_string(_string, _rel_path):
# Replace all instances of '\' with '/' in the path to ensure we have a
# standard path separator for all OSs.
rel_path = _rel_path.replace('\\', '/')
formatted_string = _string.replace('{relative_path}', rel_path)
# First check if {safe_relative_path} is in the string to avoid computing
# get_safe_string() if not necessary.
if '{safe_relative_path}' in formatted_string:
formatted_string = formatted_string.replace(
'{safe_relative_path}', get_safe_string(rel_path))
return formatted_string
'''
Creates a list of tests in the format described by _format.
If the output file already exists, read it in and compare it to the newly
generated list. Only write the formatted list if it differs from the existing data.
This prevents build systems from reconfiguring if nothing has changed.
'''
def make_test_list(_test_set, _input_dir, _output_dir, _filename, _format, _sort_by_size):
# The test list *must* be sorted in order to reliably compare against the
# list already stored on disk.
#
# Tests may be sorted either in descending order by file size, or
# alphabetically. The reasoning behind the option to sort by file size is
# that tests with larger file sizes are likely to take longer, and therefore
# should be run first (in a system that runs multiple tests in parallel) to
# improve overall test time.
if _sort_by_size:
sorted_test_list = sorted(_test_set, key=lambda f: (os.stat(f).st_size, f), reverse=True)
else:
sorted_test_list = sorted(_test_set)
# Set output path
output_path = os.path.join(_output_dir, _filename)
# Generate a new test list with each line of the form:
# <cmd>(<file>)
# where <file> is relative to _input_dir.
# Note: CMake requires '/' as its path separator and thus we replace all
# instances of '\' with '/' in the path.
new_test_list = ''
for file in sorted_by_size:
relative_file = os.path.relpath(file, _input_dir)
new_test_list += _cmd + '(' + relative_file.replace ('\\', '/') + ')\n'
output_path = os.path.abspath(os.path.join(_output_dir, _filename))
# Generate the formatted test list.
formatted_test_list = ''
for test in sorted_test_list:
formatted_test_list += format_string(_format, os.path.relpath(test, _input_dir)) + '\n'
# Remove some issues with the final EOL characters
new_test_list = new_test_list.rstrip()
formatted_test_list = formatted_test_list.rstrip()
# Attempt to read existing file
try:
......@@ -44,31 +76,59 @@ def make_test_list(_test_set, _input_dir, _output_dir, _filename, _cmd):
existing_test_list = ''
# Write new list if it is different to existing list
if (new_test_list == existing_test_list):
if (formatted_test_list == existing_test_list):
print('No changes were made to "' + _filename + '".')
else:
with open(output_path, 'w') as fw:
fw.write(new_test_list)
fw.write(formatted_test_list)
print('"' + _filename + '" has been updated.')
'''
This function scans an input directory (_input_dir) for test files and writes
their paths into _cpp_out_filename and _other_out_filename in the output
directory (_output_dir). These output files are in a format that can be consumed
by CMake/CTest.
Test files fall into two categories:
* C++ tests: those with a C++ extension (in the set _cpp_exts); and
* Other tests: those with an allowed extension (in the set _other_exts)
This function recursively scans an input directory (_input_dir) for test files
and writes their relative paths in a prescribed format into files in the output
directory (_output_dir).
Individual types of tests are defined by the _params_list argument, which must
be of the form:
[
{
"name": "NAME",
"exts": {"EXT1", "EXT2", ...},
"frmt": "FORMAT",
"file": "FILE"
},
...
]
* NAME is the name of the test type, used to display statistics about the tests
found;
* EXT1, EXT2, ... are allowed file extensions for the current test type;
* FORMAT describes the format of the line that is output for each test file:
- {relative_path} is replaced by the path of the test file relative to
_input_dir,
- {safe_relative_path} is replaced by a 'safe' version of {relative_path};
* FILE is the path of the output file (relative to _output_dir).
Any test "<name>.<ext>" with an associated .ignore file "<name>.<ext>.ignore"
will be ignored by the scan.
An example _params_list argument could be:
[{
"name": "C++",
"exts": {".cpp"},
"frmt": "add_cpp_test({relative_path})",
"file": "cpp_tests_list.txt"
}]
If _sort_by_size == True (default, slower), the tests will be sorted in
descending order by size before being output. If False, they will be sorted
alphabetically (faster).
'''
def scan(_input_dir, _output_dir,
_cpp_exts, _cpp_cmd, _cpp_out_filename,
_other_exts, _other_cmd, _other_out_filename):
def scan(_input_dir, _output_dir, _params_list, _sort_by_size=True):
# Check _input_dir is valid
if not Path(_input_dir).is_dir():
raise ValueError('Error: invalid test directory ("' + _input_dir + '").')
......@@ -79,33 +139,35 @@ def scan(_input_dir, _output_dir,
print("Scanning for tests...", end='', flush=True)
ignored_tests = set()
cpp_tests_all = set()
other_tests_all = set()
ignored_tests = set()
tests_list = [set() for _ in range(len(_params_list))]
# Scan all test input directory files and sort them into different sets.
for p, d, f in os.walk(_input_dir):
for p, _, f in os.walk(_input_dir):
for file in f:
if file.endswith('.ignore'):
# Tests to be ignored (remove .ignore suffix before storing)
ignored_tests.add(os.path.join(p, file[:-7]))
elif get_file_extension(file) in _cpp_exts:
# C++ tests
cpp_tests_all.add(os.path.join(p, file))
elif get_file_extension(file) in _other_exts:
# Any other tests
other_tests_all.add(os.path.join(p, file))
else:
# Sort tests (based on extensions) into the relevant sets
for tests, params in zip(tests_list, _params_list):
if get_file_extension(file) in params["exts"]:
tests.add(os.path.join(p, file))
break
print(" ...complete.")
print(" ...complete.\n", flush=True)
# Remove ignored tests from the sets of matching tests
cpp_tests = cpp_tests_all.difference(ignored_tests)
other_tests = other_tests_all.difference(ignored_tests)
tests_list = [tests.difference(ignored_tests) for tests in tests_list]
# Display statistics
print("Number of C++ tests =", len(cpp_tests))
print("Number of other tests =", len(other_tests))
print("Total number of tests =", len(cpp_tests) + len(other_tests))
make_test_list(cpp_tests, _input_dir, _output_dir, _cpp_out_filename, _cpp_cmd)
make_test_list(other_tests, _input_dir, _output_dir, _other_out_filename, _other_cmd)
tests_count = 0
for tests, params in zip(tests_list, _params_list):
print("Number of", params["name"], "tests =", len(tests))
tests_count += len(tests)
print("Total number of tests =", tests_count)
# Write tests lists to the appropriate files if necessary
print()
for tests, params in zip(tests_list, _params_list):
make_test_list(tests, _input_dir, _output_dir, params["file"], params["frmt"], _sort_by_size)