Add type annotations #118
- Add a typing configuration file with pyright set to strict. - Add annotations for normalizers.py and validators.py.
for more information, see https://pre-commit.ci
…ll as a py.typed file. - The py.typed file is very early, but it's necessary to verify type completeness via `pyright --verifytypes`. Seems like the right kind of typing check, but that's assuming pyright and not mypy is ultimately used. - Reference: https://microsoft.github.io/pyright/#/typed-libraries?id=verifying-type-completeness - Add typing testenv to tox.ini, which will run `pyright --verifytypes rfc3986`. - Add one more matrix slot to the GitHub workflow to run the above typing check in CI on the lowest supported version of python on Ubuntu. - I only added it for Ubuntu because this package and its dependencies are pure python, so the types shouldn't change between operating systems.
…s/rfc3986 into feature/just-annotations
- TODO: Consider changing the insides of the URIBuilder.add_* methods to use `type(self)(...)` instead of `URIBuilder(...)`. That way, subclassing is supported. Would result in the return annotations being Self as well. - Added trailing commas to function parameters in normalizers.py and validators.py for functions with multi-line signatures.
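The `type(self)(...)` idea in the TODO above can be sketched as follows. This is only an illustration under assumptions: `SketchBuilder` and `LoggingBuilder` are hypothetical stand-ins, not the library's `URIBuilder`, and a bound `TypeVar` stands in for `Self` so the sketch also runs on older Python versions.

```python
from __future__ import annotations

import typing as t

# Bound TypeVar used in place of typing.Self (assumption: pre-3.11 support).
_B = t.TypeVar("_B", bound="SketchBuilder")


class SketchBuilder:
    """Hypothetical stand-in for a builder; only two fields are modelled."""

    def __init__(
        self,
        scheme: t.Optional[str] = None,
        host: t.Optional[str] = None,
    ) -> None:
        self.scheme = scheme
        self.host = host

    def add_scheme(self: _B, scheme: str) -> _B:
        # type(self) preserves the subclass; a hard-coded SketchBuilder(...)
        # would silently downgrade subclasses to the base class.
        return type(self)(scheme=scheme.lower(), host=self.host)

    def add_host(self: _B, host: str) -> _B:
        return type(self)(scheme=self.scheme, host=host.lower())


class LoggingBuilder(SketchBuilder):
    """A subclass that should survive chained add_* calls."""


built = LoggingBuilder().add_scheme("HTTPS").add_host("Example.COM")
```

One caveat with this approach: it assumes every subclass keeps a compatible `__init__` signature, which would be worth confirming before changing the real methods.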
- I'm not sure if these functions are truly necessary if the library no longer supports Python 2, but the use of it with `encoding` is prevalent enough around the library that it seems worth typing. - The overloads are necessary to account for None being a passthrough value. - Two cases of this are `ParseResult.copy_with()` and `ParseResultBytes.copy_with()`. The `attrs_dict` dictionary in those methods is allowed to have None, and None is allowed for all of the component parameters (from what I can tell). Thus, whether intentional or not, `compat.to_(bytes|str)()`'s ability to let values pass through without modification, so long as they aren't bytes, is depended upon.
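A minimal sketch of what such overloads might look like, assuming the compat helpers take a value plus an `encoding` argument. The bodies here are illustrative and not copied from the library.

```python
import typing as t


@t.overload
def to_str(b: t.Union[str, bytes], encoding: str = "utf-8") -> str: ...
@t.overload
def to_str(b: None, encoding: str = "utf-8") -> None: ...
def to_str(
    b: t.Optional[t.Union[str, bytes]],
    encoding: str = "utf-8",
) -> t.Optional[str]:
    """Decode bytes to str; let str and None pass through unchanged."""
    if isinstance(b, bytes):
        return b.decode(encoding)
    return b


@t.overload
def to_bytes(s: t.Union[str, bytes], encoding: str = "utf-8") -> bytes: ...
@t.overload
def to_bytes(s: None, encoding: str = "utf-8") -> None: ...
def to_bytes(
    s: t.Optional[t.Union[str, bytes]],
    encoding: str = "utf-8",
) -> t.Optional[bytes]:
    """Encode str to bytes; let bytes and None pass through unchanged."""
    if isinstance(s, str):
        return s.encode(encoding)
    return s
```

With the two overloads, a type checker can tell that `to_str(b"abc")` is a `str` while `to_str(None)` stays `None`, instead of widening every call site to an optional type.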
- These are passed into `urllib.parse.urlencode()`, so the annotation for that was copied from typeshed and modified slightly. - `_QueryType` in typeshed uses `typing.Sequence` to account for different types of sequence values being passed in, but `URIBuilder.extend_query_with()` in builder.py uses `isinstance(query_items, list)` as flow control, so Sequence is too wide to account for what that method allows. Thus, the type was modified to use `typing.List` in place of `typing.Sequence` where necessary. - Arguably, that isinstance check should be changed to check against `Sequence` instead, but I'd prefer having that double-checked, and this PR's scope is currently mostly limited to annotations anyway. This can be revisited later if necessary. - TODO: Ask if `username is None` check is still needed in `URIBuilder.add_credentials()` if the `username` parameter is annotated with `str`.
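To illustrate why `List` was chosen over `Sequence`, here is a rough sketch. The alias `_QueryItems` and the function `encode_query` are hypothetical names, and the alias is a simplified version of the typeshed one rather than the type actually added in this PR.

```python
import typing as t
from urllib.parse import urlencode

# Hypothetical, simplified alias: a mapping or a *list* of pairs.
# typeshed's version allows any Sequence of pairs instead of List.
_QueryItems = t.Union[
    t.Mapping[str, str],
    t.List[t.Tuple[str, str]],
]


def encode_query(query_items: _QueryItems) -> str:
    # Mirrors the isinstance(..., list) flow control described above:
    # anything that is not a list is treated as a mapping.
    if isinstance(query_items, list):
        pairs = query_items
    else:
        pairs = list(query_items.items())
    return urlencode(pairs)
```

In this sketch a tuple of pairs would fall into the mapping branch and fail on `.items()`, which is the kind of input the narrower `List` annotation is meant to rule out at type-check time.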
- Ignore need for docstrings in function overloads. - Ignore a line being too long due to a pyright: ignore + explanation. - Moving the explanation away from that line doesn't make the line short enough to pass. Removing the error code as well would be enough, but removing both has 2 downsides: a) The lack of explanation on the same line makes it harder to immediately see why the pyright: ignore was added when grepping. b) Having an unrestricted pyright: ignore (or type: ignore) can cover up other typechecking errors, which could cause a problem down the line. - Having a pyright: ignore, a noqa, and fmt: off/on for this one line isn't clean, but I'd like a second opinion on how to handle it.
Something I'm unsure about: the use of If the public API is meant to take only utf-8-encoded strings, then the compat functions can be eliminated, the encoding parameters everywhere can be made superfluous, and many things can be annotated as |
- `uri` can't be eagerly imported within `misc` without causing a circular import, but is necessary for getting the most currently correct parameter type.
Something to consider: the stdlib counterparts to Some ideas for alternatives:
1 feels like a way to avoid using a generic (due to runtime limitations) in a place where it makes sense to use it, and makes the classes harder to maintain and possibly extend as a result. 2 would be most faithful to the inheritance hierarchy and implementation of the stdlib counterparts, I think. 3 would be a significant change in API, assuming the namedtuple interface and inheritance was considered a guaranteed part of the API. Imo, 2 is the least breaking but also the least forward-moving, if that makes sense. Not my choice, ultimately, but I figured I'd lay out what options I could think of. EDIT: These circumstances somewhat apply to
…ified from typeshed.
Followup regarding my ideas for Same sentiment applies to some degree for the compat functions and "str vs. Union[bytes, str]" questions. (I'll note that discussion in Discord indicated that I'd misunderstood how str and bytes work in Python, and now I'm thinking the compat functions are necessary, the encoding parameters are necessary, and the main API functions should be documented as accepting bytes and str). Didn't mean to pile everything on in here at once. Hoping this reduces the pressure of the questions & comments I've thrown out so far.
So yes, this project started in 2014 when the deprecation of 2.7 was far, far away. I think that there's benefit to accepting both as someone may be dealing with raw bytes from a socket and not want to deal with round-tripping or knowing the encoding the other end intended (as enough specifications and implementations predate good utf8 support). The doc strings should be clarified and the signatures should be clear about this too.
I think 2 is likely better until we can match the typeshed behavior "inline" by dropping everything before 3.11. I know it has limitations but I'd rather keep the code as consistent as possible first then iterate on it to get to a better typing place if necessary.
… well. - This provides parity with the way urllib.parse is typed and the runtime implementation, since bytearray isn't a str and also has a `decode(encoding)` method. - Adjusted compat functions to also accept bytearray for the same reasons. - Adjusted the parameter types of the other functions called in api.py that are part of the `(U|I)RIReference` interfaces to make sure api.py fully type-checks. - TODO: Consider making a type alias for `t.Union[str, bytes, bytearray]` and using that everywhere to avoid verbosity? - TODO: For the **kwargs functions in api.py and `URIMixin.is_valid()`, consider enumerating the possible keyword-only parameters?
- Substituted namedtuple inheritance with common base `typing.NamedTuple` subclass in misc.py, since these classes share almost the exact same interface. - Added a _typing_compat.py module to be able to import typing.Self, or a placeholder for it, in multiple other modules without bloating their code. - Added basic method annotations to the two reference classes. - Not annotations-related: - Move the __hash__ implementation over to IRIReference from URIMixin to be congruent with URIReference. - Made the __eq__ implementations more similar to avoid different behavior in cases of inheritance (rare as that might be). - Added overloads to `normalizers.normalize_query` and `normalizers.normalize_fragment` to clearly indicate that None will get passed through. This behavior is relied upon by the library currently. - Note: The runtime-related changes can be reverted and reattempted later if need be. Still passing all the tests currently.
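The `_typing_compat.py` idea can be sketched roughly like this. It is a guess at the general shape, not the module's contents: the fallback to `t.Any` and the `Reference` class are assumptions made for illustration.

```python
import sys
import typing as t

if sys.version_info >= (3, 11):
    from typing import Self
elif t.TYPE_CHECKING:
    # Seen by type checkers only; typing_extensions is not needed at runtime.
    from typing_extensions import Self
else:
    # Runtime placeholder so `Self` stays importable on older Pythons.
    Self = t.Any


class Reference(t.NamedTuple):
    """Hypothetical NamedTuple base sharing fields between reference classes."""

    scheme: t.Optional[str]
    host: t.Optional[str]

    def with_scheme(self, scheme: str) -> Self:
        # _replace returns an instance of the same (sub)class.
        return self._replace(scheme=scheme)


updated = Reference("http", "example.com").with_scheme("https")
```

Keeping the version check in one small module means the other modules only need a single import line for `Self`.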
Ah, CI is hitting the conflict between |
…er and close enough to urllib.parse's interfaces. - bytearray would end up pervading everything, I think, to the point where it's not worth it. I was hasty in adding those initially.
- After review, we don't actually need a generic namedtuple to make this work in a type stub. Inline annotations seemingly work fine so far and don't significantly change the runtime. This might be option 4, which can hold until 3.11 is the lowest supported version. - However, in the meantime, `ParseResultMixin` can be made generic in a helpful way with `typing.AnyStr`. - `port` is a strange case and might be annotated too narrowly or incorrectly; based on some of the ways that it's populated, it might sometimes be an int. - Prefixed `_typing_compat.Self` imports with underscore to avoid polluting namespace with public variables.
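As a rough illustration of a mixin made generic over `typing.AnyStr`, consider the sketch below. The class names, attributes, and method are hypothetical and much smaller than `ParseResultMixin`.

```python
import typing as t


class PartsMixin(t.Generic[t.AnyStr]):
    """Hypothetical mixin shared by a str-based and a bytes-based result."""

    host: t.Optional[t.AnyStr]
    port: t.Optional[int]

    def host_or(self, default: t.AnyStr) -> t.AnyStr:
        # The default must be the same string type as the host.
        return default if self.host is None else self.host


class StrParts(PartsMixin[str]):
    def __init__(self, host: t.Optional[str], port: t.Optional[int] = None) -> None:
        self.host = host
        self.port = port


class BytesParts(PartsMixin[bytes]):
    def __init__(self, host: t.Optional[bytes], port: t.Optional[int] = None) -> None:
        self.host = host
        self.port = port
```

With this shape, a type checker would flag `StrParts("example.com").host_or(b"localhost")` as mixing `str` and `bytes`, while the shared logic is written only once.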
I'm going to try option 4 real quick, which is throwing class-level annotations in there to see if it's enough (see changes to parseresult.py in my most recent PR at this point). I might've overestimated the scale of the problem if this works, but we'll see. Figured it's worth a shot.
- Made int | str order consistent. - The ParseResultMixin shim properties are now marked with t.Optional, since what they wrap can be None.
@@ -19,6 +19,10 @@ jobs:
          - os: windows-latest
            python: '3.12'
            toxenv: py
          # typing
          - os: ubuntu-latest
            python: '3.8'
I'm still bitter there's no way to tell GHA that "Hey, I have this list elsewhere of things I want to run against, can you just pick the 'earliest'/'oldest'/'smallest' so that when I update the list I don't have to update every goddamn reference to remove the oldest?"
I'm a novice at best when it comes to GHA and just went off of what the rest of the workflow looked like, so I have no idea if the technology exists, lol. It is a bit annoying, but I figured if a better way is found and it does get refactored, this would temporarily showcase what the output would look like in CI for `pyright --verifytypes` and either help or hurt the case for switching to and/or including mypy :D.
Oh, I'm just griping aloud, not hoping you would magically fix GitHub's product. 😂
Let me know when you think this is ready for merge/final review
It feels close as a first step. One thing I wanted to check, though: is it okay if I haven't adjusted any of the documentation to account for these annotations yet (e.g. the types in the docstrings)? I could try to put such changes in this one or defer it to another PR.
…ment without warnings from a type checker. - Also add a noqa to _mixin.URIMixin.resolve_with, since the extra `if TYPE_CHECKING` statement pushed it over the complexity limit. Co-authored-by: Ian Stapleton Cordasco <[email protected]>
Other than some minor nits, this seems ready for final review. I think I've mostly managed to avoid functional changes, so it shouldn't break anyone in theory.
…omponent_is_valid` to avoid testing via `int(...)`. - Also makes the linters and type-checker happier.
I think https://github.com/sigmavirus24/github3.py/blob/a66800d1ba5e9bc4fee026c94404aeb82b6c0b6d/pyproject.toml#L97-L100 is the example config that mostly gets us to the happy medium between reorder-python-imports, black, and the flake8-import-order style I prefer/enforce here |
I'll try adjusting the tooling config to switch to that, then.
…ong with some other minor changes. - Update .pre-commit-config.yaml and tox.ini in accordance to the above. - Run `pre-commit autoupdate` while I'm at it. Can be reverted if need be. - Somewhat centralize black and isort config in pyproject.toml to avoid having to keep it in sync in multiple places.
Regarding the other failing CI checks: Would you be receptive to enhancing the current coverage config by replacing it with, say, covdefaults? I might be preaching to the choir here, but it would make hitting 100% coverage easier in future commits, imo:
Just an idea that came to mind while looking at the GH actions results.
I'm not opposed to covdefaults in a future change, but for now things have been working fine as they are, so this seems more like a signal of missing test coverage, though I haven't looked deeply at this again. My availability is spotty for several weeks, so this either needs to pass with very minimal additional changes, or we need to look at covdefaults in a separate PR that lands before this one and then rebase this after that merges.
…G` blocks and b) lines that are only ellipses, as well as some temporary pragma comments in _typing_compat.py. This seems to account for most of the missing coverage with the current configuration, excluding line 447 in validators.py.
…s/rfc3986 into feature/just-annotations
…comment for justification.
Okay, those should take care of the coverage metrics without any egregious changes. cc @sigmavirus24.
Ah, right, pyright doesn't support less than 100% coverage. We could easily create a small wrapper script that reads the percentage from the output of Alternatively, it could be ignored for now. Not my best work, but something like this could be thrown in a

"""This script is a shim around `pyright --verifytypes` to determine if the
current typing coverage meets the expected coverage. The previous command by
itself won't suffice, since its expected coverage can't be modified from 100%.
Useful while still adding annotations to the library.
"""

import argparse
import json
import subprocess
from decimal import Decimal

PYRIGHT_CMD = ("pyright", "--verifytypes", "rfc3986", "--outputjson")


def validate_coverage(inp: str) -> Decimal:
    """Ensure the given coverage score is between 0 and 100 (inclusive)."""
    coverage = Decimal(inp)
    if not (0 <= coverage <= 100):
        raise ValueError
    return coverage


def main() -> int:
    """Determine if rfc3986's typing coverage meets our expected coverage."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--fail-under",
        default=Decimal("75"),
        type=validate_coverage,
        help="The target typing coverage to not fall below (default: 75).",
    )
    parser.add_argument(
        "--quiet",
        action="store_true",
        help="Whether to hide the full output from `pyright --verifytypes`.",
    )

    args = parser.parse_args()

    expected_coverage: Decimal = args.fail_under / 100
    quiet: bool = args.quiet

    try:
        output = subprocess.check_output(
            PYRIGHT_CMD,
            stderr=subprocess.STDOUT,
            text=True,
        )
    except subprocess.CalledProcessError as exc:
        output = exc.output

    verifytypes_output = json.loads(output)
    raw_score = verifytypes_output["typeCompleteness"]["completenessScore"]
    actual_coverage = Decimal(raw_score)

    if not quiet:
        # Switch back to normal output instead of json, for readability.
        subprocess.run(PYRIGHT_CMD[:-1])

    if actual_coverage >= expected_coverage:
        print(
            f"OK - Required typing coverage of {expected_coverage:.2%} "
            f"reached. Total typing coverage: {actual_coverage:.2%}."
        )
        return 0
    else:
        print(
            f"FAIL - Required typing coverage of {expected_coverage:.2%} not "
            f"reached. Total typing coverage: {actual_coverage:.2%}."
        )
        return 1


if __name__ == "__main__":
    raise SystemExit(main())

EDIT: Prior art in trio, though their script is more complicated and searches the pyright output for other data.
EDIT 2: Edited script draft to hopefully be clearer.
@Sachaa-Thanasius let's do that here and get CI green
I just threw the script in the tests directory for now. Easy to change based on need or preference.
Thanks so much for working on this!
No problem. Thanks for all the feedback!
First off, apologies for the last PR. It was unreviewable, not productive to have as an open PR, and overall bad form. I hope I can show I've learned from my mistakes and thus that this one is a far sight better.
The scope of this PR is just adding a bit of typing configuration, *some typing checks, and some type annotations to make things more reasonable to review and manageable to iterate on. More than willing to take feedback on how to go about this, but I've started small (at least, I hope it's small). *Any guidance would be appreciated.
(I used #105 and some of the comments made in there to inform some starting choices. For example, `import typing as t` was suggested as a way to keep annotation verbosity low (or so I'm guessing) while maintaining namespace separation (which was actually said), so I used that here.)

EDIT: Voids #115, closes #119.