Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #249 - Add strip to get() and getall() #260

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 12 additions & 6 deletions parsel/selector.py
Original file line number Diff line number Diff line change
Expand Up @@ -200,30 +200,36 @@ def re_first(
return el
return default

def getall(self) -> List[str]:
def getall(self, *, strip: bool = False) -> List[str]:
"""
Call the ``.get()`` method for each element is this list and return
their results flattened, as a list of strings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new option needs to be mentioned in the docstring.

"""
return [x.get() for x in self]
data = [x.get() for x in self]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just pass strip to get() instead of essentially reimplementing the new code from get() here?
(granted, the code is almost trivial, but still)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x is a Selector, at that point neither default nor strip exist anymore
Only SelectorList recognizes those two fields

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About adding the strip argument to the other overloads, wouldn't that remove the purpose of the overload on those?
Wouldn't having an overloaded default + strip be just going back to the non-overloaded method signature?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x is a Selector, at that point neither default nor strip exist anymore

Aren’t you also adding strip to Selector? Can’t you use data = [x.get(strip=True) for x in self]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That ended up looking harder than it should when I tried.
Also as far as I could see, default also is not on the Selector, so for standardization, I believe adding it would also be required. And in that case I don't understand why it wasn't added up until now, it gives me the feeling it is not a trivial task given the current implementation of get on Selector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are changing Selector.get() to have strip and it already has default so I think there shouldn't be any problems with using it, and it indeed doesn't look too complicated to me :)

Also if you have issues with typing I can hep, I'm working on improving parsel typing right now.

this_doesnt_work = sel.xpath("//ul/li[position()>1]")[0].get(default="6") # Selector doesnt have default field

sel.xpath(...).get(default=...) works in normal parsel without your changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About adding the strip argument to the other overloads, wouldn't that remove the purpose of the overload on those?

No, the purpose of those overloads is to show that the type of the return value depends on whether default is None or not.

Wouldn't having an overloaded default + strip be just going back to the non-overloaded method signature?

I believe just adding strip to the existing overloads is enough, no new overloads are needed.

Copy link
Member Author

@felipeboffnunes felipeboffnunes Oct 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sel.xpath(...).get(default=...) uses SelectorList, not Selector. The problem is that by indexing [0] then you go to a Selector, and it does not have default .
but either way I still have to give it a look

Also, it seems a little unintuitive to be using the [0] when doing a get operation, as you are already asking the SelectorList to get the first instance it found. I used it on my tests because I saw some of the tests use this kind of indexing to prove Selector behaves correctly, and in this case, I believe you should be able to do both default and strip directly on a Selector, but you can't for a Selector at the moment so maybe this needed further discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, you are right. Selector.get() is much simpler than SelectorList.get(). Well, I think it makes sense and is easy to add strip= to Selector.get().

it seems a little unintuitive to be using the [0] when doing a get operation, as you are already asking the SelectorList to get the first instance it found

Yeah, it's probably unusual to write <SelectorList>[0].get(). It's OK to have a Selector and call a get) on it though.

I believe you should be able to do both default and strip directly on a Selector, but you can't for a Selector at the moment so maybe this needed further discussion.

Not so sure about default=, but maybe there are use cases where it's useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep this as is, strip for Selector.get() is less useful and can be implemented separately.

if strip:
return [x.strip() if x else x for x in data]
return data

extract = getall

@typing.overload
def get(self, default: None = None) -> Optional[str]:
def get(self, default: None = None, strip: bool = ...) -> Optional[str]:
pass

@typing.overload
def get(self, default: str) -> str:
def get(self, default: str, strip: bool = ...) -> str:
pass

def get(self, default: Optional[str] = None) -> Optional[str]:
def get(
self, default: Optional[str] = None, strip: bool = False
) -> Optional[str]:
"""
Return the result of ``.get()`` for the first element in this list.
If the list is empty, return the default value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new option needs to be mentioned in the docstring

"""
for x in self:
return x.get()
value = x.get()
return value.strip() if strip and value else value
return default

extract_first = get
Expand Down
47 changes: 47 additions & 0 deletions tests/test_selector.py
Original file line number Diff line number Diff line change
Expand Up @@ -323,6 +323,53 @@ def test_selectorlist_get_alias(self) -> None:
self.assertEqual(sel.xpath("//ul/li").get(), '<li id="1">1</li>')
self.assertEqual(sel.xpath("//ul/li/text()").get(), "1")

def test_selector_get_strip(self) -> None:
body = '<ul><li id="1">1</li><li id="2"> 2 </li><li id="3">3</li></ul>'
sel = self.sscls(text=body)

self.assertEqual(
sel.xpath("//ul/li[position()>1]").get(), '<li id="2"> 2 </li>'
)
self.assertEqual(
sel.xpath("//ul/li[position()>1]").get(strip=True),
'<li id="2"> 2 </li>',
)
self.assertEqual(
sel.xpath("//ul/li[position()>1]/text()").get(), " 2 "
)
self.assertEqual(
sel.xpath("//ul/li[position()>1]/text()").get(strip=True), "2"
)

def test_selector_getall_strip(self) -> None:
body = (
'<ul><li id="1">1</li><li id="2"> 2 </li><li id="3"> 3</li></ul>'
)
sel = self.sscls(text=body)

self.assertEqual(
sel.xpath("//ul/li").getall(),
[
'<li id="1">1</li>',
'<li id="2"> 2 </li>',
'<li id="3"> 3</li>',
],
)
self.assertEqual(
sel.xpath("//ul/li").getall(strip=True),
[
'<li id="1">1</li>',
'<li id="2"> 2 </li>',
'<li id="3"> 3</li>',
],
)
self.assertEqual(
sel.xpath("//ul/li/text()").getall(), ["1", " 2 ", " 3"]
)
self.assertEqual(
sel.xpath("//ul/li/text()").getall(strip=True), ["1", "2", "3"]
)

def test_re_first(self) -> None:
"""Test if re_first() returns first matched element"""
body = '<ul><li id="1">1</li><li id="2">2</li></ul>'
Expand Down