Basic Auth not working #1495

Open
jannisborgers opened this issue Sep 5, 2024 · 5 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@jannisborgers

I found this thread after I tried lychee on several of my sites while deploying them to a staging environment, and I consistently got no results.

Basic auth is a must in these situations, so I was happy that lychee supports it.

As an additional complication, the CMS we're using sends an X-Robots-Tag: none response header in staging environments, because those are deliberately not supposed to be indexed. Does lychee respect that header, or does it ignore it? From the messages in the above thread, I could not tell whether robots.txt is currently ignored or not.

Right now, I get the following response:

🔍 0 Total (in 0s) ✅ 0 OK 🚫 0 Errors

The format I’m using is:

lychee --basic-auth 'user:password https://subdomain.domain.tld' https://subdomain.domain.tld

There are about 500 links on that page. I verified with curl that basic auth works correctly: it returns the HTML response.

@mre
Member

mre commented Sep 5, 2024

lychee ignores all response headers and robots.txt because we're not indexing the page. The problem must be elsewhere.

Can you save the html into a file and use that as the input?

lychee -vvv foo.html

If that works, the website is probably not serving the HTML to lychee. In that case, you can try using curl's user agent:
https://lychee.cli.rs/troubleshooting/network-errors/#try-a-different-user-agent

If that doesn't work, lychee might have trouble parsing the URLs from the document. In that case, could you post an excerpt of the HTML file?
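If lychee does receive the page but still reports 0 links, one quick cross-check is to count the anchors in the saved file independently. Below is a minimal sketch using Python's standard-library HTMLParser, assuming the page was saved as foo.html as in the command above. It only looks at <a href> and is not how lychee itself extracts links, so the names LinkCollector and extract_links are illustrative only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags.

    This is a simplification: lychee may also pick up URLs from other
    elements and attributes, so its count can be higher.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href of every <a> element, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, extract_links(open("foo.html", encoding="utf-8").read()) returns the list of hrefs; a count close to the expected number of links would suggest the HTML itself is fine and the problem lies in how the page is fetched.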

@jannisborgers
Author

Hi @mre, thanks for the quick response!

Locally

I tried it on the index.html file locally, with the exact command you provided, and it yields:

🔍 529 Total (in 1s) ✅ 6 OK 🚫 518 Errors 💤 4 Excluded

The errors come from the static HTML linking to the password-protected URLs, so this is the result I expected from the local copy.

cURL user-agent

I tried using lychee with the cURL user agent as described, but it still yields:

🔍 0 Total (in 0s) ✅ 0 OK 🚫 0 Errors

But using curl like this works and returns the HTML correctly:

curl -u user:password https://subdomain.domain.tld

So the way I see it, cURL itself is working, but lychee isn’t, even when using cURL as a user agent.
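For reference, what both curl -u and a working --basic-auth should end up sending is a single request header, Authorization: Basic <base64 of user:password>, as defined in RFC 7617; curl sends it on the first request instead of waiting for a 401 challenge. A minimal Python sketch of building and decoding that value, which can help check what a server actually received (the function names are illustrative, not part of any tool mentioned here):

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth_header(value: str) -> Tuple[str, str]:
    """Reverse basic_auth_header.

    Splits on the first colon only, because the password itself
    may contain colons (the username may not).
    """
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError(f"not a Basic auth header: {value!r}")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password
```

The RFC's own example credentials, Aladdin and open sesame, encode to Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==.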

Basic auth problem?

I suspect that lychee's basic auth is the source of the problem. I tested other staging sites that use basic auth, and they all came back empty-handed. The output is the same as when I omit --basic-auth on sites with basic auth:

lychee https://protected-subdomain.domain.tld
🔍 0 Total (in 0s) ✅ 0 OK 🚫 0 Errors

Other tools work

I used linkchecker as an alternative, as it also provides basic-auth functionality, and it works correctly on the same URL:

linkchecker -u username -p password https://subdomain.domain.tld

Either I’m using lychee’s --basic-auth flag wrong, or that functionality is not working correctly.

@mre
Member

mre commented Sep 6, 2024

Oh, right, I should have read your initial message more carefully.

Basic auth syntax is actually:
'example.com user:pwd'
Your version is the other way round.
Please try that. If it doesn't work, add the user agent as an additional parameter as well. If that doesn't work, it's a bug.

We should probably add a documentation page or document the syntax here: https://lychee.cli.rs/troubleshooting/network-errors/
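To see why the argument order matters, here is a hypothetical parser for a '<url> <user>:<password>' string. It is an illustration only, not lychee's actual implementation: it splits on the first whitespace and then on the first colon. With a parser of this shape, the reversed string from the original report is accepted without any error but produces nonsense (the "URL" becomes user:password and the username becomes https), so the credentials would never be applied to the intended site.

```python
from typing import NamedTuple


class BasicAuthArg(NamedTuple):
    url: str
    username: str
    password: str


def parse_basic_auth_arg(arg: str) -> BasicAuthArg:
    """Hypothetical parser for '<url> <user>:<password>' (not lychee's code)."""
    parts = arg.split(maxsplit=1)
    if len(parts) != 2 or ":" not in parts[1]:
        raise ValueError(f"expected '<url> <user>:<password>', got {arg!r}")
    url, credentials = parts
    # Split on the first colon only; passwords may contain colons
    username, _, password = credentials.partition(":")
    return BasicAuthArg(url, username, password)
```

A stricter parser could also reject a first token that does not look like a URL, which would turn the silent misconfiguration into an explicit error.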

@mre added the "question" label (Further information is requested) on Sep 6, 2024
@ul8

ul8 commented Sep 8, 2024

@mre I have the exact same issue. I'm using the basic auth parameter in the right order and tried it both with and without https://. lychee works fine on sites without auth.

@mre changed the title from "No results for staging site. X-Robots-Tag: none a problem?" to "Basic Auth not working" on Sep 9, 2024
@mre
Member

mre commented Sep 9, 2024

Indeed. I tried it myself and it doesn't work as advertised; sorry for the inconvenience.

Here's what I did:

  1. Created a webserver with basic auth which serves some links behind the auth:
import base64
import http.server
import socketserver

# Set username and password for basic auth
USERNAME = 'testuser'
PASSWORD = 'testpass'

class BasicAuthHandler(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        # Check for a Basic Authorization header
        auth_header = self.headers.get('Authorization')
        if auth_header is not None and auth_header.startswith('Basic '):
            # Verify credentials; split on the first colon only,
            # since a password may itself contain colons
            credentials = base64.b64decode(auth_header[6:]).decode('utf-8')
            username, _, password = credentials.partition(':')
            if username == USERNAME and password == PASSWORD:
                # Serve the requested file
                return super().do_GET()

        # Missing or wrong credentials: send exactly one 401 with a challenge
        self.send_response(401)
        self.send_header('WWW-Authenticate', 'Basic realm="Test realm"')
        self.end_headers()

# Create a simple HTML file with links
html_content = """
<!DOCTYPE html>
<html>
<body>
    <h1>Test Links</h1>
    <ul>
        <li><a href="https://www.example.com">Example.com</a></li>
        <li><a href="https://www.google.com">Google.com</a></li>
        <li><a href="https://www.github.com">GitHub.com</a></li>
    </ul>
</body>
</html>
"""

# Write the HTML content to a file
with open('index.html', 'w') as f:
    f.write(html_content)

# Set up and start the server
PORT = 8000
Handler = BasicAuthHandler

with socketserver.TCPServer(("", PORT), Handler) as httpd:
    print(f"Serving at port {PORT}")
    print(f"Username: {USERNAME}")
    print(f"Password: {PASSWORD}")
    httpd.serve_forever()

Then I started the server

python test.py

and then I ran lychee

lychee -vvv --basic-auth 'http://localhost:8000 testuser:testpass' http://localhost:8000
🔍 0 Total (in 0s) ✅ 0 OK 🚫 0 Errors

The Python server logged 401 (Unauthorized) responses:

python test.py
Serving at port 8000
Username: testuser
Password: testpass
127.0.0.1 - - [09/Sep/2024 12:09:46] "GET / HTTP/1.1" 401 -
127.0.0.1 - - [09/Sep/2024 12:09:46] "GET / HTTP/1.1" 401 -
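A 401 line in the server log does not say whether the credentials were missing or wrong. A hypothetical debugging helper like the one below (describe_auth_header is an illustrative name, not part of the script above) could be called from do_GET with self.headers.get('Authorization') and printed, to tell those cases apart without logging the password.

```python
import base64
import binascii
from typing import Optional


def describe_auth_header(value: Optional[str]) -> str:
    """Summarize an Authorization header value for debug logging."""
    if value is None:
        return "no Authorization header sent"
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        return f"non-Basic scheme: {scheme}"
    try:
        # validate=True rejects characters outside the base64 alphabet
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return "malformed Basic token"
    username, sep, _ = decoded.partition(":")
    if not sep:
        return "Basic token without a colon"
    # Report only the username; never log the password itself
    return f"Basic credentials for user {username!r}"
```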

curl works as expected

curl -v -u testuser:testpass http://localhost:8000

So, something is off. Either it doesn't work at all, or I forgot how to use it.

It's strange, because we have tests for it:

lychee/lychee-bin/tests/cli.rs, lines 1370 to 1429 in 53d234d:

async fn test_basic_auth() -> Result<()> {
    let username = "username";
    let password = "password123";
    let mock_server = wiremock::MockServer::start().await;
    Mock::given(basic_auth(username, password))
        .respond_with(ResponseTemplate::new(200))
        .mount(&mock_server)
        .await;
    // Configure the command to use the BasicAuthExtractor
    main_command()
        .arg("--verbose")
        .arg("--basic-auth")
        .arg(format!("{} {username}:{password}", mock_server.uri()))
        .arg("-")
        .write_stdin(mock_server.uri())
        .assert()
        .success()
        .stdout(contains("1 Total"))
        .stdout(contains("1 OK"));
    Ok(())
}

#[tokio::test]
async fn test_multi_basic_auth() -> Result<()> {
    let username1 = "username";
    let password1 = "password123";
    let mock_server1 = wiremock::MockServer::start().await;
    Mock::given(basic_auth(username1, password1))
        .respond_with(ResponseTemplate::new(200))
        .mount(&mock_server1)
        .await;
    let username2 = "admin_user";
    let password2 = "admin_pw";
    let mock_server2 = wiremock::MockServer::start().await;
    Mock::given(basic_auth(username2, password2))
        .respond_with(ResponseTemplate::new(200))
        .mount(&mock_server2)
        .await;
    // Configure the command to use the BasicAuthExtractor
    main_command()
        .arg("--verbose")
        .arg("--basic-auth")
        .arg(format!("{} {username1}:{password1}", mock_server1.uri()))
        .arg("--basic-auth")
        .arg(format!("{} {username2}:{password2}", mock_server2.uri()))
        .arg("-")
        .write_stdin(format!("{}\n{}", mock_server1.uri(), mock_server2.uri()))
        .assert()
        .success()
        .stdout(contains("2 Total"))
        .stdout(contains("2 OK"));
    Ok(())
}

That said, the tests could be better. We don't have any negative tests (e.g. when the credentials are not provided), and we also don't check the status code, which should be 200 on success and 401 on failure.
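The project's tests are written in Rust with wiremock. As a language-neutral illustration of the missing negative case, here is a Python sketch: a throwaway local server that requires Basic auth, queried once without credentials (expecting 401) and once with them (expecting 200). The handler, credentials, and helper names are arbitrary choices for this sketch and do not come from lychee's test suite.

```python
import base64
import http.server
import threading
import urllib.error
import urllib.request
from typing import Optional, Tuple

USERNAME, PASSWORD = "testuser", "testpass"
EXPECTED = "Basic " + base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()


class AuthOnlyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            # Missing or wrong credentials: one 401 with a challenge
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="Test realm"')
            self.end_headers()

    def log_message(self, format, *args):
        pass  # keep test output quiet


def status_for(url: str, auth_header: Optional[str]) -> int:
    """Return the HTTP status code of a GET, with or without credentials."""
    # Bypass any proxy settings from the environment for this local check
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url)
    if auth_header is not None:
        request.add_header("Authorization", auth_header)
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        err.close()
        return err.code


def run_checks() -> Tuple[int, int]:
    """Return (status without credentials, status with credentials)."""
    # Port 0 lets the OS pick a free port
    server = http.server.HTTPServer(("127.0.0.1", 0), AuthOnlyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/"
        return status_for(url, None), status_for(url, EXPECTED)
    finally:
        server.shutdown()
        server.server_close()
```

Checking both outcomes against the same server is what makes the test meaningful: a positive test alone passes even if the server never enforces authentication.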

@mre added the "bug" (Something isn't working) and "help wanted" (Extra attention is needed) labels and removed the "question" label on Oct 8, 2024