The fastest way to fetch web content (an HTML stream) from a server. Supports redirects, automatic decoding (e.g. Chinese), gzip, cookies, proxies...

Tjatse/req-fast

req-fast

This module is designed to be a fast, lightweight way to fetch web content (an HTML stream) from a given server. It supports:

  • Follow Redirects
  • Automatic Decoding of Content Encodings (Avoids Garbled Text, Especially Chinese)
  • Cookies
  • JSON Response Auto Handling
  • Gzip/Deflate Encoding (Automatic Decompression)
  • Proxy

Installation

$ npm install req-fast --production

Debug

$ DEBUG=reqfast.* node ...

Usage

var req = require('req-fast');
req(options, callback);

Options

When options is a String, it is treated as the URL of the server to request.

req('http://www.google.com', function(err, resp){
  // code goes here...
});

Otherwise it should be an object, including:

  • uri || url The URL to which the request is sent.

  • method The HTTP method, GET by default. If data is set and method is undefined, POST is used. It can be one of OPTIONS, GET, HEAD, POST, PUT, PATCH, DELETE, TRACE and CONNECT.

  • timeout The request timeout in milliseconds, 60000 (60 seconds) by default.

  • dataType The type of data you are sending to the server. This property only affects the POST, PUT and PATCH methods. It can be one of the following values:

    • json The content-type is application/json.
    • form The content-type is application/x-www-form-urlencoded.
  • data The data to send to the server, as key/value pairs. If the method is not POST, the data is converted to a query string and appended to the URL.

  • agent Whether to automatically generate a browser-like user-agent, e.g. Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.101 Safari/537.36. true by default.

    Once a user-agent has been generated, the process will not exit on its own (you will not see "Process finished with exit code 0") until you stop it manually, e.g. with CTRL+C or process.exit(0).

  • charset The charset of the content encoding, if you need to set it explicitly.

    This option takes top priority when decoding chunks. If it is not set, the charset in response.headers['content-type'] is tried first, then the charset declared in <meta ... />.

  • disableRedirect Whether to disable following redirects. If this is set to true, maxRedirects has no effect.

  • maxRedirects The maximum number of redirects to follow (3 by default).

  • disableGzip By default, compressed content is requested from the server and the response is decompressed automatically. Setting this option to true disables that feature.

  • trackCookie Whether to gather the cookies from every response while following redirects, false by default. false means only the cookies of the last request are gathered.

  • cookies Cookies, as key/value pairs.

  • headers HTTP headers, as key/value pairs. The defaults are:

    {
      'connection': 'keep-alive',
      'accept': 'text/html, text/javascript, application/json, application/xhtml+xml, application/xml;q=0.9, */*;q=0.8',
      'pragma': 'no-cache',
      'cache-control': 'no-cache'
    }

    You can override the defaults above in headers.

  • proxy The proxy settings, including all the options from the tunnel proxy:

    • host A domain name or IP address of the server to issue the proxy request to.
    • port The port of the remote proxy server.
    • localAddress The local interface to bind to, if necessary.
    • proxyAuth Basic authorization for the proxy server if necessary, e.g. username:password.
    • headers An object containing request headers.
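The method and data rules described above can be sketched as a small helper. This is only an illustration of the documented defaults, not req-fast's internal code: resolveRequest is a hypothetical name, form encoding when dataType is not json is an assumption, and the sketch does not model how dataType applies to PUT and PATCH.

```javascript
// Hypothetical helper: mirrors the documented defaults, not req-fast's source.
function resolveRequest(options) {
  var opts = typeof options === 'string' ? { url: options } : options;
  var url = opts.uri || opts.url;
  var hasData = opts.data !== undefined && opts.data !== null;

  // GET by default; POST when data is set and no method was given.
  var method = (opts.method || (hasData ? 'POST' : 'GET')).toUpperCase();

  var body = null;
  if (hasData && method === 'POST') {
    // Assumption: form encoding is used whenever dataType is not 'json'.
    body = opts.dataType === 'json'
      ? JSON.stringify(opts.data)
      : new URLSearchParams(opts.data).toString();
  } else if (hasData) {
    // Non-POST: data becomes a query string appended to the URL.
    var query = new URLSearchParams(opts.data).toString();
    url += (url.indexOf('?') === -1 ? '?' : '&') + query;
  }
  return { method: method, url: url, body: body };
}

console.log(resolveRequest({ url: 'http://example.com/search', method: 'GET', data: { q: 'node' } }));
console.log(resolveRequest({ url: 'http://example.com/login', data: { user: 'alice' }, dataType: 'json' }));
```

The first call keeps GET and moves the data into the query string; the second has no method, so the data turns it into a POST with a JSON body.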

Callback

A function called when the request succeeds or fails. It is passed two arguments:

  • error The Error instance. On success this value is null. If the status is not okay, error.message is one of http.STATUS_CODES.
  • response The response object, including:
    • body The response body. If response.headers['content-type'] equals application/json, the body is automatically parsed as JSON; otherwise it is a String.
    • cookies The response cookies (key/value pairs).
    • headers The response headers (key/value pairs).
    • redirects The redirect URLs (Array).
    • statusCode The response status code.

See the test or examples folder for a complete example.
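Because body can be either a parsed object or a String, a callback usually has to branch on its type. A minimal sketch, assuming the response shape listed above; describeBody is a hypothetical helper and the two response objects are stand-ins so the snippet runs without a network call.

```javascript
// Hypothetical callback helper; `resp` follows the shape documented above.
function describeBody(resp) {
  var body = resp.body;
  if (body !== null && typeof body === 'object') {
    // JSON responses arrive already parsed.
    return { kind: 'json', keys: Object.keys(body) };
  }
  // Everything else is treated as a decoded string.
  var text = body === undefined || body === null ? '' : String(body);
  return { kind: 'text', length: text.length };
}

// Stand-in responses (not real req-fast output).
var jsonResp = { statusCode: 200, headers: { 'content-type': 'application/json' }, body: { ok: true } };
var htmlResp = { statusCode: 200, headers: { 'content-type': 'text/html' }, body: '<html></html>' };
console.log(describeBody(jsonResp));
console.log(describeBody(htmlResp));
```

Inside a real callback this would be called as describeBody(resp) once err has been checked.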

Streaming

Streams are amazing in node.js; if you are interested in them, read John's Blog. You can add listeners to the returned Stream if you want.

var rs = req([options]);
rs.on('data', function(chunk){
  // ...
});
rs.on('end', function(resp){
  // ...
});
rs.on('error', function(error, response){
  // ...
});
rs.on('abort', function(){
  // ...
});
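The four events above can be wired into one helper that buffers the chunks and reports exactly once. This is a sketch under assumptions: collect is a hypothetical name, chunks are assumed to be Buffers or Strings, and a plain EventEmitter stands in for the stream returned by req(options) so the snippet runs offline.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: buffers `data` chunks and reports once via `done`.
function collect(rs, done) {
  var chunks = [];
  var finished = false;
  function finish(err, resp) {
    if (finished) { return; } // report only once
    finished = true;
    done(err, Buffer.concat(chunks), resp);
  }
  rs.on('data', function (chunk) {
    // Assumption: chunks may be Buffers or already-decoded strings.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(String(chunk)));
  });
  rs.on('end', function (resp) { finish(null, resp); });
  rs.on('error', function (error, resp) { finish(error, resp); });
  rs.on('abort', function () { finish(new Error('aborted')); });
}

// Stand-in emitter; with req-fast you would pass the stream from req(options).
var fake = new EventEmitter();
var result;
collect(fake, function (err, body, resp) {
  result = { err: err, text: body.toString(), status: resp && resp.statusCode };
});
fake.emit('data', Buffer.from('hello '));
fake.emit('data', Buffer.from('world'));
fake.emit('end', { statusCode: 200 });
console.log(result);
```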

Pipe to file

In my project, which downloads millions of files from servers, using pipe improves performance: the file is written chunk by chunk as it arrives from the server, instead of being read into memory in full and then written out at once, which sucks.

var fs = require('fs');
req('http://example.com/beauty.gif').pipe(fs.createWriteStream('download/001.gif'));

Http Status

All HTTP statuses are handled, but you should still check the status carefully.

req('http://example.com', function(err, resp){
  if(err){
    // a status error occurred
  }
  // statusCode always exists unless a STREAM `error` was caught.
  var status = resp && resp.statusCode;
})
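One way to make that check explicit is to separate "no response at all" from "response with a bad status". A minimal sketch, assuming the error and response shapes described in the Callback section; classify is a hypothetical helper, and http.STATUS_CODES is Node's built-in map of status codes to reason phrases.

```javascript
var http = require('http');

// Hypothetical helper for use inside the req-fast callback.
function classify(err, resp) {
  var status = resp && resp.statusCode;
  if (status === undefined || status === null) {
    // No status code: the stream failed before a response arrived.
    return { ok: false, status: null, reason: err ? err.message : 'no response' };
  }
  if (err) {
    // A response arrived, but its status was reported as an error.
    return { ok: false, status: status, reason: err.message };
  }
  return { ok: true, status: status, reason: http.STATUS_CODES[status] || 'Unknown' };
}

// Stand-in arguments so the sketch runs without a network call.
console.log(classify(null, { statusCode: 200 }));
console.log(classify(new Error(http.STATUS_CODES[404]), { statusCode: 404 }));
console.log(classify(new Error('socket hang up'), undefined));
```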

Proxy

req({
  url: 'http://example.com',
  proxy: {
    host: '127.0.0.1',  // host
    port: 8082,         // port
    proxyAuth: 'user:password'  // authentication if necessary.
  }
}, function(err, resp){
  // code goes here
});
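If the proxy is configured as a single URL (for example in an environment variable), it can be split into the host, port and proxyAuth fields shown above. A sketch only: proxyFromUrl is a hypothetical helper built on Node's global URL class, and falling back to port 80 or 443 when the URL has no port is an assumption.

```javascript
// Hypothetical helper: turns a proxy URL into the `proxy` option shape.
function proxyFromUrl(proxyUrl) {
  var parsed = new URL(proxyUrl);
  var proxy = {
    host: parsed.hostname,
    // URL.port is '' when the scheme's default port is used (assumed fallback).
    port: parsed.port ? Number(parsed.port) : (parsed.protocol === 'https:' ? 443 : 80)
  };
  if (parsed.username) {
    // Credentials are percent-encoded inside URLs, so decode them.
    proxy.proxyAuth = decodeURIComponent(parsed.username) + ':' +
      decodeURIComponent(parsed.password);
  }
  return proxy;
}

var options = {
  url: 'http://example.com',
  proxy: proxyFromUrl('http://user:password@127.0.0.1:8082')
};
console.log(options.proxy);
```

The resulting options object has the same shape as the one passed to req in the example above.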

Benchmark

The comparison is with the request module. To avoid the influence of the network, all requests are sent to localhost. The test cases are for reference only and are not trustworthy ^^.

Run Server

node --harmony benchmark/server.js

Elapsed Time

node --harmony benchmark/elapsed_time.js
A sample of 1000 cases:

module  avg min max
request 0.005ms 0ms 2ms
reqfast 0.001ms 0ms 1ms

completed

Memory Usage

node --harmony benchmark/memory_usage.js
A sample of 1000 cases:

module  avg     min max
request 204.8b  0b  110592b
reqfast 8.192b  0b  4096b

completed

GC affects these numbers a lot, and I do not trust the result of process.memoryUsage().rss; request may well perform better.
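For reference, the avg, min and max columns in tables like these can be derived from raw samples as follows. This is not the code in the benchmark folder: summarize and timeMs are hypothetical helpers, and a trivial synchronous task stands in for a request round trip.

```javascript
// Hypothetical helper: reduces raw samples to avg/min/max.
function summarize(samples) {
  var total = 0, min = Infinity, max = -Infinity;
  samples.forEach(function (value) {
    total += value;
    if (value < min) { min = value; }
    if (value > max) { max = value; }
  });
  return { avg: total / samples.length, min: min, max: max };
}

// Times a synchronous task in milliseconds using the high-resolution clock.
function timeMs(task) {
  var start = process.hrtime.bigint();
  task();
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Stand-in workload: 1000 cases, as in the samples above.
var samples = [];
for (var i = 0; i < 1000; i++) {
  samples.push(timeMs(function () { JSON.parse('{"n":1}'); }));
}
console.log(summarize(samples));
```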

Tests

Most of the tests send requests to httpbin, so if you want to run them, please make sure you can resolve the httpbin host. Run the tests:

npm test

Thanks

Thanks to andris9. I've used fetch for a long time; it's very fast and simple to use.

My ES Spider needed a speed-up. request is very powerful, but too heavy/slow for me, and it cannot automatically decode encodings, especially Chinese.

Unfortunately andris9 can no longer maintain his repository, and it has bugs. I could fix them in my own project, but that is fussy. On top of that, I need a PROXY feature.

License

Copyright 2014 Tjatse

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
