RFC: Protocol for bootstrapping when primary Hackage instance is unreachable #171

Open
hvr opened this issue Oct 6, 2016 · 1 comment

hvr commented Oct 6, 2016

I wasn't sure whether to file this here or in Cabal's issue tracker, but I think this method can be generalised, so I'm documenting it here for now:

cabal-install currently can't bootstrap automatically when hackage.haskell.org is unreachable (either because it's down or because of routing/firewalling issues), even though one of its mirrors may be reachable without issue.

To this end, I propose the following simple best-effort fallback scheme:

When bootstrapping hackage-security and the configured repository URL ${URL} (e.g. hackage.haskell.org) is not reachable, a DNS TXT lookup on _mirrors.${URL} shall be attempted, looking for RFC 1464-compliant entries of mirror URLs with the keys ${IDX}.urlbase (where ${IDX} is a non-negative integer). The client then attempts to bootstrap from each of those mirror URLs, in ascending order of their ${IDX} value, until one succeeds, giving up once all URLs have been tried.
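
The control flow of this scheme can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Mirror`, `firstSuccessful` and `bootstrapWithFallback` are hypothetical, the bootstrap attempt and the DNS lookup are passed in as parameters rather than implemented, and the example.* URLs are placeholders. The point is merely that the DNS lookup runs only after the primary URL has failed, and that mirrors are tried by numeric index.

```haskell
import Data.Functor.Identity (Identity (..))
import Data.List (sortOn)

-- | Hypothetical representation of one TXT entry: (IDX, base URL).
type Mirror = (Int, String)

-- | Try each URL in order; return the first one for which the attempt
-- succeeds, or Nothing once all URLs have been tried.
firstSuccessful :: Monad m => (String -> m Bool) -> [String] -> m (Maybe String)
firstSuccessful _   []     = return Nothing
firstSuccessful try (u:us) = do
  ok <- try u
  if ok then return (Just u) else firstSuccessful try us

-- | Try the primary URL first. Only if that fails, perform the mirror
-- lookup and try the mirrors in ascending order of their index.
bootstrapWithFallback
  :: Monad m
  => (String -> m Bool)  -- ^ assumed: attempt to bootstrap from a base URL
  -> m [Mirror]          -- ^ assumed: DNS TXT lookup on _mirrors.${URL}
  -> String              -- ^ configured primary URL
  -> m (Maybe String)
bootstrapWithFallback try lookupMirrors primary = do
  ok <- try primary
  if ok
    then return (Just primary)
    else do
      mirrors <- lookupMirrors
      firstSuccessful try (map snd (sortOn fst mirrors))

main :: IO ()
main = do
  let -- Stub: pretend only the second mirror is reachable.
      try url = Identity (url == "http://mirror-b.example/")
      -- Stub DNS answer, deliberately out of order.
      dns = Identity [(1, "http://mirror-b.example/"), (0, "http://mirror-a.example/")]
  print (runIdentity (bootstrapWithFallback try dns "http://primary.example/"))
```

Keeping both effects as parameters means the same function can be run against stubs (as in `main`) or against real network actions in `IO`.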

For implementing a prototype, I've created such a DNS RR:

$ dig _mirrors.hackage.haskell.org TXT

; <<>> DiG 9.10.3-P4-Ubuntu <<>> _mirrors.hackage.haskell.org TXT
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62373
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1500
;; QUESTION SECTION:
;_mirrors.hackage.haskell.org.  IN  TXT

;; ANSWER SECTION:
_mirrors.hackage.haskell.org. 300 IN    TXT "0.urlbase=http://hackage.fpcomplete.com/" "1.urlbase=http://objects-us-west-1.dream.io/hackage-mirror/"

;; Query time: 2 msec
;; SERVER: 69.20.0.164#53(69.20.0.164)
;; WHEN: Thu Oct 06 16:32:18 UTC 2016
;; MSG SIZE  rcvd: 170

Moreover, I've written a simple parser for nslookup's output (nslookup appears to be the common-denominator tool provided by default on Windows, OSX, IBM AIX, and Linux systems), which I've tested on the platforms I had access to:

#! /usr/bin/env runghc

import Data.List
import Data.Char
import Control.Monad
import System.Environment
import System.Process (readProcess)
import Text.Read

-- | Parse output of @nslookup -query=TXT $HOSTNAME@ tolerantly
parseNsLookupTxt :: String -> Maybe [(String,[String])]
parseNsLookupTxt = go0 [] []
  where
    -- approximate grammar:
    -- <entries> := { <entry> }
    -- (<entry> starts at begin of line, but may span multiple lines)
    -- <entry> := ^ <hostname> TAB "text =" { <qstring> }
    -- <qstring> := string enclosed by '"'s ('\' and '"' are \-escaped)

    -- scan for ^ <word> <TAB> "text ="
    go0 []  _  []                                = Nothing
    go0 res _  []                                = Just (reverse res)
    go0 res _  ('\n':xs)                         = go0 res [] xs
    go0 res lw ('\t':'t':'e':'x':'t':' ':'=':xs) = go1 res (reverse lw) [] (dropWhile isSpace xs)
    go0 res lw (x:xs)                            = go0 res (x:lw) xs

    -- collect at least one <qstring>
    go1 res lw qs ('"':xs) = case qstr "" xs of
      Just (s, xs') -> go1 res lw (s:qs) (dropWhile isSpace xs')
      Nothing       -> Nothing -- bad quoting
    go1 res lw [] _  = Nothing -- missing qstring
    go1 res lw qs xs = go0 ((lw,reverse qs):res) [] xs

    qstr acc ('\n':_) = Nothing -- We don't support unquoted LFs
    qstr acc ('\\':'\\':cs) = qstr ('\\':acc) cs
    qstr acc ('\\':'"':cs)  = qstr ('"':acc) cs
    qstr acc ('"':cs) = Just (reverse acc, cs)
    qstr acc (c:cs)   = qstr (c:acc) cs
    qstr _   []       = Nothing

mirrorsDnsName :: String
mirrorsDnsName = "_mirrors.hackage.haskell.org"

extractMirrors :: String -> [String]
extractMirrors s0 = map snd $ sort vals
  where
    vals = [ (kn,v) | (h,ents) <- maybe [] id $ parseNsLookupTxt s0
                    , h == mirrorsDnsName
                    , e <- ents
                    , Just (k,v) <- [splitRfc1464 e]
                    , Just kn <- [isUrlBase k]
                    ]

    isUrlBase :: String -> Maybe Int
    isUrlBase s
      | isSuffixOf ".urlbase" s, not (null ns), all isDigit ns = readMaybe ns
      | otherwise = Nothing
      where
        ns = take (length s - 8) s

splitRfc1464 :: String -> Maybe (String,String)
splitRfc1464 = go ""
  where
    go _ [] = Nothing
    go acc ('`':c:cs) = go (c:acc) cs
    go acc ('=':cs)   = go2 (reverse acc) "" cs
    go acc (c:cs)
      | isSpace c = go acc cs
      | otherwise = go (c:acc) cs

    go2 k acc [] = Just (k,reverse acc)
    go2 k acc ['`'] = Nothing
    go2 k acc ('`':c:cs) = go2 k (c:acc) cs
    go2 k acc (c:cs) = go2 k (c:acc) cs

main :: IO ()
main = do
    fns <- getArgs

    if null fns
    then do
      output <- readProcess "nslookup" ["-query=TXT", mirrorsDnsName] ""
      print (extractMirrors output)
    else do
      forM_ fns $ \fn -> do
        output <- readFile fn
        print (fn,extractMirrors output)

    return ()

Its output is simply

["http://hackage.fpcomplete.com/","http://objects-us-west-1.dream.io/hackage-mirror/"]
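
One detail of the script worth spelling out is that the `${IDX}` part of each key is converted to an `Int` before sorting. The following self-contained sketch shows why that matters once there are ten or more mirrors. The helper names `urlBaseIndex` and `orderMirrors`, the extra `comment` key, and the example.* URLs are made up for illustration and are not part of the prototype.

```haskell
import Data.Char (isDigit)
import Data.List (isSuffixOf, sort, sortOn)
import Text.Read (readMaybe)

-- | Hypothetical helper: extract the numeric index from a key such as
-- "10.urlbase"; returns Nothing for any other key.
urlBaseIndex :: String -> Maybe Int
urlBaseIndex k
  | ".urlbase" `isSuffixOf` k, not (null ds), all isDigit ds = readMaybe ds
  | otherwise = Nothing
  where
    ds = take (length k - length ".urlbase") k

-- | Order (key, url) pairs by numeric index, dropping unrelated keys.
orderMirrors :: [(String, String)] -> [String]
orderMirrors kvs =
  map snd (sortOn fst [ (i, u) | (k, u) <- kvs, Just i <- [urlBaseIndex k] ])

main :: IO ()
main = do
  let kvs = [ ("10.urlbase", "http://mirror-k.example/")
            , ("2.urlbase",  "http://mirror-c.example/")
            , ("comment",    "ignored")
            , ("0.urlbase",  "http://mirror-a.example/")
            ]
  -- Numeric ordering: indices 0, 2, 10.
  print (orderMirrors kvs)
  -- Plain string ordering of the keys would put "10.urlbase" before "2.urlbase".
  print (map snd (sort kvs))
```

Sorting the raw key strings would visit mirror 10 before mirror 2, which is not the order the scheme asks for.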

hvr commented Oct 7, 2016

After a short conversation with @dcoutts, the conclusion is that I'm going to integrate this into cabal-install real-soon-now(tm); no changes to hackage-security are needed for now.

hvr added a commit to hvr/cabal that referenced this issue Oct 7, 2016
This way `cabal` can bootstrap secure repos even if the primary Hackage
instance is currently unreachable, as long as there's at least one
reachable and working secure mirror available.

NB: This new code-path is only used for the initial bootstrap. Once the
repository cache has been bootstrapped, its `mirrors.json` meta-data is
used instead.

See also haskell/hackage-security#171
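
The division of labour described in that commit message could be pictured as below. This is a deliberately tiny sketch with hypothetical names (`MirrorSource`, `mirrorSource`); it does not reflect cabal-install's actual types or code, only the stated rule that the DNS TXT record is consulted solely before the repository cache exists.

```haskell
-- | Hypothetical names throughout; not cabal-install's actual types.
data MirrorSource
  = MirrorsJson   -- ^ mirror list taken from the cached mirrors.json
  | DnsTxtRecord  -- ^ mirror list taken from the _mirrors DNS TXT record
  deriving (Eq, Show)

-- | Pick where the mirror list comes from, given whether the
-- repository cache has already been bootstrapped.
mirrorSource :: Bool -> MirrorSource
mirrorSource cacheBootstrapped
  | cacheBootstrapped = MirrorsJson
  | otherwise         = DnsTxtRecord

main :: IO ()
main = mapM_ (print . mirrorSource) [False, True]
```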