Bryan Robbins edited this page Jan 17, 2015 · 2 revisions

User Scenario

I am a user of getDataGen.com. I would like to generate data for testing a web form with 3 fields:

  • Account Number
  • Account Type
  • Account Balance

My experience should be something like this:

  • I tell the system that I need three variables to be generated.
  • I tell the system the types of values I would like to see for each variable:
    • Account Number
      • Valid sequences of exactly 9 digits
      • Invalid sequences of length less than 9
      • Blank sequences
      • Sequences with non-digits
    • Account Type
      • Valid value "R" for regular accounts
      • Valid value "S" for special accounts
      • Invalid value of a single character other than R or S
      • Blank value
      • Sequences of characters with length longer than 1
    • Account Balance
      • Valid account balances, which are positive floating point numbers with two digits after the decimal point
      • Negative account balances, which are negative floating point values
      • Blank value
      • Invalid sequences including alphabetic characters
      • Invalid sequences including fewer than 2 digits after the decimal point
  • I tell the system that I would like to get one data set per unique combination of value categories across the variables.
  • I click "GO".
  • I receive my data as a text file.
  • I export my variable descriptions as a text file.
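
To make the size of this request concrete, the short sketch below counts how many data sets "one per unique combination of value categories" implies for the scenario above. The dictionary name and layout are illustrative assumptions; only the category counts come from the list.

```python
from math import prod

# Number of value categories per variable, counted from the scenario above.
categories_per_variable = {
    "Account Number": 4,   # valid, too short, blank, non-digits
    "Account Type": 5,     # "R", "S", other single character, blank, too long
    "Account Balance": 5,  # valid, negative, blank, alphabetic, too few decimals
}

# One data set per unique combination means multiplying the category counts.
expected_data_sets = prod(categories_per_variable.values())
print(expected_data_sets)  # 100
```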

Requirements

  • There are three phases to using this UI: Configuration, Generation, and Acquisition.

    • During Configuration, the user describes the constraints of their data.
    • During Generation, the hosted environment generates data.
    • During Acquisition, the user acquires the generated data.
  • Configuration

    • During configuration, the user describes their desired output data in terms of Variables, Equivalence Classes, and a Generation Strategy.
    • Possible variable values are defined by one or more equivalence classes. Following from existing theory in software testing, all values from the same equivalence class are considered to be equal for purposes of generating output data.
    • Equivalence classes are defined by a Template and its Parameters.
    • The system shall provide, at a minimum, the following templates (with parameter lists in parentheses below):
      • Literal(Value)
      • RegularExpression(Expression)
          • Custom regular expressions may generate strings of at most 200 characters. If the limit is exceeded, generation fails and stops immediately (to avoid long-running or even non-terminating expression evaluation).
      • DigitSequence(Length)
    • The system shall allow the user to choose from the following possible generation strategies:
      • All Combinations
      • Pairwise Combinations
    • The system shall allow the user to specify a maximum number of lines to be generated (even though restricting output may prevent coverage goals from being achieved).
    • At any point while using the system, the user should be able to acquire via download a portable, textual representation of the current configuration.
    • Once configuration is complete, the system shall allow the user to indicate (e.g., via a button) that the Generation phase should be triggered.
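
The Configuration requirements above could be sketched roughly as follows. This is a minimal illustration, not the tool's actual design: the `TEMPLATES` registry, the configuration dictionary layout, the `label` and `max_lines` keys, and the choice of JSON as the "portable, textual representation" are all assumptions. The RegularExpression template is left out because generating strings from a regular expression needs more than the standard library.

```python
import json
import random
import string

# Hypothetical template registry: template name -> function(params, rng) -> str.
TEMPLATES = {
    "Literal": lambda params, rng: params["Value"],
    "DigitSequence": lambda params, rng: "".join(
        rng.choice(string.digits) for _ in range(params["Length"])
    ),
}

# Hypothetical configuration for part of the Account Number / Account Type scenario.
config = {
    "variables": [
        {
            "name": "Account Number",
            "classes": [
                {"label": "valid", "template": "DigitSequence", "params": {"Length": 9}},
                {"label": "too short", "template": "DigitSequence", "params": {"Length": 5}},
                {"label": "blank", "template": "Literal", "params": {"Value": ""}},
            ],
        },
        {
            "name": "Account Type",
            "classes": [
                {"label": "regular", "template": "Literal", "params": {"Value": "R"}},
                {"label": "special", "template": "Literal", "params": {"Value": "S"}},
                {"label": "blank", "template": "Literal", "params": {"Value": ""}},
            ],
        },
    ],
    "strategy": "All Combinations",
    "max_lines": 1000,
}


def generate_value(equivalence_class, rng):
    """Pick one value from an equivalence class using its template and parameters."""
    template = TEMPLATES[equivalence_class["template"]]
    return template(equivalence_class["params"], rng)


def export_config(cfg):
    """Return a textual representation of the configuration (JSON is an assumption)."""
    return json.dumps(cfg, indent=2)


rng = random.Random()
valid_class = config["variables"][0]["classes"][0]
print(generate_value(valid_class, rng))  # a random 9-digit string
```

Keeping the configuration as plain data means the export is a single serialization call, and the same text could be loaded again later to restore the configuration.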
  • Generation

    • By default, the system shall select values from each equivalence class at random during generation.
    • The output of generation shall be a List of Data Sets.
    • A single Data Set is a set of (variable, value) pairs, one pair per variable defined. A Data Set can also be represented as a Row, with one value per column. In this form, the order of columns (with one variable per column) must be pre-defined.
    • "All Combinations" generation should produce one output Data Set for every unique combination of equivalence classes across variables. For example, consider a variable A with equivalence classes A1, A2; variable B with B1, B2; and variable C with C1, C2. All Combinations generation over these variables and their equivalence classes produces 8 unique data sets: (A1, B1, C1), (A1, B1, C2), (A1, B2, C1), (A1, B2, C2), (A2, B1, C1), (A2, B1, C2), (A2, B2, C1), and (A2, B2, C2).
    • "Pairwise Combinations" (All Pairs) generation should produce output Data Sets that together cover every pair of equivalence classes drawn from two different variables. For the same scenario as above, there are 12 pairs to be covered by All Pairs generation: (A1, B1), (A1, B2), (A2, B1), (A2, B2), (A1, C1), (A1, C2), (A2, C1), (A2, C2), (B1, C1), (B1, C2), (B2, C1), (B2, C2). One Data Set per pair is not needed, because a single Data Set over three variables covers three of these pairs at once. This leads to an All Pairs generation output such as: (A1, B1, C1), (A1, B2, C2), (A2, B1, C2), (A2, B2, C1).
    • In the hosted version of the tool, data generation will be limited to 1 million rows per use.
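
The two generation strategies above can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: the function names are hypothetical, and the pairwise version uses a simple greedy heuristic (repeatedly pick the combination that covers the most still-uncovered pairs), which guarantees full pair coverage but not the smallest possible output.

```python
from itertools import combinations, product


def all_combinations(classes_by_variable):
    """Return one row per unique combination of equivalence classes."""
    return list(product(*classes_by_variable.values()))


def pairs_in(row):
    """Return the cross-variable pairs covered by one row, keyed by column index."""
    return {
        ((i, row[i]), (j, row[j]))
        for i, j in combinations(range(len(row)), 2)
    }


def pairwise_combinations(classes_by_variable):
    """Greedily choose rows until every cross-variable pair is covered."""
    candidates = all_combinations(classes_by_variable)
    uncovered = set().union(*(pairs_in(row) for row in candidates))
    chosen = []
    while uncovered:
        # Take the candidate that covers the most pairs not yet covered.
        best = max(candidates, key=lambda row: len(pairs_in(row) & uncovered))
        chosen.append(best)
        uncovered -= pairs_in(best)
    return chosen


classes = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]}
print(len(all_combinations(classes)))  # 8
for row in pairwise_combinations(classes):
    print(row)
```

Enumerating every combination as a candidate is fine for small configurations like this one; for many variables a real generator would need a cheaper way to propose candidates.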
  • Acquisition

    • After generation is complete, the resulting data should be made available to the user via a downloadable file.
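
As a rough illustration of turning generated Data Sets into a downloadable text file, the sketch below writes each Data Set as a row using a pre-defined column order and applies the row limits described above. The CSV format, the header row, the function name, and the decision not to count the header against the limit are assumptions; the requirements only call for a text file.

```python
import csv
import io
from itertools import islice

HOSTED_ROW_LIMIT = 1_000_000  # hosted-version limit stated in the requirements


def data_sets_to_text(data_sets, column_order, max_lines=None):
    """Serialize Data Sets (dicts of variable -> value) as rows of CSV text."""
    limit = HOSTED_ROW_LIMIT if max_lines is None else min(max_lines, HOSTED_ROW_LIMIT)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column_order)  # header row; not counted against the limit
    for data_set in islice(data_sets, limit):
        # The pre-defined column order fixes which variable lands in which column.
        writer.writerow([data_set[name] for name in column_order])
    return buffer.getvalue()


data_sets = [
    {"Account Type": "R", "Account Number": "123456789"},
    {"Account Type": "S", "Account Number": "12345"},
]
print(data_sets_to_text(data_sets, ["Account Number", "Account Type"]))
```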