Using XPath with Loadrunner

Regular Expressions and Loadrunner

One of the critiques often heard from script developers is that Loadrunner does not support regular expressions for extracting data from HTML responses.

True, the substring method offered by web_reg_save_param() does not offer much flexibility, although it is an easy and powerful way to extract parts of an HTTP response.
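For reference, the substring approach looks roughly like this. The parameter name, boundaries and step name below are made up for illustration and would have to match the actual page markup:

  // Register a left/right boundary extraction for the next request; the text
  // between LB and RB is saved into the parameter {productName}.
  web_reg_save_param("productName",
    "LB=<div class=\"name\">",   // illustrative boundaries only
    "RB=</div>",
    "Ord=1",                     // take the first match
    "NotFound=warning",          // do not fail the iteration if the boundaries are missing
    LAST);

  web_url("home",
    "URL=http://sutapp.mydomain.com/opencart/",
    "Mode=HTML",
    LAST);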

Unfortunately there are cases when data extraction is not that trivial, e.g. when the context of the data is also relevant and should be considered.

For such occasions the use of XPath can come in handy. Luckily, Loadrunner supports XML processing, so if nothing else helps, the use of lr_xml_get_values() can be considered.
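Before the full script, here is a minimal sketch of the lr_xml_get_values() calling convention, using a made-up XML snippet and parameter names:

  // Save a small, well-formed XML document into a parameter ...
  lr_save_string("<catalog><product id=\"42\"><name>Chair</name></product></catalog>",
    "xmlInput");

  // ... and query it with XPath; the first match is saved into {productId}.
  lr_xml_get_values("XML={xmlInput}",
    "ValueParam=productId",
    "Query=/catalog/product/@id",
    LAST);

  lr_log_message("Extracted id: %s", lr_eval_string("{productId}"));  // prints 42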

The code below is a test action created for an open-source webshop (OpenCart) that I usually install for trainings. The script extracts the product detail link of the first featured product using XPath.

There are several peculiarities that should be considered:

  • The XML parser in Loadrunner is strict. HTML is not necessarily well-formed XML. Poor HTML markup, e.g. missing closing tags, can quickly ruin our efforts.
  • HTML escapes are interpreted as XML entities. For example, OpenCart contains a copyright character in the footer (written as &copy;). This leads to the error message below, so it is advisable to remove such notations before parsing (a general cleanup sketch follows this list).
    Action.c(33): Error: Entity 'copy' was not found (line 215, col 100)
    Action.c(33): Error: "lr_xml_get_values" execution failed
    
  • We may experience issues with UTF-8 input, so I turned off the conversion of responses to/from UTF-8 (see the screenshot below).
    Action.c(33): Error: An exception occurred! Type:UTFDataFormatException, Message:invalid byte 1 (ˆ) of a 1-byte sequence. (line 1, col 1)
    Action.c(33): Error: "lr_xml_get_values" execution failed

    [Screenshot: Disabling UTF-8 conversion in Loadrunner Virtual User Generator]
  • Parsing big input files can affect performance (and load generator capacity). If possible, only a part of the HTML should be passed for XPath evaluation. Here, for simplicity, I extracted the body of the HTML document.
  • To save script debugging time, I recommend using a tool to verify beforehand that the planned XPath selector is correct. Recommended Firefox plugins are FirePath for Firebug and XPather.
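The entity problem from the second bullet can be handled more generally than in the script below, which only patches the first occurrence (enough for the OpenCart footer). Here is a small sketch of such a cleanup helper; the function name is mine, and in a Loadrunner script a definition like this would typically live in globals.h:

  // Hypothetical helper, not part of the original script: neutralize every
  // occurrence of an entity such as "&copy;" by overwriting its leading '&'
  // with a space, so the XML parser no longer sees an unknown entity reference.
  void strip_entity(char *buf, char *entity)
  {
    char *p = buf;
    while ((p = (char*) strstr(p, entity)) != NULL) {
      *p = ' ';  // "&copy;" becomes " copy;", which is harmless plain text
      p++;       // continue searching after the patched character
    }
  }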

Hopefully this technique will help Loadrunner script authors to write better semantic analysis (validation, parameter value extraction).

Action()
{
  char *content;
  char *p;

  web_set_max_html_param_len("1024000");  // set comfortable max size for parameters

  web_reg_save_param("htmlbody", "LB=<body>", "RB=</body>", LAST); // extract HTML body

  web_url("opencart",
    "URL=http://sutapp.mydomain.com/opencart/",
    "TargetFrame=",
    "Resource=0",
    "RecContentType=text/html",
    "Referer=",
    "Snapshot=t2.inf",
    "Mode=HTML",
    LAST);

  content = lr_eval_string("<body>{htmlbody}</body>"); // put the <body> tags back
  p = (char*) strstr(content, "&copy;");  // locate the footer's problematic &copy; entity
  if (p) {
    *p = ' ';  // overwrite the leading '&' so the parser no longer sees an unknown entity
  }

  lr_save_string(content, "htmlbody");

  lr_xml_get_values("XML={htmlbody}",  // extract an URL from the HTML body with XPath
    "ValueParam=productUrl",
    "Query=//*/div[@class=\"box-product\"]/div/div[@class=\"name\"][email protected]",
    LAST);

  lr_log_message("The extracted url: %s", lr_eval_string("{productUrl}"));

  return 0;
}
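Once the link is available in {productUrl}, it can be fed straight into a follow-up step. This continuation is only illustrative and assumes the extracted href is an absolute URL (a relative href would need the host prepended):

  web_url("product_detail",
    "URL={productUrl}",       // the value extracted above via XPath
    "Resource=0",
    "RecContentType=text/html",
    "Mode=HTML",
    LAST);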

3 thoughts on “Using XPath with Loadrunner”

  1. With LR11 we now have:

    web_reg_save_param_ex() – which replaces the deprecated web_reg_save_param() function

    web_reg_save_param_regexp() – which adds regular expressions

    and

    web_reg_save_param_xpath() – which works nicely with XML

  2. Hi,
    I am trying to capture an array of values using lr_xml_get_values, but I am not getting the values. Can someone help me with how to write the XPath to capture an array of values, and also let me know how we can get the count of the values in the array? This is urgent, please help.
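Picking up both comments: on LR11 and later, the registration-style function mentioned in the first comment can do the XPath extraction without handling the body manually, and for the array question lr_xml_get_values() also accepts SelectAll=yes. To the best of my knowledge the matches are then saved as productUrl_1, productUrl_2, … and the function's return value is the number of matches; the queries below are illustrative only:

  int matches;   // declarations go at the top of the action in Loadrunner's C

  // LR11+ alternative: register the XPath extraction before the request.
  web_reg_save_param_xpath("ParamName=productUrl",
    "QueryString=//div[@class=\"name\"]/a/@href",
    "SelectAll=yes",            // save all matches as productUrl_1, productUrl_2, ...
    "NotFound=warning",
    LAST);

  // Array extraction with lr_xml_get_values: the return value is the match count.
  matches = lr_xml_get_values("XML={htmlbody}",
    "ValueParam=productUrl",
    "Query=//div[@class=\"name\"]/a/@href",
    "SelectAll=yes",
    LAST);
  lr_log_message("Number of matches: %d", matches);
  lr_log_message("Second match: %s", lr_eval_string("{productUrl_2}"));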
