Explaining IBM Watson NLP Prediction Outcomes using IBM Watson OpenScale

Apr 29, 2023

The Set..

IBM Watson Natural Language Processing is a suite of powerful APIs that helps organizations extract insights from unstructured data sources such as text documents, emails, and product reviews. These insights include language detection, sentiment analysis, entity recognition, keyword extraction, and relationship extraction.

For example, if Shakespeare were to write a review of a hotel he had stayed at, it might read something like this (well, according to ChatGPT):

Hark, fair Grand Hotel! A wondrous abode,
Whose staff so kind did bear a heavy load.
Our spacious chamber, with views so divine,
Did make us feel as if in Heaven’s shrine.
Facilities great, with food to delight,
Highly recommended, though bath was slight.

Running this hotel review through IBM Watson NLP sentiment analysis returns a positive sentiment.

The obvious questions, then, are: which words contributed to this positive sentiment? How do we ensure the quality and performance of the model over time? And how do these models fit into the overall AI governance lifecycle?

This is where IBM Watson OpenScale comes in. To find the words that contribute to this sentiment prediction, one can use IBM Watson OpenScale's explainability, which identifies the words (like Spacious, Fair, Grand, ...) that contribute positively or negatively to the predicted sentiment.
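To make the idea of per-word contributions concrete, here is a minimal sketch using leave-one-out (occlusion) attribution: each word is removed in turn, and the change in the positive score is recorded. This is a deliberately simple, swapped-in technique for illustration only; it is not OpenScale's explanation algorithm. The scorer `toy_positive_score` and its word lists are hypothetical stand-ins for the Watson NLP sentiment model and are not part of any IBM API.

```python
from typing import Callable, List, Tuple

# Hypothetical word lists standing in for the Watson NLP sentiment model.
POSITIVE = {"fair", "grand", "wondrous", "kind", "spacious", "divine", "great", "delight", "recommended"}
NEGATIVE = {"slight", "heavy"}


def toy_positive_score(words: List[str]) -> float:
    """Hypothetical scorer: fraction of sentiment-bearing words that are positive."""
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.5 if total == 0 else pos / total


def leave_one_out(text: str, score: Callable[[List[str]], float]) -> List[Tuple[str, float]]:
    """Attribute to each word the drop in score when that word is removed."""
    words = [w.strip(".,!?;:'\"").lower() for w in text.split()]
    words = [w for w in words if w]
    base = score(words)
    attributions = []
    for i, word in enumerate(words):
        without_word = words[:i] + words[i + 1:]
        # > 0: the word pushed toward positive; < 0: it pushed toward negative.
        attributions.append((word, base - score(without_word)))
    return attributions


review = "Hark, fair Grand Hotel! Our spacious chamber, though bath was slight."
attributions = leave_one_out(review, toy_positive_score)
for word, delta in sorted(attributions, key=lambda pair: pair[1], reverse=True):
    print(f"{word:10s} {delta:+.3f}")
```

With this toy lexicon, "fair", "grand" and "spacious" each receive a positive contribution and "slight" a negative one. A real explainer perturbs the input against the actual model, but the output has the same shape: a signed weight per word.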

Explaining the Watson NLP Sentiment Analysis Outcomes using IBM Watson OpenScale

Now, let's dig into how to connect IBM Watson OpenScale to IBM Watson NLP to monitor the sentiment outcomes, and then evaluate one such sentiment explanation with IBM Watson OpenScale.

The Configuration..

IBM Watson NLP provides a series of pre-trained models, accessible through a Python library, for performing language processing tasks on unstructured text. Examples include sentiment, entity extraction, tone classification, and text classification, as explained here.

As an example, this article performs sentiment analysis on incoming text using the Sentiment and Syntax models.

The code that does this work is then wrapped in a Python function for deployment, like this:

def detect_sentiment():
    # Imports live inside the function so they resolve in the deployment runtime.
    import watson_nlp
    from watson_nlp.toolkit import sentiment_analysis_utils as utils

    # Load the pre-trained models once, when the deployment initializes.
    syntax_model = watson_nlp.load('syntax_izumo_en_stock')
    sentiment_model = watson_nlp.load('sentiment_sentence-bert_multi_stock')

    def construct_predictions_response(document_sentiment):
        # Average the per-mention class probabilities into one document-level vector.
        label = document_sentiment['label']
        sentiment_mentions = document_sentiment['sentiment_mentions']
        count = len(sentiment_mentions) or 1  # guard against an empty mention list
        positive_prob = sum(m['sentimentprob']['positive'] for m in sentiment_mentions) / count
        neutral_prob = sum(m['sentimentprob']['neutral'] for m in sentiment_mentions) / count
        negative_prob = sum(m['sentimentprob']['negative'] for m in sentiment_mentions) / count

        # Keep the probability order fixed across responses: [positive, neutral, negative].
        return [label, [positive_prob, neutral_prob, negative_prob]]

    def sentiment_prediction(review_text):
        # Tokenize, lemmatize and POS-tag the text, then score each sentence.
        syntax_result = syntax_model.run(review_text, parsers=('token', 'lemma', 'part_of_speech'))
        sentence_sentiments = sentiment_model.run_batch(syntax_result.get_sentence_texts(), syntax_result.sentences)
        # Aggregate the sentence-level sentiments into a document-level result.
        document_sentiment = utils.predict_document_sentiment(sentence_sentiments,
                                                              sentiment_model.class_idxs).to_dict()
        return document_sentiment

    def score(payload):
        # Expected payload: {"input_data": [{"values": [[text], [text], ...]}]}
        prediction_values = []
        for value in payload["input_data"][0]["values"]:
            review_text = value[0]
            document_sentiment = sentiment_prediction(review_text)
            prediction_values.append(construct_predictions_response(document_sentiment))

        return {
            'predictions': [
                {
                    'fields': ['prediction', 'probability'],
                    'values': prediction_values,
                }
            ]
        }

    return score
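Because `watson_nlp` is only available in IBM's runtime, the sketch below reproduces just the pure-Python parts of `score()` so that the request payload and response shapes can be checked locally. `make_score` mirrors the closure pattern of `detect_sentiment()` but takes the predictor as an argument; `fake_document_sentiment` is a hypothetical stand-in for `sentiment_prediction()` that returns the same `label` and `sentiment_mentions[*]['sentimentprob']` keys the deployed code reads, with invented values and an illustrative label string.

```python
from typing import Any, Callable, Dict, List


def average_mention_probs(document_sentiment: Dict[str, Any]) -> List[Any]:
    """Average per-mention probabilities into [label, [positive, neutral, negative]]."""
    mentions = document_sentiment["sentiment_mentions"]
    n = len(mentions) or 1  # avoid division by zero when there are no mentions
    probs = [
        sum(m["sentimentprob"][key] for m in mentions) / n
        for key in ("positive", "neutral", "negative")
    ]
    return [document_sentiment["label"], probs]


def make_score(predict: Callable[[str], Dict[str, Any]]) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Same closure pattern as detect_sentiment(): return a score(payload) function."""
    def score(payload: Dict[str, Any]) -> Dict[str, Any]:
        rows = payload["input_data"][0]["values"]
        values = [average_mention_probs(predict(row[0])) for row in rows]
        return {"predictions": [{"fields": ["prediction", "probability"], "values": values}]}
    return score


def fake_document_sentiment(text: str) -> Dict[str, Any]:
    """Hypothetical stand-in for sentiment_prediction(); values are invented."""
    return {
        "label": "positive",
        "sentiment_mentions": [
            {"sentimentprob": {"positive": 0.9, "neutral": 0.05, "negative": 0.05}},
            {"sentimentprob": {"positive": 0.7, "neutral": 0.2, "negative": 0.1}},
        ],
    }


score = make_score(fake_document_sentiment)
payload = {"input_data": [{"fields": ["text"], "values": [["Hark, fair Grand Hotel!"]]}]}
response = score(payload)
print(response)
```

Injecting the predictor keeps the response-building logic testable without the Watson NLP models; in the deployed function the predictor is simply the closure over the loaded models.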

The Configuration — OpenScale and NLP Models

This Python function is then deployed to IBM Watson Machine Learning. Once it is deployed, configuring the deployment for monitoring with OpenScale takes only a few clicks or lines of code.

As part of the monitoring setup, we configured the Explainability and Quality monitors.

The workflow depicted above is implemented end to end in this notebook.

The Evaluations..

To perform the evaluation, log the text and its predicted sentiment with OpenScale and then run an explanation. To evaluate the quality of the model, log the predicted sentiment together with the ground-truth sentiment through OpenScale feedback logging; once that is done, you can run the Quality monitor to get the outcome shown below.
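Conceptually, the Quality monitor compares logged predictions with the ground-truth labels supplied through feedback logging. The sketch below is not OpenScale's implementation; it only computes one such metric, accuracy, plus a count of (ground truth, predicted) mismatches, assuming predictions and feedback labels are already aligned by position. The records are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def quality_summary(predicted: List[str], ground_truth: List[str]) -> Tuple[float, Counter]:
    """Return accuracy and a Counter of (ground_truth, predicted) mismatches."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("need at least one record")
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    errors = Counter((g, p) for p, g in zip(predicted, ground_truth) if p != g)
    return correct / len(predicted), errors


# Hypothetical feedback records: model prediction vs. human-labelled sentiment.
predicted = ["positive", "positive", "negative", "neutral"]
ground_truth = ["positive", "neutral", "negative", "neutral"]
accuracy, errors = quality_summary(predicted, ground_truth)
print(f"accuracy={accuracy:.2f}", dict(errors))
```

The mismatch counter shows which way the model errs (here, a neutral review labelled positive), which is the kind of signal to look at when the Quality monitor flags a drop.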

In Summary

The notebook here does the following:

Develops code that accepts a set of texts and analyzes them with the Sentiment and Syntax models from Watson NLP
Builds a Python function and deploys it to Watson Machine Learning
Monitors the deployed Python function with OpenScale

Resources

Notebook to build, deploy and monitor the NLP Model — Monitoring
IBM Natural Language Processing Library
IBM Watson OpenScale and SDK
IBM Watson Machine Learning and SDK

That’s all for now!