Microsoft Copilot

Tested iOS app version 20240206.2.180000000 – chose the ‘Use GPT-4’ option – March 2024 

Answer to Question 1 

Screenshot of Microsoft Copilot answer to Question 1 – When was the Battle of Hastings? – showing that it provides useful information, sources and an image. I found that the Explore button over the image didn’t actually work on this answer (or on others), so I took a point off for UX/UI.


Accuracy Issues 

Like the other AI chatbots, this one had an issue with the follow-up question to Question 6 – How many times has Francis Ford Coppola won the Best Director Oscar? – incorrectly saying twice, as the others do, when the answer is in fact once.

There was also an accuracy issue with the follow-up question for Question 9 – Who was the first manager of Tottenham Hotspur FC? The answer itself was correct – Frank Brettell – but the explanatory text was a bit garbled and wrong, as seen in the screenshot. In this case, it says he left the club in 1899, yet also that he led Tottenham to win the Southern League in 1899–1900 and the FA Cup in 1900–01 – something which doesn’t make sense. 

Screenshot of the Microsoft Copilot answer to the Question 9 follow-up, with the garbled explanatory text highlighted.

Microsoft Copilot scores were as follows:

  • Ease of Use / Usability / UX / UI – Score = 6
  • Fairly low score of 6, as there were quite a few issues with the UI: for answers with an image at the top, the Explore button doesn’t work; it’s also sometimes difficult to see the whole answer, as there are issues scrolling up and down within the answer content; and the options within the answer – such as Copy and Share – sometimes don’t respond to tapping. 
  • Accuracy – Score = 8
  • Some points off for errors in answers to some of the questions, plus the explanatory text in one answer was a bit garbled 
  • Response time – Score = 8
  • Often a little slow, so some points off
  • Error handling – Score = 10
  • Handled garbled questions, nonsense questions and garbage data well
  • Source quotation/Provenance – Score = 8
  • Gave 3–5 sources where relevant, and most of the ones I tapped were relevant and worked OK, but some had the UI issues mentioned above, so a couple of points off
  • Working with the response e.g. further options – Score = 7
  • Provides text, images, sources and share options for some of the answers, but the UX/UI issues mentioned above make these difficult to use or see 
  • Overall rating and score – Overall Score = 7.5
  • Good responses, although sometimes a little slow with them; provides sources and good text and descriptions where relevant. Some accuracy errors and poor UX/UI issues knock the score down a fair bit. 
  • Ranking = 2
  • Overview – Better responses than some of the others, but accuracy errors and UX/UI issues keep the score at 7.5

More info on Microsoft Copilot – main website

According to the Wikipedia page on Microsoft Copilot, it uses the Microsoft Prometheus model, which, according to Microsoft, combines a component called the Orchestrator with Bing and OpenAI’s GPT-4.


All the posts

I have created a post for each AI chatbot test in order to fully explore the test results, along with a Results post, where I crown the winning AI chatbot. You can jump straight there or read each test post via the links below.

Testing the AI Chatbots

Testing the AI Chatbots: Questions

Testing the AI Chatbots: Perplexity AI

Testing the AI Chatbots: OpenAI ChatGPT

Testing the AI Chatbots: Microsoft Copilot

Testing the AI Chatbots: Google Gemini

Testing the AI Chatbots: Results


Note: The heading images used in these posts were created via Bing Image Creator 

About my testing services: iOS App Testing / Android App Testing / Website Testing / AI Chatbot Testing