Wednesday, March 9, 2011

C++ Console Application to Get Comments from a Microsoft Word File

Update 2019-02-19:  Many thanks to Brian Thomas @ Kutana Software for helping get this running. The C++ project is on GitHub at https://github.com/travelmarx/travelmarx-blog/tree/master/ExtractComments.

Update 2019-02-11:  Based on a comment received (see below), we reviewed this code and found a number of problems.

  1. The code below is presented badly because it was escaped poorly when it was inserted.
  2. The code itself (once corrected for the escaping problems) gave a number of syntax errors when put into VS 2017.
  3. The status of OPC has changed such that getting the C++ project to build in VS wasn't obvious. We think our trouble was around getting the right System.IO.Packaging and WindowsBase references. Some of the original SDK Samples for Windows 7 are on GitHub here.

What to do? First, here is the intended code on GitHub. This addresses problems 1 and 2. We made some slight changes to the code to remove syntax errors (using SysAllocString in several spots) and changed stdafx.h to pch.h. The name of the precompiled header isn't fixed; pch.h is simply what VS 2017 uses for C++ console apps.

For problem 3, we are working on a possible solution (see update above). If you are interested in extracting comments using .NET (C# and VB), see How to: Retrieve comments from a word processing document (Open XML SDK).

Begin Initial Post

Output from Comment Extraction

The goal of this post is to show how to construct a C++ console application that extracts comments from a Word document. This post builds on a previous post that showed how to extract comments from a Microsoft Word document (2007 or later). In the previous post, Getting Comments from a Microsoft Word File: Leveraging the OPC Format, we did the extraction by changing the extension of the Word document and accessing the files directly in the ZIP structure. In this post, we take the Word document as is and use a console application, written in C++/COM and using the OPC API, to access the comments directly. The code shown here was run in Visual Studio 2010 on Windows 7.

The key to the console application logic is understanding the document parts of the Word XML format. When we cracked open the Word ZIP file, we could get the comments file directly. Using the API, we have to follow the pattern the API sets out. The pattern for a Word document is discussed here on MSDN and here. The main document part (../word/document.xml) is the main part in the package, and it has a relationship to the comments part (../word/comments.xml) that can be used to obtain the comments. On our first try, we kept trying to get the comments part directly from the package relationships, which didn't work. However, once we got the document part from the package (see the FindPartByRelationshipType function in the program below), we could use the same logic to get the comments part from the document part.

A crucial part of the console application is the set of content type and relationship type definitions that link parts to parts. These are defined in the header file (ExtractComments.h) for this application. For example, the content type of the comments part is:

application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml

The relationship type that links the document part to the comments part is:

http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments

Note: In this console application we did not deal with the fact that comments in a Word document can contain more than just text. In the previous post we did deal with hyperlinks as an example of content besides text in comments. Similar improvements would need to be added to this code. Specifically, if you look at ECMA-376 Part 1 for the docx format, you can find the details of what a comment can contain, which includes charts, diagrams, hyperlinks, images, video, and embedded content.

The code shown here was built starting from the OPC SDK Samples for Windows 7. In particular, we started from the SetAuthor project inside AllOPCSamples.zip and changed it to suit our purpose. The console application takes a file name as an argument. In Visual Studio, set the file name under the configuration properties of the project as shown below.

Visual Studio Console App Configuration

The code is shown below, along with links for downloading it. Before getting to the code, here is a sketch of its logic in pseudo-code. We use the syntax (x,y) -> z to mean that x and y are used to return z. It is a bit simplistic, but it helps clarify what is coming in and what is going out.
//pseudo-code
wmain
    COM Initialization of Thread
    CoCreateInstance of Opc Factory : () -> factory
    Load Package : (factory, fileName) -> package
    Find Document Part in Package : (package) -> documentPart
    Find Comments Part in Package : (package, documentPart) -> commentsPart
    Print Core Properties : (package) -> output
    Print Comments : (commentsPart) -> output
 
Load Package
(factory, fileName) -> package
    Create Stream on File : (factory, fileName, options) -> sourceFileStream
    Read Package from Stream : (factory, sourceFileStream, options) -> package
 
Find Document In Package
(package) -> documentPart
    relationshipType = http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument
    contentType = application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml
    Find Part by Relationship Type : (package, NULL, relationshipType, contentType) -> documentPart
 
Find Core Properties Part
(package) -> corePropertiesPart
    relationshipType = http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties
    contentType = application/vnd.openxmlformats-package.core-properties+xml
    Find Part by Relationship Type : (package, NULL, relationshipType, contentType) -> corePropertiesPart
 
Find Comments in Package
(package, documentPart) -> commentsPart
    relationshipType = http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments
    contentType = application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml
    Find Part by Relationship Type : (package, documentPart, relationshipType, contentType) -> commentsPart
 
Find Part By Relationship Type
(package, parentPart, relationshipType, contentType) -> part
    Get Part Set : (package) -> partSet
    Get Relationship Set
    if (parentPart == NULL) then (package) -> packageRels
    else (parentPart) -> packageRels
    Get Enumerator for Type : (packageRels, relationshipType) -> packageRelsEnum
    Get Current : (packageRelsEnum) -> currentRel
    Resolve Target Uri to Part : (currentRel) -> partUri
    Part Exists : (partSet, partUri) -> partExists
    if (partExists) {
        Get Current Part : (partSet, partUri) -> currentPart
        Get Current Part Content Type : (currentPart) -> currentContentType
        if (currentContentType equals contentType)
        { // found the part }
    }
 
Resolve Target URI to Part
(relationship) -> resolvedUri
 
Print Comments
(commentsPart) -> output
    Get DOM from Part : (commentsPart, namespace) -> commentsDom
    Select Nodes : (commentsDom) -> commentsNodeList
    for each {
        Get Attributes of Comment Node
        Get Text of Comment Node
    }
 
Get Text of Comment Node
(node) -> output
 
Get Attributes of Comment Node
(node) -> output
 
Print Core Properties
(package) -> output
    Find Core Properties : (package) -> corePropertiesPart
    Get DOM from Part : (corePropertiesPart, namespace) -> corePropertiesDom
    Select Single Node : (corePropertiesDom, nodeName) -> nodeFound
    // work with nodeFound
 
Get DOM from Part
(part, namespace) -> XmlDocument


The header file for the console application can be downloaded here and is shown below.
#include "msopc.h"
#include "msxml6.h"
#include "stdafx.h"
 
HRESULT LoadPackage(IOpcFactory *factory, LPCWSTR packageName, IOpcPackage **outPackage);
HRESULT FindDocumentInPackage(IOpcPackage *package, IOpcPart  **documentPart);
HRESULT FindCommentsInPackage(IOpcPackage *package, IOpcPart  *parentPart, IOpcPart  **documentPart);
HRESULT FindPartByRelationshipType(IOpcPackage *package, IOpcPart *parentPart, LPCWSTR relationshipType, LPCWSTR contentType, IOpcPart **part);
HRESULT ResolveTargetUriToPart(IOpcRelationship *relativeUri, IOpcPartUri **resolvedUri);
HRESULT PrintCoreProperties(IOpcPackage *package);
HRESULT PrintComments(IOpcPart *part);
HRESULT GetAttributesOfCommentNode(IXMLDOMNode *node);
HRESULT GetTextofCommentNode(IXMLDOMNode *node);
HRESULT FindCorePropertiesPart(IOpcPackage *package, IOpcPart **part);
HRESULT DOMFromPart(IOpcPart *part, LPCWSTR selectionNamespaces, IXMLDOMDocument2 **document);
 
static const WCHAR g_officeDocumentRelationshipType[] =
    L"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
static const WCHAR g_wordProcessingContentType[] =
    L"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";
static const WCHAR g_corePropertiesRelationshipType[] =
    L"http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties";
static const WCHAR g_corePropertiesContentType[] =
    L"application/vnd.openxmlformats-package.core-properties+xml";
static const WCHAR g_commentsRelationshipType[] =
    L"http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments";
static const WCHAR g_commentsContentType[] =
    L"application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml";
static const WCHAR g_corePropertiesSelectionNamespaces[] =
    L"xmlns:dc='http://purl.org/dc/elements/1.1/' "
    L"xmlns:dcterms='http://purl.org/dc/terms/' "
    L"xmlns:dcmitype='http://purl.org/dc/dcmitype/'";
// The w prefix must be declared because PrintComments queries //w:comment.
static const WCHAR g_commentsSelectionNamespaces[] =
    L"xmlns:o='urn:schemas-microsoft-com:office:office' "
    L"xmlns:v='urn:schemas-microsoft-com:vml' "
    L"xmlns:w10='urn:schemas-microsoft-com:office:word' "
    L"xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'";


The main code file for the console application can be downloaded here and is shown below.
// ExtractComments.cpp : Defines the entry point for the console application.
 
#include "ExtractComments.h"
#include "stdio.h"
#include "windows.h"
#include "shlobj.h"
#include <iostream>
#include "util.h"
using namespace std;
 
int wmain(int argc, wchar_t* argv[])
{
 if (argc != 2)
 {
  wprintf(L"Usage: ExtractComments.exe <filename>\n");
  exit(0);
 }
 wprintf(L"Starting.\n");
 LPCWSTR pFileName = argv[1];
 HRESULT hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
 
 if (SUCCEEDED(hr))
 {
  IOpcPackage * package = NULL;
  IOpcPart * documentPart = NULL;
  IOpcFactory * factory = NULL;
  hr = CoCreateInstance(
      __uuidof(OpcFactory),
      NULL,
      CLSCTX_INPROC_SERVER,
      __uuidof(IOpcFactory),
      (LPVOID*)&factory
      );
  if (SUCCEEDED(hr))
  {
   wprintf(L"Created factory.\n");
   hr = ::LoadPackage(factory, pFileName, &package);
   // See command arguments in project properties for specification of file to read.
  }
  if (SUCCEEDED(hr))
  {
   wprintf(L"Loaded package.\n");
   hr = ::FindDocumentInPackage(package, &documentPart);
 
  }
  IOpcPart *commentsPart = NULL;
  if (SUCCEEDED(hr))
  {
   wprintf(L"Found document in package.\n");
   hr = ::FindCommentsInPackage(package, documentPart, &commentsPart);
  }
  if (SUCCEEDED(hr))
  {
   wprintf(L"Found comments in package.\n");
   hr = ::PrintCoreProperties(package);
  }
  if (SUCCEEDED(hr))
  {
   wprintf(L"Found core properties in package.\n");
   hr = ::PrintComments(commentsPart);
  }
  if (SUCCEEDED(hr))
  {
   wprintf(L"Printed comments.\n");
  }
 
  // Release resources
  if (package)
  {
   package->Release();
   package = NULL;
  }
 
  if (documentPart)
  {
   documentPart->Release();
   documentPart = NULL;
  }
 
  if (factory)
  {
   factory->Release();
   factory = NULL;
  }
  CoUninitialize();
 }
 return 0;
}
 
HRESULT LoadPackage(
 IOpcFactory *factory,
 LPCWSTR packageName,
 IOpcPackage **outPackage)
{
 IStream * sourceFileStream = NULL;
 HRESULT hr = factory->CreateStreamOnFile(
     packageName,
     OPC_STREAM_IO_READ,
     NULL,
     0,
     &sourceFileStream);
 if (SUCCEEDED(hr))
 {
  hr = factory->ReadPackageFromStream(
     sourceFileStream,
     OPC_CACHE_ON_ACCESS,
     outPackage);
 }
 if (sourceFileStream)
 {
  sourceFileStream ->Release();
  sourceFileStream = NULL;
 }
 return hr;
}
HRESULT FindDocumentInPackage(
 IOpcPackage *package,
 IOpcPart   **documentPart)
{
  return ::FindPartByRelationshipType(
  package,
  NULL,
  g_officeDocumentRelationshipType,
  g_wordProcessingContentType,
  documentPart);
 
}
HRESULT FindCommentsInPackage(
 IOpcPackage *package,
 IOpcPart   *documentPart,
 IOpcPart   **commentsPart)
{
  return ::FindPartByRelationshipType(
  package,
  documentPart,
  g_commentsRelationshipType,
  g_commentsContentType,
  commentsPart);
 
}
HRESULT FindCorePropertiesPart(
  IOpcPackage * package,
  IOpcPart **part)
{
 return ::FindPartByRelationshipType(
    package,
    NULL,
    g_corePropertiesRelationshipType,
          g_corePropertiesContentType,
    part);
}
HRESULT FindPartByRelationshipType(
 IOpcPackage *package,
 IOpcPart *parentPart,
 LPCWSTR relationshipType,
 LPCWSTR contentType,
 IOpcPart **part)
{
 *part = NULL;
 IOpcRelationshipSet * packageRels = NULL;
 IOpcRelationshipEnumerator * packageRelsEnum = NULL;
 IOpcPartSet * partSet = NULL;
 BOOL hasNext = false;
 
 HRESULT hr = package->GetPartSet(&partSet);
 
 if (SUCCEEDED(hr))
 {
  if (parentPart == NULL)
  {
   hr = package->GetRelationshipSet(&packageRels);
  }
  else
  {
   hr = parentPart->GetRelationshipSet(&packageRels);
  }
 }
 if (SUCCEEDED(hr))
 {
  hr = packageRels->GetEnumeratorForType(
    relationshipType,
    &packageRelsEnum);
 }
 if (SUCCEEDED(hr))
 {
  hr = packageRelsEnum->MoveNext(&hasNext);
 }
 while (SUCCEEDED(hr) && hasNext && *part == NULL)
 {
  IOpcPartUri * partUri = NULL;
  IOpcRelationship * currentRel = NULL;
  BOOL partExists = FALSE;
 
  hr = packageRelsEnum->GetCurrent(&currentRel);
  if (SUCCEEDED(hr))
  {
   hr = ::ResolveTargetUriToPart(currentRel, &partUri);
  }
  if (SUCCEEDED(hr))
  {
   hr = partSet->PartExists(partUri, &partExists);
  }
  if (SUCCEEDED(hr) && partExists)
  {
   LPWSTR currentContentType = NULL;
   IOpcPart * currentPart = NULL;
   hr = partSet->GetPart(partUri, &currentPart);
   if (SUCCEEDED(hr))
   {
    // Print the part name for tracing, then free what was allocated for it.
    IOpcPartUri * name = NULL;
    BSTR displayUri = NULL;
    if (SUCCEEDED(currentPart->GetName(&name)) &&
     SUCCEEDED(name->GetDisplayUri(&displayUri)))
    {
     wprintf(L"currentPart: %s\n", displayUri);
    }
    SysFreeString(displayUri);
    if (name)
    {
     name->Release();
    }
   }
   if (SUCCEEDED(hr) && contentType != NULL)
   {
    hr = currentPart->GetContentType(&currentContentType);
    if (SUCCEEDED(hr))
    {
     wprintf(L"contentType: %s\n", currentContentType);
    }
    if (SUCCEEDED(hr) && 0 == wcscmp(contentType, currentContentType))
    {
     *part = currentPart;  // found what we are looking for
     currentPart = NULL;
    }
   }
   if (SUCCEEDED(hr) && contentType == NULL)
   {
    *part = currentPart;
    currentPart = NULL;
   }
   CoTaskMemFree(static_cast<LPVOID>(currentContentType));
   if (currentPart)
   {
    currentPart->Release();
    currentPart = NULL;
   }
  }
  if (SUCCEEDED(hr))
  {
   hr = packageRelsEnum->MoveNext(&hasNext);
  }
  if (partUri)
        {
            partUri->Release();
            partUri = NULL;
        }
 
        if (currentRel)
        {
            currentRel->Release();
            currentRel = NULL;
        }
 }
     if (SUCCEEDED(hr) && *part == NULL)
    {
        // Loop complete without errors and no part found.
        hr = E_FAIL;
    }
 
    // Release resources
    if (packageRels)
    {
        packageRels->Release();
        packageRels = NULL;
    }
 
    if (packageRelsEnum)
    {
        packageRelsEnum->Release();
        packageRelsEnum = NULL;
    }
 
    if (partSet)
    {
        partSet->Release();
        partSet = NULL;
    }
 return hr;
}
HRESULT ResolveTargetUriToPart(
 IOpcRelationship *relationship,
 IOpcPartUri **resolvedUri
 )
{
 IOpcUri * sourceUri = NULL;
 IUri * targetUri = NULL;
 OPC_URI_TARGET_MODE targetMode;
 HRESULT hr = relationship->GetTargetMode(&targetMode);
 if (SUCCEEDED(hr) && targetMode != OPC_URI_TARGET_MODE_INTERNAL)
 {
  return E_FAIL;
 }
 if (SUCCEEDED(hr))
 {
  hr = relationship->GetTargetUri(&targetUri);
 }
 if (SUCCEEDED(hr))
 {
  hr = relationship->GetSourceUri(&sourceUri);
 }
 if (SUCCEEDED(hr))
 {
  hr = sourceUri->CombinePartUri(targetUri, resolvedUri);
 }
 if (sourceUri)
 {
  sourceUri->Release();
  sourceUri = NULL;
 }
 if (targetUri)
 {
  targetUri->Release();
  targetUri = NULL;
 }
 return hr;
}
HRESULT PrintComments(
 IOpcPart *commentsPart)
{
 IXMLDOMDocument2 * commentsDom = NULL;
 
 HRESULT hr = ::DOMFromPart(
    commentsPart,
    g_commentsSelectionNamespaces,
    &commentsDom);
 if (SUCCEEDED(hr))
 {
  IXMLDOMNodeList * commentsNodeList = NULL;
  BSTR query = ::SysAllocString(L"//w:comment");
  hr = commentsDom->selectNodes(
   query,
   &commentsNodeList);
  ::SysFreeString(query);
  if (SUCCEEDED(hr) && commentsNodeList != NULL)
  {
   // Iterate through comment nodes
   long nodeListLength = 0;
   hr = commentsNodeList->get_length(&nodeListLength);
 
   for (long i = 0; SUCCEEDED(hr) && i < nodeListLength; i++)
   {
    IXMLDOMNode * item = NULL;
    hr = commentsNodeList->get_item(i, &item);
    if (FAILED(hr) || item == NULL)
    {
     break;
    }
 
    ::GetAttributesOfCommentNode(item);
    ::GetTextofCommentNode(item);
    item->Release();
   }
 
  }
  // Release resources
        if (commentsNodeList)
        {
            commentsNodeList->Release();
            commentsNodeList = NULL;
        }
 }
 // Release resources
    if (commentsPart)
    {
        commentsPart->Release();
        commentsPart = NULL;
    }
 
    if (commentsDom)
    {
        commentsDom->Release();
        commentsDom  = NULL;
    }
 
 return hr;
}
HRESULT GetTextofCommentNode(
 IXMLDOMNode *node
 )
{
 BSTR bstrQueryString1 = ::SysAllocString(L"w:p");
 BSTR bstrQueryString2 = ::SysAllocString(L"w:r");
 IXMLDOMNodeList *resultList1 = NULL;
 long resultLength1 = 0;
 
 // Select the paragraphs (w:p) of the comment.
 HRESULT hr = node->selectNodes(bstrQueryString1, &resultList1);
 if (SUCCEEDED(hr) && resultList1)
 {
  hr = resultList1->get_length(&resultLength1);
 }
 for (long i = 0; SUCCEEDED(hr) && i < resultLength1; i++)
 {
  IXMLDOMNode *pNode = NULL;
  IXMLDOMNodeList *resultList2 = NULL;
  long resultLength2 = 0;
 
  hr = resultList1->get_item(i, &pNode);
  if (SUCCEEDED(hr) && pNode)
  {
   //wprintf(L"--Found a w:p node.\n");
   wprintf(L"\n");
   // Select the runs (w:r) of the paragraph.
   hr = pNode->selectNodes(bstrQueryString2, &resultList2);
  }
  if (SUCCEEDED(hr) && resultList2)
  {
   hr = resultList2->get_length(&resultLength2);
  }
  for (long j = 0; SUCCEEDED(hr) && j < resultLength2; j++)
  {
   IXMLDOMNode *rNode = NULL;
   BSTR commentText = NULL;
 
   hr = resultList2->get_item(j, &rNode);
   if (SUCCEEDED(hr) && rNode)
   {
    hr = rNode->get_text(&commentText);
   }
   if (SUCCEEDED(hr) && commentText)
   {
    //wprintf(L"----Found a w:r node. \n");
    wprintf(L"%s", commentText);
   }
   ::SysFreeString(commentText);
   if (rNode)
   {
    rNode->Release();
   }
  }
  // Release per-paragraph resources
  if (resultList2)
  {
   resultList2->Release();
  }
  if (pNode)
  {
   pNode->Release();
  }
 }
 
 // Release resources
 ::SysFreeString(bstrQueryString1);
 ::SysFreeString(bstrQueryString2);
 if (resultList1)
 {
  resultList1->Release();
 }
 return hr;
}
HRESULT GetAttributesOfCommentNode(
 IXMLDOMNode *node
 )
{
 VARIANT commentAuthorStr, commentDateStr;
 ::VariantInit(&commentAuthorStr);
 ::VariantInit(&commentDateStr);
 BSTR bstrAttributeAuthor = ::SysAllocString(L"w:author");
 BSTR bstrAttributeDate = ::SysAllocString(L"w:date");
 
 // Get author and date attribute of the item.
 IXMLDOMNamedNodeMap *attribs = NULL;
 IXMLDOMNode *AttrNode = NULL;
 HRESULT hr = node->get_attributes(&attribs);
 if (SUCCEEDED(hr) && attribs)
 {
  hr = attribs->getNamedItem(bstrAttributeAuthor, &AttrNode);
  if (SUCCEEDED(hr) && AttrNode)
  {
   AttrNode->get_nodeValue(&commentAuthorStr);
   AttrNode->Release();
   AttrNode = NULL;
  }
  hr = attribs->getNamedItem(bstrAttributeDate, &AttrNode);
  if (SUCCEEDED(hr) && AttrNode)
  {
   AttrNode->get_nodeValue(&commentDateStr);
   AttrNode->Release();
   AttrNode = NULL;
  }
  attribs->Release();
  attribs = NULL;
 }
 
 // Print a placeholder when an attribute is absent or is not a string.
 wprintf(L"\n-------------------------------------------------");
 wprintf(L"\nComment::\nAuthor: %s, Date: %s\n",
  (commentAuthorStr.vt == VT_BSTR) ? commentAuthorStr.bstrVal : L"[missing]",
  (commentDateStr.vt == VT_BSTR) ? commentDateStr.bstrVal : L"[missing]");
 
 ::VariantClear(&commentAuthorStr);
 ::VariantClear(&commentDateStr);
 ::SysFreeString(bstrAttributeAuthor);
 ::SysFreeString(bstrAttributeDate);
 
 return hr;
}
HRESULT PrintCoreProperties(
 IOpcPackage *package)
{
 IOpcPart * corePropertiesPart = NULL;
 IXMLDOMDocument2 * corePropertiesDom = NULL;
 
 HRESULT hr = ::FindCorePropertiesPart(
     package,
     &corePropertiesPart);
 if (SUCCEEDED(hr))
 {
  hr = ::DOMFromPart(
    corePropertiesPart,
    g_corePropertiesSelectionNamespaces,
    &corePropertiesDom);
 }
 if (SUCCEEDED(hr))
 {
  IXMLDOMNode * creatorNode = NULL;
  BSTR text = NULL;
  BSTR query = ::SysAllocString(L"//dc:creator");
  hr = corePropertiesDom->selectSingleNode(
    query,
    &creatorNode);
  ::SysFreeString(query);
  if (SUCCEEDED(hr) && creatorNode != NULL)
  {
   hr = creatorNode->get_text(&text);
  }
  if (SUCCEEDED(hr))
  {
   wprintf(L"Author: %s\n", (text != NULL) ? text : L"[missing author info]");
  }
  // Release resources
        if (creatorNode)
        {
            creatorNode->Release();
            creatorNode = NULL;
        }
 
        SysFreeString(text);
 
  // put other code here to read other properties
 }
 // Release resources
    if (corePropertiesPart)
    {
        corePropertiesPart->Release();
        corePropertiesPart = NULL;
    }
 
    if (corePropertiesDom)
    {
        corePropertiesDom->Release();
        corePropertiesDom  = NULL;
    }
 return hr;
}
 
HRESULT DOMFromPart(
 IOpcPart * part,
 LPCWSTR selectionNamespaces,
 IXMLDOMDocument2 **document)
{
 IXMLDOMDocument2 * partContentXmlDocument = NULL;
 IStream * partContentStream = NULL;
 
 HRESULT hr = CoCreateInstance(
     __uuidof(DOMDocument60),
     NULL,
     CLSCTX_INPROC_SERVER,
     __uuidof(IXMLDOMDocument2),
     (LPVOID*)&partContentXmlDocument);
 if (SUCCEEDED(hr) && selectionNamespaces)
 {
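  AutoVariant v;
  BSTR languageProperty = ::SysAllocString(L"SelectionLanguage");
  hr = v.SetBSTRValue(L"XPath");
  if (SUCCEEDED(hr))
  {
   hr = partContentXmlDocument->setProperty(languageProperty, v);
  }
  ::SysFreeString(languageProperty);
  if (SUCCEEDED(hr))
  {
   AutoVariant v;
   BSTR namespacesProperty = ::SysAllocString(L"SelectionNamespaces");
   hr = v.SetBSTRValue(selectionNamespaces);
   if (SUCCEEDED(hr))
   {
    hr = partContentXmlDocument->setProperty(namespacesProperty, v);
   }
   ::SysFreeString(namespacesProperty);
  }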
 }
 if (SUCCEEDED(hr))
 {
  hr = part->GetContentStream(&partContentStream);
 }
 if (SUCCEEDED(hr))
 {
  VARIANT_BOOL isSuccessful = VARIANT_FALSE;
  AutoVariant vStream;
  vStream.SetObjectValue(partContentStream);
  hr = partContentXmlDocument->load(vStream, &isSuccessful);
  if (SUCCEEDED(hr) && isSuccessful == VARIANT_FALSE)
  {
   hr = E_FAIL;
  }
 }
 if (SUCCEEDED(hr))
 {
  *document = partContentXmlDocument;
  partContentXmlDocument = NULL;
 }
 // Release resources
    if (partContentXmlDocument)
    {
        partContentXmlDocument->Release();
        partContentXmlDocument = NULL;
    }
 
    if (partContentStream)
    {
        partContentStream->Release();
        partContentStream = NULL;
    }
 return hr;
}

5 comments:

  1. Thanks for the sample code. I had to copy what was on screen as the download link doesn't work any more. I got loads of compilation errors. In particular, at line 337 there's code that looks like this (which makes no sense):

    for (int i = 0; i < item =" NULL;" hr =" commentsNodeList-">get_item(i, &item);
    SUCCEEDED(hr) ? 0 : throw hr;

    ::GetAttributesOfCommentNode(item);
    ::GetTextofCommentNode(item);
    }

    But I think it should look like this instead:

    for (int i = 0; i < nodeListLength; ++i)
    {
    IXMLDOMNode *item = NULL;
    hr = commentsNodeList -> get_item(i, &item);
    SUCCEEDED(hr) ? 0 : throw hr;

    ::GetAttributesOfCommentNode(item);
    ::GetTextofCommentNode(item);
    }

    What's happened is that when the correct code was pasted in everything between matching pairs of < and > characters was treated as html, and got removed. Does that sound likely?

  2. You are right, something got messed up with the code. I found the original, put it into Visual Studio, and with a few changes have no syntax errors. Now just trying to work through a build issue - we'll update this post soon.

  3. See update added at start of post.

  4. Is issue 3 still a problem? I got the original code to build and run in VS2017 - after I'd fixed the escaping issue - I can supply a source code zip if you like...

  5. VS2017 and Windows 10? I had problems building that stumped me; I'm missing something. You can send the zip to travelmarx at live dot com. Thanks!
