This class has a getAttribute method.
Assume that a DOMNode object $ref contains an anchor taken from a DOMNodeList. Then
$url = $ref->getAttribute('href');
returns the URL stored in the anchor's href attribute.
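A minimal sketch of the idea above, assuming an inline HTML string; the helper name `collectHrefs` is hypothetical. The `instanceof DOMElement` check is there because `getAttribute()` is defined on DOMElement, not on every DOMNode.

```php
<?php
// Hypothetical helper: collect the href of every <a> element in an HTML string.
function collectHrefs(string $html): array
{
    $doc = new DOMDocument();
    // Suppress the warnings libxml raises for imperfect real-world HTML.
    @$doc->loadHTML($html);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $ref) {
        // Only elements carry attributes, so check the type first.
        if ($ref instanceof DOMElement) {
            $href = $ref->getAttribute('href');
            if ($href !== '') {
                $urls[] = $href;
            }
        }
    }
    return $urls;
}

print_r(collectHrefs('<p><a href="http://www.example.com/">one</a> <a name="x">two</a></p>'));
```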
The DOMNode class
(PHP 4 >= 4.1.0)
Class synopsis
Properties
- nodeName
  Returns the most accurate name for the current node type.
- nodeValue
  The value of this node, depending on its type.
- nodeType
  Gets the type of the node. One of the XML_xxx_NODE constants.
- parentNode
  The parent of this node.
- childNodes
  A DOMNodeList that contains all children of this node. If there are no children, this is an empty DOMNodeList.
- firstChild
  The first child of this node. If there is no such node, this returns NULL.
- lastChild
  The last child of this node. If there is no such node, this returns NULL.
- previousSibling
  The node immediately preceding this node. If there is no such node, this returns NULL.
- nextSibling
  The node immediately following this node. If there is no such node, this returns NULL.
- attributes
  A DOMNamedNodeMap containing the attributes of this node (if it is a DOMElement), or NULL otherwise.
- ownerDocument
  The DOMDocument object associated with this node.
- namespaceURI
  The namespace URI of this node, or NULL if it is unspecified.
- prefix
  The namespace prefix of this node, or NULL if it is unspecified.
- localName
  Returns the local part of the qualified name of this node.
- baseURI
  The absolute base URI of this node, or NULL if the implementation was not able to obtain an absolute URI.
- textContent
  This attribute returns the text content of this node and its descendants.
Notes
Note:
The DOM extension uses UTF-8 encoding. Use utf8_encode() and utf8_decode() to work with texts encoded in ISO-8859-1, or Iconv for other encodings.
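As a rough sketch of the encoding note, the snippet below converts an ISO-8859-1 string to UTF-8 before handing it to DOM. It uses iconv() rather than utf8_encode(), which is deprecated as of PHP 8.2; the helper name `latin1ToUtf8` and the sample string are assumptions made for illustration, and the iconv extension is assumed to be available.

```php
<?php
// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
function latin1ToUtf8(string $latin1): string
{
    // iconv() returns false on failure; fall back to an empty string here.
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
    return $utf8 === false ? '' : $utf8;
}

$doc  = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('name'));

// "\xE9" is "é" in ISO-8859-1; DOM expects the UTF-8 bytes "\xC3\xA9".
$root->appendChild($doc->createTextNode(latin1ToUtf8("Caf\xE9")));

echo $doc->saveXML($root), "\n";
```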
Table of Contents
- DOMNode::appendChild - Adds a new child at the end of the children
- DOMNode::C14N - Canonicalizes nodes to a string
- DOMNode::C14NFile - Canonicalizes nodes to a file
- DOMNode::cloneNode - Clones a node
- DOMNode::getLineNo - Gets the line number of a node
- DOMNode::getNodePath - Gets an XPath for a node
- DOMNode::hasAttributes - Checks whether the node has attributes
- DOMNode::hasChildNodes - Checks whether the node has children
- DOMNode::insertBefore - Adds a new child before a reference node
- DOMNode::isDefaultNamespace - Checks whether the specified namespace URI is the default namespace
- DOMNode::isSameNode - Indicates whether two nodes are the same node
- DOMNode::isSupported - Checks whether a feature is supported for the specified version
- DOMNode::lookupNamespaceURI - Returns the namespace URI for a given prefix
- DOMNode::lookupPrefix - Returns the namespace prefix for a given namespace URI
- DOMNode::normalize - Normalizes the node
- DOMNode::removeChild - Removes a child from the list of children
- DOMNode::replaceChild - Replaces a child
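As a small illustration of two of the methods listed above, the following sketch contrasts appendChild() with insertBefore(). The element names and contents are made up for the example.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<list><item>b</item></list>');
$list = $doc->documentElement;

// appendChild() places the new node after the existing children.
$list->appendChild($doc->createElement('item', 'c'));

// insertBefore() places the new node in front of the reference node.
$list->insertBefore($doc->createElement('item', 'a'), $list->firstChild);

// Concatenate the text of each child to show the resulting order.
$order = '';
foreach ($list->childNodes as $child) {
    $order .= $child->textContent;
}
echo $order, "\n"; // abc
```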
In older PHP versions (before 5.6.1, when textContent became writable) you cannot simply overwrite $textContent to replace the text content of a DOMNode, even though the missing readonly flag suggests you can. Instead you have to do something like this:
<?php
$node->removeChild($node->firstChild);
$node->appendChild(new DOMText('new text content'));
?>
This example shows what happens:
<?php
$doc = new DOMDocument();
$doc->loadXML('<node>old content</node>'); // loadXML() is an instance method
$node = $doc->getElementsByTagName('node')->item(0);
echo "Content 1: ".$node->textContent."\n";
$node->textContent = 'new content';
echo "Content 2: ".$node->textContent."\n";
$newText = new DOMText('new content');
$node->appendChild($newText);
echo "Content 3: ".$node->textContent."\n";
$node->removeChild($node->firstChild);
$node->appendChild($newText);
echo "Content 4: ".$node->textContent."\n";
?>
The output (observed on older PHP versions, where textContent was read-only) is:
Content 1: old content // starting content
Content 2: old content // trying to replace overwriting $node->textContent
Content 3: old contentnew content // simply appending the new text node
Content 4: new content // removing firstchild before appending the new text node
If you want to have a CDATA section, use this:
<?php
$doc = new DOMDocument();
$doc->loadXML('<node>old content</node>'); // loadXML() is an instance method
$node = $doc->getElementsByTagName('node')->item(0);
$node->removeChild($node->firstChild);
$newText = $doc->createCDATASection('new cdata content');
$node->appendChild($newText);
echo "Content with CDATA: ".$doc->saveXML($node)."\n";
?>
Just discovered that $node->nodeValue strips out all the tags: on an element it returns only the text of its descendants.
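A short sketch of that behaviour, using a made-up fragment: nodeValue gives the bare text, while serializing each child with saveXML() keeps the markup.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<p>This is <b>bold</b> text.</p>');
$p = $doc->documentElement;

// nodeValue on an element yields only the text of its descendants.
$text = $p->nodeValue;

// To keep the tags, serialize each child node instead.
$markup = '';
foreach ($p->childNodes as $child) {
    $markup .= $doc->saveXML($child);
}

echo $text, "\n";   // This is bold text.
echo $markup, "\n"; // This is <b>bold</b> text.
```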
For a reference with more information about the XML DOM node types, see http://www.w3schools.com/dom/dom_nodetype.asp
(When using PHP DOMNode, these constants are prefixed with "XML_", and a few names are abbreviated, for example XML_PI_NODE.)
For clarification:
The methods supposedly "discovered" by previous posters and seemingly undocumented on this class (getElementsByTagName and getAttribute) are in fact methods of the class DOMElement, which inherits from DOMNode.
See: http://www.php.net/manual/en/class.domelement.php
It took me forever to find a mapping for the XML_*_NODE constants. So I thought, it'd be handy to paste it here:
1 XML_ELEMENT_NODE
2 XML_ATTRIBUTE_NODE
3 XML_TEXT_NODE
4 XML_CDATA_SECTION_NODE
5 XML_ENTITY_REF_NODE
6 XML_ENTITY_NODE
7 XML_PI_NODE
8 XML_COMMENT_NODE
9 XML_DOCUMENT_NODE
10 XML_DOCUMENT_TYPE_NODE
11 XML_DOCUMENT_FRAG_NODE
12 XML_NOTATION_NODE
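To see a few of these values in practice, here is a small sketch that collects the nodeType of each child of a sample element. The XML string is invented for the example.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root id="1">text<!-- note --><![CDATA[raw]]><child/></root>');
$root = $doc->documentElement;

// Record the numeric type of every child, in document order.
$types = [];
foreach ($root->childNodes as $child) {
    $types[] = $child->nodeType;
}
print_r($types); // 3 (text), 8 (comment), 4 (CDATA section), 1 (element)

// Attributes are reached through the attributes map, not childNodes.
$attrType = $root->attributes->item(0)->nodeType;
var_dump($attrType === XML_ATTRIBUTE_NODE); // bool(true)
```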
And apparently there is a setAttribute method too:
$node->setAttribute( 'attrName' , 'value' );
The issues around mixed content took me some experimentation to remember, so I thought I'd add this note to save others time.
When your markup is something like: <div><p>First text.</p><ul><li><p>First bullet</p></li></ul></div>, you'll get XML_ELEMENT_NODEs that are quite regular. The <div> has children <p> and <ul> and the nodeValue for both <p>s yields the text you expect.
But when your markup is more like <p>This is <b>bold</b> and this is <i>italic</i>.</p>, you realize that the nodeValue for XML_ELEMENT_NODEs is not reliable. In this case, you need to look at the <p>'s child nodes. For this example, the <p> has children: #text, <b>, #text, <i>, #text.
In this example, the nodeValue of <b> and <i> is the same as their #text children. But you could have markup like: <p>This <b>is bold and <i>bold italic</i></b>, you see?</p>. In this case, you need to look at the children of <b>, which will be #text, <i>, because the nodeValue of <b> will not be sufficient.
XML_TEXT_NODEs have no children and are always named '#text'. Depending on how whitespace is handled, your tree may have "empty" #text nodes as children of <body> and elsewhere.
Attributes are nodes, but I had forgotten that they are not in the tree expressed by childNodes. Walking the full tree using childNodes will not visit any attribute nodes.
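The mixed-content case can be explored with a small recursive walk over childNodes. This is only a sketch: the helper name `describeTree` is hypothetical, and it uses the nested example from the note above.

```php
<?php
// Hypothetical helper: list every node below $node in document order,
// indented by depth. Attribute nodes are not visited, as noted above.
function describeTree(DOMNode $node, int $depth = 0): array
{
    $lines = [];
    foreach ($node->childNodes as $child) {
        $label = $child->nodeName;
        if ($child->nodeType === XML_TEXT_NODE) {
            $label .= ' "' . $child->nodeValue . '"';
        }
        $lines[] = str_repeat('  ', $depth) . $label;

        // Text nodes have no children, so only recurse where children exist.
        if ($child->hasChildNodes()) {
            $lines = array_merge($lines, describeTree($child, $depth + 1));
        }
    }
    return $lines;
}

$doc = new DOMDocument();
$doc->loadXML('<p>This <b>is bold and <i>bold italic</i></b>, you see?</p>');
echo implode("\n", describeTree($doc->documentElement)), "\n";
```

The walk shows that the children of <b> are a #text node followed by <i>, which is why the nodeValue of <b> alone does not reveal where the italic part starts.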
This class apparently also has a getElementsByTagName method.
I was able to confirm this by evaluating the output from DOMNodeList->item() against various tests with the is_a() function.
Try canonicalization:
<?php
$dom = new DOMDocument;
$dom->loadHTMLFile('http://www.example.com/');
echo $dom->documentElement->C14N();
?>
Or output it to a file, using C14NFile()
Undocumented stuff ;)
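A self-contained variant of the canonicalization example, sketched with an inline XML string instead of a remote page. The temporary file path is only there for illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root b="2" a="1"><child/></root>');

// C14N() returns the canonical form as a string: attributes are sorted
// and empty elements are written as start/end tag pairs.
$canonical = $doc->documentElement->C14N();
echo $canonical, "\n";

// C14NFile() writes the same form to a file and returns the number of
// bytes written, or false on failure.
$path  = tempnam(sys_get_temp_dir(), 'c14n');
$bytes = $doc->documentElement->C14NFile($path);
echo $bytes, " bytes written to ", $path, "\n";
```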
If $node->textContent and $node->nodeValue are empty, check whether the loaded document is UTF-8 encoded.
getAttribute() returns an empty string if the requested attribute doesn't exist in the node.
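Because an empty string is also a legal attribute value, the return value alone cannot tell a missing attribute from an empty one. A brief sketch, with an invented element, of using hasAttribute() to distinguish the two:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<a href="" title="home">link</a>');
$a = $doc->documentElement;

// Both calls return '', so the result alone cannot tell the cases apart.
$empty   = $a->getAttribute('href'); // present but empty
$missing = $a->getAttribute('rel');  // not present at all

// hasAttribute() distinguishes them.
$hasHref = $a->hasAttribute('href');
$hasRel  = $a->hasAttribute('rel');
var_dump($hasHref); // bool(true)
var_dump($hasRel);  // bool(false)
```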
