Working around a dynamic XPath

Hello everyone,

I'm having trouble scraping this site: https://global-industrie.com/fr/liste-des-exposants

I tried scraping it with Octoparse, but I get stuck when it comes to opening the individual exhibitor pages. I think I've understood the problem: the relative XPath seems to be dynamic.

Has anyone run into this before, or do I need to switch to Python?

Thanks in advance for your replies.

Hello,

I'd recommend @yarek: he scrapes everything he comes across with JavaScript. A real machine, and he makes it look easy :blush:


You can skip the HTML entirely and call the site's API with requests like this one:

curl 'https://api.swapcard.com/graphql' \
  -X POST \
  --compressed \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0' \
  -H 'Accept: */*' \
  -H 'Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3' \
  -H 'Referer: https://global-industrie.com/' \
  -H 'content-type: application/json' \
  -H 'x-client-version: 2.1.35' \
  -H 'x-client-origin: app.swapcard.com' \
  -H 'x-client-platform: SDK' \
  -H 'X-Content-Language: fr_FR' \
  -H 'Origin: https://global-industrie.com' \
  -H 'DNT: 1' \
  -H 'Connection: keep-alive' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: cross-site' \
  -H 'TE: trailers' \
  --data-raw '[{"operationName":"EventExhibitorListViewConnectionQuery","variables":{"withEvent":true,"viewId":"RXZlbnRWaWV3XzQyMDU2OQ==","eventId":"RXZlbnRfMTAwODAwOQ==","endCursor":"WyJhbXRlIl0="},"extensions":{"persistedQuery":{"version":1,"sha256Hash":"b7d34d371267414bb97bcb6561e50248d030523308a211765fe3f69f4b506569"}}}]'

Then loop over it (1,772 exhibitors in total), updating endCursor on each call to get the next page.
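Since the original question asks whether Python is needed, here is a minimal standard-library sketch of that loop. It rebuilds the persisted-query body from the request above and follows the cursor page by page. Assumptions, all unverified: the list response contains a Relay-style connection (a dict with nodes and pageInfo, like linkedExhibitors in the detail response further down), the first page is sent without endCursor, and the helper names (persisted_query_body, find_connection, iter_exhibitor_ids, fetch_list_page) are hypothetical. The hash and client version are whatever the site used at the time and may change.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Iterator, Optional

API_URL = "https://api.swapcard.com/graphql"

# Headers taken from the browser request above. Which of them the API really
# requires has not been tested (assumption); urllib sends no Accept-Encoding,
# so the response arrives uncompressed.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0",
    "Accept": "*/*",
    "Content-Type": "application/json",
    "Referer": "https://global-industrie.com/",
    "Origin": "https://global-industrie.com",
    "x-client-version": "2.1.35",
    "x-client-origin": "app.swapcard.com",
    "x-client-platform": "SDK",
    "X-Content-Language": "fr_FR",
}

# Values copied from the list request above; they may change over time.
LIST_HASH = "b7d34d371267414bb97bcb6561e50248d030523308a211765fe3f69f4b506569"
VIEW_ID = "RXZlbnRWaWV3XzQyMDU2OQ=="
EVENT_ID = "RXZlbnRfMTAwODAwOQ=="


def persisted_query_body(operation: str, variables: dict, sha256: str) -> list:
    """Build the batched persisted-query body: a JSON array with one operation."""
    return [{
        "operationName": operation,
        "variables": variables,
        "extensions": {"persistedQuery": {"version": 1, "sha256Hash": sha256}},
    }]


def post_graphql(body: list) -> Any:
    """POST the body to the GraphQL endpoint and return the decoded JSON."""
    req = urllib.request.Request(
        API_URL, data=json.dumps(body).encode("utf-8"), headers=HEADERS, method="POST"
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def find_connection(obj: Any) -> Optional[dict]:
    """Depth-first search for the first dict holding both 'nodes' and 'pageInfo'."""
    if isinstance(obj, dict):
        if "nodes" in obj and "pageInfo" in obj:
            return obj
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_connection(child)
        if found is not None:
            return found
    return None


def iter_exhibitor_ids(fetch_page: Callable[[Optional[str]], Any]) -> Iterator[str]:
    """Yield node ids page by page, feeding each endCursor into the next call."""
    cursor = None
    while True:
        conn = find_connection(fetch_page(cursor))
        if conn is None:
            break
        for node in conn["nodes"]:
            yield node["id"]
        info = conn["pageInfo"]
        if not info.get("hasNextPage") or not info.get("endCursor"):
            break
        cursor = info["endCursor"]


def fetch_list_page(cursor: Optional[str]) -> Any:
    """Hypothetical helper: fetch one page of EventExhibitorListViewConnectionQuery."""
    variables = {"withEvent": True, "viewId": VIEW_ID, "eventId": EVENT_ID}
    if cursor:  # assumption: the first page is requested without endCursor
        variables["endCursor"] = cursor
    time.sleep(0.5)  # keep the request rate polite
    return post_graphql(
        persisted_query_body("EventExhibitorListViewConnectionQuery", variables, LIST_HASH)
    )
```

With this in place, ids = list(iter_exhibitor_ids(fetch_list_page)) should collect every exhibitor id. The detail request can reuse persisted_query_body with the operation EventExhibitorDetailsViewQuery, the variables {"withEvent": True, "skipMeetings": True, "exhibitorId": ..., "eventId": EVENT_ID} and the sha256Hash from the second curl command. Passing the fetch function in as a parameter keeps the pagination logic testable without network access.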

Then, using the id returned for each exhibitor, send this request:

curl 'https://api.swapcard.com/graphql' \
  -X POST \
  --compressed \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0' \
  -H 'Accept: */*' \
  -H 'Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3' \
  -H 'Referer: https://global-industrie.com/' \
  -H 'content-type: application/json' \
  -H 'x-client-version: 2.1.35' \
  -H 'x-client-origin: app.swapcard.com' \
  -H 'x-client-platform: SDK' \
  -H 'X-Content-Language: fr_FR' \
  -H 'Origin: https://global-industrie.com' \
  -H 'DNT: 1' \
  -H 'Connection: keep-alive' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: cross-site' \
  -H 'TE: trailers' \
  --data-raw '[{"operationName":"EventExhibitorDetailsViewQuery","variables":{"withEvent":true,"skipMeetings":true,"exhibitorId":"RXhoaWJpdG9yXzc3NjQxNA==","eventId":"RXZlbnRfMTAwODAwOQ=="},"extensions":{"persistedQuery":{"version":1,"sha256Hash":"89f227d490052f65e58416ddfe79c84856de601e2543f6655c50412bb2ac8a5d"}}}]'
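The id values in these two requests look opaque, but they appear to be plain base64. Decoding them is only a quick sanity check, not something the API documents: the ids turn out to be type-prefixed numbers, and the list cursor decodes to a small JSON array, which suggests (unconfirmed) a sort-key cursor. The helper name decode_id is hypothetical.

```python
import base64


def decode_id(value: str) -> str:
    """Decode a base64-encoded identifier or cursor into readable text."""
    return base64.b64decode(value).decode("utf-8")


# Values copied from the two requests above.
print(decode_id("RXhoaWJpdG9yXzc3NjQxNA=="))  # Exhibitor_776414
print(decode_id("RXZlbnRfMTAwODAwOQ=="))      # Event_1008009
print(decode_id("RXZlbnRWaWV3XzQyMDU2OQ=="))  # EventView_420569
print(decode_id("WyJhbXRlIl0="))              # ["amte"]
```

In practice there is no need to build these ids by hand: the exhibitorId for each detail request should simply be taken from the list response.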

The detail request returns all the data for each exhibitor. For example:

[
	{
		"data": {
			"exhibitor": {
				"id": "RXhoaWJpdG9yXzc3NjQxNA==",
				"name": "1LIFE GROUP",
				"type": "Exposant",
				"logoUrl": "https://cdn-api.swapcard.com/public/images/db2b9d05babd43e8851c25e914ac36a9.png",
				"websiteUrl": "https://www.1life.fr",
				"email": "[email protected]",
				"exhibitorMemberRole": "ANONYMOUS",
				"description": "Expert digital pour l'industrie, 1Life Group est une société française spécialisée dans la gestion d'entreprise et accompagne les industriels français dans leur transformation numérique depuis plus de 15 ans.Notre vocation ? Libérer le potentiel de nos clients industriels français en leur proposant des solutions simples, flexibles et évolutives qui leurs permettront de booster leur performance.Grâce à des solutions collaboratives, nous libérons nos clients de leurs contraintes en leur permettant de se concentrer sur le cœur de leur business : la productivité !",
				"htmlDescription": "<p>Expert digital pour l'industrie, <a href=\"https://www.1lifegroup.fr/\"><strong>1Life Group</strong></a> est une société française spécialisée dans la gestion d'entreprise et accompagne les industriels français dans leur transformation numérique depuis plus de 15 ans.</p><p>Notre vocation ? <strong>Libérer le potentiel de nos clients</strong> industriels français en leur proposant des solutions simples, flexibles et évolutives qui leurs permettront de <u><strong>booster leur performance.</strong></u></p><p>Grâce à des solutions collaboratives, nous libérons nos clients de leurs contraintes en leur permettant de se concentrer sur le cœur de leur business : <u><strong>la productivité !</strong></u></p>",
				"backgroundImageUrl": null,
				"withEvent": {
					"canTalkTo": false,
					"fields": [
						{
							"__typename": "Core_SelectField",
							"id": "RmllbGREZWZpbml0aW9uXzM5NDk5MA==",
							"name": "Univers",
							"placeholder": null,
							"isEditable": false,
							"value": {
								"id": "RmllbGRWYWx1ZV8xNTkyNzk5OQ==",
								"text": "Smart : digitalisation, automatisation, mécatronique",
								"__typename": "Core_SelectFieldValue"
							},
							"section": {
								"id": "RmllbGREZWZpbml0aW9uU2VjdGlvbl8zMTEyMQ==",
								"name": "Informations",
								"__typename": "Core_FieldDefinitionSection"
							}
						},
						{
							"__typename": "Core_SelectField",
							"id": "RmllbGREZWZpbml0aW9uXzM5NDk5MQ==",
							"name": "Village / Pavillon",
							"placeholder": null,
							"isEditable": false,
							"value": null,
							"section": {
								"id": "RmllbGREZWZpbml0aW9uU2VjdGlvbl8zMTEyMQ==",
								"name": "Informations",
								"__typename": "Core_FieldDefinitionSection"
							}
						},
						{
							"__typename": "Core_SelectField",
							"id": "RmllbGREZWZpbml0aW9uXzM5NTI1MA==",
							"name": "Région",
							"placeholder": null,
							"isEditable": false,
							"value": {
								"id": "RmllbGRWYWx1ZV8xNTc2MzYzOQ==",
								"text": "Auvergne-Rhône-Alpes",
								"__typename": "Core_SelectFieldValue"
							},
							"section": {
								"id": "RmllbGREZWZpbml0aW9uU2VjdGlvbl8zMTEyMQ==",
								"name": "Informations",
								"__typename": "Core_FieldDefinitionSection"
							}
						},
						{
							"__typename": "Core_SelectField",
							"id": "RmllbGREZWZpbml0aW9uXzM5NTI0OQ==",
							"name": "Pays",
							"placeholder": null,
							"isEditable": false,
							"value": {
								"id": "RmllbGRWYWx1ZV8xNTc2MzU5OA==",
								"text": "FRANCE",
								"__typename": "Core_SelectFieldValue"
							},
							"section": {
								"id": "RmllbGREZWZpbml0aW9uU2VjdGlvbl8zMTEyMQ==",
								"name": "Informations",
								"__typename": "Core_FieldDefinitionSection"
							}
						},
						{
							"__typename": "Core_LongTextField",
							"id": "RmllbGREZWZpbml0aW9uXzM5Njg0Nw==",
							"name": "Description",
							"placeholder": null,
							"isEditable": true,
							"value": {
								"id": "RmllbGRWYWx1ZV8xNzI4MzcyMg==",
								"longText": "Editeur/Intégrateur français de logiciels ERP dédiés à l'industrie.",
								"__typename": "Core_LongTextFieldValue"
							},
							"section": {
								"id": "RmllbGREZWZpbml0aW9uU2VjdGlvbl8zMTEyMQ==",
								"name": "Informations",
								"__typename": "Core_FieldDefinitionSection"
							}
						},
						{
							"__typename": "Core_MultipleTextField",
							"id": "RmllbGREZWZpbml0aW9uXzM5NjkwNA==",
							"name": "Marques",
							"placeholder": null,
							"isEditable": true,
							"values": [
								{
									"text": "1life",
									"id": "RmllbGRWYWx1ZV8xNjA3MzM4MA==",
									"__typename": "Core_TextFieldValue"
								},
								{
									"text": "myfab",
									"id": "RmllbGRWYWx1ZV8xNjA3MzM4MQ==",
									"__typename": "Core_TextFieldValue"
								},
								{
									"text": "arpeje",
									"id": "RmllbGRWYWx1ZV8xNjA3MzM4Mg==",
									"__typename": "Core_TextFieldValue"
								},
								{
									"text": "1lifegroup",
									"id": "RmllbGRWYWx1ZV8xNjA3MzM4NA==",
									"__typename": "Core_TextFieldValue"
								}
							],
							"section": {
								"id": "RmllbGREZWZpbml0aW9uU2VjdGlvbl8zMTEyMQ==",
								"name": "Informations",
								"__typename": "Core_FieldDefinitionSection"
							}
						},
						{
							"__typename": "Core_MultipleSelectField",
							"id": "RmllbGREZWZpbml0aW9uXzQ2ODI5OA==",
							"name": "Objectif de participation",
							"placeholder": null,
							"isEditable": true,
							"values": [],
							"section": {
								"id": "RmllbGREZWZpbml0aW9uU2VjdGlvbl8zMTEyMQ==",
								"name": "Informations",
								"__typename": "Core_FieldDefinitionSection"
							}
						},
						{
							"__typename": "Core_TreeField",
							"id": "RmllbGREZWZpbml0aW9uXzQ2MzQzMA==",
							"name": "Activités",
							"placeholder": "",
							"isEditable": true,
							"values": [
								{
									"id": "RmllbGRWYWx1ZV8xNzI5MDAzNA==",
									"path": [
										{
											"text": "Conception, simulation",
											"__typename": "Core_TreePathTranslationValue"
										},
										{
											"text": "CAO, CFAO, design, PLM, MES, ERP",
											"__typename": "Core_TreePathTranslationValue"
										}
									],
									"__typename": "Core_TreeFieldValue"
								}
							],
							"section": {
								"id": "RmllbGREZWZpbml0aW9uU2VjdGlvbl8zMTc3NA==",
								"name": "Nomenclature",
								"__typename": "Core_FieldDefinitionSection"
							}
						}
					],
					"editableFields": {
						"name": false,
						"logoUrl": true,
						"description": true,
						"address": true,
						"websiteUrl": true,
						"type": false,
						"socialNetworks": true,
						"phoneNumbers": true,
						"booth": true,
						"email": true,
						"bannerImage": false,
						"bannerVideo": false,
						"advertisements": true,
						"backgroundImage": true,
						"__typename": "Core_EditableExhibitorFields"
					},
					"booths": [
						{
							"id": "TWVldGluZ1BsYWNlXzUyMzEyNQ==",
							"category": null,
							"name": "2K71",
							"__typename": "Core_Location"
						}
					],
					"__typename": "Core_ExhibitorWithEvent",
					"isBookmarked": null,
					"advertisements": [],
					"products": {
						"subcategories": {
							"categories": [],
							"__typename": "Core_EventProductCategoriesResult"
						},
						"__typename": "Core_EventProducts"
					},
					"similarExhibitors": []
				},
				"address": {
					"street": "7 RUE DU 35ÈME REGIMENT D'AVIATION",
					"zipCode": "69500",
					"place": null,
					"country": "FRANCE",
					"city": "BRON",
					"state": null,
					"__typename": "Core_Address"
				},
				"socialNetworks": [
					{
						"profile": "1life-group/",
						"type": "LINKEDIN",
						"__typename": "Core_SocialNetwork"
					},
					{
						"profile": "1LifeGroup",
						"type": "YOUTUBE",
						"__typename": "Core_SocialNetwork"
					}
				],
				"phoneNumbers": [
					{
						"type": "LANDLINE",
						"formattedNumber": "+33 (0)4.81.09.07.00",
						"__typename": "Core_PhoneNumber"
					},
					{
						"type": "MOBILE",
						"formattedNumber": "+330676522027",
						"__typename": "Core_PhoneNumber"
					}
				],
				"banner": {
					"imageUrl": null,
					"embeddedVideo": null,
					"__typename": "Core_Banner"
				},
				"__typename": "Core_Exhibitor",
				"documents": [
					{
						"id": "RG9jdW1lbnRfNTM0MzQ5",
						"name": "Plaquette de présentation myfab",
						"url": "https://cdn-api.swapcard.com/public/files/b4acfaf860ae42fcb6bbee6a7fe2e7ff.pdf",
						"description": "",
						"type": "DOCUMENT",
						"__typename": "Core_Document"
					}
				],
				"isBookmarked": null
			},
			"linkedExhibitors": {
				"pageInfo": {
					"hasNextPage": false,
					"endCursor": null,
					"__typename": "Core_PageInfo"
				},
				"totalCount": 0,
				"nodes": [],
				"__typename": "Core_ExhibitorsConnection"
			},
			"members": {
				"totalCount": 0,
				"nodes": [],
				"pageInfo": {
					"hasNextPage": false,
					"endCursor": null,
					"__typename": "Core_PageInfo"
				},
				"__typename": "Core_EventPeopleConnection"
			},
			"membersAsAdmin": {
				"totalCount": 0,
				"nodes": [],
				"pageInfo": {
					"hasNextPage": false,
					"endCursor": null,
					"__typename": "Core_PageInfo"
				},
				"__typename": "Core_ExhibitorMembersConnection"
			},
			"plannings": {
				"nodes": [],
				"pageInfo": {
					"totalEdges": 0,
					"hasNextPage": false,
					"nextCursor": null,
					"__typename": "Core_PageInfoType"
				},
				"__typename": "Core_EdgeListPlanning"
			},
			"onDemandPlannings": {
				"nodes": [],
				"pageInfo": {
					"totalEdges": 0,
					"hasNextPage": false,
					"nextCursor": null,
					"__typename": "Core_PageInfoType"
				},
				"__typename": "Core_EdgeListPlanning"
			},
			"productCategories": [
				{
					"totalLimit": 0,
					"productCount": 0,
					"category": {
						"id": "UHJvZHVjdENhdGVnb3J5Xzk4NzE4",
						"name": "Awards",
						"__typename": "Core_ProductCategory"
					},
					"__typename": "Core_ExhibitorRootProductCategory"
				},
				{
					"totalLimit": null,
					"productCount": 0,
					"category": {
						"id": "UHJvZHVjdENhdGVnb3J5Xzk4NzIz",
						"name": "Nouveautés",
						"__typename": "Core_ProductCategory"
					},
					"__typename": "Core_ExhibitorRootProductCategory"
				},
				{
					"totalLimit": null,
					"productCount": 0,
					"category": {
						"id": "UHJvZHVjdENhdGVnb3J5Xzk4OTE4",
						"name": "Machines en fonctionnement",
						"__typename": "Core_ProductCategory"
					},
					"__typename": "Core_ExhibitorRootProductCategory"
				},
				{
					"totalLimit": 1,
					"productCount": 0,
					"category": {
						"id": "UHJvZHVjdENhdGVnb3J5Xzk4OTE5",
						"name": "Catalogues",
						"__typename": "Core_ProductCategory"
					},
					"__typename": "Core_ExhibitorRootProductCategory"
				},
				{
					"totalLimit": null,
					"productCount": 0,
					"category": {
						"id": "UHJvZHVjdENhdGVnb3J5XzExNDU1Nw==",
						"name": "Nos exposants se mobilisent",
						"__typename": "Core_ProductCategory"
					},
					"__typename": "Core_ExhibitorRootProductCategory"
				}
			]
		}
	}
]
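To turn each detail response into one row (for a spreadsheet or CSV), a sketch like the following flattens the fields visible in the example above. It assumes every response has this same shape. The custom fields (Univers, Région, Marques, Activités, etc.) keep their content under value.text, value.longText, values[].text or values[].path depending on __typename, so the hypothetical helper field_text checks each in turn; field types absent from this sample get no special handling.

```python
def field_text(field: dict) -> str:
    """Extract readable text from one entry of withEvent.fields."""
    value = field.get("value")
    if value:  # single-value fields (Core_SelectField, Core_LongTextField)
        return value.get("text") or value.get("longText") or ""
    parts = []
    for v in field.get("values") or []:
        if "path" in v:  # Core_TreeField: join the tree levels
            parts.append(" > ".join(p["text"] for p in v["path"]))
        else:  # Core_MultipleTextField and similar
            parts.append(v.get("text", ""))
    return "; ".join(parts)


def flatten_exhibitor(response: list) -> dict:
    """Turn one EventExhibitorDetailsViewQuery response into a flat record."""
    ex = response[0]["data"]["exhibitor"]
    address = ex.get("address") or {}
    with_event = ex.get("withEvent") or {}
    record = {
        "id": ex["id"],
        "name": ex.get("name"),
        "website": ex.get("websiteUrl"),
        "email": ex.get("email"),
        "street": address.get("street"),
        "zip": address.get("zipCode"),
        "city": address.get("city"),
        "country": address.get("country"),
        "phones": "; ".join(p["formattedNumber"] for p in ex.get("phoneNumbers") or []),
        "booths": "; ".join(b["name"] for b in with_event.get("booths") or []),
    }
    # Each custom field becomes its own column, keyed by its display name.
    for field in with_event.get("fields") or []:
        record[field["name"]] = field_text(field)
    return record
```

Because the set of custom fields can differ between exhibitors, writing the records with csv.DictWriter works best when fieldnames is the union of all record keys.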

Thanks for your help, @Morph!

