[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Translation issue","translationIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated (UTC): 2025-09-04."],[],[],null,["# Discovery client data collection and security\n\nThis document addresses concerns and questions about installing the\nMigration Center discovery client in data centers. It emphasizes the\nimportance of security, compliance, and performance when discovering and\ncollecting data from customer IT assets in highly regulated environments.\n\nHow data collection is performed\n--------------------------------\n\nThe discovery client uses several methods to collect data from target\nmachines. The data collected varies depending on the method.\nAt guest level, data is collected using the collection scripts;\nat the hypervisor level, data is collected using the underlying platform APIs.\n\n### Discovery client service and process\n\nThe discovery client runs as a service called `GoogleMCDC` under a process\ncalled `mcdc_service.exe`.\n\n### Collection scripts\n\nAll the guest-level collection methods used by the discovery client run\ncollection scripts on the target machines. 
You can review the actual scripts\nused for collection at the following links:\n\n- Linux\n - [Full collection](https://storage.googleapis.com/mcdc-release/current/mcdc-linux-collect.sh)\n - [Performance collection](https://storage.googleapis.com/mcdc-release/current/mcdc-linux-collect-performnace.sh)\n- Windows\n - [Full collection](https://storage.googleapis.com/mcdc-release/current/mcdc-windows-collect.ps1)\n - [Performance collection](https://storage.googleapis.com/mcdc-release/current/mcdc-windows-collect-performance.ps1)\n\nThe collection scripts store the results in an archive file (zip or tar)\nwhich the discovery client then retrieves.\n\n### Collection mechanisms\n\nThe discovery client may use one or more of the collection mechanisms\ndescribed in the following sections to collect data from the target machines.\n\n#### SSH (Linux)\n\nDuring SSH collection, the following process occurs:\n\n1. An SSH session is initiated between the collector machine and the target server.\n2. A temporary directory is created under `~/.mcdc-temp/`.\n3. The collection script is copied to that directory.\n4. The collection script is executed.\n5. The result archive is fetched using SCP.\n6. The temporary directory is cleaned up.\n\n#### WMI (Windows)\n\nDuring WMI collection on Windows, the following process occurs:\n\n1. A WMI connection is initiated to the target machine.\n2. A temporary (volatile) registry key is created on the target machine under `HKLM:\\SOFTWARE\\Google\\Collector\\data`.\n3. The collection script is copied to the registry key.\n4. A temporary directory is created under `C:\\temp`.\n5. The collection script is written to the temporary directory.\n6. The collection script is executed.\n7. The result of the collection is written to the volatile registry key.\n8. The result is copied to the collector machine.\n\n#### VMware Guest Tools (Linux and Windows)\n\nDuring VMware collection for both Linux and Windows, the following process\noccurs:\n\n1. 
A temporary directory is created using VMware guest tools.\n2. The collection script is copied to that directory.\n3. The collection script is executed.\n4. The result archive is fetched using VMware guest tools.\n5. The temporary directory is cleaned up.\n\n### Periodic data collection\n\nThe discovery client periodically collects data from all configured servers.\nThere are two types of collection:\n\n- **Full collection:** Runs once a day for each server. This collection executes the full collection script, which gathers information about the VM such as its hardware, environment, installed software, and running processes.\n- **Performance collection:** Runs every 10 minutes on each server. This collection executes the performance collection script, which gathers data on CPU, memory, network, and disk utilization.\n\nWhat data is collected\n----------------------\n\nThe collection scripts collect data about target VMs to understand how they are\nconfigured and what resources they use. This helps in assessing and planning\ntheir migration to the cloud.\n\nThe following list describes the data that is collected:\n\n- **System information:** The basic information that is crucial for determining the VM's size, performance requirements, and dependencies on specific hardware or drivers. It includes:\n - Operating system (version and release)\n - Hardware (CPU, memory, BIOS details)\n - Network configuration (network interfaces, IP addresses, routing tables)\n - Storage (disk drives, partitions, mount points)\n- **Installed software and services:** The scripts collect a list of installed packages and running services to understand the VM's software stack and its role. 
It includes:\n - Web servers (Apache, Tomcat, JBoss)\n - Databases (evidence of SQL Server is collected in the Windows script)\n - Other applications that might require specific configurations during migration.\n- **Application configurations:** The scripts also gather configuration files for web servers (IIS, Apache, Tomcat, JBoss, WordPress). This helps in understanding the specific settings and dependencies of these applications, which is vital for ensuring a smooth transition to the cloud environment.\n- **VMware and cloud environment detection:** Both the Linux and Windows scripts attempt to detect if the VM is already running in a cloud environment (AWS or Google Cloud), or in a vCenter cluster. They do this by making requests to the metadata servers of these cloud providers. If the VM is already in the cloud, the scripts collect relevant metadata such as instance ID, instance type, and other details.\n- **Performance metrics:** The performance collection scripts measure resource utilization. This includes the following:\n - CPU\n - Memory\n - I/O operations\n - Networking\n- **Network connections:** The scripts collect open connections to help create a picture of the different dependencies on network resources.\n\nPerformance impact on target machines\n-------------------------------------\n\n### Resource utilization assessment\n\nThe resource utilization of the collection scripts on the target machine\ndepends on parameters such as the number of processes running, the number of\napplications deployed, the number of active network connections, and others.\n\nOn Windows, the collection script runs using the lowest priority available\nthrough the Threading API.\nOn Linux, a `nice` value of 5 is used to minimize interference with production\nworkloads, so that they keep a higher priority than the collection script.\n\nA typical collection might take 5-20 seconds of high single-core CPU usage\non an unloaded machine. 
It might take longer if other workloads are present,\nbecause these workloads have higher priority.\n\n### Mitigation strategies\n\nThe discovery client provides a mechanism to prevent collection from specific\nservers during specific hours. This feature can be used to avoid collecting\nfrom servers that run critical workloads during peak hours.\n\nSecurity considerations\n-----------------------\n\n### Authentication and authorization\n\n#### Communication with target machines\n\n- The discovery client uses secure channels to authenticate and communicate with target machines. This includes SSH, WMI, VMware tools, and vCenter connections. The discovery client relies on the security measures built into these protocols.\n- In SSH, the discovery client allows both username-password and key-based authentication. For a full list of the supported types of key pairs, see [Target asset requirements](/migration-center/docs/target-assets-requirements?version=v6#linux_machines).\n\n#### Communication with Google Cloud\n\n- Registered discovery clients communicate with Google Cloud Migration Center during their normal operation. The communication happens through a service account with the `roles/migrationcenter.discoveryClient` role binding. The service account is either created automatically, or provided by the user during the registration process.\n- The service account private key is encrypted on the discovery client machine using the encryption mechanism described in the following section.\n- All communication to Google Cloud is authenticated using this service account, and encrypted using SSL/TLS.\n\n### Data encryption\n\n- **In transit:** All discovery client communication channels use encryption to protect data in transit. 
This includes communication with the target machines using the different protocols (SSH/WMI), and communication with Google Cloud using HTTPS.\n- **At rest:** The discovery client's PII, SPII, and secrets are all encrypted at rest using the `AES128_GCM` algorithm, with the encryption keys stored securely using the Windows DPAPI.\n\n### Intrusion detection and prevention\n\nBecause the discovery client connects to and runs scripts on many\nVMs in your organization, it may trigger EDR or XDR alerts. Whether it does\ndepends heavily on which security tools you use and how they are configured.\nConsider creating exemptions for the specific alerts and devices.\n\nLogging and supportability\n--------------------------\n\nThe discovery client collects logs during its operation to allow for\ndebugging and support. The discovery client logs are collected using two\nmechanisms:\n\n- **Local logs:** The logs are written to files under `C:\\ProgramData\\Google\\mcdc\\logs`. The log files are rotated and compressed.\n- **Cloud logs:** Registered clients also send the logs to Google Cloud so they can be used by the Google Cloud support team when customer issues are reported."]]